#18050 closed defect (fixed)

Relay submitted a descriptor with 0 DirPort due to a self-test race condition

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.2.7.x-final
Component: Core Tor/Tor Version: Tor: 0.2.6.10
Severity: Normal Keywords: 027-backport TorCoreTeam201605
Cc: starlight@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

A relay operator reports their relay submitted a descriptor with a 0 DirPort after a version upgrade, then 62 seconds later submitted a descriptor with the configured DirPort.

Unfortunately, this was right on the consensus deadline, so they had one consensus where their DirPort was 0.

https://lists.torproject.org/pipermail/tor-relays/2016-January/008523.html

Child Tickets

Change History (13)

comment:1 Changed 23 months ago by starlight

Cc: starlight@… added

Complete list of impacted relays:

BF0FB582E37F738CD33C3651125F2772705BB8E8 12-28:17 quadhead
86E78DD3720C78DA8673182EF96C54B162CD660C 12-13:11 kitten1
6DE61A6F72C1E5418A66BFED80DFB63E4C77668F 12-19:11 eriador
39F096961ED2576975C866D450373A9913AFDC92 12-28:06 metaether
92CFD9565B24646CAC2D172D3DB503D69E777B8A 12-16:14 bakunin
6E7CB6E783C1B67B79D0EBBE7D48BC09BD441201 12-14:07 Paris
9E0CBB6958CE61DC631E8660963EB644BE92B256 12-12:04 freshman
D74ABE34845190E242EC74BA28B8C89B0A480D4B 12-14:07 Helen
14E3111C54BB532FB39AA3C1367D0970DA2937D5 12-18:17 idfTor2
78BBAC4B66067E8D25AA78852257C08D2E7EB357 12-18:09 grothendieck
EC639EDAA5121B47DBDF3D6B01A22E48A8CB6CC7 12-24:08 rainbowdash

The above are all relays that were actually on the 12/11 list and did not appear on the 01/12 list.

Also two examples where updateFallbackDirs.py reported a port change as the first encountered reason for
exclusion on 01/12 though these relays were excluded on 12/11 for a different reason:

C43FA6474A9F071E9120DF63ED6EB8FDBA105234 12-25:03 ArachnideFR5
5665A3904C89E22E971305EE8C1997BCA4123C69 11-15:18 GermanCraft

comment:2 Changed 23 months ago by teor

Keywords: 026-maybe-backport 027-maybe-backport added
Version: Tor: 0.2.6.10

It would be nice to backport any fix to 0.2.6 and 0.2.7, so this issue doesn't happen each time we release a new version of Tor. But we usually reserve backports for security or serious functionality issues, so I don't know if a backport will happen.

comment:3 Changed 23 months ago by starlight

Looked briefly at local logs and the logic. Current design appears to always publish a descriptor with DirPort=0 during boot. run_scheduled_events() publishes a changed descriptor once every 60 seconds, and the typical boot sequence runs the DirPort reachability at slightly over 60 seconds from start.

:16:06 Tor[5806]: Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
:16:07 Tor[5806]: Bootstrapped 85%: Finishing handshake with first hop
:16:07 Tor[5806]: Bootstrapped 90%: Establishing a Tor circuit
:16:08 Tor[5806]: Tor has successfully opened a circuit. Looks like client functionality is working.
:16:08 Tor[5806]: Bootstrapped 100%: Done
:17:09 Tor[5806]: Self-testing indicates your DirPort is reachable from the outside. Excellent.

When this happens just before the consensus vote time at minute 50 of each hour, the race condition identified by Teor occurs. Reexamined half of the above events and in all cases problem descriptor was published at 46-49 minutes.

Perhaps consider_publishable_server() should be adjusted to delay the descriptor until the DirPort test is complete.

comment:4 in reply to:  3 Changed 23 months ago by teor

Replying to starlight:

Looked briefly at local logs and the logic. Current design appears to always publish a descriptor with DirPort=0 during boot. run_scheduled_events() publishes a changed descriptor once every 60 seconds, and the typical boot sequence runs the DirPort reachability at slightly over 60 seconds from start.
...
When this happens just before the consensus vote time at minute 50 of each hour, the race condition identified by Teor occurs. Reexamined half of the above events and in all cases problem descriptor was published at 46-49 minutes.

Perhaps consider_publishable_server() should be adjusted to delay the descriptor until the DirPort test is complete.

I agree.

This sounds somewhat similar to #17782. When its address changes, tor doesn't test ORPort reachability. So it will go ahead and publish a descriptor with the wrong address, as long as an old address was reachable at some point.

We should make tor wait for ORPort and DirPort reachability every time:

  • it starts up,
  • the config changes address, ORPort, or DirPort.

There's a drawback here, which is that tor won't ever publish a descriptor if only the ORPort is reachable (perhaps due to a broken firewall config). I think we should have a timeout after which tor warns, then publishes the descriptor without the DirPort. Given the time ranges we're seeing, the timeout should be at least 20 minutes.

(Relays with no DirPort will still be used for directory requests once #12538 is merged.)

comment:5 Changed 23 months ago by starlight

Just remembered noticing earlier a strange last address-change date/time for my relay. On looking I see it was affected by this glitch six months ago:

last_changed_address_or_port:2015-06-12 18:00:00

Have log entries for it:

06-12-17-cons:r Binnacle <fp> aWzbTbPNq8OoPHFtMnxMTVOHDuY 15:11:13 108.53.208.157 443 80
06-12-18-cons:r Binnacle <fp> dIaQvTy5UjVYtdRVa2xPDSwhpAw 17:49:21 108.53.208.157 443 0
06-12-19-cons:r Binnacle <fp> RoqngB0ium+unPem09nCwnzBqI4 18:32:44 108.53.208.157 443 80
Jun 12 17:49:17 Tor: Parsing GEOIP IPv4 file ./geoip.
Jun 12 17:49:17 Tor: Configured to measure statistics. Look for the *-stats files that will first be written to the data directory in 24 hours from now.
Jun 12 17:49:17 Tor: Your Tor server's identity key fingerprint is 'Binnacle 4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1F2'
Jun 12 17:49:21 Tor: We now have enough directory information to build circuits.
Jun 12 17:49:21 Tor: Bootstrapped 80%: Connecting to the Tor network.
Jun 12 17:49:21 Tor: Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Jun 12 17:49:22 Tor: Bootstrapped 85%: Finishing handshake with first hop.
Jun 12 17:49:22 Tor: Bootstrapped 90%: Establishing a Tor circuit.
Jun 12 17:49:23 Tor: Tor has successfully opened a circuit. Looks like client functionality is working.
Jun 12 17:49:23 Tor: Bootstrapped 100%: Done.
Jun 12 17:50:24 Tor: Self-testing indicates your DirPort is reachable from the outside. Excellent.
Jun 12 17:51:25 Tor: Performing bandwidth self-test...done.

In this case DirPort=0 was published just before minute 50 and and DirPort=80 just after, but in the case of kitten both appear to have been pushed just before minute 50:

12-13-09-cons:r kitten1 <fp> vG0ZWLi31UXoqz4H2H/hweSvxqo 04:44:19 62.210.124.124 9001 9030
12-13-10-cons:r kitten1 <fp> vG0ZWLi31UXoqz4H2H/hweSvxqo 04:44:19 62.210.124.124 9001 9030
12-13-11-cons:r kitten1 <fp> yUnPgGOwhq5QfUEe1Eg0zKCU+w0 10:48:46 62.210.124.124 9001 0
12-13-12-cons:r kitten1 <fp> JrZF6+VnnYZuQHRllufg53dbjH8 10:49:48 62.210.124.124 9001 9030
12-13-13-cons:r kitten1 <fp> JrZF6+VnnYZuQHRllufg53dbjH8 10:49:48 62.210.124.124 9001 9030

However I suspect the system clock on that relay's server is skewed and so the actual descriptor pushes were before and after the consensus vote event.

Last edited 23 months ago by starlight (previous) (diff)

comment:6 Changed 23 months ago by starlight

Searched the logs and found two incidents when the relay was started in minute 48 of the hour. Checked the consensus history and the DirPort=0 glitch did not appear either time.

So untill this issue is fixed the advice for anyone wishing to avoid having their relay knocked off the Fallback Directory list is to not start the daemon in minute 49 of the hour and probably avoid minute 48 for good measure. Highly advisable to run ntpd on the server and have accurate time.

comment:7 in reply to:  5 Changed 23 months ago by aeris

Replying to starlight:

However I suspect the system clock on that relay's server is skewed and so the actual descriptor pushes were before and after the consensus vote event.

I’m the owner of kitten1 and it’s NTP-sync’ed
I just check it clock, all seems good.

comment:8 Changed 22 months ago by teor

Keywords: 026-backport 027-backport added; 026-maybe-backport 027-maybe-backport removed
Status: newneeds_review

Please see my branch bug18050 based on maint-0.2.6 on https://github.com/teor2345/tor.git

The commit merges cleanly into maint-0.2.7 and master.
The unit tests and make test-network[-all] pass on maint-0.2.6, maint-0.2.7, and master.
It also passes src/test/test-network.sh --sleep 600 on master with AssumeReachable 0 (chutney usually has AssumeReachable 1, which disables reachability checking).

The fix is a one-line change to decide_if_publishable_server() to restore the original intent of a1f1fa6ab in 0.1.0.1-rc. From that commit message:

Try to pull down a directory via Tor to see if our DirPort is working.
Try to extend a circuit back to us to see if our ORPort is working.
Only publish a descriptor if they're both reachable.

It then makes consequential changes to log messages and the function comment to make the new behaviour clear.

If we decide to backport this, we don't need to do a release for this fix. It just needs to be included in the next minor release in each series, to avoid relay version upgrades triggering this bug.

comment:9 Changed 22 months ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.6.x-final

Applied to 0.2.7 and later; considering for 0.2.6 backport.

comment:10 Changed 19 months ago by arma

Keywords: 026-backport removed
Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final
Resolution: fixed
Status: needs_reviewclosed

I vote against backport to 0.2.6 -- mainly because there will likely never be another 0.2.6 release, and if there is, approximately nobody will update to it.

I spoke to teor and he agreed that close is fine.

comment:11 Changed 18 months ago by arma

Resolution: fixed
Status: closedreopened

comment:12 Changed 18 months ago by arma

Owner: set to teor
Status: reopenedassigned

comment:13 Changed 18 months ago by arma

Keywords: TorCoreTeam201605 added
Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.