Opened 8 months ago

Last modified 7 days ago

#25686 needs_information defect

Mystery bug causes relays to attempt many many descriptor publishes, with no X-Desc-Gen-Reason header

Reported by: arma Owned by: nickm
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 034-roadmap-proposed, regression, 033-backport, 034-removed-must-2018-09-05, 032-unreached-backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description

Sometimes relays get into a state where they try to publish a new descriptor every second, and this state lasts for hours or days.

I instrumented moria1 to keep track of the X-Desc-Gen-Reason headers in each descriptor upload attempt:

diff --git a/src/or/directory.c b/src/or/directory.c
index c419b61..4570b20 100644
--- a/src/or/directory.c
+++ b/src/or/directory.c
@@ -5102,6 +5102,15 @@ directory_handle_command_post,(dir_connection_t *conn, const char *headers,
     const char *msg = "[None]";
     uint8_t purpose = authdir_mode_bridge(options) ?
                       ROUTER_PURPOSE_BRIDGE : ROUTER_PURPOSE_GENERAL;
+
+    {
+      char *genreason = http_get_header(headers, "X-Desc-Gen-Reason: ");
+      log_info(LD_DIRSERV,
+               "New descriptor post, because: %s",
+               genreason ? genreason : "not specified");
+      tor_free(genreason);
+    }
+
     was_router_added_t r = dirserv_add_multiple_descriptors(body, purpose,
                                              conn->base_.address, &msg);
     tor_assert(msg);

And then I counted up the number of upload attempts of each type over the last two-ish weeks:

$ grep "New descriptor post, because:" moria1-info|cut -d: -f5-|sort|uniq -c
  20685  bandwidth has changed
      1  Chosen Or/DirPort changed
  53191  config change
    601  configured managed proxies
  41663  DirPort found reachable
     96  dns resolvers back
    191  IP address changed
 131619  not listed in consensus
1647625  not specified
  31440  ORPort found reachable
   8518  rotated onion key
     28  set onion key
 120487  time for new descriptor
    156  Tor just started
  25362  version listed in consensus is quite old

Now, the weird "not listed in consensus" and "version listed in consensus is quite old" ones are #25685.

But there are a huge number that are simply lacking this header. These tend to come from a relatively small number of relays that are just bombing me with publish attempts.

In fact, out of those 1647625 attempts that didn't provide a reason, nearly all of them got discarded:

$ grep -A1 "New descriptor post, because: not specified" moria1-info|grep "Not replacing descriptor"|wc -l
1645051

We should try to track down what bug on the relay side causes republishes without including a reason header. Maybe we do this by examining the code and looking for mistakes where a republish can happen without also setting the reason header?

See #3942 for a time long ago that we had problems with listing our reason. And see #21642 for where a lot of this analysis started.

Child Tickets

Change History (16)

comment:1 Changed 8 months ago by arma

Reason Num attempts Num discarded
bandwidth has changed 20702 109
Chosen Or/DirPort changed 1 0
config change 53271 121
configured managed proxies 601 432
DirPort found reachable 41703 22767
dns resolvers back 96 58
IP address changed 193 8
not listed in consensus 131741 61425
not specified 1650992 1647089
ORPort found reachable 31474 1795
rotated onion key 8527 0
set onion key 28 8
time for new descriptor 120656 0
Tor just started 157 21
version listed in consensus is quite old 25389 9370
<total> 2085515 1743203
Last edited 8 months ago by arma (previous) (diff)

comment:2 Changed 8 months ago by nickm

Could I have the "not specified"s broken down by version?

comment:3 Changed 8 months ago by nickm

Keywords: 034-must regression 033-backport added
Milestone: Tor: 0.3.4.x-final

comment:4 Changed 8 months ago by arma

Ok, I added

@@ -683,6 +683,8 @@ dirserv_add_descriptor(routerinfo_t *ri, const char **msg, c
onst char *source)
     goto fail;
   }
 
+  log_info(LD_DIR, "Assessing new descriptor: %s", ri->platform);
+
   /* Check whether this descriptor is semantically identical to the last one
    * from this server.  (We do this here and not in router_add_to_routerlist
    * because we want to be able to accept the newest router descriptor that

and am now running

tail -f moria1-info|grep -A1 "not specified"|grep Assessing

So far it's just getting bombed by two relays:
Me0W:
https://metrics.torproject.org/rs.html#details/91954A36BB0C32861325D99CCC264E619C75C437
running 0.2.5.16
and happachwg:
https://metrics.torproject.org/rs.html#details/64342130F715C4CD0DFF7C207D996600F41CE7C1
running 0.2.5.14
but I'll leave it for some days and see if we get more variety.

comment:6 Changed 8 months ago by arma

Ok, and the node-Tor implementations don't include the header either, but that's not entirely surprising. :)

comment:7 in reply to:  2 Changed 7 months ago by arma

Replying to nickm:

Could I have the "not specified"s broken down by version?

1420799 happachwg: Tor 0.2.5.14 on Linux
1060285 oscarettor: Tor 0.2.5.16 on Linux
 312064 Me0W: Tor 0.2.5.16 on Linux
 150777 Unnamed: Tor 0.2.5.14 on Linux
  84782 ChateauTor2: Tor 0.2.5.16 on Linux
  47898 Actarus: Tor 0.2.4.29 on Linux
  32897 Netwars: Tor 0.2.5.16 on Linux
   2184 Tor 0.2.5.16 on Linux
   1722 Tor 0.2.5.14 on Linux
    968 cherryjam1: Tor 0.2.5.16 on Linux
    554 default: Tor 0.2.4.23 on Windows 7
    505 ordb3: node-Tor 0.1.0 on Linux x86_64
    505 ordb1: node-Tor 0.1.0 on Linux x86_64
    504 ordb2: node-Tor 0.1.0 on Linux x86_64
    477 WestCoastTor07: Tor 0.3.2.10 on Linux
    441 tspcnt: Tor 0.3.2.10 on Linux
    323 cherryjam: Tor 0.2.5.16 on Linux
    165 Unnamed: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Windows 7 Service Pack 1 [workstation]
    141 L29Ah: Tor 0.3.3.2-alpha on Linux
    103 fuzzycat: Tor 0.3.2.10 on Linux
     90 ididntedittheconfig: Tor 0.2.1.32 (git-23da6c2e5051b65a) on Linux i686
     89 gatsby1: Tor 0.3.2.10 on Linux
     73 Unnamed: Tor 0.2.2.39 (git-2f7e71c2e896772f) on Linux x86_64
     71 Unnamed: Tor 0.2.2.35 (git-73ff13ab3cc9570d) on Linux x86_64
     65 algebrasibling: Tor 0.3.1.9 on Linux
     46 darksoul03: Tor 0.3.2.10 on Linux
     45 RaysTorServer: Tor 0.2.1.31 (git-8435cd95d2acc0c2) on Linux x86_64
     44 linode: Tor 0.2.1.32 (git-23da6c2e5051b65a) on Linux i686
     44 jojo07: Tor 0.3.1.9 on Linux
     44 darksoul02: Tor 0.3.2.10 on Linux
     44 Caprica: Tor 0.2.2.39 (git-2f7e71c2e896772f) on Linux i686
     42 darksoul04: Tor 0.3.2.10 on Linux
     42 BFFofDURRUTI: Tor 0.2.2.35 (git-73ff13ab3cc9570d) on Linux i686
     40 georgeorwell: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Linux i686
     39 Voltaire: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Darwin i386
     36 ivanzoid: Tor 0.2.2.35 (git-73ff13ab3cc9570d) on Linux i686
     36 armandothegreat: Tor 0.2.2.35 (git-73ff13ab3cc9570d) on Linux x86_64
     30 xvmccpli: Tor 0.2.2.35 (git-73ff13ab3cc9570d) on Linux i686
     30 Unnamed: Tor 0.2.2.39 (git-2f7e71c2e896772f) on Linux i686
     30 torrific: Tor 0.2.2.35 (git-4f42b0a93422f70e) on Linux i686
     30 EvilWire: Tor 0.2.2.39 (git-2f7e71c2e896772f) on Linux x86_64
     26 unanimous: Tor 0.2.2.33 (git-8522652d8e9213d4) on Very recent version of Windows [major=6,minor=2]  [workstation]
     23 M0aUiT7b5vZ3: Tor 0.2.1.29 (r318f470bc5f2ad43) on Linux x86_64
     15 Unnamed: Tor 0.2.1.26 on Very recent version of Windows [major=6,minor=1] Service Pack 1 [workstation] {personal} {terminal services, single user}
     14 theRipper1: Tor 0.3.2.10 on Linux
     11 BlueSteel: Tor 0.3.2.10 on Linux
      6 Ms3cTOR56: Tor 0.3.2.10 on Linux
      6 default: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Windows 8  [workstation]
      4 Unnamed: Tor 0.2.1.30 on Darwin x86_64
      2 Unnamed: Tor 0.2.1.26 on Very recent version of Windows [major=6,minor=2]  [workstation] {personal} {terminal services, single user}
      2 leanhvu: Tor 0.2.1.26 on Windows XP Service Pack 4 [workstation] {terminal services, single user}
      2 BaTMan: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Windows 7 Service Pack 1 [workstation]
      1 wrogregpx2: Tor 0.2.9.14 on Linux
      1 Unnamed: Tor 0.2.2.39 (git-bec76476efb71549) on Windows XP Service Pack 2 [workstation]
      1 Unnamed: Tor 0.2.2.37 (git-fce6eb1c44e87bc2) on Windows XP Service Pack 3 [workstation]
      1 Unnamed: Tor 0.2.2.35 (git-b04388f9e7546a9f) on Darwin x86_64
      1 Unnamed: Tor 0.2.1.29 (r8e9b25e6c7a2e70c) on Very recent version of Windows [major=6,minor=1]  [workstation] {terminal services, single user}
      1 TorOp01: Tor 0.2.9.14 on Linux
      1 torexit42: Tor 0.2.9.14 on Linux
      1 Tor 0.3.2.10 on Linux
      1 relayJuice: Tor 0.2.1.26 on Very recent version of Windows [major=6,minor=1] Service Pack 1 [workstation] {terminal services, single user}
      1 prserver: Tor 0.2.9.15 on Linux
      1 OneMoreLinuxTor4: Tor 0.2.9.14 on Linux
      1 nsalitomerice: Tor 0.2.9.14 on Linux
      1 mesrouilles: Tor 0.2.9.14 on Linux
      1 Maddyx: Tor 0.2.9.14 on Linux
      1 KingJasperTheGreat: Tor 0.2.9.14 on Linux
      1 100Mbps4u: Tor 0.3.0.9 on Linux

comment:8 Changed 6 months ago by nickm

So, it appears that the lead offenders here are the ones running 0.2.5 and earlier. That's good. The answer here is probably to just those IPs if they're being annoying.

On the other hand, it seems that some Tors running versions with the #22885 fix of 7f329206488cf84 are still sometimes sending descriptors with no reasons. That's bad; I thought we fixed that.

comment:9 Changed 6 months ago by nickm

Owner: set to nickm
Status: newaccepted

comment:10 Changed 5 months ago by nickm

Here are the aberrant entries, in case anything weird jumps out about them.

    477 WestCoastTor07: Tor 0.3.2.10 on Linux
    441 tspcnt: Tor 0.3.2.10 on Linux
    141 L29Ah: Tor 0.3.3.2-alpha on Linux
    103 fuzzycat: Tor 0.3.2.10 on Linux
     89 gatsby1: Tor 0.3.2.10 on Linux
     46 darksoul03: Tor 0.3.2.10 on Linux
     44 jojo07: Tor 0.3.1.9 on Linux
     44 darksoul02: Tor 0.3.2.10 on Linux
     42 darksoul04: Tor 0.3.2.10 on Linux
     14 theRipper1: Tor 0.3.2.10 on Linux
     11 BlueSteel: Tor 0.3.2.10 on Linux
      6 Ms3cTOR56: Tor 0.3.2.10 on Linux
      1 Tor 0.3.2.10 on Linux

comment:11 Changed 5 months ago by nickm

Status: acceptedneeds_review

Please have a look at my branch bug25686_diagnostic_032, with PR at https://github.com/torproject/tor/pull/142 .

There are four possible reasons for a Tor reason to upload without X-Desc-Gen-Reason:

  1. They are running an old version of our software, before 0.3.2.stable.
  2. They are not running our software, but they are claiming they are.
  3. They are uploading through a proxy that strips X-Desc-Gen-Reason.
  4. They somehow had a bug in their software.

I think that everybody who has this problem IRL is in condition 1, 2, or 3. This patch is an attempt to make it so 4 is very very obvious, so we can diagnose it easily. It's potentially backportable as far as 0.3.2.

comment:12 Changed 5 months ago by nickm

Keywords: 032-backport added

comment:13 Changed 5 months ago by dgoulet

Reviewer: dgoulet
Status: needs_reviewmerge_ready

Considering the goal of (4), I think this patch looks good.

I wonder if Roger would take a look before merging but considering his schedule, we can always improve later if needed.

comment:14 Changed 5 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified
Status: merge_readyneeds_information

Okay, merging this to 0.3.2 and forward. Putting it in "needs information" in case we learn anything from these new messages.

comment:15 Changed 3 months ago by teor

Keywords: 034-removed-must-2018-09-05 added; 034-must removed

These tickets were removed from 0.3.4, so they can't be 034-must.

comment:16 Changed 7 days ago by teor

Keywords: 032-unreached-backport added; 032-backport removed

0.3.2 is end of life, so 032-backport is now 032-unreached-backport.

Note: See TracTickets for help on using tickets.