Opened 3 years ago

Closed 3 years ago

#20610 closed defect (fixed)

Rate limit router_pick_published_address log message

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version: Tor: 0.2.8.9
Severity: Normal Keywords: regression
Cc: sirmatt@…, Knight Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

At some point, we started looking at our own address much more often. This made the following message appear in the log very often:

router_pick_published_address(): Success: chose address 'x.x.x.x'

Child Tickets

Attachments (1)

stdout-tor-1000.log.tgz (107.2 KB) - added by pastly 3 years ago.
Log file from "never" used relay in a Shadow simulation. Decompresses to ~5MiB

Download all attachments as: .zip

Change History (17)

comment:1 Changed 3 years ago by teor

Keywords: regression added
Milestone: Tor: 0.2.9.x-final

We should probably fix this in 0.2.9?

comment:2 Changed 3 years ago by pastly

Cc: sirmatt@… added

comment:3 Changed 3 years ago by arma

I think we should track down when it got much more commonly called, and refactor or cache or whatever so it stops needing to do that.

For example, we ask ourselves if we know our address when making a decision about whether to go to authorities or other caches when fetching dir info.

comment:4 Changed 3 years ago by nickm

as a kludge, you can just throw in a tor_assert_nonfatal_unreached() in your local copy of router_pick_published_address() and see what the stack traces say...

comment:5 Changed 3 years ago by Knight

Cc: Knight added

comment:6 Changed 3 years ago by nickm

Actually, it looks like this is already cached, in config.c. So there's no harm in just moving the message. What do you think of just moving this log message from router_pick_published_address() to the end of resolve_my_address() ?

comment:7 Changed 3 years ago by teor

Seems sensible to me.

comment:8 Changed 3 years ago by teor

#20423 contains a report of this message being logged excessively, along with other issues.

comment:9 Changed 3 years ago by arma

I think the architecture fix that we will want is to recognize that for most of the calls to router_pick_published_address(), and especially the ones that we call often, what we really mean to ask is "did we figure out our address yet?", not "please go figure out our address if we haven't done it yet". So if we pass an "only look at the cached variables" option to the function, which asks it not to call the heavyweight functions, we'll be a lot happier. We'll want to make sure that we call the heavyweight functions once near startup. Those steps will make things better for the future.

But first! There is a mystery: It looks like in some circumstances we started calling this function a crazy number of times, starting in 0.2.8.x. And we didn't do that in 0.2.7.x. What changed?

I would indeed like to get some call graphs or stack traces or debug lines the like, for when it happens a whole lot.

Fortunately, on https://lists.torproject.org/pipermail/tor-relays/2016-November/010926.html, pastly said

I run Tor network simulations and see this all the time. I haven't taken
the time to look into it, and always have assumed it's just a minor
misconfiguration in the simulator or something that only exists
_because_ it's a simulation.

so I'm going to take that as a "yes, this crazy number of calls to router_pick_published_address is repeatable, and also I can help debug it." Is that a fair hope, pastly? :)

comment:10 Changed 3 years ago by arma

router_pick_published_address() is called in maint-0.2.9 from five places:

  • In control.c's getinfo_helper_misc(), in response to a 'getinfo address' request. I think it's fair to say that's not our culprit.
  • In dirserv.c's directory_fetches_from_authorities(). This is very likely the problem.
  • In router.c's router_rebuild_descriptor() when the current descriptor is dirty, to see if we don't know our address so we should short-circuit the rest of this function until we do.
  • In router.c's router_build_fresh_descriptor() to make sure we didn't try to build a descriptor mistakenly. I don't think rebuilding the descriptor happens too often, but hey I could be wrong.
  • In transports.c's pt_get_extra_info_descriptor_string(), to figure out what address to write in our extrainfo descriptor when we're offering a pluggable transport. I'm also not worried about this one.

comment:11 Changed 3 years ago by arma

directory_fetches_from_authorities() is called in maint-0.2.9 from six places:

  • directory.c's directory_get_from_dirserver(), when we launch a new directory fetch/post of any kind.
  • directory.c's dir_routerdesc_download_failed() and dir_microdesc_download_failed(), which are called when we didn't get a descriptor we wanted, to decide how to proceed.
  • dirserv.c's directory_fetches_dir_info_early(), which is used to decide schedules for fetching the new consensus and for fetching new descriptors we just learned about.
  • main.c's directory_info_has_arrived() which we call whenever we finish receiving any directory object.
  • main.c's fetch_networkstatus_callback(), which is called once a second during bootstrap and once a minute otherwise.
  • networkstatus.c's networkstatus_consensus_can_use_extra_fallbacks(), which is new and called from a variety of places so I'm suspicious of it.

Changed 3 years ago by pastly

Attachment: stdout-tor-1000.log.tgz added

Log file from "never" used relay in a Shadow simulation. Decompresses to ~5MiB

comment:12 Changed 3 years ago by arma

Preliminary instrumentation makes me think the bursts are download_status_reset() calling find_dl_schedule() calling networkstatus_consensus_can_use_extra_fallbacks() calling directory_fetches_from_authorities() calling router_pick_published_address().

Oh hey, is download_status_reset() being called for every single descriptor?

comment:13 Changed 3 years ago by arma

For posterity, the commit in my 20610-instrument branch is what led me to the above statement.

You can run it as a relay, and grep your info-level logs for "router_pick_published_address" and "find_dl_schedule".

comment:14 Changed 3 years ago by arma

I posted a patch for this issue to #20423.

comment:15 Changed 3 years ago by pastly

Patch on #20423 works to reduce spam.

comment:16 Changed 3 years ago by nickm

Resolution: fixed
Status: newclosed

Cool. Since the patch on #20423 was was merged, I'll close this ticket too. Please let me know if it should still be open.

Note: See TracTickets for help on using tickets.