Opened 5 years ago

Closed 4 years ago

#12170 closed defect (fixed)

Investigate performance issues surrounding count_usable_descriptors()

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: 024-backport tor-relay performance
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

According to a gprof output generated by Andrea (#11322), her busy Tor node called count_usable_descriptors 65368 times, mostly from router_have_minimum_dir_info(). This is expensive because it iterates over all the nodes and does a lot of siphash / digestmap / tor_memeq stuff.

Why are we calling router_have_minimum_dir_info() so much? Almost entirely because of second_elapsed_callback().

But why is router_have_minimum_dir_info() invoking update_router_have_minimum_dir_info so often? If I'm reading these numbers right, it's doing so once every 5 calls. That's not right; it's supposed to cache the result of update_router_have_minimum_dir_info() for a long time, until somebody calls router_dir_info_changed(). Who is doing that?

According to that profile, the top two callers are:

                0.00    0.00    4851/23430       router_add_to_routerlist [266]
                0.00    0.00   17823/23430       channel_do_open_actions <cycle 2> [144]

router_add_to_routerlist() calls should be clustered; they shouldn't cause most of the re-invocations of update_router_have_minimum_dir_info(). So let's look at channel_do_open_actions().

It's calling router_set_status(), which is calling router_dir_info_changed unconditionally!

Two issues there:

  • I'm not sure that router_set_status should be calling router_dir_info_changed at all. Does changing our opinion about a node's is_running status count as a change in whether we know most of the nodes in the network? Should it?
  • It surely shouldn't be calling it unconditionally.


Child Tickets

Change History (9)

comment:1 Changed 5 years ago by nickm

Status: newneeds_review

Candidate fix in bug12170_024 in my public repository. This is a very simple backport candidate.

comment:2 Changed 5 years ago by nickm

Parent ID: #11332

comment:3 in reply to:  description Changed 5 years ago by arma

Replying to nickm:

Does changing our opinion about a node's is_running status count as a change in whether we know most of the nodes in the network? Should it?

I think the answer is currently yes here -- if we mark a lot of relays down, then we might not have enough relays available to make a (good enough) circuit.

The broader question I guess is whether we should be believing our local view over the consensus view -- traditionally they differed because something changed since the consensus that we saw with our own eyes, but we can also imagine that some relays aren't reachable from our location, or active attacks, or etc. That's not really quite this question though.

comment:4 Changed 5 years ago by arma

Re your patch, I think the change is fine. Two questions:

First:

+    if (bool_neq(node->is_running, up))
+      router_dir_info_changed();
+
     node->is_running = up;

Why not put the assignment inside the if also, since we know it won't happen otherwise and it's clearer to say it? Or is it clearer to show that no matter what, afterwards the value of is_running will be the same as the value of up? This is a style question.

Second: in your commit message, "This patch makes us call it only when we have a router that we previously believed to be down, and we found it to be up." -- note that router_set_status() can be called with "up" false, meaning it's down, and I think your patch affects that side of the situation too (and it should).

comment:5 in reply to:  4 Changed 5 years ago by nickm

Replying to arma:

Re your patch, I think the change is fine. Two questions:

First:

+    if (bool_neq(node->is_running, up))
+      router_dir_info_changed();
+
     node->is_running = up;

Why not put the assignment inside the if also, since we know it won't happen otherwise and it's clearer to say it? Or is it clearer to show that no matter what, afterwards the value of is_running will be the same as the value of up? This is a style question.

I favor leaving it as it is, for clarity. YMMV.

Second: in your commit message, "This patch makes us call it only when we have a router that we previously believed to be down, and we found it to be up." -- note that router_set_status() can be called with "up" false, meaning it's down, and I think your patch affects that side of the situation too (and it should).

True, that would be more accurate. I was thinking about the case mentioned in the commit, since that case was >99.9% of the calls in Andrea's profile

comment:6 Changed 5 years ago by arma

Ok. With a change to the "This patch makes us call it only when" paragraph in the commit message, and append "or not running" to "or marked a router as running" in the changes entry, I think we're good to go here. Thanks!

comment:7 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.4.x-final

The branch with arma's suggested fixes is bug12170_024_v2. I have merged it to master and am marking for possible 0.2.4 backport.

comment:8 Changed 4 years ago by nickm

Parent ID: #11332

comment:9 Changed 4 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final
Resolution: fixed
Status: needs_reviewclosed

These aren't getting backported. Recommended solution: upgrade to 0.2.5.x stable releases.

Note: See TracTickets for help on using tickets.