Opened 4 weeks ago

Last modified 4 weeks ago

#30291 new enhancement

Optimize our path selection code

Reported by: dgoulet Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-performance tor-hs path-selection refactoring tor-dos
Cc: Actual Points:
Parent ID: #30221 Points: 10
Reviewer: Sponsor: Sponsor27-can


(This is a more complete ticket than #13739 and thus supersedes it.)

An onion service can be ask to open many rendezvous circuit by a client. With a vanilla Tor, the resource consumption there is roughly 1:2 (service:client) because the client needs to open two circuits, one to the RP and one to the IP and then sends one cell on that circuit to introduce. The service will do a bit less than that because its introduction circuit is already established so it simply needs to open a rendezvous circuit.

However, an attacker (DoS) can make a client skip the RP circuit creation, pin relays in a path to the introduction point and simply open an introduction circuit each time through that path (which in theory should be made very fast and also controlled by the attacker).

This means that the work ratio goes from 1:2 to 1:1 because the client only opens one circuit, send a single cell, close it and repeat.

But, in reality, it gets worst for the service because of the performance of our path selection code. It is heavy. For starter, we need to randomly select 3 nodes from the consensus for each rendezvous circuit. But to select those nodes, we go over all of them for each node to exclude some, include others, go through a series of tests such has tor_addr_family() and node_has_preferred_descriptor() that are quite heavy.

The CPU profile of an overloaded service shows that the majority of CPU is spent in selecting these nodes. Here is a perf report output for the top 5:

+   11.10%  tor      tor                    [.] smartlist_remove                                                                                         
+   10.96%  tor      tor                    [.] fmonty                                                                                                   
+    8.77%  tor      tor                    [.] tor_memeq                                                                                                
+    6.56%  tor      tor                    [.] tor_addr_family                                                                                          
+    5.40%  tor      tor                    [.] tor_addr_is_null             

The top smartlist_remove() comes from:

   - smartlist_remove                                                                                                                                    
      - 11.10% smartlist_subtract                                                                                                                        
         - circuit_launch_by_extend_info                                                                                                                 
            - 10.93% launch_rendezvous_point_circuit                                                                                                     

Amazingly enough, we spend more time selecting path than doing crypto on a service that gets DDoS by introduction requests (#29607).

There aren't many straight forward approach here. Optimizing the code path bits by bits could be an approach. Most likely, our first step is to have a benchmark for our current path selection code and try to improve on it incrementally.

There are also more hackish approach like pre-building a list of RP nodes at each consensus (or when torrc options are changed like ExcludeNodes) and then randomly picking nodes from there would go faster because they wouldn't need to go through long iterations of tests, it would be already done.

One example is router_choose_random_node() that goes over the entire nodes list just to pick nodes to exclude, then again to test each nodes router_add_running_nodes_to_smartlist() and then potentially does a series of smartlist_subtract() that iterate over the entire list again. This is in theory O(n) + O(n) + O(n) actually theoretically being O(n) but reality is far from the theory here in CPU consumption ;).

Child Tickets

#30307defectclosednickmMake router_choose_random_node() linear instead of quadratic
#30308defectclosednickmUse tor_mem_is_zero to check for broken node identity in node_is_a_configured_bridge()

Change History (5)

comment:1 Changed 4 weeks ago by nickm

Another possibility here is to use another data structure besides a smartlist in this context. If we need to be able to remove elements in O(1), it might make sense to use a pointer-indexed hashtable, and then to switch over to an array only when we're choosing by weighted bandwidth.

A further issue I see above is that, with the exception of fmonty, the other functions you're listing above really ought to be trivial. Either their call overhead is too high and we should inline them, or they're doing more than they need to do, or we are calling them waaaay too much.

comment:2 Changed 4 weeks ago by dgoulet

Here are more details from my profile on the rest of the CPU offenders:

-    8.77%  tor      tor                    [.] tor_memeq                                                                                       
   - tor_memeq                                                                                                                                  
      - 8.63% tor_digest_is_zero                                                                                                                
         - 8.62% node_is_a_configured_bridge                                                                                                    
            + node_has_preferred_descriptor      


-    6.56%  tor      tor                    [.] tor_addr_family                                                                                 
   - tor_addr_family                                                                                                                            
      - 4.99% tor_addr_is_null                                                                                                                  
         - 3.96% tor_addr_is_valid                                                                                                              
            - 3.66% node_get_pref_ipv6_orport                                                                                                   
               + 3.66% nodelist_add_node_and_family  


-    5.40%  tor      tor                    [.] tor_addr_is_null                                                                                
   - tor_addr_is_null                                                                                                                           
      - 2.76% addrs_in_same_network_family                                                                                                      
         - 2.76% nodelist_add_node_and_family                                                                                                   
            + circuit_launch_by_extend_info                                                                                                     
      - 2.15% tor_addr_is_valid                                                                                                                 
         + 1.34% node_get_pref_ipv6_orport                                                                                                      
         + 0.81% bridge_exists_with_ipv6_addr_and_port

comment:3 Changed 4 weeks ago by nickm

I've tried to fix up a couple of these issues in a pair of child tickets. There will be more work to do here though.

comment:4 in reply to:  description Changed 4 weeks ago by nickm

Replying to dgoulet:

This is in theory O(n) + O(n) + O(n) actually theoretically being O(n) but reality is far from the theory here in CPU consumption ;).

For the record, it's actually worse. smartlist_subtract(l1, l2) is not linear, but quadratic: it runs in O(smartlist_len(l1) * smartlist_len(l2). Fix (for this case of it) in #30307

comment:5 Changed 4 weeks ago by dgoulet

Performance Update! Now that both child tickets have been merged, here is the new profile offenders:

-   12.54%  tor      tor                    [.] tor_addr_is_null                                                                                                                               
   - tor_addr_is_null                                                                                                                                                                          
      - 8.59% tor_addr_is_valid                                                                                                                                                                
         - 7.80% node_get_pref_ipv6_orport                                                                                                                                                     
            + 7.80% nodelist_add_node_and_family                                                                                                                                               
         + 0.79% bridge_exists_with_ipv6_addr_and_port                                                                                                                                         
      - 3.94% addrs_in_same_network_family                                                                                                                                                     
         - 3.94% nodelist_add_node_and_family                                                                                                                                                  
            + circuit_launch_by_extend_info                                                                                                                                                    
+    8.43%  tor      tor                    [.] curve25519_donna                                                                                                                               
-    6.20%  tor      tor                    [.] microdesc_has_curve25519_onion_key                                                                                                             
   - node_get_curve25519_onion_key                                                                                                                                                             
      - 6.20% node_has_curve25519_onion_key                                                                                                                                                    
         - 3.11% router_add_running_nodes_to_smartlist                                                                                                                                         
            - router_choose_random_node                                                                                                                                                        
               + 3.10% choose_good_middle_server                                                                                                                                               
         - 3.09% count_acceptable_nodes.isra.2                                                                                                                                                
            + circuit_launch_by_extend_info                                                                                                                                       
+    5.13%  tor           [.] 0x0000000000145533                                                                                                                             
-    4.04%  tor      tor                    [.] node_get_prim_orport                                                                                                                           
   - node_get_prim_orport                                                                                                                                                                      
      - 4.04% node_get_addr                                                                                                                                                                    
         - 4.04% nodelist_add_node_and_family                                                                                                                                                  
            + circuit_launch_by_extend_info     

To summarize the above, it appears that we spend a lot of times in nodelist_add_node_and_family() doing some address validation during our path selection.

Then we do check a lot of memories with tor_mem_is_zero() within the node_get_curve25519_onion_key() that basically use it to figure out if the key is set or not.

Overall, to give an idea of the rest of the profile, majority is spent looking at IP addresses... It appears we basically only do that during path selection with a bit of "getting the bw weight" ;):

+    3.80%  tor      tor                    [.] tor_addr_family.isra.1                                                                                                                         
+    3.44%  tor      tor                    [.] tor_addr_compare_masked                                                                                                                        
+    3.02%  tor      tor                    [.] addrs_in_same_network_family                                                                                                                   
+    2.57%  tor      tor                    [.] node_get_pref_ipv6_orport                                                                                                                      
+    2.42%  tor      tor                    [.] tor_addr_copy                                                                                                                                  
+    2.27%  tor      tor                    [.] tor_addr_is_valid                                                                                                                              
+    2.10%  tor      tor                    [.] tor_addr_to_ipv4n.isra.0                                                                                                                       
+    1.93%  tor      tor                    [.] node_is_a_configured_bridge                                                                                                                    
+    1.86%  tor      tor                    [.] compute_weighted_bandwidths                                                                                                                    
+    1.71%  tor      tor                    [.] tor_mem_is_zero                                                                                                                                
+    1.71%  tor      tor                    [.] tor_port_is_valid                                                                                                                              
+    1.46%  tor      tor                    [.] tor_addr_is_valid_ipv4n                                                                                                                        
+    1.30%  tor      tor                    [.] bridge_exists_with_ipv4h_addr_and_port                                                                                                         
+    1.25%  tor      tor                    [.] tor_addr_from_ipv4n 
+    1.13%  tor      tor                    [.] node_has_preferred_descriptor                                                                                                                  
+    1.13%  tor      tor                    [.] tor_addr_to_in6_assert                                                                                                                         
+    1.10%  tor      tor                    [.] __bswap_32                                                                                                                                     
+    1.09%  tor      tor                    [.] tor_addr_to_ipv4h                                                                                                                              
+    1.07%  tor      tor                    [.] get_configured_bridge_by_addr_port_digest                                                                                                      

These almost all comes from within nodelist_add_node_and_family().

Note: See TracTickets for help on using tickets.