Opened 8 months ago

Closed 8 months ago

#21302 closed defect (fixed)

hs: Multiple service issues

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor Version: Tor: 0.2.9.8
Severity: Normal Keywords: tor-hs
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

While investigating #21297, I found several issues. Unfortunately, I can't find the cause of #21297 yet but maybe someone will with that list of issues.

1) Integer overflow in rend_consider_services_intro_points(). We have a safe guard against that *except* if the expiring_nodes list is not empty which is totally a possibility. This is bad because it will ultimately create a LOT of IP circuits and fail in unknown ways...

    if (smartlist_len(service->expiring_nodes) == 0 &&
        intro_nodes_len >= service->n_intro_points_wanted) {
      continue;
    }

    /* Number of intro points we want to open which is the wanted amount
     * minus the current amount of valid nodes. */
    n_intro_points_to_open = service->n_intro_points_wanted - intro_nodes_len;

2) In rend_service_intro_has_opened(), this if is subject to a bad "cast overflow" if we have a case that brings the expiring_nodes size bigger than the number of circuits. The following result is cast as an unsigned int so for instance "5 - 6" is actually a BIG number.

  if ((count_intro_point_circuits(service) -
       smartlist_len(service->expiring_nodes)) >
      service->n_intro_points_wanted) {

The consequence of this is that if we have a bug (maybe #21297!!) that brings the size of expiring_nodes to 6 and then we try to open 5 IP circuits for the service (3 + 2 extras), all 5 will always be closed and it will loop non stop.

3) In remove_invalid_intro_points(), if we put an IP object in the retry_nodes list and this IP also expires, it will be put in the expiring_nodes list of the service and *removed* from the service valid list. After this, we'll retry a circuit to that IP object but since it's not in the valid list anymore, the rend_consider_services_intro_points() will launch a bunch of other IP circuit(s) ignoring the one being retried. This is OK because the one we want to retry should actually expire BUT we could avoid a useless circuit being created since we know it will expire and potential bug of losing the reference to that circuit if we have issues with cleaning up our expiring_nodes list.

My theory is that the solution to #21297 lies in a combination of the above bugs. Either we overflow with (2) or we have IP circuits that we've lost the reference to (1) making (2) always closing the 5 circuits we just opened and thus not having any valid IP object for the service. I don't think (1) is involved but you never know.

Child Tickets

Change History (7)

comment:1 Changed 8 months ago by dgoulet

Owner: set to dgoulet
Status: newassigned

comment:2 Changed 8 months ago by dgoulet

Status: assignedneeds_review

See branch bug21302_030_01.

There are 3 commits, first two are for (1) and (2) and the third one is a new issue I found while fixing all this. We were never cleaning up the expiring nodes list if the circuit associated with an intro point in there didn't exists when considering creating new IPs.

The (3) issue, I'm not too keen on fixing it because it's pretty rare it will happen and requires much more changes to the HS subsystem that we are trying to do as little as modification we can on it.

comment:3 Changed 8 months ago by nickm

In a82e572e9ff5994d2ba14dcc1a25995f6bdda9b7, how do we know that count_intro_point_circuits() is always greater than expiring_nodes_len? In other words, how can we be sure that this subtraction never underflows:

+  num_ip_circuits = count_intro_point_circuits(service) - expiring_nodes_len;

The other commits look okay to me, if you've tested them. (We've actually removed something with the loop in 0342c1a82c56ad1f9ae9b8a536f32355640dc4bd, yeah?)

comment:4 in reply to:  3 Changed 8 months ago by dgoulet

Replying to nickm:

In a82e572e9ff5994d2ba14dcc1a25995f6bdda9b7, how do we know that count_intro_point_circuits() is always greater than expiring_nodes_len? In other words, how can we be sure that this subtraction never underflows:

+  num_ip_circuits = count_intro_point_circuits(service) - expiring_nodes_len;

Great observation. In *theory*, if a node is in the expiring node list it means it has a circuit. This is why the third commit actually improves on that and could be a cause of the unsolved #21297. However, I don't believe we have a good guarantee of this. Real guarantee would be that when we close an HS circuits, we also make sure anything expecting a circuit is cleaned up.

How is this fixup commit looking? c1f6238c.

The other commits look okay to me, if you've tested them. (We've actually removed something with the loop in 0342c1a82c56ad1f9ae9b8a536f32355640dc4bd, yeah?)

Yes it has been tested. Would be nice to have asn's eyes on that as well. It is quite core in the HS subsystem but worth fixing!

comment:5 Changed 8 months ago by asn

Fixes look sensible here after the fixup commit.

I hope the work we put in these legacy functions can be reused for prop224!

comment:6 Changed 8 months ago by dgoulet

Status: needs_reviewmerge_ready

comment:7 Changed 8 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

squashing and merging!

Note: See TracTickets for help on using tickets.