Opened 11 months ago

Last modified 5 weeks ago

#27241 needs_information enhancement

Extract information from more kinds of wedged directory connections.

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: fast-fix, 035-deferred-20180930, 040-deferred-201915
Cc: Actual Points:
Parent ID: Points:
Reviewer: mikeperry Sponsor:

Description

In main.c, we do:

    /* This check is temporary; it's to let us know whether we should consider
     * parsing partial serverdesc responses. */
    if (conn->purpose == DIR_PURPOSE_FETCH_SERVERDESC &&
        connection_get_inbuf_len(conn) >= 1024) {
      log_info(LD_DIR,"Trying to extract information from wedged server desc "
               "download.");
      connection_dir_reached_eof(TO_DIR_CONN(conn));

We should also, at a minimum, do this for microdesc downloads.

Child Tickets

Change History (16)

comment:1 Changed 11 months ago by nickm

Owner: set to nickm
Status: assignedaccepted

comment:2 Changed 11 months ago by nickm

Sponsor: Sponsor8-can
Status: acceptedneeds_review

See branch feature_27241, PR at https://github.com/torproject/tor/pull/286

(Calling this sponsor8 since it should help save BW)

comment:3 Changed 11 months ago by dgoulet

Reviewer: mikeperry

comment:4 Changed 11 months ago by mikeperry

Status: needs_reviewneeds_information

I'm wondering if we test various kinds of partial descriptor parsing for microdescs and extrainfo.

Particularly handle_response_fetch_microdesc() and router_load_extrainfo_from_string().

Have these functions been exposed to partial downloads and various kinds of fragmented descriptors before?

If not, we should test them for that first. If so, then this is fine by me.

comment:5 Changed 11 months ago by mikeperry

To help me understand the consequences of this (and to figure out if I need to also do anything here for #25573), I have the following questions:

  1. How do stalled dirconns translate to their linked edge_connection_ts? Can the edge connection "stall" first and/or get closed somewhere else, or will the dirconn always be the thing that stalls?
  2. What are the consequences to the dir conn if some other code marks the edge connnection is closed first?
  3. When the dirconn is closed first (such as here or elsewhere in the code), where in the code do we decide to mark/clean up/close the corresponding linked edge conn? I could not find this..

For #25573, I'm particularly concerned about multihop hsdir dirconns. But if we ever make regular directory activity multihop, this could apply there too. Plus the linked conn close conditions may be relevant to this bug?

comment:6 in reply to:  5 ; Changed 11 months ago by nickm

Replying to mikeperry:

To help me understand the consequences of this (and to figure out if I need to also do anything here for #25573), I have the following questions:

  1. How do stalled dirconns translate to their linked edge_connection_ts? Can the edge connection "stall" first and/or get closed somewhere else, or will the dirconn always be the thing that stalls?

I don't think we have any logic for an edge_connection_t to time out; in princple you can keep an ssh connection running over Tor indefinitely. Directory connections, however, do time out.

  1. What are the consequences to the dir conn if some other code marks the edge connnection is closed first?

When one of a pair of linked connections is finally closed from connection_unlink in main.c, the other connection gets unlinked from it, and told to read all of its data from its input.

So if something else closes the edge connection, the dir connection will get one last chance to handle any data queued on it before it is closed.

  1. When the dirconn is closed first (such as here or elsewhere in the code), where in the code do we decide to mark/clean up/close the corresponding linked edge conn? I could not find this..

Hmm. I wonder if nothing closes it, actually. Can you think of some experiment we could do to find out?

For #25573, I'm particularly concerned about multihop hsdir dirconns. But if we ever make regular directory activity multihop, this could apply there too. Plus the linked conn close conditions may be relevant to this bug?

comment:7 in reply to:  5 Changed 11 months ago by teor

Replying to mikeperry:

...
For #25573, I'm particularly concerned about multihop hsdir dirconns. But if we ever make regular directory activity multihop, this could apply there too. Plus the linked conn close conditions may be relevant to this bug?

In some cases, bridge clients fetch authority certificates over multihop connections:
https://github.com/torproject/tor/blob/94605f08fb89ea79409225362d2fa0f8a07435d7/src/feature/nodelist/routerlist.c#L952

And do most other actions over multihop connections:
https://github.com/torproject/tor/blob/48632455a5bd679d5f97c5137f24f91e564abad6/src/feature/dircache/directory.c#L185

And bridge relays upload descriptors over multihop connections:
https://github.com/torproject/tor/blob/48632455a5bd679d5f97c5137f24f91e564abad6/src/feature/dircache/directory.c#L379

comment:8 in reply to:  6 Changed 10 months ago by mikeperry

Replying to nickm:

Replying to mikeperry:

To help me understand the consequences of this (and to figure out if I need to also do anything here for #25573), I have the following questions:

  1. When the dirconn is closed first (such as here or elsewhere in the code), where in the code do we decide to mark/clean up/close the corresponding linked edge conn? I could not find this..

Hmm. I wonder if nothing closes it, actually. Can you think of some experiment we could do to find out?

I am not sure. Do we have a way to trigger this bug?

For #25573, I'm particularly concerned about multihop hsdir dirconns. But if we ever make regular directory activity multihop, this could apply there too. Plus the linked conn close conditions may be relevant to this bug?

I've been digging and I still cannot find anything that closes the edge conn when we mark the dirconn. This is fine for #25573, since it only deals with edge conns, but if we change that, we'll want to update #25573.

For purposes of this bug, I think the only question that remains is if we've exposed handle_response_fetch_microdesc() and router_load_extrainfo_from_string() to partial descriptors before.

I think the dirconn closing issue can be an orthogonal bug.

Last edited 10 months ago by mikeperry (previous) (diff)

comment:9 Changed 10 months ago by mikeperry

Ok I strongly suspect that #8387 is being caused by us not properly closing edge conns for dirconns underneath. We can use that bug for the dirconn/edgeconn piece, I guess.

comment:10 Changed 10 months ago by nickm

Are the issues here reasons to not merge the original patch? Does this still belong in needs_information?

comment:11 Changed 10 months ago by nickm

Keywords: 035-deferred-20180930 added
Milestone: Tor: 0.3.5.x-finalTor: 0.3.6.x-final

Deferring several items from 0.3.5. I think these are features, not bugfixes, and therefore no longer great for 0.3.5. Please let me know if I'm wrong.

comment:12 Changed 8 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:13 Changed 6 months ago by gaba

Sponsor: Sponsor8-can

comment:14 Changed 6 months ago by nickm

Keywords: 040-deferred-201915 added
Milestone: Tor: 0.4.0.x-finalTor: unspecified

Deferring some tickets from 0.4.0 without proposing them for later. Please tag with 041-proposed if you want to do them.

comment:15 Changed 5 weeks ago by nickm

Owner: nickm deleted
Status: needs_informationassigned

I'm listed as the owner of these needs_information tickets, but I'm not currently doing anything with them. Reassigning to nobody, then putting back in needs_information.

comment:16 Changed 5 weeks ago by nickm

Status: assignedneeds_information
Note: See TracTickets for help on using tickets.