Opened 12 months ago

Last modified 7 months ago

#27248 new defect

Can we make our node-related structures more efficient?

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 035-roadmap-master, 035-triaged-in-20180711
Cc: catalyst Actual Points:
Parent ID: #27243 Points:
Reviewer: Sponsor:

Description

Our profiles say that we're using a lot of RAM for our directory stuff. Some of that is for silliness like keeping text of directory stuff in RAM (yuck), but surely some of it is due to all our node_t/microdesc_t/routerstatus_t stuff. Can we make that smaller?

Child Tickets

TicketStatusOwnerSummaryComponent
#7174newRefactor node_t and router lists to make it easier to identify bugsCore Tor/Tor
#22739needs_revisionMake routerinfo_t and routerstatus_t addresses immutable; store overrides in node_tCore Tor/Tor
#23975assignedMake node_get_pref_ipv6_orport() check addresses in the right orderCore Tor/Tor
#27284assignedCheck IPv6 exit policies on microdescsCore Tor/Tor

Change History (10)

comment:1 Changed 12 months ago by teor

We can:

  • use microdesc_t, routerstatus_t, and routerinfo_t to fill in the fields in the node, and
  • remove microdesc_t, routerstatus_t, and routerinfo_t from the node.

If we only have one canonical location for node data, we will:

  • save a lot of RAM, and
  • fix some of the bugs that come from having inconsistent data in the node, microdesc, routerstatus, and routerinfo.

But when UseMicrodescriptors changes, we'll have to re-parse the other consensus flavour and descriptors. That's probably ok.

comment:2 in reply to:  1 Changed 12 months ago by dgoulet

Replying to teor:

We can:

  • use microdesc_t, routerstatus_t, and routerinfo_t to fill in the fields in the node, and
  • remove microdesc_t, routerstatus_t, and routerinfo_t from the node.

Tbh, I'm very confused by what you mean here? I assume you are talking about node_t when you say "node"?

The node_t has pointers to those 3 objects, so what would be your idea to save space?

  microdesc_t *md;
  routerinfo_t *ri;
  routerstatus_t *rs;

comment:3 Changed 12 months ago by catalyst

Cc: catalyst added

comment:4 Changed 12 months ago by teor

I am suggesting that Tor keeps each relay field in one place (usually the node_t), rather than up to 4 places (routerstatus_t, microdesc_t, routerinfo_t and node_t).

For example:

There is currently an IPv6 address and port field in routerstatus_t, microdesc_t, and routerinfo_t (and maybe in node_t).

Most clients use microdescs. They currently store IPv6:

  • in the microdesc_t, when using consensus methods 27 and earlier it is the relay's IPv6 address, otherwise it is zero
  • in the routerstatus_t, when using consensus methods 27 and later it is the relay's IPv6 address, otherwise it is zero

Directory authorities and directory mirrors also store the same IPv6 address in the routerinfo_t.

We can put an IPv6 address in the node, and make all of Tor's code use that IPv6 address. (Except maybe on authorities, if they need to know "IPv6 in the descriptor" and "IPv6 in the consensus" at the same time. But they mostly want to know what's in the descriptor.)

For every field in microdesc_t, routerstatus_t, and routerinfo_t we can:

  • put it in the node, or
  • remove it, or
  • decide to parse it as needed.

For routerinfo_t, we can also decide to:

  • not parse relay descriptors on directory mirrors, and
  • keep some authority-only routerinfo fields in a cut-down version of the structure.

But routerinfo_t is less important, because clients don't use it unless they have UseMicrodescriptors 0.

When we put all the microdesc_t and routerstatus_t fields in the node, we can stop allocating any routerstatus_t and microdesc_t. (Or just allocate them on the stack during parsing.)

comment:5 in reply to:  4 Changed 12 months ago by dgoulet

Replying to teor:

I am suggesting that Tor keeps each relay field in one place (usually the node_t), rather than up to 4 places (routerstatus_t, microdesc_t, routerinfo_t and node_t).

+1 on this idea.

I *think* (to be verified), that dirauth might need to have this separation concept in order to publish that information but also to create the consensus. In other words, iirc, when receiving for instance a new microdescriptor from a relay, the code then tries to find differences (cosmetic or not) from the previous one.

So for dirauth, keeping those objects separate makes things I guess easier code wise. Again, not entirely sure.

Anyway, overall, I strongly agree that node_t could keep only the required field for which a lot of them appear in all 3 objects but also I'm guessing some fields are also probably only useful for dirauth.

There is a concept of mutable vs immutable with these objects within a node_t object though but we can figure that part easily and even create a way better/clearer/safer model for that part.

comment:6 Changed 11 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.6.x-final

Deferring various feature-y things to 0.3.6. If one of these is actually happening in 0.3.5, please let me know!

comment:7 Changed 11 months ago by nickm

Sponsor: Sponsor8-can

comment:8 Changed 10 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:9 Changed 7 months ago by gaba

Sponsor: Sponsor8-can

comment:10 Changed 7 months ago by nickm

Milestone: Tor: 0.4.0.x-finalTor: unspecified
Note: See TracTickets for help on using tickets.