Opened 6 months ago

Closed 3 months ago

Last modified 2 months ago

#32720 closed task (implemented)

How much bandwidth does a user use to bootstrap and maintain dir info? How has that changed over time?

Reported by: arma
Owned by: nickm
Priority: Medium
Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: prop312-can, network-team-roadmap-2020Q1
Cc: komlo
Actual Points:
Parent ID: #33049
Points:
Reviewer: ahf
Sponsor: Sponsor55-can

Description

We've gone through a series of iterations on the mechanisms for fetching directory info, culminating in these last two:

  • iteration n-1: "start by fetching a new consensus and all the microdescriptors, and then fetch a new consensus every few hours plus fetch new microdescriptors as needed"
  • iteration n: like that but use consensus diffs when possible for the new consensus documents

(and we compress some of these steps on the wire)

How many actual bytes is this right now? For the current network, how much are we saving with 'iteration n' over 'n-1'? It would be awesome to track this number over time, so we put ourselves in a position to be able to notice when the load gets higher than we expected.

I ask because we have a large org asking us about the bandwidth tradeoffs of adding 7 figures, 8 figures, or 9 figures of users into the Tor network, and all I have as an answer is my intuitive guess of "about a megabyte per user per day".

I cc komlo since maybe they already computed this metric as part of analyzing related work for walking onions.

And I choose the Tor component since the metrics folks will be happy to graph numbers if the network team exports the numbers, but probably they have no plans to calculate things themselves.

Attachments (1)

download-measure.tar.gz (3.6 KB) - added by nickm 6 months ago.

Change History (23)

comment:1 Changed 6 months ago by nickm

Owner: set to nickm
Status: new → accepted

I've measured this in the silliest way possible: instrumenting the directory code to dump how much it downloads, and then running it for 24-48 hours with a fresh datadirectory.

With a Tor ~0.4.2 client, in late July, I got 2.4 MiB to bootstrap, plus a mean of 47 KiB per epoch to stay up-to-date.
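
As a rough check against the "about a megabyte per user per day" guess in the description, the measured figures can be extrapolated as in the sketch below. It is a back-of-the-envelope calculation, not a measurement: it assumes one epoch lasts one hour and that a client stays online and non-dormant all day, neither of which is stated in this comment. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

BOOTSTRAP_BYTES = 2.4 * MIB   # measured bootstrap cost reported above
PER_EPOCH_BYTES = 47 * KIB    # measured mean cost per epoch reported above
EPOCHS_PER_DAY = 24           # ASSUMPTION: one epoch per hour, client always awake


def maintenance_bytes_per_day(per_epoch=PER_EPOCH_BYTES, epochs=EPOCHS_PER_DAY):
    """Bytes per day one client spends keeping its directory info fresh."""
    return per_epoch * epochs


def network_bytes_per_day(users, per_user=None):
    """Aggregate daily maintenance load for `users` always-on clients."""
    if per_user is None:
        per_user = maintenance_bytes_per_day()
    return users * per_user


if __name__ == "__main__":
    daily = maintenance_bytes_per_day()
    print(f"bootstrap, once per fresh client: {BOOTSTRAP_BYTES / MIB:.2f} MiB")
    print(f"maintenance per user per day:     {daily / MIB:.2f} MiB")
    # 7, 8, and 9 figures of users, as asked in the description.
    for users in (10**6, 10**7, 10**8):
        total_gib = network_bytes_per_day(users) / (1024 * MIB)
        print(f"{users:>11,d} users: {total_gib:,.0f} GiB/day of directory maintenance")
```

Under these assumptions the per-user maintenance figure comes out slightly above one MiB per day; clients that go dormant or are offline for part of the day would fetch less.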

I'm attaching the tarball of my experiment; I'm happy to re-run this with other versions and settings, so long as they work on the current Tor network.

Changed 6 months ago by nickm

Attachment: download-measure.tar.gz added

comment:2 Changed 6 months ago by arma

Thanks! Are these numbers uncompressed? If so, what are the compressed values, i.e. what bandwidth is actually spent on the wire?

And for extra credit, what's the breakdown between consensus info (consensus + diffs) and descriptor info (microdescs by default)?

comment:3 in reply to:  2 Changed 6 months ago by nickm

Replying to arma:

> Thanks! Are these numbers uncompressed? If so, what are the compressed values, i.e. what bandwidth is actually spent on the wire?

No, that's the actual bytes on the wire. See the code and/or the README in the tarball.

> And for extra credit, what's the breakdown between consensus info (consensus + diffs) and descriptor info (microdescs by default)?

For that I don't know, but I can tweak it. Later. :)

comment:4 Changed 6 months ago by nickm

I'm starting by trying the following experiments:

  1. master
  2. master with UseMicrodescriptors 0
  3. master with consensus diffs disabled
  4. master with non-zlib compression disabled

I'm using tweaked versions of master in preference to running old versions, since there are likely to be other bugs in the older versions that throw off the results.

In all cases, I'm running with

DormantOnFirstStartup 0
DormantClientTimeout 1 week
CircuitsAvailableTimeout 1 day
FetchDirInfoEarly 1

so that we try to download everything every hour, and we don't go dormant or decide that we aren't building circuits.

comment:5 Changed 6 months ago by nickm

I've tweaked the heartbeat message to track total downloads by purpose. For example, here's a preliminary result for experiment 2:

Dec 12 13:57:22.000 [notice] Heartbeat: Tor's uptime is 1:10 hours, with 5 circuits open. I've sent 982 kB and received 9.22 MB.
Dec 12 13:57:22.000 [notice] While bootstrapping, fetched this many bytes: 
Dec 12 13:57:22.000 [notice]     7725566 (server descriptor fetch)
Dec 12 13:57:22.000 [notice]     445783 (consensus network-status fetch)
Dec 12 13:57:22.000 [notice]     13266 (authority cert fetch)
Dec 12 13:57:22.000 [notice] While not bootsrapping, fetched this many bytes: 
Dec 12 13:57:22.000 [notice]     518089 (server descriptor fetch)
Dec 12 13:57:22.000 [notice]     40155 (consensus network-status fetch)
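
For anyone who wants to tabulate these numbers rather than read them off the log, the heartbeat section can be parsed with a few lines of Python. This is a minimal sketch that assumes the line format shown in the excerpt above (including the "bootsrapping" spelling); the format in the merged code may differ, and `parse_heartbeat` is a hypothetical helper, not part of tor.

```python
import re

# Section headers; "boots\w*" tolerates the "bootsrapping" spelling in the log.
HEADER_RE = re.compile(r"While (not )?boots\w*, fetched this many bytes:")
# Per-purpose lines: "[notice]     7725566 (server descriptor fetch)"
COUNT_RE = re.compile(r"\[notice\]\s+(\d+) \((.+)\)\s*$")

SAMPLE = """\
Dec 12 13:57:22.000 [notice] Heartbeat: Tor's uptime is 1:10 hours, with 5 circuits open. I've sent 982 kB and received 9.22 MB.
Dec 12 13:57:22.000 [notice] While bootstrapping, fetched this many bytes:
Dec 12 13:57:22.000 [notice]     7725566 (server descriptor fetch)
Dec 12 13:57:22.000 [notice]     445783 (consensus network-status fetch)
Dec 12 13:57:22.000 [notice]     13266 (authority cert fetch)
Dec 12 13:57:22.000 [notice] While not bootsrapping, fetched this many bytes:
Dec 12 13:57:22.000 [notice]     518089 (server descriptor fetch)
Dec 12 13:57:22.000 [notice]     40155 (consensus network-status fetch)
"""


def parse_heartbeat(lines):
    """Return {"bootstrap": {purpose: bytes}, "steady": {purpose: bytes}}."""
    totals = {"bootstrap": {}, "steady": {}}
    phase = None
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            # Group 1 is "not " for the post-bootstrap section.
            phase = "steady" if header.group(1) else "bootstrap"
            continue
        count = COUNT_RE.search(line)
        if count and phase is not None:
            totals[phase][count.group(2)] = int(count.group(1))
    return totals


if __name__ == "__main__":
    parsed = parse_heartbeat(SAMPLE.splitlines())
    for phase, purposes in parsed.items():
        print(phase, sum(purposes.values()), purposes)
```

If a log contains several heartbeats, this sketch keeps only the last value seen for each purpose, which matches the cumulative counters shown here.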

Please let me know if there is more info you think we need or more experiments I should try.

This code needs cleanup before it could go into master.

comment:6 Changed 6 months ago by nickm

I had an early crash for unrelated reasons, but I think these results seem reasonably straightforward.

Experiment 1: (master. Bootstrapped, then updated 6 times.)

Dec 12 18:29:22.000 [notice] Heartbeat: Tor's uptime is 6:10 hours, with 5 circuits open. I've sent 1.63 MB and received 4.01 MB.
Dec 12 18:29:22.000 [notice] While bootstrapping, fetched this many bytes: 
Dec 12 18:29:22.000 [notice]     516749 (consensus network-status fetch)
Dec 12 18:29:22.000 [notice]     13402 (authority cert fetch)
Dec 12 18:29:22.000 [notice]     1824780 (microdescriptor fetch)
Dec 12 18:29:22.000 [notice] While not bootsrapping, fetched this many bytes: 
Dec 12 18:29:22.000 [notice]     212877 (consensus network-status fetch)
Dec 12 18:29:22.000 [notice]     108170 (microdescriptor fetch)

Experiment 2: (master, UseMicrodescriptors 0. Bootstrapped, then updated 6 times.)

Dec 12 18:37:22.000 [notice] Heartbeat: Tor's uptime is 5:50 hours, with 5 circuits open. I've sent 1.94 MB and received 13.02 MB.
Dec 12 18:37:22.000 [notice] While bootstrapping, fetched this many bytes: 
Dec 12 18:37:22.000 [notice]     7725566 (server descriptor fetch)
Dec 12 18:37:22.000 [notice]     445783 (consensus network-status fetch)
Dec 12 18:37:22.000 [notice]     13266 (authority cert fetch)
Dec 12 18:37:22.000 [notice] While not bootsrapping, fetched this many bytes: 
Dec 12 18:37:22.000 [notice]     3209765 (server descriptor fetch)
Dec 12 18:37:22.000 [notice]     255740 (consensus network-status fetch)

Experiment 3: (no diffs. Bootstrapped, then updated 5 times.)

Dec 12 18:34:22.000 [notice] Heartbeat: Tor's uptime is 5:20 hours, with 5 circuits open. I've sent 1.56 MB and received 6.34 MB.
Dec 12 18:34:22.000 [notice] While bootstrapping, fetched this many bytes: 
Dec 12 18:34:22.000 [notice]     517073 (consensus network-status fetch)
Dec 12 18:34:22.000 [notice]     13266 (authority cert fetch)
Dec 12 18:34:22.000 [notice]     1832842 (microdescriptor fetch)
Dec 12 18:34:22.000 [notice] While not bootsrapping, fetched this many bytes: 
Dec 12 18:34:22.000 [notice]     2638812 (consensus network-status fetch)
Dec 12 18:34:22.000 [notice]     94761 (microdescriptor fetch)

Experiment 4: (no zstd or lzma2, only zlib. Bootstrapped, then updated 5 times.)

Dec 12 18:37:56.000 [notice] Heartbeat: Tor's uptime is 4:40 hours, with 5 circuits open. I've sent 1.40 MB and received 3.78 MB.
Dec 12 18:37:56.000 [notice] While bootstrapping, fetched this many bytes: 
Dec 12 18:37:56.000 [notice]     572325 (consensus network-status fetch)
Dec 12 18:37:56.000 [notice]     13110 (authority cert fetch)
Dec 12 18:37:56.000 [notice]     1836661 (microdescriptor fetch)
Dec 12 18:37:56.000 [notice] While not bootsrapping, fetched this many bytes: 
Dec 12 18:37:56.000 [notice]     200259 (consensus network-status fetch)
Dec 12 18:37:56.000 [notice]     84658 (microdescriptor fetch)
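
To compare the four runs on an equal footing, the post-bootstrap totals can be divided by the number of updates each run completed. The sketch below does that with the byte counts copied from the logs above; the dictionary keys and helper names are hypothetical, and because each run covers only five or six updates from a single client, the ratios should be read as rough indications rather than stable measurements.

```python
CONSENSUS = "consensus network-status fetch"
MICRODESC = "microdescriptor fetch"
SERVERDESC = "server descriptor fetch"
CERT = "authority cert fetch"

# Byte counts copied from the four heartbeat excerpts above.
EXPERIMENTS = {
    "master": {
        "updates": 6,
        "bootstrap": {CONSENSUS: 516749, CERT: 13402, MICRODESC: 1824780},
        "steady": {CONSENSUS: 212877, MICRODESC: 108170},
    },
    "no_microdescs": {
        "updates": 6,
        "bootstrap": {SERVERDESC: 7725566, CONSENSUS: 445783, CERT: 13266},
        "steady": {SERVERDESC: 3209765, CONSENSUS: 255740},
    },
    "no_diffs": {
        "updates": 5,
        "bootstrap": {CONSENSUS: 517073, CERT: 13266, MICRODESC: 1832842},
        "steady": {CONSENSUS: 2638812, MICRODESC: 94761},
    },
    "zlib_only": {
        "updates": 5,
        "bootstrap": {CONSENSUS: 572325, CERT: 13110, MICRODESC: 1836661},
        "steady": {CONSENSUS: 200259, MICRODESC: 84658},
    },
}


def bootstrap_total(exp):
    """Total bytes fetched while bootstrapping."""
    return sum(exp["bootstrap"].values())


def per_update(exp, purpose=None):
    """Mean post-bootstrap bytes per update, for one purpose or for all."""
    steady = exp["steady"]
    total = sum(steady.values()) if purpose is None else steady.get(purpose, 0)
    return total / exp["updates"]


if __name__ == "__main__":
    for name, exp in EXPERIMENTS.items():
        print(f"{name:14s} bootstrap={bootstrap_total(exp):>8d} B  "
              f"per update={per_update(exp):>10.1f} B")
    ratio = (per_update(EXPERIMENTS["no_diffs"], CONSENSUS)
             / per_update(EXPERIMENTS["master"], CONSENSUS))
    print(f"consensus bytes per update, no diffs vs. diffs: {ratio:.1f}x")
```

On these figures, disabling consensus diffs raises the per-update consensus cost by roughly a factor of fifteen, while the bootstrap cost is nearly unchanged, which is what one would expect since diffs only help once a client already holds a consensus.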

comment:7 Changed 3 months ago by teor

Keywords: prop312-can added
Parent ID: #33049
Sponsor: Sponsor55-can

Hi Nick,

In proposal 312 for Sponsor 55, I want to know how many outbound directory connections a relay makes every hour:
https://gitweb.torproject.org/torspec.git/tree/proposals/312-relay-auto-ipv6-addr.txt#n429

(Other Sponsor 55 proposals might need similar counts, and I know proposal 306 wants to know the number of client directory requests.)

Can we turn this code into a mergeable tor heartbeat log line?

If this change will take a lot of work, we could get addresses from NETINFO cells instead. (But we'd still need some way of making some OR connections over IPv6. And the simplest way to do that on a relay is outbound directory requests.)

comment:8 Changed 3 months ago by teor

See also #26578 and #25210.

comment:9 Changed 3 months ago by nickm

Status: accepted → needs_review

I found my original branch; it was called measure_dl. I've rebased it and cleaned it up a little as ticket32720. I've made an example PR as https://github.com/torproject/tor/pull/1750

It breaks the tests in test_status.c by adding a new heartbeat section; it would need at least new tests before it could be merged. Perhaps there should be an option to manage whether these stats are logged or not. What do you think?

Also, would you have time to make these changes, or should I put it on my queue?

comment:10 in reply to:  9 Changed 3 months ago by teor

Replying to nickm:

> I found my original branch; it was called measure_dl. I've rebased it and cleaned it up a little as ticket32720. I've made an example PR as https://github.com/torproject/tor/pull/1750
>
> It breaks the tests in test_status.c by adding a new heartbeat section; it would need at least new tests before it could be merged. Perhaps there should be an option to manage whether these stats are logged or not. What do you think?

I don't think an option is necessary.
What's the risk?
Should these statistics be enabled on clients by default?

> Also, would you have time to make these changes, or should I put it on my queue?

I don't know if I'll have time to make these changes. I won't be sure until I start Sponsor 55 Objective 1.2 (Auto Relay IPv6 Addresses).

I really just need a once-off set of figures, so if it's a lot of work, I could just do a grep -c on the right tor logs.

If it's not much work, I'd really appreciate you doing it :-)

comment:11 Changed 3 months ago by teor

(The patch seems fine to me, too.)

comment:12 Changed 3 months ago by nickm

Status: needs_review → accepted

comment:13 Changed 3 months ago by nickm

Status: accepted → needs_revision

comment:14 Changed 3 months ago by nickm

Milestone: Tor: 0.4.4.x-final

Okay, I'll try to get the tests fixed up.

The rationale for making this an option is twofold:

1) The logs are a bit verbose.
2) This code will log the exact number of bytes you fetch in hidden service descriptors, which you may not want in your logs, since it could conceivably reveal information about which services you've visited.

One option for solving the second issue would be simply not to record or log the numbers for directory purposes that use anonymous connections.

comment:15 Changed 3 months ago by teor

I think we should put anonymous directory purposes behind "SafeLogging 0".

I'm also happy with a specific option to enable all these logs.

comment:16 Changed 3 months ago by nickm

I have the tests passing and safelogging respected. I'll put this in needs_review once CI passes.

comment:17 Changed 3 months ago by nickm

Status: needs_revision → needs_review

Travis is passing now. The appveyor error seems to be referring to a version of the code, which has me confused about what appveyor thinks it's doing.

comment:18 Changed 3 months ago by dgoulet

Reviewer: ahf

comment:19 Changed 3 months ago by ahf

Status: needs_review → merge_ready

This looks good. Interesting to get these numbers. Good idea to hide purposes that need anonymity behind the SafeLogging flag.

comment:20 Changed 3 months ago by nickm

Resolution: implemented
Status: merge_ready → closed

Merged to master!

comment:21 Changed 2 months ago by arma

I opened #33651 with what I suspect might be a bug on this code.

comment:22 Changed 2 months ago by gaba

Keywords: network-team-roadmap-2020Q1 added

Add all the tickets from sponsor 55 that are done and being worked on to the keyword #network-team-roadmap-2020Q1 so I can look at them in the wiki page...
