It would be great to get an ongoing torperf run that's accessing a hidden service, so we can track their performance over time as well.
The simple part here is setting up a hidden service that probably isn't itself the performance bottleneck, and setting up a torperf install as usual.
The complex part here is that the Tor client will cache the hidden service descriptor, and maybe even the rendezvous circuit (in the case of the every-five-minute 50kb fetch), skewing our results.
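For the service side, something along these lines should do (a sketch; the directory and ports are placeholders, the webserver with the fixed-size files listens locally):

{{{
# Hypothetical torrc for the measured hidden service.
HiddenServiceDir /var/lib/tor/torperf_hs/
HiddenServicePort 80 127.0.0.1:8080
}}}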
A reasonable way to proceed would be to patch your Tor client to react in a new way to a USR2 signal, with two actions: a) flush the rendezvous descriptor cache:
I took my advice and made an arma/bug1944 git branch. If you run this Tor, you can send it a kill -USR2 and it will forget about all its rendezvous stuff.
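For example, a measurement script could flush that state before every fetch along these lines (a sketch; the pid file path is an assumption about the local torrc's PidFile setting):

{{{
import os
import signal

# Assumed location of the patched tor's pid file.
TOR_PID_FILE = "/var/run/tor/tor.pid"

def flush_rend_state():
    """Make the arma/bug1944 tor forget its rendezvous state."""
    with open(TOR_PID_FILE) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR2)
}}}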
Setting priority to minor, because we should finish our substantial changes to Torperf first before looking at Torperf for hidden services. Blocking on #2554 (closed).
Trac: Component: Metrics to Torperf; Priority: normal to minor; Points: N/A to 3; Owner: karsten to N/A; Status: needs_review to assigned
aaronsw, I looked at your branch today, and I have a few questions:
Are you still running your Torperf? If so, can you make the logs available via HTTP, so that we can graph them and put them on the metrics website? If not, would you mind running the Torperfs for hidden services?
Your Torperf dumps Tor's data directory and starts from scratch for every measurement, right? Did you try arma's bug1944 branch? I'd prefer a solution that doesn't download 2000+ relay descriptors for every 50KB measurement.
Why does your Torperf download files directly from torperf.tor2web.org:9081? If you want to compare your data to non-hidden-service download times, we already have data for that.
You define the same SOCKS and control ports for your three file sizes. How does this work for overlapping downloads?
I don't think I did try arma's branch. I can't remember why; probably because I didn't understand it.
It seemed to me that in a fair comparison between hidden and non-hidden, you'd want to control for everything else. Having one server in one place do the hidden service measurements and another server hosted someplace else, running different code, do the non-hidden ones seems to introduce other variables into the mix. These variables will probably average out over time, but at the time I wanted results right away.
fcntl.lockf(fh, fcntl.LOCK_EX) grabs an exclusive lock before running the download.
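In other words, the three torperfs serialize themselves on a shared lock file, roughly like this (a sketch; the lock path and run_download() are hypothetical):

{{{
import fcntl

# One download at a time across all three torperf instances: block on an
# exclusive lock before using the shared SOCKS port.
with open("/tmp/torperf-hs.lock", "w") as fh:
    fcntl.lockf(fh, fcntl.LOCK_EX)  # blocks until no other instance holds it
    run_download()                  # hypothetical: perform one measurement
# the lock is released when fh is closed
}}}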
I set up three torperfs to a hidden service today and took notes of all substeps. Can you please take a look at these notes to tell me whether I'm measuring the right things, or what parts I should tweak? I also attached a graph with four hours of data.
Tomorrow I'm planning to look into logging more fine-grained events as described on wiki:doc/TorControlPortWalkthrough-HS and the corresponding service-side events. If there's anything in particular you want me to measure, please let me know. Or maybe this is a discussion for #2554 (closed)?
That timestamp is 1970-01-12T14:46:40, and these two calls will only remove descriptors published three days before that timestamp. arma's bug1944 branch contains one more zero, but 10000000 is still 1970-04-26T18:46:40 and much too old. A better number would be 2000000000 which is 2033-05-18T05:33:20. Or we can avoid timestamp maths and just use:
{{{
rend_cache_purge();
}}}
Similarly, I put in a call to rend_client_purge_last_hid_serv_requests() rather than the suggested directory_clean_last_hid_serv_requests().
Here's the full patch that I'm using now:
{{{
diff --git a/src/or/main.c b/src/or/main.c
index 094120f..ad7fa0b 100644
--- a/src/or/main.c
+++ b/src/or/main.c
@@ -2102,9 +2102,13 @@ process_signal(uintptr_t sig)
       control_event_signal(sig);
       break;
     case SIGUSR2:
-      switch_logs_debug();
-      log_debug(LD_GENERAL,"Caught USR2, going to loglevel debug. "
-                "Send HUP to change back.");
+      /* Replace the USR2 functionality with bug1944-specific plans.
+       * Ugly hack, but it'll do. */
+      log_info(LD_GENERAL, "Clearing all rendezvous caches and "
+               "forgetting about rendezvous connections...");
+      rend_cache_purge();
+      rend_client_purge_last_hid_serv_requests();
+      circuit_mark_all_dirty_circs_as_unusable();
       control_event_signal(sig);
       break;
     case SIGHUP:
}}}
I made more progress on this, but need to clean up my notes and code before posting them here. Preliminary results are:
- a simple tor controller that connects to a local tor process and logs events to disk including their exact arrival time (see the sketch after the lists below) and
- a parser that reads these event logs and reconstructs successful hidden service requests and substeps from STREAM, CIRC, and CIRC_MINOR events.
Substeps are:
- from STREAM NEW: started request
- to CIRC|CIRC_MINOR HSCR_CONNECTING: fetched descriptor
- to CIRC|CIRC_MINOR HSCR_ESTABLISHED_IDLE: established rendezvous point
- to STREAM SENTCONNECT: connected to hidden service
- to STREAM SUCCEEDED: connected to webserver
- to STREAM CLOSED: completed request
Events that I didn't include yet are:
- CIRC|CIRC_MINOR HSCI*, because I couldn't reliably reconstruct which introduction circuit was used for a request (edge case: RENDEZVOUS2 arrives before INTRO_ACK),
- HS_DESC, because they didn't seem to add much information, and
- CIRC|CIRC_MINOR HSSR*, because I didn't combine event logs from two tor processes and because I'm not sure how interesting this would be anyway.
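As a sketch of the controller part (assuming stem's event API; the actual tool may be hand-rolled), logging each event with its exact arrival time could look like this:

{{{
import time
from stem.control import Controller, EventType

with Controller.from_port(port=9051) as controller:
    controller.authenticate()

    def log_event(event):
        # One line per event: arrival time, then the raw event text.
        with open("events.log", "a") as log:
            log.write("%.6f %s\n" % (time.time(), event.raw_content().strip()))

    controller.add_event_listener(log_event, EventType.STREAM,
                                  EventType.CIRC, EventType.CIRC_MINOR)
    time.sleep(3600)  # a real tool would run until stopped
}}}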
Some quick stats from 86 requests:
{{{
   fetchdesc           estrend           connecths         connectapp
 Min.   :0.000175   Min.   :0.05247   Min.   : 0.3594   Min.   :0.1390
 1st Qu.:0.674754   1st Qu.:0.11881   1st Qu.: 0.9616   1st Qu.:0.3184
 Median :1.013537   Median :0.19153   Median : 1.3313   Median :0.4999
 Mean   :1.402095   Mean   :0.23187   Mean   : 2.3325   Mean   :0.5026
 3rd Qu.:1.981897   3rd Qu.:0.26136   3rd Qu.: 2.0389   3rd Qu.:0.6207
 Max.   :6.312293   Max.   :1.86909   Max.   :48.6318   Max.   :1.5455
  completereq          total
 Min.   : 0.7379   Min.   : 2.017
 1st Qu.: 1.0330   1st Qu.: 3.694
 Median : 1.2486   Median : 4.847
 Mean   : 1.5136   Mean   : 5.983
 3rd Qu.: 1.5413   3rd Qu.: 6.273
 Max.   :15.6800   Max.   :53.806
}}}
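Deriving such per-substep durations from the logged arrival times is straightforward; here's a hypothetical sketch with label names matching the columns above:

{{{
# Given one request's events as (arrival_time, label) pairs, in the
# substep order listed earlier, compute the duration of each substep.
SUBSTEPS = ["started", "fetchdesc", "estrend", "connecths",
            "connectapp", "completereq"]

def substep_durations(events):
    times = {label: t for t, label in events}
    durations = {}
    for prev, cur in zip(SUBSTEPS, SUBSTEPS[1:]):
        if prev in times and cur in times:
            durations[cur] = times[cur] - times[prev]
    return durations
}}}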
That timestamp is 1970-01-12T14:46:40, and these two calls will only remove descriptors published three days before that timestamp. arma's bug1944 branch contains one more zero, but 10000000 is still 1970-04-26T18:46:40 and much too old. A better number would be 2000000000 which is 2033-05-18T05:33:20. Or we can avoid timestamp maths and just use:
{{{
rend_cache_purge();
}}}
Similarly, I put in a call to rend_client_purge_last_hid_serv_requests() rather than the suggested directory_clean_last_hid_serv_requests().
Yes, that seems like a bug indeed.
Replacing rend_cache_clean() with rend_cache_purge() seems like the correct thing to do.
There was also a call to rend_cache_clean_v2_descs_as_dir(), but this seems to be used by HSDirs, not by clients. I see you didn't replace that with something in your full patch, which makes sense to me (since we are not HSDirs), but maybe arma was thinking of something?
BTW, I'm not very familiar with this part of the HS code (how IP/RP connections are organized), so I might be wrong.
Great work! Thanks!
Some thoughts from looking at the diagrams:
a) Fetching a descriptor seems to take lots of time: it usually takes more time than actually receiving a 51200-byte file over an HS circuit. BTW, have you checked the HS_DESC events that get triggered when the descriptor request gets launched and when it gets received? I would be curious to learn why descriptor fetching takes so much time; does it visit multiple HSDirs until it finds the right one?
b) As discussed on IRC yesterday, there is a step between "establish RP" and "connect to HS", which is establishing a circuit to the IP. Theoretically, this step should be measurable, since we receive an INTRODUCE_ACK cell when it finishes. You told me on IRC that sometimes you received RENDEZVOUS2 before INTRODUCE_ACK. Maybe the correct behavior in this case is that the connection to the IP should be considered finished when you have received either RENDEZVOUS2 or INTRODUCE_ACK? It's a pity we are missing this information, and it explains why the "connect to HS" step takes lots of time.
c) OTOH, "connect to webserver", as expected, doesn't take much time at all, since the connection to the HS was already established back in "connect to HS".
d) It's good to see that in the case of big files, we spend most of our time in "complete request". It would be very awkward if most of the time was spent in circuit establishment.
However, for small files, it seems that actually fetching the file does not take that much time, and most of the time is spent during "fetch descriptor" and "connect to HS". I think small files are more interesting for the purposes of our research.
Hi asn, thanks for your feedback! I just posted a fourth version of the notes based on your input. Two comments:
Here are some quick stats on failed descriptor fetches: 294 HS_DESC REQUESTED, 289 HS_DESC RECEIVED, 5 HS_DESC FAILED. Looks like failed requests are not the issue in the first substep. I'd have to take a closer look at the raw event logs to say why this substep takes so long. Feel free to beat me to it.
I added a sixth substep for acknowledging introduction. In the few edge cases when we receive INTRODUCE_ACK after RENDEZVOUS2, that substep and the next are simply omitted in the boxplot diagram. (The request totals are still computed correctly using STREAM NEW and STREAM CLOSED events.)
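On the HS_DESC stats above: counting those actions is a one-liner per event with stem (a sketch assuming stem's event API; the real numbers came from the raw event logs):

{{{
import time
from collections import Counter
from stem.control import Controller, EventType

counts = Counter()

with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    # Tally HS_DESC actions such as REQUESTED, RECEIVED, and FAILED.
    controller.add_event_listener(lambda event: counts.update([event.action]),
                                  EventType.HS_DESC)
    time.sleep(3600)  # observe for a while

print(counts)
}}}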
You told me on IRC that sometimes you received RENDEZVOUS2 before INTRODUCE_ACK.
This is normal and natural behavior btw -- once you send the INTRODUCE1 cell to the introduction point, there's a race between when you get back the INTRODUCE_ACK from the intro point (saying "yes, I sent your introduction request on to the hidden service") and when you get back the RENDEZVOUS2 cell (from the hidden service). I'd guess the intro ack comes back first more often, but it really depends on load inside the Tor network at that time.
Maybe the correct behavior in this case is that the connection to the IP should be considered finished when you have received either RENDEZVOUS2 or INTRODUCE_ACK?
Sounds plausible. I might even suggest leaving out the "do we get an ack" step in the case where we also (eventually) get a rendezvous2 cell, since they aren't ordered events. But tracking them is fine too in case we learn something.
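In code, that suggestion amounts to taking whichever timestamp comes first (a sketch; the per-request timestamps are hypothetical, None if the event was never seen):

{{{
def intro_finished_at(t_introduce_ack, t_rendezvous2):
    # The two cells race, so the introduction phase is done at whichever
    # of INTRODUCE_ACK or RENDEZVOUS2 arrives first.
    candidates = [t for t in (t_introduce_ack, t_rendezvous2) if t is not None]
    return min(candidates) if candidates else None
}}}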
"4. acknowledged introduction: two CIRC or CIRC_MINOR events with hidden service states HSCI_INTRO_SENT and HSCR_ESTABLISHED_WAITING in quick succession indicate a confirmation from the introduction point that it has received the client's introduction;"
This isn't quite true -- it indicates that we've sent the introduction attempt (i.e. an ESTABLISH_INTRO cell). We could get a NACK response from the intro point, meaning "the hidden service you asked me to introduce you to didn't have an intro circuit open to me, sorry". If we want to know that we've gotten confirmation from the introduction point that it has received the client's introduction, we could look for a transition to HS_STATE=HSCI_DONE.
We could get a NACK response from the intro point, meaning "the hidden service you asked me to introduce you to didn't have an intro circuit open to me, sorry".
For added fun, the NACK comes in the form of a RELAY_COMMAND_INTRODUCE_ACK cell that has non-zero length. From the control port you can tell because you get
where it used to be in OLD_HS_STATE=HSCI_INTRO_SENT but now it's in HS_STATE=HSCI_CONNECTING. That's because we just extend that circuit one more hop and try introducing again. But if that circuit already has too many hops, you instead just see a direct transition from HSCI_CONNECTING to CLOSED:
{{{
-      log_debug(LD_GENERAL,"Caught USR2, going to loglevel debug. "
-                "Send HUP to change back.");
+      /* Replace the USR2 functionality with bug1944-specific plans.
+       * Ugly hack, but it'll do. */
+      log_info(LD_GENERAL, "Clearing all rendezvous caches and "
+               "forgetting about rendezvous connections...");
+      rend_cache_purge();
}}}
I bet we want rend_client_purge_state() rather than rend_cache_purge(). That's what we do in the NEWNYM case. (It didn't exist when this ticket was opened.)
{{{
rend_client_purge_last_hid_serv_requests();
}}}
and then you won't need this line
{{{
circuit_mark_all_dirty_circs_as_unusable();
}}}
Hmmm. This line actually isn't quite sufficient -- established rend circs, or intro circs in state intro_sent, won't get dirty so they won't get cleared here. I think this is a bug in newnym too. I'll open a ticket for that.
In any case, I think for most normal situations your current code will do fine.
Argh, my bad, I meant HSCI_DONE, not HSCI_INTRO_SENT. However, this is just a documentation bug, not a bug in the code, because I'm not considering HSCI_* events at all in the code, but only HSCR_* events. The notes should say:
acknowledged introduction: a CIRC or CIRC_MINOR event with hidden service state HSCR_ESTABLISHED_WAITING indicates that the client has received an acknowledgement from the introduction point and has updated its internal state for the rendezvous circuit to expect a connection from the hidden service;
Does that make more sense? If so, I'll update the notes and leave the code unchanged.
Or do you want me to take out that substep and go back to the previous version with just five substeps?
If you think the current code does a good enough job, I'd rather not touch it now. I can set up a second experiment on another EC2 instance tonight if you want me to try out a different patch. Or we can do that when that other ticket you just opened is resolved. Let me know.
Hmmm. This line actually isn't quite sufficient -- established rend circs, or intro circs in state intro_sent, won't get dirty so they won't get cleared here. I think this is a bug in newnym too. I'll open a ticket for that.
I now think I am wrong here. Move along; nothing to see here.
Here is a breakdown of connection steps from client to HS:
- Fetch descriptor
- Alice establishes RP
- Alice sends INTRODUCE1
- Alice receives INTRODUCE_ACK / Alice receives RENDEZVOUS2
- Connect to HS (Alice sent RELAY_COMMAND_BEGIN cell)
- Connect to webserver (Alice sent HTTP GET /)
- Complete request (how can we break this down even more?)
It seems that our measurements are missing the "Alice sends INTRODUCE1" step, which would allow us to break down the "acknowledged introduction" step even more.
Now here is a breakdown from the HS to the client:
- Bob received INTRODUCE2 from IP
- Bob connected to RP (Bob sent RENDEZVOUS1)
- Bob received RELAY_COMMAND_BEGIN
- Bob received application data
I wonder if we can make a test that collects both Client->HS measurements and HS->Client measurements and then combines them somehow. That would allow us to get even finer granularity in our measurements.
BTW, the fact that the "complete request" step has an outlier at 800+ seconds stretches the graph and doesn't allow us to look at the other events in good detail. If we are mainly interested in circuit establishment performance, maybe we should only test small files anyway, so that the graphs are stretched less.
How about we stop the current set of measurements, archive the results, tweak the scripts to analyze them and make plans for making future results more useful, and then start another round of measurements? Unless there's a benefit in collecting more data in the current setup?
Want to make the analysis script better for finer granularity? Note that I left out events coming from the introduction circuit on purpose: we'd have to match rendezvous circuit and introduction circuit somehow, and there's no single unique identifier that we could use for that. But please see the Python script in my Markdown file for details.
Speaking of unique request identifiers and hidden-service-side measurements: do you think we could link events using hidden service subdomains? I was thinking randomnumber.ourservice.onion or timestamp.ourservice.onion. Or maybe there's a better way?
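A sketch of that idea (ourservice.onion is a placeholder): tor should ignore extra labels in front of an onion address when connecting, but the full name ends up in the HTTP Host header, so the service side can log it and both event logs can be joined on the label.

{{{
import os
import time

ONION = "ourservice.onion"  # hypothetical service address

def request_url(filename="50kb"):
    # Unique per-request label: timestamp plus a few random bytes.
    label = "%d-%s" % (int(time.time()), os.urandom(4).hex())
    return "http://%s.%s/%s" % (label, ONION, filename)
}}}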
And yes, in the future we can set up a single Torperf instance that requests 50 KiB files. Well, in theory, we could do 1 KiB files if we really, really only care about connection establishment. What do you think?
I fixed the graphs on the server by zooming into the range [0, 300].