Opened 6 weeks ago

Last modified 5 days ago

#29787 new enhancement

Enumerate possible failure cases and include failure information in .tpf output

Reported by: karsten
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Onionperf
Version:
Severity: Normal
Keywords:
Cc: metrics-team, acute
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Our current model for distinguishing failures, timeouts, and successes is rather simple/arbitrary/confusing (see the sketch after this list):

  • Timeout: We count any measurement with DIDTIMEOUT=1 and/or with DATACOMPLETE<1 as timeout.
  • Failure: We count any measurement that doesn't have DIDTIMEOUT=1 and that has DATACOMPLETE>=1 and READBYTES<FILESIZE as failure.
  • Success: We count everything else as success.
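
For reference, these rules boil down to roughly the following (a minimal sketch; the field names are the .tpf keys mentioned above, the helper itself is hypothetical, and the values are assumed to have already been converted to numbers):

def classify(measurement):
    # Hypothetical helper mirroring the current rules; .tpf values are
    # strings in practice, so assume they have been converted to numbers.
    if measurement.get("DIDTIMEOUT") == 1 or measurement.get("DATACOMPLETE", 0) < 1:
        return "timeout"
    if measurement.get("READBYTES", 0) < measurement.get("FILESIZE", 0):
        return "failure"
    return "success"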

We're plotting timeouts and failures here. This is not as useful as it could be.

It would be so much better to enumerate all possible failure cases and include failure information in .tpf output files. Examples:

  • Turns out that long-running tor instances sometimes fail to keep up-to-date directory information (#29743), and as a result OnionPerf cannot make measurements.
  • Sometimes streams are closed with reason TIMEOUT or DESTROY (pages 1 and 2 of the first attachment on #29744), and I bet there are several subcases in each of these.
  • Regarding timeouts, it does happen that streams are closed by OnionPerf (pages 3 and 4 of that same attachment on #29744).
  • There are likely more, less frequent failure cases that I either did not include in the #29744 graphs or did not run into at all in the logs I looked at.

Can we enumerate all or at least the most typical failure cases and define criteria for clearly distinguishing them from each other and from timeouts and from successes?

Can we also try to unambiguously identify these failure cases in existing tor/torctl/tgen logs that we process for .tpf files, so that we could include failure case IDs for them in the .tpf files?

Child Tickets

Change History (19)

comment:1 Changed 6 weeks ago by acute

Worth noting that the OnionPerf json output records the following error codes when a transfer fails, which come directly from TGen:

shd-tgen-transfer.c: case TGEN_XFER_ERR_NONE
shd-tgen-transfer.c: case TGEN_XFER_ERR_AUTH
shd-tgen-transfer.c: case TGEN_XFER_ERR_READ
shd-tgen-transfer.c: case TGEN_XFER_ERR_WRITE
shd-tgen-transfer.c: case TGEN_XFER_ERR_TIMEOUT
shd-tgen-transfer.c: case TGEN_XFER_ERR_STALLOUT
shd-tgen-transfer.c: case TGEN_XFER_ERR_PROXY
shd-tgen-transfer.c: case TGEN_XFER_ERR_MISC

We should consider making OnionPerf include the code (NONE, AUTH, READ, etc.) in .tpf output.
I can also work towards analyzing archived data to see how these codes map to the examples above and classify them. For now, these are my initial thoughts:

  • A 'NONE' error corresponds to a successful transfer
  • A 'TIMEOUT' error is recorded by OP if the TGen client fails to complete a transfer in a certain amount of time (defined in the TGen model graph)
  • A 'STALLOUT' error is recorded by OP if the TGen client fails to transfer any bytes in a certain amount of time (defined in the TGen model graph)
  • However, OP records a 'READ' error if the TGen server times out during a transfer, and possibly due to other failures as well
  • Streams being destroyed usually result in a 'PROXY' error; these can be further correlated with the torctl logs to define subcases (see the sketch after this list)
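
To make that concrete, here is a rough sketch of how a combined tgen/tor failure class could be derived; the class names are placeholders for illustration, not a proposed final taxonomy:

# Hypothetical mapping from tgen error codes (plus tor correlation) to
# combined failure classes; the class names are placeholders only.
def combined_class(tgen_error, tor_stream_failed=False, tor_reason=None):
    if tgen_error == "NONE":
        return "SUCCESS"
    if tgen_error in ("TIMEOUT", "STALLOUT"):
        return "TGEN_" + tgen_error        # client-side timeout/stallout
    if tgen_error == "PROXY" and tor_stream_failed:
        # e.g. stream closed with reason TIMEOUT or DESTROY (see #29744)
        return "TOR_STREAM_" + (tor_reason or "UNKNOWN")
    return "TGEN_" + tgen_error            # READ, WRITE, AUTH, MISC, ...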

Will update after digging some more.

comment:2 Changed 6 weeks ago by karsten

Interesting! These sound like they could be very useful to better understand how the application observes failures and timeouts. But as you say, we'll probably have to further correlate these error codes with torctl logs to get a better insight into failures observed by the tor process.

Ideally, we'd have combined error cases/codes for failures/timeouts observed by the application and the tor process, rather than including OnionPerf's error code and then something from torctl events in .tpf output. Or we'll have to specify somewhere which combinations are possible and what exactly they mean. This is probably not trivial to do, but if we don't do it, others will try it and do it slightly wrong, so it's better if we do it.

Can I help with the digging? Maybe we can share the digging work without duplicating the effort?

comment:3 Changed 3 weeks ago by acute

Thank you for your patience with this. The plan is to extend the code at https://github.com/ana-cc/OPvalidator, on the basis that tor and tgen measurement results in the OP json output can be matched up by source port and time - and so we can get a lot more information per measurement, such as combined tor/tgen error codes and full path for each measurement. This is half a day's worth of work.

The next step after writing code is to analyze some logs where we have an idea of what caused a failure. I will need some logs where we observed failure cases - do you have these?

comment:4 Changed 3 weeks ago by karsten

Hi acute!

Your idea to extend that code that matches tgen logs and tor control port event logs sounds interesting. Is that going to replace OnionPerf's analysis.py? If yes, why don't you extend or replace that code rather than start a new code base?

However, I wonder if we could start simpler here by simply looking at the tgen logs alone:

  1. For an initial classification of failure cases it might be sufficient to learn when a request fails and how. Like, in which request phase does a request fail and how much time has elapsed up to that point? Maybe the tgen logs also tell us how a request failed, that is, whether the tor process sent an error or tgen ran into a timeout or stallout (even though we're setting stallout high enough that this is currently not the case) or a checksum error or whatever. It would be good to know what fraction of requests succeeded and what fractions failed at the various request stages. This is all based on tgen information, which is the application point of view that treats tor as a black box.
  2. The next step after that, for me, would be to match tgen logs with tor control port event logs. I wonder why we'd be using the source port for this. Is that to handle potentially overlapping requests? Do we handle cases where a source port is re-used over the day, by including time? (A rough sketch of such port-plus-time matching follows this list.) And what do we do if no corresponding source port is found in the other log file, or is that scenario unrealistic/impossible? In short, this sounds complicated and potentially error-prone. Maybe we could simplify this by doing the matching solely based on timing information? And do you think we could also match tor logs (not control port events) by using the same timing information? Assuming that there's anything interesting in these logs.
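
A minimal sketch of what matching by source port plus a time window could look like; the record layout (keys like source_port and unix_ts_start) is an assumption made for illustration, not OnionPerf's actual data structures:

def match_stream(transfer, tor_streams, max_skew=60.0):
    # Match a tgen transfer to tor stream events by source port plus a time
    # window, to handle source ports being re-used later in the day. The
    # record layout here is an assumption, not OnionPerf's actual schema.
    candidates = [s for s in tor_streams
                  if s["source_port"] == transfer["source_port"]
                  and abs(s["unix_ts_start"] - transfer["unix_ts_start"]) <= max_skew]
    # Pick the closest in time; None means no plausible match was found.
    return min(candidates,
               key=lambda s: abs(s["unix_ts_start"] - transfer["unix_ts_start"]),
               default=None)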

Sadly, the weekend is almost here and I likely won't be able to spend much time on this analysis over the weekend. But if I find time, I'll start by reading tgen logs and writing little helper tools to classify failure cases solely based on tgen logs. I'll share measurement identifiers of some sort for failure cases as I find them.

Thanks!

comment:5 Changed 2 weeks ago by karsten

Please disregard my suggestion to drop the source port entirely. I hadn't thought of requests being started every 5 minutes and running for up to 60 minutes. I still had the torperf model in mind where 50 KiB requests would be started every 5 minutes and killed 5 seconds before the next request. Same with 1 MiB/5 MiB requests with different timings. So, this suggestion didn't really make sense. Sorry for the possible confusion.

In the spirit of release early release often, I'm sharing some early results here:

$ grep "_changeState" onionperf_2019-0*.tgen.log | sed 's/.*moving from state //' | sed 's/ to state />/' | tr '\n' ';' | sed 's/CONNECT/_CONNECT/g' | tr '_' '\n' | sort | uniq -c | sort -nr
18476 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;
  81 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>ERROR;COMMAND>ERROR;
   5 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;
   4 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;SUCCESS>ERROR;PAYLOAD>ERROR;
   3 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;SUCCESS>ERROR;RESPONSE>ERROR;
   3 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>ERROR;
   2 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;SUCCESS>ERROR;PAYLOAD>ERROR;
   2 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;PAYLOAD>ERROR;
   1 CONNECT>INIT;INIT>CHOICE;CHOICE>REQUEST;REQUEST>RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;
   1 

Explanation: These are the state changes for >18k OnionPerf runs performed by op-ab since mid-January. Each measurement starts with CONNECT, followed by a state change to INIT, then CHOICE, and so on. The first line contains the successful runs, whereas most of the following lines contain errors.

I need to take a break now, but after the break I'll look more closely at all of those lines except the first to see what happened. I hope to find out why these measurements broke. Maybe I'll have to look at tor controller event logs or tor logs. Still, looking at 101 cases manually sounds like it could work. We'll see!
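
For what it's worth, here is a rough Python equivalent of that shell pipeline that groups state changes per transfer rather than per log line; the regex is a guess based on the tgen log lines quoted later in comment 16, so treat this as a sketch:

import collections
import re
import sys

# Group tgen state changes per transfer and count the resulting sequences.
# The regex is a guess based on the tgen log format quoted in comment 16
# and may need adjusting for other log variants.
LINE_RE = re.compile(r"transfer (transfer\w+,\d+),.*moving from state (\w+) to state (\w+)")

def sequences(paths):
    changes = collections.defaultdict(list)
    for path in paths:
        with open(path) as log:
            for line in log:
                match = LINE_RE.search(line)
                if match:
                    changes[match.group(1)].append(match.group(2) + ">" + match.group(3))
    return collections.Counter(";".join(seq) for seq in changes.values())

if __name__ == "__main__":
    for seq, count in sequences(sys.argv[1:]).most_common():
        print(count, seq)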

comment:6 Changed 2 weeks ago by acute

It appears the tgen states are recorded by onionperf in the json output:

RESPONSEA;RESPONSEA>RESPONSEB;RESPONSEB>RESPONSEC;RESPONSEC>SUCCESS;COMMAND>RESPONSE;RESPONSE>PAYLOAD;PAYLOAD>CHECKSUM;CHECKSUM>SUCCESS;

...corresponds to:

    "proxy_choice": 0.003267, //CHOICE
    "proxy_init": 0.000639,   //INIT
    "proxy_request": 0.007366, //REQUEST
    "proxy_response": 1.042777, //RESPONSE A/B/C
    "response": 1.246581, //RESPONSE
    "payload_progress" //PAYLOAD
    "socket_connect": 0.000555, //CONNECT
    "checksum": 13.956267, //CHECKSUM
    "command": 10.765423, //COMMAND
    [...]

Where there is an error that stops the flow of state changes, the json will only contain the states that succeeded. My theory is that we can infer the phase where a transfer failed directly from the json output of OnionPerf. I've been working on a script that produces a *.csv output with all the information from the json for all onionperf errors. You can see a sample *.csv file here (can't attach as trac thinks it's spam):

To test the theory above, I suggest we compare your results with the results of my python script over the same data. Here is the python script you can run over the same data you analyzed to see if we can get the same results (i.e. the same phase where the transfer errored). The *.csv records other information as well. You need python3 and python3-pandas installed for the script to run.

python3 op_errors.py --path folder-with-json-files

[Also, the script checks whether there was an error recorded in the tor stream for each transfer. Matching is done on the port+timestamp. So far I can see that only onionperf 'proxy' errors are directly linked to tor errors... let me know if you need more info on what the columns in the csv are.]
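
To illustrate the inference described above, here is a minimal sketch using the field names from the json excerpt; the phase ordering is an assumption, and this function is not part of the actual script:

# Infer the first phase that did not complete from which elapsed-time keys
# are present in a transfer's json record. The ordering of phases below is
# an assumption based on the state machine discussed in this ticket.
PHASES = ["socket_connect", "proxy_init", "proxy_choice", "proxy_request",
          "proxy_response", "command", "response", "payload_progress", "checksum"]

def first_failed_phase(elapsed_seconds):
    for phase in PHASES:
        if phase not in elapsed_seconds:
            return phase   # first state that was not reached
    return None            # all phases present: the transfer completed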

comment:7 Changed 2 weeks ago by acute

Updated the script and csv file to actually show errors recorded on tor circuits and fixed a column name (the previous version did not dump this info). Please use this version for comparison. Note that the column 'state failed' records the first state that was NOT found in the json. Use the same links to download.

comment:8 Changed 13 days ago by karsten

Hmm, I'm running into this issue on a fresh Debian Stretch with python3-pandas installed:

$ python3 op_errors.py --path jsonfiles/
Traceback (most recent call last):
  File "op_errors.py", line 5, in <module>
    import utils
ImportError: No module named 'utils'

comment:9 in reply to:  8 Changed 13 days ago by acute

Replying to karsten:

Hmm, I'm running into this issue on a fresh Debian Stretch with python3-pandas installed:

$ python3 op_errors.py --path jsonfiles/
Traceback (most recent call last):
  File "op_errors.py", line 5, in <module>
    import utils
ImportError: No module named 'utils'

Thanks for spotting this. Delete the lines that say 'import utils', 'import logging', and 'import re'; initially the script was using those, but I simplified a lot and don't need the libraries anymore. I will update on my end and push to git.

comment:10 Changed 13 days ago by karsten

Now, when I remove those three lines and re-run the script, I get the following error:

$ python3 op_errors.py --path jsonfiles/
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'failure_reason_local_y'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "op_errors.py", line 91, in <module>
    main()
  File "op_errors.py", line 85, in main
    all_csv_data["failure_reason_local"] = all_csv_data["failure_reason_local_y"]
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python3/dist-packages/pandas/core/internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "/usr/lib/python3/dist-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'failure_reason_local_y'

This error seems to go away if I unxz the single file contained in my jsonfiles/ directory. It would be nice if the script either handled xz-compressed files or ignored them (maybe after printing out a warning) and complained about not having any uncompressed input files to process.
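
For what it's worth, handling both cases could be as simple as something like the following sketch (assuming the script otherwise reads files with plain open() and json.load()):

import json
import lzma

def load_analysis(path):
    # Load a json analysis file whether or not it is xz-compressed; assumes
    # the script otherwise uses plain open()/json.load().
    if path.endswith(".xz"):
        with lzma.open(path, "rt") as f:
            return json.load(f)
    with open(path) as f:
        return json.load(f)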

However, after unxz-ing the input file, I get this error:

$ python3 op_errors.py --path jsonfiles/
Traceback (most recent call last):
  File "op_errors.py", line 91, in <module>
    main()
  File "op_errors.py", line 53, in main
    tgen_dfObj = pd.DataFrame.from_dict(tgen_data, orient="index", columns= ["endpoint_local", "error_code", "endpoint_remote", "is_error", "total_bytes_read", "transfer_id", "unix_ts_end", "unix_ts_start", "elapsed_seconds"])
TypeError: from_dict() got an unexpected keyword argument 'columns'

Which Git repository is that where you're developing this script? I'd like to pull from that rather than editing the script file I have here.

Thanks!

comment:11 Changed 13 days ago by acute

Hi, this has been pushed to https://github.com/ana-cc/OPvalidator.
Ok, so my development machine runs Debian testing; the issue is that your version of pandas is older.
I've pushed a version that keeps compatibility with version 0.19 (the one in stretch).
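
(For reference, one 0.19-compatible way to write the from_dict() call from the traceback above is to build the frame first and then reindex the columns; this is just a sketch, with a toy stand-in for the tgen_data dict the script builds:)

import pandas as pd

cols = ["endpoint_local", "error_code", "endpoint_remote", "is_error",
        "total_bytes_read", "transfer_id", "unix_ts_end", "unix_ts_start",
        "elapsed_seconds"]

# Toy stand-in for the tgen_data dict built by the script.
tgen_data = {"transfer50k:563": {"error_code": "PROXY", "is_error": True}}

# pandas 0.19 has no columns= keyword on from_dict(), so build the frame
# first and then reindex to the desired column order.
tgen_dfObj = pd.DataFrame.from_dict(tgen_data, orient="index").reindex(columns=cols)
print(tgen_dfObj)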

comment:12 Changed 12 days ago by karsten

Alright, I got your script running. Yay!

I also improved my initial analysis a few days ago by first reordering tgen log lines by transfer and then looking at state change sequences.

A quick comparison of your script's output and my results shows that your script detects only 29 of the 95 failures that my updated script detects.

Here are the sequences found by my script together with their frequencies:

   81 OP_UNKNOWN_1 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA ERROR, transport_error NONE STATUS, transfer COMMAND ERROR, transfer_error NONE PROXY]
   6 OP_UNKNOWN_2 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA RESPONSEB, transport RESPONSEB RESPONSEC, transport RESPONSEC SUCCESS, transfer COMMAND RESPONSE, transfer RESPONSE PAYLOAD, transport SUCCESS ERROR, transport_error NONE READ, transfer PAYLOAD ERROR, transfer_error NONE READ]
   3 OP_UNKNOWN_3 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA RESPONSEB, transport RESPONSEB RESPONSEC, transport RESPONSEC SUCCESS, transfer COMMAND RESPONSE, transport SUCCESS ERROR, transport_error NONE READ, transfer RESPONSE ERROR, transfer_error NONE AUTH]
   2 OP_UNKNOWN_4 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA RESPONSEB, transport RESPONSEB RESPONSEC, transport RESPONSEC SUCCESS, transfer COMMAND RESPONSE, transfer RESPONSE ERROR, transfer_error NONE AUTH]
   2 OP_TIMEOUT_5 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA RESPONSEB, transport RESPONSEB RESPONSEC, transport RESPONSEC SUCCESS, transfer COMMAND RESPONSE, transfer RESPONSE PAYLOAD, transfer PAYLOAD ERROR, transfer_error NONE TIMEOUT]
   1 OP_TIMEOUT_6 [transport CONNECT INIT, transport INIT CHOICE, transport CHOICE REQUEST, transport REQUEST RESPONSEA, transport RESPONSEA RESPONSEB, transport RESPONSEB RESPONSEC, transport RESPONSEC SUCCESS, transfer COMMAND RESPONSE, transfer RESPONSE ERROR, transfer_error NONE TIMEOUT]

Your script finds:

 16 OP_UNKNOWN_1 (missing 65!)
  6 OP_UNKNOWN_2
  3 OP_UNKNOWN_3
  2 OP_UNKNOWN_4
  1 OP_TIMEOUT_5 (missing 1!)
  1 OP_TIMEOUT_6

Here are the missing transfer IDs:

transfer50k:11585 transfer50k:6186 transfer50k:563 transfer1m:3946 transfer1m:6079 transfer50k:10503 transfer50k:10594 transfer50k:10975 transfer50k:11040 transfer50k:11225 transfer50k:1154 transfer50k:11586 transfer50k:11587 transfer50k:11771 transfer50k:11888 transfer50k:12189 transfer50k:12475 transfer50k:12680 transfer50k:12723 transfer50k:1301 transfer50k:13433 transfer50k:13478 transfer50k:15107 transfer50k:15893 transfer50k:15902 transfer50k:16012 transfer50k:17344 transfer50k:17724 transfer50k:17791 transfer50k:18207 transfer50k:18257 transfer50k:18449 transfer50k:2460 transfer50k:2473 transfer50k:2903 transfer50k:3201 transfer50k:3310 transfer50k:3844 transfer50k:4601 transfer50k:4685 transfer50k:5067 transfer50k:5665 transfer50k:5754 transfer50k:581 transfer50k:5907 transfer50k:5912 transfer50k:6468 transfer50k:6551 transfer50k:6738 transfer50k:6886 transfer50k:7089 transfer50k:7266 transfer50k:7481 transfer50k:7503 transfer50k:7586 transfer50k:8200 transfer50k:8259 transfer50k:8277 transfer5m:14290 transfer5m:14570 transfer5m:2020 transfer5m:5208 transfer5m:5394 transfer5m:5422 transfer5m:6319 (these are the 65 OP_UNKNOWN_1 cases)

transfer5m:572 (this is the 1 missing OP_TIMEOUT_5 case)

Would you want to take a look at why your script does not detect these failures?

comment:13 Changed 11 days ago by acute

The script only showed errors where they were correlated with a circuit id (an inner join, in other words). I've updated it so it outputs all errors in the csv (e.g. I find a total of 123 errors between 10/01 and 10/04). Thanks for spotting this; the updated version is on git!
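
(In pandas terms that's roughly the difference between how="inner" and how="left" in the merge; a toy sketch, with column names chosen for illustration except failure_reason_local, which appears in the traceback above:)

import pandas as pd

tgen_df = pd.DataFrame({"source_port": [40001, 40002], "error_code": ["PROXY", "READ"]})
tor_df = pd.DataFrame({"source_port": [40001], "failure_reason_local": ["TIMEOUT"]})

# how="left" keeps every tgen error even when no matching tor record exists;
# how="inner" would silently drop the unmatched transfer on port 40002.
all_errors = pd.merge(tgen_df, tor_df, on="source_port", how="left")
print(all_errors)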

comment:14 Changed 11 days ago by karsten

Okay, that adds a lot more of the cases I'm seeing. Still, 3 are missing:

OP_TIMEOUT_5: transfer5m,572
OP_UNKNOWN_1: transfer50k,563 and transfer50k,581

Any idea why these are still missing?

(Note how we'll later be able to re-use these fixes to your script to further classify the failures! So even if this bug hunting might seem annoying at the moment, we might benefit from it later on.)

comment:15 in reply to:  14 Changed 11 days ago by acute

Replying to karsten:

Okay, that adds a lot more of the cases I'm seeing. Still, 3 are missing:

OP_TIMEOUT_5: transfer5m,572
OP_UNKNOWN_1: transfer50k,563 and transfer50k,581

Any idea why these are still missing?

I can't actually seem to find these in any of my json files or tgen logs just by grepping - could you tell me which tor log file these came from, by any chance?

comment:16 Changed 8 days ago by karsten

Looks like they are all contained in the same file. Are you missing that one maybe?

$ grep -m1 "transfer5m,572," *.log
onionperf_2019-01-13_23:59:59.tgen.log:2019-01-12 11:33:05 1547292785.630821 [info] [shd-tgen-transfer.c:363] [_tgentransfer_changeState] transfer transfer5m,572,op-ab,GET,5242880,(null),0,state=COMMAND,error=NONE moving from state COMMAND to state RESPONSE
$ grep -m1 "transfer50k,563," *.log
onionperf_2019-01-13_23:59:59.tgen.log:2019-01-12 10:47:59 1547290079.082864 [info] [shd-tgen-transfer.c:363] [_tgentransfer_changeState] transfer transfer50k,563,op-ab,GET,51200,(null),0,state=COMMAND,error=NONE moving from state COMMAND to state ERROR
$ grep -m1 "transfer50k,581," *.log
onionperf_2019-01-13_23:59:59.tgen.log:2019-01-12 12:19:58 1547295598.455009 [info] [shd-tgen-transfer.c:363] [_tgentransfer_changeState] transfer transfer50k,581,op-ab,GET,51200,(null),0,state=COMMAND,error=NONE moving from state COMMAND to state ERROR

comment:17 Changed 8 days ago by acute

Ok thank you for that (was indeed missing that day). This was very interesting to debug, and it is not a bug in my script unfortunately:

I downloaded the op-ab hosted tgen and torctl logs for 01-13 and recreated the json analysis file from them. It turns out the analysis file on op-ab is different from the one recreated from the same logs on my computer. I believe something happened and onionperf was cut short in the middle of the analysis for that particular day (my file is 16MB, op-ab's is 7.7MB). When I recreate any of the other json files from their respective logs (I have tried for ~10 other days), the contents and sizes match up.

The good news is, if you run the python script on the json file produced from the logs, all the missing errors are found.

However, this is concerning. I have asked irl to examine op-ab for any clues as to what could have happened at midnight that day. As a first step, I am thinking of writing a script that rebuilds the analysis from the logs and validates the existing results. What do you think?

To recreate my steps:

  wget https://op-ab.onionperf.torproject.net/tgen-client/log_archive/onionperf_2019-01-13_23%3A59%3A59.tgen.log
  wget https://op-ab.onionperf.torproject.net/tor-client/log_archive/onionperf_2019-01-13_23%3A59%3A59.torctl.log
  onionperf analyze --tgen onionperf_2019-01-13_23:59:59.tgen.log --torctl onionperf_2019-01-13_23:59:59.torctl.log
  unxz onionperf.analysis.json.xz                    
  wget https://op-ab.onionperf.torproject.net/2019-01-13.onionperf.analysis.json.xz
  unxz 2019-01-13.onionperf.analysis.json.xz                    
  diff onionperf.analysis.json 2019-01-13.onionperf.analysis.json
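
A programmatic version of that last comparison step could look roughly like this (a sketch that only checks whether the two parsed json documents are identical; it makes no assumption about the json structure):

import json
import lzma
import sys

def load(path):
    # Open either a plain or an xz-compressed analysis file.
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as f:
        return json.load(f)

if __name__ == "__main__":
    local, hosted = load(sys.argv[1]), load(sys.argv[2])
    if local == hosted:
        print("analysis files match")
    else:
        print("analysis files differ (%d vs %d top-level entries)"
              % (len(local), len(hosted)))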

comment:18 Changed 8 days ago by karsten

Hey, that's cool, thanks for finding that last remaining bug! (Do you think we could move that discussion to another ticket? It seems important to get that fixed, but it's not directly related to figuring out in what ways OnionPerf measurements can fail.)

I guess the next step here is trying to find out why those ~100 measurements failed. I'll try to start looking at that this afternoon, but I'd say feel free to do the same in parallel, as time permits. Thanks!

comment:19 in reply to:  18 Changed 5 days ago by acute

Replying to karsten:

Hey, that's cool, thanks for finding that last remaining bug! (Do you think we could move that discussion to another ticket? It seems important to get that fixed, but it's not directly related to figuring out in what ways OnionPerf measurements can fail.)

This issue has been filed under bug #30238.
