We have a proposed patch in #4712 (moved) for proposal 182 (parent ticket #4682 (moved)). The patch is missing some pieces (for example it appears to do the wrong thing when RelayBandwidthRate is set), but I think under constrained circumstances a simulation should still give us some intuition about whether the patch is on the right track assuming the issues in #5334 (moved) are non-issues.
While I'm at it: Rob/Kevin, when you set the BandwidthRate and BandwidthBurst for your simulated relays, do you pick the smallest number out of the descriptor and set both rate and burst to that number? Or do you pull out both the Rate and the Burst and use them?
I imagine a simulated network that never has any extra space in its token buckets could behave quite differently from the real Tor network (where the fast relays often have significant cushion).
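To make the two options concrete, here is a minimal Python sketch, assuming the server descriptor's bandwidth line lists rate, burst, and observed bandwidth in bytes/second; the function names and the resulting config dict are purely illustrative, not taken from any existing tool:

```python
# Minimal sketch of the two options, assuming a server descriptor line of the
# form "bandwidth <rate> <burst> <observed>" (all values in bytes/second).
# The function names and the config dict keys are just for illustration.

def parse_bandwidth_line(line):
    """Return (rate, burst, observed) in bytes/second."""
    _, rate, burst, observed = line.split()
    return int(rate), int(burst), int(observed)

def relay_config_min(line):
    """Option 1: collapse rate and burst onto the smaller of the two."""
    rate, burst, _ = parse_bandwidth_line(line)
    limit = min(rate, burst)
    return {"BandwidthRate": limit, "BandwidthBurst": limit}

def relay_config_separate(line):
    """Option 2: keep the descriptor's rate and burst as separate limits,
    preserving whatever burst cushion the real relay advertised."""
    rate, burst, _ = parse_bandwidth_line(line)
    return {"BandwidthRate": rate, "BandwidthBurst": burst}

if __name__ == "__main__":
    example = "bandwidth 5242880 10485760 4194304"
    print(relay_config_min(example))       # rate and burst both 5242880
    print(relay_config_separate(example))  # burst cushion of 10485760 kept
```

With the first variant the simulated relay never has any extra burst headroom, which is exactly the difference between the simulated and real networks raised above.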
I've attached a first set of results. The Tor model is as described in #4086 (where relay capacities in Shadow are based on their reported observed bandwidth in Tor).
Each of the task{a,b,c} branches was run directly, adding only the configs needed for my private test network.
Completed download counts may give us a sense of load on the network.
taska: 9482 × 320KiB (web), 43 × 5MiB (bulk)
taskb: 27635 × 320KiB (web), 188 × 5MiB (bulk)
taskc: 20076 × 320KiB (web), 201 × 5MiB (bulk)
Is there a reason that the taska counts should be so low (I usually expect somewhere in the 20k range for web download counts)? Did something change in a recent version of Tor? Or should I look closer at the logs and rerun taska?
It looks like the credit caps are reducing overall network load, mostly from the web clients. Bulk load seems to be increasing. The effect seems greater with smaller credit caps.
Are we able to say anything about the patch's effects on latency, memory usage, and whether nodes actually obey their bandwidth limits with the patch in place?
I have the same question as I had for #6341 (moved): both credit cap cases get their last byte faster than vanilla, but they end up doing fewer transfers. What's up with that?
I don't know enough about what the credit cap is supposed to be doing here to answer this. Can you give any intuition as to whether you would expect this to happen given the desired functionality? And/or can you briefly explain what the patch does?
Also, note that a separate vanilla run was done in #6401 (moved) where the load mostly agrees with the vanilla run here. So is it reasonable to say the patch is causing the behavior?
> Are we able to say anything about the patch's effects on latency, memory usage, and whether nodes actually obey their bandwidth limits with the patch in place?
For each Tor node we can track CPU utilization, memory, and input/output bytes (though I may have to clean up some loose ends in this Shadow ticket). I believe this will allow us to address your concerns, but I am not sure what you mean by latency.
I'd have to do additional experiments with this feature turned on for the relays. Is it safe to assume this is desired?
Print the heartbeat message every second instead of every minute with $ scallion --heartbeat-frequency=1 …
The heartbeat message will contain the number of bytes each node sends and receives per second. Match that up with the relay bandwidth limits to determine whether nodes are actually obeying their bandwidth limits. You probably have to either modify the parse() function in analyze.py or write a new script for this.
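For reference, the matching step could look roughly like the sketch below. The heartbeat line format, node names, and the limits dict are assumptions for illustration only, not the actual Scallion output or the real parse() in analyze.py:

```python
# Rough sketch of the matching step (not the real parse() in analyze.py).
# The heartbeat line format below is an assumption -- adjust the regex to
# whatever your scallion.log actually prints per node per second.
import re
from collections import defaultdict
from statistics import median, quantiles

HEARTBEAT_RE = re.compile(
    r"\[(?P<node>[\w.-]+)\].*sent=(?P<sent>\d+)"   # placeholder pattern
)

def per_node_send_rates(logfile):
    """Collect bytes-sent-per-second samples for each node."""
    samples = defaultdict(list)
    with open(logfile) as f:
        for line in f:
            m = HEARTBEAT_RE.search(line)
            if m:
                samples[m.group("node")].append(int(m.group("sent")))
    return samples

def check_limits(samples, limits):
    """Compare each node's median and 99th percentile send rate against its
    configured (BandwidthRate, BandwidthBurst), both in bytes/second."""
    for node, rates in sorted(samples.items()):
        if node not in limits or not rates:
            continue
        rate_limit, burst_limit = limits[node]
        p99 = quantiles(rates, n=100)[98] if len(rates) > 1 else rates[0]
        print(f"{node}: median={median(rates)} vs rate {rate_limit}, "
              f"p99={p99} vs burst {burst_limit}")

# The limits would come from the torrc files used in the experiment, e.g.
# check_limits(per_node_send_rates("scallion.log"),
#              {"relay1": (5242880, 10485760)})
```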
The per-node memory tracking is not working yet in Shadow, so we'll only be able to say things about overall memory consumption by looking at the data/dstat.log file.
> Print the heartbeat message every second instead of every minute with $ scallion --heartbeat-frequency=1 …
> The heartbeat message will contain the number of bytes each node sends and receives per second. Match that up with the relay bandwidth limits to determine whether nodes are actually obeying their bandwidth limits. You probably have to either modify the parse() function in analyze.py or write a new script for this.
Done. I wrote my own script and made two graphs: the first compares bandwidth rates to median bandwidths, and the second compares bandwidth bursts to 99th percentiles. To me it looks like all three branches respect bandwidth rates quite well but do not respect bandwidth bursts as much as they should. I do not see major differences between the three branches. I wonder if there's a better way to visualize this.
> The per-node memory tracking is not working yet in Shadow, so we'll only be able to say things about overall memory consumption by looking at the data/dstat.log file.
I have the three dstat.log files. What do I do with them?
> Print the heartbeat message every second instead of every minute with $ scallion --heartbeat-frequency=1 …
> The heartbeat message will contain the number of bytes each node sends and receives per second. Match that up with the relay bandwidth limits to determine whether nodes are actually obeying their bandwidth limits. You probably have to either modify the parse() function in analyze.py or write a new script for this.
> Done. I wrote my own script and made two graphs: the first compares bandwidth rates to median bandwidths, and the second compares bandwidth bursts to 99th percentiles. To me it looks like all three branches respect bandwidth rates quite well but do not respect bandwidth bursts as much as they should. I do not see major differences between the three branches. I wonder if there's a better way to visualize this.
It may make sense that the amount sent on the wire is slightly more than the 99th percentile of bandwidth sent in Tor (because control packets, packet header overhead, etc. are included in the amount sent on the wire but not in Tor's limits).
> The per-node memory tracking is not working yet in Shadow, so we'll only be able to say things about overall memory consumption by looking at the data/dstat.log file.
> I have the three dstat.log files. What do I do with them?
I believe the first few lines contain header info that explains the format of the csv. One of the columns has a timestamp and another has the system memory usage. You should be able to draw a memory-over-time plot with those two columns and compare each branch in the same graph. (Note that this is total system memory usage, so this only works if nothing else is consuming memory on these machines, which should be the case if you used EC2.)
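A minimal sketch of that plot, assuming the dstat csv has an epoch timestamp column and a memory "used" column, and assuming the three logs live under hypothetical taska/taskb/taskc directories; check the header lines and adjust the column names and the number of preamble lines to skip:

```python
# Minimal sketch for plotting memory over time from the three dstat.log files.
# The column names ("epoch", "used"), the number of preamble lines, and the
# taska/taskb/taskc directory layout are assumptions -- check the header lines
# in your dstat output and adjust.
import csv
import matplotlib.pyplot as plt

def read_dstat(path, time_col="epoch", mem_col="used", skip=5):
    """Return (timestamps, memory_used) lists from a dstat csv log."""
    times, mem = [], []
    with open(path) as f:
        for _ in range(skip):            # skip dstat's metadata preamble
            next(f)
        for row in csv.DictReader(f):
            try:
                times.append(float(row[time_col]))
                mem.append(float(row[mem_col]))
            except (KeyError, ValueError, TypeError):
                continue
    return times, mem

for branch in ("taska", "taskb", "taskc"):
    t, m = read_dstat(f"{branch}/data/dstat.log")
    if not t:
        continue
    plt.plot([x - t[0] for x in t], m, label=branch)

plt.xlabel("experiment time (s)")
plt.ylabel("system memory used (bytes)")
plt.legend()
plt.savefig("memory-over-time.png")
```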
> Print the heartbeat message every second instead of every minute with $ scallion --heartbeat-frequency=1 …
> The heartbeat message will contain the number of bytes each node sends and receives per second. Match that up with the relay bandwidth limits to determine whether nodes are actually obeying their bandwidth limits. You probably have to either modify the parse() function in analyze.py or write a new script for this.
> Done.
Also, can you attach the performance graphs for this set of runs?
> It may make sense that the amount sent on the wire is slightly more than the 99th percentile of bandwidth sent in Tor (because control packets, packet header overhead, etc. are included in the amount sent on the wire but not in Tor's limits).
Makes sense. I attached another graph that shows cumulative fractions of the differences between 99th percentile and bandwidth burst. That graph shows that there's hardly any difference between the three branches.
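For reference, a cumulative-fraction plot of those differences could be produced along these lines; diffs_by_branch is placeholder data standing in for the per-relay (99th percentile minus BandwidthBurst) values computed by the analysis script, not real results:

```python
# Sketch of the cumulative-fraction plot described above. diffs_by_branch is
# placeholder data; in practice it holds, per branch, one value per relay:
# (99th percentile of observed bytes/s) - (configured BandwidthBurst).
import matplotlib.pyplot as plt

diffs_by_branch = {              # placeholder values, not real results
    "taska": [-2000, -500, 0, 1500, 4000],
    "taskb": [-1800, -400, 100, 1600, 4200],
    "taskc": [-1900, -450, 50, 1550, 4100],
}

def plot_cdf(values, label):
    """Plot the empirical CDF of a list of values."""
    xs = sorted(values)
    ys = [(i + 1) / len(xs) for i in range(len(xs))]
    plt.step(xs, ys, where="post", label=label)

for branch, diffs in diffs_by_branch.items():
    plot_cdf(diffs, branch)

plt.xlabel("p99 observed rate minus BandwidthBurst (bytes/s)")
plt.ylabel("cumulative fraction of relays")
plt.legend()
plt.savefig("p99-minus-burst-cdf.png")
```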
> I believe the first few lines contain header info that explains the format of the csv. One of the columns has a timestamp and another has the system memory usage. You should be able to draw a memory-over-time plot with those two columns and compare each branch in the same graph. (Note that this is total system memory usage, so this only works if nothing else is consuming memory on these machines, which should be the case if you used EC2.)
Okay, I attached a graph for system memory usage, too. All three branches were run in newly created EC2 instances. I can't spot any difference between the branches.
> Also, can you attach the performance graphs for this set of runs?
I didn't make any performance graphs yet. Making them now. Will attach them once I have them.
I remain skeptical about the results, though -- not because I think they're wrong, but because I don't think we have a good handle on what exactly is going on.
In particular, I wonder if further answers to #5398 (moved) would change our opinion here.
But this ticket does answer the "does it break or obviously go bad" question with a negative. Closing.
Trac: Status: needs_information → closed; Resolution: N/A → implemented