Björn Scheuermann and Florian Tschorsch of Uni Düsseldorf want to know what fraction of connections are used bidirectionally. They suggested to count read and written bytes per connection in 10-second intervals and classify connections as "below threshold", "mostly reading", "mostly writing", and "both reading and writing".
I implemented this new metric in branch bidistats in my public repository.
Here are the results of my test relay with a reduced report interval of 2 hours (the actual implementation reports 24 hour intervals):
These numbers imply that 530+1799 of 338+496+530+944+1301+1799, or 43.1% of all connections are used bidirectionally.
An open question is whether we should distinguish between connections to other relays and to clients. I wonder if there's an easy way to tell the two connection types apart.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
What's the motivation on this one? And why 10 seconds?
There's probably a better name than ConnStatistics here; lots of our other statistics could fairly be called connection statistics. ConnDirectionStats? ConBidiStats?
BIDI_THRESHOLD and BIDI_INTERVAL and BIDI_FACTOR are uncommented.
Doing a round trip to disk here feels a little kludgy; it could probably use unit tests too.
Looking at the code that generates the extrainfo, it looks like if we leave ConnStatistics on for a while, then turn if off for a year, then turn it on again, there could be a window when the old ConnStatistics from a year ago get written in the extrainfo. Is that true? If so, is that really what we want?
In any case, it looks like a new feature; is there a good reason to add it to 0.2.2.x rather than waiting for 0.2.3.x?
What's the motivation on this one? And why 10 seconds?
Björn's and Florian's assumption is that bidirectional use of connections screws Tor's performance. For a more general design discussion of the motivation and the parameters, please ask them by replying to my or-dev message from July 12:
There's probably a better name than ConnStatistics here; lots of our other statistics could fairly be called connection statistics. ConnDirectionStats? ConBidiStats?
Going with ConnDirectionStatistics.
BIDI_THRESHOLD and BIDI_INTERVAL and BIDI_FACTOR are uncommented.
Done.
Doing a round trip to disk here feels a little kludgy; it could probably use unit tests too.
I don't understand what you mean by "round trip to disk".
Yes, this code could use unit tests. If you say that the unit tests in #1825 (moved) look good, I'll add similar tests for connection stats.
Looking at the code that generates the extrainfo, it looks like if we leave ConnStatistics on for a while, then turn if off for a year, then turn it on again, there could be a window when the old ConnStatistics from a year ago get written in the extrainfo. Is that true? If so, is that really what we want?
No, load_stats_file() only returns a string if the stats have been written in the past last 25 hours.
In any case, it looks like a new feature; is there a good reason to add it to 0.2.2.x rather than waiting for 0.2.3.x?
Waiting for 0.2.3.x is fine.
But before we ignore this feature for the next two months, do you have an answer to my question from the task description?
An open question is whether we should distinguish between connections to other relays and to clients. I wonder if there's an easy way to tell the two connection types apart.
See updated branch bidistats in my public repository. I also added a ChangeLog entry and a man page entry.
The unit tests in #1825 (moved) do indeed look good, and the refactoring in question will take care of the "write it to disk then read it to disk even if you only wanted it in a string" issue.
I'll ask about the motivation on or-dev. Without knowing that, I can't really answer whether we should distinguish between relay and client connections, since I'm not sure what we're accomplishing as it is.
But to tell the connections apart, see current code and discussions for bug #1751 (moved) ; there is some (imperfect) code to do this already, and if the code improves, it should still present a similar interface.
The unit tests in #1825 (moved) do indeed look good, and the refactoring in question will take care of the "write it to disk then read it to disk even if you only wanted it in a string" issue.
Okay, I added unit tests in branch bidistats2. No need to review right now, as we're waiting for an or-dev answer before merging anyway.
I'll ask about the motivation on or-dev. Without knowing that, I can't really answer whether we should distinguish between relay and client connections, since I'm not sure what we're accomplishing as it is.
But to tell the connections apart, see current code and discussions for bug #1751 (moved) ; there is some (imperfect) code to do this already, and if the code improves, it should still present a similar interface.
I'll wait for the or-dev reply first before looking closer at this.
This should go into 0.2.3.xif it goes anywhere. See http://archives.seul.org/or/dev/Aug-2010/msg00038.html for Björn's response. Remaining question: Is there a reason this has to go into the main Tor distribution, or is it the kind of thing that could get good data just by running on a relay or two?
karsten: ping on this? When merged into master, it no longer builds. I think that the refactoring of extrainfo_dump_to_string has broken it. Can you fix it up?
karsten: ping on this? When merged into master, it no longer builds. I think that the refactoring of extrainfo_dump_to_string has broken it. Can you fix it up?
Right, the new extrainfo_dump_to_string() broke it. I rebased bidistats2 on master and fixed the compile problem. Please find branch bidistats3 in my public repository.