Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#1819 closed enhancement (implemented)

Implement new metric on bidirectional use of connections

Reported by: karsten Owned by: karsten
Priority: Medium Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Björn Scheuermann and Florian Tschorsch of Uni Düsseldorf want to know what fraction of connections are used bidirectionally. They suggested to count read and written bytes per connection in 10-second intervals and classify connections as "below threshold", "mostly reading", "mostly writing", and "both reading and writing".

I implemented this new metric in branch bidistats in my public repository.

Here are the results of my test relay with a reduced report interval of 2 hours (the actual implementation reports 24 hour intervals):

conn-stats-end 2010-08-10 15:55:05 (7200 s)
conn-bi-direct 3199,338,496,530
conn-stats-end 2010-08-10 17:55:05 (7200 s)
conn-bi-direct 7524,944,1301,1799

These numbers imply that 530+1799 of 338+496+530+944+1301+1799, or 43.1% of all connections are used bidirectionally.

An open question is whether we should distinguish between connections to other relays and to clients. I wonder if there's an easy way to tell the two connection types apart.

Child Tickets

Change History (11)

comment:1 Changed 9 years ago by nickm

Status: newneeds_review

What's the motivation on this one? And why 10 seconds?

There's probably a better name than ConnStatistics here; lots of our other statistics could fairly be called connection statistics. ConnDirectionStats? ConBidiStats?

BIDI_THRESHOLD and BIDI_INTERVAL and BIDI_FACTOR are uncommented.

Doing a round trip to disk here feels a little kludgy; it could probably use unit tests too.

Looking at the code that generates the extrainfo, it looks like if we leave ConnStatistics on for a while, then turn if off for a year, then turn it on again, there could be a window when the old ConnStatistics from a year ago get written in the extrainfo. Is that true? If so, is that really what we want?

In any case, it looks like a new feature; is there a good reason to add it to 0.2.2.x rather than waiting for 0.2.3.x?

comment:2 in reply to:  1 Changed 9 years ago by karsten

Replying to nickm:

What's the motivation on this one? And why 10 seconds?

Björn's and Florian's assumption is that bidirectional use of connections screws Tor's performance. For a more general design discussion of the motivation and the parameters, please ask them by replying to my or-dev message from July 12:

http://archives.seul.org/or/dev/Jul-2010/msg00016.html

There's probably a better name than ConnStatistics here; lots of our other statistics could fairly be called connection statistics. ConnDirectionStats? ConBidiStats?

Going with ConnDirectionStatistics.

BIDI_THRESHOLD and BIDI_INTERVAL and BIDI_FACTOR are uncommented.

Done.

Doing a round trip to disk here feels a little kludgy; it could probably use unit tests too.

I don't understand what you mean by "round trip to disk".

Yes, this code could use unit tests. If you say that the unit tests in #1825 look good, I'll add similar tests for connection stats.

Looking at the code that generates the extrainfo, it looks like if we leave ConnStatistics on for a while, then turn if off for a year, then turn it on again, there could be a window when the old ConnStatistics from a year ago get written in the extrainfo. Is that true? If so, is that really what we want?

No, load_stats_file() only returns a string if the stats have been written in the past last 25 hours.

In any case, it looks like a new feature; is there a good reason to add it to 0.2.2.x rather than waiting for 0.2.3.x?

Waiting for 0.2.3.x is fine.

But before we ignore this feature for the next two months, do you have an answer to my question from the task description?

An open question is whether we should distinguish between connections to other relays and to clients. I wonder if there's an easy way to tell the two connection types apart.

See updated branch bidistats in my public repository. I also added a ChangeLog entry and a man page entry.

comment:3 Changed 9 years ago by nickm

The unit tests in #1825 do indeed look good, and the refactoring in question will take care of the "write it to disk then read it to disk even if you only wanted it in a string" issue.

I'll ask about the motivation on or-dev. Without knowing that, I can't really answer whether we should distinguish between relay and client connections, since I'm not sure what we're accomplishing as it is.

But to tell the connections apart, see current code and discussions for bug #1751 ; there is some (imperfect) code to do this already, and if the code improves, it should still present a similar interface.

comment:4 in reply to:  3 Changed 9 years ago by karsten

Status: needs_reviewassigned

Replying to nickm:

The unit tests in #1825 do indeed look good, and the refactoring in question will take care of the "write it to disk then read it to disk even if you only wanted it in a string" issue.

Okay, I added unit tests in branch bidistats2. No need to review right now, as we're waiting for an or-dev answer before merging anyway.

I'll ask about the motivation on or-dev. Without knowing that, I can't really answer whether we should distinguish between relay and client connections, since I'm not sure what we're accomplishing as it is.

But to tell the connections apart, see current code and discussions for bug #1751 ; there is some (imperfect) code to do this already, and if the code improves, it should still present a similar interface.

I'll wait for the or-dev reply first before looking closer at this.

comment:5 Changed 9 years ago by karsten

Updated branch bidistats2 in my public repository.

comment:6 Changed 9 years ago by nickm

Milestone: Tor: 0.2.3.x-final

This should go into 0.2.3.xif it goes anywhere. See http://archives.seul.org/or/dev/Aug-2010/msg00038.html for Björn's response. Remaining question: Is there a reason this has to go into the main Tor distribution, or is it the kind of thing that could get good data just by running on a relay or two?

comment:7 Changed 9 years ago by nickm

karsten: ping on this? When merged into master, it no longer builds. I think that the refactoring of extrainfo_dump_to_string has broken it. Can you fix it up?

comment:8 in reply to:  7 Changed 9 years ago by karsten

Status: assignedneeds_review

Replying to nickm:

karsten: ping on this? When merged into master, it no longer builds. I think that the refactoring of extrainfo_dump_to_string has broken it. Can you fix it up?

Right, the new extrainfo_dump_to_string() broke it. I rebased bidistats2 on master and fixed the compile problem. Please find branch bidistats3 in my public repository.

comment:9 Changed 9 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Seems okay; merging it.

comment:10 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:11 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.