I'd like to know what fraction of total Tor usage is hidden service usage, so we have a sense of whether hidden services matter now, and so we can track trends into the future.
For example, it would have been nice in August 2013 to have some metric of hidden service fraction that told us the spike in load and users had to do with hidden services.
Such statistics would also be useful to counter (or who knows, confirm) the analysts who make statements like "97% of Tor use is Silk Road".
My hs-stats branch implements this. I made it print out aggregate stats every thirty minutes, of the form
Sep 23 22:30:00.851 [notice] Heartbeat: 0|0 exit / 9|229 hidden service / 504|22963 middle circuits / 0|1000 other circuits handled.
That's 9 circuits that were hidden service related (and 229 cells on those), and 504 circuits that we saw an extend cell on (and 22963 cells on those).
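For anyone who wants to post-process these log lines, here is a minimal sketch of a parser for the heartbeat format above. The type and function names (`hs_heartbeat_t`, `parse_hs_heartbeat`, `hs_cell_fraction`) are made up for illustration and are not part of the hs-stats branch. Counting "other" cells in the denominator is also an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Circuit and cell counters for one traffic class ("circuits|cells"). */
typedef struct {
  unsigned long circuits;
  unsigned long cells;
} hs_class_count_t;

/* All four classes reported in one heartbeat line (hypothetical type). */
typedef struct {
  hs_class_count_t exit, hidden_service, middle, other;
} hs_heartbeat_t;

/* Parse a heartbeat line of the form shown above.
 * Returns 1 on success and 0 if the line does not match. */
static int
parse_hs_heartbeat(const char *line, hs_heartbeat_t *out)
{
  const char *p;
  int n;
  assert(line && out);
  p = strstr(line, "Heartbeat: ");
  if (!p)
    return 0;
  n = sscanf(p,
             "Heartbeat: %lu|%lu exit / %lu|%lu hidden service / "
             "%lu|%lu middle circuits / %lu|%lu other circuits",
             &out->exit.circuits, &out->exit.cells,
             &out->hidden_service.circuits, &out->hidden_service.cells,
             &out->middle.circuits, &out->middle.cells,
             &out->other.circuits, &out->other.cells);
  return n == 8;
}

/* Fraction of all counted cells that were seen on hidden-service circuits.
 * Assumption: "other" cells are part of the total. */
static double
hs_cell_fraction(const hs_heartbeat_t *hb)
{
  unsigned long total = hb->exit.cells + hb->hidden_service.cells +
                        hb->middle.cells + hb->other.cells;
  return total ? (double)hb->hidden_service.cells / (double)total : 0.0;
}
```

For the sample line above, this gives 229 hidden-service cells out of 24192 cells in total, a bit under 1%.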
The first graph shows all cell types in a bar chart. While this sounds great for comparing cells by type, this graph works really poorly for the tiny fraction of hidden-service cells in the network (yes, there's a green line near the x axis, at least for bolobolo1 and wannabe). Maybe it's useful for comparing middle vs. exit.
The second graph shows the fraction of hidden-service cells as compared to all cells. moria5 might be somewhat misleading here, because moria5 handles a tiny number of cells compared to the other two relays. The network-wide average is probably much closer to bolobolo1's or wannabe's fraction than moria5's fraction.
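To illustrate why a low-traffic relay like moria5 can skew the picture, here is a small sketch that contrasts a pooled fraction (sum of hidden-service cells over sum of all cells) with a plain average of per-relay fractions. The struct and function names are hypothetical, and this is not what the attached R code does.

```c
#include <assert.h>
#include <stddef.h>

/* Per-relay cell counts (hypothetical type for illustration). */
typedef struct {
  unsigned long hs_cells;
  unsigned long total_cells;
} relay_cells_t;

/* Pooled fraction: relays that handle more cells weigh more. */
static double
pooled_hs_fraction(const relay_cells_t *relays, size_t n)
{
  unsigned long hs = 0, total = 0;
  size_t i;
  assert(relays || n == 0);
  for (i = 0; i < n; i++) {
    hs += relays[i].hs_cells;
    total += relays[i].total_cells;
  }
  return total ? (double)hs / (double)total : 0.0;
}

/* Unweighted mean of per-relay fractions: every relay weighs the same,
 * so a relay with very few cells can dominate the result. */
static double
mean_of_fractions(const relay_cells_t *relays, size_t n)
{
  double sum = 0.0;
  size_t i, counted = 0;
  assert(relays || n == 0);
  for (i = 0; i < n; i++) {
    if (relays[i].total_cells == 0)
      continue; /* skip relays that saw no cells at all */
    sum += (double)relays[i].hs_cells / (double)relays[i].total_cells;
    counted++;
  }
  return counted ? sum / (double)counted : 0.0;
}
```

With one tiny relay that has a high hidden-service fraction and two busy relays that have a low one, the pooled fraction stays close to the busy relays' values while the unweighted mean is pulled towards the tiny relay.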
I also attached the R code which comes with easy-to-follow instructions.
asn, I just pushed more commits to my task-13192 branch and ran some tests in Chutney. Please take a look and let me know whether you like my changes.
The next step would be to rewrite history. So far, only you and I have read this code. But the goal would be to have many more people read it. What we need are a few carefully crafted commits that are trivial to review even for people who are not C programmers. I have the following commit series in mind:
Your constify crypto_pk_get_digest patch.
My obfuscation functions including unit tests.
The two hidden-service statistics including everything from config option to writing stats to disk.
Your log statements and any remaining XXX, including the commented-out code that would include stats in extra-info descriptors. This commit shall not be given out to relay operators who we ask to help with gathering statistics, because they shouldn't accidentally run it. It's just for our testing and further development.
I can change history as described. Just let me know if this sounds reasonable to you.
I need to think more about the overflow protection in round_long_to_next_multiple_of(). I'm not sure what happens if there is actually an overflow there and that line is skipped.
Where can I find the formula you are using in transform_uniform_random_to_laplace()? Also, maybe we could rename that function to sample_laplace_distribution() or something.
When you do digestmap_free(hs_stats->onions_seen_this_period, NULL) do the elements of the digestmap get freed? I think you need to pass the tor_free_() func as the second argument?
This value, multiplied with EPSILON, is Laplace parameter b. */ I think this is divided not multiplied.
An interesting consequence of rounding up negative numbers is that a result of 0 doesn't mean that the underlying value was also 0. In our case it can be anything from -7 to 0. This is not something bad but something to keep in mind.
BTW, wrt this unit test, tt_assert(round_long_to_next_multiple_of(LONG_MIN,2) == LONG_MIN): is it the case on all platforms that LONG_MIN is even?
Yes, I think we wrote some good code there together.
Small review:
I need to think more about the overflow protection in round_long_to_next_multiple_of(). I'm not sure what happens if there is actually an overflow there and that line is skipped.
Yes, please do! I spent quite some time with paper and pencil here to go through the possible edge cases. And then I wrote unit tests for all of them that came to mind. But it's not at all unrealistic that I missed an important case.
Where can I find the formula you are using in transform_uniform_random_to_laplace()? Also, maybe we could rename that function to sample_laplace_distribution() or something.
From Wikipedia! :) I'll include a comment. I'm also renaming the function as suggested.
When you do digestmap_free(hs_stats->onions_seen_this_period, NULL) do the elements of the digestmap get freed? I think you need to pass the tor_free_() func as the second argument?
I think freeing elements would actually be harmful, because we're just storing void pointers ((void*)(uintptr_t)1) in the map. We cannot dereference those pointers and free whatever we find at that address. Should we ask a C programmer to get this confirmed?
This value, multiplied with EPSILON, is Laplace parameter b. */ I think this is divided not multiplied.
Err, yes. You mentioned that on IRC and totally wanted to change it. Changed now.
An interesting consequence of rounding up negative numbers is that a result of 0 doesn't mean that the underlying value was also 0. In our case it can be anything from -7 to 0. This is not something bad but something to keep in mind.
Agreed that it's potentially confusing, but it's correct, AFAICT.
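To make the rounding behavior concrete, here is a minimal sketch of rounding up to the next multiple with an explicit overflow guard. It is not the code from the branch: the name `round_up_to_multiple_sketch` is made up, and returning LONG_MAX when the next multiple would not fit is just one possible choice.

```c
#include <assert.h>
#include <limits.h>

/* Round number up (towards positive infinity) to the next multiple of
 * divisor. Sketch only; saturating at LONG_MAX is an assumption. */
static long
round_up_to_multiple_sketch(long number, long divisor)
{
  long rem, step;
  assert(divisor > 0);
  rem = number % divisor; /* in C99 and later, same sign as number */
  if (rem == 0)
    return number; /* already a multiple */
  if (rem < 0)
    return number - rem; /* moves towards zero, so it cannot overflow */
  step = divisor - rem; /* between 1 and divisor - 1 */
  if (number > LONG_MAX - step)
    return LONG_MAX; /* next multiple does not fit into a long */
  return number + step;
}
```

With a divisor of 8, every input from -7 to 0 yields 0, which is the effect described above: a published 0 does not imply that the underlying value was 0.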
BTW, wrt this unit test, tt_assert(round_long_to_next_multiple_of(LONG_MIN,2) == LONG_MIN): is it the case on all platforms that LONG_MIN is even?
Interesting question. I'm pretty sure. But even if not, the worst thing that can happen is that our unit tests break.
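On two's complement platforms LONG_MIN is a negative power of two and therefore even. C standards before C23 also allowed ones' complement and sign-magnitude representations, where LONG_MIN could be -LONG_MAX, which is odd. If we want the build to tell us rather than the unit test, a compile-time check along these lines could work (a sketch; `long_min_is_even` is a hypothetical helper):

```c
#include <assert.h>
#include <limits.h>

/* Fail the build on a platform where LONG_MIN is odd (C11 or later). */
_Static_assert(LONG_MIN % 2 == 0, "LONG_MIN is expected to be even");

/* Runtime variant of the same check. */
static int
long_min_is_even(void)
{
  return LONG_MIN % 2 == 0;
}
```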
I made two more changes: I documented the new config option in the man page as discussed in #tor-dev, and I changed the default config value to 0 for two reasons: we should avoid that somebody accidentally turns on these statistics when running our branch; and we must avoid that we accidentally merge the wrong default value into master.
Branch task-13192 contains the new changes as additional commits, and branch task-13192-2 is heavily rebased following the plan described above.
When you do digestmap_free(hs_stats->onions_seen_this_period, NULL) do the elements of the digestmap get freed? I think you need to pass the tor_free_() func as the second argument?
I think freeing elements would actually be harmful, because we're just storing void pointers ((void*)(uintptr_t)1) in the map. We cannot dereference those pointers and free whatever we find at that address. Should we ask a C programmer to get this confirmed?
I'm sorry, I meant the key, not the value. The key in our case is memdupped, but I think it shouldn't be.
I made two more changes: I documented the new config option in the man page as discussed in #tor-dev, and I changed the default config value to 0 for two reasons: we should avoid that somebody accidentally turns on these statistics when running our branch; and we must avoid that we accidentally merge the wrong default value into master.
Branch task-13192 contains the new changes as additional commits, and branch task-13192-2 is heavily rebased following the plan described above.
task-13192-2 looks OK to me. I don't see any new commits to the other branch.
FWIW, we should soon (in a week or so) give this code to people to run on their relays. At that point, the commented-out code in extrainfo_dump_to_string() should be included and enabled so that stats actually get to us. What do we need to do to enable that piece of code? Revise the proposal and post it to [tor-dev]?
I'm sorry, I meant the key, not the value. The key in our case is memdupped, but I think it shouldn't be.
Oh, you're right. Fixed in new fixup commit on branch task-13192-2, I think.
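To spell out both points from this exchange, here is a toy stand-in for a digest-keyed map. It is not Tor's digestmap; all `toy_` names are invented for illustration. It assumes, as the discussion above suggests, that the map keeps its own copy of the key bytes, so the caller does not need to duplicate them. It also shows why the free function must receive NULL when the values are markers like (void*)(uintptr_t)1 rather than heap pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DIGEST_LEN 20
#define TOY_MAX_ENTRIES 16

typedef struct {
  char key[TOY_DIGEST_LEN]; /* key bytes are copied into the entry */
  void *val;                /* may be a marker, not a real pointer */
} toy_entry_t;

typedef struct {
  toy_entry_t entries[TOY_MAX_ENTRIES];
  size_t n;
} toy_map_t;

static toy_map_t *
toy_map_new(void)
{
  return calloc(1, sizeof(toy_map_t));
}

/* Insert or overwrite. Returns 1 if the key is new, 0 if it was already
 * present, and -1 if the toy map is full. */
static int
toy_map_set(toy_map_t *m, const char *key, void *val)
{
  size_t i;
  assert(m && key);
  for (i = 0; i < m->n; i++) {
    if (!memcmp(m->entries[i].key, key, TOY_DIGEST_LEN)) {
      m->entries[i].val = val;
      return 0;
    }
  }
  if (m->n == TOY_MAX_ENTRIES)
    return -1;
  memcpy(m->entries[m->n].key, key, TOY_DIGEST_LEN); /* map owns a copy */
  m->entries[m->n].val = val;
  m->n++;
  return 1;
}

/* Free the map. free_val is only called if non-NULL; pass NULL when the
 * values are markers, because calling free() on them would be invalid. */
static void
toy_map_free(toy_map_t *m, void (*free_val)(void *))
{
  size_t i;
  if (!m)
    return;
  if (free_val)
    for (i = 0; i < m->n; i++)
      free_val(m->entries[i].val);
  free(m);
}
```

A caller can pass a stack buffer as the key and reuse that buffer right after toy_map_set() returns, and then call toy_map_free(m, NULL) at the end of the period.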
task-13192-2 looks OK to me. I don't see any new commits to the other branch.
Looks like I forgot to push task-13192 earlier. Pushed now.
FWIW, we should soon (in a week or so) give this code to people to run on their relays. At that point, the commented-out code in extrainfo_dump_to_string() should be included and enabled so that stats actually get to us. What do we need to do to enable that piece of code? Revise the proposal and post it to [tor-dev]?
Yes, let's revise the proposal and post the code to tor-dev@, possibly even with the commented-out code in it. Then let's give people a week, or at least a couple of days, to review everything. After that, we could publish the code with the working extra-info code and ask people to run it. Of course, there's always the possibility that people will have feedback that we need to incorporate, or that will force us to change our plans.