Simulate #9262 and #12890 in Shadow to test KIST's effect on performance. Come up with good default parameters that work well to reduce congestion/latency.
After numerous bug fixes and a new Tor network model, I finally have some initial results. I tested nickm/kist at commit 55814effcb96ff4998e75a2136ac2ed631247d8a with UseKIST 0 and with UseKIST 1.
The results totally blow. Download times increased dramatically when using the new feature, while Tor queue times were unchanged. I also noticed download failure modes around 5 minutes and 10 minutes, and am still looking into the cause.
None of this makes sense to me yet, and I have little confidence in these results. (In particular, the queue times should have changed at least slightly, but I did not observe that.) My current thinking is that the problems stem primarily from the switch to the new network model. So my next step is to simulate under our older, stable model and go from there.
I went back to the stable Tor network topology model that we used in the KIST paper in order to verify the performance issues I alluded to in my last post. The topology contained 3600 relays and 12000 clients. I ran nickm's kist branch merged with tor-0.2.6.2-alpha, in order to take advantage of the new TestingDirAuthVoteGuard and TestingDirAuthVoteExit options. I ran one experiment with UseKIST 1 and another with UseKIST 0.
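For reference, the setup looked roughly like the torrc fragments below. This is a sketch rather than the exact experiment configs; the relay nicknames are placeholders, and TestingTorNetwork is shown because the TestingDirAuthVote* options are only honored in a testing network.

```
## torrc sketch (illustrative, not the exact experiment configs)

## On the directory authorities: force the Exit/Guard flags onto the
## listed relays (nicknames are placeholders) so the simulated
## consensus matches the intended topology.
TestingTorNetwork 1
TestingDirAuthVoteExit exit1,exit2,exit3
TestingDirAuthVoteGuard guard1,guard2,guard3

## On the relays: the only difference between the two runs
## (0 for the control experiment).
UseKIST 1
```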
The results are very similar to those obtained from my old topology and last set of experiments. The performance and throughput results are attached. As you can see, performance is worse when using the current KIST implementation than without it, and aggregate network throughput drops by almost half.
My current thinking is that we are starving the kernel and therefore not utilizing all of the relays' available bandwidth; more logging in the KIST branch would give us some hard data about this potential problem.
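To make the starvation hypothesis concrete: a KIST-style scheduler computes a per-socket write limit from kernel state, and if that limit is too conservative, the kernel send buffer can drain to empty between scheduling rounds. Below is a rough sketch of that computation on Linux. It is my own illustration, not code from nickm's branch; the `kist_write_limit` helper and the use of `SIOCOUTQNSD` are assumptions.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/sockios.h> /* SIOCOUTQNSD */

/* Illustrative sketch of a KIST-style per-socket write limit.
 * Returns the number of bytes Tor should write now, or -1 on error. */
static long
kist_write_limit(int fd, double sock_buf_size_factor)
{
  struct tcp_info ti;
  socklen_t ti_len = sizeof(ti);
  int sndbuf = 0, notsent = 0;
  socklen_t sb_len = sizeof(sndbuf);
  long tcp_space, buf_space;

  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &ti_len) < 0)
    return -1;
  /* Bytes TCP could put on the wire right now: the free part of the
   * congestion window (in segments) times the segment size. */
  tcp_space = ((long)ti.tcpi_snd_cwnd - (long)ti.tcpi_unacked)
              * (long)ti.tcpi_snd_mss;
  if (tcp_space < 0)
    tcp_space = 0;

  /* Free space in the (scaled) kernel send buffer: capacity times
   * KISTSockBufSizeFactor, minus bytes queued but not yet sent. */
  if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &sb_len) < 0)
    return -1;
  if (ioctl(fd, SIOCOUTQNSD, &notsent) < 0)
    return -1;
  buf_space = (long)(sndbuf * sock_buf_size_factor) - notsent;
  if (buf_space < 0)
    buf_space = 0;

  /* Write no more than TCP can send and the buffer can hold. */
  return tcp_space < buf_space ? tcp_space : buf_space;
}
```

Logging `tcp_space`, `buf_space`, and the actual bytes written each round, and flagging rounds where the limit computes to 0 while the send queue is empty, would give us the hard data mentioned above.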
Next I want to play around with KISTSockBufSizeFactor, so that we always write much more to the buffer than we think we need to. For example, I could set it extremely high to approximate the old behavior and make sure we avoid kernel starvation. I think that will give us a useful data point.
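Concretely, the extreme-factor run might look like this (the 10.0 value is an arbitrary illustration, not a tested setting):

```
## Hypothetical extreme setting: a factor this large should make the
## buffer-space limit never bind, approximating vanilla Tor's write
## behavior and ruling kernel starvation in or out.
UseKIST 1
KISTSockBufSizeFactor 10.0
```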
I believe the results I posted in comment 7 are invalid. I recently found and fixed several bugs in Shadow that affected network performance, and I created a more recent model of Tor that we have been using for our PeerFlow experiments. I have higher confidence in this model after running many, many experiments with it and analyzing the results it produced.
I compared Torperf performance in the live Tor network against Shadow with my new fancy model. Those results are attached here. It appears that Shadow is again tracking Tor performance nicely. (I believe the difference in time to first byte is because Karsten and I start our download timers at different points, which we just realized this week.)
Update: using the model described in this comment, I ran KIST simulations with a variety of KISTSockBufSizeFactor settings (0.5, 1.5, 3.0) and compared the performance results against UseKIST 0 (vanilla Tor). The results are attached here. The high-level result is that there was no significant change in performance across any of the settings tested.
One possible explanation for the insignificant performance change is that the network is not congested enough for KIST to make a difference. To better understand this possibility, I'd like to run some cell-tracking code that lets us compute the time cells spend in the Tor application buffers and in the Shadow kernel buffers. We can then compare those buffer times and see how they change as we add load to the network (e.g., by doubling the number of clients).
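As a sketch of what I have in mind (the log format is hypothetical; the idea is one record per cell per buffer, with enqueue and dequeue timestamps):

```c
#include <stdio.h>
#include <string.h>

/* Reads hypothetical cell-tracking records from stdin, one per line:
 *   <cell_id> <layer> <enqueue_usec> <dequeue_usec>
 * where <layer> is "app" (Tor's circuit/connection queues) or
 * "kernel" (Shadow's simulated socket buffers), and prints the mean
 * residence time in each layer. */
int
main(void)
{
  long id;
  char layer[16];
  double enq, deq, sum[2] = {0.0, 0.0};
  long n[2] = {0, 0};

  while (scanf("%ld %15s %lf %lf", &id, layer, &enq, &deq) == 4) {
    int k = (strcmp(layer, "kernel") == 0) ? 1 : 0;
    sum[k] += deq - enq;
    n[k]++;
  }
  if (n[0])
    printf("mean app buffer time:    %.1f usec (%ld cells)\n",
           sum[0] / n[0], n[0]);
  if (n[1])
    printf("mean kernel buffer time: %.1f usec (%ld cells)\n",
           sum[1] / n[1], n[1]);
  return 0;
}
```

Comparing the two means at 1x and 2x client load should tell us where cells spend their time: if queueing only builds up in Tor's application buffers under heavy load, that would support the not-congested-enough explanation.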