This is a ticket to resolve a) of comment:1:ticket:4680 :
We should evaluate whether the final morpher transport should use 'random sampling', or 'morphing matrices' to select a packet's final packet size.
In 'random sampling', you have a packet size probability distribution of a target protocol, and you pick a random variable, and then you shape your packet to be of that size.
By 'morphing matrices', I mean the technique described in the paper Traffic Morphing: An efficient defense against statistical traffic analysis which attempts to minimize the padding overhead imposed by packet size shaping.
Since our morpher transport will try to turn Tor traffic into HTTPS traffic, as far as packet sizes are concerned, we should evaluate whether the 'morphing matrices' method is worth it. We can build a tool that uses both methods on a couple thousand packets, and see which method is more effective. If 'morphing matrices' is indeed more effective, we should see whether it's effective enough to justify its troublesome implementation.
I stupidly attached all relevant files to #4680 (moved), but let's continue the discussion here to avoid flooding #4680 (moved) more.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
(Let's rename 'random sampling' to 'direct sampling' from now on:)
The attachments in #4680 (moved) spawn at least two questions:
a) It seems that in the Server->Client case (500000_sc.png:ticket:4680)) 'traffic morphing' actually delivers. It results in 1/4 of the overhead of the 'direct sampling' method.
The (not-so-)funny thing is that in the Client->Server case (500000_cs.png:ticket:4680),,) 'traffic morphing' actually causes more overhead than 'direct sampling' does!
My guess is that this happens because in the C->S case, HTTPS has large probabilities in small packet sizes (https_cs.png:ticket:4680),,) so the morphing matrix tries to split into a small packet, and then the second half is padded to 1460. Example from the logs (cs_log.txt:ticket:4680)::)
sampling: Got packet size 586. We must morph it to 5. Splitting to 581 and sending the first part...sampling: Got packet size 581. We must morph it to 1459. Padding with 878 and sending.morphing: Got packet size 586. We must morph it to 127. Splitting to 459 and sending the first part...morphing: Got packet size 459. We must morph it to 123. Splitting to 336 and sending the first part...morphing: Got packet size 336. We must morph it to 1459. Padding with 1123 and sending.264: OVERHEAD ROUND SUMMARY: Sampling: 928 : Morphing: 1223264: OVERHEAD ROUND SUMMARY: Morpher lost (295)
Morpher gets a packet of 586 bytes to morph, splits it to 127+459. Sends 127 bytes, splits the 459 part to 123+336. Sends 123 bytes, and pads the 336 part to 1459. This results in an overhead of 1223 bytes. Direct sampling was better due to randomness.
On the other hand, in the S->C case, Tor seems to have good-enough probabilities of outputting small packets (tor_sc.png:ticket:4680)) [0] which fit nicely with the good-enough probabilities of HTTPS of outputting small packets (https_sc.png:ticket:4680)) and the results are:
sampling: Got packet size 65. We must morph it to 1413. Padding with 1348 and sending.morphing: Got packet size 65. We must morph it to 79. Padding with 14 and sending.5: OVERHEAD ROUND SUMMARY: Sampling: 1348 : Morphing: 145: OVERHEAD ROUND SUMMARY: Morpher won (1334)
sampling: Got packet size 11. We must morph it to 1379. Padding with 1368 and sending.morphing: Got packet size 11. We must morph it to 53. Padding with 42 and sending.6: OVERHEAD ROUND SUMMARY: Sampling: 1368 : Morphing: 426: OVERHEAD ROUND SUMMARY: Morpher won (1326)
so in such cases, 'traffic morphing' beats stupid 'direct sampling' easily.
And even when the original packet size is large, something like this happens:
sampling: Got packet size 1460. We must morph it to 1379. Splitting to 81 and sending the first part...sampling: Got packet size 81. We must morph it to 894. Padding with 813 and sending.morphing: Got packet size 1460. We must morph it to 1414. Splitting to 46 and sending the first part...morphing: Got packet size 46. We must morph it to 81. Padding with 35 and sending.7: OVERHEAD ROUND SUMMARY: Sampling: 863 : Morphing: 857: OVERHEAD ROUND SUMMARY: Morpher won (778)
So the question is, what do we do? Should we use morphing matrices S->C, and direct sampling in C->S? Or use direct sampling in both cases?
b) The other question is, even if we select the minimum overhead in each case, are we happy with this kind of overhead? Looking at the plot of 500 packets (500_cs.png:ticket:4680)) in the C->S case, we get approx. 70k bytes of overhead in 100 packets [1]. In the S->C case (500_sc.png:ticket:4680)) we see approx. 100k bytes overhead using direct sampling, and 70k bytes of overhead using morphing matrices, in 100 packets.
Are we happy with this?
Some things to note:
a) The morphing matrices might be wrong. I'm the only one who has reviewed morpher or the matrices.
b) The probability distributions might be biased. We know that they were captured in Chicago, but maybe there is lots of other traffic in there apart from Tor (it was captured by TCP port). Also see [0].
c) The tests/graphs might be wrong. I wrote the tests and I sometimes add bugs in my code.
So apart from the two questions, we should try resolving these last three, as well.
[0]: Hmm, why is this? Why do Tor relays send small packets to clients, when the opposite doesn't happen that much? Do Tor relays use variable sized cells more often than the clients (note that v3 link handshake didn't exist when these packets were captured)? Or is it just a coincidence that happens when big TLS records get split, and it just happened more to relays during the packet capture?
[1]: 100 actual packets, packets splitted during morphing don't count.
Note that currently the plot titles are inversed. So 500_cs.png:ticket:4680 is really the C->S plot, even if the title says "Server->Client: 500 packets". For now, trust the filenames more than the titles.
I'll try to re-plot them with correct titles, but it's gonna take me some time.
As a further note, that I forgot to write down in comment:1, since we are only interested in Tor->HTTPS morphing, we might be able to find some heuristic improvements to the linear programming part of morpheus, that will result in better morphing matrices and give us less overhead. This might be hard, but if we end up using traffic morphing we should look into it.
Note that currently the plot titles are inversed. So 500_cs.png:ticket:4680 is really the C->S plot, even if the title says "Server->Client: 500 packets". For now, trust the filenames more than the titles.
I'll try to re-plot them with correct titles, but it's gonna take me some time.
Trac Powered
Fixed. The plot titles should now be correct.
Trac: Description: This is a ticket to resolve a) of comment:1:ticket:4680 :
We should evaluate whether the final morpher transport should use 'random sampling', or 'morphing matrices' to select a packet's final packet size.
In 'random sampling', you have a packet size probability distribution of a target protocol, and you pick a random variable, and then you shape your packet to be of that size.
By 'morphing matrices', I mean the technique described in the paper Traffic Morphing: An efficient defense against statistical traffic analysis which attempts to minimize the padding overhead imposed by packet size shaping.
Since our morpher transport will try to turn Tor traffic into HTTPS traffic, as far as packet sizes are concerned, we should evaluate whether the 'morphing matrices' method is worth it. We can build a tool that uses both methods on a couple thousand packets, and see which method is more effective. If 'morphing matrices' is indeed more effective, we should see whether it's effective enough to justify its troublesome implementation.
I stupidly attached all relevant files to #4680 (moved), but let's continue the discussion here to avoid flooding #4680 (moved) more.
to
This is a ticket to resolve a) of comment:1:ticket:4680 :
We should evaluate whether the final morpher transport should use 'random sampling', or 'morphing matrices' to seTrac Powered
Plect a packet's final packet size.
In 'random sampling', you have a packet size probability distribution of a target protocol, and you pick a random variable, and then you shape your packet to be of that size.
By 'morphing matrices', I mean the technique described in the paper Traffic Morphing: An efficient defense against statistical traffic analysis which attempts to minimize the padding overhead imposed by packet size shaping.
Since our morpher transport will try to turn Tor traffic into HTTPS traffic, as far as packet sizes are concerned, we should evaluate whether the 'morphing matrices' method is worth it. We can build a tool that uses both methods on a couple thousand packets, and see which method is more effective. If 'morphing matrices' is indeed more effective, we should see whether it's effective enough to justify its troublesome implementation.
I stupidly attached all relevant files to #4680 (moved), but let's continue the discussion here to avoid flooding #4680 (moved) more.
Ugh! Fat fingers and bad touchpad: I accidentally changed the ticket description in comment:4. Reverting it.
Trac: Description: This is a ticket to resolve a) of comment:1:ticket:4680 :
We should evaluate whether the final morpher transport should use 'random sampling', or 'morphing matrices' to seTrac Powered
Plect a packet's final packet size.
In 'random sampling', you have a packet size probability distribution of a target protocol, and you pick a random variable, and then you shape your packet to be of that size.
By 'morphing matrices', I mean the technique described in the paper Traffic Morphing: An efficient defense against statistical traffic analysis which attempts to minimize the padding overhead imposed by packet size shaping.
Since our morpher transport will try to turn Tor traffic into HTTPS traffic, as far as packet sizes are concerned, we should evaluate whether the 'morphing matrices' method is worth it. We can build a tool that uses both methods on a couple thousand packets, and see which method is more effective. If 'morphing matrices' is indeed more effective, we should see whether it's effective enough to justify its troublesome implementation.
I stupidly attached all relevant files to #4680 (moved), but let's continue the discussion here to avoid flooding #4680 (moved) more.
to
This is a ticket to resolve a) of comment:1:ticket:4680 :
We should evaluate whether the final morpher transport should use 'random sampling', or 'morphing matrices' to select a packet's final packet size.
In 'random sampling', you have a packet size probability distribution of a target protocol, and you pick a random variable, and then you shape your packet to be of that size.
By 'morphing matrices', I mean the technique described in the paper Traffic Morphing: An efficient defense against statistical traffic analysis which attempts to minimize the padding overhead imposed by packet size shaping.
Since our morpher transport will try to turn Tor traffic into HTTPS traffic, as far as packet sizes are concerned, we should evaluate whether the 'morphing matrices' method is worth it. We can build a tool that uses both methods on a couple thousand packets, and see which method is more effective. If 'morphing matrices' is indeed more effective, we should see whether it's effective enough to justify its troublesome implementation.
I stupidly attached all relevant files to #4680 (moved), but let's continue the discussion here to avoid flooding #4680 (moved) more.