I made a start in commit 506ea16 in my task-29367 branch. It's probably 80% done now, but I have to stop working on this now. It would be good if we don't wait too long with this change to avoid having to resolve as many merge conflicts. If anybody else wants to pick this up before next week, please feel free to do that!
I tweaked my first commit above and rebased it to master. The new commit is e5f15ea in my task-29367-2 branch. It passes unit tests and at least doesn't break in obvious ways. I just deployed a test instance and will let it run for a day to see if it produces useful results.
Having a careful review by somebody who has ported Python code from 2 to 3 before would be greatly appreciated!
Your patch looks good to me but I found two additional issues:
I assume you're running the tests by executing run_tests.sh? The tests test_reprocessing.test_log_collection_tgen and test_reprocessing.test_log_collection_torctl fail for me because the two paths in the lists differ in their ordering. These unit tests rely on os.walk, which internally calls os.scandir, which yields files in random order. Looks like these tests are broken but we can fix this issue by calling sorted on the arbitrarily ordered file names. Here's a patch.
I ran the tool 2to3 over all source files and it suggested turning dict_keys and dict_values objects in analysis.py into list. Here's a patch. All unit tests still pass.
I forgot to mention that we should also update README.md as it still contains python 2 references. Here's a patch for that. All my patches are based on your task-29367-2 branch and should be easy to apply.
Regarding the second patch that turns dict keys and values into lists, I saw those changes, too, and did not make them on purpose. The reason is that we only need to iterate over these keys or values and don't need them as a list for further processing. But of course it doesn't do harm to convert them into lists, except for trivially lower performance. I think it's fine to just apply your patch.
While trying out my branch (before your patches and afterwards), I found another problem in the analysis mode: When OnionPerf opens the Tor control log files containing \r\n as newline, it converts these newlines to just \n. But that makes Stem sad which refuses to parse the contained events. I wrote a patch that enables OnionPerf to open Tor control log files with \r\n as newline and that leaves them unchanged before it passes lines to Stem. My test instance that ran over night successfully produced an analysis file with TGen and Tor control log parts.
This patch is in commit f863ee4 in my task-29367-2 branch which is based on your branch. Can you take another look at that commit? If it looks good, I'll merge everything to master.
This patch is in commit f863ee4 in my task-29367-2 branch which is based on your branch. Can you take another look at that commit? If it looks good, I'll merge everything to master.