ConvertToJson [0] is a converter from raw CollecTor data to JSON, based on metrics-lib. Converting exit list descriptors (@type tordnsel) works fine ([0], line 1221) until a check for unrecognized attributes ([0], line 109) is added. With the check enabled, the converter outputs an error report like the following for each descriptor of type tordnsel:
Unrecognized lines in /data/in/tordnsel/2015-09-01-08-02-03:
[@type tordnsel 1.0]
It doesn't produce a JSON result though (which it should even if it encounters an unrecognized line).
I used the same sources and generated two new patch files:
one with the changes and one with the new test class.
(I don't know what caused the above problem.)
Applying the 2 patches separately worked.
The converter doesn't throw errors anymore but doesn't output any descriptor data either (except the first line that's hardcoded at the top - verbose, compress and /data/in/path), no matter if the check for unrecognized lines is on or off.
Well, the test returns the five descriptors that it received.
I noticed that your test data file is not correctly named.
It should be named 2015-09-01-00-02-02, like those files on collector.torproject.org
All tordnsel descriptors are named like that on my box.
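To rule out naming problems quickly, file names can be checked against the timestamp pattern mentioned above. This is only a rough sketch: the class name, the method name and the counter-example file name are made up, the pattern is derived from the single example name 2015-09-01-00-02-02, and nothing in this ticket establishes that the parser actually depends on the file name.

```java
import java.util.regex.Pattern;

class ExitListFileName {

  // Assumed pattern yyyy-MM-dd-HH-mm-ss, derived from the example name
  // 2015-09-01-00-02-02. It checks digits only, not calendar validity.
  private static final Pattern TIMESTAMP_NAME =
      Pattern.compile("\\d{4}(-\\d{2}){5}");

  /** Returns true if the file name consists of six dash-separated number groups. */
  static boolean looksLikeExitListName(String fileName) {
    return TIMESTAMP_NAME.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    System.out.println(looksLikeExitListName("2015-09-01-00-02-02"));
    // Hypothetical name, only used to show a non-matching case.
    System.out.println(looksLikeExitListName("testdata.txt"));
  }
}
```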
I threw a tordnsel descriptor together with descriptors of all other types in one directory and converted them in one pass to rule out typos and the like but same result: they all pass, no errors or warnings, but also no sign of a tordnsel-descriptor in the otherwise complete JSON result set.
Which data are you referring to? My sample data is not on Github and there's no file with 2790 lines that I'm aware of.
Can you tell me how to run the tests that your patch includes? Maybe that can give a hint on what's going wrong but I've never run tests before.
Also I could use a hand in how to call the getException() method if you have a minute.
Changed error handler to descriptorFile.getException().printStackTrace(); but to no avail.
Just started my first foray into using a real debugger: what I can see and hopefully describe correctly is that for every other type of descriptor the code walks through every step of the if (descriptor instanceof <someDescriptorType>) switching logic, but for tordnsel descriptors it jumps from for (Descriptor descriptor : descriptorFile.getDescriptors()) (line 75) straight to the line before bw.close. Just like that... That's "strange".
Rather print to System.err if there is an exception:
{{{
if (descriptor instanceof ExitList) {
  jsonDescriptor = JsonExitList.convert((ExitList) descriptor);
  // getException() returns null when parsing succeeded, so check first.
  if (null != descriptorFile.getException()) {
    System.err.print(descriptorFile.getException());
  }
}
}}}
Your above
descriptorFile.getException().printStackTrace();
will cause a NullPointerException when getException() returns null.
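A null-safe variant that still gives the full stack trace could look roughly like the following. It is a self-contained sketch, not converter code: the class and method names are invented, and the exception is passed in as a plain parameter, which in the converter would be whatever descriptorFile.getException() returned.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class ParseProblemReport {

  /**
   * Returns a report including the stack trace, or an empty string when
   * the exception is null, which here stands for "parsing succeeded".
   */
  static String describe(String path, Exception exception) {
    if (exception == null) {
      return "";
    }
    StringWriter trace = new StringWriter();
    exception.printStackTrace(new PrintWriter(trace, true));
    return "Problem while parsing " + path + ":\n" + trace;
  }

  public static void main(String[] args) {
    String path = "/data/in/tordnsel/2015-09-01-08-02-03";
    // No exception: prints nothing and does not throw.
    System.err.print(describe(path, null));
    // Made-up exception, only to show the report format.
    System.err.print(describe(path, new Exception("made-up parse error")));
  }
}
```

Building the report as a string instead of calling printStackTrace() directly keeps the null check in one place.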
Replying to tomlurge:
Could you add your data to github?
After removing the double exitnode line the converter produced the wanted json
with your old data on github.
...
Can you tell me how to run the tests that your patch includes? Maybe that can give a hint on what's going wrong but I've never run tests before.
Ah! I had forgotten that completely... As you can see in the /data/in directory, which is now in the repo, I have one directory with a few descriptors per type and another directory, named 'singles', with one descriptor per type. The Tordnsel descriptor in the 'singles' directory is the same as in /docs/rawDataExamples.
Indeed every Tordnsel descriptor contained an entry "ExitAddress 89.248.169.36 2015-08-31 19:04:24". In all but one this entry was empty. In one descriptor - the one in the "singles" directory - the preceding entry was empty.
I have now - on my local copy - deleted these empty entries. Now the descriptor in "singles" passes as well as 6 of the 9 descriptors in directory "tordnsel". I assume the remaining 3 have more empty entries.
Thanks for spotting this!
But still this is a bug, right? The converter is not supposed to just give up silently on a whole descriptor file just because it encounters an empty entry. Btw: it's not a "double" entry as you say. It's an empty entry, followed by the next, non-empty, entry. So it's 2 ExitAddress lines following each other immediately, but with different content.
But still this is a bug, right? The converter is not supposed to just give up silently on a whole descriptor file just because it encounters an empty entry.
No, it seems to be a feature. The descriptor doesn't give up silently, because there is the
getException() method. The Converter ought to use it in order to find out about problems.
That's why it's there.
And, that processing ends here was a decision made by Karsten long ago, so he probably could
tell about the reasoning behind it.
Ceterum censeo ;-) these findings are even more reason to add tests for the Converter now
and define expected behavior.
Btw: it's not a "double" entry as you say. It's an empty entry, followed by the next, non-empty, entry. So it's 2 ExitAddress lines following each other immediately, but with different content.
This is not my naming:
The metrics-lib code expects one ExitAddress and then the three other data lines,
but it encounters two ExitNode entries. Thus, the metrics-lib code 'thinks' this is a duplicate ExitNode entry in one descriptor.
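To locate such entries in the remaining files without a debugger, something like the following might help. It is a hedged sketch and not how metrics-lib parses exit lists: the class and method names are invented, the keyword is a parameter because this ticket mentions both ExitAddress and ExitNode, and the sample lines use placeholders instead of real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RepeatedKeywordFinder {

  /**
   * Returns the 1-based numbers of all lines that start with the given
   * keyword and directly follow another line starting with that keyword.
   */
  static List<Integer> findRepeats(List<String> lines, String keyword) {
    List<Integer> hits = new ArrayList<>();
    String previous = null;
    for (int i = 0; i < lines.size(); i++) {
      // The keyword is everything up to the first space.
      String current = lines.get(i).split(" ", 2)[0];
      if (current.equals(keyword) && current.equals(previous)) {
        hits.add(i + 1);
      }
      previous = current;
    }
    return hits;
  }

  public static void main(String[] args) {
    // Placeholder lines, not taken from a real exit list.
    List<String> lines = Arrays.asList(
        "ExitNode <fingerprint-1>",
        "ExitNode <fingerprint-2>",
        "Published <timestamp>",
        "LastStatus <timestamp>",
        "ExitAddress <address> <timestamp>");
    System.out.println(findRepeats(lines, "ExitNode")); // prints [2]
  }
}
```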
Perfect! Resolving this ticket (as "not a bug", because the bug was not in metrics-lib, AFAIUI). Please re-open if issues in metrics-lib remain that were described above.
Trac: Resolution: N/A to not a bug; Status: needs_information to closed