Currently metrics-lib only reads tordnsel descriptors that contain
exactly one ExitAddress entry. This does not reflect the torperf data,
which can contain several ExitAddress entries TorDNSEL description (see also #17701 (moved)), i.e.
=== solution
metrics-libs should process tordnsel descriptors with an arbitrary count (but finite ;-) of ExitAddress lines.
(I could only find entries with one and two ExitAddress lines, but there is no limit stated in the spec.)
=== proposed changes
The interface needs to be changed:
I suggest that the current getExitAddress is marked deprecated and
returns one of the addresses read for backward compatibility
/** * Return one IP address that was determined in the scan. * @deprecated use getExitAddresses() */ public String getExitAddress();
In addition, a new method should be added
/* Return the IP addresses that were determined in the scan. */ public List<String> getExitAddresses();
Is this ok? Should anything be added or changed?
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
I think this is a fine approach, including making the current getExitAddress() method return the first entry and at the same time deprecating it.
One thing though: if we use a List<String> as return type for the new getExitAddresses() method, we'd lose the second scan time. How about a Map<String, Long> with address as key and scan time in milliseconds as value? We'd also have to deprecate getScanMillis() in that case, which would be okay.
Here's a slightly unrelated idea I had yesterday: how about we avoid having ExitListEntry be its own type in parallel to ExitList and rather turn it into ExitList.Entry? It's only used when processing exit lists anyway, and it would help clean up the API a tiny bit. If we're changing the API there now, why not change this, too. But, just a thought, this can easily be done in a separate commit after yours is in. Though ideally, if we agree it's a good idea, it should get into the same release.
By the way, I already have a draft patch for the ExitList.Entry part. It needs some cleaning up, which I'm happy to do. I'm pasting it below to give you the idea and to avoid that you're wasting effort writing the same code.
commit 3b54364e750401a9511582f797f458a21c2250beAuthor: Karsten Loesing <karsten.loesing@gmx.net>Date: Fri Dec 11 21:11:41 2015 +0100 Temp commit.diff --git a/src/org/torproject/descriptor/ExitList.java b/src/org/torproject/descriptor/ExitList.javaindex 09d7c25..755bed4 100644--- a/src/org/torproject/descriptor/ExitList.java+++ b/src/org/torproject/descriptor/ExitList.java@@ -7,10 +7,35 @@ import java.util.Set; /* Exit list containing all known exit scan results at a given time. */ public interface ExitList extends Descriptor {+ /* Exit list entry containing results from a single exit scan. */+ public interface Entry {++ /* Return the scanned relay's fingerprint. */+ public String getFingerprint();++ /* Return the publication time of the scanned relay's last known+ * descriptor. */+ public long getPublishedMillis();++ /* Return the publication time of the network status that this scan was+ * based on. */+ public long getLastStatusMillis();++ /* Return the IP address that was determined in the scan. */+ public String getExitAddress();++ /* Return the scan time. */+ public long getScanMillis();+ }+ /* Return the download time of the exit list. */ public long getDownloadedMillis(); /* Return the unordered set of exit scan results. */+ @Deprecated public Set<ExitListEntry> getExitListEntries();++ /* Return the unordered set of exit scan results. */+ public Set<ExitList.Entry> entrySet(); }diff --git a/src/org/torproject/descriptor/ExitListEntry.java b/src/org/torproject/descriptor/ExitListEntry.javaindex 201a172..b47158e 100644--- a/src/org/torproject/descriptor/ExitListEntry.java+++ b/src/org/torproject/descriptor/ExitListEntry.java@@ -3,6 +3,7 @@ package org.torproject.descriptor; /* Exit list entry containing results from a single exit scan. */+@Deprecated public interface ExitListEntry { /* Return the scanned relay's fingerprint. */diff --git a/src/org/torproject/descriptor/impl/ExitListEntryImpl.java b/src/org/torproject/descriptor/impl/ExitListEntryImpl.javaindex a03e373..0a5f8b6 100644--- a/src/org/torproject/descriptor/impl/ExitListEntryImpl.java+++ b/src/org/torproject/descriptor/impl/ExitListEntryImpl.java@@ -3,6 +3,8 @@ package org.torproject.descriptor.impl; import org.torproject.descriptor.DescriptorParseException;+import org.torproject.descriptor.ExitList;+ import java.util.ArrayList; import java.util.List; import java.util.Scanner;@@ -11,7 +13,7 @@ import java.util.TreeSet; import org.torproject.descriptor.ExitListEntry;-public class ExitListEntryImpl implements ExitListEntry {+public class ExitListEntryImpl implements ExitListEntry, ExitList.Entry { private byte[] exitListEntryBytes; public byte[] getExitListEntryBytes() {diff --git a/src/org/torproject/descriptor/impl/ExitListImpl.java b/src/org/torproject/descriptor/impl/ExitListImpl.javaindex 53dc112..cfa5248 100644--- a/src/org/torproject/descriptor/impl/ExitListImpl.java+++ b/src/org/torproject/descriptor/impl/ExitListImpl.java@@ -104,10 +104,13 @@ public class ExitListImpl extends DescriptorImpl implements ExitList { return this.downloadedMillis; }- private Set<ExitListEntry> exitListEntries =- new HashSet<ExitListEntry>();+ private Set<ExitListEntryImpl> exitListEntries =+ new HashSet<ExitListEntryImpl>(); public Set<ExitListEntry> getExitListEntries() { return new HashSet<ExitListEntry>(this.exitListEntries); }+ public Set<ExitList.Entry> entrySet() {+ return new HashSet<ExitList.Entry>(this.exitListEntries);+ } }
Yes, I can add this change to the patch.
Only one naming issue: I'd prefer to name public Set<ExitList.Entry> entrySet() something like
public Set<ExitList.Entry> getEntries().
The name 'entrySet' makes me think of java's 'Map.Entry',which would be misleading.
I didn't notice getScanMillis(). It would then have to return the time for the ExitAddress
returned in getExitAddress().
And, of course, getExitAddresses should return a Map<String, Long>. I suppose there won't be any ExitAddress lines with the same ip address and different scan time.
- this.unrecognizedLines = new ArrayList<String>();+ this.unrecognizedLines = new ArrayList<>();
All of your changes are fine.
PS:
looking at the test again, maybe the insufficient data test should be repeated to also include the other fields missing.
Are there more things to be tested based on your experience using metrics-lib?
I changed the line above, and I added a change log entry.
However, when writing that change log entry I realized that the previous behavior was not a bug after all! What we did was that we simply parsed an entry with multiple "ExitAddress" lines as separate exit list entries. But okay, I think the new interface is cleaner, so let's switch.
But! The way that getExitAddress() and getScanMillis() are currently implemented introduce a new bug, because we're return fewer exit list entries in getExitListEntries() than before. What we should do is keep the new behavior of getEntries() and change getExitListEntries() to return the old number of entries. I don't have a good fix in mind right now, but if we don't fix this, we'll break existing applications. Want to look into this?
Regarding tests, I'd probably add another test for a missing "ExitNode" line and one for a missing "Published" line, but those should be sufficient. Oh, and of course, the new behavior described in the previous paragraph would need testing.
I changed the line above, and I added a change log entry.
Thanks!
However, when writing that change log entry I realized that the previous behavior was not a bug after all! What we did was that we simply parsed an entry with multiple "ExitAddress" lines as separate exit list entries.
I think, these lines were first treated as a new exitlist-entry, but then discarded because the other keywords were missing.
But! The way that getExitAddress() and getScanMillis() are currently implemented introduce a new bug, because we're return fewer exit list entries in getExitListEntries() than before. ...
With the above reasoning the new code returns the same number as the old in getExitListEntries().
I think, these lines were first treated as a new exitlist-entry, but then discarded because the other keywords were missing.
Are you sure? I think the other keywords were not missing, because we wouldn't clear the StringBuilder after parsing an exit list entry, nor would we add an "ExitAddress" line to the StringBuilder. Here's the code. I tried this out and got more entries in the old code, but if you're observing something differently, I'll look again.
Oh, now I see, you're right.
I was referring to the code from my first patch which unfortunately already changed things to the behavior I described. (btw, this shows how important tests are. If these had been written before
the ExitList/Entry code I would have noticed earlier.)
Well, then getExitListEntries() has to convert entries with several exitAddresses into separate exitlistentries - one for each exitaddress. I'll provide the patch for this.
Looks good. I made some minor tweaks, including not creating new ExitListEntry instances in oldEntries() if they only contain one exit address. Tests ran through without problems, but maybe you can take a final look before I squash and merge? Thanks!
Looks good. I made some minor tweaks, including not creating new ExitListEntry instances in oldEntries() if they only contain one exit address. ...
Good idea.
All else is fine, too.