#23981 closed defect (fixed)

Fix NPE and include "bridge-distribution-request" lines in sanitized bridge descriptors

Reported by: karsten Owned by: iwakeh
Priority: High Milestone:
Component: Metrics/CollecTor Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by iwakeh)

For NPE log excerpt see comment:1.

From CollecTor's logs:

2017-10-25 02:09:20,131 WARN o.t.c.b.SanitizedBridgesWriter:952 Unrecognized line 'bridge-distribution-request any'. Skipping.

We should investigate what the exact specification of this line is, decide whether it's harmless to keep, add it to CollecTor's bridge descriptor module, try it out locally, get it reviewed, put out a release, and deploy that.

Child Tickets

Change History (16)

comment:1 Changed 21 months ago by karsten

Possibly related log lines:

2017-10-25 05:09:01,147 WARN o.t.c.b.SanitizedBridgesWriter:952 Unrecognized line 'bridge-distribution-request any'. Skipping.
2017-10-25 05:09:01,488 WARN o.t.c.b.SanitizedBridgesWriter:959 Could not parse server descriptor.
java.lang.NullPointerException: null
        at org.torproject.collector.bridgedescs.SanitizedBridgesWriter.scrubIpv4Address(SanitizedBridgesWriter.java:265)
        at org.torproject.collector.bridgedescs.SanitizedBridgesWriter.sanitizeAndStoreServerDescriptor(SanitizedBridgesWriter.java:751)
        at org.torproject.collector.bridgedescs.BridgeDescriptorParser.parse(BridgeDescriptorParser.java:44)
        at org.torproject.collector.bridgedescs.BridgeSnapshotReader.<init>(BridgeSnapshotReader.java:200)
        at org.torproject.collector.bridgedescs.SanitizedBridgesWriter.startProcessing(SanitizedBridgesWriter.java:217)
        at org.torproject.collector.cron.CollecTorMain.run(CollecTorMain.java:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2017-10-25 05:09:01,865 WARN o.t.c.b.SanitizedBridgesWriter:952 Unrecognized line 'bridge-distribution-request any'. Skipping.

comment:2 Changed 21 months ago by iwakeh

This ought to be scrubbed, as it might be used to single out bridges?

Here's the spec:

"bridge-distribution-request" SP Method NL

        [At most once, bridges only.]

        The "Method" describes how a Bridge address is distributed by
        BridgeDB. Recognized methods are: "none", "any", "https", "email",
        "moat", "hyphae". If set to "none", BridgeDB will avoid distributing
        your bridge address. If set to "any", BridgeDB will choose how to
        distribute your bridge address. Choosing any of the other methods will
        tell BridgeDB to distribute your bridge via a specific method:

          - "https" specifies distribution via the web interface at
             https://bridges.torproject.org;
          - "email" specifies distribution via the email autoresponder at
            bridges@torproject.org;
          - "moat" specifies distribution via an interactive menu inside Tor
            Browser; and
          - "hyphae" specifies distribution via a cryptographically-secure,
            invitation-based system.

        Potential future "Method" specifiers must be as follows:
            Method = (KeywordChar | "_") +

        All bridges SHOULD include this line. Non-bridges MUST NOT include
        it.  (It is not currently implemented, or obeyed by Bridge DB.)

        BridgeDB SHOULD treat unrecognized Method values as if they were
        "none".

        (Default: "any")
Last edited 21 months ago by iwakeh (previous) (diff)

comment:3 Changed 21 months ago by karsten

Hmm. What do you mean by "single out", and why would we have to scrub this information?

We did have bridge pool assignment files a while back and only stopped providing them, because BridgeDB stopped providing them to us reliably. But we did not scrub the information how a bridge was distributed there, and here we're looking at information from bridges how they want to be distributed, which may or may not be obeyed by BridgeDB.

Can you elaborate why you think this information needs to be scrubbed?

comment:4 Changed 21 months ago by iwakeh

Just from first reflex this gives some information where to look for bridges, in general. But, if this wasn't an issue before, ok.

I'll look into this further.

comment:5 Changed 21 months ago by iwakeh

Owner: changed from metrics-team to iwakeh
Status: newaccepted

comment:6 Changed 21 months ago by iwakeh

Hmm, so far I could not reproduce this problem.
Please find a temporary suggestion for catching this type of problem and retrieving more information about it.

There are also many other places that could cause trouble when unforeseen data arrives. Also some java 8 date handling classes might be of advantage. I'll open a ticket about that. And also think further about this problem.

comment:7 Changed 21 months ago by iwakeh

Description: modified (diff)
Summary: Include "bridge-distribution-request" lines in sanitized bridge descriptorsFix NPE and include "bridge-distribution-request" lines in sanitized bridge descriptors

Adapted summary and description.

Maybe open a different ticket for the unrecognized line?

comment:8 Changed 21 months ago by karsten

I think you'll need to set ReplaceIpAddressesWithHashes = true in order to reproduce the problem. At least I could reproduce the problem by doing so.

comment:9 Changed 21 months ago by karsten

Looks like this issue is caused by a server descriptor produced by an alternate Tor implementation that identifies itself as "platform Pearl 0fd5756 on Linux". And that implementation uses a different order of elements in its descriptor.

I'll start working on a fix now. If you started working on a fix, too, and would rather continue with that, let's meet on the meeting pad.

comment:10 Changed 21 months ago by karsten

Status: acceptedneeds_review

Please review the fix in my task-23981 branch. It still lacks a change log entry, and with more time I'd have done another review run myself, so handle with care.

Note that I tried to keep the patch minimal in order to get it fixed ASAP. While working on this patch I considered taking a different approach where we'd include placeholders for parts that we cannot sanitize yet and fill them in after finishing to read the original descriptor. This could be a List<StringBuilder> to retain order of original lines and where we'd keep references to some StringBuilders that we need to sanitize after reading. Still something to think about. Let's do that after fixing this ugly bug.

Oh, and we still need to make a final decision what to do with those new lines. Okay to keep them unchanged?

comment:11 Changed 21 months ago by iwakeh

Hmm, the fix cures the NPE problem, as hotfix. As another measure there should be very high priority on #20549 now.

comment:12 in reply to:  11 Changed 21 months ago by karsten

Replying to iwakeh:

Hmm, the fix cures the NPE problem, as hotfix.

Thanks for looking. I prepared a pre-release tarball that we should ideally put out tomorrow.

As another measure there should be very high priority on #20549 now.

Let's discuss that on #20549.

comment:13 in reply to:  10 ; Changed 21 months ago by arma

Replying to karsten:

Oh, and we still need to make a final decision what to do with those new lines. Okay to keep them unchanged?

I hope we choose "yes, ok to keep them unchanged". That way external researchers will be able to tie together usage patterns, and blocking patterns, to the distribution method that the bridge asked for.

(I am still sad that we don't record (and publish) what distribution mechanism bridgedb picked for a bridge. Most bridges probably won't set this line on their own, so seeing this line won't be a substitute for learning what bridgedb decided to do.)

comment:14 in reply to:  13 Changed 21 months ago by karsten

Replying to arma:

Replying to karsten:

Oh, and we still need to make a final decision what to do with those new lines. Okay to keep them unchanged?

I hope we choose "yes, ok to keep them unchanged". That way external researchers will be able to tie together usage patterns, and blocking patterns, to the distribution method that the bridge asked for.

That's the current plan, yes.

(I am still sad that we don't record (and publish) what distribution mechanism bridgedb picked for a bridge. Most bridges probably won't set this line on their own, so seeing this line won't be a substitute for learning what bridgedb decided to do.)

Good point. I'll put "Resume gathering bridge distribution statistics from BridgeDB" on the roadmap we're about to finish soon, though it will be on the long-term list for Q4/2018 or later. But at least we'll remember when we make the next roadmap or run out of other goals earlier than we thought (hahaha).

comment:15 in reply to:  13 Changed 21 months ago by iwakeh

Yes, let's keep the bridge-distribution-request ... lines. (Might be tidier on a new ticket as I overlooked it.)

comment:16 Changed 21 months ago by karsten

Resolution: fixed
Status: needs_reviewclosed

CollecTor now retains "bridge-distribution-request" lines. Example: https://collector.torproject.org/recent/bridge-descriptors/server-descriptors/2017-10-26-08-09-00-server-descriptors (URL will go away in a few days, but descriptors will be contained in tarballs)

Goal is added to the roadmap for long-term goals.

I believe that resolves everything on this ticket. Closing. If something remains, please re-open. Thanks!

Note: See TracTickets for help on using tickets.