Opened 6 years ago

Closed 5 years ago

#11216 closed defect (fixed)

BridgeDB is parsing PTs from `cached-extrainfo*` files cumulatively

Reported by: isis Owned by: isis
Priority: Very High Milestone:
Component: Circumvention/BridgeDB Version:
Severity: Keywords: bridgedb-0.3.0, metrics, bridgedb-parsers, stem, isis2014Q3Q4, isisExB
Cc: isis, sysrqb, karsten Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

This is causing a bridge whose extrainfo descriptor is in both cached-extrainfo and cached-extrainfo.new to have its transport lines added twice to BridgeDB's list of pluggable transports. This is likely what was causing #9264.

This is easily reproducible: simply generate some fake descriptors, copy the cached-extrainfo.new file to cached-extrainfo (each fake descriptor currently always has exactly two transport lines -- one obfs2 and one obfs3), and start bridgedb:

git clone https://gitweb.torproject.org/bridgedb.git && cd bridgedb && make install
pip install leekspin
generate-OR-descriptors -n 50
cp cached-extrainfo.new cached-extrainfo
bridgedb

If you do tail -f bridgedb.log, you'll see the following lines confirming the bug:

Mar 14 04:14:14 [DEBUG] 836f58807dc44a248c684da0be18b918badef421 supports 4 transports
Mar 14 04:14:14 [DEBUG] e79da1d32ad16e000149449ea191a0e4baa32ca0 supports 4 transports
Mar 14 04:14:14 [DEBUG] 8de1a8fb8c0e2d1885fd05e39541a50a507685ff supports 4 transports
Mar 14 04:14:14 [DEBUG] d0ee79b3bad89ebe6cb1e833e8ba8003ad315e06 supports 4 transports
Mar 14 04:14:14 [DEBUG] Saving state to:        '/home/isis/code/torproject/bridgedb/run-manual/bridgedb.state'

Child Tickets

Change History (9)

comment:1 Changed 5 years ago by isis

Keywords: bridgedb-0.2.x added; bridgedb-0.1.6 removed

The relevant parts of the code are here and here. These are very old, very unmaintained (I've not gotten around to refactoring them, nor creating unittests for them yet) bits of BridgeDB's code.

Basically, I don't especially care if these functions are kept. I would prefer them to be rewritten entirely, but whatever fixes this bug is fine by me. If someone were to go the complete rewrite route for fixing this bug, then 9000+ bonus points if you do any of the following:

  1. Refactor the extra-info descriptor parser using Stem.
  2. Write some unittests for the thing.

Though neither are necessary and all patches welcome.

comment:2 in reply to:  1 Changed 5 years ago by isis

Replying to isis:

  1. Refactor the extra-info descriptor parser using Stem.


By the way, the ticket for doing that is #9380.

comment:3 Changed 5 years ago by isis

Status: newneeds_review

A patch for this was emailed to me today. I need to review it.

comment:4 Changed 5 years ago by isis

Status: needs_reviewneeds_revision

I reviewed the patch that was emailed to me from the BridgeDB TWN advertisement, and it looks really good, and it technically does fix this bug... but it uses the first Pluggable Transport of a given type for a given bridge that was found in the cached-extrainfo file.

It should probably check the published line from the extrainfo descriptor, in order to determine which of the descriptors is newest. This would mean that the parser, bridgedb.Bridges.parseExtraInfoFile() would need to be changed to extract that line. Or, it would need to wait until I or someone else finishes #9380 and/or #10725.

comment:5 in reply to:  3 Changed 5 years ago by chingucha

Replying to isis:

A patch for this was emailed to me today. I need to review it.

Where is this patch?

comment:6 Changed 5 years ago by isis

Blocking on #9380.

comment:7 Changed 5 years ago by isis

Keywords: isis2014Q3Q4 added

comment:8 Changed 5 years ago by isis

Keywords: isisExB added

comment:9 Changed 5 years ago by isis

Keywords: bridgedb-0.3.0 stem added; bridgedb-0.2.x removed
Resolution: fixed
Status: needs_revisionclosed

This was fixed by commits 7869e4c7cd1e43f9354480f6cafba0794fd86433, 65f18cd7c31a97274a8277688f0512b69ac1e3e3, and fe70415269693948bdfc5c8ea3abfab2b1d86c49 in my fix/9380-stem_r10 branch.

Those commits introduce the bridgedb.parse.descriptors.deduplicate() function, which is called in the bridgedb.parse.descriptors.parseExtraInfoFiles() function. The former deduplicates all descriptors for every bridge, selecting only the newest descriptor for a particular bridge. Additionally, if any Bridge has multiple @type bridge-extrainfo descriptors with exactly the same timestamps, then a bridgedb.parse.descriptors.DescriptorWarning will be issued, since perfectly identical descriptors shouldn't be something an unmodified tor is capable of doing (and thus would imply that there is either a drastic regression in tor, or that someone has created a possibly-malicious OR implementation). Unittests and integration tests which verify that these behaviours are functioning as expected have also been added.

(Incidentally, these same changes also fix #2895.)

Since #9380 has been merged into develop for the next release, this issue should be fixed in bridgedb-0.3.0.

Note: See TracTickets for help on using tickets.