Opened 6 years ago

Closed 6 years ago

#9244 closed defect (wontfix)

bridgedb's GeoIP module (v0.2.6, the latest) causes python to segfault

Reported by: isis Owned by: isis
Priority: Medium Milestone:
Component: Circumvention/BridgeDB Version:
Severity: Keywords: geoip, maxmind
Cc: isis@…, Matthew.Finkel@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Perhaps we should look for another geoip API. The one that BridgeDB is currently using is for MaxMind, and, if I recall correctly, they changed their API somewhat recently and I think there is now a 'pygeoip2' module to correspond to the new API.

This should be looked into at some point, especially because the latest version of the current pygeoip module we are using (the one where we do 'import GeoIP' -- the namespace is a bit convoluded) has a reproducible segmentation fault:

In [17]: geoip.region_by_addr('24.5.85.80')
Invalid database type GeoIP Country Edition, expected GeoIP Region Edition, Rev 1
Segmentation fault
(bridgedb)∃!isisⒶwintermute:(feature/9199-improved-logging-rebased1-merged1 *>)~/code/torproject/bridgedb ∴ ipython
WARNING: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.

In [1]: import GeoIP

In [2]: GeoIP.__
GeoIP.__class__         GeoIP.__hash__          GeoIP.__repr__
GeoIP.__delattr__       GeoIP.__init__          GeoIP.__setattr__
GeoIP.__dict__          GeoIP.__name__          GeoIP.__sizeof__
GeoIP.__doc__           GeoIP.__new__           GeoIP.__str__
GeoIP.__file__          GeoIP.__package__       GeoIP.__subclasshook__
GeoIP.__format__        GeoIP.__reduce__
GeoIP.__getattribute__  GeoIP.__reduce_ex__

In [2]: geoip = GeoIP.new(Ge
GeneratorExit  GeoIP

In [2]: geoip = GeoIP.new(GeoIP.
GeoIP.GEOIP_CHARSET_ISO_8859_1  GeoIP.country_codes
GeoIP.GEOIP_CHARSET_UTF8        GeoIP.country_continents
GeoIP.GEOIP_CHECK_CACHE         GeoIP.country_names
GeoIP.GEOIP_INDEX_CACHE         GeoIP.error
GeoIP.GEOIP_MEMORY_CACHE        GeoIP.new
GeoIP.GEOIP_STANDARD            GeoIP.open

In [2]: geoip = GeoIP.new(GeoIP.GEOIP_STANDARD)

In [3]: geoip.re
geoip.record_by_addr  geoip.region_by_addr
geoip.record_by_name  geoip.region_by_name

In [3]: geoip.regio
geoip.region_by_addr  geoip.region_by_name

In [3]: geoip.region_by_addr('24.5.85.80')
Invalid database type GeoIP Country Edition, expected GeoIP Region Edition, Rev 1
Segmentation fault
(bridgedb)∃!isisⒶwintermute:(feature/9199-improved-logging-rebased1-merged1 *>)~/code/torproject/bridgedb ∴

Child Tickets

Change History (9)

comment:1 Changed 6 years ago by isis

Cc: isis@… added

comment:2 Changed 6 years ago by sysrqb

Cc: Matthew.Finkel@… added
Status: newneeds_information

do you have any suggestions? Have you used any others in the past? blockfinder? ;)

comment:3 Changed 6 years ago by isis

Blockfinder uses MaxMind's GeoIP, though when it refreshes the files I believe it just shells out with a wget to get the GeoIP DB file from MaxMind's website and then shoves it into a local sqlite DB. But I could be wrong, I've not touched that part of it's code and I only skimmed over it. Jake would probably know more.

I don't have any suggestions, though most Tor projects use MaxMind, so we probably shouldn't stray from that just because the files are not small and it is likely that users already have them taking up a chunk of their disk. By making this ticket, I just meant that some small amount of time should probably be spent at some point downloading a whole bunch of geoip modules and messing around with them. I was just dismayed that my 5 minutes of casually messing with the one that OONI and BridgeDB both use produced a segfault so easily.

comment:4 in reply to:  3 ; Changed 6 years ago by arma

Replying to isis:

By making this ticket, I just meant that some small amount of time should probably be spent at some point downloading a whole bunch of geoip modules and messing around with them.

Be sure to also look at Karsten's old tech reports comparing various geoip files to each other.

comment:5 Changed 6 years ago by karsten

Blockfinder is not a GeoIP database of its own, but allows you to import various databases into a SQLite database and query that. It's probably not useful for BridgeDB right now.

I recommend using MaxMind's database. If you don't get their library running, you could simply write your own parsing code to use the src/config/geoip file we ship with tor. The latter has the advantage that A1 Anonymous Proxy ranges are removed, which is not the case in MaxMind's original GeoIP file.

Speaking of MaxMind, they're planning to roll out a new GeoIP2 database version by the end of this year. This is interesting to us, because we won't have to remove A1 ranges from that database anymore. However, we'll have to write new tools to process their databases. I could imagine that we maintain a Python and a Java version of our own GeoIP parsing library that we can use in various products. I guess that means you shouldn't spend days on fixing GeoIP support in BridgeDB, because a few months later, we may have to fix it again.

comment:6 in reply to:  3 Changed 6 years ago by sysrqb

Replying to isis:

Blockfinder uses MaxMind's GeoIP, though when it refreshes the files I believe it just shells out with a wget to get the GeoIP DB file from MaxMind's website and then shoves it into a local sqlite DB. But I could be wrong, I've not touched that part of it's code and I only skimmed over it. Jake would probably know more.

Woops, yes, it definitely does.

I don't have any suggestions, though most Tor projects use MaxMind, so we probably shouldn't stray from that just because the files are not small and it is likely that users already have them taking up a chunk of their disk.

The size of the DB probably impacts OONI more than BridgeDB simply because not many users run Bridgedb locally. BridgeDB also only uses the GeoIP lookup to determine within which country the returned bridges should not be blocked. I'm guessing aagbsn hasn't had a problem with this, yet.

comment:7 in reply to:  4 Changed 6 years ago by sysrqb

Replying to arma:

Replying to isis:

By making this ticket, I just meant that some small amount of time should probably be spent at some point downloading a whole bunch of geoip modules and messing around with them.

Be sure to also look at Karsten's old tech reports comparing various geoip files to each other.

Is this one one you're thinking about? https://research.torproject.org/techreports/geoipdbcomp-2009-10-23.pdf

comment:8 Changed 6 years ago by sysrqb

Keywords: important added

Marking as important based on triage, but I'm unsure about the priority of this, we don't really have an alt right now. "If doing something causes it to break...don't do that." :)

comment:9 in reply to:  8 Changed 6 years ago by isis

Keywords: important removed
Resolution: wontfix
Status: needs_informationclosed

After reading Karstens's research paper, plus the news that:

Replying to karsten:

Speaking of MaxMind, they're planning to roll out a new GeoIP2 database version by the end of this year. This is interesting to us, because we won't have to remove A1 ranges from that database anymore.

means we can't really do anything now.

However, we'll have to write new tools to process their databases. I could imagine that we maintain a Python and a Java version of our own GeoIP parsing library that we can use in various products. I guess that means you shouldn't spend days on fixing GeoIP support in BridgeDB, because a few months later, we may have to fix it again.

This is a separate task from BridgeDB, because the results will be used by BridgeDB, Onionoo, Atlas, Globe, etc. I'm down to help make this library when that time comes.

For now, I'm closing as "wontfix".

Replying to sysrqb:

Marking as important based on triage, but I'm unsure about the priority of this, we don't really have an alt right now. "If doing something causes it to break...don't do that." :)

Seems sane enough. :)

Note: See TracTickets for help on using tickets.