Opened 3 months ago

Last modified 3 months ago

#28921 needs_information defect

tor-prompt command 'GETINFO desc/all-recent > /dev/null' fails

Reported by: wagon
Owned by: atagar
Priority: Medium
Milestone:
Component: Core Tor/Stem
Version:
Severity: Normal
Keywords: descriptor
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

If redirection is used, it fails:

$ tor-prompt --run 'GETINFO desc/all-recent' 1>/dev/null
Traceback (most recent call last):
  File "/path/to/stem/tor-prompt", line 8, in <module>
    stem.interpreter.main()
  File "/path/to/stem/stem/interpreter/__init__.py", line 151, in main
    interpreter.run_command(args.run_cmd, print_response = True)
  File "/path/to/stem/stem/util/conf.py", line 289, in wrapped
    return func(*args, config = config, **kwargs)
  File "/path/to/stem/stem/interpreter/commands.py", line 381, in run_command
    print(output)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u021b' in position 1237805: ordinal not in range(128)

If redirection is not used, it works.

I think atagar is right in his comment:

I suspect the issue is that you're using python3, and that tor-prompt is using print() which expects unicode. Server descriptors can have non-ascii content on contact lines which can cause the stacktrace you cited above.

I probably need to add some escaping within tor-prompt.
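
A minimal sketch of what such escaping could look like, assuming the output is text and the destination may only accept ASCII. `escape_for_stream` is a hypothetical helper, not part of Stem. It relies on Python's standard `backslashreplace` error handler, so a character the target encoding cannot represent is written as an escape sequence instead of raising UnicodeEncodeError.

```python
def escape_for_stream(text, encoding=None):
    """Return text that can be encoded with `encoding` without raising.

    Hypothetical helper (not part of Stem). Characters the target encoding
    cannot represent are replaced with backslash escapes.
    """
    # Assumption: treat an unknown encoding as ASCII, the most restrictive case.
    encoding = encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    # U+021B is the character named in the traceback above.
    sample = "contact \u021b example relay"
    print(escape_for_stream(sample))
```

Escaping keeps the output readable in an ASCII-only destination at the cost of altering the descriptor text. Writing UTF-8 bytes directly would preserve it, which may be preferable when the output is redirected to a file.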

Change History (11)

comment:1 Changed 3 months ago by atagar

Status: assigned → needs_information

Hi wagon, the above command works for me. Please attach your tor data directory's "cached-descriptors" file so I can see what content it's choking on and get a local repro.

comment:2 Changed 3 months ago by wagon

Please attach your tor data directory's "cached-descriptors" file

Trac limits file size to 2.9 MB, so I had to use somebody's uploader. Download it from here.

I couldn't evaluate privacy issues concerning this file, so I decided to edit the annotations using the command

for f in cached-descriptors* ; do 
sed -i '/^@downloaded-at /{ s/ .*$/ 2018-12-24 00:00:00/ }; 
/^@source .*/{ s/".*"/"208.113.135.162"/ }' "$f" ; 
done

and then encrypt it for you.
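
For readers who prefer Python, here is a rough equivalent of the sed command above. It is a sketch, assuming the annotations look the way the sed patterns imply (`@downloaded-at <timestamp>` and `@source "<address>"` at the start of a line). `scrub_annotations` is a hypothetical name; the replacement values are the ones used in the sed command.

```python
import re

FIXED_TIME = "2018-12-24 00:00:00"
FIXED_SOURCE = "208.113.135.162"


def scrub_annotations(text):
    """Overwrite @downloaded-at timestamps and @source addresses with fixed values."""
    # Replace everything after "@downloaded-at" on lines that start with it.
    text = re.sub(r"^(@downloaded-at) .*$", r"\1 " + FIXED_TIME,
                  text, flags=re.MULTILINE)
    # On "@source" lines, replace the quoted span (first quote to last quote).
    text = re.sub(r'^(@source .*?)".*"', r'\1"' + FIXED_SOURCE + '"',
                  text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = (
        '@downloaded-at 2018-12-20 10:11:12\n'
        '@source "10.0.0.1"\n'
        'router example 192.0.2.1 9001 0 0\n'
    )
    print(scrub_annotations(sample), end="")
```

Only the two annotation lines are touched; descriptor lines such as `router` pass through unchanged.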

comment:3 Changed 3 months ago by atagar

Hi wagon. This is odd, I'm having difficulty reproducing that stacktrace. Non-ascii content appears on the contact lines but this is normal and stem accounts for this...

grep -P "[\x80-\xFF]" cached-descriptors
contact 0xdf0c3d316b7312d5 Alexander Kjäll <alexander.kjall@gmail.com>
contact fogmountain[Ät]gMx-D0t-nEt [tor-relay.co]
contact Mynameis Nobody <bleckbox ät ouvaton dodt org>
contact 0x6FBAB4BD076683498B71AB812C8A7BBF7B85E1AB Oddbjørn Norstrand <oddbjorn AT norstrand dot priv dot no>
contact DrChaos ät gmx dot de
contact 0x775BFC87 Hloupý Honza <dumbjack AT seznam dot cz>

I tried parsing this through stem and printing the output with both python 2.7 and 3.5 but couldn't repro that stacktrace. Are you using the copy of stem from git? If not then please give that a whirl.

I couldn't evaluate privacy issues concerning this file

There are no privacy issues with this file. It's just cached descriptor content. Everyone receives the same and each hour's descriptors are publicly available on CollecTor.

comment:4 in reply to:  3 Changed 3 months ago by wagon

Replying to atagar:

Hi wagon. This is odd, I'm having difficulty reproducing that stacktrace. Non-ascii content appears on the contact lines but this is normal and stem accounts for this...

grep -P "[\x80-\xFF]" cached-descriptors

Your grep doesn't catch all non-ascii characters. For example, I tried this suggestion and found the following:

$ grep -E "[А-Яа-яЁё]" cached-descriptors    
contact ++Питер++ c.m.i(at)mail.ru    ++ hkp://keyserver.ubuntu.com:11371 ++ Bitcoin?  153gfzos233LcSnJpDF5u3q76iVAACwTAd

I have no idea which characters cause the error.
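
The traceback in the description does name the offending character: U+021B, which lies above U+00FF and so falls outside the range given to grep. The sketch below lists every line containing any character above U+007F along with its code points. `non_ascii_lines` is a hypothetical helper, and the sample lines are made up for illustration.

```python
def non_ascii_lines(lines):
    """Yield (line_number, line, characters) for lines with non-ASCII characters."""
    for number, line in enumerate(lines, start=1):
        # Collect each distinct character outside the 7-bit ASCII range.
        found = sorted({ch for ch in line if ord(ch) > 127})
        if found:
            yield number, line.rstrip("\n"), found


if __name__ == "__main__":
    sample = [
        "contact DrChaos \u00e4t gmx dot de\n",
        "contact plain ascii\n",
        "contact \u021b example\n",  # hypothetical line containing U+021B
    ]
    for number, line, chars in non_ascii_lines(sample):
        codes = ", ".join("U+%04X" % ord(ch) for ch in chars)
        print("%d: %s (%s)" % (number, line, codes))
```

To run it against the real file, pass an open file object, for example `open("cached-descriptors", encoding="utf-8", errors="replace")`, assuming the file is UTF-8.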

I tried parsing this through stem and printing the output with both python 2.7 and 3.5 but couldn't repro that stacktrace. Are you using the copy of stem from git? If not then please give that a whirl.

I used this git version. I've also made another interesting observation: tor-prompt from the 1.6.0 release doesn't have this problem.

There are no privacy issues with this file. It's just cached descriptor content. Everyone receives the same and each hour's descriptors are publicly available on CollecTor.

There are privacy issues with this file, because IP addresses in annotations @source are guard nodes your Tor uses. I don't think it is safe to tell everybody which guard nodes you are using now.

comment:5 Changed 3 months ago by atagar

Hi wagon. What python interpreter version are you using?

because IP addresses in annotations @source are guard nodes your Tor uses

Gotcha. From what I understand those aren't particularly sensitive (guard status is public, and that a particular person had one in their pool isn't terribly interesting). That said, no harm in scrubbing it too. Stem doesn't care about those file annotations.

comment:6 Changed 3 months ago by wagon

What python interpreter version are you using?

$ python --version
Python 2.7.9
$ python3 --version
Python 3.4.2

stem 1.6.0 and nyx 2.0.4 are listed in pip3 list, so I expect they use 3.4.2. However, the stable versions don't have the problem, while the git version of stem does.

guard status is public, and that a particular person had one in their pool isn't terribly interesting

Nevertheless, the Tor project works hard to block all guard discovery attacks. This is the node which knows the IP of the Tor user. If the guard is known it is very simple to deanonymize the Tor user.

comment:7 Changed 3 months ago by atagar

This is the node which knows the IP of the Tor user. If the guard is known it is very simple to deanonymize the Tor user.

This is incorrect. Eavesdropping on a guard simply tells you that someone is using tor. Not where they're going. Deanonymization, say via a correlation attack, requires monitoring both your entry *and* exit traffic.

The strength of tor is this distribution of information. It's easy (albeit malicious and in many places illegal) to snoop on who uses tor by running a guard. It's also easy to figure out what tor is being used for by snooping on an exit. What tor makes difficult is deanonymizing you, which requires both at the same time and, thanks to the size of the tor network, is hard to achieve.

comment:8 Changed 3 months ago by atagar

$ python --version

You misunderstand - my question was what python version are you using when you get this exact stacktrace. Do you get stacktraces with both python 2.7 and 3.4? If so then please provide the stacktrace you get for each since they should not be identical.

However, the stable versions don't have the problem, while the git version of stem does.

Huh. Then I'm stumped. I just reviewed the code changes between 1.7.0 and the present git master. They do not include any changes to the interpreter, and because you're running a GETINFO command (rather than a descriptor fetching method) none of the descriptor parsing modules would be at play.

No doubt there's an encoding issue here but I'm having difficulty seeing how version 1.7.0 could succeed in this respect whereas the git codebase fails.

comment:9 Changed 3 months ago by wagon

You misunderstand - my question was what python version are you using when you get this exact stacktrace. Do you get stacktraces with both python 2.7 and 3.4? If so then please provide the stacktrace you get for each since they should not be identical.

OK, this should make it clearer. The old version, 1.6.0, which works fine, was installed using pip3:

$ pip3 show stem
---
Name: stem
Version: 1.6.0
Location: /usr/local/lib/python3.4/dist-packages
Requires:

Inside tor-prompt it reports itself as 3.4.2:

>>> import sys; print(sys.version)
3.4.2 (default, Sep 25 2018, 22:02:39)
[GCC 4.9.2]

The first line of the tor-prompt file is #!/usr/bin/python3.

Now, let's consider the new version installed from git. In tor-prompt it says:

>>> import sys; print(sys.version)
2.7.9 (default, Sep 25 2018, 20:42:16)
[GCC 4.9.2]

The first line of the tor-prompt executable is #!/usr/bin/env python.

On my system both python2 and python3 are installed. Because the header says python, the git version runs under python2 and fails. If I edit the file and replace python with python3, the error disappears.
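
A quick way to check which interpreter a script asks for is to read its first line. This is a sketch only; `read_shebang` is a hypothetical helper, and the demonstration writes a throwaway file rather than touching a real tor-prompt install.

```python
import os
import tempfile


def read_shebang(path):
    """Return the interpreter line of a script without the leading '#!', or None."""
    with open(path, "rb") as script:
        first_line = script.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


if __name__ == "__main__":
    # Throwaway file standing in for a script such as tor-prompt.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write("#!/usr/bin/env python\nprint('hi')\n")
    try:
        print(read_shebang(handle.name))
    finally:
        os.remove(handle.name)
```

A result of `/usr/bin/env python` means the interpreter is whatever `python` resolves to on the PATH, which is why the same script can run under different major versions on different machines.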

I think you should stick to the python3 version in your git code if the user has python3 installed. Plain python should be used only as a fallback, when the user doesn't have python3. Since everything works in python3, and python2 will stop being supported someday anyway, you could fix the headers and mark this ticket as resolved.

I'm having difficulty seeing how version 1.7.0 could succeed in this respect

It was 1.6.0. However, as I've just written, both versions work with python3.

This is incorrect. Eavesdropping on a guard simply tells you that someone is using tor. Not where they're going. Deanonymization, say via a correlation attack, requires monitoring both your entry *and* exit traffic.

I look at this from a simple theoretical point of view. Anonymity is the indistinguishability of somebody (a particular user) within a so-called "anonymity set" (all tor users) for some outside observer. If this observer knows nothing about you except "you are a tor user", you get the best anonymity achievable in the Tor network.

If you start logging in to trac with a particular user name, you are no longer (ideally) anonymous to this observer, but pseudonymous. It means anybody can see what you are doing in the Tor network, but nobody knows who you are. Pseudonymous users are less anonymous.

Then, if this observer knows even more about you or your network connection, your "anonymity set" shrinks further. When nothing is known about you except your habit of using tor, you are one in 2 million, so the anonymity set is big, which makes it hard to geolocate you. If your guard is known, then for a powerful adversary your anonymity set is reduced to the number of tor users who selected that particular guard node, about 1 thousand users. As you see, by disclosing your guard you reduce your anonymity set 2000 times, which makes targeted correlation attacks simpler.
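
The arithmetic behind the 2000-fold figure can be written out as follows. The two input numbers are the estimates given in the comment itself. Expressing the loss in bits is an added illustration that assumes every user in a set is equally likely.

```python
import math

total_users = 2000000    # "one in 2 million", estimate from the comment
users_per_guard = 1000   # "about 1 thousand users", estimate from the comment

# How many times smaller the anonymity set becomes once the guard is known.
reduction_factor = total_users / users_per_guard

# Same reduction expressed in bits, assuming a uniform distribution over users.
bits_lost = math.log2(reduction_factor)

print("reduction factor: %d" % reduction_factor)
print("anonymity lost: %.1f bits" % bits_lost)
```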

To get the basic idea of what anonymity in a network is, look at the definitions used for Chaum mixes and see how they were developed further in the Tor routing protocol. The first Tor papers by Roger, Paul, and Nick should be easy for you to read.

comment:10 Changed 3 months ago by atagar

I think you should stick to the python3 version in your git code

Ah, gotcha. The shebang does not differ between Stem 1.7.0 and the current master branch...

https://gitweb.torproject.org/stem.git/tree/tor-prompt?h=1.7.0
https://gitweb.torproject.org/stem.git/tree/tor-prompt?h=master

The shebang probably differs due to how you installed it. If installed via pip then iirc it rewrites shebangs to use the interpreter that you installed through.

Anywho, mystery solved. So this problem is with python 2.7 and not 3.4. I'll take another look in light of that later.
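
This also fits the symptom that only redirected output fails. On Python 2, `sys.stdout.encoding` is `None` when stdout is not a terminal, so printing unicode falls back to the ASCII codec; on Python 3 the encoding normally comes from the locale. The sketch below reports what a given interpreter sees. `describe_stream` is a hypothetical helper, and it writes to stderr so the report stays visible when stdout is redirected.

```python
import sys


def describe_stream(stream):
    """Summarize the interpreter version and how `stream` will encode text."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        # Objects without isatty() are treated as not being a terminal.
        "isatty": bool(getattr(stream, "isatty", lambda: False)()),
        "encoding": getattr(stream, "encoding", None),
    }


if __name__ == "__main__":
    sys.stderr.write("%r\n" % (describe_stream(sys.stdout),))
```

Running the script once normally and once with `1>/dev/null`, under each interpreter, shows whether the encoding changes with redirection.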

comment:11 Changed 3 months ago by wagon

The shebang does not differ between Stem 1.7.0 and the current master branch...

Yes. The same is true for 1.6.0. All of them have #!/usr/bin/env python as shebang.

The shebang probably differs due to how you installed it. If installed via pip then iirc it rewrites shebangs to use the interpreter that you installed through.

It seems to be true. I installed my 1.6.0 using pip3, so shebang was rewritten to use python3 explicitly. However, when I followed your recommendations and directly used git, I didn't get this shebang rewritten, so it sticks to python2:

$ file /usr/bin/python
/usr/bin/python: symbolic link to python2.7

I was confused by the whereis ordering, which lists python3 versions first:

$ whereis python
python:
/usr/bin/python3.4m
/usr/bin/python3.4
/usr/bin/python
/usr/bin/python2.7
/usr/lib/python3.4
/usr/lib/python2.6
/usr/lib/python2.7
/etc/python3.4
/etc/python
/etc/python2.7
/usr/local/lib/python3.4
/usr/local/lib/python2.7
/usr/include/python2.7
/usr/share/python
/usr/share/man/man1/python.1.gz

Nevertheless, as the symlink shows, python2 is used by default.
