Opened 7 months ago

Last modified 5 months ago

#28877 assigned enhancement

'GETINFO desc/fingerprints' command to get known relays

Reported by: wagon Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points: 2
Reviewer: Sponsor:

Description

If UseMicrodescriptors is set to 0 in Tor client, the command GETINFO desc/all-recent returns very huge listing which interpreter cannot manage properly. If amount of memory is big, after finishing this command Nyx starts to response slowly on any keyboard input everywhere and in all its windows, not only in interpreter window (restart of Nyx helps). If amount of resources is not that big, it may crash with the following log (I tired to kill it, because my operating system hanged):

Traceback (most recent call last):
  File "/path/to/nyx_git/nyx/run_nyx", line 14, in <module>
    nyx.main()
  File "/path/to/nyx_git/nyx/nyx/__init__.py", line 177, in main
    nyx.starter.main()
  File "/path/to/nyx_git/nyx/stem/util/conf.py", line 289, in wrapped
    return func(*args, config = config, **kwargs)
  File "/path/to/nyx_git/nyx/nyx/starter.py", line 122, in main
    nyx.curses.halt()
  File "/path/to/nyx_git/nyx/nyx/curses.py", line 565, in halt
    with CURSES_LOCK:
  File "/usr/lib/python2.7/threading.py", line 168, in acquire
    me = _get_ident()
KeyboardInterrupt
Exception in thread Event notifier (most likely raised during interpreter shutdown):Exception in thread Tor listener (most likely raised during interpreter shutdown):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner

  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
   File "/usr/lib/python2.7/threading.py", line 763, in run
  File "/path/to/nyx_git/nyx/stem/control.py", line 978, in _event_loop  File "/usr/lib/python2.7/threading.py", line 763, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'Empty'
  File "/path/to/nyx_git/nyx/stem/control.py", line 947, in _reader_loop
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'ControllerError'

Nyx version is this one.

Child Tickets

Change History (14)

comment:1 Changed 7 months ago by atagar

Component: Core Tor/NyxCore Tor/Tor
Summary: Command desc/all-recent slows down or craches NyxPaginate large controller commands like 'GETINFO desc/all-recent'

Hi wagon, I'm gonna toss this over to tor and repurpose it to be for paginated GETINFO commands. You're right that commands like 'GETINFO desc/all-recent' provide megabytes of data.

These GETINFO commands were created in tor's early days when the consensus was pretty small, but nowadays the response of this method is massive. The usual way of dealing with this is to have a paginated API where the caller can provide a starting index and number of responses. This would let Nyx fetch smaller batches without hosing the control connection.

Unfortunately without that there's not much I can do on my end.

comment:2 Changed 7 months ago by wagon

The usual way of dealing with this is to have a paginated API where the caller can provide a starting index and number of responses. This would let Nyx fetch smaller batches without hosing the control connection.

I'm not sure I understand what you mean. There are other commands which return descriptors by id or by name like desc/id/FINGERPRINT, output of these commands is small. However, a purpose of desc/all-recent is to get the full content of the file with descriptors, so how it can be "paginated"? Do you mean some internal TCP transport level?

Unfortunately without that there's not much I can do on my end.

Maybe you can. I started from finding a source of this problem. Initially I thought it may be a Tor side. I checked your instructions for telnet, but telnet works fine with desc/all-recent when invoked from interactive shell. It returns the whole output without any problem. The same is true for netcat and socat if they are used to connect to ControlPort.

However, if any of these tools are not used interactively, but are called inside some wrappers or functions, they all fail. They return some part of desc/all-recent output (each time it is a different part) after which the connection hangs. I looked for a way to solve this problem and found it. Probably, so powerful language as python also has some way to solve this problem.

Your tor-prompt works absolutely fine with desc/all-recent. Only Nyx has this problem. I think it is related to curses interface, i.e. you may need to do pagination inside Nyx to not overload curses API.

Last edited 7 months ago by wagon (previous) (diff)

comment:3 Changed 7 months ago by atagar

so how it can be "paginated"?

Hi wagon. A paginated API is one in which you can receive batches of a limited size. The caller then makes a series of calls to get the full listing. Ignoring the control interface for a minute paginated interfaces look like...

First Request: {
  first_index: 0,
  size: 5,
}

First Response: [
  <descriptor 1>,
  <descriptor 2>,
  <descriptor 3>,
  <descriptor 4>,
  <descriptor 5>,
]

Second Request: {
  first_index: 5,
  size: 5,
}

Second Response: [
  <descriptor 6>,
  <descriptor 7>,
]

The caller sees its second request received only two descriptors (rather than the five requested) so it knows it has received them all.

The reason for a paginated API is to divide the fourteen megabyte GETINFO response controllers presently get into a series of bite sized responses. A massive GETINFO response like this saturates the control connection, preventing further commands and events from being transmitted.

In fact, Nyx used to avoid commands like 'GETINFO ns/all' entirely in favor of reading cached descriptors from tor's data directory. This was far faster and avoids blocking the control socket (effectively all the command does is echo the file), but I've been cautioned that any direct use of tor's data directory is a bad idea.

I'm not overly married to the idea of a paginated API. I'd be delighted to chat with the network team about design ideas, but first step lets be clear about the problem we're trying to address: controllers need the ability to break up multi-megabyte responses into smaller replies so we avoid saturating the control connection.

Pagination might be a poor fit. In particular...

  • GETINFO commands are not designed to take keyword arguments. We could hack this together with positional arguments (GETINFO desc/batch/0/5 then GETINFO desc/batch/5/5 for the example above), but needless to say... ick.
  • Concurrency. If tor downloads new descriptors while we're in the middle of iterating over it's prior ones the caller will conclude with an incorrect enumeration. Usually this would be dealt with by a consensus id argument so the caller can specify the set of descriptors its iterating over but this isn't really how tor is designed.

So TL;DR: We need some way of breaking up these responses. Pagination probably isn't a good fit so ideas welcome.

Maybe you can. I started from finding a source of this problem.

I suspect we're talking about two different things. The above is a long time problem I've had with tor's 'get all descriptor' commands ('GETINFO desc/all-recent', 'GETINFO md/all', and 'GETINFO ns/all'). I'm tackling that topic in this ticket because that's the problem you cited originally ("GETINFO desc/all-recent returns very huge listing which interpreter cannot manage properly.").

Nyx actually avoids making that particular query because doing so would temporarily hose the control connection in the way you describe. I just took a look and unless I'm missing something I'm not finding any 'GETINFO desc/all-recent' calls in nyx.

I suspect your initial hypothesis about the reason Nyx is freezing is inaccurate. Feel free to file a separate ticket with the 'nyx --debug' output when Nyx freezes so I can see what's up. But I'd like to hijack this ticket to brainstorm our long term plan for these bulky GETINFO commands since they've been a long time pain point for me.

comment:4 Changed 7 months ago by atagar

Status: assignednew

comment:5 Changed 7 months ago by atagar

Owner: atagar deleted
Status: newassigned

comment:6 Changed 7 months ago by wagon

Hello, atagar. Thank you for the explanation, now it is more clear for me.

saturates the control connection, preventing further commands and events from being transmitted.

What it actually saturates? Indeed, in some setups I did see this problem, but tor-prompt doesn't have it. Is it some internal TCP or socket thing that is saturated? Can that limit be increased by setting some preference in the system?

but I've been cautioned that any direct use of tor's data directory is a bad idea.

I agree. Furthermore, you cannot use tor's data directory if you are not running Nyx as debian-tor user. If your user is just a member of debian-tor group (this is recommended setup, see discussion in #25890), you still cannot read files in that directory because of #28356.

We need some way of breaking up these responses. Pagination probably isn't a good fit so ideas welcome.

The above is a long time problem I've had with tor's 'get all descriptor' commands

Could you explain why it is a problem? Yes, it may block controller for a few seconds if user type this command in Nyx interpreter, but only in this case. This blocking would happen anyway because printing the output to a terminal always requires some time. If you need this output internally in some functions in Stem or Nyx, you can get it almost immediately. In addition you don't need to run such commands very often. Maybe you need to run them only once, when Nyx is started (then response can be cached for some time).

To be clear, let us check exact numbers. To be sure we are not wrong here because of specifics in implementation of Nyx or Stem, we can use the following simple script test.sh for these tests:

#!/bin/bash -e

cmd="$@"
pass="ControlPortPassword"

function test_tor() {
    echo "$1" >&3
    sed "/^250 OK\r$/q" <&3
    echo QUIT >&3
    exec 3<&-
}

exec 3<>/dev/tcp/127.0.0.1/9051
echo AUTHENTICATE \"$pass\" >&3
read -u 3
test_tor "$cmd"

Now fun begins. Let us check some simple fast command:

$ time tor-prompt --run 'GETINFO version' 1>/dev/null
  0.14s user 0.04s system 77% cpu 0.227 total
$ time ./test.sh 'GETINFO version' 1>/dev/null
  0.00s user 0.00s system 36% cpu 0.011 total

Compare it with more heavy output such as ns/all:

$ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null
  0.28s user 0.07s system 58% cpu 0.591 total
$ time ./test.sh 'GETINFO ns/all' 1>/dev/null
  0.02s user 0.00s system  8% cpu 0.230 total

Now check it with the most resource consuming command, desc/all-recent:

$ time tor-prompt --run 'GETINFO desc/all-recent' 1>/dev/null
Traceback (most recent call last):
  File "/path/to/stem/tor-prompt", line 8, in <module>
    stem.interpreter.main()
  File "/path/to/stem/stem/interpreter/__init__.py", line 151, in main
    interpreter.run_command(args.run_cmd, print_response = True)
  File "/path/to/stem/stem/util/conf.py", line 289, in wrapped
    return func(*args, config = config, **kwargs)
  File "/path/to/stem/stem/interpreter/commands.py", line 381, in run_command
    print(output)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u021b' in position 1237805: ordinal not in range(128)

$ time ./test.sh 'GETINFO desc/all-recent' 1>/dev/null
  0.21s user 0.03s system 60% cpu 0.391 total

People always say that shell is too slow and inconvenient, while python is really fast and convenient. However, as you see, simple shell is 20-30 times faster than python on simple commands, about 2-3 times faster than python on heavy commands, and gets many megabytes output of desc/all-recent within less that half of second. Old UNIX style still rocks. Had you choose Bash or some other shell instead of python for writing Stem?

If I don't redirect output to /dev/null, terminal becomes a bottleneck for these commands. It will take about 5-6 seconds to print decs/all-recent output to a terminal. Nyx interpreter is even more slow than tor-prompt (I think it is due to curses), it will take 15 seconds or like that for this printing. If I don't redirect output to /dev/null, the last tor-prompt command doesn't crash.

Thus, I still doubt we need to ask tor network people to do something with it. As you see, well written tools work fast with current ControlPort implementation.

Feel free to file a separate ticket with the 'nyx --debug' output when Nyx freezes so I can see what's up.

I've created #28902. Now we have many tickets where I didn't get reply from you yet, and our discussion stopped, so I'm afraid I overload you with many tasks or questions.

UPDATE: You deleted yourself from this ticket, but it still has an assigned state, how it can be?

Last edited 7 months ago by wagon (previous) (diff)

comment:7 Changed 7 months ago by atagar

Hello, atagar. Thank you for the explanation, now it is more clear for me.

Thanks wagon! These are interesting stats.

Could you explain why it is a problem?

Do you have a SSD? When I run on my newish laptop I get similar results...

atagar@morrigan:~$ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null

real	0m0.236s
user	0m0.108s
sys	0m0.045s

... but when I run on my admittedly ancient PC it's not so rosy...

atagar@odin:~$ time tor-prompt --run 'GETINFO ns/all' 1>/dev/null

real    0m1.466s
user    0m0.240s
sys     0m0.116s

I suspect this might be a matter of disk iops. Again, this command is reading fourteen megabytes of data from disk then dumping it on the socket. Stem (and by extension Nyx) get ran in low resource environments (arduino and such) where such commands can easily block the control connection for tens of seconds to minutes.

You're absolutely right that in your case (and most people's) these are fine, but these commands are no-gos if distributing an application more broadly.

People always say that shell is too slow and inconvenient, while python is really fast and convenient. However...

These are interesting numbers. I should profile our controller code with this input to figure out where the time's being spent (I made several optimizations for the Stem 1.7 release, but there's probably more room for improvement).

comment:8 Changed 7 months ago by wagon

Do you have a SSD?

I suspect this might be a matter of disk iops.

No, I don't have SSD. However, disk cache should work (further invocations of the same commands are faster than the first ones, I think it is because of disk cache). Now I cannot make an ideal test with the whole system residing in RAM, but I copied Nyx/Stem directory to /tmp (which resides in RAM) and compared it with my above initial tests. I didn't find any difference. Shell is still at least two times faster than tor-prompt.

Again, this command is reading fourteen megabytes of data from disk then dumping it on the socket.

I thought this data is read from memory of tor process, i.e. from RAM. Are you sure Tor makes request to hard drive each time it needs to provide this data?

get ran in low resource environments (arduino and such) where such commands can easily block the control connection for tens of seconds to minutes.

Do you think it also explains the crash of tor-prompt --run 'GETINFO desc/all-recent >/dev/null command?

If pipe is used, shell will also break (this is the reason I had to use descriptors in test.sh). For example, success of the command

( echo AUTHENTICATE \"pass\" ; echo GETINFO ns/all ; echo QUIT ) | nc 127.0.0.1 9051

depends on the time of invocation, existence of redirection to some file or to /dev/null, weather, and so on. If I redirect the output of this command to some file in RAM, I see that it stops after roughly 20k lines are printed. However, it works nice with short output commands.

these commands are no-gos if distributing an application more broadly.

I agree. Let us wait what Tor core people will say.

comment:9 Changed 7 months ago by atagar

Summary: Paginate large controller commands like 'GETINFO desc/all-recent''GETINFO desc/fingerprints' command to get known relays

Are you sure Tor makes request to hard drive each time it needs to provide this data?

I could be wrong on that detail. I'd be a little surprised if tor kept the full server descriptors in memory since the process does not need the full documents and doing so would bloat tor's memory profile but I didn't check what the C codebase does, nor does it honestly matter for this ticket.

I need the ability to iterate over all descriptors without getting such a large response. That said, I just thought of a cheap option that's far simpler than pagination: add a 'GETINFO desc/fingerprints' method that provides the all server descirptor fingerprints tor knows about.

I can call that to get an enumeration of the relays tor has descriptors for, then query them one at a time to break up our fourteen megabyte response. I suspect we'd want separate methods for other descriptor types ('GETINFO md/fingerprints' and 'GETINFO ns/fingerprints') because the relays we have information for are not necessarily the same between document types.

Do you think it also explains the crash of tor-prompt --run 'GETINFO desc/all-recent >/dev/null command?

No, it doesn't. Please file a separate ticket about that so we can keep this ticket narrowly focused. I suspect the issue is that you're using python3, and that tor-prompt is using print() which expect unicode. Server descriptors can have non-ascii content on contact lines which can cause the stacktrace you cited above.

I probably need to add some escaping within tor-prompt.

comment:10 in reply to:  9 ; Changed 7 months ago by teor

Milestone: Tor: unspecified
Points: 2
Type: defectenhancement

Replying to atagar:

I need the ability to iterate over all descriptors without getting such a large response. That said, I just thought of a cheap option that's far simpler than pagination: add a 'GETINFO desc/fingerprints' method that provides the all server descirptor fingerprints tor knows about.

I can call that to get an enumeration of the relays tor has descriptors for, then query them one at a time to break up our fourteen megabyte response. I suspect we'd want separate methods for other descriptor types ('GETINFO md/fingerprints' and 'GETINFO ns/fingerprints') because the relays we have information for are not necessarily the same between document types.

Here's a simple, generic scheme that we could implement:

If path/to/* takes a key, then path/to returns a list of known keys.

For example:
GETINFO desc/id/<OR identity> is an existing command.
We add GETINFO desc/id, which returns a list of known <OR identity> for GETINFO desc/id/<OR identity>.

Controllers should be prepared to handle keys that disappear before they are queried, for example, due to a consensus update.

This code should be relatively easy to implement in Tor.

comment:11 in reply to:  9 Changed 7 months ago by wagon

Replying to atagar:

I could be wrong on that detail. I'd be a little surprised if tor kept the full server descriptors in memory since the process does not need the full documents and doing so would bloat tor's memory profile

teor says some of these commands read it from the disk.

I need the ability to iterate over all descriptors without getting such a large response.

Here we may confuse two different things. There are some big outputs which are not lists of fingerprint-indexed small outputs. For example, an output of the command GETINFO dir/status-vote/current/consensus has some header and footer which are general for the whole list. Should such headers and footers be printed together with fingerprint lists for these types of commands?

Please file a separate ticket about that so we can keep this ticket narrowly focused.

#28921 is created.

comment:12 in reply to:  10 Changed 7 months ago by wagon

Replying to teor:

Here's a simple, generic scheme that we could implement:

If path/to/* takes a key, then path/to returns a list of known keys.

The command GETINFO downloads/desc/descs makes it (however, it will not work if client uses microdescriptors). Maybe you also mean that this command returns some general list, while here we need per-consensus lists.

I see that in some moments some nodes may be in ns/all list, but not in dir/status-vote/current/consensus list.

Last edited 7 months ago by wagon (previous) (diff)

comment:13 Changed 7 months ago by wagon

Do you think it also explains the crash of tor-prompt --run 'GETINFO desc/all-recent' >/dev/null command?

When shebang is fixed according to suggestions in #28921, execution time for desc/all-recent is almost 3 times higher:

$ time tor-prompt --run 'GETINFO desc/all-recent' 1>/dev/null
1.60s user 0.47s system 91% cpu 2.269 total
$ time ./test.sh 'GETINFO desc/all-recent' 1>/dev/null
0.65s user 0.02s system 82% cpu 0.802 total
Last edited 7 months ago by wagon (previous) (diff)

comment:14 Changed 5 months ago by wagon

as you see, simple shell is 20-30 times faster than python on simple commands, about 2-3 times faster than python on heavy commands

Here I copy relevant atagar comment from another thread:

On a side note, were you the person that pointed out that tor-prompt is a bit slow? If so then I just improved that a bit...
https://gitweb.torproject.org/stem.git/commit/?id=0fc61dd

UPDATE: I'll wait for new Stem release to test it. Shell version is also faster, because simple password authentication is faster than safecookie authentication, where, in addition, server reply is also checked at client side.

Last edited 5 months ago by wagon (previous) (diff)
Note: See TracTickets for help on using tickets.