Opened 12 months ago

Closed 8 months ago

#28181 closed enhancement (fixed)

spec: Add to pt-spec.txt control messages going back to main process (tor)

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-spec, tor-pt, 040-roadmap-subtask, postfreeze-ok
Cc: dcf, ahf Actual Points:
Parent ID: Points: 1
Reviewer: ahf Sponsor: Sponsor19

Description

As part of #28180, we want to have a PT to report back events, mainly connection to the bridge events for now, to the parent process which is tor in our case.

This ticket is for adding those to pt-spec.txt.

We'll concentrate on connection messages reporting the status of a connection to a Bridge at first to narrow down a bit this work.

Child Tickets

Change History (34)

comment:1 Changed 12 months ago by dgoulet

Keywords: 036-roadmap-subtask added

comment:2 Changed 12 months ago by dgoulet

Owner: set to dgoulet
Status: newassigned

comment:3 Changed 12 months ago by dcf

Cc: dcf added

My desideratum for this is a way to report a generic error at any time, before or after bootstrapping, and have the error message appear in the tor log.

A problem with pt-spec as it stands is that when a transport plugin encounters a fatal runtime error, the most it can do is write to its own log file (e.g. meek-client.log or obfs4proxy.log). The tor log only reports an unspecific message: "The communication stream of managed proxy '%s' is '%s'. Most probably the managed proxy stopped running. This might be a bug of the managed proxy, a bug of Tor, or a misconfiguration. Please enable logging on your managed proxy and check the logs for errors."

This sucks for debugging because in order to get the actual error message, the user has to hack a torrc file to enable a PT-specific debug log, run Tor Browser again, let it fail again, and then inspect the PT-specific log file. If transport plugins had a way to cause a message to appear in the tor log, the user could just "Copy Tor Log to Clipboard" in Tor Launcher.

The poor error reporting is so bad for usability that we have applied some workarounds to make sure that early errors get reported with CMETHOD-ERROR or SMETHOD-ERROR, because those error strings go to the tor log. For example here, we use CMETHOD-ERROR to report a failure to open the log file, even though we haven't yet gotten any CMETHOD lines from the parent process. Otherwise failure to open the log file is almost impossible to diagnose!

I wrote about this previously on the traffic-obf list. Quoting here:

I ran into an issue that isn't well served by the Tor pt-spec. A user's meek-client was failing and not even leaving a log. The Tor log said only "the communications stream is closed." It turns out that they didn't have permission to write to the log file they had configured, so meek-client was failing very early. meek-client was writing an error message to stderr, which was ignored by Tor. This made it hard to identify the cause of the problem.

https://bugs.torproject.org/20451

The pt-spec offers only these ways for a transport plugin to report an error:

  • VERSION-ERROR
  • ENV-ERROR
  • CMETHOD-ERROR
  • SMETHOD-ERROR
  • PROXY-ERROR

None of these is really appropriate for this situation. I discovered that you can (mis)use CMETHOD-ERROR, SMETHOD-ERROR, or PROXY-ERROR and then Tor will record the error in its log, even if Tor didn't request a corresponding CMETHOD, SMETHOD, or proxy.

I can imagine that the same thing could happen for other reasons, for example bad command line syntax, or some other error could happen even once connections are established that the transport plugin would want to report back to its parent process.

Three years ago we had a discussion (and a patch that was never merged) to treat anything written to stderr as an error message (currently tor ignores the stderr of transport plugins):

https://bugs.torproject.org/9957

obfsproxy may complain about some things (e.g. it not being able to write to its own log file) over stderr. If one runs obfsproxy as intended (using the ServerTransportPlugin directive in torrc), obfsproxy may exit (Tor will report this, of course) without any verbal explanation.

The question came up again when we were discussing what to do if a transport plugin can't create a directory at startup.

https://bugs.torproject.org/12088#comment:9

About error reporting: suppose your transport needs to create its own subdirectories in the state dir. You call pt.MakeStateDir, then you manually create the directories inside it. Suppose the creation of one of these directories fails. Is that an ENV-ERROR? If not, then a failure to create the state dir itself probably should also not be an ENV-ERROR. (Except to the extent that we abuse ENV-ERROR as a general-purpose error logging function.)

CMETHOD-ERROR/SMETHOD-ERROR could be a good way to report these kinds of generic errors that happen at startup. "CMETHOD-ERROR foo failed to create state directory." The only thing is I bet application authors want to handle all the state dir stuff once at the top of their program, before they enter the CMETHOD processing loop. It kind of makes me wish for a GENERIC-ERROR that could appear at any time during negotiation.

comment:4 Changed 12 months ago by dcf

dgoulet posted some info here:
https://lists.torproject.org/pipermail/tor-dev/2018-October/013512.html

Below is the link to the PT spec patch. It adds the EVENT message that for now is only used, as you will see, for reporting connection status message. However, you should see this EVENT message as extendable to whatever the PT would like to report to Tor that we can think of in the future:
https://gitweb.torproject.org/user/dgoulet/torspec.git/commit/?h=ticket28181_01&id=0814114fb39f9f7ddf16de35f97092ed74ffd1ee

3.3.4.1 CONNECTION type.
CONNECTION <transport> <address> <port>
CONNECTION-ERROR <transport> <ErrorType> [<ErrorMessage>]
CONNECTION-SUCCESS <transport>

My concern with this design is that it only has in mind BridgeDB-like transports that work using a single connection to a single IP address. IMO it's already a misdesign that the Bridge line requires an IP address and port, forcing transport like meek to use a dummy address purely for syntactical reasons, which leads to other problems (#18611). Imagine a transport that works by tunneling through UDP DNS requests: what "connection" can you report? What IP address and port are you "connected" to, your local DNS resolver? Or what about a transport that muxes across multiple parallel TCP connections? IMO future transports are increasingly going to move away from the "one static host, one TCP connection" model.

In the case of meek, there's no single connection nor specific IP address, because it's a sequence of HTTPS requests to a server identified by domain name. When a PT PROXY is in use, meek-client may not even know the IP address of the server. We could probably apply come logical hacks and try and fit messages into this reporting model, for instance by immediately reporting CONNECTION meek 0.0.0.3 1 before even touching the network, then waiting for the first HTTPS request and reporting CONNECTION-ERROR if it fails and CONNECTION-SUCCESS if it succeeds. But what if the first HTTPS connection succeeds and the second fails (perhaps due to a temporary server 500 error)? We can't send CONNECTION-ERROR after sending CONNECTION-SUCCESS.

comment:5 Changed 12 months ago by dgoulet

Thanks dcf1! That is very good feedback.

What if instead we turn this around and keep the EVENT message idea so Tor can already start expecting such communication and future impl/spec will be able to use that to send anything back to Tor.

But, we drop that CONNECTION event type that clearly doesn't work with all type of transport. Instead, we could have a simple CONNECTIVITY type which could report a binary state back to Tor saying "Ok I'm able to route traffic now as I connected to the endpoint and negotiated?".

And we can also think that it could have an optional "ErrorMessage" in case it is a bad connectivity?

Again, the goal here is to be able to make the PT tell Tor so Tor can tell the control port so ultimately Tor Browser can inform the user. Nothing more complicated for now.

comment:6 in reply to:  5 Changed 12 months ago by dcf

Thanks for working on this.

Replying to dgoulet:

What if instead we turn this around and keep the EVENT message idea so Tor can already start expecting such communication and future impl/spec will be able to use that to send anything back to Tor.

But, we drop that CONNECTION event type that clearly doesn't work with all type of transport. Instead, we could have a simple CONNECTIVITY type which could report a binary state back to Tor saying "Ok I'm able to route traffic now as I connected to the endpoint and negotiated?".

That's simpler and will work with more kinds of transports, but I'm afraid that some transports will just report CONNECTIVITY OK before doing anything, simply because they may not have anything to do until tor has given them something to send. Like in the case of meek, it doesn't try sending an HTTP request until it has some data to put in it. Potentially we could open the TCP connection to the web server and hold it idle until some data is available, but that would require breaking through a lot of HTTP library abstractions. In this case, we could try immediately sending an HTTP request with an empty body as a connectivity check, but in general that won't always work (and you would have to think about fingerprinting).

Wouldn't this CONNECTIVITY message have the same semantics as the existing SOCKS success/failure messages? Those messages, too, make sense for connection-oriented transports like obfs4, but others can only send back a meaningless "success" message before even doing anything. E.g., obfs4proxy, because it is connection-oriented, receives the SOCKS request, tries the TCP connection, then reports a SOCKS success/failure depending on the state of the TCP connection.
https://gitweb.torproject.org/pluggable-transports/obfs4.git/tree/obfs4proxy/obfs4proxy.go?id=8256fac93c2cf79742725e3aaced5bbe3380fd32#n145
Whereas meek simply reports SOCKS success unconditionally, because it can't do anything until tor has given it some data, and tor won't give it any data until it has reported SOCKS success.
https://gitweb.torproject.org/pluggable-transports/meek.git/tree/meek-client/meek-client.go?id=8a5c6a9cdc4dc37ba77bb041ee48a4320689cc9d#n279

And we can also think that it could have an optional "ErrorMessage" in case it is a bad connectivity?

Again, the goal here is to be able to make the PT tell Tor so Tor can tell the control port so ultimately Tor Browser can inform the user. Nothing more complicated for now.

It would help if I understood what the purpose is from the tor side. What I imagine is that you want intermediate PT connection progress steps to be shown under the Tor Launcher status bar, i.e., some intermediate statuses between BOOTSTRAP_STATUS_STARTING and BOOTSTRAP_STATUS_CONN_DIR. I guess I could be wrong about that.

How about this: transport plugins can send an EVENT INFO <msg> message that causes <msg> to appear in a user-visible status display, like Tor Launcher's status bar. And an EVENT ERROR <msg> that causes <msg> to appear in the user-visible display and in the tor log. Then PTs could report various kinds of messages, like

  • Establishing TCP connection
  • Registering with Snowflake facilitator
  • Connected to Snowflake proxy; currently have X out of Y desired
  • Connection lost to XXXX

I guess there are localization problems if transports can report any free-form text; I'm not sure how tor currently deals with messages like "Establishing an encrypted directory connection".

comment:7 Changed 12 months ago by dgoulet

Thanks dcf1! Lets again turn this around, what do you think about:

LOG <level> <msg>

The idea is for a PT (optional) to be able to report back logging messages to the main process. In the case of Tor/TB, we would use those, write them to our logs at the <level> and send them to the control port.

Then TB can collect the control port messages, decide which one to show the user. It would also be able to gather all the PT logs (from control port or tor logs) and use them for debugging or error report or what not.

comment:8 in reply to:  7 Changed 12 months ago by dcf

Replying to dgoulet:

Thanks dcf1! Lets again turn this around, what do you think about:

LOG <level> <msg>

The idea is for a PT (optional) to be able to report back logging messages to the main process. In the case of Tor/TB, we would use those, write them to our logs at the <level> and send them to the control port.

Yes! LOG is exactly what I've wanted for pluggable transports for a long time. IMO it's the #1 missing feature. I was disappointed when PT 2.0 had nothing to say on the subject, and kind of quit hoping after that.

A loglevel is a bit of a tor-ism, so you should check with non-tor PT implementers to see if they are happy with it. obfs4proxy already has a -logLevel option with values ERROR/WARN/INFO/DEBUG, so you may want to try enabling those options and looking at the log file, as a preview of the kind of messages tor will be receiving.

LOG is great for PT developers; but I fear that's not what this ticket was originally about, which is providing some kind of user interaction feedback during connection establishment. If I'm correct that tor is currently not reporting {SOCKS connection, SOCKS success, SOCKS failure} events through the control port, I think that would be a good place to start, because all PTs already implement that and it seems equivalent to what you proposed in comment:5.

One thing to keep in mind: PT stdout messages are per-process, not per-bridge or per-connection. It may not be the IPC mechanism you're looking for, depending on what you need to accomplish. It's common, for example, to run one client instance of obfs4proxy with multiple bridges simultaneously (with multiple different transports even). If you want log messages about the connection status of just one bridge, LOG won't do that (unless you force the messages to contain the remote IP and port, a bad idea IMO). LOG is only good for global status messages like "cannot open log file" or "panic in subroutine xyz", or for debugging where you don't mind inferring connection info manually. For per-connection status you're better off using SOCKS events.

comment:9 Changed 12 months ago by ahf

Cc: ahf added

comment:10 Changed 12 months ago by dgoulet

Then we should simply go with LOG <msg> and ONLY in the PT 1.0 spec for now... I would prefer the PT 2.0 community to cherry pick this if they want and maybe improve but at least have a better discussion/analysis of this feature.

It seems, without much more work, we can't really come up with a simple format that will apply to all current and future PTs. And, as you mention, the level is quite "Tor specific" and we (ahf and I) think it is a bad idea to require an PT to have knowledge of the tor's log level.

The <msg> should be PT specific that is the format is decided by the PT itself like obfs4 might decide to have one format and meek another.

In other words, for logging (which is what LOG is), Tor shouldn't care about the content of <msg> and simply log it at INFO level and send it to the control port with something like PT_LOG <transport> <msg> event (Tor knows what is the transport internally).

I believe you are also right that we are not fixing the "connection issue" and thus should go with the SOCKS event instead.

comment:11 Changed 11 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:12 Changed 11 months ago by teor

Keywords: 040-roadmap-subtask added; 036-roadmap-subtask removed

0.3.6 is now 0.4.0: changing keywords

comment:13 Changed 11 months ago by dgoulet

Status: assignedneeds_review

After implementation done in #28179, here is the updated branch:

Branch: ticket28181_02

comment:14 in reply to:  13 Changed 11 months ago by dcf

Replying to dgoulet:

After implementation done in #28179, here is the updated branch:

Branch: ticket28181_02

https://gitweb.torproject.org/user/dgoulet/torspec.git/commit/?h=ticket28181_02&id=cf2c60e223391c2e6360c24612638b63fd1951e0

"This message is for a client PT..." Instead of "client PT" you should use the term "PT Proxy". A PT Proxy may be running in either client mode or server mode, and I believe that LOG is meant to be used by both.

comment:15 Changed 11 months ago by dgoulet

I've pushed a fixup commit to mean either client or server.

comment:16 Changed 11 months ago by dgoulet

Reviewer: ahf

comment:17 Changed 10 months ago by ahf

Status: needs_reviewneeds_revision

I think it looks good with one small detail: we write "Client" a couple of places. This changes work for both PT clients and servers.

comment:18 in reply to:  17 Changed 10 months ago by dgoulet

Replying to ahf:

I think it looks good with one small detail: we write "Client" a couple of places. This changes work for both PT clients and servers.

Where do you see this "Client" mention? The section starts with this and that is the only mentions of "client":

This message is for a client or server PT ...

comment:19 Changed 9 months ago by dgoulet

Status: needs_revisionneeds_review

Branch: ticket28181_04

One commit that adds the LOG and STATUS spec rebased on latest master.

comment:20 in reply to:  19 ; Changed 9 months ago by dcf

The quoting rules (String or QuotedString) are fine with me. That was my biggest question at comment:2:ticket:28940. I might even suggest that you make it easier on implementers by not allowing String, only QuotedString (reduces code complexity on the parsing side especially).

The format example has a , but the syntax doesn't actually use ,:

STATUS TYPE=Type <K_1>=<V_1> [<K_2>=<V_2>, ...]

I would like to better understand the reason for STATUS. What should a PT expect the managing process (tor) to do when it receives a STATUS message? What would motivate a PT to send such a message? Or is the feature unspecified deliberately, on the assumption that we will need it for something someday, or that a standard will emerge among developers?

I suppose STATUS is meant to fill the desire for a more "machine-readable" type of message. If that's the case, then I would suggest some changes to the examples, which are currently:

STATUS TYPE=obfs4 CONNECT=Success
STATUS TYPE=obfs4 CONNECT=Failed BRIDGE=<Fingerprint>
STATUS TYPE="some name" PROGRESS="Getting to 42%"

My suggestions are:

STATUS TRANSPORT=obfs4 ADDRESS=198.51.100.123:1234 CONNECT=Success
STATUS TRANSPORT=obfs4 ADDRESS=198.51.100.222:2222 CONNECT=Failed FINGERPRINT=<Fingerprint> ERRSTR="Connection refused"
STATUS TRANSPORT=trebuchet ADDRESS=198.51.100.15:443 PERCENT=42

My reasoning:

  • Use TRANSPORT instead of TYPE to match the terminology in pt-spec.txt. METHOD would also work.
  • To specific a single bridge connection attempt (for success/failure reporting), you need to give both the TRANSPORT and the ADDRESS. If you give just the TRANSPORT there is ambiguity among the many bridges of that transport that may be configured. Just a fingerprint is no good, because a fingerprint is optional on a Bridge line, and if we failed to connect we may not know it.
  • Don't use "some name" as an example of a transport name, because that doesn't match "PT names MUST be valid C identifiers". I added ERRSTR for the sake of showing a quoted string.
  • If aiming for machine readability, prefer 42 to "Getting to 42%".

comment:21 in reply to:  20 Changed 9 months ago by dcf

Replying to dcf:

The quoting rules (String or QuotedString) are fine with me. That was my biggest question at comment:2:ticket:28940. I might even suggest that you make it easier on implementers by not allowing String, only QuotedString (reduces code complexity on the parsing side especially).

Oh, but I just remembered, there's the additional constraint of the PT meta-syntax. It would make lots of strings inexpressible using QuotedString. See §3.3:

<Line> ::= <Keyword> <OptArgs> <NL>
<Keyword> ::= <KeywordChar> | <Keyword> <KeywordChar>
<KeywordChar> ::= <any US-ASCII alphanumeric, dash, and underscore>
<OptArgs> ::= <Args>*
<Args> ::= <SP> <ArgChar> | <Args> <ArgChar>
<ArgChar> ::= <any US-ASCII character but NUL or NL>
<SP> ::= <US-ASCII whitespace symbol (32)>
<NL> ::= <US-ASCII newline (line feed) character (10)>

Because in QuotedString, all 8-bit characters are permitted unless explicitly disallowed and a backslash followed by any character represents that character, it wouldn't be possible to represent strings that contain byte values ≥ 0x80, for example.

I think it's important that any byte sequence be representable, especially in log messages that may contain e.g. filenames that could be in some weird encoding. I would suggest CString "abc\n\t\0\377" or some hex encoding like "abc\x0a\x09\x00\xff" instead.

comment:22 Changed 9 months ago by ahf

Sponsor: Sponsor8Sponsor19

comment:23 Changed 9 months ago by ahf

Status: needs_reviewneeds_revision

dgoulet, I've created https://github.com/torproject/tor/pull/645 which renames TYPE to TRANSPORT as per dcf's comment above.

I also think we should make sure the examples are in the machine readable format.

comment:24 Changed 9 months ago by nickm

Keywords: postfreeze-ok added

Mark some tickets as postfreeze-ok, to indicate that I think they are okay to accept in 0.4.0 post-freeze. Does not indicate that they are all necessary to do postfreeze.

comment:25 Changed 9 months ago by dgoulet

Status: needs_revisionneeds_review

Pushed fixup commits to update what dcf1 proposed: ticket28181_05

ACK on the code.

Last edited 9 months ago by dgoulet (previous) (diff)

comment:26 Changed 9 months ago by ahf

Status: needs_reviewmerge_ready

For Nick: The code that David ack'ed is in ​https://github.com/torproject/tor/pull/645

David's spec patches looks good to me.

comment:27 Changed 9 months ago by dgoulet

Points: 1

comment:28 Changed 9 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

Merged!

comment:29 Changed 9 months ago by dcf

Parent ID: #28180

I asked for feedback from the author of ptadapter, which is one of the few PT manager implementations apart from tor; i.e., it will have to handle LOG and STATUS messages from PT implementations.

I agree with dcf's sentiments in general, that LOG messages are for human consumption and STATUS messages are for machines. The LOG mechanism basically becomes another handler for the logging system, alongside file output, syslog, etc. It might be worthwhile to add some way for the controlling application to indicate allowed log level, to avoid being flooded with debug messages (I know I like to flood my debug logs!)

As for STATUS, if these are messages meant for machines, then the machine is probably expected to take actions on them. It might be a good idea to think about what actions are wanted, and define some standardized keys to trigger these actions. A PERCENT key that moves a progress bar would be one example.

They add, "I think the existing syntax is a good foundation to build on." (Referring to what shipped in tor 0.4.0.1-alpha.)

comment:30 Changed 9 months ago by dcf

Parent ID: #28180

comment:31 Changed 8 months ago by ahf

Parent ID: #28180
Resolution: implemented
Status: closedreopened

comment:32 Changed 8 months ago by ahf

Status: reopenedneeds_review

It looks like we tried to do a change to both tor.git and torspec.git in the same comments here which caused some confusion.

We are still missing to merge the pt-spec.txt changes of this. They can be found in dgoulet's branch ticket28181_05 available at https://gitweb.torproject.org/user/dgoulet/torspec.git/log/?h=ticket28181_05

David requested review of this in comment 25 and I ack'ed the changes in comment 26.

The patches needs to be squashed prior to merging.

Leaving this comment as needs_review to let dgoulet give an ack on whether I got all the details right.

comment:33 Changed 8 months ago by dgoulet

Status: needs_reviewmerge_ready

comment:34 Changed 8 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

Merged!

Note: See TracTickets for help on using tickets.