The client logs full SDP stanzas, which include local IP addresses.
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
The websocket server logs an IP address on certain connection errors. (These are probably automated scanners, but still...)
{{{
2019/04/01 07:02:50 http: TLS handshake error from X.X.X.X:YYYY: acme/autocert: missing server name
2019/04/01 07:53:03 http: TLS handshake error from X.X.X.X:YYYY: tls: oversized record received with length 21536
}}}
The broker logs the raw bytes of ICE answers:
{{{
2019/03/22 06:28:32 Received answer: [XXX XXX XXX XXX ... XXX]
}}}
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
I think the highest priority for this ticket is to make sure that IP addresses don't end up in the logs. This can be a little tricky because IP addresses can show up as part of error messages. See scrubError in meek-server for a way of sanitizing certain kinds of errors. However for others, it may be necessary to do regex replacement or similar.
Also there are multiple "snowflake.log"s made by different programs: client, server, broker, proxy-go.
Trac: Cc: arma, Samdney to arma Keywords: starter deleted, N/Aadded
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
The http server panic mentioned above is solved in this way, since the server is set up to log messages to the default logger (set up in server.go to output to the provided log file). If the code is ever changed to give the http.Server structs created in server.go their own logger, we'll need to make sure we add a scrubber to the logger.
One of the problems I can forsee with the current code is that I had to introduce a hack to make the scrubber leave fingerprints alone. This is based off the assumption that logged IPv6 addresses are enclosed in [,] brackets. As far as I can tell, this seems to be the case but it makes me a bit uncomfortable.
What are our feelings about scrubbing fingerprints from logs as well? Is it necessary? The one in particular I'm looking at is the received answer in the client log:
If we don't need the fingerprint, I'd prefer to "over scrub" instead of potentially leaking info if the IPv6 logging violates the above assumption.
I did a cursory glance at the sanitized logs and there are no obvious privacy-related leaks at the moment.
Putting this in needs_review because, while there are other logging changes that could be made, most are concerned with the verbosity of the output which is a separate issue.
We'll need the same functionality across multiple programs: broker, client, proxy-go, server. I think you've what you've done so far—putting the tests in server and copying the code where needed—is good for prototyping. Let me suggest a further simplification: only worry about the server log for now, and then we can perhaps factor the safe-logging code into a separate package for use by the other programs.
In server,
logs are scrubbed whether the --log option is used or not.
In proxy-go though,
logs are only scrubbed when --log is used, otherwise they are left unscrubbed.
Is there a reason for the difference?
I think it's best to default to scrubbed in all cases.
Maybe even in client,
we should run logs through the scrubber before sending them to ioutil.Discard,
both to remind us to use the scrubber after we can rely on #26360 (moved) being fixed and re-enable non-file logging,
and to eliminate one difference in the logging code paths
(imagine a bug in the log scrubber that only manifests when logging to a file).
Maybe use logic like this:
{{{
var logOutput io.Writer = os.Stderr
if logFilename != "" {
f, err := os.OpenFile(logFilename, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0600)
if err != nil {
log.Fatalf("can't open log file: %s", err)
}
defer f.Close()
logOutput = f
}
//We want to send the log output through our scrubber first
log.SetOutput(&logScrubber{logOutput})
}}}
In client, the initial value for logOutput can be ioutil.Discard rather than os.Stderr.
Speaking of which, the default log output in the absence of a --log option should be
os.Stderr, not os.Stdout.
Ref: "That logger writes to standard error..."
There's now a Logger.Writer method that would allow
us to ask for the default output, but as it was only added in 1.12,
it's probably still too new to use.
The logScrubber.Write method only looks at one buffer's worth of data at a time,
so it fails when an IP address is split across buffers. For example, this test fails:
{{{
type byteAtATimeWriter struct {
io.Writer
}
func (w *byteAtATimeWriter) Write(p []byte) (int, error) {
total := 0
for i := range p {
n, err := w.Writer.Write(p[i : i+1])
total += n
if err != nil {
return total, err
}
}
return total, nil
}
func TestLogScrubberSplit(t *testing.T) {
log.SetFlags(0)
var buff bytes.Buffer
log.SetOutput(&byteAtATimeWriter{&logScrubber{&buff}})
log.Print("test 1.2.3.4 test")
if !bytes.Equal(buff.Bytes(), []byte("test X.X.X.X test\n")) {
t.Errorf("%q", buff.Bytes())
}
}
}}}
See below re RedactErrorWriter for an idea on how to deal with this.
You can use !bytes.Equal(a, b) instead of bytes.Compare(a, b) != 0. Try this:
{{{
gofmt -l -w -r 'bytes.Compare(a, b) != 0 -> !bytes.Equal(a, b)' server_test.go
}}}
Or, use buff.String and use string equality.
Should we scrub port numbers, too? A port number could be identifying within a short time span.
See below for more ideas about this.
Consider structuring repetitive tests like this:
{{{
for _, test := range []struct {
input, expected string
}{
{
"http: TLS handshake error from 129.97.208.23:38310:",
"http: TLS handshake error from X.X.X.X:38310:\n",
},
{
"http2: panic serving [2620:101:f000:780:9097:75b1:519f:dbb8]:58344: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack",
"http2: panic serving [X:X:X:X:X:X:X:X]:58344: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack\n",
},
{
//Make sure it doesn't scrub fingerprint
"a=fingerprint:sha-256 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74",
"a=fingerprint:sha-256 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74\n",
},
} {
var buff bytes.Buffer
log.SetOutput(&logScrubber{&buff})
log.Print(test.input)
if buff.String() != test.expected {
t.Errorf("%q: got %q, expected %q", test.input, buff.String(), test.expected)
}
}
}}}
(Not appropriate for every test, because sometimes you want to test more than one line.)
I reached the same conclusion you did: that, because of [[comment:10|net/http default error logging]], string-based grepping for things that look like IP addresses is unavoidable.
I started with a redactError function
that introspects a net.OpError
and replaces the address with a scrubbed address.
Then, you have to remember to call redactError on all errors before logging them.
There are similar functions in meek-server
and obfs4proxy.
redactError on its own is insufficient, because of net/http default logging.
Also, I worry that we'll get an error that contains an IP address but is not a net.OpError.
So I added a RedactErrorWriter type
that is the same idea as logScrubber: it hijacks the Write method and uses regexp to search for addresses.
The principal difference is that it buffers full lines to solve the split-write issue mentioned above.
Is it worth doing both, structure-based object manipulation and string-based patterns? Perhaps not, since the correct functioning of one can mask errors in the other. You could think of it as defense in depth, but it will be hard to consistently use a redactError function everywhere, when forgetting it doesn't have any observable consequences. Surely having both is safer, but the marginal safety increase of redactError may not be worth the (albeit small) code complexity increase.
One other difference is this massive, yet conservative, IP address–detecting regexp. I admit its appearance is comical, but it will detect IPv6 addresses not surrounded by brackets, and avoid false-positive matches like what you mentioned in comment:11. Also, it scrubs port numbers. I would say, test your and my test cases against the pattern you already have; and if there are any misdetections, a more precise regexp may be in order.
Trac: Cc: arma, Samdney to arma, Samdney, dcf Status: needs_review to needs_revision
What are our feelings about scrubbing fingerprints from logs as well? Is it necessary? The one in particular I'm looking at is the received answer in the client log:
{{{
a=ice-options:trickle
a=fingerprint:sha-256 8D:CE:FE:08:F1:AC:32:30:88:D1:B4:1A:34:84:19:C2:43:18:4A:57:A9:20:2F:DC:C2:32:01:38:F9:8B:E5:8C
a=setup:active
a=mid:data
}}}
If we don't need the fingerprint, I'd prefer to "over scrub" instead of potentially leaking info if the IPv6 logging violates the above assumption.
(Sidestepping the question slightly.) In the case of these full SDP stanzas, I think we should just not be logging them at all, not by default anyway.
The historical reason why the client logs these, I believe, has to do with early development of the system, when we would do ICE signaling manually (i.e., copy-and-paste the information from the terminal and into the browser, and vice versa). That's what the "SEND" button at https://snowflake.torproject.org/snowflake.html, and the NewCopyPasteDialer in client, are for. I'm okay with that mode of operation going away.
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
to
For starters, the timestamps are in the local timezone. We can make that UTC
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
The broker logs the raw bytes of ICE answers:
{{{
2019/03/22 06:28:32 Received answer: [XXX XXX XXX XXX ... XXX]
}}}
Replying to dcf:
Thanks for the helpful feedback! I've addressed each of the comments below:
Let me suggest a further simplification: only worry about the server log for now, and then we can perhaps factor the safe-logging code into a separate package for use by the other programs.
This is what I've done so far, the log scrubber is only being used on the server log at the moment.
I think it's best to default to scrubbed in all cases.
Agreed. Used the suggested logic to avoid missing default loggers in the future: server.go#L329
the default log output in the absence of a --log option should be
os.Stderr, not os.Stdout.
Ref: "That logger writes to standard error..."
There's now a Logger.Writer method that would allow
us to ask for the default output, but as it was only added in 1.12,
it's probably still too new to use.
Oops, fixed. I agree that the Logger.Writer method is the best thing to use and it's probably a good idea to wait until we want to update the go version for other reasons as well.
The logScrubber.Write method only looks at one buffer's worth of data at a time,
See below re RedactErrorWriter for an idea on how to deal with this.
Cool, I really like how it works. I implemented that for the log scrubber here: server.go#L91 and added your tests: server_test.go#L54
You can use !bytes.Equal(a, b) instead of bytes.Compare(a, b) != 0.
Done.
Should we scrub port numbers, too?
Yeah, that's a good idea. I've updated the regular expressions to catch the port. See my comments below for what I settled on.
Consider structuring repetitive tests like this:
Done.
Instead of X.X.X.X, tor uses[scrubbed], as do
meek-server
and obfs4proxy.
Nice, I like this better. It also leaks less information this way. Done.
Is it worth doing both, structure-based object manipulation and string-based patterns? Perhaps not, since the correct functioning of one can mask errors in the other. You could think of it as defense in depth, but it will be hard to consistently use a redactError function everywhere, when forgetting it doesn't have any observable consequences. Surely having both is safer, but the marginal safety increase of redactError may not be worth the (albeit small) code complexity increase.
I think for now it's okay to just have the log scrubber. Before I started it, I went through and looked at all of the returned errors and I don't recall finding any net.OpErrors anyway.
One other difference is this massive, yet conservative, IP address–detecting regexp. I admit its appearance is comical, but it will detect IPv6 addresses not surrounded by brackets, and avoid false-positive matches like what you mentioned in comment:11. Also, it scrubs port numbers. I would say, test your and my test cases against the pattern you already have; and if there are any misdetections, a more precise regexp may be in order.
I stumbled across that as well when I wrote the original regexp >.< I did end up taking some elements from it and expanding the regexp that I have to:
My reason for not going with the full one was that I found it unnecessary after expanding my tests with the ones you used. I also found that this massive regexp was still scrubbing the fingerprint, meaning additional changes still had to be made. In particular, this test case failed:
//Make sure it doesn't scrub fingerprint "a=fingerprint:sha-256 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74", "a=fingerprint:sha-256 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74\n",
Go does not yet support look-ahead or look-behind, which is what we'd want to pass this test case. Instead I implemented a work-around that will find regular expressions that check for characters surrounding the address, but only replaces the address portion of it:
const fullAddrPattern = `(^|\s|[^\w:])` + addressPattern + `(\s|(:\s)|[^\w:]|$)`var scrubberPatterns = []*regexp.Regexp{ regexp.MustCompile(fullAddrPattern),}func scrub(b []byte) []byte { scrubbedBytes := b for _, pattern := range scrubberPatterns { // this is a workaround since go does not yet support look ahead or look // behind for regular expressions. scrubbedBytes = pattern.ReplaceAllFunc(scrubbedBytes, func(b []byte) []byte { return addressRegexp.ReplaceAll(b, []byte("[scrubbed]")) }) } return scrubbedBytes}
In the case of these full SDP stanzas, I think we should just not be logging them at all, not by default anyway.
...
That's what the "SEND" button at https://snowflake.torproject.org/snowflake.html, and the NewCopyPasteDialer in client, are for. I'm okay with that mode of operation going away.
I removed this from the client logs, but if there's now unused functionality in the client or proxy code perhaps we should remove that too?
I made some other additional changes:
The regular expressions are now compiled only once, outside of the write function. This will make it easier to add more patterns the scrubber later and probably makes it more efficient
Modified the broker to not print out the raw bytes of ICE answers
Also please let me know if I should be doing something different for attributing the code you wrote for this.
I think this is ready to merge. It looks like we are not keeping permanent logs for snowflake-server; they are getting rotated in /var/log/tor. So we don't have to do anything special to purge the old logs.
Apart from the specific client and broker changes you made here, do the client and broker benefit from using the same log scrubber? I.e., is it worth it to factor out the log scrubber into its own package and share it, after merging it to server?
I think for now it's okay to just have the log scrubber. Before I started it, I went through and looked at all of the returned errors and I don't recall finding any net.OpErrors anyway.
Agreed, I think the string-based scrubbing is the way to go.
My reason for not going with the full one was that I found it unnecessary after expanding my tests with the ones you used. I also found that this massive regexp was still scrubbing the fingerprint, meaning additional changes still had to be made.
This looks good to me.
Go does not yet support look-ahead or look-behind, which is what we'd want to pass this test case. Instead I implemented a work-around that will find regular expressions that check for characters surrounding the address, but only replaces the address portion of it:
This is a good way to do it. I'm also fine if fingerprints get scrubbed, I don't think that necessarily has to be a test case. On the comment "go does not yet support look ahead or look behind," I think as a philosophical matter they plan never ever to implement those.
That's what the "SEND" button at https://snowflake.torproject.org/snowflake.html, and the NewCopyPasteDialer in client, are for. I'm okay with that mode of operation going away.
I removed this from the client logs, but if there's now unused functionality in the client or proxy code perhaps we should remove that too?
Yes; personally I think we can remove all the fifo copy-paste code. Let's do that in another ticket though.
Also please let me know if I should be doing something different for attributing the code you wrote for this.
Nothing needed in this regard.
Trac: Status: needs_review to merge_ready Description: For starters, the timestamps are in the local timezone. We can make that UTC
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
The broker logs the raw bytes of ICE answers:
{{{
2019/03/22 06:28:32 Received answer: [XXX XXX XXX XXX ... XXX]
}}}
to
For starters, the timestamps are in the local timezone. We can make that UTC
The client logs full SDP stanzas, which include local IP addresses.
When the websocket server panics (as in #29125 (moved)), it writes the client IP address to the log:
{{{
2019/01/18 18:56:29 http2: panic serving X.X.X.X:YYYY: interface conversion: *http2.responseWriter is not http.Hijacker: missing method Hijack
}}}
The websocket server logs an IP address on certain connection errors. (These are probably automated scanners, but still...)
{{{
2019/04/01 07:02:50 http: TLS handshake error from X.X.X.X:YYYY: acme/autocert: missing server name
2019/04/01 07:53:03 http: TLS handshake error from X.X.X.X:YYYY: tls: oversized record received with length 21536
}}}
The broker logs the raw bytes of ICE answers:
{{{
2019/03/22 06:28:32 Received answer: [XXX XXX XXX XXX ... XXX]
}}}
Apart from the specific client and broker changes you made here, do the client and broker benefit from using the same log scrubber? I.e., is it worth it to factor out the log scrubber into its own package and share it, after merging it to server?
I would say yes, as a precaution, just because the broker at least also uses the net.http package for connections that involve the client IP address. It would be a very small library... is that a good way to do it?
I removed this from the client logs, but if there's now unused functionality in the client or proxy code perhaps we should remove that too?
Yes; personally I think we can remove all the fifo copy-paste code. Let's do that in another ticket though.
Created ticket #30008 (moved)
Apart from the specific client and broker changes you made here, do the client and broker benefit from using the same log scrubber? I.e., is it worth it to factor out the log scrubber into its own package and share it, after merging it to server?
I would say yes, as a precaution, just because the broker at least also uses the net.http package for connections that involve the client IP address. It would be a very small library... is that a good way to do it?
I think the way to do it is create a new subdirectory containing the code, put package safelog (or whatever) at the top of the file, and then import it where needed using import git.torproject.org/pluggable-transports/snowflake.git/safelog (or whatever).