Opened 16 months ago

Last modified 7 weeks ago

#22212 merge_ready defect

[warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 164729ms in the past. Did the monotonic clock just jump?

Reported by: arma | Owned by: arma
Priority: Medium | Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor | Version: Tor: 0.3.1.1-alpha
Severity: Normal | Keywords: tor-client, 032-backport, 031-unreached-backport
Cc: mikeperry | Actual Points:
Parent ID: | Points:
Reviewer: mikeperry | Sponsor: Sponsor2

Description

Normal Tor client, running Tor 0.3.1.0-alpha-dev 2e4f3b36bdfd5118, got this log line:

May 10 02:50:11.217 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 164729ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.0-alpha-dev 2e4f3b36bdfd5118)

My tor client had been idle for hours before this point, and the log message coincides with a new request that I made. I'm pretty sure there was no clock jump.

Alas, I have no more detailed logs. I'll try to reproduce with logs.

Child Tickets

Attachments (1)

info.log (165.6 KB) - added by teor 15 months ago.
Client log with padding error produced using chutney ref nodes.1495684579


Change History (43)

comment:1 Changed 16 months ago by nickm

Cc: mikeperry added

comment:2 Changed 15 months ago by mikeperry

Hrmm. Noting some thoughts while we wait for more info or other instances: this could have happened if Tor stopped running its once-per-second housekeeping while you were reconnecting.

A delta that large indicates a padding time was chosen for the channel, but a timer wasn't scheduled yet. Then that channel was forgotten for about two minutes, until the connection housekeeping started running again and we tried to schedule the padding timer, only to find it should have fired two minutes earlier...

comment:3 Changed 15 months ago by mikeperry

(Either way, this is not catastrophic. Once that warn fired, padding was sent immediately, and the timestamp was reset...)

comment:4 Changed 15 months ago by arma

I have not yet been able to reproduce.

Changed 15 months ago by teor

Attachment: info.log added

Client log with padding error produced using chutney ref nodes.1495684579

comment:5 Changed 15 months ago by teor

Version: Tor: 0.3.1.1-alpha

This happened to me once when I was using chutney.
Since I am using a VM, there might have been an actual clock jump.

comment:6 in reply to:  5 ; Changed 15 months ago by arma

Replying to teor:

This happened to me once when I was using chutney.
Since I am using a VM, there might have been an actual clock jump.

You have log lines for that second, the second before it, and the second after it. So I am tempted to think there was no clock jump.

Also, you might enjoy "LogTimeGranularity 1" in your chutney torrc files. :)

comment:7 in reply to:  6 Changed 15 months ago by teor

Replying to arma:

...
Also, you might enjoy "LogTimeGranularity 1" in your chutney torrc files. :)

Added in #22373.

comment:8 Changed 15 months ago by mikeperry

Hrmm. So in this log it looks like circuit construction delayed the connection housekeeping (run_scheduled_events()) for a few seconds. That delay meant the padding time elapsed before channelpadding_decide_to_pad_channel() was called, so the time was already in the past by the time it ran. See also https://trac.torproject.org/projects/tor/ticket/16585

I am not sure what to do about this. I could change the code to always schedule an actual timer right away and cancel it if data is sent, instead of waiting until we get within a second of the padding time before scheduling a timer. That may mean more timer churn, but now that the timers are more efficient, perhaps this is not an issue.

Or, we could demote the log message to notice.

In terms of when padding actually gets sent, I'm guessing it still won't happen any earlier with the schedule-and-cancel timer approach, because if this is #16585, libevent wouldn't call us back any sooner anyway.

This makes me think we should just demote the message and not increase the code complexity. Any objections?
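
For illustration, here is a minimal sketch of the two scheduling strategies being weighed. It is written against plain libevent rather than the actual channelpadding code, and padding_due_cb, schedule_padding_eagerly, and cancel_padding are assumed names, not Tor functions:

#include <event2/event.h>

/* Hypothetical sketch only -- not Tor's channelpadding implementation.
 * "Eager" strategy: arm a real timer as soon as a padding time is chosen,
 * and cancel it if the channel sends real data first.  The current "lazy"
 * strategy instead waits for the once-per-second housekeeping to notice
 * that the padding time is within a second, so a stalled event loop makes
 * the timeout land in the past. */
static struct event *
schedule_padding_eagerly(struct event_base *base, long ms_until_pad,
                         event_callback_fn padding_due_cb, void *chan)
{
  struct timeval tv = { ms_until_pad / 1000, (ms_until_pad % 1000) * 1000 };
  struct event *padding_ev = evtimer_new(base, padding_due_cb, chan);
  evtimer_add(padding_ev, &tv);   /* fires even if housekeeping stalls */
  return padding_ev;
}

/* Called when the channel sends real data: padding is no longer needed. */
static void
cancel_padding(struct event *padding_ev)
{
  event_del(padding_ev);
}

The trade-off mentioned above is timer churn: one armed timer per padded channel instead of a single once-per-second sweep.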

Last edited 15 months ago by mikeperry

comment:9 Changed 14 months ago by toralf

I got that message 3 times within the last 48 hours with 0.3.1.3-alpha, e.g.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

comment:10 in reply to:  9 ; Changed 14 months ago by arma

Replying to toralf:

I got that message 3 times within the last 48 hours with 0.3.1.3-alpha, e.g.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

What were the numbers you got for the other two times? (Apparently the numbers will help Mike guess about the bug.)

Last edited 14 months ago by arma

comment:11 Changed 14 months ago by nickm

Owner: set to mikeperry
Status: new → assigned

comment:12 Changed 14 months ago by mikeperry

Status: assigned → needs_review

Ok, I demoted this to notice if the callback is over 10s late, and to info if it is less than that. I don't think there is anything more we can do here until we're either multi-threaded or we have a solution for #16585.

See mikeperry/bug22212 (7914091445e076871bc9fdd41f8304a9b3af8e8c).

comment:13 Changed 14 months ago by nickm

Status: needs_review → needs_revision

looks ok, needs a changes file?

comment:14 Changed 14 months ago by nickm

Also you could write the body as:

  if (ms_until_pad_for_netflow < 0) {
    int severity = (ms_until_pad_for_netflow < -DFLT_ETC) ? LOG_NOTICE : LOG_INFO;
    log_fn(severity, LD_OR, ...);
    return 0;
  }

but I don't know if that's actually any more clear.

Version 0, edited 14 months ago by nickm

comment:15 in reply to:  10 Changed 14 months ago by toralf

Replying to arma:

Replying to toralf:

I got that message 3 times within the last 48 hours with 0.3.1.3-alpha, e.g.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

What were the numbers you got for the other two times? (Apparently the numbers will help Mike guess about the bug.)

mr-fox linux # grep channel /tmp/warn.log  | cut -f2- -d')' | cut -f1 -d'.'
: Bug: Channel padding timeout scheduled 39ms in the past
: Bug: Channel padding timeout scheduled 1396ms in the past
: Bug: Channel padding timeout scheduled 25ms in the past
: Bug: Channel padding timeout scheduled 516ms in the past
: Bug: Channel padding timeout scheduled 395ms in the past
: Bug: Channel padding timeout scheduled 979ms in the past
: Bug: Channel padding timeout scheduled 25ms in the past
: Bug: Channel padding timeout scheduled 17ms in the past
: Bug: Channel padding timeout scheduled 114ms in the past
: Bug: Channel padding timeout scheduled 1163ms in the past

comment:16 Changed 14 months ago by mikeperry

Status: needs_revision → needs_review

Ok, I added a changes file and refactored that check. I also lowered the time limit for the notice line to 4.5 seconds, which is the point at which we start to risk allowing a new netflow record to be emitted.
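
A rough sketch of what that severity check amounts to (NETFLOW_MISSED_LIMIT_MS and late_padding_severity are assumed names, not the merged code; the sketch uses syslog's severity macros for self-containment, whereas Tor defines its own):

#include <syslog.h>  /* LOG_NOTICE / LOG_INFO */

/* If the callback is later than this, the connection may already have idled
 * long enough for the upstream router to emit a new netflow record. */
#define NETFLOW_MISSED_LIMIT_MS 4500

/* Pick a log severity for a padding callback that fired ms_late too late:
 * small lateness is ordinary event-loop jitter (info), large lateness may
 * have defeated the padding's purpose (notice). */
static int
late_padding_severity(long ms_late)
{
  return (ms_late > NETFLOW_MISSED_LIMIT_MS) ? LOG_NOTICE : LOG_INFO;
}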

comment:17 Changed 14 months ago by nickm

Resolution: fixed
Status: needs_review → closed

squashed and merging, in 0.3.1 and forward.

comment:18 Changed 14 months ago by toralf

Jul 03 20:51:39.000 [notice] Tor 0.3.1.4-alpha (git-fab91a290ded3e74) opening log file.
Jul 03 20:51:44.000 [notice] Channel padding timeout scheduled 519741ms in the past.

comment:19 in reply to:  18 Changed 14 months ago by teor

Resolution: fixed
Status: closed → reopened

Replying to toralf:

Jul 03 20:51:39.000 [notice] Tor 0.3.1.4-alpha (git-fab91a290ded3e74) opening log file.
Jul 03 20:51:44.000 [notice] Channel padding timeout scheduled 519741ms in the past.

Skipping 520 seconds of padding is concerning, re-opening this issue.

comment:20 Changed 14 months ago by asn

FWIW, I got this warning on a chutney network after I suspended my laptop for a few seconds while the network was running. Unfortunately, I've lost the logs, but I'm pretty sure the branch included the fix above.

comment:21 Changed 12 months ago by toralf

I added a 2nd Tor exit relay to my machine, listening on ports 9001 and 9030, IPv4 and IPv6.
The new one is 6EABEBF38CE7E3DF672C4DB01383606FE3EB2215 (the old one is 1AF72E8906E6C49481A791A6F8F84F8DFEBBB2BA).
Both were started at the same time after rebooting the machine, and their notice logs are nearly identical, except for the message below, which the old, mature exit relay logged a few minutes ago:

Aug 26 10:27:48.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Aug 26 10:27:48.000 [notice] Read configuration file "/etc/tor/torrc".
Aug 26 10:27:48.000 [notice] Tor 0.3.1.5-alpha (git-83389502ee631465) opening log file.
Aug 26 10:27:49.000 [notice] Channel padding timeout scheduled 231988ms in the past. 

The new exit (just a few days old) didn't show this behavior.

comment:22 Changed 12 months ago by nickm

(Mike, any insight here?)

comment:23 Changed 11 months ago by mikeperry

Two lines is not a whole lot of context here. It's also not clear whether these are all happening after a SIGHUP, after a fresh start of Tor, or both.

comment:24 Changed 11 months ago by mikeperry

That said, if this happens right after a fresh start of Tor, then it may be caused by us not setting chan->timestamp_xfer_ms on channels before that code is reached. It's odd that this is suddenly happening now, though. Perhaps some change in the channel code is causing us to not set that field in some cases? I can look into that, I guess.
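
As a sketch of why that hypothesis would produce exactly this symptom (this is an illustration only, not Tor code; the function name and the assumption that the timestamp is a millisecond counter are mine):

#include <stdint.h>

/* Illustration of the fresh-start hypothesis above: if
 * chan->timestamp_xfer_ms was never set, it reads as 0, so the computed
 * padding time is "padding_timeout_ms after the clock's zero point" --
 * which can already be far in the past once the process has been running
 * for a while, yielding a large negative "time until pad". */
static int64_t
ms_until_pad_with_unset_timestamp(int64_t now_ms, int64_t padding_timeout_ms)
{
  const int64_t timestamp_xfer_ms = 0;            /* never updated */
  return (timestamp_xfer_ms + padding_timeout_ms) - now_ms;
}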

comment:25 Changed 10 months ago by Logforme

My non-exit relay (855BC2DABE24C861CD887DB9B2E950424B49FC34) has logged a lot of these messages since I upgraded to 0.3.1.7:

Oct 13 22:22:23.000 [notice] Channel padding timeout scheduled 171606ms in the past.
Oct 13 22:22:32.000 [notice] Channel padding timeout scheduled 145225ms in the past.
Oct 13 22:22:36.000 [notice] Channel padding timeout scheduled 178832ms in the past.
Oct 13 22:22:47.000 [notice] Channel padding timeout scheduled 183913ms in the past.
Oct 13 22:22:47.000 [notice] Channel padding timeout scheduled 169665ms in the past.
Oct 13 22:23:10.000 [notice] Channel padding timeout scheduled 159677ms in the past.
Oct 13 22:23:11.000 [notice] Channel padding timeout scheduled 193190ms in the past.
Oct 13 22:23:11.000 [notice] Channel padding timeout scheduled 195678ms in the past.
Oct 13 22:23:12.000 [notice] Channel padding timeout scheduled 172014ms in the past.
Oct 13 22:23:14.000 [notice] Channel padding timeout scheduled 226954ms in the past.
Oct 13 22:23:17.000 [notice] Channel padding timeout scheduled 173160ms in the past.
Oct 13 22:23:17.000 [notice] Channel padding timeout scheduled 212215ms in the past.
Oct 13 22:23:19.000 [notice] Channel padding timeout scheduled 203001ms in the past.
Oct 13 22:23:19.000 [notice] Channel padding timeout scheduled 160855ms in the past.
Oct 13 22:23:19.000 [notice] Channel padding timeout scheduled 237575ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 200659ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 213746ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 218091ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 181264ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 173220ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 185291ms in the past.
Oct 13 22:23:20.000 [notice] Channel padding timeout scheduled 221698ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 211148ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 238334ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 255031ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 165644ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 208372ms in the past.
Oct 13 22:23:21.000 [notice] Channel padding timeout scheduled 185413ms in the past.
Oct 13 22:23:22.000 [notice] Channel padding timeout scheduled 248112ms in the past.
Oct 13 22:23:22.000 [notice] Channel padding timeout scheduled 231464ms in the past.
Oct 13 22:23:23.000 [notice] Channel padding timeout scheduled 216373ms in the past.
Oct 13 22:23:23.000 [notice] Channel padding timeout scheduled 158121ms in the past.
Oct 13 22:23:23.000 [notice] Channel padding timeout scheduled 222485ms in the past.
Oct 13 22:23:24.000 [notice] Channel padding timeout scheduled 206899ms in the past.
Oct 13 22:23:24.000 [notice] Channel padding timeout scheduled 196966ms in the past.
Oct 13 22:23:24.000 [notice] Channel padding timeout scheduled 182872ms in the past.
Oct 13 22:23:25.000 [notice] Channel padding timeout scheduled 204818ms in the past.
Oct 13 22:23:25.000 [notice] Channel padding timeout scheduled 212764ms in the past.
Oct 13 22:23:25.000 [notice] Channel padding timeout scheduled 146586ms in the past.
Oct 13 22:23:25.000 [notice] Channel padding timeout scheduled 200461ms in the past.
Oct 13 22:23:25.000 [notice] Channel padding timeout scheduled 227628ms in the past.
Oct 13 22:23:26.000 [notice] Channel padding timeout scheduled 203696ms in the past.
Oct 13 22:23:26.000 [notice] Channel padding timeout scheduled 197219ms in the past.
Oct 13 22:23:27.000 [notice] Channel padding timeout scheduled 214081ms in the past.
Oct 13 22:23:27.000 [notice] Channel padding timeout scheduled 204242ms in the past.
Oct 13 22:23:27.000 [notice] Channel padding timeout scheduled 205723ms in the past.
Oct 13 22:23:28.000 [notice] Channel padding timeout scheduled 214406ms in the past.
Oct 13 22:23:28.000 [notice] Channel padding timeout scheduled 235882ms in the past.
Oct 13 22:23:29.000 [notice] Channel padding timeout scheduled 203412ms in the past.
Oct 13 22:23:30.000 [notice] Channel padding timeout scheduled 243612ms in the past.
Oct 13 22:23:30.000 [notice] Channel padding timeout scheduled 213504ms in the past.
Oct 13 22:23:30.000 [notice] Channel padding timeout scheduled 188307ms in the past.
Oct 13 22:23:31.000 [notice] Channel padding timeout scheduled 200576ms in the past.
Oct 13 22:23:31.000 [notice] Channel padding timeout scheduled 216407ms in the past.
Oct 13 22:23:33.000 [notice] Channel padding timeout scheduled 179621ms in the past.
Oct 13 22:23:33.000 [notice] Channel padding timeout scheduled 192193ms in the past.
Oct 13 22:23:33.000 [notice] Channel padding timeout scheduled 183206ms in the past.
Oct 13 22:23:33.000 [notice] Channel padding timeout scheduled 194950ms in the past.
Oct 13 22:23:34.000 [notice] Channel padding timeout scheduled 216542ms in the past.
Oct 13 22:23:34.000 [notice] Channel padding timeout scheduled 207604ms in the past.
Oct 13 22:23:35.000 [notice] Channel padding timeout scheduled 204013ms in the past.
Oct 13 22:23:35.000 [notice] Channel padding timeout scheduled 193978ms in the past.
Oct 13 22:23:36.000 [notice] Channel padding timeout scheduled 206517ms in the past.
Oct 13 22:23:39.000 [notice] Channel padding timeout scheduled 189352ms in the past.
Oct 13 22:23:39.000 [notice] Channel padding timeout scheduled 228094ms in the past.
Oct 13 22:23:40.000 [notice] Channel padding timeout scheduled 209217ms in the past.
Oct 13 22:23:43.000 [notice] Channel padding timeout scheduled 200273ms in the past.
Oct 13 22:23:43.000 [notice] Channel padding timeout scheduled 237556ms in the past.
Oct 13 22:23:45.000 [notice] Channel padding timeout scheduled 197998ms in the past.
Oct 13 22:23:50.000 [notice] Channel padding timeout scheduled 239077ms in the past.
Oct 13 22:23:50.000 [notice] Channel padding timeout scheduled 220076ms in the past.
Oct 13 22:23:51.000 [notice] Channel padding timeout scheduled 204667ms in the past.
Oct 13 22:23:54.000 [notice] Channel padding timeout scheduled 207857ms in the past.
Oct 13 22:23:56.000 [notice] Channel padding timeout scheduled 233575ms in the past.
Oct 13 22:23:59.000 [notice] Channel padding timeout scheduled 203337ms in the past.
Oct 13 22:24:01.000 [notice] Channel padding timeout scheduled 197489ms in the past.
Oct 13 22:25:27.000 [notice] Channel padding timeout scheduled 232756ms in the past.

Relay uptime was 15 days when the above was logged. Last SIGHUP was at 06:25, 16 hours before. The relay runs NTP and keeps correct time.

comment:26 Changed 10 months ago by nickm

Sponsor: Sponsor2

comment:27 Changed 10 months ago by fcornu

Hi

Same here on a non-exit relay (B143D439B72D239A419F8DCE07B8A8EB1B486FA7) freshly upgraded to 0.3.1.7, also on a host running NTP.

First occurrence was about 25 minutes after the relay started:

Oct 25 18:06:47.000 [notice] Channel padding timeout scheduled 230620ms in the past.
Oct 25 18:43:39.000 [notice] Channel padding timeout scheduled 216408ms in the past.
Oct 25 19:31:04.000 [notice] Channel padding timeout scheduled 242298ms in the past.
Oct 25 20:39:16.000 [notice] Channel padding timeout scheduled 540349ms in the past.
Oct 25 23:11:53.000 [notice] Channel padding timeout scheduled 221720ms in the past.

comment:28 Changed 8 months ago by dgoulet

Milestone: Tor: 0.3.1.x-final → Tor: 0.3.2.x-final

Putting this in 032 so we don't miss it. This notice-level message is quite common in normal relay logs; I get maybe 20-30 a day on a fast relay.

Shouldn't we demote it to info? I'm not sure this is worth telling the operator, because the operator can't do anything about it, or at best can sync their clock...

comment:29 Changed 8 months ago by cypherpunks

Keywords: tor-client added

0.3.1.9 shows this in Tor Browser on Windows 7.

comment:30 in reply to:  28 Changed 7 months ago by arma

Replying to dgoulet:

Shouldn't we demote it to info? I'm not sure this is worth telling the operator, because the operator can't do anything about it, or at best can sync their clock...

I'm a bit nervous about quieting it -- it seems like there's a real bug here, right? My relay has lines like

Dec 21 14:19:54.371 [notice] Channel padding timeout scheduled 617944ms in the past. 
Dec 22 16:22:23.034 [notice] Channel padding timeout scheduled 209517ms in the past. 
Dec 28 14:18:18.271 [notice] Channel padding timeout scheduled 165914ms in the past. 
Jan 06 16:16:11.495 [notice] Channel padding timeout scheduled 186827ms in the past. 
Jan 06 17:57:23.067 [notice] Channel padding timeout scheduled 141687ms in the past. 
Jan 06 17:57:23.068 [notice] Channel padding timeout scheduled 254534ms in the past. 

and I'm quite sure it isn't just freezing for 10+ minutes at a time.

How is Tor setting its timeouts so far in the past?

comment:31 Changed 7 months ago by arma

Hm:

Jan 13 02:51:31.465 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 5058 ms. Delta 6ms
Jan 13 02:51:38.308 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 6843 ms. Delta 20ms
Jan 13 02:51:42.244 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 3938 ms. Delta 4ms
Jan 13 02:54:28.603 [notice] Channel padding timeout scheduled 159796ms in the past.
Jan 13 02:54:28.603 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 166359 ms. Delta 159798ms
Jan 13 02:54:33.863 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 5260 ms. Delta 6ms
Jan 13 02:54:43.120 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 9258 ms. Delta 10ms
Jan 13 02:54:50.978 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 3895 to 99.107.176.235:57814 (988E6BEC7AB2451AA66D2F9B2A9FA2A928C9795C) after 7860 ms. Delta 6ms

I wonder if my connection to this relay was full of stuff trying to flush during the 160s that surprised the netflow padder. That is, maybe if the outbuf (or whatever) never goes empty, we never update the "time to next netflow" timer? Then it finally does go empty, and the netflow timer says "omg, why did nobody call me until now?"

comment:32 Changed 7 months ago by arma

Ok, I think I've found the bug.

Ingredient one:
channel_timestamp_active(), channel_timestamp_recv(), and channel_timestamp_xmit() are the only functions that update chan->timestamp_xfer. They get called when we add stuff to our outbuf or read stuff from our inbuf, but they never get called when we flush our outbuf, that is, when we actually write stuff onto the network.

Ingredient two:
channelpadding_send_padding_cell_for_callback(), if called when chan->has_queued_writes(chan) is true (i.e. we still have stuff on the outbuf still trying to flush), sets chan->next_padding_time back to 0.

Ingredient three:
channelpadding_compute_time_until_pad_for_netflow(), if chan->next_padding_time is zero, sets it to chan->timestamp_xfer + the freshly chosen padding_timeout. And then it promptly checks if it's surprisingly long ago, and warns if it is.

So if it's been 160 seconds since we read any new cells or queued any new cells onto the chan (say, because we've spent the whole time flushing queued stuff, only just finished flushing now, and the other side didn't have anything to say), then timestamp_xfer is going to be 160 seconds in the past, and the newly computed next_padding_time will be slightly less than 160 seconds in the past. That causes us to log_notice and then send a padding cell right then -- when in reality we probably flushed on that chan way more recently than 160 seconds ago.
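
As a concrete sketch of that arithmetic (simplified millisecond-based names, not the actual channelpadding.c code):

#include <stdint.h>

/* Illustrates ingredients two and three: next_padding_time was reset to 0,
 * so it is recomputed from the stale transfer timestamp and then compared
 * against "now". */
static int64_t
ms_until_pad(int64_t now_ms, int64_t timestamp_xfer_ms,
             int64_t padding_timeout_ms)
{
  int64_t next_padding_time_ms = timestamp_xfer_ms + padding_timeout_ms;
  return next_padding_time_ms - now_ms;  /* negative means "in the past" */
}

With timestamp_xfer 160 seconds stale and a padding timeout of a few seconds, this comes out around -155000 ms, matching the notices above, even though the connection was actually writing the whole time.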

Is the fix to call channel_timestamp_active() from connection_or_finished_flushing()? What is chan->timestamp_active *for*, anyway?

comment:33 Changed 7 months ago by arma

There's a good argument for instead calling channel_timestamp_active() from connection_or_flushed_some(), rather than only when we finish flushing, since the channel is "active" then, for some notion of active.

comment:34 Changed 7 months ago by Pine64ARMv8

In case this helps: "Channel padding timeout scheduled XXXXXXms in the past." also appears with Tor 0.3.2.9:

Jan 09 21:10:25.461 [notice] Tor 0.3.2.9 (git-9e8b762fcecfece6) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1-dev, Zlib 1.2.8, Liblzma 5.1.0alpha, and Libzstd 1.3.2.
.../...
Jan 14 22:45:33.000 [notice] Channel padding timeout scheduled 182518ms in the past. 
Jan 15 01:59:48.000 [notice] Heartbeat: Tor's uptime is 5 days 3:59 hours, with 6126 circuits open. I've sent 324.43 GB and received 309.87 GB.
Jan 15 01:59:48.000 [notice] Circuit handshake stats since last time: 2349/2349 TAP, 866755/867105 NTor.
Jan 15 01:59:48.000 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 109957 v4 connections; and received 182 v1 connections, 3720 v2 connections, 6311 v3 connections, and 458930 v4 connections.
Jan 15 05:59:48.000 [notice] Heartbeat: Tor's uptime is 5 days 7:59 hours, with 5870 circuits open. I've sent 336.13 GB and received 321.51 GB.
Jan 15 05:59:48.000 [notice] Circuit handshake stats since last time: 1602/1602 TAP, 811672/811672 NTor.
Jan 15 05:59:48.000 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 112952 v4 connections; and received 182 v1 connections, 3835 v2 connections, 6599 v3 connections, and 475508 v4 connections.
Jan 15 06:22:19.000 [notice] Channel padding timeout scheduled 221453ms in the past. 
Jan 15 09:59:48.000 [notice] Heartbeat: Tor's uptime is 5 days 11:59 hours, with 7482 circuits open. I've sent 351.16 GB and received 336.47 GB.
Jan 15 09:59:48.000 [notice] Circuit handshake stats since last time: 2110/2110 TAP, 841999/841999 NTor.
Jan 15 09:59:48.000 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 116107 v4 connections; and received 183 v1 connections, 4016 v2 connections, 6939 v3 connections, and 493174 v4 connections.
Jan 15 12:26:07.000 [notice] Channel padding timeout scheduled 263389ms in the past. 
Jan 15 13:59:48.000 [notice] Heartbeat: Tor's uptime is 5 days 15:59 hours, with 8796 circuits open. I've sent 363.28 GB and received 348.51 GB.
Jan 15 13:59:48.000 [notice] Circuit handshake stats since last time: 2072/2072 TAP, 750040/750040 NTor.
Jan 15 13:59:48.000 [notice] Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 119247 v4 connections; and received 184 v1 connections, 4184 v2 connections, 7267 v3 connections, and 509112 v4 connections.

comment:35 Changed 7 months ago by arma

For those who want to try the patch, here is what I'm running currently:

diff --git a/src/or/connection_or.c b/src/or/connection_or.c
index dadfdc4..ecc1ae4 100644
--- a/src/or/connection_or.c
+++ b/src/or/connection_or.c
@@ -603,6 +603,9 @@ connection_or_flushed_some(or_connection_t *conn)
 {
   size_t datalen;
 
+  /* Update the channel's active timestamp */
+  channel_timestamp_active(TLS_CHAN_TO_BASE(conn->chan));
+
   /* The channel will want to update its estimated queue size */
   channel_update_xmit_queue_size(TLS_CHAN_TO_BASE(conn->chan));
 

It is, alas, a rare log message for me, so it's hard to tell whether I have resolved the issue. Maybe somebody who gets the log message more often than I do can try it too?

comment:36 Changed 7 months ago by arma

Status: reopened → needs_review

My bug22212 branch, targeted to maint-0.3.1, fixes this bug.

(Once we decide we like the fix, we should put it in a more recent release before considering backports.)

comment:37 Changed 7 months ago by mikeperry

Status: needs_review → merge_ready

This fix looks good to me. It also seems backport-safe: in the worst case these timestamp updates are redundant with the channel-layer ones, and in the best case they cover the areas we missed.

comment:38 Changed 7 months ago by arma

Owner: changed from mikeperry to arma
Status: merge_ready → accepted

comment:39 Changed 7 months ago by arma

Reviewer: mikeperry
Status: accepted → merge_ready

comment:40 Changed 7 months ago by nickm

Keywords: 031-backport 032-backport added

Merged to 0.3.3, marking for possible backport.

comment:41 Changed 7 months ago by nickm

(The branch is now arma_bug22212_031 in my public repo, with a fixed changes file.)

comment:42 Changed 7 weeks ago by teor

Keywords: 031-unreached-backport added; 031-backport removed

0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.
