Opened 4 months ago

Last modified 9 days ago

#22212 reopened defect

[warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 164729ms in the past. Did the monotonic clock just jump?

Reported by: arma Owned by: mikeperry
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version: Tor: 0.3.1.1-alpha
Severity: Normal Keywords:
Cc: mikeperry Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Normal Tor client, running Tor 0.3.1.0-alpha-dev 2e4f3b36bdfd5118, got this log line:

May 10 02:50:11.217 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 164729ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.0-alpha-dev 2e4f3b36bdfd5118)

My tor client had been idle for hours before this point, and the log message coincides with a new request that I made. I'm pretty sure there was no clock jump.

Alas, I have no more detailed logs. I'll try to reproduce with logs.

Child Tickets

Attachments (1)

info.log (165.6 KB) - added by teor 4 months ago.
Client log with padding error produced using chutney ref nodes.1495684579

Download all attachments as: .zip

Change History (25)

comment:1 Changed 4 months ago by nickm

Cc: mikeperry added

comment:2 Changed 4 months ago by mikeperry

Hrmm.. Noting some thoughts while we wait for more info or other instances: This could have happened if Tor stopped running its one-per-second housekeeping while you were reconnecting..

That large of a delta indicates a padding time was chosen for the channel, but a timer wasn't scheduled yet. Then that channel was forgotten about for about two minutes, until the connection housekeeping started running again, and we tried to schedule the padding timer, but found out it should have happened 2 minutes ago...

comment:3 Changed 4 months ago by mikeperry

(Either way, this is not catastrophic. Once that warn fired, padding was sent immediately, and the timestamp was reset...)

comment:4 Changed 4 months ago by arma

I have not yet been able to reproduce.

Changed 4 months ago by teor

Attachment: info.log added

Client log with padding error produced using chutney ref nodes.1495684579

comment:5 Changed 4 months ago by teor

Version: Tor: 0.3.1.1-alpha

This happened to me once when I was using chutney.
Since I am using a VM, there might have been an actual clock jump.

comment:6 in reply to:  5 ; Changed 4 months ago by arma

Replying to teor:

This happened to me once when I was using chutney.
Since I am using a VM, there might have been an actual clock jump.

You have log lines for that second, the second before it, and the second after it. So I am tempted to think there was no clock jump.

Also, you might enjoy "LogTimeGranularity 1" in your chutney torrc files. :)

comment:7 in reply to:  6 Changed 4 months ago by teor

Replying to arma:

...
Also, you might enjoy "LogTimeGranularity 1" in your chutney torrc files. :)

Added in #22373.

comment:8 Changed 4 months ago by mikeperry

Hrmm.. So in this log it looks like circuit construction delayed the connection housekeeping/run_scheduled_events() for a few seconds. This delay meant that the padding time elapsed before channelpadding_decide_to_pad_channel() was called, which meant that the time was in the past by the time it was called.. See also https://trac.torproject.org/projects/tor/ticket/16585

I am not sure what to do about this. I could change the code to always schedule an actual timer right away and cancel it if data is sent instead (instead of waiting around until we get within a second before scheduling a timer). This may mean more timer churn, but now that the timers are more efficient, perhaps this is not an issue.

Or, we could demote the log message to notice.

In terms of when padding actually gets sent, I'm guessing that it still won't happen any earlier with the schedule and cancel timer approach, because if this is #16585, libevent wouldn't call us back any sooner anyway.

This is making me think that we just demote this message and not increase the code complexity.. Any objections?

Last edited 4 months ago by mikeperry (previous) (diff)

comment:9 Changed 3 months ago by toralf

I go tthat message 3 times within last 48 hours with 0.3.1.3-alpha. eg.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

comment:10 in reply to:  9 ; Changed 3 months ago by arma

Replying to toralf:

I go tthat message 3 times within last 48 hours with 0.3.1.3-alpha. eg.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

What were the numbers you got for the other two times? (Apparently the numbers will help Mike guess about the bug.)

Last edited 3 months ago by arma (previous) (diff)

comment:11 Changed 3 months ago by nickm

Owner: set to mikeperry
Status: newassigned

comment:12 Changed 3 months ago by mikeperry

Status: assignedneeds_review

Ok, I demoted this to notice if the callback is over 10s late, and to info if it is less than that. I don't think there is anything more we can do here, until we're either multi-threaded for we have a solution for #16585.

See mikeperry/bug22212 (7914091445e076871bc9fdd41f8304a9b3af8e8c).

comment:13 Changed 3 months ago by nickm

Status: needs_reviewneeds_revision

looks ok, needs a changes file?

comment:14 Changed 3 months ago by nickm

Also you could write the body as:

  if (ms_until_pad_for_netflow < 0) {
    int severity = (ms_until_pad_for_netflow < -DFLT_ETC) ? LOG_NOTICE : LOG_INFO;
    log_fn(severity, LD_OR, ...)
    return 0;
  }

but I don't know if that's actually any more clear.

Last edited 3 months ago by nickm (previous) (diff)

comment:15 in reply to:  10 Changed 3 months ago by toralf

Replying to arma:

Replying to toralf:

I go tthat message 3 times within last 48 hours with 0.3.1.3-alpha. eg.:

Jun 16 15:02:36.000 [warn] channelpadding_compute_time_until_pad_for_netflow(): Bug: Channel padding timeout scheduled 39ms in the past. Did the monotonic clock just jump? (on Tor 0.3.1.3-alpha dc47d936d47ffc25)

What were the numbers you got for the other two times? (Apparently the numbers will help Mike guess about the bug.)

mr-fox linux # grep channel /tmp/warn.log  | cut -f2- -d')' | cut -f1 -d'.'
: Bug: Channel padding timeout scheduled 39ms in the past
: Bug: Channel padding timeout scheduled 1396ms in the past
: Bug: Channel padding timeout scheduled 25ms in the past
: Bug: Channel padding timeout scheduled 516ms in the past
: Bug: Channel padding timeout scheduled 395ms in the past
: Bug: Channel padding timeout scheduled 979ms in the past
: Bug: Channel padding timeout scheduled 25ms in the past
: Bug: Channel padding timeout scheduled 17ms in the past
: Bug: Channel padding timeout scheduled 114ms in the past
: Bug: Channel padding timeout scheduled 1163ms in the past

comment:16 Changed 3 months ago by mikeperry

Status: needs_revisionneeds_review

Ok, I added a changes file, and refactored that check. I also lowered the time limit for the notice line to 4.5 seconds, which is the point at which we start to risk allowing a new netflow record to get emitted.

comment:17 Changed 3 months ago by nickm

Resolution: fixed
Status: needs_reviewclosed

squashed and merging, in 0.3.1 and forward.

comment:18 Changed 3 months ago by toralf

Jul 03 20:51:39.000 [notice] Tor 0.3.1.4-alpha (git-fab91a290ded3e74) opening log file.
Jul 03 20:51:44.000 [notice] Channel padding timeout scheduled 519741ms in the past.

comment:19 in reply to:  18 Changed 3 months ago by teor

Resolution: fixed
Status: closedreopened

Replying to toralf:

Jul 03 20:51:39.000 [notice] Tor 0.3.1.4-alpha (git-fab91a290ded3e74) opening log file.
Jul 03 20:51:44.000 [notice] Channel padding timeout scheduled 519741ms in the past.

Skipping 520 seconds of padding is concerning, re-opening this issue.

comment:20 Changed 2 months ago by asn

FWIW, I got this warning on a chutney network after I suspended my laptop for a few seconds while the network was running. Unfortunately, I've lost the logs but I'm pretty sure that the branch included the fix above.

comment:21 Changed 4 weeks ago by toralf

I added a 2nd Tor exit relay to my machine, listening on ports 9001 and 9030, ipv4 and ipv4.
The new is 6EABEBF38CE7E3DF672C4DB01383606FE3EB2215 (the old is 1AF72E8906E6C49481A791A6F8F84F8DFEBBB2BA)
Started after rebooting the machine at the same time, both notice logs are nearly identical, except now for the message seen below of the old, matured exit relay. I got few minutes ago :

Aug 26 10:27:48.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Aug 26 10:27:48.000 [notice] Read configuration file "/etc/tor/torrc".
Aug 26 10:27:48.000 [notice] Tor 0.3.1.5-alpha (git-83389502ee631465) opening log file.
Aug 26 10:27:49.000 [notice] Channel padding timeout scheduled 231988ms in the past. 

The new exit (just few days old) didn't show this behavior.

comment:22 Changed 2 weeks ago by nickm

(Mike, any insight here?)

comment:23 Changed 9 days ago by mikeperry

Two lines is not a whole lot of context here. It's also not clear if these are all happening after a SIGHUP, after a fresh start of Tor, or both/either?

comment:24 Changed 9 days ago by mikeperry

That said, if this happens right after a fresh start of Tor, then this may be caused by us not setting chan->timestamp_xfer_ms on channels before that code is reached. Odd that this is suddenly happening now, though. Perhaps some change in the channel code is causing us to not set that field in some cases? I can look into that, I guess.

Note: See TracTickets for help on using tickets.