Opened 7 years ago

Closed 2 years ago

#6572 closed defect (fixed)

“Bug: Circuit somehow completed a hop while the network was not live.”

Reported by: bpmcontrol Owned by: mikeperry
Priority: Medium Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-client
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

[Tue Aug 7 19:50:10 2012] Tor Software Error - The Tor software encountered an internal bug. Please report the following error message to the Tor developers at bugs.torproject.org: "circuit_build_times_network_close(): Bug: Circuit somehow completed a hop while the network was not live. Network was last live at 2012-08-07 19:49:09, but circuit launched at 2012-08-07 19:49:10. It's now 2012-08-07 19:50:10.
"
happened after i got a faster upload deal from my isp (just for an exit node), and a new router.

Child Tickets

Change History (25)

comment:1 Changed 7 years ago by rransom

Component: - Select a componentTor Client
Milestone: Tor: 0.2.3.x-final
Owner: set to mikeperry
Priority: normalmajor
Status: newassigned
Summary: this appeared in my log.“Bug: Circuit somehow completed a hop while the network was not live.”

This looks to me like a bug in the path-bias attack detection code.

comment:2 Changed 7 years ago by bpmcontrol

i don't know if it has anything to do with this message, but my router just re assigned a new ip to the box running the exit node.
this might be the result of a side effect.
if so, im sorry to bother u.
if there is anything u would like me to check or do, please let me know.

comment:3 in reply to:  1 ; Changed 7 years ago by mikeperry

Replying to rransom:

This looks to me like a bug in the path-bias attack detection code.

This is a CBT message. It has nothing to do with path bias, actually. Could be caused by a clock change maybe? Are there any other messages about the clock jumping?

comment:4 in reply to:  3 Changed 7 years ago by bpmcontrol

nothing about a clock, no...
i've restarted it with a static ip since then. and it seems to work fine.
but, i now see much less traffic relayed through me.
i think it is the router's fault. is there a recommended model?

comment:5 Changed 7 years ago by mikeperry

The warn is not related to your relay's traffic. It's actually a warn from the client side code, and it's non-fatal. It just means the clock or the client circuit construction code did something weird.

comment:6 in reply to:  5 Changed 7 years ago by bpmcontrol

thank you for the quick reply.
amazing work guys. all of it.

comment:7 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: 0.2.4.x-final

This is annoying, but harmless probably. let's fix it in 0.2.4. We can backport if it turns out to matter.

comment:8 Changed 7 years ago by nickm

Keywords: tor-client added

comment:9 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:10 Changed 7 years ago by nickm

Keywords: 024-deferrable added
Priority: majornormal

(We should try to fix this in 0.2.4, but shouldn't block on it.)

comment:11 Changed 7 years ago by mikeperry

Status: assignedneeds_information

nickm: I think I need your help here with Tor's network-related codepaths.

Here's the situation: This log message is emitted because a circuit timed with cpath->state == CPATH_STATE_OPEN, but that circuit was launched *before* the last recorded network activity. Obviously this is an impossibility, hence the warn. We record timestamps of network activity by calling circuit_build_times_network_is_live() in various places in the codebase where we receive network data. The timestamps in this log message indicate we must have forgotten to add a call somewhere such that the circuit was able to transition to CPATH_STATE_OPEN for its first hop without us calling circuit_build_times_network_is_live().

My question is: Where should we call circuit_build_times_network_is_live() so that we're *sure* that we received fresh, valid data from the Internet?

I thought that the calls to circuit_build_times_network_is_live() from connection_or_process_cells_from_inbuf() should have covered everything, but is there someplace deeper we should use instead?

comment:12 Changed 7 years ago by mikeperry

Hrmm.. Some instances of this could be caused by the updates to timestamp_began I introduced in the path bias changes and #7341..

I will verify that the recent changes that introduce updates to timestamp_began can't introduce more of these cases, and I will correct any I find, but this bug number predates both of those changes, so a quick spot-check of the network functions I mentioned is still a good idea.

comment:13 Changed 7 years ago by mikeperry

There are only 4 places where timestamp_began gets updated in the source:

  1. circuit_deliver_create_cell() - When we first decide to send the initial CREATE cell
  2. pathbias_send_usable_probe() - At circuit close when we decide to probe it
  3. circuit_extend_to_new_exit() - Some cannibalization cases
  4. circuit_launch_by_extend_info() - Other cannibalization cases

1 should not cause this issue, because in this case the circuit still has not completed any hops.

2 should not cause this issue, because the circuit is already opened, and this log message is in a code block that checks has_opened.

3 and 4 should not cause this issue, again because they should only be caused for has_opened circuits. Additionally, they also increment the desired_path_len, which this code block also checks is exactly 3...

I suppose we could beef up this log line to make extra sure none of the above have happened somehow, but otherwise I am at a loss...

comment:14 Changed 7 years ago by nickm

Status: needs_informationassigned

One stupid possibility: Our time mostly isn't guaranteed to be monotonic. I mean, probably it is? And bpmcontrol said that there weren't any warnings about jumpy or backwards time. But still.

My question is: Where should we call circuit_build_times_network_is_live() so that we're *sure* that we received fresh, valid data from the Internet?

Oh wow; I think Roger knows this stuff better than I. If anybody knows it at all.

I think that the call in connection_or_process_cells_from_inbuf *should* be good enough. If we got cells, we're reading data. Certainly we can't have a circuit become OPEN without receiving a CREATED or EXTENDED cell, right?

Oh hey. This message happens if the circuit was _launched_ when we were down, not if it was _completed_ when we were down, right? That's totally valid. Launching a circuit when the network is down IIUC means that we thought the network wasn't working, but we decided to poke it with a circuit anyway in case it really is up again. We'll decide that the network is up only when we finally get a successful OR connection IIUC.

comment:15 Changed 7 years ago by mikeperry

But the circuit actually somehow transitioned its cpath state into OPEN.. Not sure how this happens unless either the clock is not monotonic or one of those four cases reset the timestamp.

One thing I can do is change the timestamp usage here from timestamp_began to timestamp_created. That should eliminate the four cases I'm concerned about without a substantial impact on liveness detection. I will push a branch for that shortly.

That still might not fix the original cause, though... :/

comment:16 Changed 7 years ago by mikeperry

mikeperry/bug6572 has a fix that would cover the 4 cases I mentioned above (even if I missed something).

This should leave us only with clock jumps as the remaining possibility. If we have a function I can call to check if the clock went backwards or jumped, I can also add that before emitting the warn...

comment:17 Changed 7 years ago by mikeperry

Nick says circuit_note_clock_jumped() is one possibility.

comment:18 Changed 7 years ago by mikeperry

Status: assignedneeds_review

circuit_note_clock_jumped is not exactly amenable to what I'd need here. Based on the log, it seems like the clock jumped only a second or less. I am not sure we could actually make a reliable measure of that..

However, my branch should be good enough to guard against the four cases I listed above, so we should probably merge that.

I also added a commit to lower this log message to LD_CIRC@notice and clarify it a little.

comment:19 Changed 7 years ago by nickm

Keywords: 024-backport added; 024-deferrable removed
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final
Status: needs_reviewneeds_information

Hm. This looks okay to me, I think. I'll merge this into 0.2.4, then put it into 0.2.5 needs-information 024-backport to express "if there's still an issue here, let's fix and maybe backport it."

comment:20 Changed 6 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:21 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:22 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:23 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:24 Changed 2 years ago by nickm

Keywords: 024-backport removed

None of these is ripe for backport to 0.2.4 even if it does get fixed.

comment:25 Changed 2 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.2.4.x-final
Resolution: fixed
Severity: Normal
Status: needs_informationclosed

Hm. This looks okay to me, I think. I'll merge this into 0.2.4, then put it into 0.2.5 needs-information 024-backport to express "if there's still an issue here, let's fix and maybe backport it."

0.2.4 is long ago; time to close this.

Note: See TracTickets for help on using tickets.