Opened 8 years ago

Closed 4 years ago

#3199 closed defect (implemented)

Refactor periodic events

Reported by: nickm Owned by:
Priority: Very Low Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor Version: Tor: 0.2.7
Severity: Normal Keywords: easy, tor-relay, 027-triaged-1-deferrable, 028-triaged, pre028-patch
Cc: Actual Points:
Parent ID: Points: medium
Reviewer: Sponsor: SponsorS

Description

Right now we invoke a truly huge array of things from second_elapsed_callback() and run_scheduled_events(). There are at least 23 separate static time_t values that we do a comparison against every time a second elapses.

This is a really goofy way to handle periodic events. Let's refactor this to instead use libevent's timing code. In addition, we could make the timers first-class, so as to allow better introspection of Tor's status wrt each timer.

Child Tickets

Change History (25)

comment:1 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: unspecified

comment:2 Changed 7 years ago by nickm

Keywords: easy added

comment:3 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:4 Changed 7 years ago by nickm

Component: Tor RelayTor

comment:5 Changed 6 years ago by Ry

Current implementation has 2 oddities, do we want them mimicked in the refactor or not?

  1. All events are initially dispatched after 1 second instead of as soon as the main loop is called. (the first call to the run_scheduled_events is in the one_second callback)
  2. The timers exhibit an off by one error, e.g. 60 second intervals happen every 61 seconds. This is due to checking next < now rather than <=.

I can mimic these issues if necessary, but I think fixing/simplifying these at the same time would be a benefit.

comment:6 Changed 6 years ago by nickm

I think that eliminating those oddities is probably a fine thing.

comment:7 Changed 6 years ago by Ry

https://github.com/Ryman/tor/commits/bug3199

Each ported function is a commit in this branch (with minor exception), you might want to use 'Log [GENERAL] info notice' to see what's happening.

Currently, I'm accounting for the off by one errors and the delayed start as it's easier to test. Got some really dumb tests that will hard exit if they don't get an expected time (either now or the last expected time, to account for the one_second being run before or after the other events). The test macros are bad bad bad, but they'll be removed so it's not really a big deal I think.

I've run most of the ports overnight without failure, but it would be good to have a relay running the branch for a good period of time to ensure it's legit and so people can feel better about it. (I can't currently run a relay or bridge to test due to 3g internet)

All the static time_t's that are in run_scheduled_events are ported, there's still further work that might be doable once the dead code is removed. The dead code can be removed after it's seen to have expected behaviour over time.

Once the dead code is removed then we can remove the off by one errors (as the hard tests will be gone) and also break up the per second callbacks into better chunks (and port the second_timer to get rid of redundant initialisation code).

There are a lot of callbacks that get called per second due to precondition failures (e.g. not being a server), we can probably handle this better going forward. Atm, a -1 return just means poll me again next second and to indicate no action was taken, but we could easily have certain signals setup to negative return values, e.g. -7 means run me again when server_mode(options) changes, or even poll me again in 7 seconds but don't mark action as having taken. This should be pretty simple to do in future, it's just a matter of implementing it without bugs! :D

Also, I haven't really made these timers accessible/first-class yet, I couldn't figure out an especially clean way to do it without having to edit two points per code change, but it should be easy enough to do.

comment:8 Changed 6 years ago by Ry

Status: newneeds_review

comment:9 Changed 4 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.2.7.x-final

These might actually be 0.2.7 material

comment:10 Changed 4 years ago by nickm

Keywords: 027-triaged-1-deferrable added

Marking tickets as deferrable from 0.2.7 triage round-1

comment:11 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

Deferring once more after triage :(

The main issue here is that this is a code movement patch primarily, which makes it time-consuming to review.

comment:12 Changed 4 years ago by isabela

Keywords: SponsorS added
Points: small
Priority: normaltrivial
Version: Tor: 0.2.7

comment:13 Changed 4 years ago by nickm

Keywords: 028-triaged added

comment:14 Changed 4 years ago by nickm

Keywords: SponsorS removed
Sponsor: SponsorS

Bulk-replace SponsorS keyword with SponsorS sponsor field in Tor component.

comment:15 Changed 4 years ago by nickm

Points: smallmedium

comment:16 Changed 4 years ago by nickm

Keywords: pre028-patch added

comment:17 Changed 4 years ago by nickm

Severity: Normal
Status: needs_reviewneeds_revision

Notes on the design:

  • Multi-statement macros should get wrapped in STMT_BEGIN and STMT_END.
  • Instead of exit(0), use tor_assert(0) for bugs.
    • INCREMENT_DELTA_AND_TEST is IMO a problematic interface -- having to get the id field correct is a very fragile.
    • Having relative times and absolute times _both_ be possible return types from event functions is also fragile IMO.
    • I'm not sure I completely understand what run_scheduled_events is doing any more -- it seems like we'd be better off finding a way to kill it off entirely. I bet we can do so more easily with the refactoring that has already happened in run_scheduled_events() in the master branch.

Notes on the patch set:

  • Needs tests! :)


Notes on the code movement:

  • our best strategy for merging this patch right now is probably to copy the infrastructure patches, clean 'em up, test 'em, and then re-do the code movement parts one at a time, since there has been a lot of code drift in run_scheduled_events() since this patch was first written.
  • with that in mind I'm not going to review the code movement piece of this. :)

comment:18 Changed 4 years ago by nickm

I've started doing this port in a branch called bug3199_redux in my public repository, based on current master and Ryman's code. Nothing in it has been refactored yet, but I believe I've taken care of all the notes above.

For what it's worth: we should also refactor this to use monotonic time, not time(NULL).

comment:19 Changed 4 years ago by nickm

Status: needs_revisionneeds_review

Okay. Now bug3199_redux seems to be at feature parity with the original branch, and should merge cleanly into master.

If somebody can give it a quick once-over, I can merge it!

comment:20 Changed 4 years ago by nickm

bug3199_redux_2 is the latest version (lightly squashed, irrelevant commits removed.)

comment:21 Changed 4 years ago by nickm

The branch is now squashed a little more as bug3199_redux_3, and I've cleaned up enough documentation.

comment:22 Changed 4 years ago by special

It makes me a little sad that events like rotate_onion_key, check_ed_keys, record_bridge_stats will be called once per second on clients to handle the unlikely case where the configuration changes and they're now relevant. That is consistent with previous behavior, though. They have a little more overhead than they used to - at least one time() and event_add() each.

It wasn't obvious what returning -1 from the callback would do, but it is documented. A constant would help, maybe. Speaking of which:

+* interval time. If the returned value is larger than <b>now</b> then it
+* is assumed to be a future time to poll again. */

From periodic.h:11. This comment is wrong; it will assert if a timestamp is returned.

In periodic_event_dispatch, the log_info is done after the callback. I think it would make more sense before.

I did not notice anything that moved and might be sensitive to being called in a different order, but I could imagine that being very subtle.

Nitpickery:

main.c:1612 - not new, but this function is run_connection_housekeeping, not connection_run_housekeeping.

main.c:1579 - extra space before GREEDY_DESCRIPTOR_RETRY_INTERVAL

main.c:1458 - the 3a comment is over-indented

periodic.c:15 - greate -> greater, ground -> grounds

LGTM otherwise.

comment:23 in reply to:  22 Changed 4 years ago by nickm

Replying to special:

It makes me a little sad that events like rotate_onion_key, check_ed_keys, record_bridge_stats will be called once per second on clients to handle the unlikely case where the configuration changes and they're now relevant. That is consistent with previous behavior, though. They have a little more overhead than they used to - at least one time() and event_add() each.

Indeed. I don't like this behavior either, but I think that it's best to change the behavior in another ticket. I'll open one after I merge this.

(All other comments applied to the branch)

comment:24 Changed 4 years ago by nickm

Merged this, and opened #17623 to track the follow-on cleanup.

comment:25 Changed 4 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed
Note: See TracTickets for help on using tickets.