Opened 8 years ago

Last modified 4 weeks ago

#1926 new defect

Extra-info descriptors have corrupt {write,read}-history lines

Reported by: karsten Owned by:
Priority: Low Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay metrics extrainfo bandwidth
Cc: Actual Points:
Parent ID: #29019 Points:
Reviewer: Sponsor:

Description

While parsing extra-info descriptors for the dir bytes analysis, I found that some of them have weird write-history or read-history lines with obviously wrong dates. I wonder what leads to these false reports and if there are more errors like this in extra-info descriptors.

Here are some examples:

published 2010-09-03 08:21:00
write-history 2002-01-01 00:47:32 (900 s) 0,109568,44032,189440,219136,
92160,190464,354304,126976
read-history 2002-01-01 00:47:32 (900 s) 816128,1042432,163840,180224,
2770944,221184,133120,359424,770048
published 2010-06-08 19:20:14
write-history 2003-01-01 02:20:36 (900 s) 8192,26624,18432,49152,169984
read-history 2003-01-01 02:20:36 (900 s) 147456,154624,13312,23552,2411520
published 2010-08-17 17:14:51
write-history 2003-01-01 03:16:39 (900 s) 3454976
read-history 2010-08-17 04:16:39 (900 s)
published 2010-08-31 12:12:21
write-history 2002-01-01 03:06:31 (900 s) 117760,69632,59392,52224,61440,
871424,3474432,3022848,4219904,2000896,160768,138240,138240,175104,131072,
47104,59392,33792,37888,87040
read-history 2002-01-01 03:06:31 (900 s) 1923072,52224,194560,35840,
1046528,906240,3657728,3198976,5303296,2452480,338944,423936,1155072,
308224,119808,193536,1045504,91136,191488,104448
published 2010-08-31 12:12:28
write-history 2002-01-01 03:06:31 (900 s) 117760,69632,59392,52224,61440,
871424,3474432,3022848,4219904,2000896,160768,138240,138240,175104,131072,
47104,59392,33792,37888,87040
read-history 2010-08-31 12:06:31 (900 s) 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0
published 2010-05-28 21:30:25
write-history 2010-05-28 21:15:26 (900 s) 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0
read-history 1970-01-02 07:45:26 (900 s) 1245184,447488,1458176,1916928,
324608,479232,133120,272384,1698816,791552,573440,381952,352256,667648,
451584,861184,515072,902144,301056,5548032,14390272,11052032,395264,
372736,250880,560128,202752,199680,1078272,915456,2571264,227328,194560,
602112,189440,325632,194560,450560,320512,180224,263168,488448,167936,
172032,135168,506880,175104,205824,23552,401408,21504,21504,180224,274432,
18432,139264,15360,399360,13312,143360,20480,405504,16384,143360,16384,
398336,11264,142336,15360,270336,15360,143360,15360,398336,15360,14336,
20480,407552,18432,147456,13312,414720,13312,144384,10240,406528,3072,
135168,8192,270336,4096,7168,5120,404480,7168,5120

Any idea what causes this problem? I didn't find an obvious pattern in these examples. Is it just memory corruption on the relays?

Child Tickets

Attachments (5)

past (13.5 KB) - added by karsten 8 years ago.
History lines lying in the past of the descriptor publication time
future.bz2 (148.0 KB) - added by karsten 8 years ago.
History lines lying in the future of the descriptor publication time
falsedata (5.0 KB) - added by qbi 8 years ago.
Original output of my script (collecting wrong date entries from descriptors)
router-platform-date-V1.bz2 (11.4 KB) - added by qbi 8 years ago.
Listing of router, platform and dates where router publishes wrong history data
V1-dirinfo.bz2 (276.1 KB) - added by qbi 8 years ago.

Download all attachments as: .zip

Change History (30)

comment:1 Changed 8 years ago by nickm

Do these correlate with any particular pattern of tor versions?

comment:2 Changed 8 years ago by karsten

I found 2 relays running 0.2.1.26 on Windows and 1 relay running 0.2.1.21 on Linux. I admit that's not much. If this doesn't help, I can look at the 2009 archives, too, to find more cases.

comment:3 Changed 8 years ago by karsten

I had a closer look at the archives and found more mysteries. I think there's a bug in the byte counting code or in the state loading code. But staring at the code for two hours didn't help me find the bug. Maybe someone else is more lucky?

Here's my analysis so far:

  • Affected versions range from 0.2.0.23-rc to 0.2.2.13-alpha, so it's an old bug. The bug might even be older than 0.2.0.x when relays wrote their history lines to their server descriptors which I didn't look at.
  • Both relays on Windows and Linux are affected. I didn't find other platforms, but I expect the bug to affect all platforms.
  • Reported write-history and read-history lines have dates both in the past (see attached file past) and in the future (see attached file future.bz2). I found timestamps 1970-01-02 and 1985-06-11 as well as 2020-08-01 and 2028-09-23.
  • For timestamps in the future, relays keep reporting the same history lines over and over again. We have a relay reporting the same bandwidth history ending on 2009-08-12 for two years between 2007-08-15 and 2009-07-11.

Someone willing to win the oldest bug prize? :)

Changed 8 years ago by karsten

Attachment: past added

History lines lying in the past of the descriptor publication time

Changed 8 years ago by karsten

Attachment: future.bz2 added

History lines lying in the future of the descriptor publication time

comment:4 Changed 8 years ago by Sebastian

How about this theory for dates in the future: You ran your relay with the wrong time (the one in the future), and wrote a state file. Then you corrected your clock. Then Tor loads the clock, and doesn't discard that strange data, but rather keeps reporting it until either your state file is erased or the incorrect time has been reached

comment:5 Changed 8 years ago by karsten

Component: MetricsTor Relay

comment:6 Changed 8 years ago by arma

Milestone: Tor: 0.2.2.x-final
Priority: normalminor

comment:7 Changed 8 years ago by qbi

I looked at all available data and tried to extract wrong dates. The table below lists my findings. Basically I extracted the (read|write)-history dates from the data and threw away the correct year. So you'll find sometimes entries from December in January entries. Those are in my opinion not wrong dates.

My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:

  • 2035-08-28, 2009-06-03, 2009-05-03 (6 entries in the table below)
  • 2001-12-31 (5 entries in the table below)
  • 2021-01-18, 2020-08-01, 2012-04-30, 2010-03-16, 2008-12-31, 2008-08-09, 2008-05-30 (4 entries in the table below)
  • 2014-05-27, 2007-01-24, 2002-01-01 (3 entries in the table below)

I have now a script running which extracts all descriptors with wrong dates. I'll attach it when it is finished.

The maximum and minimum date is highlighted.

02 2005 2000-01-01
03 2005
04 2005 2000-01-07
05 2005
06 2005
07 2005
08 2005
09 2005 2002-09-01
10 2005 2002-11-29
11 2005
12 2005
01 2006 2005-12-07, 2005-12-30, 2005-12-31
02 2006
03 2006
04 2006 2002-04-24
05 2006 2007-05-27
06 2006 2007-05-27
07 2006
08 2006 2007-08-10
09 2006 2008-09-08
10 2006 2007-01-24, 2010-07-17
11 2006 2007-01-24, 2007-01-26, 2007-07-16, 2007-12-06, 2010-07-17
12 2006 2007-01-24, 2007-12-03, 2007-12-09, 2007-12-13, 2007-12-20
01 2007 2006-12-30, 2006-12-31, 2021-01-18
02 2007 2014-05-27, 2021-01-18
03 2007 2009-05-03, 2009-06-03, 2010-03-16, 2014-05-27, 2014-11-25, 2035-08-28
04 2007 2008-04-01, 2009-04-12, 2009-05-03, 2009-06-03, 2010-03-16, 2014-05-27, 2014-11-25, 2021-01-18, 2021-03-31, 2035-08-28
05 2007 2008-05-30, 2009-02-16, 2009-05-03, 2009-06-03, 2010-03-16, 2012-04-30, 2015-05-07, 2021-05-11, 2030-05-04, 2035-08-28
06 2007 2008-05-30, 2008-06-21, 2009-05-03, 2009-06-03, 2010-03-16, 2012-04-30, 2013-06-18, 2017-06-03, 2027-06-09, 2030-05-04, 2035-08-28
07 2007 2008-05-03, 2008-05-30, 2008-06-21, 2008-07-01, 2009-05-03, 2009-06-03, 2012-04-30, 2013-06-18, 2035-08-28
08 2007 2008-08-09, 2009-08-12, 1999-11-13, 2008-05-03, 2008-05-30, 2008-08-09, 2008-08-17, 2009-05-03, 2009-06-03, 2009-08-12, 2010-08-10, 2012-04-30, 2020-08-01, 2021-01-18, 2027-07-31, 2035-08-28
09 2007 1985-06-11, 2008-08-09, 2008-09-15, 2009-08-12, 2020-08-01
10 2007 2008-08-09, 2009-08-12, 2011-01-01
11 2007 2009-08-12, 2020-08-01
12 2007 2009-08-12, 2020-08-01
01 2008 2009-08-12
02 2008 2009-08-12
03 2008 2009-08-12
04 2008 2004-08-26, 2009-08-12
05 2008 2001-01-01, 2009-08-12
06 2008 2007-01-01, 2009-08-12
07 2008 2009-08-12
08 2008 2009-08-12
09 2008 2009-08-12
10 2008 2007-10-26, 2009-08-12, 2009-10-29, 2015-07-19
11 2008 2009-08-12
12 2008 2009-01-01, 2009-08-12
01 2009 2001-12-31
02 2009 2001-12-31
03 2009
04 2009
05 2009
06 2009
07 2009 2028-09-23
08 2009
09 2009 2006-01-03
10 2009
11 2009 2004-11-01, 2006-12-26
12 2009 2010-01-01
01 2010 2009-10-07, 2009-11-30
02 2010 2003-12-31
03 2010 2008-12-31
04 2010 2008-01-02
05 2010 1970-01-02, 2001-12-31, 2009-10-07
06 2010 2003-01-01, 2009-01-01, 2009-12-21
07 2010 2001-01-01, 2001-07-12
08 2010 2002-01-01, 2003-01-01, 2007-12-31
09 2010 2002-01-01, 2003-01-12, 2008-12-31
10 2010 2000-01-01, 2000-01-02, 2006-08-15, 2008-12-31
11 2010 2001-12-31, 2002-01-03, 2009-08-17
12 2010 2001-12-31, 2002-12-30, 2009-03-11
01 2011 2002-01-01, 2002-01-23, 2010-01-16, 2010-09-05, 2010-10-15, 2010-11-19, 2010-11-22, 2010-11-24, 2012-01-17 (also many days in 12/2010)

Changed 8 years ago by qbi

Attachment: falsedata added

Original output of my script (collecting wrong date entries from descriptors)

comment:8 in reply to:  7 Changed 8 years ago by qbi

Replying to qbi:

My script also threw away double date entries. So there are cases where a date occurs on my than one day. So you'll find sometimes the same days on entries over weeks or even moths. One case is 2009-08-12. You can find it on each month between August 2007 and December 2009 (18 times). Other "special" dates are:

Correction: It should read "between August 2007 and December 2008".

Changed 8 years ago by qbi

Attachment: router-platform-date-V1.bz2 added

Listing of router, platform and dates where router publishes wrong history data

comment:9 Changed 8 years ago by qbi

I've found some more information:

*

comment:10 in reply to:  9 Changed 8 years ago by qbi

Replying to qbi:

I've found some more information:

  • The first router version is 0.1.0.0-alpha-cvs.
  • So far I only found Linux and Windows systems:
    • Linux i586 and i686
    • Windows 2000 SP4 server and workstation
    • Windows Me
    • Windows Longhorn
    • Windows XP SP1 and SP2
    • Windows 2003
  • When a date occurs multiple times, read- and write-history have exactly the same value (in seconds since 1970). For instance there are nearly 150 entries with the below dates:
    router cedartor 85.216.69.79 9001 0 0
    platform Tor 0.1.1.23 on Linux i686
    published 2006-10-27 15:28:49
    2007-01-24 09:44:26
    2007-01-24 09:44:26
    
  • When read- and write-history date are different, you only find one entry. Sometimes one of those entries is a correct date and sometimes both differ.

comment:11 Changed 8 years ago by qbi

I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.

Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)

comment:12 in reply to:  11 Changed 8 years ago by karsten

Replying to qbi:

I have some suspicion that (part of) the bug is located in rephist.c:1259. If the date on a local computer is set to some point in the future, then AFAIS it returns this date. This happens until the real date is reached.

I'm not sure which date you mean and how we would end up in the situation you describe. Can you elaborate?

But I wonder if we can assume that there is no big time difference between starting Tor and calling add_obs(). Except if you mean that single clock jumps while running Tor lead to this bug. So, when starting Tor we should notice if our current system time differs much from the time stored in our state file. I wonder if Tor writes apparently wrong timestamps to its state and doesn't validate these timestamps when loading the state.

Furthermore we should check if the value of b is really NULL. If not this should also give some nasty effects. I'll look into that later and comment some more.:)

Which line? Do you mean NULL or not NULL?

comment:13 Changed 8 years ago by qbi

I collected all V1 directory information in one document. It is to large to upload it here. So you find it at V1-dirinfo.bz2. This document contains all "broken" descriptors.

Changed 8 years ago by qbi

Attachment: V1-dirinfo.bz2 added

comment:14 in reply to:  7 Changed 8 years ago by qbi

Replying to qbi:

Update:

02 2011 2000-01-01, 2008-09-11, 2009-08-11, 2010-09-05, 2010-10-15, 2010-11-19, 2010-11-24

comment:15 Changed 8 years ago by qbi

As I suspected the dirreq-(read|write)-history lines are also affected by this bug. I found several lines with wrong dates so far.

comment:16 in reply to:  15 Changed 8 years ago by karsten

Replying to qbi:

As I suspected the dirreq-(read|write)-history lines are also affected by this bug. I found several lines with wrong dates so far.

Interesting! This makes sense, because I reused the byte history code for dirreq histories. Can you paste the affected relay descriptor identifiers here?

I wonder if we reached the point where looking at descriptor archives won't tell us why there's a bug. There seems to be no specific Tor version or platform that has this problem and that we could blame.

How about we analyze the code that implements the dirreq histories and try to find the bug here? If we find the bug in the code that's touched by dirreq histories, we very likely also find the bug in general byte histories. Dirreq histories have been implemented in this commit.

comment:17 Changed 8 years ago by qbi

The router PeterbjornxRelay reports 2011-01-26.

comment:18 Changed 7 years ago by karsten

I took another look at extra-info descriptors containing corrupt write-history lines, and I found an interesting pattern that I can't explain.

A lot of the obviously wrong write-history timestamps still match the 900-second period of the write history in the previous or subsequent descriptor.

Here are some examples (fingerprint, descriptor publication time, write-history timestamp, * for corrupt timestamp):

0A8B196D53B02CA93818B0258249C04A5CB60E39,2011-11-18 03:15:42,2010-11-18 03:09:43 *** 
0A8B196D53B02CA93818B0258249C04A5CB60E39,2011-11-18 03:16:43,2011-11-18 03:09:43

3A64376144842654940E09C4D8CC3A5307536C1B,2011-11-09 14:13:20,2004-12-31 23:01:45 *** 
3A64376144842654940E09C4D8CC3A5307536C1B,2011-11-09 14:17:24,2011-11-09 14:16:45

40257810D70D006D0B240D02727ED0C9636A45FA,2012-01-05 16:33:12,2008-07-18 00:21:50 *** 
40257810D70D006D0B240D02727ED0C9636A45FA,2012-01-05 16:34:13,2012-01-05 16:21:50

4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 10:26:13,2011-12-19 10:22:38
4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 14:50:20,2011-12-13 09:52:38 *** 
4373CC0CFF1BAC0B91C61F6E1BF8998C8B6DE4EE,2011-12-19 14:51:20,2011-12-19 14:37:38

4C2D42B29791347A9F94E52AB5B54CA1EF2A3F5F,2012-04-08 09:16:02,2002-01-01 03:16:49 *** 
4C2D42B29791347A9F94E52AB5B54CA1EF2A3F5F,2012-04-08 09:17:03,2012-04-08 09:16:49

4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 21:47:47,2011-10-26 21:45:25
4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 22:25:03,1970-01-01 00:00:25 *** 
4C2ED83F581503D2DE73EE5DB11D5DAC20F7986C,2011-10-26 22:26:36,2011-10-26 22:15:25

50C07527F812190CD2719125DB27F5A35E775536,2011-10-17 16:23:15,2010-10-17 18:05:19 *** 
50C07527F812190CD2719125DB27F5A35E775536,2011-10-17 16:27:51,2011-10-17 16:20:19

56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 16:27:39,2011-11-22 16:14:41
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:09:22,2011-11-18 20:59:41 *** 
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:10:20,2011-11-22 17:59:41
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:24:28,2011-11-18 20:59:41 *** 
56113F3CC4FBAF9E2F4E6B020E821CD170B15885,2011-11-22 18:25:32,2011-11-22 18:14:41

72F69E6B3575BA258758E5B3FC749D01950210B4,2011-11-19 16:45:23,1999-12-31 23:30:40 *** 
72F69E6B3575BA258758E5B3FC749D01950210B4,2011-11-19 16:46:22,2011-11-19 16:45:40

8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-12 22:45:22,2012-02-12 22:35:47
8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-27 01:12:02,2012-02-04 02:35:47 *** 
8875C1AA54E9DA0C2F79AE7E21F03218BC5086F7,2012-02-27 01:13:03,2012-02-27 01:05:47

8DE5D6A67DD16A9C6EABB7EFC69BB81B6E26FCB6,2012-02-07 18:58:01,2005-04-06 02:21:18 *** 
8DE5D6A67DD16A9C6EABB7EFC69BB81B6E26FCB6,2012-02-07 18:58:09,2012-02-07 18:51:18

95C8F4A0C6262FD4B4CAF7D43F446026D35B8C86,2011-12-14 15:34:15,2011-05-14 15:19:09 *** 
95C8F4A0C6262FD4B4CAF7D43F446026D35B8C86,2011-12-14 15:35:15,2011-12-14 15:34:09

97DAAF5C00A4B1E28ED0E9306C59D50E9C6DFC77,2012-01-03 22:32:10,2000-01-01 07:16:13 *** 
97DAAF5C00A4B1E28ED0E9306C59D50E9C6DFC77,2012-01-03 22:33:22,2012-01-03 22:31:13

9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2011-12-11 18:37:55,1970-01-01 00:15:42 *** 
9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2011-12-11 18:39:39,2011-12-11 18:30:42

9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2012-03-26 09:58:20,1970-01-02 21:30:41 *** 
9CD2B1E09B76503B6ADC0EB90DD399DEC7A3CE76,2012-03-26 10:00:29,2012-03-26 09:45:41

A153374361E53C21C673485F65A43E33CAA252D7,2012-02-19 17:58:29,2000-01-01 00:23:51 *** 
A153374361E53C21C673485F65A43E33CAA252D7,2012-02-19 17:59:30,2012-02-19 17:53:51

B47ED35EAD03C6D594A2A0ED1DD807EE5FF69F41,2011-11-10 00:03:16,2010-09-16 23:01:00 *** 
B47ED35EAD03C6D594A2A0ED1DD807EE5FF69F41,2011-11-10 09:20:28,2011-11-10 09:16:00

E163C7D1A1C187EF946EA06732398D1CDC0C9186,2011-10-12 14:03:49,2006-02-09 22:05:28 *** 
E163C7D1A1C187EF946EA06732398D1CDC0C9186,2011-10-12 14:05:51,2011-10-12 14:05:28

That means that corrupt timestamps aren't entirely random.

My first thought was that there are single corrupt bits. But that's unlikely/impossible if we look at the binary representation of, say, the first two dates above:

2010-11-18 03:09:43 = 01001100 11100100 10011000 11110111
2011-11-18 03:09:43 = 01001110 11000101 11001100 01110111

So, there's something else changing the "2011" to "2010" after formatting the timestamp string. I'm running out of ideas what that could be.

And why are quite a few of the relays publishing another descriptor 1 to 5 minutes after the one containing a corrupt timestamp? (Note that directory authorities must have accepted these descriptors, or we wouldn't have them in the archives.)

comment:19 Changed 6 years ago by nickm

Milestone: Tor: 0.2.2.x-finalTor: unspecified

comment:20 Changed 6 years ago by nickm

Keywords: tor-relay added

comment:21 Changed 6 years ago by nickm

Component: Tor RelayTor

comment:22 Changed 21 months ago by nickm

Keywords: metrics extrainfo bandwidth added
Severity: Normal

comment:23 Changed 13 months ago by karsten

Owner: karsten deleted
Status: newassigned

I don't think I'll find the time to hunt down this possible bug any further. And maybe we'll be using PrivCount for bandwidth statistics in the future anyway. I'm unassigning this ticket from myself, to have a more accurate view on what tickets I'm actually working on.

comment:24 Changed 13 months ago by karsten

Status: assignednew

comment:25 Changed 4 weeks ago by teor

Parent ID: #29019

#29019 may fix this bug, as a side-effect of checking for TestingTorNetwork period length transitions.

Note: See TracTickets for help on using tickets.