Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#1351 closed defect (fixed)

tor stopped publishing descriptor

Reported by: Falo
Owned by:
Priority: Low
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor
Version: 0.2.2.10-alpha
Severity:
Keywords:
Cc: Falo, Sebastian, nickm, arma, phobos
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by nickm)

After several days of perfect operation, tor stopped publishing its descriptor. The tor process on exit
"blutmagie" is still running, but traffic has dropped to nearly zero. The logfile doesn't indicate any problems.

anonymizer2:~# tor -v
Apr 12 08:33:09.760 [notice] Tor v0.2.2.10-alpha-dev (git-81b84c0b017267b4).

clueless Olaf

[Automatically added by flyspray2trac: Operating System: Other Linux]

Attachments (4)

anonymizer2.blutmagie.de_2-day.png (3.5 KB) - added by Falo 9 years ago.
bandwidth graph
anonymizer2.blutmagie.de.tcpo-day.png (2.6 KB) - added by Falo 9 years ago.
new tcp connections per second
anonymizer2.blutmagie.de.tcp-day.png (2.2 KB) - added by Falo 9 years ago.
number of established tcp sessions
anonymizer2.blutmagie.de_2-day.png.1 (2.8 KB) - added by Falo 9 years ago.
back in service again

Change History (19)

comment:1 Changed 9 years ago by Sebastian

Does bug 1346 seem relevant?

comment:2 Changed 9 years ago by Falo

I don't think so, because I haven't touched either tor or openssl recently.
OpenSSL is 0.9.8m 25 Feb 2010 on Debian 2.6.30-2-amd64.

Would it make sense to try to trigger publishing of the descriptor by sending SIGHUPs after switching the config option between "PublishServerDescriptor 0" and "PublishServerDescriptor 1"?

Changed 9 years ago by Falo

bandwidth graph

Changed 9 years ago by Falo

new tcp connections per second

Changed 9 years ago by Falo

number of established tcp sessions

comment:3 Changed 9 years ago by Falo

I've just uploaded three mrtg graphs from my exit node. The bandwidth and the number of new tcp connections per
second dropped to 2 Mbit/s and 10, respectively. That seems reasonable, because the node no longer publishes its
descriptor. But for some strange reason the number of established tcp connections is still at about 10,000. Using lsof,
I've made sure that they belong to the tor process.

comment:4 Changed 9 years ago by Falo

A process trace shows a lot of sick-looking "EAGAIN (Resource temporarily unavailable)" messages, which look quite different from what I see on my other (healthy) tor node for the tor network status.

anonymizer2:~# strace -e trace=network -p `cat /var/run/tor/tor.pid`
Process 8847 attached - interrupt to quit
recvfrom(5588, "\376\223\v\346\220\202\10]^t\226\325\2623%\227\214mH\251\362\37z|v\2000ED\330\267\233"..., 15936, 0, NULL, NULL) = 1448
recvfrom(5588, 0x7f99b28095c8, 14488, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(5588, "f\227\355\360\tD\257\20-\354\324N\267U\216\340\20\1\30\343\235b\237-\216\333\23ID\36:\225"..., 15936, 0, NULL, NULL) = 2896
recvfrom(5588, 0x7f99b2809b70, 13040, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(5588, "g\366\271\0001V-\273\tU{$\35^\f\16\242\204\37\376\305C\6\360\273\24\5\222'\367\335\207"..., 15936, 0, NULL, NULL) = 1448
recvfrom(5588, 0x7f99b28095c8, 14488, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(5652, "\26\3\1\0\304\1\0\0\300\3\1K\304\36;({\370e\246r\"\244%B\216\366\177\343\17\347\365"..., 201, 0, NULL, 0) = 201
recvfrom(5875, "\237\3751\36\333\236\305\363I8\202\313=\213n\335\231\276\307\6\361R\24>T\334|\234\252\267\256\243"..., 15936, 0, NULL, NULL) = 1402
recvfrom(5875, 0x7f99b280959a, 14534, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(5875, ",\354\342w\316\324\265?:\221\246O\372\372F\251SX\346\31lm2\313\36\322;\207\235\202\212\354"..., 15936, 0, NULL, NULL) = 58
recvfrom(5875, 0x7f99b280905a, 15878, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(6162, "\304$\244\341I?\324\206\212\21\213\255[\2262\7b\200\334\345\ny\371Yp\20\316\232!\234M\n"..., 498, 0, NULL, 0) = 498
sendto(5988, "\0053\222\323n\251>F\24\20\26\273\215\26\272E\f\221\210\310D~\206_\"\274\204\36\272\315\207\271"..., 498, 0, NULL, 0) = 498
sendto(5988, "\326\364\32u\20V\362\2669j\320\213\t\316\21\362\241\250\255n\376\27N\367\246xa\376\2406\265r"..., 464, 0, NULL, 0) = 464
sendto(3952, "\36\37\246>B\211\370}\3659\336i\344\252rn1\243\242/\1\223\345'\272\304\3648s\35V\317"..., 1161, 0, NULL, 0) = 1161
recvfrom(5785, "\255\273\357WO\244xb)W\315\225\266\301\367VU\373\314\336\200WWc\340\361q\"\242A\262\r"..., 15936, 0, NULL, NULL) = 15936
recvfrom(5785, "\6\373\277\336j\3\25\247c\323\376\5J\277v\215mq_\270\314\375\3545&\301\271EJ\213\353F"..., 15936, 0, NULL, NULL) = 15936
recvfrom(7157, "\324\311\363\304\232O\214\6c;\202\324\311O\211\222\5\20Z\5;\241\202\342\276[\277\306\246\fDQ"..., 15936, 0, NULL, NULL) = 1460
recvfrom(7157, 0x7f99b28095d4, 14476, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3778, "q\301\33\373\321\261\320_\202%\242\177_]\3252Y\2715\315\201\205A\2571 \20\336\241\245b\232"..., 15936, 0, NULL, NULL) = 1448
recvfrom(3778, 0x7f99b28095c8, 14488, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3778, "z\211e?Q\367\"\315\303\305\366\16", 15936, 0, NULL, NULL) = 12
recvfrom(3778, 0x7f99b280902c, 15924, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(7219, "GET /images/bg_b.gif HTTP/1.1\r\nH"..., 318, 0, NULL, 0) = 318
sendto(4835, "GET /ajax/presence/reconnect.php"..., 498, 0, NULL, 0) = 498
accept(7, {sa_family=AF_INET, sin_port=htons(49342), sin_addr=inet_addr("90.37.192.208")}, [16]) = 5672
recvfrom(5652, "\26\3\1\0J\2\0\0F\3\1\5\25\332YU\335\214v|\20u\221\277Ej\374s\300\236\22]"..., 15936, 0, NULL, NULL) = 937
recvfrom(5652, 0x7f99b28093c9, 14999, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(6141, "V Y\375\3545)\17\224\322U\332z\312\201\334", 15936, 0, NULL, NULL) = 16
recvfrom(6141, 0x7f99b2809030, 15920, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(6162, "\371O\361\5\350\325\3\0018+\326\276\26K\4\311\350\207ONN\236\253\21518T\222\311Dx\275"..., 962, 0, NULL, 0) = 962
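A trace like this is easier to judge when the results are tallied per syscall. The following is a minimal sketch, assuming the strace output has been saved as text lines; the `tally` helper and the shortened sample lines are illustrative, not part of any tor or strace tooling.

```python
import re
from collections import Counter

# Captures the syscall name, the file descriptor, the numeric result and,
# for failed calls, the errno name (for example EAGAIN).
LINE_RE = re.compile(r"^(\w+)\((\d+),.*\)\s+=\s+(-?\d+)(?:\s+(\w+))?")

def tally(lines):
    """Count successful calls and errno results per syscall name."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. "Process 8847 attached - interrupt to quit"
        syscall, _fd, result, err = m.groups()
        if int(result) >= 0:
            counts[(syscall, "ok")] += 1
        else:
            counts[(syscall, err)] += 1
    return counts

# Sample lines modelled on the trace, with the payloads shortened.
sample = [
    "Process 8847 attached - interrupt to quit",
    'recvfrom(5588, "\\376\\223"..., 15936, 0, NULL, NULL) = 1448',
    "recvfrom(5588, 0x7f99b28095c8, 14488, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)",
    'sendto(5652, "\\26\\3\\1"..., 201, 0, NULL, 0) = 201',
]

for key, n in sorted(tally(sample).items()):
    print(key, n)
```

In the trace, each EAGAIN directly follows a short read on the same descriptor, and the failing call asks for exactly the remainder of the 15936-byte buffer (15936 - 1448 = 14488). That pattern is what one would expect from a process draining non-blocking sockets.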

Changed 9 years ago by Falo

back in service again

comment:5 Changed 9 years ago by Falo

Like a phoenix from the ashes, my exit node recovered to full bandwidth within two hours without any
human intervention. Please have a look at the last attached graph.

cluelesssss

comment:6 Changed 9 years ago by Sebastian

Theory: We had a bug while the circuithalftime measurements were going
on that basically broke your ability to make circuits. This means the authorities
couldn't test reachability for you, and dropped you from the consensus.

comment:7 Changed 9 years ago by Falo

Everything seems to be fine now, but strace still shows "EAGAIN (Resource temporarily unavailable)"
messages. I was probably wrong to blame them for the problem.

comment:8 Changed 9 years ago by arma

Yeah, there's nothing wrong with EAGAIN. From man recvfrom,

If no messages are available at the socket, the receive calls wait for
a message to arrive, unless the socket is non-blocking (see fcntl(2)),
in which case the value -1 is returned and the external variable errno
set to EAGAIN.
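The behaviour described in the man page excerpt can be reproduced in a few lines. This is a minimal sketch, assuming a local socket pair as a stand-in for one of tor's connections; the `drain` helper is hypothetical and only illustrates the read-until-EAGAIN pattern.

```python
import errno
import socket

# A connected pair of local sockets stands in for a relay connection.
reader, writer = socket.socketpair()
reader.setblocking(False)  # non-blocking, as with O_NONBLOCK set via fcntl(2)

def drain(sock, bufsize=15936):
    """Read until the kernel buffer is empty and return the bytes read."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError as exc:
            # EAGAIN/EWOULDBLOCK: nothing to read right now, not a failure.
            assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
            break
        if not data:  # peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

writer.sendall(b"x" * 1448)
first = drain(reader)   # a successful read, then EAGAIN ends the loop
second = drain(reader)  # buffer already empty, so EAGAIN comes immediately
print(len(first), len(second))
```

Python surfaces EAGAIN on a non-blocking socket as `BlockingIOError`; under strace, the same loop would show a successful `recvfrom` followed by one returning -1 EAGAIN.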

comment:9 Changed 9 years ago by arma

My theory is that you were a victim of the bug I've been tracking for the past
few weeks, and finally today committed what I hope is a good enough fix:

Changes in version 0.2.2.12-alpha - 2010-04-20

o Major bugfixes:

  • Many relays have been falling out of the consensus lately because not enough authorities know about their descriptor for them to get a majority of votes. When we deprecated the v2 directory protocol, we got rid of the only way that v3 authorities can hear from each other about other descriptors. Now authorities examine every v3 vote for new descriptors, and fetch them from that authority. Bugfix on 0.2.1.23.
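The effect described in this changelog entry can be illustrated with a toy model. It is a deliberately simplified sketch, not Tor's actual voting algorithm: it assumes each authority votes for exactly the relays whose descriptors it knows, and that a relay needs a strict majority of votes. The authority and relay names are made up.

```python
def consensus(votes):
    """Return the relays listed by a strict majority of the votes."""
    needed = len(votes) // 2 + 1
    relays = set().union(*votes.values())
    return {r for r in relays if sum(r in v for v in votes.values()) >= needed}

def learn_from_votes(known, votes):
    """Model the fix: each authority also picks up descriptors it sees
    listed in the other authorities' votes."""
    seen = set().union(*votes.values())
    return {auth: descs | seen for auth, descs in known.items()}

# Hypothetical state: only two of five authorities received relayB's descriptor.
known = {
    "auth1": {"relayA", "relayB"},
    "auth2": {"relayA", "relayB"},
    "auth3": {"relayA"},
    "auth4": {"relayA"},
    "auth5": {"relayA"},
}

before = consensus(known)                          # relayB lacks a majority
after = consensus(learn_from_votes(known, known))  # every authority now knows relayB
print(sorted(before), sorted(after))
```

In this model relayB is missing from the first result even though it is running normally, which matches the symptom reported in this ticket; it appears once the authorities learn about its descriptor from each other's votes.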

comment:10 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

comment:11 Changed 9 years ago by nickm

Description: modified (diff)

Did we solve this one, or is it still happening?

comment:12 Changed 9 years ago by Sebastian

Haven't gotten any reports about relays dropping from the consensus for unknown reasons for quite a few weeks. I think we've fixed this.

comment:13 in reply to:  11 Changed 9 years ago by Falo

Replying to nickm:

Did we solve this one, or is it still happening?

It hasn't happened again in the last two or three months.
Please close this ticket.

comment:14 Changed 9 years ago by Sebastian

Resolution: None → fixed
Status: new → closed

comment:15 Changed 7 years ago by nickm

Component: Tor Relay → Tor