Opened 4 weeks ago

Closed 4 weeks ago

Last modified 3 weeks ago

#27795 closed defect (fixed)

Possible fd leak on 0.3.5.1-alpha

Reported by: dgoulet
Owned by: nickm
Priority: High
Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor
Severity: Critical
Keywords: regression, tor-relay

Description

So far, toralf and moria1 have reported that their relays ran out of file descriptors.

toralf's relay ended up at 100% CPU, whereas I believe moria1 stopped working properly as a dirauth.

The DoS status was normal, so the leading theory so far is an fd leak in 0.3.5.

Child Tickets

Attachments (1)

warn.log.gz (605.1 KB) - added by toralf 4 weeks ago.
kill -10 (SIGUSR1 on Linux) output of an exit relay (ports 80/443) that usually has ~4,000 OR connections; it started warning when it reached 49,967 connections

Change History (11)

comment:1 Changed 4 weeks ago by nickm

Priority: Medium → High
Severity: Normal → Critical

comment:2 Changed 4 weeks ago by nickm

The leading theory from yesterday is that something went wrong with my code for #24751, which affects how we close OR connections. But we couldn't find the bug.

comment:3 Changed 4 weeks ago by nickm

wait. WHY do we think that this is leaking fds? Is it because the OS says we're out of fds, or is it because of some log message?

Because I think the problem might be that my fix for #24751 does not interact correctly with our socket accounting code in socket.c. In that case, the sockets actually are getting closed, but Tor is not counting them as closed.
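To make the theory concrete, here is a minimal C sketch of how wrapper-based socket accounting can drift. It is not Tor's socket.c code; `counted_socket()`, `counted_close()`, `n_sockets_open`, and `MAX_SOCKETS` are hypothetical names. Sockets opened through the wrapper bump a counter, and only closes through the matching wrapper decrement it. If some close path (hypothetically, the one changed for #24751) calls plain `close()`, the kernel frees the fd but the counter never goes down, so the process "runs out" of sockets it does not actually hold.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical limit and counter, for illustration only. */
#define MAX_SOCKETS 4
static int n_sockets_open = 0;

/* Open a socket through the accounting wrapper. It refuses once the
 * counter (not the OS) says we are at the limit. */
int counted_socket(void)
{
  if (n_sockets_open >= MAX_SOCKETS)
    return -1;
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd >= 0)
    ++n_sockets_open;
  return fd;
}

/* Close a socket through the accounting wrapper, keeping the count right. */
int counted_close(int fd)
{
  int r = close(fd);
  if (r == 0)
    --n_sockets_open;
  return r;
}

/* Simulate a close path that bypasses the wrapper: every fd really is
 * closed, but the counter is never decremented. Returns the result of
 * one more counted_socket() call afterwards. */
int simulate_uncounted_closes(void)
{
  for (int i = 0; i < MAX_SOCKETS; ++i) {
    int fd = counted_socket();
    if (fd >= 0)
      close(fd); /* BUG: should have been counted_close(fd) */
  }
  return counted_socket();
}
```

After `simulate_uncounted_closes()` no sockets are actually open, yet the counter sits at `MAX_SOCKETS` and every further `counted_socket()` call is refused, which matches the symptom of "out of fds" with few real fds in use.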

Changed 4 weeks ago by toralf

Attachment: warn.log.gz added

comment:4 Changed 4 weeks ago by arma

It turns out moria1 is at 100% CPU too.

I like nickm's theory above. We're all just going on the log message. My tor has nowhere near that many fds open now. Everything seems to fall apart once Tor hits that limit -- or once Tor thinks it has hit that limit.
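One way to tell a real leak from a counting bug is to compare the kernel's view with Tor's own count. The following is a minimal C sketch, assuming a Linux system where `/proc/self/fd` lists one entry per open descriptor. `count_open_fds()` and `report_fds()` are hypothetical helpers, not part of Tor.

```c
#include <dirent.h>
#include <stdio.h>

/* Count this process's open file descriptors by listing /proc/self/fd
 * (Linux-specific). Returns -1 if the directory cannot be read. */
int count_open_fds(void)
{
  DIR *dir = opendir("/proc/self/fd");
  if (!dir)
    return -1;
  int n = 0;
  struct dirent *ent;
  while ((ent = readdir(dir)) != NULL) {
    if (ent->d_name[0] == '.')
      continue; /* skip "." and ".." */
    ++n;
  }
  closedir(dir);
  /* The DIR stream held one extra fd while listing; don't count it. */
  return n - 1;
}

/* Print the kernel's count next to a caller-supplied count, e.g. the
 * number of sockets the application believes it has open. */
void report_fds(int believed_open)
{
  printf("actually open: %d, believed open: %d\n",
         count_open_fds(), believed_open);
}
```

A large gap between the two numbers points at the accounting rather than at the sockets themselves.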

comment:5 Changed 4 weeks ago by nickm

Owner: set to nickm
Status: new → accepted

comment:6 Changed 4 weeks ago by nickm

Status: accepted → needs_review

Fix in branch bug27795_27782; pull request at https://github.com/torproject/tor/pull/364. It also includes a fix for #27782.

comment:7 Changed 4 weeks ago by dgoulet

This LGTM except for one thing (unrelated to the main issue at hand):

  • tor_tls_release_socket() for NSS seems to have a possible fd leak if PR_GetIdentitiesLayer() fails: we BUG() and immediately return, but I think we should close the socket.
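The rule behind this review comment is the usual error-path one: a function that owns a socket and bails out early must close it first. Below is a minimal, generic C sketch of that rule. It does not use NSS, and `release_socket()` and `lookup_layer()` are hypothetical stand-ins for tor_tls_release_socket() and PR_GetIdentitiesLayer(), with failure simulated by a flag.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a lookup that can fail; the flag lets the
 * caller simulate failure. */
static bool lookup_layer(int fd, bool simulate_failure)
{
  (void)fd;
  return !simulate_failure;
}

/* Hand fd back to the caller. On the error path, close the socket
 * before returning instead of just bailing out, so the fd cannot leak. */
int release_socket(int fd, bool simulate_failure)
{
  if (!lookup_layer(fd, simulate_failure)) {
    /* Tor's BUG() macro would log a warning here; the sketch only
     * does the cleanup. */
    close(fd);
    return -1;
  }
  /* Success: ownership of fd passes to the caller. */
  return fd;
}
```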

comment:8 Changed 4 weeks ago by nickm

Status: needs_review → needs_information

Fixed that and merged to master. Let's close this ticket once toralf and arma report that the error went away.

comment:9 Changed 4 weeks ago by nickm

Resolution: fixed
Status: needs_information → closed

Hi. Is this bug still happening? Please reopen if so.

comment:10 Changed 3 weeks ago by arma

Still looking good on moria1.