Opened 5 weeks ago

Closed 5 weeks ago

Last modified 5 weeks ago

#28912 closed defect (fixed)

Stream hangs while downloading consensus via RELAY_BEGIN_DIR

Reported by: plcp`
Owned by: dgoulet
Priority: Very High
Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: regression, 034-backport, 035-backport
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

After performing the following sequence of steps to download a consensus through a stream, Tor hangs after sending 58 (17 with deflate) cells:

  1. Open a TLS connection to a local OR.
  2. Negotiate a v5 link on the connection.
  3. Create a circuit via CREATE_FAST.
  4. Create a stream via RELAY_BEGIN_DIR.
  5. Send a request to download a consensus.
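
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the attached reproduce.py: the helper names (`pack_cell`, `create_fast_cell`, `consensus_request`) are hypothetical, the cell layout and command numbers follow tor-spec for link protocol v4 and later, and the request path is the standard dir-spec consensus URL, which may differ from what the script actually requests. The TLS connection, link negotiation, and relay-cell encryption are left out.

```python
import os
import struct

CELL_PAYLOAD_LEN = 509  # payload size of a fixed-length cell (tor-spec)
CMD_CREATE_FAST = 5     # cell command number (tor-spec)
RELAY_BEGIN_DIR = 13    # relay command, carried inside an encrypted RELAY cell (not built here)


def pack_cell(circ_id, command, payload):
    """Pack a fixed-length cell: 4-byte circuit ID, 1-byte command, zero-padded payload."""
    if len(payload) > CELL_PAYLOAD_LEN:
        raise ValueError("payload too long for a fixed-length cell")
    return struct.pack("!IB", circ_id, command) + payload.ljust(CELL_PAYLOAD_LEN, b"\x00")


def create_fast_cell(circ_id):
    """CREATE_FAST carries 20 bytes of random key material from the client."""
    return pack_cell(circ_id, CMD_CREATE_FAST, os.urandom(20))


def consensus_request(compression=b"identity"):
    """Plain HTTP request sent over the RELAY_BEGIN_DIR stream (assumed URL)."""
    return (b"GET /tor/status-vote/current/consensus HTTP/1.0\r\n"
            b"Accept-Encoding: " + compression + b"\r\n\r\n")


# The connection initiator sets the most significant bit of the circuit ID.
cell = create_fast_cell(0x80000001)
```

The request bytes would then be split across RELAY_DATA cells on the stream opened in step 4.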

I managed to reproduce the issue with the attached code. Here is an excerpt of the file:

'''Using identity, we get 58 RELAY_DATA cells before hanging, 17 with deflate.

    Affected:
        Tor version 0.3.4.0-alpha-dev (git-3463b4e0652bacca).

    Not affected:
        Tor version 0.3.3.5-rc-dev (git-3ee4c9b1fae9d535).
'''

compression = b'identity' # or b'deflate'

'''Issue reproduced with the code below, originally bisected on [1].

        From tag:tor-0.3.4.1-alpha                      Affected?

        deb8970a29ef7427c3d42182d3bacc31ab602c03        yes
        2d7b5c6fe5dc46b7e7cd040e6723e25d12015985        yes
        3fa66f97996c179388fa91176b9a82fb9b5b31d8        no
        306563ac68250872791350bda1ba7a7acff5eb63        no
        3ee4c9b1fae9d53556b3e3be852f12e9abe51e14        no
        c32108ee0fea851ced14f71d842390992f762393        yes
        22845df2a7503ed73ed325c3a98916f289918caa        no
        c7d3de216c60c090fddb4926a739da038bb5d5fe        yes
        9ef4c05df8323850b5894782f435da15810d6189        no
        5e0fbd7006993a4e402f2eee49f6f86074923197        no
        c5899d5cf3a761f4049c1d6f05232731edcfeb57        no
        3463b4e0652bacca51fecd2c256e3e9d61ce920e        yes

    [1] Unpublished (yet) python client, no link here, sorry :(

    Usage:
        virtualenv venv
        source venv/bin/activate
        pip install stem cryptography
        tor PublishServerDescriptor 0 AssumeReachable 1 ExitRelay 0 ProtocolWarnings 1 SafeLogging 0 LogTimeGranularity 1 PidFile "$(mktemp)" SOCKSPort 0 ContactInfo none@example.com DataDirectory "$(mktemp -d)" ORPort 9050 DirPort 9051 Log 'err stderr' &

        python reproduce.py
'''

I provided the redacted logs for two different versions, one affected and one not.

Child Tickets

Attachments (5)

reproduce.py (7.2 KB) - added by plcp` 5 weeks ago.
log-0.3.3.5-rc-dev-3ee4c9b1fae9d535.gz (484.9 KB) - added by plcp` 5 weeks ago.
log-0.3.4.0-alpha-dev-3463b4e0652bacca.gz (572.9 KB) - added by plcp` 5 weeks ago.
log-0.3.5.5-alpha.gz (1.3 MB) - added by plcp` 5 weeks ago.
mainloop.patch (1.2 KB) - added by plcp` 5 weeks ago.

Change History (20)

Changed 5 weeks ago by plcp`

Attachment: reproduce.py added

Changed 5 weeks ago by plcp`

Changed 5 weeks ago by plcp`

comment:1 Changed 5 weeks ago by nickm

Component: - Select a component → Core Tor/Tor
Milestone: Tor: 0.4.0.x-final

comment:2 Changed 5 weeks ago by dgoulet

Status: new → needs_information

I'm 99% sure this is #27750. I've talked to plcp on IRC and they will try with 0.3.5+ since the fix hasn't been backported yet to <= 0.3.4.

comment:3 Changed 5 weeks ago by plcp`

Reproduced on Tor version 0.3.5.5-alpha¹ and on Tor version 0.3.5.4-alpha-dev² (git-d598d834f5ce3ae3)

¹pulled from https://dist.torproject.org/tor-0.3.5.5-alpha.tar.gz
²pulled from #27750 proposed fix


Changed 5 weeks ago by plcp`

Attachment: log-0.3.5.5-alpha.gz added

comment:4 Changed 5 weeks ago by plcp`

Reproduced against several live nodes through full 3-hop circuits:

hangs       ca0akatala (Tor 0.3.4.9)
hangs       TokenLow (Tor 0.3.4.9)
ok          wpiTidus (Tor 0.3.3.9)
ok          sq01 (Tor 0.3.3.7)
ok          niftytreerat (Tor 0.3.3.9)
hangs       humboldt (Tor 0.3.4.9)
stutters¹   F3Netze (Tor 0.3.4.9)
hangs       SvensRelay (Tor 0.3.4.9)
stutters¹   Lavaeolous (Tor 0.3.5.5-alpha)
ok          BeastieJoy60 (Tor 0.3.5.5-alpha)

¹hangs, then receives bursts of data before hanging again, several times.
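
Grouping the results above by release series makes the pattern easier to see. The snippet below is only a tally of the table in this comment; the function name `outcomes_by_series` is made up for illustration.

```python
from collections import defaultdict

# (relay nickname, Tor version, observed outcome), copied from the table above
RESULTS = [
    ("ca0akatala", "0.3.4.9", "hangs"),
    ("TokenLow", "0.3.4.9", "hangs"),
    ("wpiTidus", "0.3.3.9", "ok"),
    ("sq01", "0.3.3.7", "ok"),
    ("niftytreerat", "0.3.3.9", "ok"),
    ("humboldt", "0.3.4.9", "hangs"),
    ("F3Netze", "0.3.4.9", "stutters"),
    ("SvensRelay", "0.3.4.9", "hangs"),
    ("Lavaeolous", "0.3.5.5-alpha", "stutters"),
    ("BeastieJoy60", "0.3.5.5-alpha", "ok"),
]


def outcomes_by_series(results):
    """Map each release series (e.g. '0.3.4') to the set of outcomes seen for it."""
    grouped = defaultdict(set)
    for _nickname, version, outcome in results:
        series = ".".join(version.split(".")[:3])
        grouped[series].add(outcome)
    return dict(grouped)


by_series = outcomes_by_series(RESULTS)
for series in sorted(by_series):
    print(series, sorted(by_series[series]))
```

In this sample every 0.3.3 relay is fine, every 0.3.4 relay hangs or stutters, and 0.3.5 is mixed.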

Update:

Checked again (just to be sure) against a local 0.3.5.5-alpha node with compression = b'deflate': it hangs after 17 RELAY_DATA cells are received (before any SENDME cell is sent), then stutters¹ far less often than it does through a circuit against a live 0.3.5.5-alpha node.

¹instead of a burst of cells every few seconds or less, local node is hanging for ten seconds or more


comment:5 Changed 5 weeks ago by dgoulet

Keywords: regression added
Owner: set to dgoulet
Priority: Medium → Very High
Status: needs_information → accepted

Ok this is not #27750 ... and after initial analysis of the 035 logs provided, it seems tor initially sends the directory data and then for some reason loses track of the connection, leaving it hanging there without sending the remainder of the spooled directory data.

chan=49 is a good example of a buggy one, whereas chan=47 works properly, delivering ~2MB of data.

More debugging is needed... I'm bumping priority and flagging as a possible regression since this seems to be affecting only >= 034. Once we figure out the issue, we'll decide on backport or not.

comment:6 Changed 5 weeks ago by cypherpunks3

ticket:28717#comment:24

consensus
hangs after 17 RELAY_DATA cells received

spool_eagerly = 0

comment:7 Changed 5 weeks ago by plcp`

Issue reproduced on Tor version 0.4.0.0-alpha-dev (git-e4109020e9b423a1)

comment:8 Changed 5 weeks ago by plcp`

Applying the patch proposed by cypherpunks3 on Tor version 0.4.0.0-alpha-dev (git-e4109020e9b423a1) fixes the issue

Changed 5 weeks ago by plcp`

Attachment: mainloop.patch added

comment:9 Changed 5 weeks ago by plcp`

Tried to reproduce the issue after arma applied the patch to moria1, and I've been able to retrieve micro-descriptor consensuses through full circuits with moria1 as last node (still using encrypted directory requests).


comment:10 Changed 5 weeks ago by dgoulet

Keywords: 034-backport 035-backport added
Status: accepted → needs_review

Ok the patch is the right thing to do for now. Reactivate the event as long as we have linked connections in the list. It is the part that differs from 0.3.3.
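
The idea of re-arming the event while linked connections remain can be shown with a toy model. This is not Tor's C code or the attached mainloop.patch: the function `run_loop`, the single fake connection, and the "one chunk per callback" rule are all assumptions made for the illustration.

```python
from collections import deque


def run_loop(chunks, reactivate):
    """Toy event loop: return how many chunks are delivered before it goes idle.

    `chunks` (>= 1) is the amount of spooled data on one hypothetical linked
    connection; `reactivate` toggles the behaviour described above.
    """
    pending = deque([("linked-conn", chunks)])  # list of active linked connections
    event_armed = True                          # the event starts out activated
    delivered = 0
    while event_armed:
        event_armed = False                     # one-shot: firing clears the event
        name, remaining = pending.popleft()
        delivered += 1                          # move one chunk per callback
        remaining -= 1
        if remaining > 0:
            pending.append((name, remaining))   # connection still has data to send
        if reactivate and pending:
            event_armed = True                  # re-arm while the list is non-empty
    return delivered


print(run_loop(5, reactivate=False))  # loop goes idle with data still pending
print(run_loop(5, reactivate=True))   # all chunks are delivered
```

In the toy, nothing else wakes the loop once the event is cleared, so without re-arming the remaining data is never sent.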

The offending commit appears to be: 5719dfb48f87a54aeb5982ff03345303bc058ebb

Branch: ticket28912_034_01
PR: https://github.com/torproject/tor/pull/615

comment:11 Changed 5 weeks ago by nickm

Status: needs_review → merge_ready

Okay. If CI likes it, and if moria doesn't explode, I say we merge this.

comment:12 Changed 5 weeks ago by nickm

Roger reports that Moria hasn't exploded.

comment:13 Changed 5 weeks ago by cypherpunks3

Why, how?

58 RELAY_DATA cells before hanging, 17 with deflate

If

#define DIRSERV_CACHED_DIR_CHUNK_SIZE 8192

comment:14 Changed 5 weeks ago by nickm

Resolution: fixed
Status: merge_ready → closed

Merged to 0.3.4 and forward.

comment:15 Changed 5 weeks ago by cypherpunks3

the first chunk of data was sent (usually 32KB)

Nope. 8KB.
58 RELAY_DATA for uncompressed data of 8KB
17 with deflate for 8KB without transform
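
The cell counts can be checked against the 8192-byte chunk size with a little arithmetic. The sketch assumes the tor-spec limit of 498 data bytes per RELAY_DATA cell (509-byte payload minus the 11-byte relay header) and ignores the HTTP response headers, so it is only an approximation; `cells_for` is a hypothetical helper.

```python
import math

RELAY_DATA_MAX = 498   # data bytes per RELAY_DATA cell (tor-spec)
CHUNK_SIZE = 8192      # DIRSERV_CACHED_DIR_CHUNK_SIZE quoted above


def cells_for(num_bytes):
    """Number of RELAY_DATA cells needed to carry num_bytes."""
    return math.ceil(num_bytes / RELAY_DATA_MAX)


deflate_cells = cells_for(CHUNK_SIZE)        # one chunk sent without transform
identity_max_bytes = 58 * RELAY_DATA_MAX     # upper bound carried by 58 cells
expansion = identity_max_bytes / CHUNK_SIZE  # implied decompression ratio

print(deflate_cells, identity_max_bytes, round(expansion, 2))
```

One 8KB chunk fills 17 cells, matching the deflate case. The 58 identity cells carry at most 28,884 bytes, which would fit the same 8KB chunk being decompressed roughly 3.5 times before it is sent.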
