#21054 closed defect (fixed)

hs: BUG() is triggered with ephemeral service on config reload

Reported by: dgoulet Owned by: dgoulet
Priority: High Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor Version: Tor: 0.3.0.1-alpha
Severity: Normal Keywords: tor-hs, prop224, refactoring, review-group-14
Cc: Actual Points:
Parent ID: Points: 0.5
Reviewer: asn Sponsor: SponsorR-can

Description

Ticket #20853 introduced the issue with commit 63d3ba96f973735ded16e78bd0b8406b6fcdec35 (tor-0.3.0.1-alpha) with:

+        if (BUG(rend_service_is_ephemeral(new)) ||
+            BUG(rend_service_is_ephemeral(old))) {
+          continue;

The new service list does not ONLY contains ephemeral service so if you have one regular service and then you add an ephemeral one, a config reload will trigger BUG(rend_service_is_ephemeral(new) because that new object is from the global list containing all types of service.

Furthermore, this whole loop that is suppose to copy the intro points from the current service to the newly configured one is broken with that commit.

I'm working on a refactoring to first fix this bug then extract this large loop into a function and improve few things along the way _with_ unit tests. It is also an important piece of work for #20657 (prop224 service) because we'll need it for the v3 service list.

Child Tickets

Change History (9)

comment:1 Changed 17 months ago by dgoulet

Status: newneeds_review

See branch bug21054_030_01.

Both fixes the bug and refactor with unit test that part of the code. It is also an important piece for prop224 to have that service pruning code on reload in a separate function.

comment:2 Changed 17 months ago by asn

Reviewer: asn

comment:3 Changed 17 months ago by nickm

Keywords: review-group-14 added

comment:4 Changed 17 months ago by asn

Initial review:

Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?

In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?

Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?

comment:5 in reply to:  4 ; Changed 17 months ago by dgoulet

Replying to asn:

Initial review:

Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?

In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?

Yes. It is also to basically follow the circuit_get_next_by_* API that uses that technique. Without it, we can't return a circuit from the middle of the list matching our requirements and then continuing the loop at that circuit. Else, the other options is for the function to return a list of matching circuits. No strong preferences here, I would be fine with either.

Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?

Yes, normal service then you do ADD_ONION NEW:BEST Port=8161 and then HUP or SIGNAL RELOAD, you should hit the BUG. Btw, it won't be on console if you setup logging and should look like this (I just reproduced it on master):

Jan 09 09:23:44.000 [warn] tor_bug_occurred_(): Bug: src/or/rendservice.c:839: rend_config_services: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed. (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)
Jan 09 09:23:44.000 [warn] Bug: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed in rend_config_services at src/or/rendservice.c:839. Stack trace: (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)
[...]

comment:6 in reply to:  5 Changed 17 months ago by asn

Replying to dgoulet:

Replying to asn:

Initial review:

Changes look reasonable to me. One question, why does circuit_get_next_service_intro_circ() need the start argument?

In the old code, we just iterated from the beginning of the circuitlist everytime. Why do we now begin from start? Is it an optimization?

Yes. It is also to basically follow the circuit_get_next_by_* API that uses that technique. Without it, we can't return a circuit from the middle of the list matching our requirements and then continuing the loop at that circuit. Else, the other options is for the function to return a list of matching circuits. No strong preferences here, I would be fine with either.

I think the way you have it is fine!

Also, BTW I didn't manage to reproduce the original bug. How do we do it? I started up a Tor with a normal HS, then I added an ADD_ONION, and then I HUPed. Am I missing a step?

Yes, normal service then you do ADD_ONION NEW:BEST Port=8161 and then HUP or SIGNAL RELOAD, you should hit the BUG. Btw, it won't be on console if you setup logging and should look like this (I just reproduced it on master):

Jan 09 09:23:44.000 [warn] tor_bug_occurred_(): Bug: src/or/rendservice.c:839: rend_config_services: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed. (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)
Jan 09 09:23:44.000 [warn] Bug: Non-fatal assertion !(rend_service_is_ephemeral(new)) failed in rend_config_services at src/or/rendservice.c:839. Stack trace: (on Tor 0.3.0.1-alpha-dev 655ffeadd53833d9)
[...]

Hmm, I'm still unable to repro this with carml. I tried with unpatched df87812 and Tor will just successfuly HUP with no issues. Here is my torrc:

SocksPort auto
ControlPort auto

HiddenServiceDir /tmp/hidden_service
HiddenServicePort 6667 127.0.0.1:6667

CookieAuthentication 1
CookieAuthFileGroupReadable 1
ControlPort 9051

And then I just do:

$ ./bin/carml --connect tcp:127.0.0.1:9051 -q cmd ADD_ONION NEW:BEST Port=8161
$ ./bin/carml --connect tcp:127.0.0.1:9051 -q cmd SIGNAL RELOAD

but Tor seems to handle it just fine:

Jan 12 13:16:53.000 [notice] Tor 0.3.0.1-alpha-dev (git-df87812b41abccd6) opening log file.
Jan 12 13:16:57.000 [notice] New control connection opened from 127.0.0.1.
Jan 12 13:16:58.000 [notice] New control connection opened from 127.0.0.1.
Jan 12 13:16:58.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Jan 12 13:16:58.000 [notice] Read configuration file "/home/f/Computers/tor/mytor/../confs/bug21054".
Jan 12 13:16:58.000 [warn] CookieAuthFileGroupReadable is set, but will have no effect: you must specify an explicit CookieAuthFile to have it group-readable.

What am I doing wrong?

comment:7 Changed 17 months ago by asn

Status: needs_reviewmerge_ready

OK I managed to reproduce it by using a direct connection to control port instead of camrl. David's branch indeed fixes the issue!

comment:8 Changed 17 months ago by nickm

Merged!

(For the record, I think that the circuit_get_next_by_*() functions are a bit error-prone. And I think we should probably get rid of them someday if we can, maybe replacing them with some circuit_get_all_by_*() variants that fill a smartlist. But that's sure another issue.)

comment:9 Changed 17 months ago by nickm

Resolution: fixed
Status: merge_readyclosed
Note: See TracTickets for help on using tickets.