Opened 3 years ago

Last modified 6 weeks ago

#17520 needs_information enhancement

Relax the rend cache failure cleanup timer

Reported by: dgoulet Owned by: neel
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, tor-client, prop224
Cc: neel@… Actual Points:
Parent ID: #23300 Points: 1
Reviewer: dgoulet Sponsor: Sponsor8-can

Description

rend_cache_failure_clean() is called every second and the reason is that we want to make the client wait as little as possible so we try our best to cleanup the failure cache.

This is not ideal CPU wise since the cache could technically grow to the number of possible introduction point in the network thus making it a larger loop every second.

Let's relax the cleanup timer here to 5 minutes (expiry time of an entry) and at each lookup, if the entry did expire, clean it and return that "we do not have an entry". This will not address the size of the cache that can grows but that's fine since we can handle that in the OOM. Also, a cache that has the maximum number of entries is a valid use case.

Child Tickets

Change History (24)

comment:1 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

It is impossible that we will fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.

comment:2 Changed 3 years ago by nickm

Points: small/medium

comment:3 Changed 3 years ago by dgoulet

Sponsor: SponsorRSponsorR-can

Move those from SponsorR to SponsorR-can.

comment:4 Changed 3 years ago by isabela

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

tickets market to be removed from milestone 029

comment:5 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:6 Changed 23 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:7 Changed 17 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:8 Changed 17 months ago by dgoulet

Keywords: tor-hs tor-client added
Parent ID: #17242
Points: small/medium1
Type: defectenhancement

This is relevant to prop224 client implementation as well.

comment:9 Changed 16 months ago by nickm

Keywords: prop224 added

comment:10 Changed 14 months ago by dgoulet

Parent ID: #17242#23300
  rend_cache_failure_clean(now);
  hs_cache_client_intro_state_clean(now);

... is called every 30 seconds actually which is done by rend_cache_failure_clean_callback().

1) Cleanup any entry that expired at lookup.
2) v2 entries expire after 5 minutes. and v3 after 2 minutes.

So, calling the callback every 2 minutes could be what we want to fit both use case *OR* we bring down to 2 minutes the v2.

comment:11 Changed 7 weeks ago by neel

Cc: neel@… added
Owner: set to neel
Status: newassigned

comment:12 Changed 7 weeks ago by neel

I have one question: In the return value of rend_cache_failure_clean_callback(), is 30 the equivalent of 1 second (where 5 minutes would be something like 5 * 60 * 30)?

comment:13 Changed 7 weeks ago by neel

Nevermind, dgoulet said 30 refers to 30 seconds.

comment:14 Changed 7 weeks ago by neel

comment:15 Changed 7 weeks ago by teor

Milestone: Tor: unspecifiedTor: 0.3.5.x-final
Sponsor: SponsorR-canSponsor8-can
Status: assignedneeds_review

Because this saves CPU and network (but costs RAM), we might want to put it in 0.3.5 for Sponsor 8.

comment:16 Changed 7 weeks ago by asn

Reviewer: dgoulet

comment:17 Changed 7 weeks ago by dgoulet

Status: needs_reviewneeds_revision

Unfortunately, the patch is missing the "Cleanup expired entries on lookup for v3", only v2 in the branch has been modified.

You probably simply want to call hs_cache_client_intro_state_clean() before calling cache_client_intro_state_lookup() ... or something around those lines :).

comment:18 Changed 7 weeks ago by neel

I have added the lines necessary in a new commit.

comment:19 Changed 7 weeks ago by neel

Status: needs_revisionneeds_review

comment:20 Changed 7 weeks ago by neel

Status: needs_reviewneeds_revision

Never mind my patches fail the tests.

comment:21 in reply to:  17 Changed 7 weeks ago by neel

Replying to dgoulet:

You probably simply want to call hs_cache_client_intro_state_clean() before calling cache_client_intro_state_lookup() ... or something around those lines :).

I tried this, and get errors like this:

hs_client/client_pick_intro: [forking] Sep 05 20:12:27.607 [err] tor_assertion_failed_: Bug: src/test/test_hs_client.c:431: test_client_pick_intro: Assertion !ip failed; aborting. (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.613 [err] Bug: Assertion !ip failed in test_client_pick_intro at src/test/test_hs_client.c:431. Stack trace: (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.613 [err] Bug:     0x165515c <log_backtrace_impl+0x4c> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.613 [err] Bug:     0x1651420 <tor_assertion_failed_+0xa0> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.613 [err] Bug:     0x1390475 <dir_common_construct_vote_3+0x57cb5> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.613 [err] Bug:     0x14ac573 <testcase_run_one+0x373> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.614 [err] Bug:     0x14ac9ff <tinytest_main+0x1af> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
Sep 05 20:12:27.614 [err] Bug:     0x14ab9f3 <main+0x2f3> at /usr/home/neel/code/tor/tor/src/test/test (on Tor 0.3.5.0-alpha-dev e3d1c98f6add4d09)
[Lost connection!]
  [client_pick_intro FAILED]

I am thinking about calling hs_cache_client_intro_state_clean() in cache_client_intro_state_lookup(). Is this okay?

UPDATE Nevermind, this fails also, it's basically the same thing.

Last edited 7 weeks ago by neel (previous) (diff)

comment:22 Changed 7 weeks ago by neel

I got it passing tests provided I remove this code from src/test/test_hs_client.c (function test_client_pick_intro():

    /* Now also mark the chosen one as failed: See that we can't get any intro
       points anymore. */
    hs_cache_client_intro_state_note(&service_kp.pubkey,
                                &chosen_intro_point->auth_key_cert->signed_key,
                                     INTRO_POINT_FAILURE_TIMEOUT);
    extend_info_t *ip = client_get_random_intro(&service_kp.pubkey);
    tor_assert(!ip);

The reason why I think it's okay to remove this is because hs_cache_client_intro_state_find() and hs_cache_client_intro_state_note() would be calling hs_cache_client_intro_state_clean() which does it (and this test fails from the code I mentioned above if I did not delete it).

It is already committed.

comment:23 Changed 6 weeks ago by neel

Status: needs_revisionneeds_review

comment:24 in reply to:  22 Changed 6 weeks ago by dgoulet

Status: needs_reviewneeds_information

Replying to neel:

I got it passing tests provided I remove this code from src/test/test_hs_client.c (function test_client_pick_intro():

    /* Now also mark the chosen one as failed: See that we can't get any intro
       points anymore. */
    hs_cache_client_intro_state_note(&service_kp.pubkey,
                                &chosen_intro_point->auth_key_cert->signed_key,
                                     INTRO_POINT_FAILURE_TIMEOUT);
    extend_info_t *ip = client_get_random_intro(&service_kp.pubkey);
    tor_assert(!ip);

The reason why I think it's okay to remove this is because hs_cache_client_intro_state_find() and hs_cache_client_intro_state_note() would be calling hs_cache_client_intro_state_clean() which does it (and this test fails from the code I mentioned above if I did not delete it).

So hmmm it appears that from the above, hs_cache_client_intro_state_note() is used to mark the intro point as unusable and thus the client_get_random_intro() can't find any usable intro.

How is that related to the cleanup process that is about cache expiry? In other words, I'm not sure this test is suppose to fail all the sudden?

Note: See TracTickets for help on using tickets.