Opened 6 weeks ago

Closed 6 weeks ago

Last modified 6 weeks ago

#31333 closed enhancement (duplicate)

Reduce fingerprint length by 32.5% to reduce descriptor size

Reported by: cypherpunks
Priority: Low
Component: Core Tor/Tor
Version: Tor: unspecified
Severity: Normal
Keywords: fingerprint descriptor

Description

I have read the proposals to reduce descriptor size and noticed that fingerprints are written as hex-encoded SHA1 digests. Why not encode them in base64 instead? For example, for the moria relay:

Hex-encoded SHA1:

string(40) "9695DFC35FFEB861329B9F1AB04C46397020CE31"

Base64, without trailing padding:

string(27) "lpXfw1/+uGEym58asExGOXAgzjE"

The same conversion in PHP:

echo rtrim(base64_encode(hex2bin('9695DFC35FFEB861329B9F1AB04C46397020CE31')), '='); // strips the '=' padding; prints lpXfw1/+uGEym58asExGOXAgzjE

This makes each fingerprint string 32.5% shorter (27 characters instead of 40).
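A minimal Python sketch of the same round trip, using only the standard library `base64` module. The helper names `hex_fp_to_b64` and `b64_fp_to_hex` are hypothetical, chosen for illustration; they are not part of tor. The padding is restored before decoding because `base64.b64decode` rejects unpadded input.

```python
import base64

def hex_fp_to_b64(hex_fp: str) -> str:
    """Hypothetical helper: 40-char hex SHA1 fingerprint -> unpadded base64."""
    digest = bytes.fromhex(hex_fp)  # 20 raw bytes
    if len(digest) != 20:
        raise ValueError("expected a 20-byte SHA1 digest")
    # 20 bytes encode to 28 base64 chars, the last of which is one '=' pad.
    return base64.b64encode(digest).decode("ascii").rstrip("=")

def b64_fp_to_hex(b64_fp: str) -> str:
    """Hypothetical inverse: unpadded base64 -> uppercase hex fingerprint."""
    padded = b64_fp + "=" * (-len(b64_fp) % 4)  # restore the stripped padding
    return base64.b64decode(padded, validate=True).hex().upper()

moria_hex = "9695DFC35FFEB861329B9F1AB04C46397020CE31"
moria_b64 = hex_fp_to_b64(moria_hex)
print(moria_b64, len(moria_hex), len(moria_b64))  # lpXfw1/+uGEym58asExGOXAgzjE 40 27
print(f"{1 - len(moria_b64) / len(moria_hex):.1%} shorter")  # 32.5% shorter
```

Stripping and restoring the padding is lossless here because every SHA1 digest is exactly 20 bytes, so the padding is always a single '='.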

Change History (4)

comment:1 Changed 6 weeks ago by teor

Resolution: duplicate
Status: new → closed

Most clients use microdescriptors and the microdescriptor consensus, so the size of fingerprints in descriptors doesn't matter that much.

Microdescriptor fingerprints have always been base64:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1563

So have the relay fingerprints in the "r" lines of v3 consensuses and microdescriptor consensuses:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2297

And any fingerprints in tor cells are binary.

There are a few fingerprints that are still in the legacy hex format, but their space usage is insignificant compared to 6000 relays.

comment:2 in reply to: 1 Changed 6 weeks ago by cypherpunks

Replying to teor:

> There are a few fingerprints that are still in the legacy hex format, but their space usage is insignificant compared to 6000 relays.

You were very quick to respond. I see that microdescriptors are not affected. I looked into my cached-descriptors file and found about 2,200 family lines containing 23,548 fingerprints in hex SHA1 notation, about 941 kB in total. Encoded as base64 without the trailing "=" characters, they would take only about 635 kB.
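The arithmetic behind these figures, as a small Python check. It counts only the fingerprint characters (40 hex versus 27 unpadded base64) and assumes the "$" prefix and separators cost the same in either encoding, so they are left out; the 23,548 count is the one reported in the comment.

```python
# Count reported above for one cached-descriptors file.
fingerprints = 23_548
hex_chars, b64_chars = 40, 27  # hex SHA1 vs. unpadded base64, per fingerprint

hex_bytes = fingerprints * hex_chars
b64_bytes = fingerprints * b64_chars
saved = hex_bytes - b64_bytes
print(hex_bytes, b64_bytes, saved)  # 941920 635796 306124
print(f"{saved / hex_bytes:.1%}")  # 32.5%
```

These are uncompressed sizes; the saving is the same 32.5% per fingerprint as in the description.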

comment:3 in reply to: 2 Changed 6 weeks ago by teor

Replying to cypherpunks:

> Replying to teor:
>
> > There are a few fingerprints that are still in the legacy hex format, but their space usage is insignificant compared to 6000 relays.
>
> You were very quick to respond. I see that microdescriptors are not affected.

Microdescriptors do contain hex fingerprints in family lines. Changing them would require a proposal. And we would have to use hex fingerprints until every tor version understood base64 fingerprints.
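Why this needs a proposal can be sketched in a few lines of Python. Everything here is hypothetical: the "$" + 40 hex / "$" + 27 base64 entry formats, the regexes and the function names are illustrations, not tor's actual family-line parser. A parser written before the change would not recognise a base64 entry, whereas a transitional parser can accept both forms because they decode to the same 20-byte digest.

```python
import base64
import re
from typing import Optional

# Hypothetical family-entry formats (not taken from tor):
# "$" + 40 hex chars (current) or "$" + 27 unpadded base64 chars (proposed).
HEX_ENTRY = re.compile(r"\$([0-9A-Fa-f]{40})")
B64_ENTRY = re.compile(r"\$([A-Za-z0-9+/]{27})")

def parse_entry_legacy(entry: str) -> Optional[bytes]:
    """Hypothetical old parser: understands hex fingerprints only."""
    m = HEX_ENTRY.fullmatch(entry)
    return bytes.fromhex(m.group(1)) if m else None

def parse_entry_transitional(entry: str) -> Optional[bytes]:
    """Hypothetical new parser: accepts either encoding of the same digest."""
    digest = parse_entry_legacy(entry)
    if digest is not None:
        return digest
    m = B64_ENTRY.fullmatch(entry)
    # 27 base64 chars plus one '=' of padding decode to exactly 20 bytes.
    return base64.b64decode(m.group(1) + "=") if m else None

hex_entry = "$9695DFC35FFEB861329B9F1AB04C46397020CE31"
b64_entry = "$lpXfw1/+uGEym58asExGOXAgzjE"
print(parse_entry_legacy(b64_entry))  # None: the old parser cannot read it
print(parse_entry_transitional(b64_entry) == parse_entry_legacy(hex_entry))  # True
```

Until every deployed version behaves like the transitional parser, relays would have to keep publishing the hex form, which is the compatibility cost teor describes.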

Maybe we will get a similar benefit when we add ed25519 fingerprints, and remove hex sha1 fingerprints.

> I looked into my cached-descriptors file and found about 2,200 family lines containing 23,548 fingerprints in hex SHA1 notation, about 941 kB in total. Encoded as base64 without the trailing "=" characters, they would take only about 635 kB.

Yes, but only relays download descriptors. Most clients do not.

comment:4 Changed 6 weeks ago by nickm

Please also remember that downloads are compressed. After zlib (or zstd) compression, the difference between base16 and base64 encoding is not 32%. For purely random bytes, I get a figure closer to 11%.
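The effect of compression can be explored with a small Python experiment. This is an assumption-laden sketch, not nickm's measurement: it uses seeded pseudo-random 20-byte values as stand-ins for SHA1 fingerprints, one fingerprint per line, and zlib at its default level (zstd is not tried). The exact percentages depend on those choices, so they are not expected to reproduce the 11% figure exactly.

```python
import base64
import random
import zlib

def make_digests(n: int, seed: int = 31333) -> list:
    """n pseudo-random 20-byte values standing in for SHA1 fingerprints."""
    rng = random.Random(seed)
    return [rng.getrandbits(160).to_bytes(20, "big") for _ in range(n)]

def encoded_sizes(digests):
    """Return (hex raw, hex zlib, base64 raw, base64 zlib) sizes in bytes."""
    hex_doc = "\n".join(d.hex().upper() for d in digests).encode("ascii")
    b64_doc = "\n".join(
        base64.b64encode(d).decode("ascii").rstrip("=") for d in digests
    ).encode("ascii")
    return (len(hex_doc), len(zlib.compress(hex_doc)),
            len(b64_doc), len(zlib.compress(b64_doc)))

hex_raw, hex_z, b64_raw, b64_z = encoded_sizes(make_digests(20_000))
print(f"uncompressed: base64 is {1 - b64_raw / hex_raw:.1%} smaller")
print(f"zlib (default level): base64 is {1 - b64_z / hex_z:.1%} smaller")
```

Both encodings carry the same 160 bits per fingerprint, so a compressor can recover much of hex's redundancy; the remaining gap comes from how well the compressor handles each alphabet.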

That's still worth looking into, but as Teor notes, before we could change it, we'd need a proposal to figure out how to handle the backward compatibility issues.
