Opened 5 years ago

Closed 4 years ago

#11332 closed defect (fixed)

Get a fresh set of relay/exit profiles on 0.2.5.5-alpha or later; optimize bottlenecks if found

Reported by: nickm
Owned by:
Priority: Medium
Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-relay performance 025-triaged
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

On #7727 we had some profile results that we used to identify bottleneck functions in Tor to optimize. We should get some fresh profiles on 0.2.5.4-alpha once it's out (assuming we merge #9841), and then see which of those bottlenecks need the most attention.

Attachments (11)

tor-0.2.5.2-alpha-results.gz (26.9 KB) - added by cypherpunks 4 years ago.
Tor-0.2.5.2-alpha
tor-0.2.5.4-alpha-ESBEK-tor5-28.05.2014-20.30.43.gz (16.3 KB) - added by esbek 4 years ago.
tor-0.2.5.4-alpha-ESBEK-tor5-29.05.2014-00.32.10.gz (27.5 KB) - added by esbek 4 years ago.
tor-0.2.5.4-alpha-ESBEK-tor5-29.05.2014-09.28.16.gz (31.8 KB) - added by esbek 4 years ago.
Same host longer period (~8h).
tor-0.2.5.4-alpha-ESBEK-tor2-29.05.2014-16.12.29.gz (30.8 KB) - added by esbek 4 years ago.
tor-0.2.5.4-alpha-privshield-20140529-1811.gz (549.2 KB) - added by yoriz 4 years ago.
Call graph of tor-exit.privshield.com after 30 mins measuring
results.gz (26.7 KB) - added by cypherpunks 4 years ago.
0.2.5.4-alpha-1~d70.wheezy+1 on linuxlounge.net
tor-be9058003da9b75abbf0403743e7631dc29ba27c-profile-run.gmon.out (612.1 KB) - added by andrea 4 years ago.
tor-be9058003da9b75abbf0403743e7631dc29ba27c-profile-run.gprof.txt (1.0 MB) - added by andrea 4 years ago.
p.gz (43.0 KB) - added by cypherpunks 4 years ago.
tor-58f4200789d0cc47ebd88f3091207cf4dd493573-profile-run.gprof.txt (1.9 MB) - added by andrea 4 years ago.

Change History (37)

comment:1 Changed 5 years ago by nickm

Keywords: 025-triaged added

One thought I had about that last profile, though: SHA1 was at the top of the list. OpenSSL's PRNG uses a bunch of SHA1, and is ridiculously slow for a userspace PRNG. If SHA1 turns up at the top of the list again, we should investigate whether it's our protocols' uses of SHA1, TLS's uses of SHA1, or OpenSSL's PRNG's uses of SHA1 that are most to blame.

comment:2 Changed 5 years ago by nickm

(to clarify: we should optimize any enormous bottlenecks in 0.2.5, and save medium to small bottlenecks for 0.2.6)

comment:3 Changed 4 years ago by nickm

Here are the maximally simple instructions as I understand them.

  • Run Linux. Install perf. (On Debian, this is 'apt-get install linux-tools')
  • (For call graphs only) Build Tor with -fno-omit-frame-pointer
  • Start a Tor server with DisableDebuggerAttachment 0 set
  • Wait for the server to get load
  • Find the PID of the Tor process; call it PID.
  • Run "perf record -p $PID"
    • (To benchmark worker threads too) It might be a good idea to do "perf record -a -p $PID"
    • (For call graphs only) Instead you might need something like "perf record --call-graph fp -a -g -p $PID"
  • After a while (an hour or so?), ctrl-C the 'perf record' command. It will have written its results into perf.data.
  • Run "perf report --stdio > results". Gzip that file and send it to nick.
    • (For call graphs only) You might need "perf report --call-graph -G --stdio > results".

These instructions probably suck! Please help me improve them.
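
The steps above can be sketched as a small shell helper. This is only a sketch under assumptions, not a tested procedure: the function names `perf_record_cmd` and `perf_report_cmd` and the `flat`/`callgraph` mode names are made up for illustration. The functions just print command lines taken from the list (the `-a` and `-g` variants are left out for brevity), so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print the 'perf record' command for a given Tor PID.
# Mode "callgraph" assumes Tor was built with -fno-omit-frame-pointer.
perf_record_cmd() {
    pid="$1"
    mode="${2:-flat}"
    case "$mode" in
        callgraph) printf 'perf record --call-graph fp -p %s\n' "$pid" ;;
        flat)      printf 'perf record -p %s\n' "$pid" ;;
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print the matching 'perf report' command; it writes a text file "results".
perf_report_cmd() {
    mode="${1:-flat}"
    case "$mode" in
        callgraph) echo 'perf report --call-graph -G --stdio > results' ;;
        flat)      echo 'perf report --stdio > results' ;;
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example use (commented out so that sourcing this file runs nothing):
#   PID=$(pgrep -o -x tor)
#   eval "$(perf_record_cmd "$PID")"     # ctrl-C after an hour or so
#   eval "$(perf_report_cmd)" && gzip results
```

Printing the commands instead of running them keeps the helper harmless on a production relay; whether `pgrep -o -x tor` picks the right process on a host with several Tor instances is an assumption to check by hand.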

Last edited 4 years ago by nickm

comment:4 Changed 4 years ago by cypherpunks

Would love to give it a try. Can you clarify specific commands to build Tor? Is it

./configure -fno-omit-frame-pointer; make && make install ?

comment:5 Changed 4 years ago by cypherpunks

For the debian package it would probably be like so:

  • cd /usr/src
  • apt-get source tor
  • cd tor-0.2.5*
  • vim debian/rules
  • append "CFLAGS=-fno-omit-frame-pointer"
  • dpkg-buildpackage -B

Packages would then lie in /usr/src.

(Updated according to comment:6.)

Last edited 4 years ago by cypherpunks

comment:6 Changed 4 years ago by nickm

cd tor-0.2.4*

Remember, I need profiles from Tor 0.2.5, not Tor 0.2.4.

comment:7 Changed 4 years ago by cypherpunks

Yes, I am running tor-0.2.5.2-alpha on CentOS. It is not a 50+ Mbps relay, but I will give it a shot and compile some data for you.

comment:8 Changed 4 years ago by nickm

What I most need is information about 0.2.5.4-alpha or later. 0.2.5.4-alpha has some performance improvements that should (I hope) affect the numbers a lot.

[But information about 0.2.5.2-alpha is still useful, because (a) it will let us know what stuff looked like before 0.2.5.4-alpha, and (b) it will help us figure out what the instructions for running perf should say.]

comment:9 Changed 4 years ago by cypherpunks

I can download and build 0.2.5.4-alpha. I build Tor from source anyway.

I will try to run it on 0.2.5.2 first, then build 0.2.5.4 and run it again.

Changed 4 years ago by cypherpunks

Tor-0.2.5.2-alpha

comment:10 Changed 4 years ago by nickm

Hm.

    28.21%      tor  libcrypto.so.1.0.0     [.] 0x000000000007398c                           
    14.41%      tor  tor                    [.] 0x00000000000f0991                           

Was this version of Tor built without debugging symbols or something?
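
A quick way to spot this before uploading a report is to count lines whose symbol column is a bare hex address, as in the two lines above. A rough sketch, assuming the `perf report --stdio` layout shown here (symbol last on the line, after `[.]` or `[k]`); the function name `count_unresolved` is made up for illustration.

```shell
#!/bin/sh
# Count lines of a 'perf report --stdio' file whose symbol is an
# unresolved hex address such as "[.] 0x000000000007398c".
# Prints the count; a large count suggests missing symbols.
count_unresolved() {
    grep -c -E '\[[.k]\] 0x[0-9a-fA-F]+[[:space:]]*$' "$1"
}
```

If most of the top entries are counted, the binary or its libraries probably lack symbols. Running `file` on the tor binary usually reports whether it is `stripped` or `not stripped`.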

comment:11 in reply to:  10 Changed 4 years ago by esbek

Replying to nickm:
...

Was this version of Tor built without debugging symbols or something?

It was built with these settings:
./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-silent-rules --disable-dependency-tracking --disable-buf-freelists --enable-asciidoc --docdir=/usr/share/doc/tor-0.2.5.4_alpha-r1 --enable-instrument-downloads --disable-bufferevents --enable-curve25519 --disable-nat-pmp --enable-gcc-hardening --enable-linker-hardening --disable-transparent --enable-threads --disable-upnp --disable-tor2web-mode --disable-unittests --disable-coverage

comment:12 Changed 4 years ago by esbek

I forgot that I stripped symbols. So it will take some time ...

Changed 4 years ago by esbek

Same host longer period (~8h).

comment:13 Changed 4 years ago by yoriz

My exit averages about 32 Mbit/s. Not the > ~50 Mbit/s requested on the mailing list, but perhaps still useful? https://globe.torproject.org/#/relay/94E47A3E0A094C4B544B93D3247038243B61ACBC

I could use some help building with -fno-omit-frame-pointer as the instructions from both #comment:4 and #comment:5 don't seem to work for Ubuntu:

$ cat /etc/issue
Ubuntu 12.04.4 LTS \n \l
$ sudo apt-get update
$ sudo apt-get install linux-tools
$ sudo apt-get install build-essential
$ wget https://www.torproject.org/dist/tor-0.2.5.4-alpha.tar.gz
$ tar xvfz tor-0.2.5.4-alpha.tar.gz
$ cd tor-0.2.5.4-alpha
$ ./configure -fno-omit-frame-pointer
configure: error: unrecognized option: `-fno-omit-frame-pointer'
$ ls debian/rules
(file not found)

Please advise!

comment:14 in reply to:  4 Changed 4 years ago by nickm

Replying to cypherpunks:

Would love to give it a try. Can you clarify specific commands to build Tor? Is it

./configure -fno-omit-frame-pointer; make && make install ?

No; you'd have to try something like

./configure CFLAGS='-Wall -g -O2 -fno-omit-frame-pointer'

I think. But remember, the call-graph stuff is all optional.
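
To confirm that the flag actually reached the build, one option is to look at the `CFLAGS` line of the Makefile that configure generates. A minimal sketch, assuming the generated Makefile records the flags on a line starting with `CFLAGS =`; the function name `makefile_has_cflag` is made up for illustration.

```shell
#!/bin/sh
# Succeed (exit 0) if the CFLAGS line of the given Makefile contains
# the given flag as a fixed string; fail (exit 1) otherwise.
makefile_has_cflag() {
    makefile="$1"
    flag="$2"
    # -e is needed because the flag itself starts with a dash.
    grep -E '^CFLAGS[[:space:]]*=' "$makefile" | grep -q -F -e "$flag"
}

# Example use, from the top of the Tor source tree after ./configure:
#   makefile_has_cflag Makefile -fno-omit-frame-pointer && echo "flag is set"
```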

Last edited 4 years ago by nickm

comment:15 in reply to:  13 Changed 4 years ago by yoriz

Replying to yoriz:

Nickm, thank you for pointing out the correct way to use configure. For Ubuntu users who find this ticket later, these are the magic spells needed on an Ubuntu machine:

$ cat /etc/issue
Ubuntu 12.04.4 LTS \n \l
$ sudo apt-get update
$ sudo apt-get install linux-tools
$ sudo apt-get install build-essential
$ sudo apt-get install libevent-dev
$ sudo apt-get install libssl-dev
$ wget https://www.torproject.org/dist/tor-0.2.5.4-alpha.tar.gz
$ tar xvfz tor-0.2.5.4-alpha.tar.gz
$ cd tor-0.2.5.4-alpha
$ ./configure CFLAGS="-Wall -g -O2 -fno-omit-frame-pointer"
$ make

# Since I am operating a regular tor exit, I prefer to keep my normal
# tor installation intact and keep this experimental build separate.
# However, I will reuse the configuration of my regular node:

$ sudo mkdir -p /usr/local/etc/tor
$ sudo cp /etc/tor/torrc /usr/local/etc/tor
$ sudo ~/tor-0.2.5.4-alpha/src/or/tor
$ arm
(waiting for traffic to pick up again)

Changed 4 years ago by yoriz

Call graph of tor-exit.privshield.com after 30 mins measuring

Changed 4 years ago by cypherpunks

Attachment: results.gz added

0.2.5.4-alpha-1~d70.wheezy+1 on linuxlounge.net

comment:16 Changed 4 years ago by andrea

Attached gprof results for run of latest master as a middle relay, relaying about 1.18G using 659 seconds of CPU time. Yeah, siphash is kinda expensive.

comment:17 Changed 4 years ago by andrea

D'oh, that was the raw gmon.out. I'm not quite awake yet, apparently.

comment:18 Changed 4 years ago by nickm

Nice; I hadn't thought to also try gprof. I wonder if siphash has also started showing up under perf; it didn't show up too high under any of the other perf profiles, but those were before the changes in master that started using siphash for the circid/channel to circuit mappings (#11750).

Also, those results won't cover time spent in openssl. Can you get perf results too under current master?

comment:19 Changed 4 years ago by nickm

I added #12169 for optimizing memeq and #12170 for making a lot of the performance issues (including siphash, I think!) go away.

comment:20 in reply to:  18 Changed 4 years ago by andrea

Replying to nickm:

Nice; I hadn't thought to also try gprof. I wonder if siphash has also started showing up under perf; it didn't show up too high under any of the other perf profiles, but those were before the changes in master that started using siphash for the circid/channel to circuit mappings (#11750).

Also, those results won't cover time spent in openssl. Can you get perf results too under current master?

Not without a fair amount of pain-in-the-assery. I think I need different kernel options for it to work.

comment:21 in reply to:  18 Changed 4 years ago by andrea

Replying to nickm:

Nice; I hadn't thought to also try gprof. I wonder if siphash has also started showing up under perf; it didn't show up too high under any of the other perf profiles, but those were before the changes in master that started using siphash for the circid/channel to circuit mappings (#11750).

Also, those results won't cover time spent in openssl. Can you get perf results too under current master?

I'd guess the easiest way to get OpenSSL results is to rebuild it with -pg and statically link to it.
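
That idea could look roughly like the following. This is a guess at the build lines, not a record of what was actually run: the install prefix is hypothetical, `profiled_build_cmds` is a made-up name, and the sketch assumes OpenSSL's `./config` passes `-pg` through to the compiler and that Tor's configure is given `--enable-static-openssl` together with `--with-openssl-dir`.

```shell
#!/bin/sh
# Print configure lines for a gprof-instrumented static OpenSSL and for
# a Tor build linked against it. $1 is a (hypothetical) install prefix.
profiled_build_cmds() {
    prefix="$1"
    # OpenSSL: static libraries only, -pg handed to the compiler.
    printf './config no-shared -pg --prefix=%s\n' "$prefix"
    # Tor: link the static OpenSSL from the prefix and instrument Tor too.
    printf "./configure --enable-static-openssl --with-openssl-dir=%s CFLAGS='-g -O2 -pg'\n" "$prefix"
}

# Example use:
#   profiled_build_cmds /opt/openssl-pg
```

The first printed line belongs in the OpenSSL source tree (followed by make and make install), the second in the Tor source tree.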

comment:22 Changed 4 years ago by nickm

One thing in particular I'd sorta like to know from openssl is what fraction of SHA1 calls are accounted for by OpenSSL's baroque PRNG.

Changed 4 years ago by cypherpunks

Attachment: p.gz added

comment:23 Changed 4 years ago by nickm

Summary: changed from "Get a fresh set of relay/exit profiles on 0.2.5.4-alpha or later; optimize accordingly." to "Get a fresh set of relay/exit profiles on 0.2.5.5-alpha or later; optimize bottlenecks if found"

Okay; I think we have enough info for 0.2.5.4-alpha. In 0.2.5.5-alpha and later, we should confirm that there are no new bottlenecks, and evaluate whether our fixes in #12170 and #12169 did any good.

comment:24 Changed 4 years ago by andrea

Attaching a fresh gprof output for a build linked against a profiled OpenSSL. This relayed 6.5G in 85 hours of wall clock time using 2075 seconds of CPU time.
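
For a rough feel of the numbers, CPU seconds per G relayed can be computed from the figures quoted here and in comment:16. A small sketch; the function name `cpu_per_gb` is made up, "G" is taken as reported, and the two runs used different builds (this one links a profiled OpenSSL), so the ratio is only indicative.

```shell
#!/bin/sh
# Print CPU seconds spent per G relayed, to one decimal place.
# $1 = CPU seconds, $2 = amount relayed in G.
cpu_per_gb() {
    awk -v s="$1" -v g="$2" 'BEGIN { printf "%.1f\n", s / g }'
}

# Example use with the figures from the two gprof runs:
#   cpu_per_gb 659 1.18    # run from comment:16
#   cpu_per_gb 2075 6.5    # this run
```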

comment:25 Changed 4 years ago by nickm

Neat. I don't see anything in there that seems like a terrible issue. There are a few one-percent functions in Tor that could probably turn into 0-percent functions, but nothing that looks like it's a bug.

Unless somebody else sees an issue in this profile, I say we call this fixed and open a new ticket to look at profiles in 0.2.6. :)

comment:26 Changed 4 years ago by nickm

Resolution: fixed
Status: new → closed

The ticket to do this again in 0.2.6 is #12464. Thanks, everyone!
