Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#5696 closed defect (user disappeared)

AESNI not in use with openssl 1.0.1 on tor 0.2.3.14-alpha

Reported by: cypherpunks
Owned by:
Priority: Medium
Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.3.14-alpha
Severity:
Keywords: aesni tor-relay
Cc: ln5
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The 0.2.3.14-alpha changelog (https://gitweb.torproject.org/tor.git/blob/tor-0.2.3.14-alpha:/ChangeLog) states that AESNI will be used. This does not seem to be the case:

# uname -a
FreeBSD metaverse.dfri.se 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@…:/usr/obj/usr/src/sys/GENERIC amd64

# sysctl -a |egrep 'hw.machine|hw.model'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
hw.machine_arch: amd64

OpenSSL build:
./config -shared --prefix=/usr/local/testbuild/

libevent-2.0.18-stable:
CFLAGS=-I/usr/local/testbuild/include LDFLAGS=-L/usr/local/testbuild/lib ./configure --prefix=/usr/local/testbuild && make && make install
tor-0.2.3.14-alpha:
./configure --with-openssl-dir=/usr/local/testbuild/lib --disable-asciidoc --enable-gcc-warnings-advisory --enable-gcc-hardening --enable-linker-hardening --with-libevent-dir=/usr/local/testbuild/lib --prefix=/usr/local/testbuild

Results from bench:

OpenSSL 0.9.8:

dmap

nbits=65536
digestmap_set: 40.31 ns per element
digestmap_get: 30.15 ns per element
digestset_add: 9.94 ns per element
digestset_isin: 5.56 ns per element.
Hits == 32866304
False positive rate on digestset: 0.23%

aes

1 bytes: 13.08 nsec per byte
2 bytes: 9.55 nsec per byte
4 bytes: 7.89 nsec per byte
8 bytes: 7.04 nsec per byte
16 bytes: 6.67 nsec per byte
32 bytes: 6.48 nsec per byte
64 bytes: 6.38 nsec per byte
128 bytes: 6.40 nsec per byte
256 bytes: 6.35 nsec per byte
512 bytes: 6.32 nsec per byte
1024 bytes: 6.31 nsec per byte
2048 bytes: 6.30 nsec per byte
4096 bytes: 6.30 nsec per byte
8192 bytes: 6.30 nsec per byte

cell_aes

509 bytes, misaligned by 0: 6.12 nsec per byte
509 bytes, misaligned by 1: 6.12 nsec per byte
509 bytes, misaligned by 2: 6.12 nsec per byte
509 bytes, misaligned by 3: 6.12 nsec per byte
509 bytes, misaligned by 4: 6.12 nsec per byte
509 bytes, misaligned by 5: 6.12 nsec per byte
509 bytes, misaligned by 6: 6.12 nsec per byte
509 bytes, misaligned by 7: 6.12 nsec per byte
509 bytes, misaligned by 8: 6.13 nsec per byte
509 bytes, misaligned by 9: 6.12 nsec per byte
509 bytes, misaligned by 10: 6.12 nsec per byte
509 bytes, misaligned by 11: 6.13 nsec per byte
509 bytes, misaligned by 12: 6.12 nsec per byte
509 bytes, misaligned by 13: 6.12 nsec per byte
509 bytes, misaligned by 14: 6.12 nsec per byte
509 bytes, misaligned by 15: 6.12 nsec per byte

cell_ops

Inbound cells: 3126.88 ns per cell. (6.14 ns per byte of payload)

Outbound cells: 3131.38 ns per cell. (6.15 ns per byte of payload)

OpenSSL 1.0.1:

dmap

nbits=65536
digestmap_set: 151.35 ns per element
digestmap_get: 123.08 ns per element
digestset_add: 40.74 ns per element
digestset_isin: 29.20 ns per element.
Hits == 32825344
False positive rate on digestset: 0.21%

aes

1 bytes: 36.85 nsec per byte
2 bytes: 24.55 nsec per byte
4 bytes: 17.58 nsec per byte
8 bytes: 14.48 nsec per byte
16 bytes: 11.47 nsec per byte
32 bytes: 10.53 nsec per byte
64 bytes: 10.05 nsec per byte
128 bytes: 3.21 nsec per byte
256 bytes: 2.65 nsec per byte
512 bytes: 2.36 nsec per byte
1024 bytes: 2.23 nsec per byte
2048 bytes: 2.16 nsec per byte
4096 bytes: 2.12 nsec per byte
8192 bytes: 2.10 nsec per byte

cell_aes

509 bytes, misaligned by 0: 2.74 nsec per byte
509 bytes, misaligned by 1: 2.74 nsec per byte
509 bytes, misaligned by 2: 2.74 nsec per byte
509 bytes, misaligned by 3: 2.74 nsec per byte
509 bytes, misaligned by 4: 2.74 nsec per byte
509 bytes, misaligned by 5: 2.74 nsec per byte
509 bytes, misaligned by 6: 2.74 nsec per byte
509 bytes, misaligned by 7: 2.74 nsec per byte
509 bytes, misaligned by 8: 2.74 nsec per byte
509 bytes, misaligned by 9: 2.74 nsec per byte
509 bytes, misaligned by 10: 2.74 nsec per byte
509 bytes, misaligned by 11: 2.74 nsec per byte
509 bytes, misaligned by 12: 2.74 nsec per byte
509 bytes, misaligned by 13: 2.74 nsec per byte
509 bytes, misaligned by 14: 2.74 nsec per byte
509 bytes, misaligned by 15: 2.74 nsec per byte

cell_ops

Inbound cells: 1414.43 ns per cell. (2.78 ns per byte of payload)

Outbound cells: 1518.10 ns per cell. (2.98 ns per byte of payload)

This is nowhere near the dramatic performance improvements seen in #5406.
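To put a number on that, here is a small sketch that computes the speedup between the two runs above. The figures are copied from the bench output in this ticket; the helper name `speedup` is only illustrative.

```python
def speedup(before_ns: float, after_ns: float) -> float:
    """Return how many times faster the second measurement is than the first."""
    return before_ns / after_ns

# cell_aes, 509 bytes, misaligned by 0 (nsec per byte): OpenSSL 0.9.8 vs 1.0.1
cell_aes_gain = speedup(6.12, 2.74)

# cell_ops, inbound cells (ns per cell): OpenSSL 0.9.8 vs 1.0.1
inbound_gain = speedup(3126.88, 1414.43)

print(f"cell_aes: {cell_aes_gain:.2f}x, inbound cells: {inbound_gain:.2f}x")
```

Both ratios come out a little above 2x on this machine.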

For comparison, here are benchmarks from a machine that does not have AESNI, with tor benched against OpenSSL 0.9.8 and 1.0.1:

OpenSSL 0.9.8:

dmap

nbits=65536
digestmap_set: 40.31 ns per element
digestmap_get: 30.15 ns per element
digestset_add: 9.94 ns per element
digestset_isin: 5.56 ns per element.
Hits == 32866304
False positive rate on digestset: 0.23%

aes

1 bytes: 13.08 nsec per byte
2 bytes: 9.55 nsec per byte
4 bytes: 7.89 nsec per byte
8 bytes: 7.04 nsec per byte
16 bytes: 6.67 nsec per byte
32 bytes: 6.48 nsec per byte
64 bytes: 6.38 nsec per byte
128 bytes: 6.40 nsec per byte
256 bytes: 6.35 nsec per byte
512 bytes: 6.32 nsec per byte
1024 bytes: 6.31 nsec per byte
2048 bytes: 6.30 nsec per byte
4096 bytes: 6.30 nsec per byte
8192 bytes: 6.30 nsec per byte

cell_aes

509 bytes, misaligned by 0: 6.12 nsec per byte
509 bytes, misaligned by 1: 6.12 nsec per byte
509 bytes, misaligned by 2: 6.12 nsec per byte
509 bytes, misaligned by 3: 6.12 nsec per byte
509 bytes, misaligned by 4: 6.12 nsec per byte
509 bytes, misaligned by 5: 6.12 nsec per byte
509 bytes, misaligned by 6: 6.12 nsec per byte
509 bytes, misaligned by 7: 6.12 nsec per byte
509 bytes, misaligned by 8: 6.13 nsec per byte
509 bytes, misaligned by 9: 6.12 nsec per byte
509 bytes, misaligned by 10: 6.12 nsec per byte
509 bytes, misaligned by 11: 6.13 nsec per byte
509 bytes, misaligned by 12: 6.12 nsec per byte
509 bytes, misaligned by 13: 6.12 nsec per byte
509 bytes, misaligned by 14: 6.12 nsec per byte
509 bytes, misaligned by 15: 6.12 nsec per byte

cell_ops

Inbound cells: 3126.88 ns per cell. (6.14 ns per byte of payload)

Outbound cells: 3131.38 ns per cell. (6.15 ns per byte of payload)

OpenSSL 1.0.1:

dmap

nbits=65536
digestmap_set: 151.35 ns per element
digestmap_get: 123.08 ns per element
digestset_add: 40.74 ns per element
digestset_isin: 29.20 ns per element.
Hits == 32825344
False positive rate on digestset: 0.21%

aes

1 bytes: 36.85 nsec per byte
2 bytes: 24.55 nsec per byte
4 bytes: 17.58 nsec per byte
8 bytes: 14.48 nsec per byte
16 bytes: 11.47 nsec per byte
32 bytes: 10.53 nsec per byte
64 bytes: 10.05 nsec per byte
128 bytes: 3.21 nsec per byte
256 bytes: 2.65 nsec per byte
512 bytes: 2.36 nsec per byte
1024 bytes: 2.23 nsec per byte
2048 bytes: 2.16 nsec per byte
4096 bytes: 2.12 nsec per byte
8192 bytes: 2.10 nsec per byte

cell_aes

509 bytes, misaligned by 0: 2.74 nsec per byte
509 bytes, misaligned by 1: 2.74 nsec per byte
509 bytes, misaligned by 2: 2.74 nsec per byte
509 bytes, misaligned by 3: 2.74 nsec per byte
509 bytes, misaligned by 4: 2.74 nsec per byte
509 bytes, misaligned by 5: 2.74 nsec per byte
509 bytes, misaligned by 6: 2.74 nsec per byte
509 bytes, misaligned by 7: 2.74 nsec per byte
509 bytes, misaligned by 8: 2.74 nsec per byte
509 bytes, misaligned by 9: 2.74 nsec per byte
509 bytes, misaligned by 10: 2.74 nsec per byte
509 bytes, misaligned by 11: 2.74 nsec per byte
509 bytes, misaligned by 12: 2.74 nsec per byte
509 bytes, misaligned by 13: 2.74 nsec per byte
509 bytes, misaligned by 14: 2.74 nsec per byte
509 bytes, misaligned by 15: 2.74 nsec per byte

cell_ops

Inbound cells: 1414.43 ns per cell. (2.78 ns per byte of payload)

Outbound cells: 1518.10 ns per cell. (2.98 ns per byte of payload)

Change History (9)

comment:1 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-final

comment:2 Changed 8 years ago by nickm

Hm. So according to the openssl code in evp/e_aes.c, it really looks like AESNI _should_ be getting used where possible. If you can use a debugger, could you please tell me the contents of (OPENSSL_ia32cap_P[1]) ? If not, could somebody else with an AESNI-compatible chip have a look?

Looking at the code: when we call EVP_aes_128_ctr(), OpenSSL 1.0.1 evaluates

{ return AESNI_CAPABLE?&aesni_##keylen##_##mode:&aes_##keylen##_##mode; }

And AESNI_CAPABLE is defined as:

#define AESNI_CAPABLE   (OPENSSL_ia32cap_P[1]&(1<<(57-32)))

Also, can anybody else with an aesni cpu confirm this?
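For anyone following along, a rough Python model of that selection logic may help. It is only a sketch of the two macros quoted above: the capability vector is passed in as a plain two-element list, and the returned strings stand in for the `aesni_*` and `aes_*` cipher tables. The function names are made up for illustration.

```python
AESNI_BIT = 57  # bit index within the 64-bit capability vector, as in the macro

def aesni_capable(ia32cap: list) -> bool:
    """Model of AESNI_CAPABLE: test bit (57 - 32) = 25 of the second 32-bit word."""
    return bool(ia32cap[1] & (1 << (AESNI_BIT - 32)))

def pick_cipher(ia32cap: list, keylen: int = 128, mode: str = "ctr") -> str:
    """Return the name of the table the EVP macro would select (illustrative strings)."""
    prefix = "aesni" if aesni_capable(ia32cap) else "aes"
    return f"{prefix}_{keylen}_{mode}"

print(pick_cipher([0, 1 << 25]))  # bit 25 set in word 1
print(pick_cipher([0, 0]))        # bit 25 clear in word 1
```

So if bit 25 of OPENSSL_ia32cap_P[1] is clear at runtime, the generic AES path is chosen even when OpenSSL was built with AESNI support.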

comment:3 Changed 8 years ago by murble

I have tried with 0.2.3.15-alpha; the output of my run is at
https://www.yuri.org.uk/~murble/tor/tor-aes.txt

Running src/test/bench under gdb with a breakpoint in
aesni_ctr32_encrypt_blocks traps.
For example:

Breakpoint 1, aesni_ctr32_encrypt_blocks () at aesni-x86_64.s:889
889 cmpq $1,%rdx

This is with the Debian wheezy libssl 1.0.1b-1 on x86-64 on an i7-2600,
and
OPENSSL_ia32cap_P[1] = 532341759
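A quick way to read that value is to decode the individual bits. The sketch below assumes that, for the bits listed, OPENSSL_ia32cap_P[1] mirrors the ECX word of CPUID leaf 1 (which is how OpenSSL's OPENSSL_ia32cap documentation describes them); the table and function names are illustrative.

```python
# Bit positions within the second capability word (assumed to match CPUID.1:ECX).
FEATURE_BITS = {"PCLMULQDQ": 1, "SSSE3": 9, "AESNI": 25, "AVX": 28}

def decode_ia32cap_word1(value: int) -> dict:
    """Map each known feature name to whether its bit is set in `value`."""
    return {name: bool((value >> bit) & 1) for name, bit in FEATURE_BITS.items()}

reported = 532341759  # value from the gdb session above
print(hex(reported))
print(decode_ia32cap_word1(reported))
```

Bit 25 is set in this value, which is consistent with the breakpoint in aesni_ctr32_encrypt_blocks being hit.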

comment:4 in reply to:  3 Changed 8 years ago by arma

Replying to murble:

I have tried with 0.2.3.15-alpha; the output of my run is at
https://www.yuri.org.uk/~murble/tor/tor-aes.txt

For posterity, here that is on trac:

murble@lxctest:~/git/tor-0.2.3.15-alpha$ ./src/test/bench
===== dmap =====
nbits=65536
digestmap_set: 35.87 ns per element
digestmap_get: 27.16 ns per element
digestset_add: 8.02 ns per element
digestset_isin: 4.94 ns per element.
Hits == 32858112
False positive rate on digestset: 0.22%
===== aes =====
1 bytes: 16.93 nsec per byte
2 bytes: 9.53 nsec per byte
4 bytes: 5.62 nsec per byte
8 bytes: 3.73 nsec per byte
16 bytes: 1.91 nsec per byte
32 bytes: 1.09 nsec per byte
64 bytes: 0.62 nsec per byte
128 bytes: 0.50 nsec per byte
256 bytes: 0.36 nsec per byte
512 bytes: 0.31 nsec per byte
1024 bytes: 0.27 nsec per byte
2048 bytes: 0.26 nsec per byte
4096 bytes: 0.25 nsec per byte
8192 bytes: 0.25 nsec per byte
===== cell_aes =====
509 bytes, misaligned by 0: 0.38 nsec per byte
509 bytes, misaligned by 1: 0.38 nsec per byte
509 bytes, misaligned by 2: 0.38 nsec per byte
509 bytes, misaligned by 3: 0.38 nsec per byte
509 bytes, misaligned by 4: 0.37 nsec per byte
509 bytes, misaligned by 5: 0.38 nsec per byte
509 bytes, misaligned by 6: 0.38 nsec per byte
509 bytes, misaligned by 7: 0.38 nsec per byte
509 bytes, misaligned by 8: 0.39 nsec per byte
509 bytes, misaligned by 9: 0.38 nsec per byte
509 bytes, misaligned by 10: 0.38 nsec per byte
509 bytes, misaligned by 11: 0.38 nsec per byte
509 bytes, misaligned by 12: 0.39 nsec per byte
509 bytes, misaligned by 13: 0.38 nsec per byte
509 bytes, misaligned by 14: 0.39 nsec per byte
509 bytes, misaligned by 15: 0.38 nsec per byte
===== cell_ops =====
 Inbound cells: 199.94 ns per cell. (0.39 ns per byte of payload)
Outbound cells: 207.36 ns per cell. (0.41 ns per byte of payload)

comment:5 Changed 8 years ago by arma

Just so somebody's said it: might it be relevant that all the happy people here are on Linux, and all the unhappy people are on FreeBSD?

comment:6 Changed 8 years ago by ln5

Cc: ln5 added

comment:7 Changed 8 years ago by cypherpunks

Resolution: user disappeared
Status: new → closed

The problem was that the AESNI instructions were disabled in the BIOS. With them enabled, it works fine. This bug should be closed. Thanks!

cell_aes

509 bytes, misaligned by 0: 0.47 nsec per byte
509 bytes, misaligned by 1: 0.47 nsec per byte
509 bytes, misaligned by 2: 0.47 nsec per byte
509 bytes, misaligned by 3: 0.47 nsec per byte
509 bytes, misaligned by 4: 0.47 nsec per byte
509 bytes, misaligned by 5: 0.47 nsec per byte
509 bytes, misaligned by 6: 0.47 nsec per byte
509 bytes, misaligned by 7: 0.47 nsec per byte
509 bytes, misaligned by 8: 0.47 nsec per byte
509 bytes, misaligned by 9: 0.47 nsec per byte
509 bytes, misaligned by 10: 0.47 nsec per byte
509 bytes, misaligned by 11: 0.47 nsec per byte
509 bytes, misaligned by 12: 0.47 nsec per byte
509 bytes, misaligned by 13: 0.47 nsec per byte
509 bytes, misaligned by 14: 0.47 nsec per byte
509 bytes, misaligned by 15: 0.47 nsec per byte

cell_ops

Inbound cells: 258.41 ns per cell. (0.51 ns per byte of payload)

Outbound cells: 357.35 ns per cell. (0.70 ns per byte of payload)
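Since the cause turned out to be a BIOS setting, it may be worth checking what the OS reports before benchmarking. The sketch below scans CPU feature text for an AES-NI flag. It assumes the Linux /proc/cpuinfo convention (a `flags` line containing `aes`) and the FreeBSD boot-message convention (a `Features2=0x...<...,AESNI,...>` line); the function name is hypothetical and the text has to be supplied by the caller.

```python
import re

def reports_aesni(text: str) -> bool:
    """Return True if CPU feature text mentions an AES-NI flag.

    Accepts a Linux /proc/cpuinfo dump (flag spelled "aes") or a FreeBSD
    dmesg line of the form "Features2=0x...<...,AESNI,...>".
    """
    for line in text.splitlines():
        if line.startswith("flags"):
            # Linux: space-separated flag names after the colon
            _, _, flags = line.partition(":")
            if "aes" in flags.split():
                return True
        elif "Features2=" in line:
            # FreeBSD: comma-separated names inside angle brackets
            match = re.search(r"<([^>]*)>", line)
            if match and "AESNI" in match.group(1).split(","):
                return True
    return False
```

On Linux this could be fed the contents of /proc/cpuinfo, and on FreeBSD the contents of /var/run/dmesg.boot. If the flag is missing on a CPU model that should have it, a firmware setting is one possible explanation.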

comment:8 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:9 Changed 7 years ago by nickm

Component: Tor Relay → Tor