Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#15503 closed defect (not a bug)

VIA PadLock support does not work.

Reported by: toyboy
Owned by:
Priority: Medium
Milestone:
Component: Core Tor/Tor
Version: Tor: unspecified
Severity:
Keywords: VIA PadLock, lorax
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I have enabled VIA PadLock support in Tor by adding the following lines to the torrc config file:
HardwareAccel 1
AccelName padlock

Since Tor prefers AES-128-GCM over AES-128/256-CBC, I have disabled all AES-GCM algorithms in the src/common/ciphers.inc file; this is required to test VIA PadLock.
I am aware that AES-GCM is more secure than AES-CBC but AES-GCM is NOT supported by VIA PadLock.
After this modification I see in tcpdump that client and server agreed to use AES-256-CBC (0xc014) which is supported by VIA Padlock.

During startup, I see the following messages in the debug log file created by Tor:
...
Mar 29 14:09:39.000 [notice] Tor 0.2.7.0-alpha-dev (git-4e4ee768fb796f5d) opening log file.
Mar 29 14:09:39.692 [notice] Tor v0.2.7.0-alpha-dev (git-4e4ee768fb796f5d) running on Linux with Libevent 2.0.19-stable, OpenSSL 1.0.1e and Zlib 1.2.7.
Mar 29 14:09:39.693 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Mar 29 14:09:39.695 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Mar 29 14:09:39.697 [notice] Read configuration file "/etc/tor/torrc-test".
Mar 29 14:09:39.720 [notice] Opening Socks listener on 127.0.0.1:9050
Mar 29 14:09:39.000 [notice] Not disabling debugger attaching for unprivileged users.
Mar 29 14:09:39.000 [notice] Parsing GEOIP IPv4 file /tmp/tor-git/share/tor/geoip.
Mar 29 14:09:40.000 [notice] Parsing GEOIP IPv6 file /tmp/tor-git/share/tor/geoip6.
Mar 29 14:09:40.000 [notice] Default OpenSSL engine for SHA1 is VIA PadLock: RNG ACE2 PHE PMM [padlock]
Mar 29 14:09:40.000 [notice] Default OpenSSL engine for AES-128-ECB is VIA PadLock: RNG ACE2 PHE PMM [padlock]
Mar 29 14:09:40.000 [notice] Default OpenSSL engine for AES-128-CBC is VIA PadLock: RNG ACE2 PHE PMM [padlock]
Mar 29 14:09:40.000 [notice] Default OpenSSL engine for AES-256-CBC is VIA PadLock: RNG ACE2 PHE PMM [padlock]
Mar 29 14:09:41.000 [notice] Bootstrapped 0%: Starting
Mar 29 14:09:42.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Mar 29 14:09:44.000 [notice] Bootstrapped 85%: Finishing handshake with first hop
Mar 29 14:09:44.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Mar 29 14:09:45.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Mar 29 14:09:45.000 [notice] Bootstrapped 100%: Done
...

Additionally, I ran a quick openssl speed test:

$ openssl speed -engine padlock -evp aes-256-cbc
engine "padlock" set.
Doing aes-256-cbc for 3s on 16 size blocks: 11632391 aes-256-cbc's in 2.38s
Doing aes-256-cbc for 3s on 64 size blocks: 8720103 aes-256-cbc's in 2.33s
Doing aes-256-cbc for 3s on 256 size blocks: 4521883 aes-256-cbc's in 2.28s
Doing aes-256-cbc for 3s on 1024 size blocks: 1642508 aes-256-cbc's in 2.40s
Doing aes-256-cbc for 3s on 8192 size blocks: 208581 aes-256-cbc's in 2.14s
OpenSSL 1.0.1e 11 Feb 2013
built on: Fri Mar 27 17:07:39 CET 2015
options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -march=i686 -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      78200.95k   239522.14k   507720.20k   700803.41k   798455.87k
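
The figures in this table can be reproduced from the "Doing aes-256-cbc ..." lines above, assuming each entry is operation count × block size ÷ elapsed seconds, expressed in thousands of bytes. A minimal sketch in C (the function name speed_kbytes_per_sec is hypothetical and not part of OpenSSL):

```c
#include <assert.h>

/* Throughput in units of 1000 bytes per second, the unit that
 * `openssl speed` reports. Hypothetical helper for illustration only. */
double speed_kbytes_per_sec(unsigned long long ops,
                            unsigned block_bytes,
                            double seconds)
{
    assert(seconds > 0.0); /* elapsed time must be positive */
    return (double)ops * (double)block_bytes / seconds / 1000.0;
}
```

For the 16-byte row, speed_kbytes_per_sec(11632391, 16, 2.38) gives about 78200.95, which matches the first column of the table.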

I started to test this configuration and quickly realized that HW offload is NOT used. After attaching perf to the pid of the Tor daemon, I get the following statistics:

Events: 205K cycles

47.56% libcrypto.so.1.0.0 . _sse_AES_encrypt_compact

6.32% libcrypto.so.1.0.0 . sha1_block_data_order
1.66% libcrypto.so.1.0.0 . AES_encrypt
1.42% libc-2.13.so . memcpy_ia32
1.37% libcrypto.so.1.0.0 . CRYPTO_ctr128_encrypt
1.37% [ip_tables] [k] ipt_do_table
1.32% [kernel] [k] do_softirq
1.17% [kernel] [k] sock_def_readable
0.77% libpadlock.so . padlock_aes_cipher
0.77% libc-2.13.so . _int_malloc
0.73% tor . tor_memeq
0.72% libssl.so.1.0.0 . ssl3_cbc_digest_record
0.62% [libata] [k] ata_scsi_queuecmd
0.57% [r8169] [k] 0x2719
0.55% [kernel] [k] copy_to_user_ll
0.47% tor . siphash24
0.44% tor . x86.get_pc_thunk.bx
0.41% [kernel] [k] nf_iterate
0.41% [vdso] . 0xb75209d1
0.39% tor . .L4
0.39% [kernel] [k] copy_from_user_ll
0.38% libevent-2.0.so.5.1.7 . 0xae18
0.34% [nf_conntrack] [k] tcp_packet
0.33% [kernel] [k] skb_copy_bits

...

It looks like the SSE implementation of AES is in use, and SHA1 does not appear to be offloaded either.

Attachments (1)

phe_sha_sum.txt (18.5 KB) - added by anon 4 years ago.

Change History (8)

comment:1 Changed 4 years ago by nickm

I wonder whether that's the invocation from SSL, or the invocation from aes.c in Tor? Tor's aes.c uses EVP_aes_128_ctr() by default, I think -- is that accelerated in your setup?

comment:2 in reply to:  1 Changed 4 years ago by yawning

Keywords: lorax added
Priority: major → normal

Replying to nickm:

I wonder whether that's the invocation from SSL, or the invocation from aes.c in Tor? Tor's aes.c uses EVP_aes_128_ctr() by default, I think -- is that accelerated in your setup?

The padlock invocation is TLS; the SSE one is the cell crypto. Disabling GCM to test whether padlock is working is sort of overkill, since we log which engines we're going to use. The issue here is that the user is running "OpenSSL-1.0.1OhMyGodUpgradeNow", which does not have padlock CTR support. (Checking through the git tags: while the code to support it has been in the master branch for a while, they've never shipped it in a stable release.)

If the underlying copy of OpenSSL supported it, we would use it.

#ifdef NID_aes_128_ctr
      log_engine("AES-128-CTR", ENGINE_get_cipher_engine(NID_aes_128_ctr));
#endif
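
The log_engine() call above reports whichever engine OpenSSL has registered as the default for a given cipher. When no engine has registered it, the software implementation is used. The following is only a toy model of that per-cipher fallback, not Tor or OpenSSL code. The names toy_engine, toy_pick_impl and toy_padlock are hypothetical, and the cipher list is taken from the "Default OpenSSL engine for ..." lines in the reporter's log:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an engine: the cipher names it registers. */
struct toy_engine {
    const char *id;
    const char *const *ciphers;
    size_t n_ciphers;
};

/* Return the engine id if the engine registers the cipher,
 * otherwise "software" to model the fallback. */
const char *toy_pick_impl(const struct toy_engine *e, const char *cipher)
{
    assert(e != NULL && cipher != NULL);
    for (size_t i = 0; i < e->n_ciphers; i++) {
        if (strcmp(e->ciphers[i], cipher) == 0)
            return e->id;
    }
    return "software";
}

/* Ciphers the reporter's log shows as accelerated by padlock. */
static const char *const toy_padlock_ciphers[] = {
    "AES-128-ECB", "AES-128-CBC", "AES-256-CBC",
};
const struct toy_engine toy_padlock = { "padlock", toy_padlock_ciphers, 3 };
```

In this model, toy_pick_impl(&toy_padlock, "AES-256-CBC") selects "padlock" (the TLS case), while "AES-128-CTR" falls back to "software" (the cell crypto case).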

So, there's no bug on our side here for AES. Since we don't use EVP based SHA1 currently, acceleration will not happen for that either, even if it happens to be available, which may be something we can fix, but I don't see this being major.

Clarification: Cleaned up phrasing etc.

Last edited 4 years ago by yawning

comment:3 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-final → Tor: 0.2.???

comment:4 Changed 4 years ago by anon

For SHA, and getting access to state before finalization, see this undocumented behavior:

"On VIA Nano and later, you can perform partial hashes by setting EAX to FFFFFFFF before executing the REP XSHA1/256 instruction - and the CPU won't perform the final padding (so you can simply feed the chunks into the hash, just as you usually do with hashing functions). On older models (up to C7), such a possibility is not present, EAX has to be set to zero before the hash instruction, and a full hash (i.e. including the final padding) is performed." - http://stackoverflow.com/questions/21526677/streaming-sha-calculation-using-vias-padlock-hashing-engine

They link to the VIA Padlock SDK which contains examples of this usage.
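
The final padding matters because it encodes the total message length. If the instruction always pads, each chunk is hashed as a complete message and cannot be continued. A minimal sketch of the standard SHA-1 padding rule (a 0x80 byte, zero fill, then the 64-bit big-endian bit length, up to a multiple of 64 bytes) follows. The function names are hypothetical and nothing here uses PadLock itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Total length in bytes of a msg_len-byte message after SHA-1 padding:
 * message + 0x80 byte + 8-byte length, rounded up to a 64-byte block. */
size_t sha1_padded_len(size_t msg_len)
{
    return (msg_len + 1 + 8 + 63) / 64 * 64;
}

/* Write the padding that follows a msg_len-byte message into out and
 * return its length (between 9 and 72 bytes). */
size_t sha1_final_padding(size_t msg_len, uint8_t out[72])
{
    assert(out != NULL);
    size_t pad_len = sha1_padded_len(msg_len) - msg_len;
    uint64_t bit_len = (uint64_t)msg_len * 8;

    memset(out, 0, pad_len);
    out[0] = 0x80;                        /* mandatory leading 1 bit */
    for (int i = 0; i < 8; i++)           /* big-endian bit length   */
        out[pad_len - 1 - i] = (uint8_t)(bit_len >> (8 * i));
    return pad_len;
}
```

Since the trailing eight bytes depend on the length of everything hashed so far, a streaming hash can only apply this padding once, at the very end.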

Before the all-bits-high option existed, you could trigger a bus-error-like exception during the call, which left the state un-finalized. Given its performance hit, this method may be useful for avoiding side channels in software implementations, but it would gain little or nothing in terms of performance.

Changed 4 years ago by anon

Attachment: phe_sha_sum.txt added

comment:5 Changed 4 years ago by yawning

Resolution: not a bug
Status: new → closed

Looked at this a bit more, since we probably should(?) use EVP for the non-one-shot hash calls. The fact that ancient VIA processors don't have an easy way to get partial hashes ends up being a moot point because OpenSSL does not support Padlock's SHA acceleration at all.

There is partial code for it in master, but it is not wired into the EVP interface. I'm not sure if there's an easy way to implement "EVP_MD_CTX_copy_ex()" on the problematic old steppings, but that's the OpenSSL developers' problem and not mine.

In summary:

  • The SSE implementation of AES is used because OpenSSL does not expose CTR acceleration for PadLock in non-master.
  • None of the SHA calls are offloaded because OpenSSL does not expose SHA acceleration for PadLock at all, and tor doesn't use the EVP interface so even if it existed, it wouldn't be used.

I'll file a separate ticket regarding using EVP for hashing, but that's really a separate issue to "OpenSSL's support for PadLock is lacking", which is not a tor bug.

comment:6 Changed 3 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:7 Changed 3 years ago by nickm

Milestone: Tor: 0.3.???

Milestone deleted
