Faster curve25519 implementation for ntor
floodyberry's curve25519 implementations at https://github.com/floodyberry/curve25519-donna are mostly C, and claim to be even faster than the ones we're using now, especially on Intel CPUs. We should evaluate them and consider switching.
Also, if we find an ed25519 implementation we like and wind up using it, we should evaluate using its component pieces to build an optimized curve25519 implementation for calculations on the base point (see http://www.imperialviolet.org/2013/05/10/fastercurve25519.html ). Adam has some example code based on one of the amd64 assembly implementations.
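
To make the base-point idea concrete, here is a minimal Python sketch of the math only: multiply the ed25519 base point by the clamped scalar on the Edwards curve, then map the resulting y-coordinate to the Montgomery u-coordinate with u = (1 + y) / (1 - y). All function names below are hypothetical and are not taken from any of the implementations mentioned above. The sketch uses plain double-and-add with affine coordinates, so it is neither fast nor constant-time; a real implementation would presumably get its speedup from the ed25519 code's precomputed base-point table.

```python
# Illustrative sketch only: not constant-time, not optimized.
P = 2**255 - 19                              # field prime
D = (-121665 * pow(121666, P - 2, P)) % P    # Edwards curve constant d


def inv(x):
    """Modular inverse via Fermat's little theorem."""
    return pow(x % P, P - 2, P)


def recover_x(y):
    """Return one square root x with -x^2 + y^2 = 1 + d*x^2*y^2."""
    x2 = (y * y - 1) * inv(D * y * y + 1) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P
    assert (x * x - x2) % P == 0
    return x


# ed25519 base point has y = 4/5. The sign of x does not affect the
# resulting u-coordinate, so either root works for this sketch.
BASE_Y = 4 * inv(5) % P
BASE = (recover_x(BASE_Y), BASE_Y)


def edwards_add(p1, p2):
    """Complete addition law on the twisted Edwards curve (a = -1)."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * inv(1 + t) % P
    y3 = (y1 * y2 + x1 * x2) * inv(1 - t) % P
    return (x3, y3)


def edwards_scalarmult(k, pt):
    """Plain double-and-add; a fast version would use a precomputed table."""
    result = (0, 1)  # neutral element
    while k:
        if k & 1:
            result = edwards_add(result, pt)
        pt = edwards_add(pt, pt)
        k >>= 1
    return result


def clamp(secret):
    """Standard curve25519 scalar clamping of a 32-byte secret."""
    k = bytearray(secret)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    return int.from_bytes(k, "little")


def edwards_y_to_montgomery_u(y):
    """Birational map from ed25519 to curve25519: u = (1 + y) / (1 - y)."""
    return (1 + y) * inv(1 - y) % P


def curve25519_base(secret):
    """Public key for a 32-byte secret, computed via the Edwards curve."""
    _, y = edwards_scalarmult(clamp(secret), BASE)
    return edwards_y_to_montgomery_u(y).to_bytes(32, "little")
```

With these definitions, the ed25519 base point maps to u = 9, the usual curve25519 base point, so the output should match what the existing curve25519 base-point multiplication produces for the same secret.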