Consider batch-exponentiation tricks to improve ntor performance
In the ntor paper, the authors say:
In our protocol the server needs to compute X^b^ and X^y^. Since the base is the same, squarings in the squareand-multiply algorithm can be parallelized [MN96] reducing the computational cost to 1.33 exponentiations. Further improvements such the \Exponent Combination Method" [MN96, x2.3] can be applied to the computation of the server. However such algorithms further increase the complexity of the computations and the improvements are not always clear cut.
This is why they list "1.33" online exponentiations, while our naive implementation has 2.
We should examine whether we can actually use some/all of these techniques. (I believe there has been some discussion on the point on tor-dev already; anybody want to dig up a link?)