Opened 5 years ago

Closed 5 years ago

#13616 closed enhancement (invalid)

define jmeter testcase(s) and ant task(s)

Reported by: iwakeh
Owned by: iwakeh
Priority: High
Milestone:
Component: Metrics/Onionoo
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID: #13080
Points:
Reviewer:
Sponsor:

Description

Collect the following data

  • concurrent users
  • response times
  • ...

Child Tickets

Attachments (1)

onionoo-jmeter-testplan.jmx (16.0 KB) - added by iwakeh 5 years ago.
an example test plan for jmeter


Change History (12)

comment:1 Changed 5 years ago by iwakeh

Owner: set to iwakeh
Priority: normal → major
Status: new → assigned
Summary: define jmeter testcases → define jmeter testcase(s) ant ant task(s)

comment:2 Changed 5 years ago by iwakeh

It's important to have some performance testing. See #13089 for example.

comment:3 Changed 5 years ago by iwakeh

I'm attaching an example test plan for getting started with JMeter.
(This is just a quick example I 'clicked together' using the JMeter GUI.)

First steps in JMeter

On Debian wheezy, these JMeter packages are installed:

dpkg -l | grep jmeter
ii  jmeter                                2.11-2                             all          Load testing and performance measurement application (main application)
ii  jmeter-apidoc                         2.11-2                             all          Load testing and performance measurement application (API doc)
ii  jmeter-help                           2.11-2                             all          Load testing and performance measurement application (user manual)
ii  jmeter-http                           2.11-2                             all          Load testing and performance measurement application (http module)
ii  jmeter-java                           2.11-2                             all          Load testing and performance measurement application (java module)
ii  jmeter-tcp                            2.11-2                             all          Load testing and performance measurement application (tcp module)
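
If the packages are not installed yet, they can be pulled in with apt-get (as root); a minimal sketch using the package names from the listing above:

apt-get install jmeter jmeter-http jmeter-java jmeter-tcp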

The GUI starts with /usr/bin/jmeter.

Now you can load the attached test plan:

  • Take a look at 'User Defined Variables' for setting the host and the log file.
  • Then check the thread group 'https requests' for adapting the number of threads and the like. This thread group runs the two defined requests for "summary" and 700 details.
  • The thread group 'https requests main page' contains almost the same, but only a single request for Onionoo's main page.

(I just ran some mild tests against your mirror and hope it didn't cause any alarm bells :-)
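
For later automation it may help to know that such a plan can also be run without the GUI; a minimal sketch, using the attached plan and an arbitrary name for the results file:

jmeter -n -t onionoo-jmeter-testplan.jmx -l onionoo-results.jtl

Here -n selects non-GUI mode, -t names the test plan, and -l the file to write sample results to.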

Last edited 5 years ago by iwakeh

Changed 5 years ago by iwakeh

Attachment: onionoo-jmeter-testplan.jmx added

an example test plan for jmeter

comment:4 Changed 5 years ago by iwakeh

Status: assigned → needs_information

Some thoughts and questions:

Should the JMeter test really be automated?

First, the JMeter GUI offers really nice statistics.
Second, the performance tests will run against different targets:
recently it would have been Jetty vs. Tomcat; in the future, maybe a new implementation
against the current one. As the questions we want to answer using JMeter
differ quite a bit, this area will be difficult to automate. And, at least for the questions
so far, we would need a 'real' Onionoo server instance.

Test resources in git

Should JMeter tests have their own folder in git?
Items that should/could be stored there are JMeter test plans and the corresponding performance data.

Last edited 5 years ago by iwakeh

comment:5 Changed 5 years ago by karsten

Type: task → enhancement

Sounds like an enhancement to me, and what's a "task" anyway in this context?

comment:6 Changed 5 years ago by karsten

I finally started working on this, because I want to have some baseline for considering switching to a database (#11573). I briefly looked at JMeter, but found it too heavy-weight for our purposes. I decided to instead use httperf and a simple shell script around it:

#!/bin/bash

URIS=(
  "/summary?limit=1&"
  "/summary?limit=1&type=relay"
  "/summary?limit=1&type=bridge"
  "/summary?limit=1&running=true"
  "/summary?limit=1&running=false"
  "/summary?limit=1&search=moria1"
  "/summary?limit=1&search=ria"
  "/summary?limit=1&search=a"
  "/summary?limit=1&search=9695DFC35FFEB861329B9F1AB04C46397020CE31"
  "/summary?limit=1&search=9695DFC3"
  "/summary?limit=1&search=969"
  "/summary?limit=1&search=DD51A2029FED0276866332EACC6459E1D015E349"
  "/summary?limit=1&search=DD51A202"
  "/summary?limit=1&search=DD5"
  "/summary?limit=1&search=lpXfw1/+uGEym58asExGOXAgzjE"
  "/summary?limit=1&search=lpX"
  "/summary?limit=1&search=128.31.0.34"
  "/summary?limit=1&search=128.31.0"
  "/summary?limit=1&search=128.31"
  "/summary?limit=1&search=128"
  "/summary?limit=1&lookup=9695DFC35FFEB861329B9F1AB04C46397020CE31"
  "/summary?limit=1&lookup=DD51A2029FED0276866332EACC6459E1D015E349"
  "/summary?limit=1&country=us"
  "/summary?limit=1&as=3"
  "/summary?limit=1&flag=Running"
  "/summary?limit=1&flag=Authority"
  "/summary?limit=1&first_seen_days=0-2"
  "/summary?limit=1&first_seen_days=0-3"
  "/summary?limit=1&first_seen_days=3"
  "/summary?limit=1&contact=arma"
  "/summary?limit=1&contact=arm"
  "/summary?limit=1&contact=a"
  "/summary?limit=1&order=consensus_weight"
  "/summary?limit=1&order=-consensus_weight"
  "/summary?limit=1&family=9695DFC35FFEB861329B9F1AB04C46397020CE31"
  "/details?limit=100"
  "/details?limit=100&offset=500"
  "/details?limit=100&fields=fingerprint"
  "/bandwidth?limit=100"
)

for (( i = 0 ; i < ${#URIS[@]} ; i++ )); do
  # Quote the URI so that '?' is not treated as a shell glob character.
  if [ ! -d "${URIS[$i]}" ]; then
    httperf --server=onionoo.thecthulhu.com --uri="${URIS[$i]}" \
            --port=443 --ssl --num-calls=10 --verbose >> perf.log
  fi
done

The output is a verbose log containing lines like this:

Reply time [ms]: response 48.0 transfer 0.1
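
One possible way to rank the runs by these response times (a sketch; it assumes exactly one such summary line per httperf run, in the same order as the URIS array, and that the snippet is appended to the script above so that URIS is in scope):

grep '^Reply time' perf.log | awk '{print $5}' > times.txt   # the 'response' time per run
printf '%s\n' "${URIS[@]}" > uris.txt                        # one URI per line, same order
paste times.txt uris.txt | sort -rn                          # highest response time first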

Here are the results, starting with the highest response times:

   121.9 /summary?limit=1&order=-consensus_weight
   121.4 /summary?limit=1&order=consensus_weight

Looks like ordering results is a really expensive operation, which is something I didn't expect. But I do expect that a database would be better at this.

   106.8 /summary?limit=1&search=9695DFC35FFEB861329B9F1AB04C46397020CE31

Searching by such a long string (a full hex fingerprint) shouldn't be as expensive.

   105.8 /details?limit=100&fields=fingerprint

This one is expensive, because we need to parse 100 JSON documents and produce 100 new JSON documents on-the-fly. This won't get any faster when moving the search index to a database, unless we use a database that can store and process JSON documents. Though I'm not certain whether that will be faster and worth the effort.

   105.3 /summary?limit=1&search=DD51A2029FED0276866332EACC6459E1D015E349
    89.1 /summary?limit=1&search=lpXfw1/+uGEym58asExGOXAgzjE
    88.5 /summary?limit=1&search=9695DFC3
    85.4 /summary?limit=1&search=969
    72.7 /summary?limit=1&search=DD51A202
    71.3 /summary?limit=1&search=lpX
    70.2 /summary?limit=1&search=moria1
    69.9 /summary?limit=1&search=DD5
    68.7 /summary?limit=1&search=ria
    68.6 /summary?limit=1&search=128.31.0.34
    67.2 /summary?limit=1&search=128.31.0
    66.9 /summary?limit=1&search=128.31
    65.0 /summary?limit=1&search=128
    61.3 /summary?limit=1&search=a

All these searches should be faster. What's interesting is that longer search terms take longer than short ones, even though shorter terms produce many more intermediate results (and we don't stop after the first result, even though we could). My hope is that a database will make all these searches faster, though I'm slightly concerned that substring searches with very few characters (like ria in moria1 or even a in arma) might not be as fast. I could imagine changing the protocol to require at least three characters in these searches to make use of trigram matching. After all, what do people expect when they search for a?

    54.4 /summary?limit=1&first_seen_days=0-3
    54.2 /summary?limit=1&first_seen_days=0-2
    53.9 /summary?limit=1&first_seen_days=3
    52.8 /summary?limit=1&running=false

These look okay.

    52.2 /details?limit=100&offset=500

This includes transferring 100 details documents, so it should be fine.

    51.9 /summary?limit=1&flag=Running
    51.0 /summary?limit=1&running=true
    50.2 /summary?limit=1&contact=arma

These look fine.

    49.9 /details?limit=100

This is a tiny bit faster than the request above with offset 500, which seems reasonable.

    49.4 /summary?limit=1&contact=arm
    49.1 /bandwidth?limit=100
    48.0 /summary?limit=1&
    47.5 /summary?limit=1&as=3
    47.2 /summary?limit=1&country=us
    47.1 /summary?limit=1&flag=Authority
    46.9 /summary?limit=1&contact=a
    46.5 /summary?limit=1&family=9695DFC35FFEB861329B9F1AB04C46397020CE31
    45.8 /summary?limit=1&type=relay
    44.2 /summary?limit=1&type=bridge
    42.2 /summary?limit=1&lookup=DD51A2029FED0276866332EACC6459E1D015E349
    42.2 /summary?limit=1&lookup=9695DFC35FFEB861329B9F1AB04C46397020CE31

No surprises here, these all look good.

So, what other searches did I miss?

comment:7 in reply to:  6 ; Changed 5 years ago by iwakeh

Concerning the test cases I would add more data, e.g. measure many requests
for different fingerprints or IP addresses (or ... or ...) in order to avoid a bias when measuring the new retrieval methods.

JMeter's scope is concurrent stress testing of the entire web application.
If this is not intended at all, we should close this (#13616) issue.

I think 'response preparation performance measuring' should be a new issue.

For a database benchmark I would suggest measuring data preparation
using a simple benchmarking class that calls the code responsible for preparing
a response directly, without any network or web app in between.

This benchmarking class could live in the testing package. An Ant task
could be added for running these benchmark tests, which would later also help
ensure that certain changes don't degrade performance.

It might even be good to prepare a measurement class for JSON parsing and
preparation itself, in order to evaluate Gson replacements?

What do you think?

comment:8 Changed 5 years ago by iwakeh

Summary: define jmeter testcase(s) ant ant task(s) → define jmeter testcase(s) and ant task(s)

comment:9 in reply to:  7 ; Changed 5 years ago by karsten

Replying to iwakeh:

> Concerning the test cases I would add more data, e.g. measure many requests
> for different fingerprints or IP addresses (or ... or ...) in order to avoid a bias when measuring the new retrieval methods.

Agreed. While this is not necessary for measuring the current implementation that stores all relevant search data in memory, it would be very useful for testing any database-based solutions.

I wonder if we can use out/summary as input to automatically generate as many query samples as we need.

> JMeter's scope is concurrent stress testing of the entire web application.
> If this is not intended at all, we should close this (#13616) issue.

Well, my impression was that we'll want something simpler for this specific case.

Let's close this ticket as soon as we have spawned new tickets, okay?

> I think 'response preparation performance measuring' should be a new issue.
>
> For a database benchmark I would suggest measuring data preparation
> using a simple benchmarking class that calls the code responsible for preparing
> a response directly, without any network or web app in between.
>
> This benchmarking class could live in the testing package. An Ant task
> could be added for running these benchmark tests, which would later also help
> ensure that certain changes don't degrade performance.

Agreed on all the above. Basically, that would be a performance test of RequestHandler. But I guess we'd want to use what's in the local out/summary to populate the node index, rather than putting in some samples as we do for unit tests. And we might want to find a place for these test classes other than src/test/java/ in order not to conflict with unit tests.

> It might even be good to prepare a measurement class for JSON parsing and
> preparation itself, in order to evaluate Gson replacements?

That could be useful, though it's quite specific. There's an assumption in there that Gson is the performance bottleneck, but if we figure out that it's not, we might not even learn about performance problems located nearby. I think I'd rather start one layer above that, and if we identify a bottleneck there that could be related to Gson, I'd want to try replacing it and see if that improves the layer above.

For example, we rely on Gson being fast when responding to a request for details documents with the fields parameter set. It might be useful to write a performance test for ResponseBuilder and see if those requests stand out a lot.

Another place where we use Gson is as part of the hourly cronjob, though performance is less critical there. But maybe we could write a similar performance test class for DocumentStore, and then we can not only evaluate Gson replacements but also new designs where we replace file system storage with database storage.

> What do you think?

I think that this is all very useful stuff, but also that I need help with this. I'm counting four small or mid-sized projects here:

  • Make room for performance tests somewhere in src/ and write a separate Ant task to run them.
  • Take an out/summary file as input and generate good sample requests for a RequestHandler performance test class. Also write that test class.
  • Write a performance test class for ResponseBuilder, probably requiring a successful run of the hourly updater to populate the out/ directory.
  • Write another performance test class for DocumentStore that takes a populated status/ and out/ directory as input and performs a random series of listing, retrieving, removing, and storing documents. Ideally, the test class would make sure that the contents in both directories are still the same after running the test.

Plenty of stuff to do here. Want to help getting this started?

comment:10 in reply to:  9 Changed 5 years ago by iwakeh

Replying to karsten:

> I wonder if we can use out/summary as input to automatically generate as many query samples as we need.

Yeah, this seems the right way to do the benchmarking. In addition, we shouldn't forget
searches without any results.

> That could be useful, though it's quite specific. There's an assumption in there that Gson is the performance bottleneck, ...

I don't assume that Gson is a performance bottleneck, quite the opposite. But it has its quirks (e.g.
HTML escaping), and if at some point there is need and time to use a different JSON solution, it
would be good to at least get the same performance from the new solution.

>   • Make room for performance tests somewhere in src/ and write a separate Ant task to run them.

I'll start with this one.

>   • Take an out/summary file as input and generate good sample requests for a RequestHandler performance test class. Also write that test class.
>   • Write a performance test class for ResponseBuilder, probably requiring a successful run of the hourly updater to populate the out/ directory.
>   • Write another performance test class for DocumentStore that takes a populated status/ and out/ directory as input and performs a random series of listing, retrieving, removing, and storing documents. Ideally, the test class would make sure that the contents in both directories are still the same after running the test.

> Plenty of stuff to do here. Want to help getting this started?

Sure :-)

comment:11 Changed 5 years ago by iwakeh

Resolution: invalid
Status: needs_information → closed

The benchmarking tasks will be handled under a new parent ticket.
This ticket is obsolete now.
