Opened 2 years ago

Closed 22 months ago

#21759 closed enhancement (implemented)

Add persistence for torperf/onionperf

Reported by: iwakeh Owned by: iwakeh
Priority: Medium Milestone: CollecTor 1.5.0
Component: Metrics/CollecTor Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

As discussed in #21272 comment:18 and before.

All storing of files should be done by the persist-mechanism, i.e.,
a o.t.c.persist.TpfPersistence class needs to be introduced (this would
shorten TorperfDownloader quite a bit and prevent another
re-implementation of this functionality).

Decide if sync-functionality for the Xperf files can be done with this ticket, too, or needs another ticket.

Child Tickets

Change History (27)

comment:1 Changed 2 years ago by iwakeh

Owner: changed from metrics-team to iwakeh
Status: newaccepted

comment:2 Changed 2 years ago by iwakeh

Milestone: CollecTor 1.3.0

Also add sync-code.

comment:3 Changed 2 years ago by iwakeh

Status: acceptedneeds_information

Currently, Onionperf files are stored accumulated (many single descriptors in one file) in both recent and out folders.
This sort of deviates from the usual process of storing only single descriptor files in the out folder.
Should onionperf storage be changed to storing single file descriptors in the out directory?

comment:4 Changed 2 years ago by iwakeh

Should the standard annotation be changed from current Onionperf("@type torperf 1.0\n") to Onionperf("@type torperf 1.1\n")?

comment:5 Changed 2 years ago by iwakeh

Status: needs_informationneeds_review

Assuming 'no' for the question in comment:3 (store only one file in out) and 'yes' for comment:4 please find this branch for review.

This branch introduces sync-functionality for Onionperf, but not yet storing via the persistence mechanism, because the latter is currently implemented using an implicit transaction, i.e., all descriptors in one downloaded descriptor file are only stored, if all were valid. This is different from the sync-approach where invalid/unparseable descriptors are ignored, but valid ones stored no matter if they came in one file.

Could the sync-approach be also used for directly downloaded descriptors?

comment:6 in reply to:  3 Changed 2 years ago by karsten

Replying to iwakeh:

Currently, Onionperf files are stored accumulated (many single descriptors in one file) in both recent and out folders.
This sort of deviates from the usual process of storing only single descriptor files in the out folder.
Should onionperf storage be changed to storing single file descriptors in the out directory?

Good question! To be honest, I don't find it particularly useful that current tarballs contain just a single descriptor per file. This works okay for large descriptors like consensuses or votes, but not so much for server descriptors or even microdescriptors. The main reason why we're still doing it is that it's easy to check whether we already have a given descriptor if we can identify it by filename. If we had a database for this information and if we were to rewrite the tarball-creating code, we might do things differently. For example, we could put all descriptors starting with digest ee into an ee file. That would probably speed up things a lot for people using our tarballs.

So, for this case I'd say let's leave files as they are and not split them up.

comment:7 in reply to:  4 Changed 2 years ago by karsten

Replying to iwakeh:

Should the standard annotation be changed from current Onionperf("@type torperf 1.0\n") to Onionperf("@type torperf 1.1\n")?

Yes. Though I didn't find where this is even used in the current code, but your patch might change that. So, yes, we should update this.

I'll look at the remaining comment 5 next, but that might be either tonight or tomorrow.

comment:8 Changed 2 years ago by karsten

Status: needs_reviewneeds_revision

It took a bit longer to respond to your comment 5, but here it comes:

  • The following is entirely unrelated to the given patch, but can you try to change your Git commit messages to follow the 50/72 rule? That is, the first line has no more than 50 characters (unless that's just too hard, in which case 60 are still okay), the next line is blank, and all remaining lines are wrapped at 72 characters with newlines added between paragraphs. And can you write the first line using the imperative, as in "Add sync functionality for tpf files." rather than "Implements XY"? Again, this is unrelated to this specific commit, but just like we're trying to use a common code style we should aim for using a common commit message style. Sorry for the nitpick. :)
  • Can you add a change log entry for this change?
  • Should we put out a metrics-lib 2.1.0 first towards the end of the month, so that we don't have to require a -dev version in the next released CollecTor?
  • Can we keep the old capitalization of OnionPerf in the configs? It seems that's what Rob used. It also means fewer changes to the code.
  • Typo: "Instantiate", not "Instanciate".
  • The line byte[] db = desc.getRawDescriptorBytes(); in SyncPersistence is probably still left over from testing. This method has become really expensive, so we should be careful not to call it more often than necessary.
  • I'd like to avoid that we handle descriptors differently depending on whether we download them from an OnionPerf host vs. synchronize them from a CollecTor host. And I believe that the approach taken in this synchronization patch makes more sense than the strict validation approach taken in the download part. But maybe we can first take a step back and look at all the other modules and find a common approach for all or at least most of them. Let's have this discussion first and then merge this patch if it still does what we think should be done. Does that make sense?

Thanks!

comment:9 in reply to:  8 Changed 2 years ago by iwakeh

Replying to karsten:

It took a bit longer to respond to your comment 5, but here it comes:

Thanks for reviewing!

  • The following is entirely unrelated to the given patch, but can you try to change your Git commit messages to follow the 50/72 rule? That is, the first line has no more than 50 characters (unless that's just too hard, in which case 60 are still okay), the next line is blank, and all remaining lines are wrapped at 72 characters with newlines added between paragraphs. And can you write the first line using the imperative, as in "Add sync functionality for tpf files." rather than "Implements XY"? Again, this is unrelated to this specific commit, but just like we're trying to use a common code style we should aim for using a common commit message style. Sorry for the nitpick. :)

No nitpicking; we even have that documented. Will adapt the commits.

  • Can you add a change log entry for this change?

Sure.

  • Should we put out a metrics-lib 2.1.0 first towards the end of the month, so that we don't have to require a -dev version in the next released CollecTor?

Agreed, the new CollecTor release should only use release versions of metrics-lib.
Wouldn't the next be 2.0.1 for the tiny fix? Are there more things we could release soon?
Maybe continue the metrics-lib release discussion on #22912 or via mail as it feels out of place here?

  • Can we keep the old capitalization of OnionPerf in the configs? It seems that's what Rob used. It also means fewer changes to the code.

Well, in general I'd prefer:

  1. consistent naming and capitalization of classes and configuration (i.e., OnionperfDownloader.class might need to be renamed too, if 'OnionPerf*' is chosen.)
  2. rather less capitalization, e.g., prefer 'Exitlist' over 'ExitList'

Maybe, there are more rules for naming.

  • Typo: "Instantiate", not "Instanciate".
  • The line byte[] db = desc.getRawDescriptorBytes(); in SyncPersistence is probably still left over from testing. This method has become really expensive, so we should be careful not to call it more often than necessary.

Get to these two in a new patch when the final topic is addressed.

  • I'd like to avoid that we handle descriptors differently depending on whether we download them from an OnionPerf host vs. synchronize them from a CollecTor host. And I believe that the approach taken in this synchronization patch makes more sense than the strict validation approach taken in the download part.

Yes, I agree the simpler version will be better.

But maybe we can first take a step back and look at all the other modules and find a common approach for all or at least most of them. Let's have this discussion first and then merge this patch if it still does what we think should be done. Does that make sense?

Totally, I'll survey that question as next step before any other work here.

Thanks!

comment:10 Changed 2 years ago by iwakeh

Common approach for storing:
It turns out that the functionality in OnionperfDownloader was too restrictive.
Here's an overview:

  • bridge-desc (all types): after sanitation the descriptor is written; if one descriptor cannot be sanitized, it is skipped
  • relay-desc (all types): descriptors written one by one skipping problematic ones
  • exitlists: always stored as a single file.

Solution: if a single TorperfResult is not parseable, it should simply be skipped as in the sync-implementation.

Another topic, which relates to comment:7:
Currently, all synced descriptors receive their annotation from the Annotation enum (cf. package o.t.c.conf.Annotation). This happens, because only the raw bytes are taken from a given descriptor and written to the file system.
But, actually the annotation(s) should be taken from the 'getAnnotations' method and be prepended to the raw descriptor bytes, shouldn't they?
The fix for this would also simplify some sync-code. New ticket also for milestone 1.3.0?

comment:11 Changed 2 years ago by iwakeh

Milestone: CollecTor 1.3.0CollecTor 1.4.0

Move to next milestone to limit 1.3.0 to the webstats functionality.

comment:12 in reply to:  10 Changed 2 years ago by iwakeh

Status: needs_revisionneeds_information

The following is still open and needs info:

Common approach for storing:
It turns out that the functionality in OnionperfDownloader was too restrictive.
Here's an overview:

  • bridge-desc (all types): after sanitation the descriptor is written; if one descriptor cannot be sanitized, it is skipped
  • relay-desc (all types): descriptors written one by one skipping problematic ones
  • exitlists: always stored as a single file.

Solution: if a single TorperfResult is not parseable, it should simply be skipped as in the sync-implementation.

The second part:

Another topic, which relates to comment:7:
Currently, all synced descriptors receive their annotation from the Annotation enum (cf. package o.t.c.conf.Annotation). This happens, because only the raw bytes are taken from a given descriptor and written to the file system.
But, actually the annotation(s) should be taken from the 'getAnnotations' method and be prepended to the raw descriptor bytes, shouldn't they?
...

is now a new ticket #23215.

comment:13 Changed 2 years ago by iwakeh

Status: needs_informationneeds_review

Please review the commits on top of this task-21759-2 branch.

The first commit makes the onionperf module implementation adhere to the standard, i.e., use a sources property for determining the operation types (in this commit only Remote). It also makes naming changes such that only 'OnionPerf' is used for camel-case names.

The second commit adds Sync as new source.

The third adds and adapts tests accordingly. The tests need task-22912 for successful completion. Thus, temporarily metrics-lib version is set to 'dev' here.

The final two commits take care of changelog and tiny typos.

comment:14 Changed 2 years ago by iwakeh

Regarding the first part of comment:12:
The discussion should be taken to a ticket summarizing the changes for all modules.
I'll look for an appropriate ticket or create a new one.

The current implementation of sync-runs is not affected by this considering our agreement in comment:8 and comment:9.

comment:15 Changed 23 months ago by karsten

Alright, I finally reviewed your task-21759-2 branch. Here's what I found:

  • The case changes to configuration options (like "OnionperfActivated" -> "OnionPerfActivated") deserve a change log entry under "Major changes", unless we don't care about case there. Because if we do we'll have to inform operators that they'll need to update their configurations.
  • We'll need to put out metrics-lib 2.1.0 before merging at least the last commit(s) in your branch. Should we 1) push that forward to get this branch merged, 2) merge only commits from this branch that don't require m-lib 2.1.0, or 3) put this branch on hold until 2.1.0 is out?

comment:16 in reply to:  15 Changed 23 months ago by iwakeh

Replying to karsten:

Alright, I finally reviewed your task-21759-2 branch. Here's what I found:

  • The case changes to configuration options (like "OnionperfActivated" -> "OnionPerfActivated") deserve a change log entry under "Major changes", unless we don't care about case there. Because if we do we'll have to inform operators that they'll need to update their configurations.

Indeed, that is important and should go in the Major-section! Besides the changelog I will also highlight this in the announcement.

  • We'll need to put out metrics-lib 2.1.0 before merging at least the last commit(s) in your branch. Should we 1) push that forward to get this branch merged, 2) merge only commits from this branch that don't require m-lib 2.1.0, or 3) put this branch on hold until 2.1.0 is out?

Just merge all now. We have releases, thus master can depend on some unreleased metrics-lib dev-version, that's fine. And, other tickets will need metrics-lib-2.1.0, too. We only need to make sure that metrics-lib 2.1.0 is out before we release CollecTor again.

comment:17 Changed 23 months ago by karsten

Hrmmmmm-okay. Can you rebase and squash? Otherwise I'll do it tomorrow. (And let's consider putting out metrics-lib 2.1.0 this week.)

comment:18 Changed 23 months ago by karsten

Actually, no, we can't put out metrics-lib 2.1.0 before we put out releases for all tools depending on metrics-lib, because of #19617. Let's not merge this branch now, or at least not the parts requiring 2.1.0. Or let's put in the exception for metrics-lib 2.1.0 to keep using Java 7 despite the metrics-base update.

comment:19 Changed 23 months ago by iwakeh

Better merge now.

For releasing metrics-lib using java 7 only the following line needs to be added to metrics-lib build.xml
<property name="source-and-target-java-version" value="1.7" />
This will override the setting from metrics-base.

comment:20 Changed 23 months ago by karsten

Okay. I'll write a cheat sheet for release dependencies tomorrow. ;)

Can you rebase and squash, or should I do that tomorrow? (Will not merge before tomorrow anyway.)

comment:21 Changed 23 months ago by iwakeh

Please find the rebased branch here.

comment:22 Changed 23 months ago by karsten

Status: needs_reviewmerge_ready

Hmm, no, that's not what I meant by rebase and squash. The result may be similar, but I find the Git history much easier to use when re-applying commits to master.

I also found a flaw in our merge/release order: We said we'd need to release metrics-lib before releasing CollecTor, so that we don't have to depend on a metrics-lib -dev version in a released CollecTor. But we also need to release CollecTor before metrics-lib, or we'd get into trouble with Java 8. Let's not do all this, it's a major PITA waiting to happen. BTDT.

I just put <property name="source-and-target-java-version" value="1.7" /> into metrics-lib's build.xml, so that we can put out new metrics-lib versions whenever we need. And I rebased and squashed your earlier task-21759 branch with the goal of merging it as soon as metrics-lib 2.1.0 is out. Now we can still put out new CollecTor versions whenever we need, even if that need arises before metrics-lib 2.1.0 comes out.

The result is task-21759-3 in my repository.

Changing to merge_ready as soon as metrics-lib 2.1.0 is out.

comment:23 in reply to:  14 Changed 23 months ago by iwakeh

Replying to iwakeh:

Regarding the first part of comment:12:
The discussion should be taken to a ticket summarizing the changes for all modules.
I'll look for an appropriate ticket or create a new one.

The new ticket is #23421.

comment:24 Changed 22 months ago by iwakeh

Milestone: CollecTor 1.4.0CollecTor 1.5.0

Move to next milestone in order to accommodate interim release.

comment:25 Changed 22 months ago by karsten

Just to be sure, this is ready to be merged now, right? (I only looked very briefly and would check more thoroughly tomorrow, but maybe there's something that happened in the past two weeks that would lead to not merging as-is that I might overlook tomorrow?)

comment:26 Changed 22 months ago by iwakeh

This is all fine now that the new metrics-lib is out (see your comment:15, the comments after that are about release scheduling and obsolete now). The current branch is referenced in comment:22.

comment:27 Changed 22 months ago by karsten

Resolution: implemented
Status: merge_readyclosed

Rebased and merged to master. Closing. Thanks!

Note: See TracTickets for help on using tickets.