Opened 3 years ago

Closed 2 years ago

#20380 closed enhancement (implemented)

Expand INSTALL.md to a more complete operator's guide

Reported by: karsten
Owned by:
Priority: Medium
Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords:
Cc: iwakeh
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

In the last week or two we spent some time on writing more complete operator's guides for the various metrics services. This is for our October milestone: "Provide user-friendly documentation that empowers users to independently operate CollecTor instances."

I'll attach the latest operator's guide for CollecTor in a minute. The funny whitespace comes from the document being a LaTeX table with commented-out columns for the other metrics services.

I'd like to put the text from that PDF into the current INSTALL.md, maybe after reformatting some things in Markdown.

I'd also like to remove the current README.md, because there should be just one document telling operators how to use CollecTor, and we can probably expect most operators to know how to use gpg.

Thoughts?


Attachments (1)

operating-2016-10-17.pdf (110.9 KB) - added by karsten 3 years ago.


Change History (27)

Changed 3 years ago by karsten

Attachment: operating-2016-10-17.pdf added

comment:1 Changed 3 years ago by karsten

Status: new → needs_review

Attached.

comment:2 in reply to:  description Changed 3 years ago by iwakeh

Replying to karsten:

...
I'd also like to remove the current README.md, because there should be just one document telling operators how to use CollecTor, and we can probably expect most operators to know how to use gpg.

Thoughts?

Maybe, there should be a new section at the beginning with an audience description and the skill set expected?

And, yes, one expected skill is knowing how to use gpg.

comment:3 Changed 3 years ago by karsten

Good idea! New first two paragraphs:

"Welcome to CollecTor, your friendly data-collecting service in the Tor network. CollecTor fetches data from various nodes and services in the public Tor network and makes it available to the world. This data includes relay descriptors from the directory authorities, sanitized bridge descriptors from the bridge authority, and other data about the Tor network.

This document describes how to set up your very own CollecTor instance. It was written with an audience in mind that has at least some experience with running services and is comfortable with the command line. It’s not required that you know how to read or even write Java code, though."

Should I go ahead and create a branch for this, so that we can fine-tune the text? Or would you rather provide more feedback based on the PDF?

comment:4 Changed 3 years ago by iwakeh

This sounds good.

Yes, a branch and using md/txt will make work on the doc easier.

comment:6 Changed 2 years ago by iwakeh

Milestone: CollecTor 1.1.0

comment:7 Changed 2 years ago by karsten

Mind taking another look at the branch above, so that I can merge to master and then ask folks to try out these instructions?

comment:8 Changed 2 years ago by iwakeh

Thanks for the start here!

Please see some suggestions here based on your file.

The idea behind my changes is that I think the service shouldn't be run from the unpacked tar folder. The tar contains a development environment, so the jar would disappear after 'ant clean', or be changed, etc.
The runtime directory should only contain files that are really necessary for the application or which were created by the application.
Hope this doesn't make the description too complicated.

I would also like to have even less description of tools from the OS, because such things should be decided by the operator.
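
A minimal sketch of the separation described above, assuming the operator already knows where the built jar lives. `install_runtime` and both of its arguments are hypothetical names used for illustration; nothing here is shipped with CollecTor.

```sh
#!/bin/sh
# Hypothetical helper: copy only the executable jar into a separate
# runtime directory, so that 'ant clean' in the unpacked tarball cannot
# remove the file the service is running from.
install_runtime() {
  jar="$1"       # path to the built collector-<version>.jar (assumed)
  workdir="$2"   # runtime directory chosen by the operator (assumed)
  # Refuse to create anything if the jar is not there.
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  mkdir -p "$workdir" && cp "$jar" "$workdir/"
}
```

The function only creates the runtime directory once the jar has been found, so a mistyped path does not leave an empty directory behind.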

comment:9 in reply to:  8 ; Changed 2 years ago by karsten

Replying to iwakeh:

Thanks for the start here!

Thanks for looking! :)

Please see some suggestions here based on your file.

A few thoughts:

  • When you say that closer monitoring will be needed when disk space drops below a given number, do you mean 200G or 20G or a different number?
  • We shouldn't add new section headers easily. The chosen section headers and even paragraphs in this document (will) have equivalents in the other operator's guides for other metrics tools. If we want to add new sections, we'll also have to add those sections to the other manuals. The current sections are:
$ grep "^#" INSTALL.md
# CollecTor Operator's Guide
## Setting up the host
## Setting up the service
## Maintaining the service
  • (continued) What other sections or even subsections should we include, and what instructions would go into those vs. the existing sections?

The idea behind my changes is that I think the service shouldn't be run from the unpacked tar folder. The tar contains a development environment, so the jar would disappear after 'ant clean', or be changed, etc.
The runtime directory should only contain files that are really necessary for the application or which were created by the application.
Hope this doesn't make the description too complicated.

Yes, makes sense, let's change that. There are still a few paths left where we refer to files in collector-<version>/ and where we should tell the user to copy those files to the working directory and run them from there. I can update those places.

I would also like to have even less description of tools from the OS, because such things should be decided by the operator.

Which parts would that include? The crontab, @reboot, screen, etc.? Can you make a list?

Again, thanks for the review!

comment:10 in reply to:  9 Changed 2 years ago by iwakeh

Replying to karsten:

...
A few thoughts:

  • When you say that closer monitoring will be needed when disk space drops below a given number, do you mean 200G or 20G or a different number?

I was referring to the disk space available when starting, i.e., a setup that is very close to 150G and logs to the same disk requires more attention than a terabyte setup. Hmm, but if that is confusing just discard it.

  • We shouldn't add new section headers easily. The chosen section headers and even paragraphs in this document (will) have equivalents in the other operator's guides for other metrics tools. If we want to add new sections, we'll also have to add those sections to the other manuals. The current sections are:
$ grep "^#" INSTALL.md
# CollecTor Operator's Guide
## Setting up the host
## Setting up the service
## Maintaining the service

It's important to have a consistent structure, but it would be helpful for readers to have sub-headings, which are application dependent. Scrolling through a document with only generic headings when looking for particular information takes longer (of course, there is a search).
So, maybe keep the top level consistent and allow for application dependent headlines below?

  • (continued) What other sections or even subsections should we include, and what instructions would go into those vs. the existing sections?

I see two more sections.

  • 'Planning the Service' contrasts with those sections that give a to-do list. People running instances will have different needs that can be better covered this way.
  • and even more important, a section 'Bootstrapping' or similar. What data to download before a first run etc. Again this is not a to-do list, as it depends on what data should be processed.

The idea behind my changes is that I think the service shouldn't be run from the unpacked tar folder. The tar contains a development environment, so the jar would disappear after 'ant clean', or be changed, etc.
The runtime directory should only contain files that are really necessary for the application or which were created by the application.
Hope this doesn't make the description too complicated.

Yes, makes sense, let's change that. There are still a few paths left where we refer to files in collector-<version>/ and where we should tell the user to copy those files to the working directory and run them from there. I can update those places.

I would also like to have even less description of tools from the OS, because such things should be decided by the operator.

Which parts would that include? The crontab, @reboot, screen, etc.? Can you make a list?

When we avoid mentioning any such tools and methods, we avoid getting out of date and stay platform independent. People operating servers have their favorite tools and know what to do when told:

  • run this script every three days
  • provide an http server for serving data and files in folders X, Y, Z.
  • for continuous operation ensure start-up on reboot and
  • monitoring of logs as well as running service is important

etc.

CollecTor does not depend on apache or crontab, only on the services provided by them. Even the suggested install of openjdk could be left out. Also apt-get. Attempt at a list:

  • apt-get
  • apache2
  • crontab
  • gpg
  • openjdk, only Java 7
  • screen
  • ...

Another thing would be to use <OutPath>/recent and similar instead of the default choices provided. So, it is clear which option is referred to.

The backup recommendation I would also leave out. It depends on the setup and the kind of data collected. Or, move it to 'Planning the service'?

comment:11 Changed 2 years ago by karsten

Thanks for the detailed feedback! Please take a look at my updated task-20380 branch for changes discussed below.

Replying to iwakeh:

Replying to karsten:

...
A few thoughts:

  • When you say that closer monitoring will be needed when disk space drops below a given number, do you mean 200G or 20G or a different number?

I was referring to the disk space available when starting, i.e., a setup that is very close to 150G and logs to the same disk requires more attention than a terabyte setup. Hmm, but if that is confusing just discard it.

Ah, now I understand. Hmm, I think I'd rather pick a different number than 150G than going into more detail there. After all, a CollecTor instance that doesn't download and serve the full tarball archive will need a lot less than 150G, and an instance that does serve tarballs might run out of disk space in a year or two even with 150G. Let's just change it to 200G to have some more room to breathe.
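
A check against such a threshold could be sketched as follows. This is only an illustration: `has_free_space` is a hypothetical helper, not part of CollecTor, and reading "200G" as 200 GiB is an assumption.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding <path> has at
# least <min_gib> GiB available.
has_free_space() {
  path="$1"      # any path on the disk to check (assumed)
  min_gib="$2"   # threshold in GiB, e.g. 200 (assumed unit)
  # 'df -Pk' prints one POSIX-format line per filesystem, in 1024-byte
  # blocks; the fourth column of the second line is the available space.
  avail_kib=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]
}
```

An operator might call `has_free_space /srv/collector 200 || echo "low on disk"` from whatever monitoring they already use; the path in that call is made up.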

  • We shouldn't add new section headers easily. The chosen section headers and even paragraphs in this document (will) have equivalents in the other operator's guides for other metrics tools. If we want to add new sections, we'll also have to add those sections to the other manuals. The current sections are:
$ grep "^#" INSTALL.md
# CollecTor Operator's Guide
## Setting up the host
## Setting up the service
## Maintaining the service

It's important to have a consistent structure, but it would be helpful for readers to have sub-headings, which are application dependent. Scrolling through a document with only generic headings when looking for particular information takes longer (of course, there is a search).
So, maybe keep the top level consistent and allow for application dependent headlines below?

I admit that there could be more sections, though I have not yet given up on keeping them independent of the application. I added a few more section headers.

  • (continued) What other sections or even subsections should we include, and what instructions would go into those vs. the existing sections?

I see two more sections.

  • 'Planning the Service' contrasts with those sections that give a to-do list. People running instances will have different needs that can be better covered this way.
  • and even more important, a section 'Bootstrapping' or similar. What data to download before a first run etc. Again this is not a to-do list, as it depends on what data should be processed.

Added the first but not yet the second. Let me know if anything is still missing.

The idea behind my changes is that I think the service shouldn't be run from the unpacked tar folder. The tar contains a development environment, so the jar would disappear after 'ant clean', or be changed, etc.
The runtime directory should only contain files that are really necessary for the application or which were created by the application.
Hope this doesn't make the description too complicated.

Yes, makes sense, let's change that. There are still a few paths left where we refer to files in collector-<version>/ and where we should tell the user to copy those files to the working directory and run them from there. I can update those places.

I would also like to have even less description of tools from the OS, because such things should be decided by the operator.

Which parts would that include? The crontab, @reboot, screen, etc.? Can you make a list?

When we avoid mentioning any such tools and methods, we avoid getting out of date and stay platform independent. People operating servers have their favorite tools and know what to do when told:

  • run this script every three days
  • provide an http server for serving data and files in folders X, Y, Z.
  • for continuous operation ensure start-up on reboot and
  • monitoring of logs as well as running service is important

etc.

CollecTor does not depend on apache or crontab, only on the services provided by them. Even the suggested install of openjdk could be left out. Also apt-get. Attempt at a list:

  • apt-get
  • apache2
  • crontab
  • gpg
  • openjdk, only Java 7
  • screen
  • ...

Agreed with almost all changes mentioned here, except for Apache. I believe that CollecTor depends on Apache to put together its header.html, footer.html, and to create directory listings. I haven't tried out other HTTP servers, but unless somebody has, I don't want to recommend any HTTP server if what we really need is an Apache. (Note that this is different for Metrics, Onionoo, and ExoneraTor which can all work with any HTTP server that can forward requests to Tomcat/Jetty.)

Another thing would be to use <OutPath>/recent and similar instead of the default choices provided. So, it is clear which option is referred to.

Good idea.

The backup recommendation I would also leave out. It depends on the setup and the kind of data collected. Or, move it to 'Planning the service'?

I'm not sure. This seems like a question that new operators might have, though maybe not during the setup process when they're not yet certain that they will succeed. That recommendation would probably benefit from a section header, so that people who don't care can skip it more easily. Changed.

Please take another look. Thanks!

comment:12 Changed 2 years ago by iwakeh

This looks good! The new sections make it a lot easier to read (also during review :-)

A few small things:

  • line 180: 'logback.xml' has to be available in the Java classpath, and the name shouldn't change. Maybe also mention that we use 'slf4j-api' throughout. Thus, you're free to use the logging framework of your choice, i.e. any other implementation of slf4j could be chosen and provided to CollecTor.
  • line 188: Missing option -DLOGBASE=<your-log-path>, and maybe add 'the -Xmx option is based on 4g RAM here; if your machine has more, feel free to adapt this as needed'
  • line 225: This is still based on the assumption that everything lives below the workdir, which doesn't need to be the case. Maybe change it along the lines of 'A backup of your CollecTor instance should include the <ArchivePath> and your configuration and other changes, which makes it possible to set up this instance again. A backup for short term recovery would also include the more volatile data in <StatsPath>, <RecentPath>, and <OutputPath>.' (This suggestion can surely be phrased better.)
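
The two backup scopes suggested for line 225 could be sketched like this. The function name and the five shell variables are hypothetical stand-ins for the operator's configured paths; they are not CollecTor option names, and `tar` is used only as one possible archiver.

```sh
#!/bin/sh
# Hypothetical helper: write a gzip-compressed tar backup.
#   long-term  : archive path and configuration only
#   short-term : additionally the more volatile stats, recent, and
#                output paths
# Expects ARCHIVE_PATH, CONFIG_FILE, STATS_PATH, RECENT_PATH, and
# OUTPUT_PATH to be set by the caller (assumed names).
backup_instance() {
  mode="$1"   # "long-term" or "short-term"
  dest="$2"   # target .tar.gz file
  # Reuse the function's positional parameters as the list of paths.
  set -- "$ARCHIVE_PATH" "$CONFIG_FILE"
  if [ "$mode" = "short-term" ]; then
    set -- "$@" "$STATS_PATH" "$RECENT_PATH" "$OUTPUT_PATH"
  fi
  tar -czf "$dest" "$@"
}
```

Because every path comes from a variable, the sketch does not assume that everything lives below the working directory.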

comment:13 Changed 2 years ago by iwakeh

Addendum:

The shell script can also be anywhere as long as the paths are set correctly.

comment:14 in reply to:  13 ; Changed 2 years ago by karsten

Replying to iwakeh:

This looks good! The new sections make it a lot easier to read (also during review :-)

Glad to hear!

A few small things:

Or let's just create that directory in the script if it doesn't exist. No reason to do this for ARCHIVEDIR and not for TARBALLTARGETDIR.

Fixed.

Fixed.

  • line 180: 'logback.xml' has to be available in the Java classpath, and the name shouldn't change.

Hmm, when I last tested this I only succeeded when renaming the file, but now that doesn't work anymore. What I did was copy src/main/resources/logback.xml to the working directory and edit the logfile-base property to "${LOGBASE}/colector-", yet that change gets disregarded. What am I doing wrong?

Maybe also mention that we use 'slf4j-api' throughout. Thus, you're free to use the logging framework of your choice, i.e. any other implementation of slf4j could be chosen and provided to CollecTor.

Do you think the average operator will care? They'd have to provide a .jar file with the logging implementation. Is this our audience?

  • line 188: Missing option -DLOGBASE=<your-log-path>,

In fact, I left that out on purpose, because it was disregarded in my earlier testing. Same issue as above, what am I doing wrong?

and maybe add 'the -Xmx option is based on 4g RAM here; if your machine has more, feel free to adapt this as needed'

Added a line for that.

  • line 225: This is still based on the assumption that everything lives below the workdir, which doesn't need to be the case. Maybe change it along the lines of 'A backup of your CollecTor instance should include the <ArchivePath> and your configuration and other changes, which makes it possible to set up this instance again. A backup for short term recovery would also include the more volatile data in <StatsPath>, <RecentPath>, and <OutputPath>.' (This suggestion can surely be phrased better.)

Changed.

Replying to iwakeh:

Addendum:

The shell script can also be anywhere as long as the paths are set correctly.

Changed.

Please find my branch task-20380-2 with some changes. I guess the only part that needs fixing before we merge is the logging section, plus the things that I overlooked.

comment:15 Changed 2 years ago by iwakeh

Getting close:

The web application directory in line 153 doesn't need to be a subdirectory of the working directory. The webapp files of CollecTor need to be copied to an html-document directory that apache serves; and <ArchivePath> and <RecentPath> could be linked there.
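
The linking step could look roughly like the sketch below. `publish_paths` is a hypothetical helper, and the link names `archive` and `recent` are assumptions chosen for illustration; the actual document directory depends on how the operator has set up the web server.

```sh
#!/bin/sh
# Hypothetical helper: expose the archive and recent directories inside
# the web server's document directory via symbolic links.
publish_paths() {
  docroot="$1"        # html-document directory served by the web server
  archive_path="$2"   # value configured as <ArchivePath>
  recent_path="$3"    # value configured as <RecentPath>
  mkdir -p "$docroot"
  # -n replaces an existing link instead of descending into its target.
  ln -sfn "$archive_path" "$docroot/archive"
  ln -sfn "$recent_path" "$docroot/recent"
}
```

Whether the web server follows symbolic links is a matter of its own configuration and is not covered here.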

Logging:

What exact command line did you use?
Logback looks for a file named 'logback.xml' (or its groovy analog) in the classpath, i.e. if your edited 'logback.xml' is in the local path you need to set -cp .:collector-<version>.jar.
An easier way to debug logback: set <configuration debug="true"> at the beginning of the logback.xml. That will print lots of logging debug info to std-out.
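
The debug switch mentioned above can be applied to a copy of the configuration, for example with the sketch below. `enable_logback_debug` is a hypothetical helper, and it assumes that the root element in the source file is written as a plain `<configuration>` tag without attributes; if that is not the case, the file is copied unchanged.

```sh
#!/bin/sh
# Hypothetical helper: write a copy of a logback.xml with status
# debugging enabled on the root element.
enable_logback_debug() {
  src="$1"    # e.g. the default logback.xml from the sources
  dest="$2"   # edited copy, e.g. logback.xml in the working directory
  # src and dest must be different files, because the shell truncates
  # dest before sed reads src.
  sed 's/<configuration>/<configuration debug="true">/' "$src" > "$dest"
}
```

The copy only takes effect if its directory comes first on the classpath, as described above.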

comment:16 in reply to:  14 Changed 2 years ago by iwakeh

Replying to karsten:

...

Maybe also mention that we use 'slf4j-api' throughout. Thus, you're free to use the logging framework of your choice, i.e. any other implementation of slf4j could be chosen and provided to CollecTor.

Do you think the average operator will care? They'd have to provide a .jar file with the logging implementation. Is this our audience?

Could be that some operators happen to be more familiar with something else and prefer using that. Mentioning that we use slf4j and don't insist on logback would make things easier for them and others shouldn't be confused by that. But, if you think it causes confusion just disregard this.

comment:17 Changed 2 years ago by iwakeh

Please also add some note about the following to the 'common issues' section.

ERROR org.torproject.descriptor.index.DescriptorIndexCollector Cannot fetch remote file: 2016-10-18-17-05-00-server-descriptors
java.io.FileNotFoundException: https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2016-10-18-17-05-00-server-descriptors

(cf. #20430 comment:2)

comment:18 Changed 2 years ago by karsten

Okay, I fixed most issues in my updated branch, except for the logging part. The -cp .:collector-<version>.jar part helped a bit, but now it complains about finding two logback.xml files in the classpath. Blargh.

Can you provide a command line that works for you? Or can you provide a patch for that section? Thanks!

comment:19 Changed 2 years ago by karsten

Hmm, here's the error message I get:

12:05:54,595WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.

A quick search for that message shows that we probably shouldn't be shipping a logback.xml in the .jar. Maybe this requires more time after the release.

How about we take out the section about "Changing logging options" for now to unblock the release?

comment:20 Changed 2 years ago by iwakeh

Well, that's just a warning in debug mode, but it does choose the right file. And, I would expect operators to ask very soon about logging config if we don't say anything about it.

What about adding the urls for documentation?

### Changing logging options

Internally, CollecTor uses the Simple Logging Facade for Java (SLF4J) and ships with
the Logback implementation for SLF4J. So, if you prefer a logging framework other than
Logback, you could provide and use that instead.

In addition, CollecTor provides a default logging configuration (in `collector-<version>/src/main/resources/logback.xml`), which runs on info level and creates one logfile per module and a common log. It also expects a path where the logs should be written to, which is set on the java command line by adding `-DLOGBASE=<your-log-dir>`.

If you want to adjust some logging parameters just copy the default configuration from
`collector-<version>/src/main/resources/logback.xml` to your working directory,
edit your copy, and execute the .jar file as follows:

```
java -Xmx2g -DLOGBASE=<your-log-dir> -cp .:collector-<version>.jar org.torproject.collector.Main
```

For more detailed information or if you have different logging needs, please refer to
the Logback documentation: http://logback.qos.ch/
For switching to a different framework, see the SLF4J site: http://www.slf4j.org/
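
For a start script, the command line from this section could be assembled as in the following sketch. `collector_cmd` is a hypothetical helper that only prints the command. It passes the main class together with `-cp` and leaves out `-jar`, because the Java launcher ignores other class path settings when `-jar` is used; the default heap size of `2g` is taken from the command above.

```sh
#!/bin/sh
# Hypothetical helper: print the java command line for running
# CollecTor with a logback.xml from the current directory.
collector_cmd() {
  jar="$1"           # e.g. collector-<version>.jar in the working directory
  logbase="$2"       # directory the log files should be written to
  heap="${3:-2g}"    # value for -Xmx; adapt to the machine's RAM
  # '.' comes first on the classpath so that an edited logback.xml in
  # the current directory is found before the one inside the jar.
  printf 'java -Xmx%s -DLOGBASE=%s -cp .:%s org.torproject.collector.Main\n' \
    "$heap" "$logbase" "$jar"
}
```

Printing instead of executing keeps the sketch safe to try out; an operator would review the output before wiring it into their own start-up mechanism.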

comment:21 Changed 2 years ago by karsten

Pushed some changes along those lines to my branch. Now preparing the release.

comment:22 Changed 2 years ago by iwakeh

Perfect :-)

comment:23 Changed 2 years ago by karsten

Please try out the pre-release tarball and let me know if that's good to go.

comment:24 Changed 2 years ago by iwakeh

Looks fine! The executable jar runs, INSTALL.md and changelog are fine, all tests (incl. the usually ignored) pass.

I'll prepare the announcement and make the changes to the relevant wiki pages.

comment:25 Changed 2 years ago by karsten

Thanks for checking! Release tarball (same file as above) and signature are now available here: https://dist.torproject.org/collector/1.1.0/

Please make the announcement and changes to tickets and wiki pages. Thanks!

Congratulations, this is a big step forward!

comment:26 Changed 2 years ago by iwakeh

Resolution: implemented
Status: needs_review → closed

Yes, it is :-)

Released and announced.

Closing.

Thanks!
