Opened 8 years ago

Closed 7 years ago

#4859 closed task (fixed)

Sanitize and process logs for more torproject.org domains

Reported by: runa
Owned by: runa
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We have AWStats and Webalizer running on https://webstats.torproject.org/, and we have processed sanitized logs for www.torproject.org (all of 2010, nothing for 2011) and metrics.torproject.org (some of 2010, most of 2011).

We should sanitize and process logs (automatically) for more torproject.org domains, such as: www, blog, trac, gitweb, and bridges.

We can use this ticket to figure out what to set up first, who should do what on the web server side, what's needed to import web logs for a new domain into AWStats and Webalizer etc.

Change History (10)

comment:1 Changed 8 years ago by runa

Here's what needs to happen with AWStats and Webalizer for each new host we want to add:

  • Create a new config file for AWStats and Webalizer
  • Add the hostname to logimport.sh and logarchive.sh
  • Create a directory for the host in /srv/webstats.torproject.org/htdocs/webalizer/
  • Update https://webstats.torproject.org/

The logarchive script will archive all the logs found in the out/ directory. Before syncing logs for a new host, let me know so that I can update the scripts and do all the steps listed above.
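
A minimal sketch of the directory step from the list above, in Python. The function name is hypothetical, and naming the directory after the hostname is an assumption; the config files, logimport.sh, logarchive.sh and the index page still have to be updated by hand.

```python
import os

# Path taken from the steps above; one subdirectory per host is an assumption.
WEBALIZER_ROOT = "/srv/webstats.torproject.org/htdocs/webalizer"


def prepare_webalizer_dir(hostname, htdocs_root=WEBALIZER_ROOT):
    """Create the per-host Webalizer directory and return its path."""
    # Refuse anything that could escape the webalizer directory.
    if not hostname or "/" in hostname or hostname.startswith("."):
        raise ValueError(f"unexpected hostname: {hostname!r}")
    path = os.path.join(htdocs_root, hostname)
    # exist_ok=True makes the step safe to repeat for an existing host.
    os.makedirs(path, exist_ok=True)
    return path
```

For example, `prepare_webalizer_dir("blog.torproject.org")` would create the blog directory under the webalizer path, provided the caller can write there.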

comment:2 in reply to:  description Changed 8 years ago by arma

Replying to runa:

> We should sanitize and process logs (automatically) for more torproject.org domains, such as: www, blog, trac, gitweb, and bridges.

bridges.tp.o doesn't log (most) requests, so it isn't an easy candidate for these stats.

comment:3 Changed 8 years ago by karsten

Owner: changed from karsten to runa
Status: new → assigned

Re-assigning to runa who started looking into webstats again yesterday.

comment:4 Changed 8 years ago by runa

Cc: karsten added

Will the sanitization process you put together work for all logs on all hosts we want to include?

comment:5 Changed 8 years ago by karsten

Yes, it should.

comment:6 Changed 8 years ago by runa

Cc: karsten removed
Owner: changed from runa to karsten

It seems the process does not work for Onionoo logs, so I'm giving this ticket back to Karsten. I will take care of creating config files and including the logs in the import/archive process once the sanitization part works.

comment:7 Changed 7 years ago by karsten

Owner: changed from karsten to runa

I can't think of an easy way to make the sanitizing process work for Onionoo's logs. The problem is that Onionoo's URLs contain parameters in the path part which we're not sanitizing. For example, a typical Onionoo URL is /details/lookup/F204[...]. We could define special sanitizing rules for Onionoo logs to remove the F204[...] part, but that's ugly. I'd say let's leave out Onionoo's logs from webstats entirely.

Giving back the ticket, because the sanitizing process should work for common Apache web logs which it was designed for.
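
For illustration only, a special rule of the kind described above might look like the following Python sketch. It is not part of the existing sanitizing process. The /details/lookup/ prefix comes from the example URL; the function name, the placeholder text, and the assumption that everything up to the next "/" or "?" is the parameter are all hypothetical.

```python
import re

# Hypothetical rule: match the path segment that follows /details/lookup/.
# Only this one prefix is handled; other Onionoo URL shapes are not covered.
_LOOKUP = re.compile(r"^(/details/lookup/)[^/?\s]+")


def sanitize_onionoo_path(path):
    """Replace the parameter after /details/lookup/ with a fixed placeholder."""
    return _LOOKUP.sub(r"\g<1>[sanitized]", path)
```

Paths that do not start with that prefix are returned unchanged, so every further URL shape would need a rule of its own, which is the ugliness mentioned above.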

comment:8 Changed 7 years ago by runa

Ok. How do we include logs from a new host, such as www.torproject.org?

comment:9 Changed 7 years ago by karsten

See the README. Please update as needed.

comment:10 Changed 7 years ago by runa

Resolution: fixed
Status: assigned → closed

Ok, I created #6196 to get the logs copied over to the webstats host.
