#32133 closed defect (fixed)

gitweb.tpo performance problems

Reported by: anarcat
Owned by: anarcat
Priority: Immediate
Milestone:
Component: Internal Services/Service - git
Version:
Severity: Major
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

as mentioned in #29336, gitweb often falls over and dies because it runs out of memory under the load of cgit.cgi processes. it easily goes through its 8GB of memory and explodes.

i first addressed this problem in early march by enabling the cgit cache, but the problem still occurs when a crawler (or attacker?) thrashes through the cache.
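
for reference, enabling that cache comes down to a handful of cgitrc settings; a minimal sketch with illustrative values (not necessarily what is deployed here):

    # /etc/cgitrc -- illustrative values only
    # cache-size=0 disables the cache; any positive value enables it
    cache-size=1000
    # where the cache files live
    cache-root=/var/cache/cgit
    # TTLs are in minutes: dynamic pages (branch tips) vs. static ones
    cache-dynamic-ttl=5
    cache-static-ttl=1440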

Child Tickets

Change History (9)

comment:1 Changed 13 months ago by anarcat

today we have seen the limits of the cache configuration: a crawler thrashed through the cache by hitting random cgit pages. it was coming from tor, so it was not useful to block it by IP either.

instead I pushed down the MaxRequestWorkers setting from 1500 to 75, and that seems to have calmed things down. the upstream default is 150, but that's still too much for cgit, as 155 cgit processes blow through 8GB of memory quite easily.
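
for the record, the change amounts to something like this in the apache MPM configuration (a sketch shown for mpm_event; the directive exists for the other MPMs too):

    # cap concurrent apache workers so a burst of requests cannot fork
    # enough cgit.cgi processes to exhaust the host's 8GB of RAM
    <IfModule mpm_event_module>
        MaxRequestWorkers 75
    </IfModule>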

in my notes, I had:

other possible fix: fcgiwrap or rate-limiting, wait for cache to fill up first

i'm thinking fcgiwrap would be an interesting approach as it would limit the number of cgit processes separately from apache.

right now puppet is on hold on the server so it doesn't revert the MaxRequestWorkers change. it's a puppet-wide setting, so i'm hesitant to change it there.

comment:2 Changed 13 months ago by anarcat

i tried another, simpler tack and set RLimitNPROC to 75 150 (soft and hard limits), which should limit the number of CGI processes spawned. so i reset MaxRequestWorkers back to 1500; we'll see how this goes.
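
for reference, the directive takes a soft limit and an optional hard limit; a sketch:

    # cap the number of processes that apache's request-serving children
    # (i.e. CGI scripts like cgit.cgi) can launch: soft 75, hard 150
    RLimitNPROC 75 150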

thanks to sangy on irc for the suggestion!

comment:3 Changed 13 months ago by anarcat

the new trick did not work; reverting to the maxclients approach.

comment:4 Changed 13 months ago by weasel

We now set both maxclients and resource limits via puppet.

We may want to play with the numbers: for now, puppet sets MaxRequestWorkers to 150 on hosts where we say we want fewer than 1500.

The resource limits currently are tpo-wide, but we can override them for each node as we see fit.

comment:5 Changed 13 months ago by weasel

RLimitMEM changed from 256M to 512M because translation.git could not be cloned; now it can.
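
for reference, a sketch of the directive (apache expects the value in bytes; 536870912 bytes = 512M):

    # allow each CGI process up to 512M of memory
    # (one value sets the soft limit; an optional second sets the hard limit)
    RLimitMEM 536870912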

comment:6 Changed 13 months ago by anarcat

for future reference, if the MaxRequestWorkers hack doesn't cut it, we could simulate fastcgi with uwsgi or fcgiwrap and decouple the apache workers from the cgit processes, e.g.:

https://wiki.archlinux.org/index.php/Cgit#Apache
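
a rough sketch of that decoupling with fcgiwrap and mod_proxy_fcgi (the socket path and cgit locations are assumptions, and this is untested here):

    # hand cgit requests to a fixed pool of fcgiwrap workers instead of
    # forking one cgit.cgi per apache worker
    # (assumes fcgiwrap listens on /run/fcgiwrap.socket)
    <IfModule mod_proxy_fcgi.c>
        ProxyPass "/cgit" "unix:/run/fcgiwrap.socket|fcgi://localhost/usr/lib/cgit/cgit.cgi"
    </IfModule>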

comment:7 Changed 13 months ago by anarcat

i reverted MaxRequestWorkers back down to 75, because during my tests last week, 150 would still crash vineale.

comment:8 Changed 13 months ago by anarcat

i refactored the puppet stuff to make all those parameters selectable per host.

i think this ticket may be closed in (say) a week if the problem doesn't occur again.

comment:9 Changed 11 months ago by anarcat

Resolution: fixed
Status: assigned → closed

looks like we're happy again (for now), but i suspect we'll go through this misery again soon. as long as the git repos keep growing, we'll have to face this problem.

next step is probably to look at git-specific optimizations like the commit-graph stuff...
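
concretely, that would be something along the lines of the following (a sketch; the repo path is hypothetical, and this needs a reasonably recent git):

    # build the commit-graph once, then keep it updated during gc
    git -C /srv/git/repo.git commit-graph write --reachable
    git -C /srv/git/repo.git config core.commitGraph true
    git -C /srv/git/repo.git config gc.writeCommitGraph true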
