Opened 4 weeks ago

Last modified 3 weeks ago

#32133 assigned defect

gitweb.tpo performance problems

Reported by: anarcat
Owned by: anarcat
Priority: Immediate
Milestone:
Component: Internal Services/Service - git
Version:
Severity: Major
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


as mentioned in #29336, gitweb often falls over and dies because it runs out of memory under the load of cgit.cgi processes. it easily goes through its 8GB of memory and explodes.

i first addressed this problem in early march by enabling the cgit cache, but the problem still occurs when a crawler (or attacker?) thrashes through the cache.

Change History (8)

comment:1 Changed 4 weeks ago by anarcat

today we have seen the limits of the cache configuration, as a crawler thrashed through the cache by hitting random cgit pages. it was coming from tor, so blocking it by IP was not useful either.

instead I pushed down the MaxRequestWorkers setting from 1500 to 75. that seems to have calmed things down. the upstream default is 150, but that's still too much for cgit, as 155 cgit processes blow through 8GB of memory quite easily.
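
For reference, a minimal sketch of what that change might look like in the Apache configuration. The file location and the active MPM are not stated in this ticket, so both are assumptions; the arithmetic in the comments is a rough upper bound that ignores memory used by Apache itself and the rest of the system.

```apache
# Sketch only: assumes this lives in the MPM configuration of the gitweb host.
#
# Rough memory budget, from the numbers in this ticket:
#   8192 MiB / 155 processes ~= 53 MiB each (already enough to exhaust memory)
#   8192 MiB /  75 processes ~= 109 MiB each
#
# Cap the number of requests served simultaneously, and with it the number
# of cgit.cgi processes that can be running at once.
MaxRequestWorkers 75
```

Because every request counts against this limit, static files and cached pages are throttled along with the CGI processes.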

in my notes, I had:

other possible fix: fcgiwrap or rate-limiting, wait for cache to fill up first

i'm thinking fcgiwrap would be an interesting approach as it would limit the number of cgit processes separately from apache.

right now puppet is on hold on the server to respect the MaxRequestWorkers change. it's a puppet-wide setting so i'm hesitant in changing it there.

comment:2 Changed 4 weeks ago by anarcat

i tried another, simpler tack, and set RLimitNPROC to 75 150, which should limit the number of CGI processes fired. so i reset the MaxRequestWorkers back to 1500, we'll see how this goes.
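
A sketch of the directive as described above, assuming it is set at server or virtual host level (the ticket does not say where it was placed). The two arguments are the soft and hard limits.

```apache
# Sketch only: placement in the configuration is an assumption.
#
# Soft limit of 75 and hard limit of 150 on processes launched by
# Apache children to service requests (such as CGI scripts).
# Caveat from the Apache documentation: if the CGI processes run under
# the same user id as the web server, this also limits the number of
# processes the server itself can create, which shows up as
# "cannot fork" messages in the error log.
RLimitNPROC 75 150
```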

thanks to sangy on irc for the suggestion!

comment:3 Changed 4 weeks ago by anarcat

new trick did not work, reverting to maxclients

comment:4 Changed 3 weeks ago by weasel

We now set both maxclients and resource limits via puppet.

We may want to play with the numbers: for now puppet does MaxRequestWorkers 150 for hosts where we say we want fewer than 1500.

The resource limits currently are tpo-wide, but we can override them for each node as we see fit.

comment:5 Changed 3 weeks ago by weasel

RLimitMEM changed from 256M to 512M because translation.git could not be cloned. now it can.
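
As an illustration, the new limit could be written as follows. The actual puppet-managed configuration is not shown in this ticket, so this is an assumption about its shape; the Apache documentation gives the unit for RLimitMEM as bytes, so the sketch spells the value out.

```apache
# Sketch only: the real puppet template may differ.
#
# 512 MiB = 512 * 1024 * 1024 = 536870912 bytes.
# Soft memory limit for processes launched by Apache children (CGI).
RLimitMEM 536870912
```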

comment:6 Changed 3 weeks ago by anarcat

for future reference, if the MaxRequestWorkers hack doesn't cut it, we could simulate fastcgi with uwsgi or fcgiwrap and decouple the apache and cgit threads, e.g.:

comment:7 Changed 3 weeks ago by anarcat

i reverted MaxRequestWorkers back down to 75, because during my tests last week, 150 would still crash vineale.

comment:8 Changed 3 weeks ago by anarcat

i refactored the puppet stuff to make all those parameters selectable per host.

i think this ticket may be closed in (say) a week if the problem doesn't occur again.
