#32133 closed defect (fixed)

gitweb.tpo performance problems

Reported by: anarcat
Owned by: anarcat
Priority: Immediate
Milestone:
Component: Internal Services/Service - git
Version:
Severity: Major
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

as mentioned in #29336, gitweb often falls over and dies because it runs out of memory under the load of cgit.cgi processes. it easily goes through its 8GB of memory and explodes.

i first addressed this problem in early march by enabling the cgit cache, but the problem still occurs when a crawler (or attacker?) thrashes through the cache.
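
for reference, enabling that cache comes down to a handful of cgitrc settings; a minimal sketch with illustrative values (not necessarily what is deployed here):

    # /etc/cgitrc -- illustrative values only
    # cache-size=0 disables the cache; any positive value enables it
    cache-size=1000
    # where the cache files live
    cache-root=/var/cache/cgit
    # TTLs are in minutes: dynamic pages (branch tips) vs. static ones
    cache-dynamic-ttl=5
    cache-static-ttl=1440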

Child Tickets

Change History (9)

comment:1 Changed 13 months ago by anarcat

today we have seen the limits of the cache configuration: a crawler thrashed through the cache by hitting random cgit pages. it was coming from tor, so it was not useful to block it by IP either.

instead I pushed down the MaxRequestWorkers setting from 1500 to 75, and that seems to have calmed things down. the upstream default is 150, but that's still too much for cgit, as 155 cgit processes blow through 8GB of memory quite easily.
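
for the record, the change amounts to something like this in the apache MPM configuration (a sketch shown for mpm_event; the directive exists for the other MPMs too):

    # cap concurrent apache workers so a burst of requests cannot fork
    # enough cgit.cgi processes to exhaust the host's 8GB of RAM
    <IfModule mpm_event_module>
        MaxRequestWorkers 75
    </IfModule>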

in my notes, I had:

other possible fix: fcgiwrap or rate-limiting, wait for cache to fill up first

i'm thinking fcgiwrap would be an interesting approach as it would limit the number of cgit processes separately from apache.

right now puppet is on hold on the server so it doesn't revert the MaxRequestWorkers change. it's a puppet-wide setting, so i'm hesitant to change it there.

comment:2 Changed 13 months ago by anarcat

i tried another, simpler tack and set RLimitNPROC to 75 150 (soft and hard limits), which should limit the number of CGI processes spawned. so i reset MaxRequestWorkers back to 1500; we'll see how this goes.
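
for reference, the directive takes a soft limit and an optional hard limit; a sketch:

    # cap the number of processes that apache's request-serving children
    # (i.e. CGI scripts like cgit.cgi) can launch: soft 75, hard 150
    RLimitNPROC 75 150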

thanks to sangy on irc for the suggestion!

comment:3 Changed 13 months ago by anarcat

the new trick did not work; reverting to the maxclients approach.

comment:4 Changed 13 months ago by weasel

We now set both maxclients and resource limits via puppet.

We may want to play with the numbers: for now, puppet sets MaxRequestWorkers to 150 on hosts where we say we want fewer than 1500.

The resource limits currently are tpo-wide, but we can override them for each node as we see fit.

comment:5 Changed 13 months ago by weasel

RLimitMEM changed from 256M to 512M because translation.git could not be cloned; now it can.
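
for reference, a sketch of the directive (apache expects the value in bytes; 536870912 bytes = 512M):

    # allow each CGI process up to 512M of memory
    # (one value sets the soft limit; an optional second sets the hard limit)
    RLimitMEM 536870912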

comment:6 Changed 13 months ago by anarcat

for future reference, if the MaxRequestWorkers hack doesn't cut it, we could simulate fastcgi with uwsgi or fcgiwrap and decouple the apache workers from the cgit processes, e.g.:

https://wiki.archlinux.org/index.php/Cgit#Apache
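
a rough sketch of that decoupling with fcgiwrap and mod_proxy_fcgi (the socket path and cgit locations are assumptions, and this is untested here):

    # hand cgit requests to a fixed pool of fcgiwrap workers instead of
    # forking one cgit.cgi per apache worker
    # (assumes fcgiwrap listens on /run/fcgiwrap.socket)
    <IfModule mod_proxy_fcgi.c>
        ProxyPass "/cgit" "unix:/run/fcgiwrap.socket|fcgi://localhost/usr/lib/cgit/cgit.cgi"
    </IfModule>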

comment:7 Changed 13 months ago by anarcat

i reverted MaxRequestWorkers back down to 75, because during my tests last week, 150 would still crash vineale.

comment:8 Changed 13 months ago by anarcat

i refactored the puppet stuff to make all those parameters selectable per host.

i think this ticket may be closed in (say) a week if the problem doesn't occur again.

comment:9 Changed 11 months ago by anarcat

Resolution: fixed
Status: assigned → closed

looks like we're happy again (for now), but i suspect we'll go through this misery again soon. as long as the git repos keep growing, we'll have to face this problem.

next step is probably to look at git-specific optimizations like the commit-graph stuff...
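
concretely, that would be something along the lines of the following (a sketch; the repo path is hypothetical, and this needs a reasonably recent git):

    # build the commit-graph once, then keep it updated during gc
    git -C /srv/git/repo.git commit-graph write --reachable
    git -C /srv/git/repo.git config core.commitGraph true
    git -C /srv/git/repo.git config gc.writeCommitGraph true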
