Stop making Apache Suck!

Hello, my name is Jeff and I misconfigure Apache and mod_wsgi.

Hello Jeff

Ok, so I am embarrassed to say that I had misconfigured Apache for far too long, on far too many servers, eventually leaving me with egg on my face. I learned better from ‘Making Apache suck less for hosting python web applications’ by Graham Dumpleton. This is a far-too-common problem.

Part of the reason these misconfiguration issues crop up is the Apache delivery process, specifically the default Apache configuration. Apache is delivered to many flavors of Linux through various package-management mechanisms. For example:

  • RPM packages for openSUSE, Fedora, RHEL, CentOS, Scientific Linux, Mandriva, Mageia and many others.
  • Deb packages for Debian, Ubuntu, Mint, KNOPPIX, aptosid.
  • Source-based distros ( Gentoo, Lunar, Sorcery ) via their build tools ( emerge, lunar, sorcery ).
  • Source code deliveries.


All of these deliver Apache with varying, and sometimes no, default configuration. To make matters worse, most deliveries set Apache up in MPM prefork mode of operation. This feeds into high memory usage, as the Python application stack will not be shared among Apache child processes.
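You can see the cost directly by looking at per-child memory. A rough check ( the process name is httpd on CentOS and apache2 on Debian/Ubuntu ):

# Resident memory of each Apache child, largest first. Under prefork with
# embedded mod_wsgi, every child carries its own copy of the interpreter
# and the application.
$ ps -C httpd -o pid,rss,cmd --sort=-rss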

Graham ‘the author of mod_wsgi for Apache’ Dumpleton does a nice job of pointing out the potholes in setting up a Python mod_wsgi based Apache stack.

Do not confuse Apache mod_wsgi with the confusingly named mod_wsgi for nginx.

If you have not watched this talk, I highly recommend taking the half hour to improve your working knowledge of Apache prefork and worker modes, and how they will affect the RAM and CPU of your VM or computer.

The slides for this talk are on lanyrd.com.

The blog post by Vivek @ HackerEarth is what I used as a guide for re-configuring my Apache/mod_wsgi servers.

In the end I did the following steps:

  1. Use Apache MPM_worker.
  2. Remove unnecessary Apache modules.
  3. Turn KeepAlive off.
  4. Evaluate and set the MaxMemFree setting.
  5. Consider enabling WSGIRestrictEmbedded.
  6. Use daemon mode of mod_wsgi.
  7. Configure the mpm_worker module.
  8. I did add mod_status.so and mod_info.so; consider leaving them out of a production setup.
  9. Configure mod_info and mod_status.

1: Use Apache MPM_worker

How can you tell if Apache is built with mpm_worker enabled?

# Ubuntu
$ /usr/sbin/apache2 -V |grep 'Server MPM'| tr -s ' ' | cut -d ' ' -f 3
 
# CentOS
$ /usr/sbin/httpd -V |grep 'Server MPM'| tr -s ' ' | cut -d ' ' -f 3 
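If your server reports prefork instead of worker, how you switch depends on the distro and the Apache version. A sketch of the two common cases ( Ubuntu/Debian with Apache 2.4, where the MPM is a loadable module, and CentOS 6 with Apache 2.2, which ships a separate worker binary ):

# Ubuntu / Debian ( Apache 2.4 )
$ sudo a2dismod mpm_prefork
$ sudo a2enmod mpm_worker
$ sudo service apache2 restart
 
# CentOS 6 ( Apache 2.2 ): uncomment HTTPD=/usr/sbin/httpd.worker
# in /etc/sysconfig/httpd, then:
$ sudo service httpd restart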

2: Remove unnecessary Apache modules

This is what I was left with. Graham actually pared it down a bit further than this; bottom line, only load what you actually use.

LoadModule wsgi_module modules/mod_wsgi.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule ssl_module modules/mod_ssl.so
LoadModule mime_module modules/mod_mime.so
LoadModule vhost_alias_module modules/mod_vhost_alias.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
 
# Optional for debug
LoadModule status_module modules/mod_status.so
LoadModule info_module modules/mod_info.so
 
# mod_log_config and mod_logio are CentOS modules and
# may not be relevant in your flavor of Linux ( e.g. not used in Ubuntu ).
LoadModule log_config_module modules/mod_log_config.so
LoadModule logio_module modules/mod_logio.so
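Before trimming, it helps to see what is actually loaded ( built-in plus shared modules ):

# CentOS
$ /usr/sbin/httpd -M
 
# Ubuntu / Debian
$ /usr/sbin/apache2ctl -M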

Watch memory before and after removing unused modules:

$ watch -n 1 "echo -n 'Apache Processes: ' && ps -C httpd --no-headers | wc -l && free -m"
# ( use "ps -C apache2" on Debian/Ubuntu )

3/4/5: Turn Off KeepAlive / Evaluate MaxMemFree / Restrict Embedded

KeepAlive Off
 
<IfModule mpm_worker_module>
   #... snip...
   # max free Kbytes each allocator may hold before calling free()
   MaxMemFree 256
</IfModule>
 
# Keeps the Python interpreter from being initialized in the Apache worker processes
WSGIRestrictEmbedded On 
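A quick sanity check after a restart: with KeepAlive off, Apache should answer HTTP/1.1 requests with a 'Connection: close' header ( assuming the server answers on localhost ):

$ curl -sI http://localhost/ | grep -i '^Connection'
# expect: Connection: close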

6: Use daemon mode of mod_wsgi

WSGIDaemonProcess django processes=4 threads=15 display-name=%{GROUP}
WSGIProcessGroup django 
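In context, those two directives usually sit inside a virtual host next to a WSGIScriptAlias. A minimal sketch, assuming a hypothetical Django project under /srv/mysite ( adjust the paths to your own layout ):

<VirtualHost *:80>
    ServerName example.com
 
    # Requests go to the dedicated daemon process group,
    # not to the Apache worker processes ( embedded mode stays off ).
    WSGIDaemonProcess django processes=4 threads=15 display-name=%{GROUP}
    WSGIProcessGroup django
    WSGIScriptAlias / /srv/mysite/mysite/wsgi.py
 
    <Directory /srv/mysite/mysite>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>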

7: Configure MPM worker module

# Rules from Graham [ http://lanyrd.com/2013/pycon/scdyzk/#link-qhyk ]
# 1: ensure that MaxSpareThreads >= MinSpareThreads + ThreadsPerChild
# 2: suggest that MinSpareThreads and MaxSpareThreads be set as multiples of ThreadsPerChild
# 3: don't set StartServers to less than MinSpareThreads/ThreadsPerChild,
#    or apache immediately starts extra processes
# 4: don't set StartServers to greater than MaxSpareThreads/ThreadsPerChild,
#    or apache immediately kills processes
#
# Note: although this is all expressed in terms of threads, Apache does not scale at the thread level.
# The number of threads per process is static.
# When scaling, it behaves the same as prefork: a whole process is either created or killed.
# The decision, though, is based on the number of available threads.
 
# i.e. it scales at the process level
<IfModule mpm_worker_module>
    StartServers 10
    MinSpareThreads 150
    MaxSpareThreads 180
    ThreadsPerChild 15
    MaxClients 240
    MaxRequestsPerChild 0
    MaxMemFree 256
</IfModule>
# This config results in:
# 1: apache starting 10 processes
# 2: apache never having fewer than 10 processes ( 150/15 )
# 3: apache pruning spare processes down to 12 ( 180/15 )
# 4: apache putting a process ceiling at 16 processes ( 240/15 )
# 5: in addition to these, the mod_wsgi daemon processes ( 4, given processes=4 above )
 
# -----------------------------------------
# Optional smaller-footprint ( single VM config )
#<IfModule mpm_worker_module>
#     StartServers 5
#     MinSpareThreads 75
#     MaxSpareThreads 90
#     ThreadsPerChild 15
#     MaxClients 120
#     MaxRequestsPerChild 0
#     MaxMemFree 256
#</IfModule>
# This config results in:
# 1: apache starting 5 processes
# 2: apache never having fewer than 5 processes ( 75/15 )
# 3: apache pruning spare processes down to 6 ( 90/15 )
# 4: apache putting a process ceiling at 8 processes ( 120/15 )
# 5: in addition to these, the mod_wsgi daemon processes ( 4, given processes=4 above )

8/9: mod_status and mod_info

#ExtendedStatus on
<Location /server-status>
   SetHandler server-status
   Order Deny,Allow
   #Allow from all
   Allow from 10.0.0.0/255.255.255.0
   Allow from 192.168.1.0/255.255.255.0
   Allow from 172.16.0.0/255.255.0.0
</Location>
 
<Location /server-info>
   SetHandler server-info
   Order Deny,Allow
   #Allow from all
   Allow from 10.0.0.0/255.255.255.0
   Allow from 192.168.1.0/255.255.255.0
   Allow from 172.16.0.0/255.255.0.0
</Location> 
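Once those handlers are in place, you can poll them from a host inside one of the allowed ranges ( or add 'Allow from 127.0.0.1' to query locally ). The hostname below is a placeholder:

$ curl -s http://your-server/server-status?auto   # machine-readable worker summary
$ curl -s http://your-server/server-info          # loaded modules and their configuration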


That’s all folks. Now enjoy your freshly tuned web application. I should mention that I cover a related topic, titled ‘The web request is a scary place’, where I go into how to build an asynchronous infrastructure using AMQP.

Reference:

  • Pingback: The web request is a scary place | Jeffield

  • gosukiwi

    I wonder how this tuned Apache compares with nginx. I still think the latter would use less memory and achieve similar performance.

    • Jeff Sheffield

      Here are a couple of decent articles covering your question. http://bit.ly/1HUVMyF This one is interesting because it covers the market share of Apache. ( Apache has been around since Feb 1995, that’s right, 20 years ) So it is proven, solid and pluggable. Here is a decent article covering the performance differences: http://bit.ly/1hT31wH You have to take that with a grain of salt though… if you are running a Python application, I would think you can architect nginx to serve uWSGI and be very fast. See this entry: http://bit.ly/1BQgiwG I expect a well-tuned nginx/uWSGI system would outperform an Apache/mod_wsgi system, though I have not gathered data to prove that theory.

      • gosukiwi

        Interesting. The first article basically says Nginx is better but the author has more experience with Apache; it seems like that’s the only pro he has for Apache. I’ll agree everything just works and it’s easy to get started using Apache, but Nginx is really simple to configure, so even if it doesn’t work out of the box, you can easily fix it. Working with mostly small web servers, I’d say nginx is better as it consumes less memory overall. For big server architectures the difference might not be so big, but for a small 512MB server instance, nginx can save you some precious RAM!

        • Jeff Sheffield

          Yeah, I agree, for small setups it’s no contest, Nginx wins. To be honest, I really wonder how that stack would compare to node.js. I am extremely compelled by one-language JavaScript across the stack. I think it builds better web development teams, but I guess for small projects it’s a team of one. — grin — p.s. I have another blog post on building Nginx from source.

          • gosukiwi

            Haven’t worked much with Node. I know it’s pretty fast though, and most of the time it is proxied behind nginx. I do love JavaScript and Node though.