Tuning guide

The following sections describe how to tune PostgreSQL and Apache for good performance.

Depending on the size of your installation, this tuning can be essential.

Tuning Apache

MPM

In general, it makes sense to use a threaded MPM for Apache. Prefork is less suitable, because the database connection pool would not be shared across the worker children. With prefork, each process would open its own connection to the database. For small installations, this might not matter. However, for a busy download server, the threaded event MPM or worker MPM are better choices.
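
To check which MPM a running Apache uses, you can ask the binary directly (depending on the platform, it may be called apache2ctl, apachectl, or httpd) and look for the “Server MPM” line in the output:

apache2ctl -V | grep -i mpm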

The following is an example configuration for the event MPM that serves up to 960 connections in parallel, using 64 threads per process (on Linux, values of 32 or 64 threads per process scale best):

# event MPM
<IfModule event.c>
    ServerLimit           15
    MaxClients           960

    StartServers           2

    ThreadsPerChild       64
    ThreadLimit           64

    MinSpareThreads       32
    # must be >= (MinSpareThreads + ThreadsPerChild)
    MaxSpareThreads      112

    # at 200 r/s, 20000 r results in a process lifetime of 2 minutes
    MaxRequestsPerChild 20000
</IfModule>

Refer to the Apache MPM documentation for details.

DBD connection pool

With threaded MPMs, the database connection pool is shared among all threads of an Apache child. Thus, it makes sense to tune the size of the pool to the number of processes and threads. The following should fit the above Apache dimensions quite well:

DBDMin  0
DBDMax  12
DBDKeep 3
DBDExptime 10

The total number of connections that Apache might open needs to be reflected in the PostgreSQL setup. The above example would be served safely by a database with max_connections = 500. (That setting may seem far too high, but it leaves enough headroom to start separate Apache instances for testing, or during upgrades: killing Apache with SIGWINCH for a graceful stop and starting a new instance while the old one continues to serve its remaining requests.)
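
As a quick sanity check with the numbers from the event MPM example above:

15 processes (ServerLimit) × 12 pooled connections (DBDMax) = 180 connections at most

So max_connections = 500 leaves headroom for roughly two further Apache instances of the same size, plus administrative clients such as mb or psql.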

Having said that, if you find that you need a hundred or more connections in everyday operation, something is wrong: check that you chose a threaded Apache MPM, check the size of the database connection pool, and verify that there are no big bottlenecks in the database (which would cause Apache to stall and pile up busy threads and connections).
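
To see how many connections are actually open at any given moment, you can ask PostgreSQL itself:

-- count open connections, grouped by user
SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;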

Thread stack size

When using lots of threads, there can be odd effects on some platforms. The default stack size that the operating system allocates for each thread can be quite large, e.g. 8 MB on Linux. If this leads to problems, you can decrease the stack size considerably:

# If this isn't set, the OS' default will be used (8 MB)
# which is way more than necessary
ThreadStackSize 1048576
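
To put the numbers into perspective with the event MPM sizing above:

64 threads × 8 MB default stack = 512 MB of address space per child
64 threads × 1 MB stack         =  64 MB per child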

Normally, though, the default settings should just work.

HTTP/1.1 KeepAlive

KeepAlive, an HTTP/1.1 feature, saves overhead by reusing existing TCP connections for further HTTP requests. If no additional request arrives within n seconds, the server closes the connection.

This is a good thing, but it can become a problem when too many threads/processes linger around waiting for the next request on their connection. Each such thread occupies a slot that could otherwise handle other requests. Under extreme conditions, even ephemeral ports can become scarce. The KeepAlive timeout of 15 seconds configured by default in most Apache installs is far longer than necessary. A good value is 2 seconds, which keeps the benefits of KeepAlive while reducing the drawbacks to the point where they no longer tend to be a problem.

Hence, good settings are:

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 2

Note

When it is getting really serious, like when you are slashdotted, don’t hesitate to simply switch KeepAlive off. You will see a drastic improvement, and it will probably save you. (Provided you did your homework and your website is otherwise lean and fast ;-) If it is fat and bloated, there is not much to do.)

Tuning PostgreSQL

To tune PostgreSQL for good performance, you should tweak some or all of the parameters below in postgresql.conf.

This config file often lives in the same directory as the PostgreSQL database, which would be /var/lib/pgsql/data on an openSUSE system – or it could be in /etc, as in /etc/postgresql/8.3/main on Debian Lenny.

Memory sizing

With a small database, using only a few megabytes, there will not be much need for tuning. With larger databases, which go into the hundreds of megabytes, tuning becomes important.

Note

Make sure to reserve enough memory for the database, especially if it will be large. As a rough first estimate, it is usually sufficient and optimal if the reserved RAM is about the same as the database size on disk.

Allocating memory to the database works in the following way. PostgreSQL largely relies on the buffer caching done by the OS. Thus, the first measure in “reserving” memory is to not run too much other stuff on the machine that would compete for memory, or, in other words, to have enough memory in the first place. In general, PostgreSQL’s performance reaches its maximum when the whole dataset, including indexes, fits into the amount of RAM available for caching. (That statement holds if the whole dataset is actually used; if only parts are used, top performance is reached with less memory already. MirrorBrain tends to use the whole dataset, at least during mirror scanning.)

There is a special command mb db sizes that helps you assess the size of your databases. See Database size info with mb db sizes. (Just note that changes may not be reflected in the numbers immediately, because PostgreSQL updates its statistics periodically.)
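
If you prefer to query PostgreSQL directly, its built-in size functions report the same information (assuming your database is named mirrorbrain):

SELECT pg_size_pretty(pg_database_size('mirrorbrain'));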

shared_buffers

This parameter should usually be set to about 10-25% of the available RAM. Maximum value may be limited by the SHMMAX tunable of the OS.

(Of course, if your database is only 10 MB in size, there is no benefit in increasing this value that far. It obviously depends on the database size.)

Mirror scanning can incur heavy write activity if there is a lot of fluctuation in the file tree, especially when scanning massively in parallel. Scanning performance can benefit from higher values (25-50%) here. For read performance (as it affects Apache and mod_mirrorbrain), higher values are not needed.
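
As a hypothetical example, on a machine with 8 GB of RAM that is mostly dedicated to the database, the 10-25% rule would translate to something like:

# 25% of 8 GB RAM; the upper end pays off when scanning heavily
shared_buffers = 2GB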

effective_cache_size

This is the effective amount of caching that PostgreSQL can assume to be available, across its own buffers and the OS buffer cache combined.

This neither allocates RAM nor changes how PostgreSQL uses it; it merely gives PostgreSQL an assumption about how much memory is available for caching in the OS. This influences the query planner’s decisions regarding the use of indexes.

In principle, this value could be set to the sum of cached + free in the free -m output. However, this value needs to be divided by the number of processes using this memory simultaneously. To estimate the latter, you could use top to see how many PostgreSQL processes are busy at the same time.

In any case, it is better to set this parameter too low rather than too high, because too high a value can lead the planner to choose index scans too eagerly.
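
A hypothetical worked example: say free -m reports roughly 6 GB as cached + free, and top typically shows two busy PostgreSQL processes at a time. Then:

# (cached + free) divided by the number of simultaneously busy backends:
# 6 GB / 2 = 3 GB
effective_cache_size = 3GB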

Other memory parameters that you might want to increase are:

  • maintenance_work_mem (generously)

  • work_mem (a bit)

Connection setup

listen_addresses

You’ll need to change the parameter listen_addresses if you

  1. run the web server on a different host than the database server, or if you

  2. want to use the mb admin tool from a different host than the database host.

The default is to listen on localhost only. Set ‘*’ or a comma-separated list of addresses.
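
In postgresql.conf, that looks like one of the following (the second address below is just a made-up example):

listen_addresses = '*'
# or a restricted list:
listen_addresses = 'localhost,192.168.0.10'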

max_connections

The default of 100 should fit many cases. Apache’s reuse of connections is so efficient (and MirrorBrain so quick in answering queries) that a handful of connections is enough. However, if you use Apache’s prefork MPM, every child uses its own connection. Thus, if you allow 200 Apache processes to run, you will need to adjust max_connections accordingly. With a threaded Apache, the connection pool is shared, so this is not a problem. This is discussed further in the notes on Apache tuning above.

Transaction log

The transaction log (called Write-Ahead Log, or WAL) is a central component of PostgreSQL, and the configuration of its handling is important.

synchronous_commit

In any case, you should disable synchronous commit mode (synchronous_commit = off). The only case where you don’t want this is if you host databases other than MirrorBrain that require a higher level of data integrity than MirrorBrain does.

wal_buffers

The default (64kB) may be increased to e.g. 256kB.

checkpoint_segments

For big databases (hundreds of MB in size), increase this from 3 to 32.

checkpoint_timeout

Increase to 15min.

To log a checkpoint whenever one occurs, set log_checkpoints = on and checkpoint_warning to 1h.
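
Putting the settings of this section together, the WAL-related part of postgresql.conf would read:

synchronous_commit = off
wal_buffers = 256kB
checkpoint_segments = 32
checkpoint_timeout = 15min
log_checkpoints = on
checkpoint_warning = 1h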

Deadlocks

deadlock_timeout

As described below, set this parameter to 30s.
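
That is, in postgresql.conf:

deadlock_timeout = 30s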

Concurrent write access by different processes to the same rows causes those write requests to queue up; a row can be written by only one process at a time. If a process has to wait too long, its lock attempt times out, which is called a deadlock in this context. It is not a real deadlock in the common sense; the process simply gives up after a while.

Read activity (as done by Apache + mod_mirrorbrain when serving users) is not affected by write locks. Write activity is mainly caused by mirror scanning. Scanning, in turn, is often done in parallel to save time, so it is normal to have to wait for locks now and then (when two scanners happen to write to similar regions of the database).

The default time to wait for a lock is 1s in PostgreSQL, which is often too short for MirrorBrain. (For other applications that might even be too long, but for a mirror scanner it doesn’t matter if it has to wait many seconds now and then.) It is best to increase the lock waiting time to something like 30 seconds. Deadlocks don’t occur frequently when scanning, but when they do, you don’t want a scanner to give up on that part and end up with files missing from that mirror later.

Such deadlocks are more likely to occur when scanning a new mirror, where every database row has to be touched (one for each file found on the mirror). They are even more likely (actually, unavoidable) when you fill your database for the very first time after installing, when the rows are created in the first place. In that case, you will see deadlocks occur frequently. The best advice is to ignore them and simply scan once again after the first run has completed.

Later scans mainly see what they already know, so there is little need to write to the database, which means that deadlocks don’t occur.

Enhancing logging

Logging can be enhanced with some details that are relevant or helpful when tuning MirrorBrain:

log_line_prefix

To get more detailed log lines, set it to:

'%m [%p:%l] %u@%d '

Note the trailing space!

log_lock_waits

To log lock waits >= deadlock_timeout, set to on.

log_min_duration_statement

You can log all long-running queries by setting this to e.g. 2000 (the value is in milliseconds).
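
In summary, the logging-related part of postgresql.conf would then contain:

log_line_prefix = '%m [%p:%l] %u@%d '   # mind the trailing space
log_lock_waits = on                     # log lock waits >= deadlock_timeout
log_min_duration_statement = 2000       # milliseconds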

Preventing kernel “write floods”

The kernel might decide to write a whole bunch of changes to disk in one step, which can block other operations for several seconds. That is deadly for the performance of a web server.

This may affect you if you run a huge MirrorBrain database (like, say, 1 GB in size) in conjunction with heavy write activity (scanning many mirrors).

You should first make sure that you have followed all the tuning advice given above.

Now, if

  1. that is all fine and

  2. you are sure that your database server principally has enough memory and

  3. you see webserver hangs occurring intermittently and

  4. there is no other memory-intensive task to be done by the database server

then you can tune your kernel to handle writes in a way that is more efficient for this workload. (By default, the kernel finds it more efficient to wait until there is a lot to write; for us it is more efficient to write out data more frequently, in smaller portions.)

The two kernel tunables that let us achieve the desired behaviour are:

vm.overcommit_memory = 2
vm.dirty_background_ratio = 0

Put them into /etc/sysctl.conf to make the change permanent.
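
To apply them immediately, without a reboot, run (as root):

sysctl -w vm.overcommit_memory=2
sysctl -w vm.dirty_background_ratio=0

Later, sysctl -p re-applies whatever is in /etc/sysctl.conf.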

You’ll find additional details in the official PostgreSQL documentation: http://www.postgresql.org/docs/9.3/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT (the rest of that page is also worth reading).