Initial configuration steps on all platforms¶
Creating a file tree¶
If you haven’t got a file tree yet, you should create it now.
Make the directory for the file tree and fill it:
mkdir /srv/mytree
rsync .... /srv/mytree
Note that this file tree is a necessary prerequisite to running MirrorBrain, even though it intercepts the requests to those files and redirects them to mirrors.
Having said that, there is a way to get by without local files: use the null-rsync tool (found in the source tree) instead of rsync to pull the files. null-rsync is used exactly like rsync, but it creates a pseudo file tree that requires very little local space. However, since those files are filled with zeroes (!), it is important to make sure that MirrorBrain never delivers content from those files. That is achieved by using the MirrorBrainFallback directive to define some mirrors that are always available and are guaranteed to have all those files. (The directive can be configured individually per directory in the Apache config.) See the 2.11.0 release notes for details.
Note that if you do have the real files locally, you can automatically maintain cryptographic hashes of them in the database; running with pseudo files means giving up some very useful features. In addition, local files are always available for direct delivery, which is a good fallback for files that are not mirrored at all, files that have not arrived on any mirror yet, and so on. Of course, you can also make sure that files are never delivered from the redirector (in other words, that it always redirects).
Note
In summary: a tree with real files is required if you want to serve any hashes, zsync, or torrents; even then, you can make sure that the content is always redirected. The “fake tree” that you can create with null-rsync is good only for pure redirection (and Metalinks without hashes). The server then knows nothing about the content: only file path, size, and mtime.
Creating mirrorbrain.conf¶
Create a configuration file named /etc/mirrorbrain.conf with the content below:
[general]
instances = main
[main]
dbuser = mirrorbrain
dbpass = 12345
dbdriver = postgresql
dbhost = 127.0.0.1
# optional: dbport = ...
dbname = mirrorbrain
[mirrorprobe]
# logfile = /var/log/mirrorbrain/mirrorprobe.log
# loglevel = INFO
Note
The database password in the above template is only a placeholder and you need to edit it: change it to the actual password, the one that you gave when you ran PostgreSQL’s createuser command. Likewise, make sure that you picked the same username.
Set the following permissions and privileges on the file:
sudo chmod 0640 /etc/mirrorbrain.conf
sudo chown root:mirrorbrain /etc/mirrorbrain.conf
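To confirm that the permissions took effect, the file mode can be read back. The following is a minimal sketch, assuming GNU stat (as found on most Linux systems); file_mode is a hypothetical helper name, not part of MirrorBrain:

```shell
# Print the octal permission bits of a file.
# Assumes GNU coreutils stat; BSD stat uses different options.
file_mode() {
    stat -c '%a' "$1"
}

# Example: check the configuration file created above, if it exists.
conf=/etc/mirrorbrain.conf
if [ -e "$conf" ]; then
    if [ "$(file_mode "$conf")" = "640" ]; then
        echo "mode ok"
    else
        echo "unexpected mode"
    fi
fi
```

A mode of 640 means that root can read and write the file, members of the mirrorbrain group can read it, and nobody else can see the database password.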
Other possible options per MirrorBrain instance are:
- scan_top_include
Directory names separated by spaces. Meaning: Scan only these directories, and ignore all other directories at the top level.
- scan_exclude_rsync
Exclude list for rsync scans (the same rules as for rsync’s --exclude option apply). Meaning: Ignore all directories or path names that match, everywhere in the tree.
- scan_exclude
Exclude list for FTP scans. Meaning: Ignore all directories or path names that match, everywhere in the tree.
Testing the database admin tool¶
At this point, you should be able to type the following command without getting an error:
mb list
It’ll produce no output, but exit with 0. If it gives an error, something is wrong.
Note
Do this to verify that the previous steps have been completed successfully.
Likewise, the following command should not return an error, but rather display its usage info. If it does, the installation is most likely fine:
mb help
Also, the following should work (you might have to change the path to /usr/share/GeoIP for your system):
% geoiplookup_continent -f /var/lib/GeoIP/GeoIP.dat www.slashdot.org
NA
The NA stands for North America and indicates that the GeoIP lookup works correctly.
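The checks in this section all come down to a command exiting with status 0. As a rough sketch, they can be wrapped in a small helper; check_ok is a hypothetical name chosen for this example and is not part of MirrorBrain:

```shell
# Run a command quietly and report whether it exited with status 0.
check_ok() {
    if "$@" >/dev/null 2>&1; then
        echo "ok: $*"
    else
        echo "FAILED: $*"
        return 1
    fi
}

# Intended use on a machine where MirrorBrain is installed:
#   check_ok mb list
#   check_ok mb help
```

Because the helper discards the command's output, it is only useful as a pass/fail check; run the command directly to see the actual error message.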
Creating some mirrors¶
Collect a list of mirrors (their HTTP baseurl, and their rsync or FTP baseurl for scanning). For example:
http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/
rsync://ftp.isr.ist.utl.pt/suse/projects/
http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/
rsync://ftp.kddilabs.jp/suse/projects/
Now enter the mirrors into the database using the mb tool (see ‘mb help new’ for the full option list):
mb new ftp.isr.ist.utl.pt \
--http http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/ \
--rsync rsync://ftp.isr.ist.utl.pt/suse/projects/
mb new ftp.kddilabs.jp \
--http http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/ \
--rsync rsync://ftp.kddilabs.jp/suse/projects/
The tool figures out the GeoIP location of each mirror automatically, but you can also specify it on the command line.
If you want to edit a mirror later, use:
mb edit <identifier>
To simply display a mirror, you could use ‘mb show kddi’, for instance.
Finally, each mirror needs to be scanned and enabled:
mb scan --enable <identifier>
See the output of mb help for more commands. Refer to Maintaining the mirror database for the full reference documentation of the mb tool.
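With more than a handful of mirrors, typing each mb new command by hand gets tedious. The sketch below prints (but does not run) one command per mirror. The three-column input format and the function name print_mb_new are assumptions made for this example; only the --http and --rsync options come from the commands shown above:

```shell
# Print one "mb new" command per input line.
# Input columns (an assumption of this sketch): identifier, HTTP base URL, rsync base URL.
print_mb_new() {
    while read -r id http rsync; do
        # Skip blank lines and lines starting with '#'.
        case "$id" in ''|'#'*) continue ;; esac
        printf 'mb new %s --http %s --rsync %s\n' "$id" "$http" "$rsync"
    done
}

print_mb_new <<'EOF'
# identifier        HTTP base URL                                                      rsync base URL
ftp.isr.ist.utl.pt  http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/       rsync://ftp.isr.ist.utl.pt/suse/projects/
ftp.kddilabs.jp     http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/  rsync://ftp.kddilabs.jp/suse/projects/
EOF
```

Review the printed commands before executing them, for example by piping the output to sh.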
Setting up required cron jobs¶
Setting up mirror monitoring¶
Mirror monitoring needs to be set up to run automatically. The following cron job checks which mirrors are reachable: it probes the mirrors at short intervals and marks them online or offline in the database. Put this into /etc/crontab:
* * * * * mirrorbrain mirrorprobe
Setting up mirror scanning¶
Configure mirror scanning:
45 * * * * mirrorbrain mb scan --quiet --jobs 4 --all
Use more parallel scanners (-j|--jobs ...) if you have a beefy machine.
The --quiet option can be used twice (e.g. as -qq), which will totally silence the scanner, except for error messages. This means that you get a mail only when there is something wrong.
Maintenance¶
Another cron job is useful to remove unreferenced files from the database:
# Monday: database clean-up day...
30 1 * * mon mirrorbrain mb db vacuum
Keeping the GeoIP database up to date¶
The GeoIP database is updated at least once a month, so a new copy should be downloaded regularly:
# update GeoIP database on Mondays
31 2 * * mon root sleep $(($RANDOM/1024)); /usr/bin/geoip-lite-update
(The ‘sleep’ adds a random delay, so you can copy the line without adjusting the time, and the GeoIP servers still will not be hit by many clients at exactly the same moment. That’s all.)
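To see how long that delay can get, the arithmetic can be worked through by hand. This sketch assumes a shell that provides RANDOM with bash's documented range of 0 to 32767; shells without RANDOM would need a different source of randomness:

```shell
# Largest value bash's RANDOM can take (documented range: 0..32767).
random_max=32767

# Same integer division as in the cron line: the longest possible sleep, in seconds.
max_delay=$((random_max / 1024))

echo "delay is between 0 and ${max_delay} seconds"
```

The delay therefore never exceeds about half a minute, which is enough to spread out requests from hosts that all use the same cron schedule.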
Testing¶
- TODO: describe how to test that the install was successful
(When testing, consider any excludes that you configured, and which might introduce confusion.)
Many HTTP clients can be used for testing, but cURL is particularly helpful. Here are some examples.
Show the HTTP response code and the Location header pointing to the new location:
curl -sI <url>
Display the metalink:
curl -s <url>.metalink
Show an HTML list with the available mirrors:
curl -s <url>?mirrorlist
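When checking many URLs, it helps to reduce the header output to the two interesting values. The following is a small sketch that reads response headers on standard input; show_redirect is a hypothetical helper name, and the parsing assumes a single response (no chain of redirects):

```shell
# Read HTTP response headers on stdin and print "<status code> <Location target>".
show_redirect() {
    # Strip carriage returns first, since HTTP header lines end in CRLF.
    tr -d '\r' | awk '
        NR == 1                    { status = $2 }    # e.g. "HTTP/1.1 302 Found"
        tolower($1) == "location:" { location = $2 }
        END                        { print status, location }
    '
}

# Intended use:
#   curl -sI <url> | show_redirect
```

If the file is delivered locally instead of being redirected, no Location header is present and only the status code is printed.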
Setting up logging¶
You may want to log more details than Apache normally logs into the access_log file. You can define a new log format that gives you an access_log, with details from MirrorBrain added:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \
want:%{WANT}e give:%{GIVE}e r:%{MB_REALM}e %{X-MirrorBrain-Mirror}o \
%{MB_CONTINENT_CODE}e:%{MB_COUNTRY_CODE}e ASN:%{ASN}e P:%{PFX}e \
size:%{MB_FILESIZE}e %{Range}i" combined_redirect
This defines a new log format called “combined_redirect”, which you can use in your virtual hosts with the CustomLog directive.
Instead of:
CustomLog /var/log/apache2/myhost/access_log combined
you would use:
CustomLog /var/log/apache2/myhost/access_log combined_redirect
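Once this format is in use, the extra fields are easy to pick out with standard tools. As an illustration, the sketch below counts requests per value of the give: field. It relies only on the give: prefix defined in the LogFormat above; the function name count_give is made up for this example, and which values appear after the prefix depends on your setup:

```shell
# Count log lines per "give:" value; reads a combined_redirect access log on stdin.
count_give() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^give:/)
                n[substr($i, 6)]++    # text after the 5-character "give:" prefix
    }
    END { for (k in n) print n[k], k }' | sort -rn
}

# Intended use:
#   count_give < /var/log/apache2/myhost/access_log
```

The output lists one value per line, most frequent first.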
Creating hashes¶
First, add some configuration (example):
MirrorBrainMetalinkPublisher "openSUSE" http://download.opensuse.org
You need to create a directory in which to store the hashes, for instance /srv/hashes/srv/opensuse. Note that the full pathname of the file tree (/srv/opensuse) is part of this target path.
Make the directory owned by the mirrorbrain user.
Now, create the hashes with the following command. It is best run as an unprivileged user (mirrorbrain):
mb makehashes /srv/opensuse -t /srv/hashes/srv/opensuse
Add the hashing command to /etc/crontab to be run every few hours. Alternatively, run it whenever the file tree changes, coupled to some trigger.
Note
The path names used here need to match the DocumentRoot in Apache’s virtual host setup.
(This command was called metalink-hasher in previous releases of MirrorBrain.)
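The rule that the file tree's full path is appended to the hash base directory can be sketched as a one-line helper. hash_dir_for is a hypothetical name used only in this example; the paths are the ones from this section:

```shell
# Build the hash directory for a file tree by appending the tree's
# full path to the base directory that holds all hashes.
hash_dir_for() {
    base=$1
    tree=$2
    # ${base%/} drops a trailing slash so the result has no "//".
    printf '%s%s\n' "${base%/}" "$tree"
}

hash_dir_for /srv/hashes /srv/opensuse   # prints /srv/hashes/srv/opensuse
```

The printed path is what you would pass to the -t option of mb makehashes for that tree.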
Optional things you might want¶
Further things that you might want to configure:
mod_autoindex_mb, a replacement for the standard module mod_autoindex:
a2dismod autoindex
a2enmod autoindex_mb
Add IndexOptions Mirrorlist
# or IndexOptions +Mirrorlist, depending on your config (if you have options
# that are inherited from above in the tree, and you just want to add this one here)
add a link to a CSS stylesheet for mirror lists:
MirrorBrainMirrorlistStylesheet "http://static.opensuse.org/css/mirrorbrain.css"
and for the autoindex:
IndexStyleSheet "http://static.opensuse.org/css/mirrorbrain.css"
Configuring GeoIP¶
Note
It is better to use the larger GeoLiteCity database, instead of the minimal GeoIP database that contains only country information. With the more detailed info in the former database, a better mirror selection is achieved in many cases.
Edit /etc/apache2/conf.d/mod_geoip.conf:
<IfModule mod_geoip.c>
GeoIPEnable On
GeoIPDBFile /var/lib/GeoIP/GeoLiteCity.dat.updated
#GeoIPOutput [Notes|Env|All]
GeoIPOutput Env
</IfModule>
(Change GeoIPOutput All to GeoIPOutput Env)
Note that a caching mode like MMapCache needs to be used when Apache runs with the worker MPM. In this case, use:
<IfModule mod_geoip.c>
GeoIPEnable On
GeoIPDBFile /var/lib/GeoIP/GeoLiteCity.dat.updated MMapCache
GeoIPOutput Env
</IfModule>
Setting up automatic updates of the GeoIP database¶
New versions of the GeoIP database are released each month. You can set up a cron job to automatically fetch new updates as follows. If you do that, make sure to set the GeoIPDBFile path (see above) to /var/lib/GeoIP/GeoLiteCity.dat.updated:
# update GeoIP database on Mondays
31 2 * * mon root sleep $(($RANDOM/1024)); /usr/bin/geoip-lite-update
Creating a virtual host¶
If needed, create a DNS alias for your web host.
Note
A complete reference of all Apache directives can be found here.
The following snippet would create a new site as virtual host:
sudo sh -c "cat > /etc/apache2/sites-available/mirrorbrain << EOF
<VirtualHost 127.0.0.1>
ServerName mirrors.example.org
ServerAdmin webmaster@example.org
DocumentRoot /var/www/downloads
ErrorLog /var/log/apache2/mirrors.example.org/error.log
CustomLog /var/log/apache2/mirrors.example.org/access.log combined
<Directory /var/www/downloads>
MirrorBrainEngine On
MirrorBrainDebug Off
FormGET On
MirrorBrainHandleHEADRequestLocally Off
MirrorBrainMinSize 2048
MirrorBrainExcludeUserAgent rpm/4.4.2*
MirrorBrainExcludeUserAgent *APT-HTTP*
MirrorBrainExcludeMimeType application/pgp-keys
Options FollowSymLinks Indexes
AllowOverride None
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
EOF
"
Another example:
<VirtualHost your.host.name:80>
ServerName samba.mirrorbrain.org
ServerAdmin webmaster@example.org
DocumentRoot /srv/samba/pub/projects
ErrorLog /var/log/apache/samba.mirrorbrain.org/logs/error_log
CustomLog /var/log/apache/samba.mirrorbrain.org/logs/access_log combined
<Directory /srv/samba/pub/projects>
MirrorBrainEngine On
MirrorBrainDebug Off
FormGET On
MirrorBrainHandleHEADRequestLocally Off
MirrorBrainMinSize 2048
MirrorBrainExcludeUserAgent rpm/4.4.2*
MirrorBrainExcludeUserAgent *APT-HTTP*
MirrorBrainExcludeMimeType application/pgp-keys
Options FollowSymLinks Indexes
AllowOverride None
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
Make the log directory for the virtual host:
sudo mkdir /var/log/apache2/mirrors.example.org/
Enable the site:
sudo a2ensite mirrorbrain
Restart Apache, ideally while watching the error log:
sudo tail -f /var/log/apache2/error.log &
sudo /etc/init.d/apache2 restart