Maintaining the mirror database¶
Concepts – the mb command¶
mb is a commandline tool to do maintain the mirror database, create mirrors, edit them, work with files and other tasks.
It has several subcommands, and it is typically used in one the following forms:
mb <command> mb <command> <identifier>
A typical example would be:
mb edit opensuse.uib.no
Note the first argument (after
edit), which is the mirror identifier. It
serves as a name that uniquely identifies a single mirror. It can be useful if
these identifiers are memorizable by a human.
For all mb commands where a mirror (or several) needs to be specified, you can abbreviate the identifier by typing part of it. For instance, instead of:
mb show opensuse.uib.no
you could just type:
mb show uib
as long as
uib is uniquely identifying a mirrors among the others.
The mb command is extensible. See the developers documentation for instructions. (To be written yet.) .. TODO: add reference
mb has reference documentation built-in. If you just call mb or mb -h or mb help, it will print out the list of known subcommands:
% mb Usage: mb COMMAND [ARGS...] mb help [COMMAND] Options: --version show program's version number and exit -h, --help show this help message and exit -d, --debug print info useful for debugging -b BRAIN_INSTANCE, --brain-instance=BRAIN_INSTANCE the mirrorbrain instance to use. Corresponds to a section in /etc/mirrorbrain.conf which is named the same. Can also specified via environment variable MB. Commands: commentadd add a comment about a mirror db (vacuum) perform database maintenance delete delete a mirror from the database dirs show directories that are in the database disable disable a mirror edit edit a new mirror entry in $EDITOR enable enable a mirror export export the mirror list as text file file operations on files: ls/rm/add help (?) give detailed help on a specific sub-command instances list all configured mirrorbrain instances iplookup lookup stuff about an IP address list list mirrors markers show or edit marker files mirrorlist generate a mirror list new insert a new mirror into the database probefile list mirrors on which a given file is present by probing... rename rename a mirror's identifier scan scan mirrors score show or change the score of a mirror show show a mirror entry test test if a mirror is working update update mirrors network data in the database
By typing mb <command> -h or mb help <command>, help for the individual command will be printed:
% mb help list list: list mirrors Usage: mb list [IDENTIFIER] Options: -h, --help show this help message and exit -r XY show only mirrors whose region matches XY (possible values: sa,na,oc,af,as,eu) -c XY show only mirrors whose country matches XY -a, --show-disabled do not hide disabled mirrors --disabled show only disabled mirrors --prio also display priorities --asn also display the AS --prefix also display the network prefix --region also display the region --country also display the country --other-countries also display other countries that a mirror is configured to handle
Creating a new mirror¶
As necessary ingredient, there need to be mirror servers. They need to serve content via HTTP or FTP. To be scanned, they need to run rsync, FTP or HTTP. rsync is most efficient for this. FTP is second choice. At last, HTTP may be used, however it’ll work only if the HTTP server provides a reasonable “standard” directory index.
To make a new mirror known to the database, you use the mb command, specifically the mb new subcommand. An example would be the following:
mb new opensuse.uib.no -H http://opensuse.uib.no/ \ -F ftp://opensuse.uib.no/pub/Linux/Distributions/opensuse/ \ -R rsync://opensuse.uib.no/opensuse-full/
This creates a new entry in the mirror database with the data provided on the commandline.
Because providing a lot of data on the commandline can be tiresome, and incremental changes are often needed to get the data right, there is a command to edit the data later: mb edit.
A new mirror created this way is disabled in the beginning, because it needs to be scanned first before it can be useful.
Enabling a mirror, or more correctly enabling redirections to a mirror, can be done with the command mb enable.
Before doing this for the first time, the mirror needs to be scanned to be useful; see below (Scanning mirrors).
Another way to enable a mirror is to edit its database record directly (see below, where this is explained).
Disabling a mirror¶
Using the mb disable command, a mirror can be disabled, and MirrorBrain will immediately stop to send requests to it.
Another way to disable a mirror is to use mb edit to edit its
database record, and changing the
enabled field to
the same time, a comment about the reason could be left in the
Disabled mirrors are not scanned. Thus, it is usually advisable to scan a mirror before reenabling it after inactivity for prolonged time, using mb scan -e.
A mirror will also effectively be disabled if the
score is set to
Deleting a mirror¶
A mirror is deleted with the mb delete command. This command is an exception of the rule of abbreviating mirror identifiers; here, the full and exact identifier of the mirror to be deleted must be specified. This is to prevent typos.
A deleted mirror is permanently pruned from the database upon completion of the command.
Displaying details about a mirror¶
mb show will print out the metadata of a mirror. Example:
% mb show uib identifier : opensuse.uib.no operatorName : UiB - University of Bergen, IT services operatorUrl : http://it.uib.no/ baseurl : http://opensuse.uib.no/ baseurlFtp : ftp://opensuse.uib.no/pub/Linux/Distributions/opensuse/opensuse/ baseurlRsync : rsync://opensuse.uib.no/opensuse-full/ region : eu country : no asn : 224 prefix : 126.96.36.199/16 regionOnly : False countryOnly : False asOnly : False prefixOnly : False otherCountries : fileMaxsize : 0 publicNotes : score : 100 enabled : True statusBaseurl : True admin : X, Y, ... adminEmail : firstname.lastname@example.org ---------- comments ---------- Added - Wed May 6 14:36:10 2009 *** scanned and enabled at Wed May 6 14:47:56 2009. Gave stage access. poeml, Mon May 11 16:11:56 CEST 2009 Adjusted FTP URL after they switched to stage. (appended "opensuse"). rsync down at the moment. poeml, Mon May 11 17:18:06 CEST 2009 ---------- comments ----------
A mirror record explained¶
This is the unique identifier of the mirror server. In the table shown by mb edit, this is the only field that cannot be edited. To rename an identifier, you can use the mb rename command.
The realname of the mirror operator. This could be a person, an the organization running the mirror, or a sponsor. If the mirror list is exposed in some way, this field could be used to give the operator some visibility. Otherwise, it is of no significance than for your information.
A contact or informative URL.
The root HTTP URL of the mirrored file tree on the mirror. Used by the redirector to redirect requests via HTTP. If a mirror doesn’t offer HTTP, but only FTP, an FTP URL can be entered here as well.
The root FTP URL of the mirrored file tree on the mirror. Used by the scanner to retrieve the file list - if rsync isn’t available..
The root rsync URL used by the scanner to find the files via rsync. It’s possible to use URLs with credentials, like
The region code specifying the continent the mirror server is located in. See also
The country code for the server. See also
This is optional and is a number of the autonomous system the mirror is located in. It may serve as a more specific “network location” than the country, and is filled in automatically when a mirror is created. If you don’t use the autonomous system database together with MirrorBrain, the value will be zero and will be ignored by MirrorBrain. It is not strictly needed. It can also be edited manually, or updated via mb update --asn <identifier> from looked up data. Only meaningful if MirrorBrain is used together with mod_asn.
If true, only clients from the same region (continent) as the mirror are redirected to this mirror.
If true, only clients from the same country as the mirror are redirected to this mirror.
If true, the mirror will only get requests from clients that are located within the same network autonomous system (using the value in
If true, the mirror will only get requests from clients that are located within the same network prefix using the value inn
List of other countries that should be sent to this mirror server. This overrides the country and region choice, and can be used to fine-tune mirror selection. The list of country IDs specified here is given in the form of comma-separated two-letter codes. Apache does a simple string match on these, and a value that would make sense would be
Maximum filesize, the server can deliver without problems (some servers have problems with files > 2GB for example). MirrorBrain automatically checks HTTP servers for correct delivery, so there is no need to define this value for that reason. It can be used, however, to cause only “small” requests to go to certain mirrors, which are known to have too few bandwidth to deliver large files. If you set a threshold here (in bytes), the mirror will only get files that are smaller.
Notes which should be added to a html page listing all mirrors. The field may be used to store information separately from private notes taken in the comments field. The data isn’t exposed though, unless you take care of it.
The score (priority) of the server. Higher scored servers are used more often than lower scored servers. Default is 100. A server with score=150 will be used more often than a server with score=50.
Whether a mirror gets requests. Use this to enable redirects to a mirror, or switch them off. Can also be set with mb enable/disable <identifier>.
This field is edited by the mirror probe each time it runs (which normally is done frequently via cron). If it’s true, the mirror probe found that the mirror is alive the last time it looked.
Name of an admin or contact person for the mirror.
Contact Email address.
Free text field for additional comments. Use it in any way that suits you. It lends itself to take notes about communication with mirrors, for instance.
Editing a mirror¶
A mirror (in the mirror database) can be edited with the mb edit command.
The command will bring up an editor with the mirror’s metadata. The
VISUAL environmental variable is respected, and
the editor defaults to vim.
For fields where a Boolean is expected, you can type the value (while editing) in the form of 0/1 instead of true/false (shorter to type).
When you save the text and close the editor, you’ll be asked whether to save the data to the database.
Editing a mirrors network location¶
There are some fields in the mirror record, for which manual editing doesn’t make so much sense. These are:
autonomous system number,
When a mirror is created (using mb new ), then all these fields are automatically filled in. This requires a working DNS lookup and a GeoIP database.
The lookup of the autonomous system number and network prefix require mod_asn to be configured.
The geographical coordinates require the GeoIP database to be the GeoIP city (lite) version. The smaller database versions don’t contain the coordinates.
The data can be updated later with the mb update command. Regularly running this command (say, once a month) is a good idea because the data sometimes might change over time. However, this also means that manual edits will be overwritten.
To update all network data for all mirrors, simply run:
% mb update -A --all-mirrors
The command can also be used for individual mirrors, and to update only some data:
% mb update --coordinates --asn --prefix ftp5 updating geographical coordinates for ftp5.gwdg.de (0.000 0.000 -> 53.083 8.8)
Or it can be applied to all active mirrors:
% mb update --coordinates --asn --prefix updating geographical coordinates for ring.yamanashi.ac.jp (0.000 0.000 -> 36.0 138.0) updating network prefix for mirror.lupaworld.com (188.8.131.52/12 -> 184.108.40.206/12) [...]
mb list lists mirrors, with less or more details. In its simplest form, the command will simply print all identifiers of enabled mirrors. mb list -a includes also the disabled mirrors.
More useful is to add filters, or display more data.
Examples of filtering by country code (here: Bulgaria,
% mb list -c bg mirrors.netbg.com bgadmin.com
Example of filtering by region (here: Oceania,
oc), and also displaying the
value of the
otherCountries field for each mirror:
% mb list -r oc --other-countries ftp.iinet.net.au nz mirror.aarnet.edu.au nz mirror.pacific.net.au nz mirror.internode.on.net nz mirror.3fl.net.au nz netspace.net.au nz optusnet.com.au nz
Example of listing all mirrors in Portugal and showing their
% mb list -c pt --prio lisa.gov.pt 100 ftp.isr.ist.utl.pt 50 uminho.pt 50 ftp.nux.ipb.pt 3
Showing priority, network prefix and autonomous system of Chinese mirrors:
% mb list -c cn --prio --as --prefix mirror.lupaworld.com 100 4134 220.127.116.11/12 lizardsource.cn 30 9389 18.104.22.168/21 lcuc.org.cn 100 17816 22.214.171.124/17
When not filtering the output, the
options are useful, because they add that data into the output. An example
would be listing all mirrors with the command mb list --prio --as
--prefix --country --region.
Mirrors need to be scanned for their file lists. This is done with the mb scan command. The program will try rsync, if available, FTP if not, or HTTP if it’s the only option.
An individual mirror can be scanned like this:
% mb scan roxen Fri Jul 31 21:31:50 2009 roxen.integrity.hu: starting Fri Jul 31 21:31:51 2009 roxen.integrity.hu: total files before scan: 17248 Fri Jul 31 21:31:59 2009 roxen.integrity.hu: scanned 17248 files (1935/s) in 8s Fri Jul 31 21:31:59 2009 roxen.integrity.hu: files to be purged: 0 Fri Jul 31 21:32:00 2009 roxen.integrity.hu: total files after scan: 17248 Fri Jul 31 21:32:00 2009 roxen.integrity.hu: purged old files in 1s. Fri Jul 31 21:32:00 2009 roxen.integrity.hu: done. Completed in 9 seconds
After creation of a new mirror, it is disabled first. A typical workflow would
be to scan it, after creating it, and then enabling redirection. mb
scan command can be used with the
--enable option to make this
happen. If the scan went successfully, the mirror will be enabled afterwards:
% mb scan -e tuwien Fri Jul 31 21:50:45 2009 gd.tuwien.ac.at: starting Fri Jul 31 21:50:45 2009 gd.tuwien.ac.at: total files before scan: 712 Fri Jul 31 21:50:46 2009 gd.tuwien.ac.at: scanned 712 files (511/s) in 1s Fri Jul 31 21:50:46 2009 gd.tuwien.ac.at: files to be purged: 0 Fri Jul 31 21:50:46 2009 gd.tuwien.ac.at: total files after scan: 712 Fri Jul 31 21:50:46 2009 gd.tuwien.ac.at: purged old files in 0s. gd.tuwien.ac.at: now enabled. Fri Jul 31 21:50:46 2009 gd.tuwien.ac.at: done. Completed in 1 seconds
To scan all enabled mirrors in parallel, you would use
option to specify the number of scanners to start in parallel, and the
% mb scan -j 16 -a
This is likely what you would configure to be done periodically by cron.
To scan only a subdirectory on the mirrors, the
-d option can be used. This
can be useful when it is known that content has been added or removed in
particular places of large trees, in the following example shown with a single
% mb scan -d repositories/Apache ftp5 Checking for existance of 'repositories/Apache' directory . Scheduling scan on: ftp5.gwdg.de Completed in 0 seconds Fri Jul 31 21:41:37 2009 ftp5.gwdg.de: starting Fri Jul 31 21:41:38 2009 ftp5.gwdg.de: files in 'repositories/Apache' before scan: 780 Fri Jul 31 21:41:40 2009 ftp5.gwdg.de: scanned 780 files (636/s) in 1s Fri Jul 31 21:41:40 2009 ftp5.gwdg.de: files to be purged: 0 Fri Jul 31 21:41:42 2009 ftp5.gwdg.de: total files after scan: 760122 Fri Jul 31 21:41:42 2009 ftp5.gwdg.de: purged old files in 2s. Fri Jul 31 21:41:42 2009 ftp5.gwdg.de: done. Completed in 4 seconds
For debugging purposes, the
-v option is useful. It can be repeated several
times to enable more output.
Files known to the database can be listed with the mb file ls command. When specifying a path name, the leading slash is optional and not relevant. (Internally, the filenames are stored without.)
% mb file ls /distribution/11.1/repo/oss/suse/ppc/tcsh-6.15.00-93.3.ppc.rpm as th 100 ok ok mirror.in.th eu at 100 disabled dead tugraz.at eu at 100 ok ok gd.tuwien.ac.at eu de 100 ok ok ftp5.gwdg.de eu hu 100 ok ok roxen.integrity.hu
Globbing can be used. Then, to get more than a list or mirrors, but also the
--url option is useful:
% mb file ls \*.iso -u as th 100 ok ok mirror.in.th http://mirror.in.th/opensuse/ppc/factory/iso/openSUSE-NET-ppc-Build0137-Media.iso as th 100 ok ok mirror.in.th http://mirror.in.th/opensuse/ppc/factory/iso/openSUSE-Factory-NET-ppc-Build0051-Media.iso as th 100 ok ok mirror.in.th http://mirror.in.th/opensuse/ppc/factory/iso/openSUSE-Factory-NET-ppc-Build0059-Media.iso as th 100 ok ok mirror.in.th http://mirror.in.th/opensuse/ppc/factory/iso/openSUSE-NET-ppc-Build0116-Media.iso eu de 100 ok ok ftp5.gwdg.de http://ftp5.gwdg.de/pub/opensuse/ppc/factory/iso/openSUSE-NET-ppc-Build0179-Media.iso eu hu 100 ok ok roxen.integrity.hu http://roxen.integrity.hu/pub/opensuse/ppc/factory/iso/openSUSE-NET-ppc-Build0179-Media.iso
In addition to just listing what’s known to the database, the command can also do probing. The number is the HTTP return code (200 for OK):
% mb file ls /distribution/11.1/repo/oss/suse/ppc/tcsh-6.15.00-93.3.ppc.rpm --probe ..... as th 100 ok ok mirror.in.th 200 eu at 100 disabled dead tugraz.at eu at 100 ok ok gd.tuwien.ac.at 200 eu de 100 ok ok ftp5.gwdg.de 200 eu hu 100 ok ok roxen.integrity.hu 200
When used with probing, there is the additional option to actually download the content and display a checksum of what was returned:
% mb file ls --probe /distribution/11.1/repo/oss/suse/ppc/tcsh-6.15.00-93.3.ppc.rpm --md5 ..... as th 100 ok ok mirror.in.th 200 50dc50b20a97783a51ff402359456e3a eu at 100 disabled dead tugraz.at eu at 100 ok ok gd.tuwien.ac.at 200 50dc50b20a97783a51ff402359456e3a eu de 100 ok ok ftp5.gwdg.de 200 50dc50b20a97783a51ff402359456e3a eu hu 100 ok ok roxen.integrity.hu 200 50dc50b20a97783a51ff402359456e3a
To be usable with lots of mirrors, the probing is done in parallel.
The mb file command can also be used as mb file add and mb file rm to manipulate the database. See the help output of the command for details.
Exporting mirror lists¶
The mb export command can export data from the mirror database in several different formats, for different purposes.
Exporting in mirmon format¶
mirmon is a program written by Henk P. Penning which monitors the status of mirrors. The format “mirmon” exports a list of mirrors in a text format that can be read by mirmon.
With this, it is straighforward to deploy mirmon and automate it to use the mirrors from the database. Thus, no separate list of mirrors needs to be maintained for it.
mb export --format=mirmon generates the list that mirmon needs,
which basically looks like this:
% mb export --format=mirmon | head de http://ftp-stud.fht-esslingen.de/pub/Mirrors/ftp.opensuse.org/ <...@...> de ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/ftp.opensuse.org/ <...@...> de rsync://ftp-stud.fht-esslingen.de/opensuse/ <...@...> us http://mirror.anl.gov/pub/opensuse/opensuse/ <...@...> us ftp://mirror.anl.gov/pub/opensuse/opensuse/ <...@...> us rsync://mirror.anl.gov/opensuse/opensuse/ <...@...> ...
To give a full example, here’s how the actual mirmon config file would look
like. Note the
mirror_list line which pulls the generated list in:
project_name example.org project_url http://www.example.org/mirrors/ mirror_list /usr/bin/mb export --format=mirmon | web_page /var/www/example.org/mirmon/index.html icons icons probe /usr/bin/wget -q -O - -T %TIMEOUT% -t 1 %URL%timestamp.txt state /home/mirrorbrain/mirmon/state countries /usr/local/mirmon-2.3/countries.list project_logo http://www.example.org/images/logo.gif list_style plain timeout 20
The cron job to create the list and run mirmon would look like this:
30 * * * * mirrorbrain perl /usr/local/mirmon-2.3/mirmon -q -get update -c /etc/mirmon.conf
Note: when mirmon is run for the first time, the state file needs to be touched, or the script will not run.
The icons which are included in the resulting HTML page need to made available by Apache:
Alias /mirmon/icons /usr/local/mirmon-2.3/icons <Directory /usr/local/mirmon-2.3/icons> Options None AllowOverride None Order allow,deny Allow from all </Directory>
If your mirmon is configured with
list_style apacheinstead of
list_style plain, a different mirror list format is needed; use mb export with the
mb export --format=mirmon-apacheoption then.
If you prefer to run mb export under a different user id than mirmon, you can write the mirror list to an intermediate file, and configure mirmon to use the file like this:
Exporting to a Version Control System (VCS)¶
Exporting data in text format is a dead easy way to keep a history of changes that happen in the mirror database — and mail them around, so everybody involved is kept updated. At the same time, it serves archival purposes.
The idea is to export snapshots of the data in text format. The resulting files are put into a standard version control system, and standard post-commit hook scripts can be used to trigger certain actions (e.g. email).
The resulting archive of changes is all human-readable (much more useful than raw database backups). The changes can actually be mailed around in the form of a diff, showing some context.
A different way to implement a notification system for mirror changes would be to notify about each and every change done to the database — however, often changes have to be done incrementally and this would be a noisy method when working on a mirror’s configuration.
Instead, an hourly snapshot is normally sufficient to keep others informed, and shouldn’t be too noisy.
Subversion is the only version control system supported at the moment, but should hopefully be ubiquitous enough.
To set this up, first a repository needs to be created:
doozer:~ # su - mirrorbrain mirrorbrain@doozer:~> svnadmin create mirrors-svn-repos mirrorbrain@doozer:~> svn co file://$PWD/mirrors-svn-repos mirrors-svn Checked out revision 0. mirrorbrain@doozer:~>
Then, set up a cron job to run every hour, calling mb export with
--format=vcs and the
--commit=svn options. The latter automatically
svn commit after the export (taking into account files that have been
deleted, or occur for the first time):
# export mirrordb contents to SVN and send commit mails 7 * * * * mirrorbrain mb export --format vcs --target-dir ~/mirrors-svn --commit=svn
Finally, the post-commit hook script is missing, which takes care of sending mails. Create and edit it as follows:
mirrorbrain@doozer:~> touch mirrors-svn-repos/hooks/post-commit mirrorbrain@doozer:~> chmod +x mirrors-svn-repos/hooks/post-commit mirrorbrain@doozer:~> vi mirrors-svn-repos/hooks/post-commit #!/bin/sh REPOS="$1" REV="$2" /usr/share/subversion/tools/hook-scripts/mailer/mailer.py commit "$REPOS" "$REV" /etc/mailer.conf
The path to the mailer.py script likely needs adjustment. The
/etc/mailer.conf) could look like this:
[general] mail_command = /usr/sbin/sendmail [defaults] diff = /usr/bin/diff -u -L %(label_from)s -L %(label_to)s %(from)s %(to)s generate_diffs = add copy modify show_nonmatching_paths = yes [mirrordb] for_repos = /home/mirrorbrain/mirrors-svn-repos from_addr = mirrorbrain@... to_addr = admin@foo bar@... commit_subject_prefix = [mirrordb] propchange_subject_prefix = [mirrordb]
Exporting in PostgreSQL format¶
The format “postgresql” creates SQL INSERT statements that can be run on a PostgreSQL database. This can e.g. be used to migrate the data into another database.
The resulting dump could be loaded into a mirrorbrain instance like this:
mb db shell < db.dump
Exporting in Django format¶
This is experimental stuff — intended for hacking on the Django web framework. Data is exported in the form of Django ORM objects, and the export routine will very likely need modification for particular purposes. The existing code has been used to experiment with. Get in contact if you are interested in hacking on this!
Performing database maintenance¶
The mb db command offers some helpful functionality regarding database maintenance. It has several subcommands.
Regular cleanups with mb db vacuum¶
This command cleans up unreferenced files from the mirror database.
This should be done once a week for a busy file tree. Otherwise it should be rarely needed, but can possibly improve performance if it is able to shrink the database.
When called with the
-n option, only the number of files to be cleaned up
is printed, so it’s purely for information. No cleanup is performed.
The recommended cron job looks like this:
# Monday: database clean-up day... 30 1 * * mon mirrorbrain mb db vacuum
Note: This functionality is not to be confused with the PostgreSQL-internal vacuuming, which typically happens automatic these days (8.x), but was a manual process at some time in the past.
Database shell with mb db shell¶
With this command, you can conveniently open a database shell:
% mb db shell psql (8.4.1) Type "help" for help. mb_opensuse=>
…ready to enter commands in psql, the PostgreSQL interactive terminal.
Database size info with mb db size¶
The command mb db size prints the size of each database relation. (In PostgreSQL speak, a relation is a table or an index.) This provides insight for appropriate database tuning and planning. Here’s an example:
% mb db sizes Size(MB) Relation 464.5 filearr 532.9 filearr_path_key 74.3 filearr_pkey 23.8 pfx2asn 30.1 pfx2asn_pfx_key 19.9 pfx2asn_pkey 0.0 pg_foreign_server 0.0 pg_foreign_server_name_index 0.0 pg_foreign_server_oid_index 0.0 pg_user_mapping_user_server_index 0.2 server 0.0 server_enabled_status_baseurl_score_key 0.0 server_identifier_key 0.0 server_pkey 0.0 sql_sizing_profiles Total: 1145.9
This example shows a really, really large database, containing nearly 3 millions (!) of files. It uses a good gigabyte of disk space.
filearr contains the file names and associations to the mirrors.
filearr_path_key is the index on the file names.
filearr_pkey is the
primary key. These will be the largest things in a database filled with
millions of files.
pfx* relations are only present when mod_asn is installed. The size
they use is always the same.