From Mageia wiki

Purpose

urpmi-proxy is an HTTP proxy for urpmi: clients can configure it as if it were a local mirror. On each request it fetches files from configurable sources and stores them in a local cache. You can also configure it with an extra repository, so you can provide additional or modified packages.

Author: Maarten Vanraes
License: GPLv2+

This makes it primarily useful for:

  • people with more than one Mageia installation in a local network (including VMs, chroots, etc.)
  • people who want a local mirror, but want new files immediately, or only want to store the packages they actually use
  • people who want to test locally built packages and see how they integrate with the others
  • people whose reason is not listed above :-)

Quick & Dirty

Install urpmi-proxy on your server and your client machines can use it like a "mirror":

On the server, install:

# urpmi urpmi-proxy

Since urpmi-proxy is a web application (a CGI script) for Apache, you need to make sure Apache is running:

# systemctl restart httpd.service

Also make sure port 80 is open in the firewall on the machine where urpmi-proxy runs ("web server" in Mageia Control Center > Security > Personal firewall).

NOTE: For better performance, you'll need to configure it (see below).

Configure the clients

Network install using your proxy

Example for Mageia 5, 64-bit: use http://my-urpmi-proxy-server/mageia/distrib/5/x86_64 as the network install path. In more detail: boot the computer from a separately downloaded boot.iso, choose to install using HTTP with manual configuration, then enter the IP address of your urpmi-proxy server and the path /mageia/distrib/5/x86_64 .

Changing clients to use your urpmi-proxy

To remove all previously configured media and then add the urpmi-proxy media:

urpmi.removemedia -a
urpmi.addmedia --distrib http://my-urpmi-proxy-server/mageia/distrib/5/x86_64

What if my server isn't Mageia?

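No problem: install it on your server from the source tarball. Obviously your server won't be using urpmi-proxy itself (so there is no recursive loop issue), but since the server doesn't have urpmi, the default 'urpmi' source can't work and you'll need to configure the sources (see the configuration file section below). To install it, just: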

  • put the urpmi-proxy.conf file in /etc,
  • rename the apache.conf file and put it in the webapps.d folder of apache, and
  • put the cgi file in the location specified by the apache.conf file.

That's all.

Clients? What about the server itself?

OK, so if you want the server itself to use it as well (or you just have one machine), there's some configuration required. By default, urpmi-proxy uses the urpmi settings of the server to see where it should get the files which it proxies.

Thus, if you just used it on the server, it would get the files from ... the server itself, which would get them from ... the server itself? OUCH, a recursive loop.

So how do we fix this? Simple: configure urpmi-proxy to get the files from somewhere else, instead of looking at the server's urpmi settings.

The configuration file is /etc/urpmi-proxy.conf. It contains a $sources variable: choose a good mirror and put it there instead of the default 'urpmi', e.g.:

$sources = [ 'http://mageia.unige.ch/mirror' ];

Note the trailing ";", which must be present. If you open the mirror URL in a browser, you'll see 'iso' and 'distrib' directories. I prefer HTTP URLs, because they seem to be the most efficient.

Now urpmi-proxy uses these configured settings, so the server (or your only machine) can safely use the urpmi-proxy link. (Any changes to the config file take effect immediately.)

Right, so what about this configuration file?

Well, the most important setting is of course the sources. Each entry in $sources can be:

  • urpmi
This is the default; internally it is mapped to mirrorlist://$MIRRORLIST.
  • mirrorlist
This uses urpmi's mirrorlist code with the supplied mirrorlist to find the mirror your machine currently uses, and then uses that mirror.
  • http or ftp
This is what you would use for manual configuration; any HTTP or FTP URL will do. Make sure it doesn't end with a /, and that the folder it points to contains the distrib and iso folders.
  • file
This is the interesting one: it is used for extra repositories that are mixed with the Mageia ones, giving you an overlaid repository. In other words, packages you put here can be added seamlessly to the Mageia repositories. See further below for more on this.
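To make the rules above concrete, here is a small Python sketch that classifies a $sources entry and rejects a trailing /. It is not part of urpmi-proxy. The function name classify_source is hypothetical, and it only checks the syntax described on this page: it does not contact the mirror to verify that the distrib and iso folders exist.

```python
from urllib.parse import urlsplit

# Source kinds this page describes (besides the plain keyword 'urpmi').
URL_SCHEMES = {"mirrorlist", "http", "ftp", "file"}


def classify_source(entry):
    """Return the kind of one $sources entry, or raise ValueError."""
    if entry == "urpmi":
        # The default; urpmi-proxy maps it to mirrorlist://$MIRRORLIST internally.
        return "urpmi"
    scheme = urlsplit(entry).scheme
    if scheme not in URL_SCHEMES:
        raise ValueError(f"unknown source type: {entry!r}")
    if scheme != "mirrorlist" and entry.endswith("/"):
        raise ValueError(f"source must not end with '/': {entry!r}")
    return scheme


sources = ["file:///var/lib/urpmi-proxy/repository", "http://mageia.unige.ch/mirror"]
for entry in sources:
    print(entry, "->", classify_source(entry))
```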

There is also a logfile you can specify (see below for more info regarding the logging format):

$logfile = '/var/log/urpmi-proxy.log';

There is also a debug option (debugging info goes to the Apache logs, not to the logfile above):

$debug = 0;

This path is used as the storage for the proxy. The files and directory structure are exactly the same as on the mirrors, so you could use it directly as a file-based repository, or even rsync it at some interval:

$cache_path = '/var/cache/urpmi-proxy';

Define the temporary cache, which is used while downloading files, on the same filesystem as $cache_path, so that hard links work efficiently:

$cache_tmp_path = '/var/tmp/urpmi-proxy';
Here be dragons!
If you change the folder locations, change both, keep them on the same filesystem, and make sure both are owned by the apache user (apache:apache).
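Here is a minimal Python sketch of why both paths should share a filesystem. It assumes, as the text above suggests, that a finished download in $cache_tmp_path is hard-linked into $cache_path. The helpers same_filesystem and publish are hypothetical and are not urpmi-proxy's code. On the same filesystem a hard link is instant. Across filesystems, os.link fails with EXDEV and the only option left is a full copy. Setting the apache:apache ownership is not covered.

```python
import errno
import os
import shutil


def same_filesystem(path_a, path_b):
    """True if two existing paths are on the same filesystem (same device)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def publish(tmp_file, final_path):
    """Move a completed download from the temporary area into the cache."""
    os.makedirs(os.path.dirname(final_path), exist_ok=True)
    try:
        os.link(tmp_file, final_path)  # same filesystem: instant, no data copied
    except FileExistsError:
        pass  # another request already cached this file: keep that copy
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(tmp_file, final_path)  # different filesystems: full copy needed
    os.unlink(tmp_file)
```

For example, same_filesystem('/var/tmp/urpmi-proxy', '/var/cache/urpmi-proxy') should return True with the default settings if /var is a single filesystem.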

The next parameters are ones you should really stay away from unless you've read the code, so I won't bother explaining them in detail. In short, media.cfg files are merged, while MD5SUM files are always checked for updates.

$check_updates_only_files = 'MD5SUM';
$check_no_updates_files = undef;
$merge_files = 'media.cfg';

Lastly, there are also tuning options:

$connect_timeout = 120;
$ftp_response_timeout = 30;
$max_stall_speed = 8192;
$max_stall_time = 60;

What does it actually do? How does it actually work?

Pretty simple:

It's an Apache web app: however your Apache is configured, it intercepts '<whatever>/mageia/*' and passes the rest of the link to the program.

It analyses the requested path and looks in the cache path ($cache_path above) to see whether the file is already there; if so, it returns the file.

If it's not found, it tries the sources in top-down order to fetch the file.

For urpmi and mirrorlist sources, it first looks up which mirror to fetch the file from.
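The lookup order can be sketched in Python as follows. This is a simplified illustration, not urpmi-proxy's implementation:

  • serve and cache_file_for are hypothetical names.
  • Only plain http://, ftp:// and file:// sources are handled; the urpmi and mirrorlist lookups are left out.
  • The whole file is buffered instead of streamed.
  • The returned labels only borrow names from the CACHE_CODE list further down.

```python
import os
import posixpath
import urllib.request


def cache_file_for(request_path, cache_path):
    """Map '/distrib/5/x86_64/...' onto the mirror-shaped cache tree."""
    rel = posixpath.normpath(request_path.lstrip("/"))
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError(f"path escapes the cache: {request_path!r}")
    return rel, os.path.join(cache_path, rel)


def serve(request_path, cache_path, sources, timeout=30):
    """Return (data, label): local cache first, then each source top-down."""
    rel, local = cache_file_for(request_path, cache_path)
    if os.path.isfile(local):
        with open(local, "rb") as f:
            return f.read(), "HIT_NO_CHECK"
    for source in sources:
        url = f"{source}/{rel}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
        except OSError:  # URLError and HTTPError are OSError subclasses
            continue     # not available from this source: try the next one
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as f:
            f.write(data)
        return data, "MISS"
    return None, "MISS_FAIL"
```

With sources = ['file:///var/lib/urpmi-proxy/repository', 'http://mageia.unige.ch/mirror'], a package is taken from the local repository when it exists there, and from the mirror otherwise.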

When it fetches a file, it uses a clever trick: it stores the file while downloading it, and at the same time sends the downloaded data on to the client that requested the file. That means there is no visible delay when downloading files!
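The "store while sending" trick boils down to copying the upstream response in chunks and writing every chunk to two places. Here is a minimal Python sketch. It assumes file-like objects for the upstream response, the client connection and the temporary file. tee_download is a hypothetical name, and the 64 KiB chunk size is arbitrary.

```python
def tee_download(upstream, client, tmp_file, chunk_size=64 * 1024):
    """Copy upstream to both the client and a temp file, chunk by chunk.

    The client gets each chunk as soon as it arrives, so it does not have
    to wait for the whole file to be downloaded first.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        tmp_file.write(chunk)  # keep a copy for the cache
        client.write(chunk)    # and forward it immediately
        total += len(chunk)
    return total
```

In a CGI script, upstream could be the object returned by urllib.request.urlopen() and client could be sys.stdout.buffer. Once the copy completes, the temporary file can be moved into the cache.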

It also logs what has happened for each request in the log file.

What's the format of this log file?

[DATE TIME] PATH - HTTP_RETURN_CODE - CACHE_CODE
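A line in this format can be split with a regular expression. Here is a Python sketch. It assumes that paths contain no spaces and keeps the timestamp as text, since reliably parsing zone abbreviations such as CET is outside the scope of this example. parse_log_line is a hypothetical helper.

```python
import re

# [DATE TIME] PATH - HTTP_RETURN_CODE - CACHE_CODE
LOG_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\] (?P<path>\S+) - (?P<status>\d{3}) - (?P<cache_code>[A-Z_]+)$"
)


def parse_log_line(line):
    """Split one urpmi-proxy log line into its fields, or return None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry
```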

Here is an example in which some files that weren't cached are installed:

[Fri Dec 30 16:19:33 CET 2011] /distrib/1/i586/media/core/updates/media_info/MD5SUM - 200 - MISS
[Fri Dec 30 16:19:34 CET 2011] /distrib/1/i586/media/core/updates/media_info/20111230-134534-info.xml.lzma - 200 - MISS
[Fri Dec 30 16:19:56 CET 2011] /distrib/1/i586/media/core/updates/libxulrunner9.0.1-9.0.1-0.2.mga1.i586.rpm - 200 - MISS
[Fri Dec 30 16:20:13 CET 2011] /distrib/1/i586/media/core/updates/firefox-9.0.1-0.1.mga1.i586.rpm - 200 - MISS
[Fri Dec 30 16:20:14 CET 2011] /distrib/1/i586/media/core/updates/libsqlite3_0-3.7.9-1.1.mga1.i586.rpm - 200 - MISS
[Fri Dec 30 16:20:15 CET 2011] /distrib/1/i586/media/core/updates/firefox-en_ZA-9.0.1-0.3.mga1.noarch.rpm - 200 - MISS
[Fri Dec 30 16:20:16 CET 2011] /distrib/1/i586/media/core/updates/libsqlite3-devel-3.7.9-1.1.mga1.i586.rpm - 200 - MISS
[Fri Dec 30 16:20:18 CET 2011] /distrib/1/i586/media/core/updates/firefox-en_GB-9.0.1-0.3.mga1.noarch.rpm - 200 - MISS
[Fri Dec 30 16:20:18 CET 2011] /distrib/1/i586/media/core/updates/xulrunner-9.0.1-0.2.mga1.i586.rpm - 200 - MISS

Or this, part of a typical mga_applet session:

[Mon Jan 9 23:36:11 CET 2012] /distrib/cauldron/x86_64/media/core/release/media_info/MD5SUM - 200 - MISS
[Mon Jan 9 23:36:14 CET 2012] /distrib/cauldron/x86_64/media/core/release/media_info/20120109-221927-synthesis.hdlist.cz - 200 - MISS
[Mon Jan 9 23:36:15 CET 2012] /distrib/cauldron/x86_64/media/nonfree/release/media_info/MD5SUM - 200 - MISS
[Mon Jan 9 23:36:15 CET 2012] /distrib/cauldron/x86_64/media/tainted/release/media_info/MD5SUM - 200 - MISS
[Mon Jan 9 23:36:16 CET 2012] /distrib/cauldron/x86_64/media/tainted/release/media_info/20120109-215715-synthesis.hdlist.cz - 200 - MISS

This is part of installing a series of dependent packages, with a mixture of cached and uncached requests (notice the timestamp differences):

[Sun Jan 8 17:34:28 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64mikmod3-3.2.0-0.beta2.9.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:28 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64mikmod-devel-3.2.0-0.beta2.9.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:28 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64xdmcp6-devel-1.1.0-1.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:29 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64pciaccess-devel-0.12.1-1.mga1.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:30 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64gpm-devel-1.20.6-5.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:30 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64jbig-devel-2.0-5.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:31 CET 2012] /distrib/cauldron/x86_64/media/core/release/x11-proto-devel-7.6-12.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:31 CET 2012] /distrib/cauldron/x86_64/media/core/release/cvs-1.12.13-18.mga1.x86_64.rpm - 200 - CACHED_NO_CHECK
[Sun Jan 8 17:34:38 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64ogg-devel-1.3.0-1.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:38 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64slang-devel-2.2.4-3.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:39 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64ffi5-devel-3.0.10-1.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:40 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64xcb-randr0-1.7-3.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:41 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64attr1-devel-2.4.46-1.mga2.x86_64.rpm - 200 - MISS
[Sun Jan 8 17:34:43 CET 2012] /distrib/cauldron/x86_64/media/core/release/lib64lzma-devel-5.0.3-1.mga2.x86_64.rpm - 200 - MISS

The possible values for CACHE_CODE are:

  • MISS
This file was not cached
  • MISS_FAIL
This file was not cached, but failed to be fetched
  • MISS_FAIL_SENT
This file was not cached, failed to be fetched, but was sent anyway (maybe partially?)
  • HIT_NO_CHECK
This file was cached and not checked for updates
  • HIT_AFTER_FAIL
Fetching the file failed, but a cached version existed and was returned
  • HIT_AFTER_FAIL_UNMODIFIED
Fetching the file failed and a cached version existed, but the request itself was still marked as unmodified (even though the fetch failed)
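For a quick picture of how well the cache is doing, you can count the CACHE_CODE values in the log. Here is a hedged Python sketch. cache_summary is a hypothetical helper that takes the last field of each line. It counts HIT_* codes as answered from the cache, and does the same for the CACHED_NO_CHECK code seen in the example logs above.

```python
from collections import Counter


def cache_summary(log_lines):
    """Count CACHE_CODE values; return (counter, share served from the cache)."""
    codes = Counter()
    for line in log_lines:
        line = line.strip()
        if " - " in line:
            codes[line.rsplit(" - ", 1)[1]] += 1  # CACHE_CODE is the last field
    total = sum(codes.values())
    from_cache = sum(n for code, n in codes.items() if code.startswith(("HIT", "CACHED")))
    return codes, (from_cache / total if total else 0.0)
```

Usage, with the default log location: with open('/var/log/urpmi-proxy.log') as log: codes, ratio = cache_summary(log)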

So how do I add a local repository into this?

This part is pretty easy:

$sources = [ 'file:///var/lib/urpmi-proxy/repository', 'http://mageia.unige.ch/mirror', ];

but that doesn't mean that you have a local repository yet...

What it does mean is that it'll look for the files in the first entry first; if that fails, it uses the second entry (your real mirror).

Given the mirror layout, this means that it'll look for, e.g., file:///var/lib/urpmi-proxy/repository/distrib/2/x86_64/media/core/release/* files.

So, if you want an extra repository in mga2 (let's call it "extra"), you would create a directory structure /var/lib/urpmi-proxy/repository/distrib/2/x86_64/media/extra/release.

The next part is to put rpm files in that directory.

Then you'll have to turn it into a repository by indexing the files:

# genhdlist2 --allow-empty-media --no-bad-rpm --xml-info --clean /var/lib/urpmi-proxy/repository/distrib/2/x86_64/media/extra/release/
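The three steps above (create the media directory, copy the RPMs in, index them) can be wrapped in a small Python sketch. build_extra_media is a hypothetical helper. It calls genhdlist2 with exactly the options shown above, and assumes that genhdlist2 is installed and that the script can write under the repository path.

```python
import shutil
import subprocess
from pathlib import Path


def build_extra_media(repo_root, version, arch, rpm_files,
                      media="extra/release", index=True):
    """Create <repo_root>/distrib/<version>/<arch>/media/<media>, copy RPMs, index."""
    media_dir = Path(repo_root, "distrib", version, arch, "media", *media.split("/"))
    media_dir.mkdir(parents=True, exist_ok=True)
    for rpm in rpm_files:
        shutil.copy2(rpm, media_dir)
    if index:
        # Same genhdlist2 invocation as shown above.
        subprocess.run(
            ["genhdlist2", "--allow-empty-media", "--no-bad-rpm", "--xml-info",
             "--clean", f"{media_dir}/"],
            check=True,
        )
    return media_dir
```

For example (the package file name is hypothetical): build_extra_media('/var/lib/urpmi-proxy/repository', '2', 'x86_64', ['my-package-1.0-1.mga2.x86_64.rpm'])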

Now we have everything, except that your extra repository isn't used yet.

The final step is to override the media.cfg file to include the extra repository.

Get the /distrib/2/x86_64/media/media_info/media.cfg file and put it in /var/lib/urpmi-proxy/repository/distrib/2/x86_64/media/media_info/

Then edit the file to include:

[extra/release]
hdlist=hdlist_extra_release.cz
name=Extra Release
media_type=release

Any new installs, or new media added from your urpmi-proxy, will now also include the extra repository, enabled by default!
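If you script this edit, take care not to add the section twice. Here is a minimal Python sketch that appends the [extra/release] block to the overriding media.cfg only if the block is missing. add_media_section is a hypothetical helper, and it assumes that appending at the end of the file is acceptable, as in the manual edit.

```python
from pathlib import Path


def add_media_section(cfg_path, section, options):
    """Append a [section] block to media.cfg unless it is already there."""
    cfg = Path(cfg_path)
    text = cfg.read_text()
    header = f"[{section}]"
    if any(line.strip() == header for line in text.splitlines()):
        return False  # already present: leave the file alone
    lines = [header] + [f"{key}={value}" for key, value in options.items()]
    if text and not text.endswith("\n"):
        text += "\n"
    cfg.write_text(text + "\n" + "\n".join(lines) + "\n")
    return True
```

Usage: add_media_section('/var/lib/urpmi-proxy/repository/distrib/2/x86_64/media/media_info/media.cfg', 'extra/release', {'hdlist': 'hdlist_extra_release.cz', 'name': 'Extra Release', 'media_type': 'release'})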

Can I use this with iurt?

Probably.

Development

The git repository can be found at http://gitweb.mageia.org/software/rpm/urpmi-proxy/

Rinsing the cache

Over time, the cache grows large for changing repositories like Cauldron, and also for updates. Here are some tricks to keep disk usage low; some even reduce downloads!

First, some manual methods:

  • When Cauldron becomes the final release: in your cache tree, rename your cauldron branch to the final version, so the Cauldron packages get reused. (Old package versions are easily rinsed away using the script below.) You may also want to sort by size manually and remove big files that are unlikely to be needed. If you want to keep following Cauldron, consider copying the branch instead of renaming it.
  • When no client uses an old Mageia version's repository anymore: delete that branch from the cache.
  • You can of course delete things manually as needed: odd large packages, testing repositories...

Whatever you delete: anything will be downloaded again if needed.
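For the "sort by size" step, here is a short Python sketch that lists the biggest files in the cache tree. largest_files is a hypothetical helper. Files with several hard links are reported once per path.

```python
import heapq
import os


def largest_files(cache_root, count=20):
    """Return [(size_in_bytes, path), ...] for the biggest files, largest first."""
    sizes = []
    for dirpath, _dirs, files in os.walk(cache_root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.lstat(path).st_size, path))
            except FileNotFoundError:
                continue  # removed while we were scanning
    return heapq.nlargest(count, sizes)
```

Usage: for size, path in largest_files('/var/cache/urpmi-proxy'): print(size, path)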

Remove old packages

Luca Olivetti kindly shared on discuss@ml.mageia.org at 2014-11-16 01 a script https://wiki.mageia.org/mw-en/images/d/d9/Cache-rinse.txt that cleans out all but the newest version of each package in every folder of a tree. If a client later wants an older package, urpmi-proxy will simply download it again. Note: it must be run with the --delete option to actually do anything. Run it without that option to see what it would do and how many bytes it would free. Note: cache-rinse.py must be placed in, and run from, the cache tree root!
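The idea behind such a script can be sketched in Python. This is an independent, simplified illustration, not Luca Olivetti's cache-rinse.py:

  • It assumes file names of the form name-version-release.arch.rpm.
  • It compares versions with a cut-down version of RPM's segment comparison; epochs, ~ and ^ are not handled.
  • Like the original, it only reports what it would remove unless delete=True.

```python
import functools
import os
import re
from collections import defaultdict


def _segments(text):
    return re.findall(r"\d+|[A-Za-z]+", text)


def vercmp(a, b):
    """Simplified rpmvercmp: compare alphanumeric segments left to right."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)  # numeric segments compare as numbers
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # a numeric segment beats an alphabetic one
        if x != y:
            return 1 if x > y else -1
    return (len(sa) > len(sb)) - (len(sa) < len(sb))  # more segments is newer


def split_rpm_name(filename):
    """'firefox-9.0.1-0.1.mga1.i586.rpm' -> ('firefox', '9.0.1', '0.1.mga1', 'i586')."""
    nvr, arch = filename[: -len(".rpm")].rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def _evr_cmp(p, q):
    # p and q are ((version, release), filename) pairs
    return vercmp(p[0][0], q[0][0]) or vercmp(p[0][1], q[0][1])


def old_packages(root):
    """Yield (path, size) for each RPM that is not the newest of its name+arch in its folder."""
    for dirpath, _dirs, files in os.walk(root):
        groups = defaultdict(list)
        for filename in files:
            if not filename.endswith(".rpm"):
                continue
            try:
                name, version, release, arch = split_rpm_name(filename)
            except ValueError:
                continue  # not a name-version-release.arch.rpm file
            groups[(name, arch)].append(((version, release), filename))
        for candidates in groups.values():
            candidates.sort(key=functools.cmp_to_key(_evr_cmp))
            for _evr, filename in candidates[:-1]:  # all but the newest
                path = os.path.join(dirpath, filename)
                yield path, os.path.getsize(path)


def rinse(root, delete=False):
    """Report (or, with delete=True, remove) old packages; return bytes freed."""
    freed = 0
    for path, size in list(old_packages(root)):
        print("deleting" if delete else "would delete", path)
        freed += size
        if delete:
            os.unlink(path)
    return freed
```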

Remove old hdlists

In the media_info folders of all update repositories and of all of Cauldron, old files that described earlier media content accumulate. The easiest fix is to empty those media_info folders; when the files are next needed, the latest versions will be downloaded.
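Here is a hedged Python sketch of that clean-up. It finds media_info folders below an updates or cauldron directory of the cache tree and removes their files only when delete=True is passed. The helper names are hypothetical; check the listed files before deleting.

```python
import os


def stale_media_info_files(cache_root):
    """Files in media_info folders that sit below an 'updates' or 'cauldron' directory."""
    found = []
    for dirpath, _dirs, files in os.walk(cache_root):
        if os.path.basename(dirpath) != "media_info":
            continue
        parts = os.path.relpath(dirpath, cache_root).split(os.sep)
        if "updates" in parts or "cauldron" in parts:
            found.extend(os.path.join(dirpath, name) for name in files)
    return sorted(found)


def empty_media_info(cache_root, delete=False):
    """Print (or, with delete=True, remove) the stale media_info files."""
    files = stale_media_info_files(cache_root)
    for path in files:
        print("deleting" if delete else "would delete", path)
        if delete:
            os.unlink(path)
    return files
```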

Cross link .noarch packages

An optimisation that saves both disk space and downloads is to cross-link .noarch packages (e.g. large game maps) using hard links. If a package is downloaded for one arch, it is then also available in the corresponding repository of the other arch, so it is not downloaded again to another place. Hard links use hardly any disk space; on the contrary, if both architectures have already downloaded 700 MB of .noarch file(s), this procedure will save 700 MB. It's best to run this after either architecture has downloaded large .noarch file(s).
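Here is a minimal Python sketch of the cross-linking for one Mageia version. It assumes the cache layout distrib/<version>/<arch>/... and the arch names i586 and x86_64. crosslink_noarch is a hypothetical helper, not the script linked below. It only links into repository folders that already exist for the other arch. It replaces an existing separate copy only when the two files are byte-for-byte identical.

```python
import filecmp
import os


def crosslink_noarch(cache_root, version, arches=("i586", "x86_64")):
    """Hard-link .noarch.rpm files between the arch trees of one Mageia version.

    Returns the (arch-relative) paths that were linked or re-linked.
    """
    base = os.path.join(cache_root, "distrib", version)
    linked = []
    for src_arch in arches:
        src_root = os.path.join(base, src_arch)
        for dirpath, _dirs, files in os.walk(src_root):
            rel_dir = os.path.relpath(dirpath, src_root)
            for name in files:
                if not name.endswith(".noarch.rpm"):
                    continue
                src = os.path.join(dirpath, name)
                for dst_arch in arches:
                    dst_dir = os.path.join(base, dst_arch, rel_dir)
                    if dst_arch == src_arch or not os.path.isdir(dst_dir):
                        continue  # same tree, or that repo is not used for this arch
                    dst = os.path.join(dst_dir, name)
                    if not os.path.exists(dst):
                        os.link(src, dst)  # the other arch gets it without downloading
                    elif os.path.samefile(src, dst) or not filecmp.cmp(src, dst, shallow=False):
                        continue  # already linked, or contents differ: leave it alone
                    else:
                        # Identical separate copy: swap it for a link to free the space.
                        tmp = dst + ".crosslink-tmp"
                        os.link(src, tmp)
                        os.replace(tmp, dst)
                    linked.append(os.path.normpath(os.path.join(dst_arch, rel_dir, name)))
    return linked
```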

Script

Here is a script https://wiki.mageia.org/mw-en/images/6/6b/Urpmi-proxy_rinse.sh.txt that performs all of the above except the manual actions. I run it manually, especially after updating a computer that pulled many MB of .noarch package(s). You could probably also run it from cron or similar, as root of course. NOTE: adjust the part "for ver in 4 5 cauldron ;" to match the Mageia versions you use! Also check the other declarations at the beginning, such as where the cache is.