2016년 12월 17일 토요일

Mirroring PyPI locally

I recently had to set up a local PyPI (Python Package Index) mirror at a client site so that developers could download pip packages locally instead of from the Internet. I used bandersnatch to mirror the packages hosted on PyPI. Bandersnatch can be installed via pip and mirroring requires just two steps.

Customize /etc/bandersnatch.conf

The first time you run bandersnatch, /etc/bandersnatch.conf will be automatically generated:

[root@ospctrl1 Documents]# bandersnatch
2017-06-28 16:05:11,761 WARNING: Config file '/etc/bandersnatch.conf' missing, creating default config.
2017-06-28 16:05:11,761 WARNING: Please review the config file, then run 'bandersnatch' again.



In this file you will specify the path into which you will download PyPI packages. I set my download path as follows:

[mirror]
; The directory where the mirror data will be stored.
directory = /MIRROR/pypi

By default, a logfile is specified, but bandersnatch will crash with the error ConfigParser.NoSectionError: No section: 'formatters' if the logfile does not exist. I simply commented out the line to fix this problem:

;log-config = /etc/bandersnatch-log.conf

You can see a sample bandersnatch.conf file at this pastebin:
http://paste.openstack.org/show/586323/

Start the mirroring process

You can start mirroring PyPI with the following command:

sudo bandersnatch mirror

As of Oct. 2016, PyPI takes up 380 GB on disk. Although mirroring requires just two steps, you are not done yet. You still have to make the packages available over your network and you must setup pip clients to use your local mirror.

Make the local mirror available over the network

Once you have mirrored pip packages, the next step is to setup an NFS share to the path where PyPI packages are mirrored or serve the mirror over HTTP using a webserver like Apache, nginx, darkhttpd, etc. I chose to go with darkhttpd because it doesn't require any setup whatsoever. You simply invoke darkhttpd in the path you wish to share over http. If you invoke it as sudo, it will serve files over port 80, but if you invoke it as the regular user, it will serve files over port 8080. You can specify the TCP port used with the --port option.

sudo darkhttpd /MIRROR

Now the Document Root for the webserver is /MIRROR under which the pypi folder is located. I generally place all my local mirrors (i.e. Ubuntu 16.04 repo, CentOS 7 repo, RHEL repo, etc) under the Document Root so that they are all available over http.

Assuming your local mirror machine has the IP address 10.10.10.5, if you navigate to http://10.10.10.5 in your browser (or with cURL) you should be able to see the pypi folder.

Customize pip clients to use local PyPI mirror

On client machines, create the directory ~/.pip and then create ~/.pip/pip.conf (if it doesn't exist already). Now add the following:

[global]
index-url = http://10.10.10.5/pypi/web/simple
trusted-host = 10.10.10.5

This of course assumes that clients are on the 10.10.10.x subnet and that your PyPI mirror is located on 10.10.10.5

In addition, the web server's document root should be /MIRROR as detailed in the previous section.