Thursday, March 29, 2012

Package Predicament Part 3: Python, the final installment

[WARNING: Outdated Material] This post is over a year old, and consequently the ideas, opinions and facts may no longer be relevant or accurate (assuming there were actually facts in this post). Please proceed with caution. You have been warned.

So it turns out that there is a long history between Python and Linux (*). Debian has an official Python Package Policy. This is because Python is an integral part of the distro: system tools and libraries depend on the distro's Python and its Python packages. If you go moving them around, you will break your Linux installation.

One of the main differences you will see is that the /lib/site-packages folder is renamed /lib/dist-packages on Linux (* just Ubuntu, see CORRECTION), and that /usr/share/pyshared and /usr/lib/pymodules exist on Linux as well. I'm not going to pretend I understand everything that's going on, but in a nutshell these are Python modules that Linux is using, for something.
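If you want to see which layout your own interpreter uses, something like the following lists the package folders on its search path. This is only a minimal sketch: `package_dirs` is a name I made up for illustration, and it does nothing beyond filtering `sys.path` by folder name.

```python
import os
import sys


def package_dirs(paths=None):
    """Return the entries of sys.path (or `paths`) whose last
    component is site-packages or dist-packages."""
    if paths is None:
        paths = sys.path
    wanted = ("site-packages", "dist-packages")
    # Strip trailing separators so basename() sees the folder name.
    return [p for p in paths if os.path.basename(p.rstrip("/\\")) in wanted]


for directory in package_dirs():
    print(directory)
```

Run it once with the system Python and once inside a virtualenv and compare the two lists.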

Lucky for you, distribute, setuptools, distutils and pip are tailored for Ubuntu so that they will install packages in the correct locations, and also know where to find them. So in general this supports the argument that you should get your Python packages from the distro repo, especially pip, distribute, setuptools, distutils, and virtualenv. If you install these from your distro repo, you should be OK (**). And above all, "for the love of Guido," use pip, never easy_install!

[CORRECTION: 2012-04-11] (*) After several adventures with VMs (Fedora 16, OpenSUSE 12 and even FreeBSD), I have done some experimentation, and some of these Python peculiarities exist only on Ubuntu/Debian distros. For example, there is no dist-packages folder on either OpenSUSE or Fedora. They both use the traditional Python file structure and naming conventions, such as site-packages. On Fedora it looks like your packages will go in /usr/lib/python2.7/site-packages, whether you install them with $ sudo yum install python-package or $ sudo pip-python install package. Note: on Fedora pip is pip-python, unless you are using it from a virtualenv, in which case it's just pip. There's nothing in the /usr/local/lib folder on Fedora remotely related to Python, nor in /usr/local/share. I didn't get a chance to look for pyshared or pymodules, which are both in /usr/lib on Ubuntu 11.10, but I did notice that there is a lib-dynload in /python2.7.
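Since pip's name and install location depend on whether a virtualenv is active, it can help to ask the interpreter directly. A minimal sketch, with `in_virtualenv` being a name I made up: older virtualenv releases set `sys.real_prefix`, while the standard-library `venv` module and newer virtualenv releases leave `sys.base_prefix` pointing at the system Python.

```python
import sys


def in_virtualenv():
    """Best-effort guess at whether this interpreter runs inside a virtualenv."""
    # Older virtualenv releases add sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.prefix differ from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtualenv active:", in_virtualenv())
print("packages install under:", sys.prefix)
```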

So what does this all mean? Well, I tried to install numpy with pip in a virtualenv on Fedora, and it still failed miserably (***), see Package Predicament, Part 2. I've researched it a little; there are some old bug reports and several SO questions, but no good answer. I did not try to install it in the base system, but I believe it would fail there, just like it did on my Oneiric Ocelot. I think it's a problem with pip, not virtualenv, and I guess with the Numpy package or egg. BTW: it also fails on Windows, no surprise; where's my libgcc? Sorry, I don't have the complete MinGW installation, only MsysGit. But of course I can install it just fine using the *.exe all-in-one installer from the Numpy/Scipy website.

[UPDATE: 2013-01-16] (***) Duh, I needed the dev packages, _obviously_. Do yum/apt-get/zypper install libatlas-dev, ... and make sure you have gfortran. Numpy, SciPy and Matplotlib all build fine with pip in a virtualenv, once you have the proper dev files. Now Windows is an entirely different story, but it can also be done.
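Before kicking off a long build, you can at least check that the compilers are on your PATH. This is only a sketch under a couple of assumptions: the default tool list is my guess at what a source build wants (gfortran comes from the note above, gcc is my addition), `find_build_tools` is a made-up name, and the check says nothing about whether the ATLAS dev files are installed. `shutil.which` needs Python 3.3 or newer.

```python
import shutil


def find_build_tools(tools=("gcc", "gfortran")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_build_tools().items():
    print("%-10s %s" % (tool, path or "MISSING"))
```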

[UPDATE: 2012-04-17] (**) You can seriously f*** up your sh** if you start messing around with your distro's version of distribute, distutils or setuptools. For example, Unity's Software Center depends on these to install packages. If you screw up your version of setuptools, Software Center won't even open. My advice is: (1) Use virtualenv for any package that you need in a version that differs from your repo's. For example, python-requests is version 0.5 in Oneiric Ocelot, but the newest version of Requests is (as of now) 0.11.1, so you should create a virtualenv and install it there. (2) Only pip-install Python packages that are pure Python and that are not in your repo, and do not let them install dependencies that are already installed by your repo. For example, Requests now requires chardet >=1.0.0, and because Ubuntu's version of chardet is oddly named 2.0.1-2, pip replaces it with the exact same code. Probably fine, but not smooth. (3) If your desired package has dependencies that already exist in the repo, then use virtualenv and install them there. (4) For packages that require compiled code, such as Numpy, put a symlink to them in your virtualenv site-packages folder.
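Point (4) can be sketched in a few lines. Everything named here is hypothetical: `link_into_venv` is a helper I made up, and any paths you pass it are yours to verify, so check where your distro and your virtualenv actually keep their packages before running anything like it.

```python
import os


def link_into_venv(package_dir, venv_site_packages):
    """Symlink an installed package folder into a virtualenv's site-packages.

    Returns the path of the new link. Raises FileExistsError if the
    virtualenv already has an entry with the same name.
    """
    package_dir = os.path.abspath(package_dir)
    link = os.path.join(venv_site_packages, os.path.basename(package_dir))
    # lexists() also catches a dangling symlink left over from an earlier try.
    if os.path.lexists(link):
        raise FileExistsError(link)
    # target_is_directory only matters on Windows; it is ignored elsewhere.
    os.symlink(package_dir, link, target_is_directory=True)
    return link
```

A call might look like `link_into_venv("/usr/lib/python2.7/dist-packages/numpy", "/path/to/venv/lib/python2.7/site-packages")`, where both paths are placeholders. Refusing to overwrite an existing entry is a deliberate choice, so a copy that pip already installed in the virtualenv is never silently shadowed.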