Tuesday, January 28, 2014

setuptools detritus

Before pip 1.5.1, you had to install setuptools prior to pip. Especially after the distribute/setuptools merger, this could leave you with a mishmash of setuptools and distribute folders and egg files, because easy_install and pip install packages differently.

If you used the recommended ez_setup.py provided by setuptools, it would always download a tarball or zipfile into whatever directory ez_setup.py lived in (probably Downloads, but for packages that bundled it, that could be anywhere), never to be deleted and completely undetected by you. And if you never installed pip, and never found the easy way to update your packages, then every time you used easy_install or distutils (and probably pip too, since ez_setup.py specifies the required setuptools version before setup.py runs) to install a package that bundled ez_setup.py, it would leave behind another piece of detritus.

Then mix in the pip way of doing things, which is indifferent to egg files. Updating setuptools with pip (pip install -U setuptools) is a great approach, because once you have cleaned up all of the random .pth and egg leftovers, pip always cleans up its own mess. As mentioned, though, pip won't touch an egg file, which leaves you with two installs of setuptools: the egg file, which (for those wondering) is like a wheel, a zip file, essentially a binary, platform-specific installation; and the pip-style install, an uncompressed folder with Python source code, byte-compiled code and any extensions, similar to what distutils would do if it weren't monkeypatched.
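For anyone fuzzy on the egg format: an egg is, at bottom, a zip archive with Python code inside. A throwaway demonstration, with the archive built in memory purely for illustration (the module name is made up):

```shell
# An .egg is just a zip archive carrying Python packages.
# Build a tiny zip in memory and confirm Python sees it as one.
python3 - <<'EOF'
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    # a minimal "package", the way an egg would carry it
    z.writestr('demo/__init__.py', 'x = 1')

print(zipfile.is_zipfile(io.BytesIO(buf.getvalue())))  # -> True
EOF
```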

Well, no longer. Pip now takes care of everything. So start from scratch: delete all instances of easy_install, setuptools (and distribute, if you still have it) and, sure, what the heck, pip too, everywhere, in scripts (or bin [1]) and lib. Then use get-pip.py to install pip and setuptools at the same time, and periodically update setuptools using pip install -U setuptools. Ah. All better.
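For the record, here's roughly what that cleanup pass looks like as a shell sketch. The filename patterns and the getsitepackages() call are illustrative assumptions, not an exhaustive list; eyeball what the find turns up before deleting anything:

```shell
# Locate the usual setuptools/distribute/pip leftovers in site-packages.
SITE=$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null ||
       python3 -c "import site; print(site.getsitepackages()[0])")
find "$SITE" -maxdepth 1 \
     \( -name 'setuptools*' -o -name 'distribute*' \
        -o -name 'pip*' -o -name 'easy-install.pth' \) -print
# After deleting what turns up (plus the stubs in scripts or bin):
#   python get-pip.py         # installs pip and setuptools together
#   pip install -U setuptools
```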

[1] Note that this is really only a Windows or Mac issue, not Linux, because Python packages are included in each distro's repository; it can still affect virtualenvs anywhere, even on Linux, so it's good to understand in principle. On a Linux share, without root access, you might have this issue with packages installed in .local, where the scripts folder is .local/bin. On a Mac using an official binary installation of Python, you will find your scripts in /Library/Frameworks/Python.framework/Versions/2.7/bin. In a virtualenv, look in .virtualenvs/name-of-venv/bin on Mac/Linux and .virtualenvs\name-of-venv\Scripts on Windows.
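A quick sweep over those script directories might look like this (the paths follow the conventions above; adjust the Python version and venv names to match your setup):

```shell
# List any easy_install/pip stubs lurking in the usual script folders.
for d in ~/.local/bin ~/.virtualenvs/*/bin \
         /Library/Frameworks/Python.framework/Versions/2.7/bin; do
    [ -d "$d" ] && ls "$d" | grep -E '^(easy_install|pip)' || true
done
```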

Trouble updating to virtualenvwrapper 4.2?

I must be the only one, because I haven't found anything about it online, but installing with pip, easy_install or distutils fails, even after cleaning up all of my random setuptools .pth and egg leftovers, unless I install pbr first. The traceback I get is something like "file not found, warning LoadManifest -c could not find standard file". Doug Hellmann's pbr package is supposed to be installed as part of either stevedore or virtualenvwrapper, but something is not working somewhere. No time to troubleshoot, but the workaround is just as easy.
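So, until whatever is broken gets fixed, the workaround amounts to this (package names as they appear on PyPI):

```shell
# Install pbr by hand first; then virtualenvwrapper installs cleanly.
pip install pbr
pip install -U virtualenvwrapper
```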

git subtree merge strategy in your everyday life

Here is yet another post on the super-useful git feature called the subtree merge strategy. Not to be confused with the newish git-subtree command, which uses this strategy but may or may not be ready for general consumption. Imagine a use case where you have a package, for example hg-fast-export, that you want to use in your project. You want to deploy it with your project, so you consider git submodule. An alternative is the subtree merge strategy. There are numerous sites that explain the pros and cons of subtree merge vs. submodules.
~/myproj (master)$ git remote add hg-fast-export https://github.com/frej/fast-export.git
~/myproj (master)$ git fetch hg-fast-export master
~/myproj (master)$ git read-tree --prefix=hg-fast-export/ -u remotes/hg-fast-export/master
~/myproj (master)$ git pull --squash -s subtree --no-commit hg-fast-export master
~/myproj (master)$ git pull --squash -s recursive -Xsubtree=hg-fast-export/ --no-commit hg-fast-export master
~/myproj (master)$ git diff-tree -p remotes/hg-fast-export/master master:hg-fast-export
  1. Add the remote that hosts the sub-package.
  2. Fetch the branch from the remote that you want to use in your project.
  3. Read the remote branch into its own folder tree in the working copy. The --prefix value, with its trailing slash, names the new directory.
  4. Merge updates from the sub-package into the hg-fast-export tree, squashing them into a single commit, but without auto-committing. This gives you a chance to check the merge before committing. The merge prepopulates the message with a summary of the commit messages from the sub-package's remote branch.
  5. Explicitly specify the trees to merge when the subtree strategy can't work them out by itself (an alternative to step 4).
  6. Diff the two trees to check the result.
I think that if you want to push changes from your sub-package back to its remote, you would want to create an orphan branch and push from there. This is covered in the other posts linked to from here.
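I haven't needed this myself, so treat the following as a sketch in throwaway repos: a local "upstream" repo stands in for the real hg-fast-export remote, and the branch names are made up. The key move is branching from the sub-package's own history, not the project's, so the commits you push share ancestry with upstream.

```shell
# Two scratch repos: "upstream" plays the sub-package's remote.
tmp=$(mktemp -d) && cd "$tmp"
git init -q upstream
git -C upstream -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m 'upstream init'
branch=$(git -C upstream symbolic-ref --short HEAD)

git init -q myproj && cd myproj
git -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m 'project init'
git remote add hg-fast-export ../upstream
git fetch -q hg-fast-export

# Branch from the sub-package's history, commit a fix, push it back.
git checkout -q -b fast-export-fixes "hg-fast-export/$branch"
echo fix > somefile && git add somefile
git -c user.email=me@example.com -c user.name=me \
    commit -q -m 'fix for upstream'
git push -q hg-fast-export fast-export-fixes:refs/heads/fix-branch
git -C ../upstream log --oneline fix-branch
```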

Pushing to gh-pages for GitHub Pages is a perfect use of the subtree merge strategy; I provide an example in this SO answer.