Wednesday, August 22, 2012

Use deep copy to make cheap copies of objects

MATLAB

There is a feature in MATLAB called "cheap copies" that makes expanding an array of different objects of the same class a snap. By different objects I mean that, for handle class objects, each element references its own unique object, so a change to one element's properties affects only that element. If all of the elements were really references to the exact same object, the change would show up in all of them at once.

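For example, assume you have the following MATLAB handle class.
classdef MyClass < handle
    properties
        a
    end
    methods
        function obj = MyClass(a)
            % MATLAB calls the constructor with no arguments to fill
            % the unassigned elements of an object array
            if nargin > 0
                obj.a = a;
            end
        end
    end
end
Then ...
myClassArray(3) = MyClass(5)
...creates an array of three unique MyClass objects. Only the third is built from MyClass(5); MATLAB fills elements 1 and 2 by calling the constructor with no arguments, so their a is empty.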

Python

In Python the way to do this is with deepcopy from the copy module in the standard library.

For example assume you have the following Python class.
from copy import deepcopy

class MyClass(object):
    def __init__(self, a=5):
        self.a = a

# make 3 copies, but all point to the same instance

myClassArray = [MyClass(a=5)] * 3

# now make deep copies of each object in the array except the first
# one, so that they are now all unique instances

myClassArray[1:] = [deepcopy(dummy) for dummy in myClassArray[1:]]
Enjoy! For objects that are computationally expensive to construct, this is a fast way to build a bunch of identical but unique "copies" of objects of the same class.
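To see the difference between shared references and deep copies, here is a minimal sketch that reuses the MyClass example above. The variable names shared, unique, shared_values and unique_values are only for illustration.

```python
from copy import deepcopy


class MyClass(object):
    def __init__(self, a=5):
        self.a = a


# list multiplication repeats the same reference three times
shared = [MyClass(a=5)] * 3
shared[0].a = 7
shared_values = [obj.a for obj in shared]  # the change shows up in every slot

# deep copies are independent instances with the same starting state
unique = [MyClass(a=5)] * 3
unique[1:] = [deepcopy(obj) for obj in unique[1:]]
unique[0].a = 7
unique_values = [obj.a for obj in unique]  # only the first slot changes

print(shared_values)  # [7, 7, 7]
print(unique_values)  # [7, 5, 5]
```

Changing the first element of the shared list changes all three, because there is only one object. After the deepcopy step each slot holds its own instance.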

Monday, August 20, 2012

Python Primer

[UPDATED 2018-11-14] recommend Python 3, cross out old recommendation, link to Kenneth Reitz's recommendation and Python-2 countdown
[UPDATED 2016-08-04] add links to online training and references under learning curve
[UPDATED 2015-07-10] add links to Carl Kleffner's Anaconda/Binstar Repo
[UPDATED 2015-05-18] add Mac OSX section and site.USER_BASE usage
[UPDATED 2015-04-04] add tl;dr for Python-2.7.9-x64_scidata.zip
[UPDATED 2015-04-04] pip and setuptools are part of official Python since 2.7.9.
[UPDATED 2014-04-18] add anaconda distro
[UPDATED 2013-08-01] Setuptools has been updated and has merged with Distribute. 
[UPDATED 2013-04-18] I've added some better tutorial resources and the new Enthought Canopy distribution. I also removed the link URLs, since they are embedded in the text, and rearranged the heading styles, since they were unreadable (yuk!). Also I couldn't resist linking the xkcd antigravity post.

TL;DR

Can't be bothered? For any platform (Windows, Mac OSX and Linux) just download and install Anaconda Python by Continuum Analytics. No admin rights required! Start by opening the Anaconda Launcher to start Spyder, a full-featured editor. Anaconda already includes most of the packages you'll need for science, engineering, math and data analysis. That's it. Enjoy!

Getting Started

I put this together for my coworkers, but it could be applicable to anyone. Python is really easy to learn. Got Mac OSX? Then you already have Python-2.7.6. Got Windows? Just download the installer for your system. Some people prefer to install a Python distribution, which has stable versions of the most popular Python packages bundled together in a single installer, often together with a package manager and an integrated development environment (IDE). Official Python, Continuum Analytics Anaconda, WinPython, Enthought Canopy and ActivePython Community Edition offer installers for both 32-bit and 64-bit versions of Python on Windows and Macintosh. If you have a 32-bit computer you must use the 32-bit version; if you have a 64-bit computer, you can use either. Using a 64-bit version allows you to access more than 2GB of RAM, but you may encounter some hurdles building packages that include extensions. The distributions have already compiled the packages for your version, but if you go with the official Python, then you will probably find yourself at Christoph Gohlke's website. Mac OSX actually already has Python 2.X built in. IMHO do not install official Python on a Mac. Either use the Python version already installed or use Homebrew to install the desired version.

Python 2 or 3?

Old recommendation, now crossed out: Only download 2.7.x, not 3.2.x, since no one uses Python 3 yet. Everything is Python 2. Some exaggeration here, but you get the point.

Python-2.7 will not be maintained past 2020. Please follow Kenneth Reitz's recommendations and use Python-3.6 for all new code.

Windows x64

Using a 64-bit version of Python will let you access more than 2GB of data which may be necessary for many large scientific or engineering analyses. Many packages are distributed for Windows-x64 as "wheels", but for those that are not, you can find them on Christoph Gohlke’s Python extensions page. Package distributions for Windows x64 often are marked as win_amd64. If a package is not distributed as wheel anywhere, then, in order for pip to build any C/C++ extensions in the package from source, you will need to install the Microsoft Visual C++ Compiler for Python-2.7, which is free and does not require administrative rights. See the sections on installing packages using pip and wheels below for more information.

The exceptions to this rule are packages like NumPy and SciPy that require compiling FORTRAN libraries like BLAS and LAPACK. These libraries are part of the Intel Math Kernel Library (MKL), which is what Professor Gohlke uses, but there are several open source versions such as OpenBLAS, GotoBLAS and ATLAS. Unfortunately, you will not be able to install/compile these packages from source using pip and the MS VC compiler for Python-2.7. You must use binary wheels, either from Professor Gohlke's Python Extensions website or from Carl Kleffner's Anaconda/Binstar PyPI Repo or his BitBucket downloads.
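Before picking a wheel, it helps to know whether the interpreter you are running is a 32-bit or 64-bit build. A minimal sketch using only the standard library follows; the variable names are just for illustration.

```python
import struct
import sys
import sysconfig

# size of a C pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds
is_64bit = sys.maxsize > 2**32

# platform string of this build, e.g. "win-amd64" on 64-bit Windows
platform_tag = sysconfig.get_platform()

print(pointer_bits, is_64bit, platform_tag)
```

Note that this reports the build of Python itself, not of the operating system: a 32-bit Python on 64-bit Windows still needs win32 wheels.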

Mac OSX

Mac OSX already has Python-2.X installed. Mavericks and Yosemite both have Python-2.7, whereas older versions like Snow Leopard and Lion have Python-2.6. Mac OSX Python also has NumPy and SciPy already installed, although they might be quite old. Luckily, there is considerable support for Mac. For example there are precompiled NumPy wheels on PyPI and precompiled SciPy wheels as well. Mac OSX Python sets site.USER_BASE to ~/Library/Python/2.X, a folder owned by you in your own profile, which allows you to install modules and packages without sudo using the --user scheme. First install pip from PyPI and add it to your path by running the following from a terminal ...

~ $ cd Downloads # into Downloads folder
~/Downloads $ curl -Ok https://pypi.python.org/packages/source/p/pip/pip-6.1.1.tar.gz # download pip package
~/Downloads $ tar -xzf pip-6.1.1.tar.gz # unpack the source
~/Downloads $ cd pip-6.1.1 # into the unpacked folder that contains setup.py
~/Downloads/pip-6.1.1 $ python setup.py install --user # install into site.USER_BASE/lib/python/site-packages
~/Downloads/pip-6.1.1 $ cd # back to $HOME
~ $ echo "# add PYTHONUSERBASE to path" >> .bash_profile
~ $ echo 'export PATH=~/Library/Python/2.7/bin:${PATH}' >> .bash_profile # use single quotes to delay variable expansion

... now test pip by opening a new terminal and using pip to update setuptools ...

~ $ pip install --user -U setuptools

... so that packages can be installed into your local Python library with pip using the --user option as well. You don't need to add it to your PYTHONPATH; Python adds the user site-packages folder under PYTHONUSERBASE to its path by default. You will probably need to use Homebrew to install Qt, but for everything else pip will work fine. You will need to get Xcode from the Apple Developer Program to compile extensions. It should be free. Finally, mark my words: beware of sudo! Don't ever use it on Mac OSX.
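To check where the --user scheme will put things on your machine, you can ask the site module directly. This is a small sketch; the paths it prints depend on your platform and Python version.

```python
import site

# base folder of the per-user installation scheme (site.USER_BASE)
user_base = site.getuserbase()

# per-user site-packages folder that pip install --user writes to (site.USER_SITE)
user_site = site.getusersitepackages()

print(user_base)
print(user_site)
# True when the user site-packages folder is added to sys.path at startup
print(site.ENABLE_USER_SITE)
```

Scripts installed with --user go into a bin folder under the user base, which is why that folder is added to PATH in the steps above.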

Learning Curve

Start by going through the official tutorial. It has some boring parts but it gets to the meat pretty quick. After that you will probably find yourself consulting the standard library reference often. Also Google has a great Python reference site. There are several super fun interactive sites like Philip Guo's Python Tutor and the ever popular Codecademy. For a more comprehensive treatment, there are two great online books: the first is Learn Python the Hard Way and the other is Kenneth Reitz's Python Guide. SciPy has a NumPy for MATLAB users primer in their wiki. For scientific computing, you can view/download the entire Primer on Scientific Programming with Python by Hans Petter Langtangen. Rockstar Fredrik Lundh provides great tips on effbot. Another rockstar, Doug Hellmann, is also a resource. There are also about a million blogs on Python.

Here's a list of links to mostly free online Python tutorials in no particular order:

Getting Set Up

There are individual references for each Python package (what Python toolboxes are called). You can find links to the references and download the packages from PyPI (aka "the Cheese Shop") or from their individual sites, often on SourceForge.net or GitHub. Note that pip will automatically download packages from the Cheese Shop first unless you specify a file. The key packages you must have to get started are in the following sections.

Update Setuptools and Pip

Use pip from a command prompt to install new packages. Since Python-2.7.9, pip and setuptools, a dependency, are bundled with official Python but you need to update them to the newest versions.

  1. Install Python first and make sure you select the option to add python.exe and the Scripts folder to your PATH environment variable.
  2. Open a Windows Command Prompt and type pip install -U setuptools to update setuptools to the latest version.
  3. To update pip on Windows you must use python.exe -m pip install -U pip. If you try to update pip using the pip script, it will fail because Windows is using the script itself and will deny access to it.
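The same "run pip as a module" trick from step 3 works from inside Python, which also guarantees that pip installs into the interpreter you are actually running. A minimal sketch follows; pip_command and run_pip are hypothetical helper names, not part of pip.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a command line that runs pip as a module of this interpreter."""
    return [sys.executable, "-m", "pip"] + list(args)


def run_pip(*args):
    """Run pip and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(pip_command(*args))


# the equivalent of: python.exe -m pip install -U pip
upgrade_pip = pip_command("install", "-U", "pip")
print(" ".join(upgrade_pip))
```

Calling run_pip("install", "-U", "pip") would actually perform the upgrade; the sketch only prints the command so that nothing is installed by accident.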

Installing Packages

There are some interesting, perhaps updated sections on Installing Python Modules in the official Python docs. Most modules are pure Python and can be simply installed using pip like this:

user@computer ~ $ pip install wheel

Usually pip finds the package at the Cheese Shop, downloads the package and its dependencies, and then installs them for you. If there are any issues, pip is nice enough to roll everything back, so using pip is the safest way to install packages. Setuptools also manages packages with the easy_install command, but I really don't recommend using it because it isn't nearly as nice as pip. Some packages are distributed as "wheels" because they have platform dependent binaries. You can also install these using pip after downloading the wheel. EG: Say you download NumPy-MKL from UC Irvine Professor Christoph Gohlke's Python Extension Packages website; use the following to install it.

user@computer ~/downloads $ pip install numpy-1.9.2+mkl-cp27-none-win_amd64.whl
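The wheel filename itself tells you whether the file matches your Python: it is a dash-separated list of name, version, Python tag, ABI tag and platform tag. Here is a rough sketch that pulls those apart; parse_wheel_filename is a hypothetical helper written for this example, not a pip or wheel API.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    # an optional build tag may sit between the version and the Python tag,
    # so the last three fields are taken from the end
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


tags = parse_wheel_filename("numpy-1.9.2+mkl-cp27-none-win_amd64.whl")
print(tags["python"], tags["platform"])  # cp27 win_amd64
```

Here cp27 means CPython 2.7 and win_amd64 means 64-bit Windows, so this wheel only installs into a 64-bit Python-2.7 on Windows.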

Many packages are simply distributed as a tarball or zip-file; pip installs these too, exactly the same way it installs wheels, although you may need the Microsoft Visual C++ Compiler for Python-2.7 for Windows or XCode for Macintosh if the source has any C/C++ extensions, which will be automatically compiled by pip. Very rarely you may need to edit setup configuration to point to shared libraries, which is well beyond the scope of this primer.

Wheels & eggs

Binary distributions of packages can be distributed as "wheels", "eggs" and Windows installers. You install a wheel using pip, an egg using easy_install, and a Windows installer by double clicking it. Wheels, although the newest type of package distribution, are definitely the preferred method. You don't have to install the wheel package to use wheels, but the wheel package can convert other distributions to wheels, so it might be useful.

user@computer ~ $ pip install wheel

EG: Say you download pywin32-219.win-amd64-py2.7.exe from SourceForge then you can convert it to a wheel like this.

user@computer ~/downloads $ wheel convert pywin32-219.win-amd64-py2.7.exe

Numpy/Scipy

Mathematics and scientific libraries essential for numerical computing, engineering analysis and mathematics and scientific research. As discussed in the Windows x64 section, these packages use FORTRAN libraries and can't be compiled by the MS VC Compiler for Python-2.7, so you must install binaries, either wheels or bdist_wininst distributions. Download individual binary installers from SourceForge for Windows (x86) and Mac OS X. For Windows (x64) download Numpy MKL on Christoph Gohlke's site and Scipy on Christoph Gohlke's site or from Carl Kleffner's Anaconda/Binstar PyPI repo.

pandas

Statistics and data analysis. Think Excel or R. Downloads for all platforms.

matplotlib

Matplotlib is a plotting library that makes beautiful graphics and will be very familiar to MATLAB users. There are downloads for Windows (x86), Windows (x64) and Mac OS.

IPython

IPython is about as slick as an interpreter gets, with color coding, tab completion and lots of magic. It also can create notebooks using Tornado, and work in a terminal or a custom Qt shell. It requires Setuptools, PyZMQ, PyQt and Tornado (and on Windows also requires Pyreadline and PyWin32). Download installers from GitHub for Windows (x86), Windows (x64) and Mac OS X.

Dependencies:

Python Distributions

Also if you can’t be bothered to hunt down and install these separately, then you can install a distro. There are several major ones; here are the scientific/engineering/math ones. They vary in size, packages included, and level of customization. I don’t particularly recommend them. Although the convenience may get you over the initial barrier to getting started, Python is so easy to use and learn that you will quickly be hampered by the limits imposed by the distros. Also you may find the user experience burdensome EG: having to start a launcher to access your Python apps will quickly become annoying. Also there may be bugs in user supported software that are not quickly addressed, because the distro's community is just a small subset of the much larger Python community that addresses issues in the most frequently used Python source. In particular I have found Python (x,y) to be particularly buggy. Finally, any proprietary software limited to a distribution may limit deployment to other users, akin to trying to deploy a MATLAB app. You have been warned.

Portable Python

WinPython and Portable Python are Python environments that run from a USB stick or CD/DVD. These are very cool and can aid in deploying Python apps as standalone applications!

Development Environments

A good development environment (IDE) will have syntax highlighting, autocompletion and debugging built in.

  • Eclipse + Pydev is the best in my opinion. It comes with debugging, a console, git built in, autocomplete and indent. It is very similar to Visual Studio, but in some ways way better. Eclipse is a generic IDE that can be used for nearly every programming language.
  • Spyder is a very nice Python specific IDE that will be very familiar to MATLAB users.  It has a built-in console, variable viewer/editor, help/documentation viewer and debugger! It comes stand alone or bundled with Python (x,y). The developers are very active and responsive.
  • PyCharm by JetBrains is a great IDE. I've tried it, and it's as good as or better than Eclipse. In fact many people prefer it, since it's much smaller, focusing only on Python. JetBrains is also responsible for IntelliJ IDEA, the Java IDE that is super popular.
  • Sublime Text 2 is a pretty good lightweight editor with a Python plugin API; the trial is free, but getting rid of the occasional popup costs $60. This has become my personal favorite! It also does syntax highlighting for most major computer languages, and is extensible through plugins that can be managed easily with Package Control.
  • Notepad++ also has syntax highlighting for python as well as other languages.
  • Vim, a very lightweight but extremely popular terminal editor, also has syntax highlighting for Python and other languages, but it can be tricky to learn. Generally used from a terminal and standard in most POSIX environments, like MsysGit.
  • Geany is a lightweight multipurpose IDE.
  • Aptana Studio 3 is a preconfigured variant of Eclipse + Pydev (I haven’t tried it).
  • Atom is a free editor from GitHub. It's available for all platforms. It is similar to Sublime Text 2/3, but it's free, more recently maintained and has many features.
  • Ninja-IDE, Ninja-IDE Is Not Just Another IDE, is the latest free newcomer.
  • Python Tools for Visual Studio (PTVS), is a mature free extension for Visual Studio 2013, 2012 and 2010. It works with both community edition and desktop express.
  • IDLE, part of standard Python, is a basic IDE with syntax highlighting, autocompletion, debugging and more.
  • LightTable is a free editor with syntax highlighting, some autocompletion, plugin management and code evaluation. It looks promising and is the framework for Juno, the Julia editor.

Consoles, Terminals and Shells

A console is a place to enter command lines. Think Windows CMD. It can be a convenient place to set up environment variables such as custom paths. I recommend using ConEmu from Maximus5 GitHub Releases. Another excellent, yet older and apparently unmaintained, terminal emulator is Console (v2.00) by Bozho, available at Sourceforge.net.

Git Version Control

Version control is essential when writing computer code. Inevitably you will make mistakes and need to go back, you may need to work with others, you may have new ideas that you want to flesh out, or your computer will crash and take your hard drive with it. These are issues that version control solves. Git has emerged as the go-to version control tool. For Windows, msysgit is conveniently packaged with MSYS, a POSIX environment with many Bash and POSIX tools that complements Python nicely. If you need a graphical client, TortoiseGit will suffice.

Virtual Environments

I can't conscientiously write a Python primer without at least mentioning virtual environments. Occasionally a project may depend on specific Python packages, and due to threats, real or imagined, of backwards incompatibilities, a Python virtual environment is used to freeze the project dependencies. The virtualenv package accomplishes this task by creating a Python sandbox that contains only the specific packages desired. A virtual environment can also be used to test out development distributions, or other untested packages that you don't want to install into your system site-packages folder. In particular, Mac OSX and Linux users may prefer to use virtual environments since they don't require root. There is also a very convenient wrapper called virtualenvwrapper that makes managing virtual environments a cinch. Anaconda users should use conda instead of pip and virtualenv.
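As a rough illustration of what such a sandbox is, here is a sketch that creates one with venv, the Python 3 standard library counterpart of the virtualenv package (so this is not virtualenv itself). The folder name sandbox is arbitrary, and the environment is created in a temporary folder that is deleted at the end.

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "sandbox")  # arbitrary name for this example

    # create the environment without pip to keep the example quick
    venv.create(env_dir, with_pip=False)

    # every environment made by venv has a pyvenv.cfg file at its root
    has_config = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    contents = sorted(os.listdir(env_dir))

print(has_config)
print(contents)
```

In real use you would create the environment in your project folder, activate it from a shell, and then use pip inside it so that packages land in the sandbox instead of your system site-packages.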

Thursday, August 16, 2012

Cygwin, MinGW, MSYS, GnuWin32, GNU Utilities for Win32 and now MinGW-w64

Introduction

GNU on Windows has a sordid past. Not really, but sordid sounds so much more interesting than complicated. So here's how I understand it, without any emotion, this is really just a navigational exercise.

Disclaimer

What was just said and what I am about to say is entirely opinion and not based on fact. I hope that I do not offend anyone, but I am sure that I inevitably will, as much of my opinion is based on hearsay. In that case I apologize, and please remember that the purpose of this blog is purely selfish; it is a note to myself to remind me of what took me so long to finally understand. So instead of getting angry with me, perhaps post a comment and disabuse me of my still as yet incomplete knowledge.

Cygwin

In the beginning there was Cygwin. Their tagline is "Get that Linux feeling - on Windows!" This does a lot to explain exactly what Cygwin is: Linux emulated on Windows. What this means in practice is that Cygwin applications use cygwin1.dll, a common runtime that must be linked to any application that runs on Cygwin, and consequently to any application compiled with Cygwin gcc. So essentially, when you use Cygwin you are more or less stuck in Cygwin. True, you could distribute cygwin1.dll with your application, so it would appear to be native, but it wouldn't be truly native. By native here I mean that it runs on one of the Windows runtimes like the win32api or msvcrt. The Windows binary of GNU nano is an example of an application compiled using Cygwin that requires cygwin1.dll to run. The probable downside of putting this layer between the Windows API and your application is slower speed and some limits in its features.

MinGW

From Cygwin was born MinGW. I don't know if this is actually true, but that is what I've heard. Either way it doesn't really matter. The essential thing to realize here is this.
MinGW creates native applications for Windows using native libraries.
This means exactly what it sounds like: applications compiled using MinGW's compilers (gcc, g++, gfortran, etc.) will run on any Windows machine natively without any runtime other than the Windows API or msvcrt. Now it might be fun to speculate about some internal strife or a passionate drive of a group of individuals with some bold ideal, but that's irrelevant to this fundamental difference between Cygwin and MinGW.

MSYS

MSYS may seem like a variation on Cygwin, except it is really meant to be used as an environment for running shell scripts, similar to bash (the Bourne Again SHell). It's true that if you compile an application using the MSYS version of gcc, then you will need to link it to the msys-1.0.dll, so in a sense it is the same as Cygwin. The difference is really in a state of mind. MSYS is presumably part of MinGW, which stands for Minimalist GNU for Windows. What that translates to is that MinGW and MSYS contain only the most essential tools required for developing native applications for Windows using the GNU open source suite of compilers, whereas Cygwin aims to provide Windows users every application available to Linux. In Cygwin you will find Python, GTK, nano, Ruby, Git, even some games I think, whereas in MinGW you will only find libraries most often used in common code. However, there is a huge chunk of open source code that is compiled using autotools and depends on shell scripting, which I think is where MSYS gets involved. MSYS makes it easier to do this. True, autotools, make and shell script functions have been ported as native win32 applications, but since they are only used as temporary tools to get to the finished product (a natively compiled win32 application) and the results don't depend on any of those temporary tools, why go through the effort of duplicating what Cygwin has already done so nicely?

GnuWin32 and GNU Utilities for Windows

If you were just reading the section above, then continuing on, GnuWin32 and the now defunct GNU Utilities for Windows seem to do precisely that. Both of these generally utilize MinGW to port GNU applications that were native to Linux or Cygwin to run natively on Windows. And in fact many open source projects now try to configure options for some Windows compiler, be it MinGW, MSVC or BCC (really, I haven't seen this too often although apparently it is still out there, wow they even have Delphi!). Some projects are opting for CMake, an alternative to Autotools. OK, so the key distinction here is that these are both compiled using MinGW and are basically an alternative to MSYS, if you prefer to continue to use the native Windows CMD console or something like Powershell.

MinGW-w64

MinGW-w64 is a fork of MinGW that allows you to cross compile code from one machine to another. For example, say you want to make 64-bit code, but you are on a 32-bit machine or a Linux box, or using Cygwin, then you would use MinGW-w64. The "w64" comes from its ability to compile 64-bit code, which is not currently available (AFAIK) with MinGW (sometimes called mingw32). There are alternate distributions of both MinGW and MinGW-w64 provided by TDM. Also there are other cross-compilers, such as mxe.cc, but it is only for *nix (Linux/Unix). Coincidentally MinGW and MinGW-w64 are both available as cross-compilers in most Linux distro repositories. Ubuntu, Fedora and Suse all have them. MacPorts, Fink and Homebrew may have them too. Same probably goes for mxe.cc. And who knows, there are probably more cross-compilers out there as well.

Epilogue

Ah. So glad to get that all down, finally. So the moral of the story is this.
If you want to use native Windows apps compiled using GNU gcc, use MinGW, but do not pollute your toolchain with any MSYS or Cygwin libraries or you will be sorry.

Saturday, August 11, 2012

Building numpy, scipy, matplotlib and PIL in virtualenv on both Windows and Linux

This really should be split into several posts. And I want to send a patch to matplotlib to make it easier to build on Windows.
  1. You need several dev files, most of which come standard on Linux and with MinGW. Still, for PIL you need tk.h and tcl.h, zlib, libjpeg and ... Note that on Windows you need the same version of Tcl/Tk as in Python, which is probably 8.5.2. Look at `init.tcl` in `C:\Python27\tcl\tcl8.5` and it will tell you exactly which version is required.
  2. On Linux, look for the corresponding dev packages, which may not be installed. On Ubuntu they are in the Ubuntu Software Center. In particular you will want the ATLAS, BLAS and LAPACK dev packages for Numpy and Scipy.
  3. On Windows, in some cases, I untarred/unzipped the downloads to edit the setup.py or other relevant file, then zipped it back up and used pip. A few packages took command line options, which I passed through pip using --install-option=" --my-option='blah blah blah' " as an example.
  4. I added a pydistutils.cfg file to my home directory with [build] compiler=mingw32, which worked well for pip except with pyzmq, which I had to edit because it makes its own compiler. So I edited it to make a mingw32 compiler, instead of the default which on Windows is msvcr90. In general I had to edit cygwinccompiler in distutils to remove the "-mno-cygwin" gcc option for the Mingw32CCompiler class, which is no longer accepted by most gcc versions.
  5. On Linux this was all very easy, as long as I had the correct dev packages, but be prepared: on Windows this took a lot of work.
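For reference, the pydistutils.cfg from step 4 is a plain INI file. The sketch below builds its contents with configparser from the Python 3 standard library and keeps them in a string instead of writing to your home directory. Note that the file only matters to the old distutils build machinery, which newer Python versions no longer ship.

```python
import configparser
import io

# the [build] section tells distutils which compiler class to use
config = configparser.ConfigParser()
config["build"] = {"compiler": "mingw32"}

# write to an in-memory buffer rather than to the real pydistutils.cfg
buffer = io.StringIO()
config.write(buffer)
cfg_text = buffer.getvalue()

print(cfg_text)
```

Saving that text as pydistutils.cfg in your home directory is what makes pip and setup.py pick MinGW by default on a distutils-based Python such as 2.7.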
[POSTSCRIPT 2014-08-01]
This post is seriously out of date, and it was a bit vague to begin with. I don't remember what steps I took to get these tools running on Windows anymore, and a lot has changed. Most significantly, I have switched to 64-bit, I have found that more Python packages can be built with Microsoft compilers, and OpenBLAS has been released, which greatly simplifies building these packages as BLAS and LAPACK are already compiled and optimized.
  • Consider using Windows SDK-7 for Python-2.7 after fixing vcvarsall.bat instead of mingw32. Windows SDK-7 replaces VC90 (aka V90) and comes with both 32-bit and 64-bit compilers.
  • MinGW only builds 32-bit applications, so an alternative I started using recently is Win-Builds, which uses mingw-w64 compilers that can cross compile both 32-bit and 64-bit applications. Win-Builds works in MSYS, Cygwin or the native CMD shells, although I don't know how you would use autotools in a Windows CMD shell.
  • Unfortunately win-builds doesn't have gfortran, so either stick with mingw-w64 or build it, ugh!
  • Try using one of the new OpenBLAS binaries; maybe everything can be done with VC90 instead of using gcc.