Monday, August 20, 2012

Python Primer

[UPDATED 2018-11-14] recommend Python 3, cross out old recommendation, link to Kenneth Reitz's recommendation and Python-2 countdown
[UPDATED 2016-08-04] add links to online training and references under learning curve
[UPDATED 2015-07-10] add links to Carl Kleffner's Anaconda/Binstar Repo
[UPDATED 2015-05-18] add Mac OSX section and site.USER_BASE usage
[UPDATED 2015-04-04] add tl;dr for Python-2.7.9-x64_scidata.zip
[UPDATED 2015-04-04] pip and setuptools are part of official Python since 2.7.9.
[UPDATED 2014-04-18] add anaconda distro
[UPDATED 2013-08-01] Setuptools has been updated and has merged with Distribute. 
[UPDATED 2013-04-18] I've added some better tutorial resources and the new Enthought Canopy distribution. I also removed the link URLs, since they are embedded in the text, and rearranged the heading styles, since they were unreadable (yuk!). Also I couldn't resist linking the xkcd antigravity post.

TL;DR

Can't be bothered? For any platform (Windows, Mac OSX and Linux) just download and install Anaconda Python by Continuum IO. No admin rights required! Start by opening the Anaconda Launcher to start Spyder, a full-featured editor. Anaconda already includes most of the packages you'll need for science, engineering, math and data analysis. That's it. Enjoy!

Getting Started

I put this together for my coworkers, but it could be applicable to anyone. Python is really easy to learn. Got Mac OSX? Then you already have Python-2.7.6. Got Windows? Just download the installer for your system. Some people prefer to install a Python distribution, which has stable versions of the most popular Python packages bundled together in a single installer, often together with a package manager and an integrated development environment (IDE). Official Python, Continuum Analytics Anaconda, WinPython, Enthought Canopy and ActivePython Community Edition offer installers for both 32-bit and 64-bit versions of Python on Windows and Macintosh.

If you have a 32-bit computer you must use the 32-bit version; if you have a 64-bit computer, you can use either. Using a 64-bit version allows you to access more than 2GB of RAM, but you may encounter some hurdles building packages that include extensions. The distributions have already compiled the packages for your version, but if you go with the official Python, then you will probably find yourself at Christoph Gohlke's website looking for prebuilt binaries.

Mac OSX already has Python 2.X built in. IMHO do not install official Python on a Mac. Either use the Python version already installed or use Homebrew to install the desired version.
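If you are not sure whether the Python you ended up with is a 32-bit or a 64-bit build, the interpreter can tell you. Here is a minimal sketch using only the standard library; the helper name interpreter_bits is my own invention, not something Python provides.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on how this Python interpreter was built."""
    # The size of a C pointer ("P") in bytes, times 8, gives the word size.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s is a %d-bit build" % (platform.python_version(), interpreter_bits()))
    # sys.maxsize is another common check: it exceeds 2**32 only on 64-bit builds.
    print("64-bit? %s" % (sys.maxsize > 2**32))
```

Note that this reports the bitness of the interpreter, not of the computer: a 32-bit Python on a 64-bit machine still reports 32.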

Python 2 or 3?

[Old recommendation, now crossed out] Only download 2.7.x, not 3.2.x, since no one uses Python 3 yet. Everything is Python 2. Some exaggeration here, but you get the point.

Python-2.7 will not be maintained past 2020. Please follow Kenneth Reitz's recommendations and use Python-3.6 for all new code.
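If a script needs to know which Python it is running on, a check along these lines works on both Python 2 and 3. It is only a sketch: the (3, 6) default simply mirrors the recommendation above, and the function name is made up for illustration.

```python
import sys


def is_recommended_python(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    # sys.version_info compares like a tuple: (major, minor, micro, ...).
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % sys.version_info[:3])
    if not is_recommended_python():
        print("Consider upgrading: Python 2.7 is not maintained past 2020.")
```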

Windows x64

Using a 64-bit version of Python will let you access more than 2GB of data, which may be necessary for many large scientific or engineering analyses. Many packages are distributed for Windows-x64 as "wheels", but for those that are not, you can find them on Christoph Gohlke’s Python extensions page. Package distributions for Windows x64 are often marked as win_amd64. If a package is not distributed as a wheel anywhere, then, in order for pip to build any C/C++ extensions in the package from source, you will need to install the Microsoft Visual C++ Compiler for Python-2.7, which is free and does not require administrative rights. See the sections on installing packages using pip and wheels below for more information.

The exceptions to this rule are packages like NumPy and SciPy that require compiling FORTRAN libraries like BLAS and LAPACK. These libraries are part of the Intel Math Kernel Library (MKL), which is what Professor Gohlke uses; however, there are several open source versions such as OpenBLAS, GotoBLAS and ATLAS. Unfortunately, you will not be able to install/compile these packages from source using pip and the MS VC compiler for Python-2.7. You must use binary wheels, either from Professor Gohlke's Python Extensions website or from Carl Kleffner's Anaconda/Binstar PyPI Repo or his BitBucket downloads.
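The win_amd64 marker mentioned above is the last piece of a wheel's file name, which follows the dash-separated naming convention from PEP 427. To see which Python and platform a downloaded wheel targets, you can pull the name apart. This is a rough sketch, not how pip does it; parse_wheel_filename and WheelTags are names I made up, and the example file name is the NumPy wheel used later in this primer.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version build python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return WheelTags(name, version, build, python, abi, platform)


if __name__ == "__main__":
    tags = parse_wheel_filename("numpy-1.9.2+mkl-cp27-none-win_amd64.whl")
    print(tags.platform)  # win_amd64
    print(tags.python)    # cp27
```

Here cp27 means CPython 2.7 and win_amd64 means 64-bit Windows, so this wheel only installs into a 64-bit Python-2.7 on Windows.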

Mac OSX

Mac OSX already has Python-2.X installed. Mavericks and Yosemite both have Python-2.7, whereas older versions like Snow Leopard and Lion have Python-2.6. Mac OSX Python also has NumPy and SciPy already installed, although they might be quite old. Luckily, there is considerable support for Mac. For example, there are precompiled NumPy wheels on PyPI and precompiled SciPy wheels as well. Mac OSX Python sets site.USER_BASE to ~/Library/Python/2.X, a folder owned by you in your own profile, which allows you to install modules and packages without sudo using the --user scheme. First install pip from PyPI and add it to your path by running the following from a terminal ...

~ $ cd Downloads # into Downloads folder
~/Downloads $ curl -Ok https://pypi.python.org/packages/source/p/pip/pip-6.1.1.tar.gz # download pip package
~/Downloads $ tar -xzf pip-6.1.1.tar.gz # unpack the source archive
~/Downloads $ cd pip-6.1.1 # into the unpacked folder that contains setup.py
~/Downloads/pip-6.1.1 $ python setup.py install --user # install into site.USER_BASE/lib/python/site-packages
~/Downloads/pip-6.1.1 $ cd # back to $HOME
~ $ echo "# add PYTHONUSERBASE to path" >> .bash_profile
~ $ echo 'export PATH=~/Library/Python/2.7/bin:${PATH}' >> .bash_profile # use single quotes to delay variable expansion

... now test pip by opening a new terminal and using pip to update setuptools ...

~ $ pip install --user -U setuptools # update setuptools in your user site

... so that packages can be installed into your local Python library with pip using the --user option as well. You don't need to add it to your PYTHONPATH; Python adds the user site-packages folder under PYTHONUSERBASE to its search path by default. You will probably need to use Homebrew to install Qt, but for everything else pip will work fine. You will need to get Xcode from the Apple Developer Program to compile extensions. It should be free. Finally, mark my words: beware of sudo! Don't ever use it on Mac OSX.
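To see where the --user scheme actually puts things on your machine, ask the site module. A small sketch using only the standard library; the wrapper function name is mine.

```python
import site
import sys


def user_install_locations():
    """Return (user base, user site-packages) for the running interpreter."""
    # getuserbase() honours the PYTHONUSERBASE environment variable if it is set.
    return site.getuserbase(), site.getusersitepackages()


if __name__ == "__main__":
    base, packages = user_install_locations()
    print("site.USER_BASE: %s" % base)
    print("site.USER_SITE: %s" % packages)
    # The user site is only on sys.path once the folder exists.
    print("user site on sys.path? %s" % (packages in sys.path))
```

Scripts installed with --user land in the bin folder under the user base, which is why that folder was added to PATH above.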

Learning Curve

Start by going through the official tutorial. It has some boring parts but it gets to the meat pretty quick. After that you will probably find yourself consulting the standard library reference often. Also Google has a great Python reference site. There are several super fun interactive sites like Philip Guo's Python Tutor and the ever popular Codecademy. For a more comprehensive treatment, there are two great online books: the first is Learn Python the Hard Way and the other is Kenneth Reitz's Python Guide. SciPy has a NumPy for MATLAB users primer in their wiki. For scientific computing, you can view/download the entire Primer on Scientific Programming with Python by Hans Petter Langtangen. Rockstar Fredrik Lundh provides great tips on effbot. Another rockstar, Doug Hellmann, is also a resource. There are also about a million blogs on Python.

Here's a list of links to mostly free online Python tutorials in no particular order:

Getting Set Up

There are individual references for each Python package (what Python toolboxes are called). You can find links to the references and download the packages from PyPI (aka "the Cheese Shop") or from their individual sites, often on SourceForge.net or GitHub. Note that pip will automatically download packages from the Cheese Shop first unless you specify a file. The key packages you must have to get started are in the following sections.

Update Setuptools and Pip

Use pip from a command prompt to install new packages. Since Python-2.7.9, pip and setuptools, a dependency, are bundled with official Python but you need to update them to the newest versions.

  1. Install Python first and make sure you select the option to add python.exe and the Scripts folder to your PATH environment variable.
  2. Open a Windows Command Prompt and type pip install -U setuptools to update setuptools to the latest version.
  3. To update pip on Windows you must use python.exe -m pip install -U pip. If you try to update pip using the pip script itself, it will fail because Windows denies access to the script while it is in use.
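The same "run pip as a module" trick from the steps above can be used from inside Python, which also guarantees that pip installs into the interpreter you are actually running. A minimal sketch under that assumption; pip_command and upgrade are hypothetical helper names, not part of pip.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a command line that runs pip through the current interpreter."""
    # Equivalent to typing "python -m pip ..." with this exact Python.
    return [sys.executable, "-m", "pip"] + list(args)


def upgrade(package):
    """Upgrade one package; returns pip's exit code (0 means success)."""
    return subprocess.call(pip_command("install", "-U", package))


if __name__ == "__main__":
    # Print the command rather than running it, so nothing changes by accident.
    print(" ".join(pip_command("install", "-U", "pip")))
```

Calling upgrade("setuptools") or upgrade("pip") then performs the updates described in the steps above.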

Installing Packages

There are some interesting, perhaps updated sections on Installing Python Modules in the official Python docs. Most modules are pure Python and can be simply installed using pip like this:

user@computer ~ $ pip install wheel

Usually pip finds the package at the Cheese Shop, downloads the package and its dependencies, and then installs them for you. If there are any issues, pip is nice enough to roll everything back, so using pip is the safest way to install packages. Setuptools also manages packages with the easy_install command, but I really don't recommend using it because it isn't nearly as nice as pip. Some packages are distributed as "wheels" because they have platform dependent binaries. You can also install these using pip after downloading the wheel. EG: Say you download NumPy-MKL from UC Irvine Professor Christoph Gohlke's Python Extension Packages website; use the following to install it.

user@computer ~/downloads $ pip install numpy-1.9.2+mkl-cp27-none-win_amd64.whl

Many packages are simply distributed as a tarball or zip-file; pip installs these too, exactly the same way it installs wheels, although you may need the Microsoft Visual C++ Compiler for Python-2.7 for Windows or XCode for Macintosh if the source has any C/C++ extensions, which will be automatically compiled by pip. Very rarely you may need to edit setup configuration to point to shared libraries, which is well beyond the scope of this primer.
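After installing, it is handy to confirm what pip actually put on your system. Here is one way to do that, assuming Python 3.8 or newer, where importlib.metadata is part of the standard library; installed_version is just an illustrative helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pip", "setuptools", "wheel", "numpy"):
        print(name, installed_version(name) or "not installed")
```

On older Pythons, typing pip list at a command prompt gives the same information.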

Wheels & eggs

Binary distributions of packages can be distributed as "wheels", "eggs" and Windows installers. You can install a wheel using pip, install an egg using easy_install, and just double-click a Windows installer. Wheels, although the newest type of package distribution, are definitely the preferred method. You don't have to install the wheel package to use wheels, but the wheel package can convert other distributions to wheels, so it might be useful.

user@computer ~ $ pip install wheel

EG: Say you download pywin32-219.win-amd64-py2.7.exe from SourceForge then you can convert it to a wheel like this.

user@computer ~/downloads $ wheel convert pywin32-219.win-amd64-py2.7.exe

Numpy/Scipy

Mathematics and scientific libraries essential for numerical computing, engineering analysis and mathematics and scientific research. As discussed in the Windows x64 section, these packages use FORTRAN libraries and can't be compiled by the MS VC compiler for Python-2.7, so you must install binaries, either wheels or bdist_wininst distributions. Download individual binary installers from SourceForge for Windows (x86) and Mac OS X. For Windows (x64), download NumPy MKL and SciPy from Christoph Gohlke's site or from Carl Kleffner's Anaconda/Binstar PyPI repo.

pandas

Statistics and data analysis. Think Excel or R. Downloads for all platforms.

matplotlib

Matplotlib is a plotting library that makes beautiful graphics and will be very familiar to MATLAB users. There are downloads for Windows (x86), Windows (x64) and Mac OS.

IPython

IPython is about as slick as an interpreter gets, with color coding, tab completion and lots of magic. It can also create notebooks using Tornado, and work in a terminal or a custom Qt shell. It requires Setuptools, PyZMQ, PyQt and Tornado, and on Windows it also requires Pyreadline and PyWin32. Download installers from GitHub for Windows (x86), Windows (x64) and Mac OS X.

Dependencies:

Python Distributions

Also, if you can’t be bothered to hunt down and install these separately, then you can install a distro. There are several major ones; here are the scientific/engineering/math ones. They vary in size, packages included, and level of customization. I don’t particularly recommend them. Although the convenience may get you over the initial barrier to getting started, Python is so easy to use and learn that you will quickly be hampered by the limits imposed by the distros. You may also find the user experience burdensome. EG: having to start a launcher to access your Python apps will quickly become annoying. There may also be bugs in the distro-supported software that are not addressed quickly, because a distro's community is just a subset of the much larger Python community that addresses issues in the most frequently used Python source. In particular I have found Python (x,y) to be particularly buggy. Finally, any proprietary software limited to a distribution may limit deployment to other users, akin to trying to deploy a MATLAB app. You have been warned.

Portable Python

WinPython and Portable Python are Python environments that run from a USB stick or CD/DVD. These are very cool and can aid in deploying Python apps as stand alone applications!

Development Environments

A good development environment (IDE) will have syntax highlighting, autocompletion and debugging built in.

  • Eclipse + Pydev is the best in my opinion. It comes with debugging, a console, git built in, autocomplete and indent. It is very similar to Visual Studio, but in some ways way better. Eclipse is a generic IDE that can be used for nearly every programming language.
  • Spyder is a very nice Python specific IDE that will be very familiar to MATLAB users.  It has a built-in console, variable viewer/editor, help/documentation viewer and debugger! It comes stand alone or bundled with Python (x,y). The developers are very active and responsive.
  • PyCharm by JetBrains is a great IDE. I've tried it, and it's as good as or better than Eclipse. In fact many people prefer it, since it's much smaller, focusing only on Python. JetBrains is also responsible for IntelliJ IDEA, the Java IDE that is super popular.
  • Sublime Text 2 is a pretty good lightweight editor that is scriptable in Python through its plugin API. The trial is free, but getting rid of the occasional popup costs $60. This has become my personal favorite! It also does syntax highlighting for most major computer languages, and is extensible through plugins that can be managed easily with Package Control.
  • Notepad++ also has syntax highlighting for python as well as other languages.
  • Vim, a very lightweight but extremely popular terminal editor, also has syntax highlighting for Python and other languages, but it can be tricky to learn. It is generally used from a terminal and is standard in most POSIX environments, like MsysGit.
  • Geany is a lightweight multipurpose IDE.
  • Aptana Studio 3 is a preconfigured variant of eclipse + Pydev (I haven’t tried it)
  • Atom is a free editor from GitHub. It's available for all platforms. It is similar to Sublime Text 2/3, but it's free, more recently maintained and has many features.
  • Ninja-IDE, Ninja-IDE Is Not Just Another IDE, is the latest free newcomer.
  • Python Tools for Visual Studio (PTVS), is a mature free extension for Visual Studio 2013, 2012 and 2010. It works with both community edition and desktop express.
  • IDLE, part of standard Python, is a basic IDE with syntax highlighting, autocompletion, debugging and more.
  • LightTable is a free editor with syntax highlighting, some autocompletion, plugin management and code evaluation. It looks promising and is the framework for Juno, the Julia editor.

Consoles, Terminals and Shells

A console is a place to enter command lines. Think windows CMD. It can be a convenient place to set up environmental variables such as custom paths. I recommend using ConEmu from Maximus5 GitHub Releases. Another excellent, yet older and apparently unmaintained terminal emulator is Console (v2.00) by Bozho available at Sourceforge.net.

Git Version Control

Version control is essential when writing computer code. Inevitably you will make mistakes and need to go back, you may need to work with others, you may have new ideas that you want to flesh out, or your computer will crash and take your hard drive with it. These are the issues that version control solves. Git has emerged as the go-to version control tool. For Windows, msysgit is conveniently packaged with MSYS, a POSIX environment with many BASH and POSIX tools that complements Python nicely. If you need a graphical client, TortoiseGit will suffice.

Virtual Environments

I can't conscientiously write a Python primer without at least mentioning virtual environments. Occasionally a project may depend on specific Python packages, and due to threats, real or imagined, of backwards incompatibilities, a Python virtual environment is used to freeze the project dependencies. The virtualenv package accomplishes this task by creating a Python sandbox that contains only the specific packages desired. A virtual environment can also be used to test out development distributions, or other untested packages that you don't want to install into your system site-packages folder. In particular, Mac OSX and Linux users may prefer to use virtual environments since they don't require root. There is also a very convenient wrapper called virtualenvwrapper that makes managing virtual environments a cinch. Anaconda users should use conda instead of pip and virtualenv.
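On Python 3 you can try the sandbox idea without installing anything, because the standard library ships a venv module that does much the same job. To be clear, this sketch uses venv rather than the virtualenv package described above, and the function names are my own.

```python
import os
import sys
import tempfile
import venv


def create_environment(path):
    """Create a bare virtual environment at `path` and return its config file."""
    # with_pip=False keeps this fast and offline; pass True to bootstrap pip.
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")


def inside_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "demo-env")
    print("created %s" % create_environment(target))
    print("currently inside an environment? %s" % inside_virtual_environment())
```

The environment gets its own interpreter and site-packages folder under the target path, so anything installed there stays out of your system Python.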

1 comment:

  1. Daddy: The continuum.io link should point to https://www.anaconda.com/products/individual
