Friday, May 31, 2013

>>> print 'Reading %(Lolita)s in %(Tehran)s' % {'Lolita': 'Excel', 'Tehran': 'Python'}

[UPDATE 2013-06-14] The google group python-excel is a great source of information on this topic. That's where I discovered xlsxWriter and PyXLL. I also searched the PyPI database with keyword: "xlsx", and I got several more hits, which I've added below in addition to these two. Really thought I'd already had checked PyPI - must be losing my mind; need more sleep.
>>> print 'Reading %(Lolita)s in %(Tehran)s' % {'Lolita': 'Excel', 'Tehran': 'Python'}
Reading 
Excel in Python
That was really a stretch. It's actually not funny or witty at all. In fact now it's just terribly awkward. Although it would of made a little sense if Excel and Python were switched, IE: Reading Python in Excel right, but that is not exactly what we mean. Now I'm just makein it worse.

Main Contenders

Moving on, ever tried to read an Excel file into Python? Numpy only supports csv, so you Google and up pop the main contenders:

Python ← Excel

Python → Excel

  • DataNitro, formerly IronSpread, is an excel plugin that replaces allows Python to be used for add-ins and macros in lieu of VBA, proprietary commercial/personal licensing. I didn't review this one, but I plan to revisit it later.
  • PyXLL, version 2.0.3, also lets you use Python to make Excel add-ins using decorators.

openpxl

This project is for Excel 2007 and newer, it is a port of ExcelPHP and is super easy to use, made even easier by its Sphinx docs at read-the-docs. For some weird reason, pypi and bitbucket both point to old (v-1.5.7) documentation @ pythonhosted, but don't be fooled; get the latest which is much expanded!
$ pip install openpyxl
works fine for any platform since there are no extensions, yay!
$ python
>>> from openpyxl import load_workbook
>>> big_wb = openpyxl('C:\path\to\spreadsheet.xlsx')
seems too take awhile and there doesn't seem any logging or progress available. So use the lazy approach by setting use_iterators=True, which is much faster!
>>> big_wb_fast = openpyxl('C:\path\to\spreadsheet.xlsx', use_iterators=True)
This approach uses an optimized reader, but the workbook loaded is read only.

There is a google group, issue tracking, a blog and an old blog.

xlrd

This project covers all Excel spreadsheets, don't be mislead by the 0 major version, this is a relatively mature project. It is part of a triad of Excel targeted packages, including xlwt and xlutils for writing and other basic utilities. The documentation is in one giant html page, which has a lot of info, but is frankly more challenging to read than the nice Sphinx documentation of openpyxl. There is also a tutorial in the form of a pdf.

Install all 3 packages using pip works fine as they are all pure Python. Then in an interpreter ...
>>> from xlrd import open_workbook
>>> import sys
>>> big_wb_xlrd = open_workbook('C:\path\to\spreadsheet.xlsx'',on_demand=True, logfile=sys.stdout, verbosity=1)
This is where better documentation might help as there is no indication what the different levels of verbosity are, although maybe it's obvious. False, 0 or None means none, True or 1 gives you lite feedback, just right, but verbosity=2 gives you a deluge of information - probably best to output this to a file instead of STDOUT.

Now here comes the sad part. If you are using Excel 2007 or newer, on_demand is not supported. If you have logging enabled and verbosity>0, then you will see this message:
WARNING *** on_demand=True not yet implemented; falling back to False
Boo Hoo! Because on_demand is the xlrd equivalent of use_iterators or the openpyxl optimized reader which enhanced speed so much. I will say, that xlrd read a large file slightly faster than openpyxl with use_iterators=False, but definitely slower than the optimized reader.

Conclusion

  • For xls files, use xlrd with on_demand=True .
  • For xlsx files, use openpyxl with use_iterators=True.
  • IMHO openpyxl could be improved with logging to give feedback and xlrd/xlwt could use some better documentation.
In my next post, Round II, I play with reading in actual cell contents. Let's see if openpyxl stays in front.

py4a site-packages revisited without root

Adding pure-python packages is a snap.
  1. add the following file
    import site
    SITEPKGS = "/mnt/sdcard/com.googlecode.pythonforandroid/extras/python/site-packages"
    site.addsitedir(SITEPKGS)

    saved as
    sitecustomize.py
    to your py4a folder, probably
    /mnt/sdcard/com.googlecode.pythonforandroid/extras/python/
    although you may find your external (or internal) storage at either
    /storage/sdcard0 or /storage/sdcard1change the SITEPKGS external storage mount too!
    This file should add your site-packages folder to your python path. test it by running the interpreter, import sys, and checking sys.path. do you see it now?
  2. download the source tarball
  3. untar the source
  4. find the folder that contains the package you want to import and copy it to your site-packages folder
  5. try to import it and then try to use it. does it work?
I've done this successfully with both dulwich, which has pure python versions of all C extensions, and paramiko which requires pycrypto which you can download and install via py4a.

Thursday, May 30, 2013

Bugs Everywhere

[UPDATED 2013-05-30]

This is pretty nifty tool. It still needs some work, but I like how lightweight it is. Perfect for local repositories.
I built this in a virtualenv, with site packages by downloading the tarball and using pip. I installed the dependencies in my base Python install.
Jinja-1.2 can use vcvarsall.bat to speed up, so make sure you have VS2008C++ installed. Python will find it automatically (it looks for VC9 - Visual C++ 9.0). MinGW gcc will also do, but not VS2010 (or I assume VS2005). You will need to edit distutil.cfg in /lib/distutil to use mingw gcc, link, but VC9 works automatically with 2.7. If you are on an x64 Windows machine see building Python extensions for x64 with pip. PyYaml also wants to build c-code. I used the windows installer in my base Python installation. If you can't build it using Windows SDK-7 check Christoph Gohlke's site.

Check out this post to see how to enable an editor when using bugs-everywhere with msysgit.

Thursday, May 23, 2013

MATLAB/Python SQL

The players:

MATLAB

  1. MySQL Database Connector [2005-10-10] by Robert Almgren on FEX with additional information at his site at the NYU Courant Institute of Math Sciences.
  2. MySQL Connector: mYm [v1.36 2010-05-19] by Yannick Maret with many additional files by Dimitri Shvorob
    Access a MySQL database
    A Powerful Sidekick: Using MySQL for High-Volume Data Manipulation in Matlab
    MySQL shortcut files by Carrie Wall
  3. queryMySQL [2012-06-22] by Jonathan Karr
  4. Myblob [2006-01-09] by Joerg Buchholz
  5. java.sql why not use native Java? Always the right solution with MATLAB right?
  6. .NET SqlConnection class
Python

Tkinter Taylor Solder Spy - Hidden Tk Python Modules

Why is tkFileDialog not listed anywhere in the Python reference? On windows 7 it looks like this:
tkFileDialog.askopenfilename()
tkFileDialog.askopenfilename()
If you were using Tk, which I guess nobody is, why wouldn't you want to know about this? If youhit tab after typing tk in an ipython interpreter, you should see all of these tk dialogs.
tkColorChooser tkCommonDialog tkFileDialog   tkFont         tkMessageBox   tkSimpleDialog
There are also these modules
Tkconstants Tkdnd
but Tkconstant is already imported by Tkinter, and Tkdnd is experimental drag-n-drop, not sure how old it is or if it was every re-categorized as stable. Anyway, the tk modules are all listed on Lundh's effbot and pythonware sites, upythonic wiki, a New Mexico Computer Science website and epydoc's sourceforge Python stdlib directory.

Tuesday, May 21, 2013

Tix, ttk and Tkinter

Tkinter is the graphical package based on Tk that ships standard with most Python builds. But Python 2.7 also has ttk and Tix, which leads to some confusion. Right away this is partially cleared up in the reference for ttk, which describes it as an extension to Tkinter that provides missing widgets. But along comes Tix, which says more or less the same thing. Even more confusing, is that we are encouraged to import ttk as follows:

    >>> from Tkinter import *
    >>> from ttk import *

so that those widgets replaced by ttk will be imported over the older ones in Tkinter.

These include:

WidgetTkinterttk
ButtonXX
CanvasX
CheckbuttonXX
ComboboxX
EntryXX
FrameXX
LabelXX
LabelFrameXX
ListboxX
MenuX
MenubuttonXX
MessageX
NotebookX
OptionMenuX
PanedWindowXX
ProgressbarX
RadiobuttonXX
ScaleXX
ScrollbarXX
SeparatorX
SizegripX
SpinboxXTk8.5.9
TextX
TreeviewX
ToplevelX

I'm not going to include Tix because there are over 40 widgets that are added, but there are two important differences between Tix and ttk.
  • Tix hasn't seen development since 2008 whereas ttk was last upgraded in 2012
  • Tix isn't part of Tk/Tcl - it is a 3rd party add-on.
In general ttk looks newer than Tix. Here are some screen shots of a Tix Meter widget and ttk Progressbar widget.
Tix Meter widget
Tix Meter Widget

ttk Progressbar widget
ttk Progressbar widget

So I think I'm going to stick with ttk imported on top of Tkinter as suggested by Python, and I'll only supplement these widgets with Tix if I absolutely must, since the ttk widgets really do seem better.

Thursday, May 9, 2013

Baby Got Backends

What are backends? I mean matplotlib backends for the nearly ubiquitous scientific graphic Python package that complements NumPy/SciPy so well. You can use several graphics libraries including Qt4, GTK, wx and Tk (which is the default). Here in no order are what these backends look like on Windows 7.

Qt4Agg

This is probably the nicest. It is the only one that has the special added feature of allowing you to set the figure title, axes labels and line and marker type and color. Also it is the only one with normal looking sliders.
matplotlib Qt4Agg backend
Qt4Agg backend
 Here you can see the curves tab of the extra figure options box.
matplotlib Qt4Agg backend curves tab of figure options
Curves tab of the figure options box (Qt4Agg backend)

GTKAgg

The other backends look nearly identical. Here is the GTK backend. The sliders are the same as Tk, but the main figure window has buttons whose edges only light up when you hover over them, giving it a slightly more modern look than the TkAgg backend.
matplotlib GTKAgg backend
GTKAgg backend

TkAgg

Here is the default backend. See what I mean about the buttons; but maybe you like that? I know on mac, Tk looks pretty nice, but on Linux and Windows, I think it looks more primitive than the other backends.

matplotlib TkAgg backend
TkAgg backend

wxAgg

The only noticeable difference between wx and the GTK to me was the extra status bar underneath the toolbar. GTK and Tk both put the status on the right side of the toolbar, which makes better use of space, IMHO. Also the nav buttons in the wx backend were grey not blue, and the favicon's were broken - not matplotlibs squiggly curve icon that the other backends used - but that might just be my computer?
matplotlib wxAgg backend
wxAgg backend

Wednesday, May 8, 2013

I heart Doug Hellmann

The most straightforward way to use the debugger is to run it from the command line, giving it your own program as input so it knows what to run.
from his PyMOTW blog on pdb. FYI: you can replace pdb with ipdb.
$ python -m ipdb myscript.py argv1 argv2 ...
This solves the "importing ipdb twice breaks color coding" bug I posted here.

[UPDATE 2013-06-04] Just learned that ipdb also ships with a script that you can run directly from the command line..
$ ipdb myscript.py argv1 argv2 ...
Does the same as the python -m, but with 10 less key strokes, score! (ha, ha, ha!)
Fork me on GitHub