Thursday, May 22, 2014

[Python, Windows] `pkg_resources not found` issue? Check your permissions.

It's been a long time coming ...

I've been meaning to post about this for a while; while trying to resolve it myself, I came across numerous Stack Overflow Q&As about this same issue.

How embarrassing!

Nothing is more embarrassing than demoing software that fails in front of the user in some cryptic fashion. If the user is not tech savvy (read: scared of the command line) or already disinclined, for some truly bizarre reason, toward coding, it's a real stumbling block and a major put-off. Well, that's what happened at my big unveiling, when I released new analytical tools for my group to use: they logged in to the servers I had set up, trusting me blindly, only to be thwarted by an inexplicable error that seemed to substantiate every fear and preconceived misconception that had been lurking just below the surface. This was supposed to be the moment where I swoop in and show them that the fix was trivial, not only confirming their trust in me, the computer geek, but also reaffirming their faith in their own ability to use this fantastical new gizmotron we call the 'puter. But alas, I was completely at a loss, and evidently so was everyone else in the universe, because wherever I did see this issue reported, the solution was inevitably and misguidedly to re-install Setuptools, the package that provides `pkg_resources`.

Redux

Well, I finally had some time to track down the source of this issue. Luckily I have a coworker who actually gets excited about computing; you would think that was the norm in a heavily scientific engineering group. Although he may be a geek, he is mostly a hands-on engineer, but he is also a genius who uses whatever the best tools on hand are to get whatever he needs done, and he's a wickedly fast learner. I have a workstation with quite a bit of power, and I offered to share it with him so that he could bring it to bear on a particularly challenging mixed-integer binary linear programming problem he was working on (which he completely taught himself). Bam! We immediately hit the `pkg_resources not found` issue.

Interlude

I should mention that we do not work on our computers as admins; instead, on Windows 7 with UAC, we elevate our credentials when necessary. It's not as graceful as `sudo` on Mac or Linux, but I think it gives us flexibility with a smidge more safety.

Eureka!

So I had an inkling that it might be a permissions issue, and sure enough: I had been using pip as a normal user to install all of my pure-Python packages. That meant I was the owner of those files, and even though everyone in my domain had read and execute rights, that still wasn't enough for `pkg_resources` to function. Not surprising, because unless you are using a virtualenv or a `.local` site-packages folder, you usually have to use `sudo` to pip install packages on Mac or Linux.
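
If you want to check for the same thing, here's a rough sketch of my own (not part of the original troubleshooting; it assumes Python 2.7 for site.getsitepackages()): walk site-packages as the affected user and flag anything that can't actually be read.

#! /usr/bin/env python
# quick-and-dirty read check -- run it as the user who hits the import error
import os
import site

for sp in site.getsitepackages():  # the system site-packages directories
    for root, dirs, files in os.walk(sp):
        for name in files:
            path = os.path.join(root, name)
            try:
                open(path, 'rb').close()  # actually try to read the file
            except IOError as err:
                print "can't read:", path, "-", err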

Solution

There are probably other ways I could have solved this, but I changed the permissions of all of the packages and scripts to `SYSTEM`, and voila, the issue was solved without re-installing a thing.

OK, I did actually update Setuptools while logged in as the other user, since so many SO posts were suggesting it, and that gave me a clue to what the issue was, especially when it still didn't fix a couple of other packages that could not be imported, or only partially functioned.

Thursday, May 8, 2014

Tab Tintinnabulation


Enabling Tab Auto-Completion For Python

The Python Standard Library reference on rlcompleter provides everything you need. Following the directions there, here's what I did.
  1. Make a directory in your home directory:

     ~$ mkdir .pythonrc

  2. Create a file in your new directory with the following lines:

     ~/.pythonrc$ vim pythonstartup.py

     #! /usr/bin/env python
     try:
         import readline
     except ImportError:
         print "Module readline not available."
     else:
         import rlcompleter
         readline.parse_and_bind("tab: complete")

  3. Add the following line to your .bashrc file:

     export PYTHONSTARTUP=~/.pythonrc/pythonstartup.py

  4. Restart your shell and start Python (a quick check is shown below).
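
To verify it's working, open a new shell, start the interpreter, and hit Tab on a partially typed name. The exact completions depend on your platform, but it should look something like this:

~$ python
>>> import os
>>> os.pa          # press Tab twice here
os.pardir          os.path            os.pathconf(       os.pathconf_names  os.pathsep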

Friday, May 2, 2014

Loading a MATLAB mat-file into Pandas

Pandas is a great tool for analyzing large data sets, especially time-series data. It quickly and easily imports most basic data files: Excel, comma-separated values, etc., but not MATLAB mat-files. However, SciPy does import MATLAB mat-files, so combining packages gets the job done.

Here's an example of a mat-file that has a single variable, called measuredData, containing a MATLAB struct with a timeStamps field, several time-series data fields (voltage, current, and temperature), and some other fields that are irrelevant here. There is also a field called numIntervals that holds the number of intervals in the time-series data sets. The struct itself has only one element.

import numpy as np
from scipy.io import loadmat  # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd

mat = loadmat('measured_data.mat')  # load mat-file
mdata = mat['measuredData']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.iteritems() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timeStamps']],
                  columns=columns)
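
From here the usual Pandas conveniences apply. For example, a quick sketch using the imports above and the column names from this example:

print df.describe()  # summary statistics for voltage, current and temperature
df['temperature'].plot(title='temperature')  # plot one series against the time index
plt.show()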