Wednesday, July 16, 2014

WinMerge with Git

WinMerge is a fine diff tool for Windows platforms.
I downloaded a portable version and extracted it into my root folder as C:\WinMerge\. Then I set it up as a Git difftool and mergetool by executing these lines in my Git Bash shell.
$ git config --global difftool.winmerge.cmd '/c/winmerge/winmergeu.exe -e -u -x -wl -wr -dl LOCAL -dr REMOTE "${LOCAL}" "${REMOTE}"'
$ git config --global mergetool.winmerge.cmd '/c/winmerge/winmergeu.exe -e -u -x -dl LOCAL -dr REMOTE "${LOCAL}" "${REMOTE}" "${MERGED}"'
The command line options are explained in the WinMerge Documentation. Now you can call it as your diff or merge tool.
$ git difftool -t winmerge

Friday, June 27, 2014

using South to migrate a Django sqlite3 database with unique_together

This has been fixed in Django-1.7, which is still in development. So this post applies to Django-1.6.5 (and perhaps older versions).

If you add a new field to a Django model with the Meta option tag 'unique_together' in a project that uses a backend sqlite3 database, then you will get the following error from South when you try to migrate it:
"object reserved for internal use"
South also can't roll back the changes to a sqlite3 database, so unless you have a backup you're hosed.

Luckily you backed up your database before applying the changes, right? So restore it and now try this:
  1. Remove the offending meta option tag.
  2. Migrate the change to the model.
  3. Restore the meta tag option and migrate again.
Did it work? If it didn't, I guess you'll have to wait for Django-1.7 or switch to a more robust backend database. It did work for me, this time at least, but who knows about next time. Can't wait for Django-1.7. Can't wait for Sphinx-1.3 either, for that matter. Check out Napoleon! No more bizarre doc strings. Numpy/Google style here we come . . .
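
Since sqlite3 keeps the whole database in a single file, the backup mentioned above can be as simple as a file copy. Here is a minimal sketch of that idea; the function names and the timestamped `.bak` naming are my own assumptions, not anything South or Django provides, and it assumes nothing is writing to the database while it is copied.

```python
import os
import shutil
import time


def backup_sqlite_db(db_path, backup_dir=None):
    """Copy a sqlite3 database file to a timestamped backup; return its path."""
    # Default to keeping the backup next to the database file itself.
    backup_dir = backup_dir or os.path.dirname(os.path.abspath(db_path))
    stamp = time.strftime('%Y%m%d-%H%M%S')
    backup_name = '%s.%s.bak' % (os.path.basename(db_path), stamp)
    backup_path = os.path.join(backup_dir, backup_name)
    shutil.copy2(db_path, backup_path)  # copy2 also preserves file timestamps
    return backup_path


def restore_sqlite_db(backup_path, db_path):
    """Overwrite the database file with a previously made backup."""
    shutil.copy2(backup_path, db_path)
```

Call `backup_sqlite_db` on your database file before migrating, hang on to the returned path, and pass it to `restore_sqlite_db` if the migration leaves the database in a bad state.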

dynamically upload file to Django model FileField

This post was inspired by this SO Q&A, which is for an ImageField but which I adapted for a FileField.

Does your app generate some content that is too large or not appropriate for a database? You can store it as a FileField, which points to a file in your MEDIA_ROOT folder. How do you upload the generated content to the FileField? Creation is a bit similar to a ManyToManyField, if you are familiar with using that.
  1. Add the FileField to your model and set blank=True, null=True so that you can instantiate it without setting the FileField.
  2. Create the object leaving off the FileField. Save the instance.
  3. When you retrieve the FileField from your model you get a FieldFile (note the word order swap), which allows you to interact with the File object (a Django version of a Python file object). You could write the content to disk first and then hand that file to the FieldFile's save method, but you can skip this unnecessary step. Let's assume the content can be serialized as a JSON object. The following code will upload the content to Django.
    from StringIO import StringIO
    import json
    f = StringIO(json.dumps(my_content, indent=2, sort_keys=True))
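    try:
        # `my_instance` is the model instance saved in step 2 (name assumed)
        my_instance.my_file_field.save('my_file_name.json', f)
        # `FieldFile.save()` saves the instance of the model by default
    finally:
        f.close()  # `StringIO` has no `__exit__` method to use `with`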
Setting null=True and blank=True on the FileField is only necessary if you want to upload a file object; otherwise you can pass the file name as the FileField when you construct the database object, e.g.:
my_model_object(my_char_field='some other model fields', my_file_field='my_file_field.json')
This points the field at 'my_file_field.json' on disk, so it must be a valid path; as far as I know Django resolves it relative to MEDIA_ROOT.
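
The explicit close can also be handled by `contextlib.closing`, which wraps anything that has a `close()` method so that it works in a `with` block. Below is a minimal sketch of serializing to an in-memory buffer that way. To keep it runnable without Django, `fake_save` is a hypothetical stand-in for a FieldFile's `save(name, content)` method, and `save_json` and `uploaded` are names I made up for illustration.

```python
import json
from contextlib import closing
from io import StringIO  # on Python 2: from StringIO import StringIO


def save_json(save, name, content):
    """Serialize `content` to JSON in memory and pass it to `save(name, f)`."""
    buf = StringIO(json.dumps(content, indent=2, sort_keys=True))
    with closing(buf) as f:
        save(name, f)  # the buffer is closed even if `save` raises


# Hypothetical stand-in for a FieldFile's save method: it only records
# what it was given, keyed by file name.
uploaded = {}


def fake_save(name, fileobj):
    uploaded[name] = fileobj.read()


save_json(fake_save, 'my_file_name.json', {'b': [1, 2], 'a': 'example'})
```

In a real project the first argument would be the bound save method of the field on a saved model instance instead of `fake_save`.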

Thursday, May 22, 2014

[Python, Windows] `pkg_resources not found` issue? Check your permissions.

It's been a long time coming ...

I've been meaning to post about this for a while, and I saw numerous SO Q&As regarding this issue while I was trying to resolve it myself.

How embarrassing!

Nothing is more embarrassing than demo-ing some software and having it fail in front of the user in cryptic fashion. If the user is not tech savvy (read: scared of the command line) or is already disinclined (for some truly bizarre reason) to code, this is a real stumbling block and a major put-off. Well, that's what happened at my big unveiling, when I released new analytical tools for my group to use. They logged in to the servers I had set up, trusting me blindly, only to be thwarted by an inexplicable error that seemed to confirm every fear and preconception they had been harboring just below the surface. This was of course the moment where I was supposed to swoop in and show them that the fix was trivial, not only confirming their trust in me, the computer geek, but also reaffirming their faith in their own ability to utilize this fantastical new gizmotron we call the 'puter. But alas, I was completely at a loss, and evidently so was everyone else in the universe, because wherever I saw this issue, the solution was inevitably, and misguidedly, to re-install Setuptools, the package responsible for `pkg_resources`.


Well, I finally had some time to track down the source of this issue. Luckily I have a coworker who actually gets excited about computing. You would think this was the norm in a heavily scientific engineering group. Although he may be a geek, he is mostly a hands-on engineer, but he is also a genius who uses the best tools on hand to get done whatever he needs done, and he's a wicked fast learner. I have a workstation with quite a bit of power, and I offered to share it with him so that he could bring it to bear on a particularly challenging mixed-integer binary linear programming problem he was working on (a subject he taught himself completely). Bam! We immediately hit the `pkg_resources not found` issue.


I should mention that we do not work on our computers as admins; instead, using Windows 7 and UAC, we elevate our credentials when necessary. It's not as graceful as `sudo` on Mac or Linux, but I think it gives us flexibility and a smidge more safety.


So I had an inkling that it might be a permissions issue, and sure enough, I had been using pip as a normal user to install all of my pure Python packages. This meant that I was the owner of those files, and even though everyone in my domain had read and execute rights, that still wasn't enough for `pkg_resources` to function. That's not surprising because, unless you are using a virtualenv or a .local site-packages folder, you usually have to use `sudo` to pip install packages on Mac or Linux.


There are probably other ways I could have solved this, but I changed the permissions of all of the packages and scripts to `SYSTEM`, and voila, the issue was solved, without re-installing a thing.

OK, I did actually update Setuptools while logged in as the other user, since so many SO posts were suggesting this, and it gave me a clue to what the issue was, especially when it still didn't solve the issue for a couple of other packages that still could not be imported, or only functioned partially.

Thursday, May 8, 2014

Tab Tintinabulation

Enabling Tab Auto-Completion For Python

The Python Standard Library Reference on rlcompleter provides everything you need. Following the directions there, here's what I did.
  1. Make a directory in your home directory
     ~$ mkdir .pythonrc
  2. Create a file in your new directory with the following lines
     ~/.pythonrc$ vim
    #! /usr/bin/env python
    try:
        import readline
    except ImportError:
        print("Module readline not available.")
    else:
        import rlcompleter
        readline.parse_and_bind("tab: complete")
  3. Add the following line to your .bashrc file
     export PYTHONSTARTUP=~/.pythonrc/
  4. Restart your shell and start python

Friday, May 2, 2014

loading MATLAB mat-file into Pandas

Pandas is a great tool for analyzing large data sets, especially time-series data. It quickly and easily imports most basic data files: Excel, comma-separated values, etc., but not MATLAB mat-files. However, SciPy does import MATLAB mat-files, so combining packages gets the job done.

Here's an example of a mat-file that has a single variable, called measuredData, that contains a MATLAB structure with a timeStamps field, several time series data fields (voltage, current and temperature) and some other fields that are irrelevant. There is also a field called numIntervals that contains the number of intervals in the time series data sets. The struct itself has only one element.

import numpy as np
from scipy.io import loadmat  # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd

mat = loadmat('measured_data.mat')  # load mat-file
mdata = mat['measuredData']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timeStamps']],
                  columns=columns)
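
To see why the `[0, 0]` indexing and the size test are needed without a mat-file on hand, here is a small sketch that builds by hand an array shaped roughly like what SciPy returns for a 1x1 struct. The field values and the extra `operator` metadata field are made up for illustration, and a real `loadmat` result may differ in details.

```python
import numpy as np

# Hand-built stand-in for mat['measuredData']: a 1x1 structured array whose
# fields all have dtype object, each holding its own 2-D array.
field_names = ['numIntervals', 'voltage', 'current', 'operator']
mdata = np.empty((1, 1), dtype=[(name, object) for name in field_names])
mdata['numIntervals'][0, 0] = np.array([[3]])
mdata['voltage'][0, 0] = np.array([[1.0], [1.1], [1.2]])
mdata['current'][0, 0] = np.array([[0.5], [0.6], [0.7]])
mdata['operator'][0, 0] = np.array(['someone'])  # metadata, not a column

# The struct has one element, so index each field at [0, 0] to unwrap it.
ndata = {name: mdata[name][0, 0] for name in mdata.dtype.names}

# Fields with exactly numIntervals elements are treated as time series.
n_intervals = ndata['numIntervals'].item()
columns = [name for name in field_names if ndata[name].size == n_intervals]

# Stack the column vectors side by side into an (n_intervals, n_columns) table.
table = np.concatenate([ndata[c] for c in columns], axis=1)
```

Here `columns` contains only the two time series fields, and `table` is what would be passed to `pd.DataFrame` as the data.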
poquitopicante by Mark Mikofski is licensed under a Creative Commons Attribution 3.0 Unported License.