Thursday, July 24, 2014

Django from development to production: Apache and PostgreSQL

Perhaps like many, I start my Django projects with the default settings. That means my database backend is SQLite and I use the simple HTTP server provided by Django for debugging. (See this post for a hack to get the debug server to run as a Windows service.) Now it's time for production, and that means using a real HTTP server like Apache and perhaps a more flexible database like PostgreSQL. Here are some notes on the steps I took, and a couple of missteps as well. There are many other blogs with similar notes, e.g., Salty Crane. By the way, I am using Django-1.6.5 and South-1.0. The Django How-To Deployment and Installation guides both recommend using Apache with mod_wsgi. The Install FAQs recommend PostgreSQL with psycopg2.

Apache HTTP Server

  1. Download Apache from ApacheLounge. I chose the 64-bit Windows binary built with VC10 because my system is 64-bit and I have VC10; it also matches the mod_wsgi binary available for Windows-x64. I chose Apache-2.4 instead of the older 2.2.
  2. The ApacheLounge zip file has instructions, but it's simple: just extract it to C:\Apache24. I also made a shortcut to ApacheMonitor.exe in my Startup folder. This nifty app runs in the system tray, shows you the server status, and lets you restart, stop, or start the server or open Services. Finally, I changed ownership of the folder recursively to SYSTEM.
  3. Download mod_wsgi from Christoph Gohlke's Python Extension Packages for Windows. How would we survive without this site? Extract the library, and copy it to your Apache/modules folder.
  4. Edit httpd.conf to load the mod_wsgi module. See the Django documentation on how to use Apache with mod_wsgi and the mod_wsgi quick installation guide for configuration. Specifically, add the line LoadModule wsgi_module modules/mod_wsgi.so - I added it to the end of the list of modules. Note that comments are preceded by # (the hash symbol).
  5. Provide the mod_wsgi parameters that allow the server to serve the Django folder:
    WSGIScriptAlias / /path/to/mysite.com/mysite/wsgi.py
    WSGIPythonPath /path/to/mysite.com
    
    <Directory /path/to/mysite.com/mysite>
    <Files wsgi.py>
    Require all granted
    </Files>
    </Directory>
    
  6. Alias and allow the static and media folders:
    Alias /static/ /path/to/mysite.com/STATIC_ROOT/
    Alias /media/ /path/to/mysite.com/MEDIA_ROOT/
    <Directory /path/to/mysite.com/STATIC_ROOT>
    Require all granted
    </Directory>
    <Directory /path/to/mysite.com/MEDIA_ROOT>
    Require all granted
    </Directory>
    
    I put these and the preceding lines in the section of httpd.conf where it tells you to specify which folders to allow access to.
  7. Use manage.py collectstatic to make the admin files and other CSS/JS files available to the server. Make sure STATIC_ROOT is set to the same folder that is aliased in the httpd.conf file, and that it is empty, because it will be overwritten. I keep all of my Bootstrap and tablesorter files, as well as images and icons, in an assets folder that is on my STATICFILES_DIRS list. (See the settings sketch after this list.)
  8. Turn DEBUG and TEMPLATE_DEBUG off.
  9. You must specify ALLOWED_HOSTS, e.g. mydomain.com, or you will get a 500 server error.
  10. Install Apache as a service: from an administrator command prompt, navigate to C:\Apache24\bin and type httpd.exe -k install. Now go to your site and see if it works. You may need to start the service first; you can use ApacheMonitor to start it or to open Services. If you are not an admin, it will prompt you for admin credentials.
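
Here's a minimal sketch of the production settings described in steps 6-9; all paths and the domain are placeholders for your own:

    # settings.py - paths and domain are placeholders
    DEBUG = False
    TEMPLATE_DEBUG = False
    ALLOWED_HOSTS = ['mydomain.com']

    STATIC_URL = '/static/'
    STATIC_ROOT = '/path/to/mysite.com/STATIC_ROOT'  # collectstatic copies files here
    MEDIA_URL = '/media/'
    MEDIA_ROOT = '/path/to/mysite.com/MEDIA_ROOT'

    # extra source folders that collectstatic gathers into STATIC_ROOT
    STATICFILES_DIRS = ['/path/to/mysite.com/assets']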

Now for PostgreSQL

  1. First, make a copy of your db.sqlite3 file.
  2. Make a checklist of all of the apps and models you have and use manage.py dumpdata --natural myapp1 myapp2 myapp3.mymodel auth.user etc. > myfixtures.json to save them all to a JSON file. Note: select only your models, or you may get integrity or other errors when loading fixtures, and don't forget auth.user or auth.group if you are using them. Specify individual models within an app using dot notation. You may need the --natural option for auth objects - I'm not sure, but it doesn't hurt.
  3. Download PostgreSQL from EnterpriseDB and install it. I also made a wrapper script in my profile's bin folder with the following code
    #! /bin/sh
    /c/Program\ Files/PostgreSQL/9.3/bin/psql.exe "$@"
    
    so that I can use manage.py dbshell in my Git Bash shell.
  4. Download the Python PostgreSQL binding psycopg2 from the Stickpeople Project, another extraordinary good-Samaritan service like Christoph Gohlke's Python Extension Packages for Windows; Gohlke also has a version of psycopg2 built against PostgreSQL-9.3 available for download.
  5. Use the pgAdmin III panel to connect to the server, create a new user, and set its password, e.g. django.
    Just right-click on PostgreSQL 9.3 (localhost:5432), select Connect, and enter your password. For new users, right-click on Login Roles and select New Login Role; for a new database, right-click on Databases and select New Database.
  6. Create a new database for your Django project and set the new user as the owner. See the PostgreSQL notes in the Django database documentation.
  7. Update your settings for the new database; a PostgreSQL example is given in the settings documentation for DATABASES. Set the ENGINE key to 'django.db.backends.postgresql_psycopg2', NAME to the name you gave your Django project's database, USER to the owner you set for the new database, and PASSWORD to the database owner's password. HOST is probably 'localhost' and PORT is probably 5432. (See the sketch after this list.)
  8. Check one last time that you've backed up the old database and used dumpdata to save fixtures of your apps and models, including auth.user and auth.group, in a JSON file. Then use manage.py syncdb to install your Django project in the new database. It doesn't matter whether you say yes or no when it asks to create a superuser, because loading the fixtures will overwrite any rows in your tables.
  9. Use South to migrate the databases, e.g. manage.py migrate <app>, for each app in your project.
  10. Now use manage.py loaddata myfixtures.json to load your app and model data into the new database. You should be able to load them from one file containing all of the fixtures, because Django will resolve references within the fixture to tables or rows not yet created. Do not include any extra Django fixtures, or you will raise exceptions. Specifically, do not run manage.py dumpdata without specifying your apps or models, or Django will also save duplicate info such as content types, which cannot be loaded into the new database because they already exist.
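
For reference, here's a minimal sketch of the DATABASES setting from step 7; the name, user, and password are placeholders for your own:

    # settings.py - NAME, USER and PASSWORD are placeholders
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mysite_db',   # the database created in pgAdmin III
            'USER': 'django',      # the login role that owns it
            'PASSWORD': 'secret',
            'HOST': 'localhost',
            'PORT': '5432',
        }
    }
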
Finally, does your app work? Test the data. Hopefully everything is hunky-dory.
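
For a quick sanity check, compare row counts in manage.py shell; myapp1 and MyModel below are hypothetical names standing in for your own:

    # run in `manage.py shell`; myapp1/MyModel are hypothetical
    from django.contrib.auth.models import User
    from myapp1.models import MyModel
    print User.objects.count()     # should match the old database
    print MyModel.objects.count()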

Wednesday, July 16, 2014

WinMerge with Git

WinMerge is a fine diff tool for Windows platforms.
I downloaded a portable version and extracted it into my root folder as C:\WinMerge\. Then I configured it as a Git difftool and mergetool by executing these lines in my Git Bash shell.
$ git config --global difftool.winmerge.cmd '/c/winmerge/winmergeu.exe -e -u -x -wl -wr -dl LOCAL -dr REMOTE "${LOCAL}" "${REMOTE}"'
$ git config --global mergetool.winmerge.cmd '/c/winmerge/winmergeu.exe -e -u -x -dl LOCAL -dr REMOTE "${LOCAL}" "${REMOTE}" "${MERGED}"'
The command line options are explained in the WinMerge Documentation. Now you can call it as your diff or merge tool.
$ git difftool -t winmerge

Friday, June 27, 2014

using South to migrate a Django sqlite3 database with unique_together

This has been fixed in Django-1.7, which is still in development, so this post applies to Django-1.6.5 (or older, perhaps).

If you add a new field to a Django model with the 'unique_together' Meta option in a project that uses a SQLite3 backend database, then you will get the following error from South when you try to migrate it:
"object reserved for internal use"
South also warns that it can't roll back the changes to a SQLite3 database, so unless you have a backup, you're hosed.

Luckily you backed up your database before applying the changes, right? So restore it and try this (see the sketch after the list):
  1. Remove the offending Meta option.
  2. Migrate the change to the model.
  3. Restore the Meta option and migrate again.
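
Here's a hypothetical model sketch showing where the steps apply; the model and fields are made up for illustration:

    # models.py - hypothetical model for illustration only
    from django.db import models

    class Reading(models.Model):
        serial_number = models.CharField(max_length=20)
        timestamp = models.DateTimeField()
        new_field = models.IntegerField(default=0)  # the newly added field

        class Meta:
            # step 1: comment this out, then schemamigration --auto and migrate
            # step 3: uncomment it, then schemamigration --auto and migrate again
            unique_together = (('serial_number', 'timestamp'),)
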
Did it work? If it didn't, I guess you'll have to wait for Django-1.7 or switch to a more robust backend database. It did work for me, this time at least, but who knows about next time. Can't wait for Django-1.7. Can't wait for Sphinx-1.3 either, for that matter. Check out Napoleon! No more bizarre docstrings. NumPy/Google style here we come . . .

dynamically upload file to Django model FileField

This post was inspired by this SO Q&A, which is for an ImageField but which I adapted for a FileField.

Does your app generate some content that is too large or not appropriate for a database? You can store it as a FileField, which points to a file in your MEDIA_ROOT folder. How do you upload the generated content to the FileField? Creation is a bit similar to a ManyToManyField, if you are familiar with using that.
  1. Add the FileField to your model and set blank=True, null=True so that you can instantiate it without setting the FileField.
  2. Create the object leaving off the FileField. Save the instance.
  3. When you retrieve the FileField from your model you get a FieldFile (note the word order swap) which allows you to interact with the File object (a Django version of a Python file object). You could save the content to disk then call the FieldFile.save() method, but you can skip this unnecessary step. Let's assume the content can be serialized as a JSON object. The following code will upload the content to Django.
    from StringIO import StringIO
    import json
    from django.core.files import File  # wrap file-likes so storage can chunk them
    f = StringIO(json.dumps(my_content, indent=2, sort_keys=True))
    try:
        my_model_object.my_file_field.save('my_file_name.json', File(f))
        # `FieldFile.save()` saves the model instance by default
    finally:
        f.close()  # `StringIO` has no `__exit__` method, so it can't be used with `with`
Setting the FileField to null=True and blank=True is only necessary if you want to upload a file object; otherwise you can pass a file name for the FileField when you construct the database object, e.g.:
my_model_object(my_char_field='some other model fields', my_file_field='my_file_field.json')
This will upload 'my_file_field.json' from disk if it is a valid path.
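
For context, here's a minimal sketch of the kind of model assumed above; all names are hypothetical:

    # models.py - hypothetical model matching the field names used above
    from django.db import models

    class MyModel(models.Model):
        my_char_field = models.CharField(max_length=100)
        # blank/null let us save the instance before attaching the file
        my_file_field = models.FileField(upload_to='json', blank=True, null=True)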

Thursday, May 22, 2014

[Python, Windows] `pkg_resources not found` issue? Check your permissions.

It's been a long time coming ...

I've been meaning to post about this for a while, and I saw numerous SO Q&As regarding this issue while I was trying to resolve it myself.

How embarrassing!

Nothing is more embarrassing than when you are demo-ing some software and it fails in front of the user in cryptic fashion. Especially if the user is not tech savvy (read: scared of the command line) or already disinclined (for some truly bizarre reason) to coding, this is a real stumbling block and a major put-off. Well, that's what happened at my big unveiling, when releasing new analytical tools for my group to use; they logged in to the servers I had set up, trusting me blindly, only to be thwarted by an inexplicable error that seemed to substantiate every fear and preconceived misconception that had been lurking just below the surface. This was of course supposed to be the moment where I swoop in and show them that the fix is trivial, not only confirming their trust in me, the computer geek, but also reaffirming their faith in their own ability to utilize this fantastical new gizmotron we call the 'puter. But alas, I was completely at a loss, and evidently so was everyone else in the universe, because wherever I did see this issue, the solution was inevitably and misguidedly to re-install Setuptools, the package responsible for `pkg_resources`.

Redux

Well, I finally had some time to check out the source of this issue. Luckily I have a coworker who actually gets excited about computing. You would think this was the norm at a heavily scientific engineering group. And although he may be a geek, he is mostly a hands-on engineer. But he is also a genius who uses whatever the best tools on hand are to get whatever he needs done. And he's a wicked fast learner. I have a workstation with quite a bit of power, and I offered to share it with him so that he could bring it to bear on a particularly challenging mixed-integer binary linear programming problem he was working on (that he completely taught himself). Bam! We immediately hit the `pkg_resources not found` issue.

Interlude

I should mention that we do not work on our computers as admins; instead, using Windows 7 and UAC, we elevate our credentials when necessary. It's not as graceful as `sudo` on Mac or Linux, but I think it gives us flexibility with a smidge more safety.

Eureka!

So I had an inkling that it might be a permissions issue, and sure enough, I had been using pip as a normal user to install all of my pure-Python packages. This meant that I was the owner of those files, and even though everyone in my domain had read and execute rights, that still wasn't enough for `pkg_resources` to function. Not surprising, because unless you are using a virtualenv or a .local site-packages folder, you always have to use `sudo` to pip install packages on Mac or Linux.

Solution

There are probably other ways I could have solved this, but I changed ownership of all of the packages and scripts to `SYSTEM`, and voilà, the issue was solved without re-installing a thing.

OK, I did actually update Setuptools while logged in as the other user, since so many SO posts suggested this. It gave me a clue to what the issue was, especially when it still didn't fix a couple of other packages that could not be imported or only functioned partially.

Thursday, May 8, 2014

Tab Tintinnabulation


Enabling Tab Auto-Completion For Python

The Python Standard Library reference on rlcompleter provides everything you need. Following the directions there, here's what I did.
  1. Make a directory in your home directory:
    ~$ mkdir .pythonrc
    
  2. Create a file in your new directory with the following lines:
    ~/.pythonrc$ vim pythonstartup.py
    
    #! /usr/bin/env python
    try:
        import readline
    except ImportError:
        print "Module readline not available."
    else:
        import rlcompleter
        readline.parse_and_bind("tab: complete")
    
  3. Add the following line to your .bashrc file:
    export PYTHONSTARTUP=~/.pythonrc/pythonstartup.py
    
  4. Restart your shell and start Python.

Friday, May 2, 2014

loading MATLAB mat-file into Pandas

Pandas is a great tool for analyzing large data sets, especially time-series data. It quickly and easily imports most basic data files: Excel, comma-separated values, etc., but not MATLAB mat-files. However, SciPy does import MATLAB mat-files, so combining packages gets the job done.

Here's an example of a mat-file that has a single variable, called measuredData, that contains a MATLAB structure with a timeStamps field and several time-series data fields: voltage, current, and temperature, plus some other fields that are irrelevant here. There is also a field called numIntervals that contains the number of intervals in the time-series data sets. The struct itself has only one element.

import numpy as np
from scipy.io import loadmat  # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd

mat = loadmat('measured_data.mat')  # load mat-file
mdata = mat['measuredData']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.iteritems() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timeStamps']],
                  columns=columns)
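
With the DataFrame built, the usual Pandas conveniences apply. Here's a quick sketch, assuming the voltage, current and temperature columns from the example above (and the 2014-era resample API):

    # resample the time series to hourly means and plot each column
    hourly = df[['voltage', 'current', 'temperature']].resample('H', how='mean')
    hourly.plot(subplots=True)
    plt.show()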