Thursday, June 27, 2013

What is a scope for lambda?

Okay computer science students out there, riddle me this.
A scope is home for a function,
>>> def g(i):
...     def f():
...         return i
...     return f
>>> print [f() for f in (g(i) for i in xrange(3))]
[0, 1, 2]
but what is a scope for a lambda?
>>> print [f() for f in [lambda: i for i in xrange(3)]]
[2, 2, 2]
A scope is home for a generator,
>>> print [f() for f in (lambda: i for i in xrange(3))]
[0, 1, 2]
and a default parameter is a hack for a lack of a scope,
>>> print [f() for f in [lambda a=i: a for i in xrange(3)]]
[0, 1, 2]
but a new scope is home for a lambda.
>>> print [f() for f in ((lambda a=i: lambda: a)() for i in xrange(3))]
[0, 1, 2]
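The riddle above comes down to when the name i is looked up: a closure looks up its free variables when it is called, not when it is defined. Here is a minimal sketch of the pitfall and a few common fixes. It uses range instead of xrange so that it also runs on Python 3, and make_getter and identity are hypothetical helper names, not part of the examples above.

```python
from functools import partial

# Late binding: every lambda shares the same variable i and looks it up
# only when called, after the loop has already finished.
late = [lambda: i for i in range(3)]
late_values = [f() for f in late]

# Fix 1: freeze the current value as a default argument.
frozen = [lambda a=i: a for i in range(3)]
frozen_values = [f() for f in frozen]

# Fix 2: give each lambda its own enclosing scope with a factory function.
def make_getter(value):
    return lambda: value

scoped = [make_getter(i) for i in range(3)]
scoped_values = [f() for f in scoped]

# Fix 3: bind the argument up front with functools.partial.
def identity(value):
    return value

bound = [partial(identity, i) for i in range(3)]
bound_values = [f() for f in bound]

print(late_values)
print(frozen_values)
print(scoped_values)
print(bound_values)
```

The factory function is the most explicit of the three, since the new scope has a name and can carry a docstring.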

And while we're on the topic of weird Python hacks and weird comprehensions, what the heck is this?
>>> foobar = [(1, 2, 3), (4, 5), (6, 7, 8, 9), (0, )]
>>> # foo are elements in bar which are elements in foobar
>>> [foo for bar in foobar for foo in bar]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
Well this was really just an excuse to rock the new syntax highlighter.
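For anyone puzzled by the double for in the comprehension above, it reads left to right exactly like nested for loops. A small sketch of the equivalent loop, plus itertools.chain.from_iterable as an alternative way to flatten one level (the names flat_loop and flat_chain are just for illustration):

```python
from itertools import chain

foobar = [(1, 2, 3), (4, 5), (6, 7, 8, 9), (0,)]

# [foo for bar in foobar for foo in bar] unrolls to:
flat_loop = []
for bar in foobar:        # the first "for" is the outer loop
    for foo in bar:       # the second "for" is the inner loop
        flat_loop.append(foo)

# chain.from_iterable flattens one level of nesting lazily.
flat_chain = list(chain.from_iterable(foobar))

print(flat_loop)
print(flat_chain)
```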

Tuesday, June 25, 2013

syntax sensation: A comparison of syntax highlighters

define CSS styles in html

I added 2 CSS definitions, .block-code, which I'm using here, and .inline-code. The definitions are in this Gist, and the result looks like this:
To see the XKCD Python comic, type: import antigravity
>>> import this
The Zen of Python, by Tim Peters
 
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
I have been using <pre></pre> tags for blocks of code and <span></span> for inline code instead of <code></code> and that seems to work great. Some downsides of this approach are that there are no line numbers, no syntax highlighting, and it doesn't scroll.

google-code-prettify

This is a very simple syntax highlighter. It works similarly to MathJax, the embedded Gist in the previous section and some of my NPR posts, by loading a JavaScript file from the web.
<script src="https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js"></script>
It automatically detects languages, e.g. Python, there are several skins (this is sons-of-obsidian), and you can add line numbers. It will format <pre></pre> and <code></code> tags that contain class="prettyprint", but it doesn't use <span></span> tags. For line numbers use class="prettyprint linenums".
#!/usr/bin/python

def fib():
  '''
  a generator that produces the elements of the fibonacci series
  '''

  a = 1
  b = 1
  while True:
    a, b = a + b, a
    yield a

def nth(series, n):
  '''
  returns the nth element of a series,
  consuming the earlier elements of the series
  '''

  for x in series:
    n = n - 1
    if n <= 0: return x

print nth(fib(), 10)

SyntaxHighlighter

This is the most ubiquitous and snazzy highlighter. As with prettify, you load JavaScript.
<link href='http://alexgorbatchev.com/pub/sh/current/styles/shCore.css' rel='stylesheet' type='text/css'/>
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shThemeFadeToGrey.css" rel="stylesheet" type="text/css" />
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js" type="text/javascript"></script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shAutoloader.js" type="text/javascript"></script>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPython.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushXml.js' type='text/javascript'></script>
<script language='javascript'>
SyntaxHighlighter.config.bloggerMode = true;
SyntaxHighlighter.all();
</script>
Then use <pre class="brush: python">your Python code goes here</pre>. It scrolls nicely, and there are several themes and supported languages: just load the appropriate brush script, e.g. "shBrushPython.js", then specify the language in lower case, i.e. "python". The <script></script> mode doesn't work on Blogger, and note the bloggerMode = true configuration in the script. There is no inline mode, but there is a nice scrollbar.
#!/usr/bin/python

def fib():
  '''
  a generator that produces the elements of the fibonacci series
  '''

  a = 1
  b = 1
  while True:
    a, b = a + b, a
    yield a

def nth(series, n):
  '''
  returns the nth element of a series,
  consuming the earlier elements of the series
  '''

  for x in series:
    n = n - 1
    if n <= 0: return x

print nth(fib(), 10)

Monday, June 24, 2013

rascally resize tk scrollbars

First, is <pre></pre> the coolest html tag ever? I don't have to worry about non-breaking spaces or line-breaks with pre-formatted text, I just copy and paste my code.
Second, in case you were wondering, I've added some CSS for inline and block code to my blogger posts. They're in this Gist.
Third, here's a demo of scrollbars and listbox that resize with the window. The trick is in the row and column weights. Set the weight to a positive number to resize, using either columnconfigure or rowconfigure.
#! /usr/bin/env python

from Tkinter import *
from ttk import *
import calendar

root = Tk()
root.title('Listy')

master = Frame(root)
master.pack(expand=True, fill=BOTH)
master.columnconfigure(0, weight=1)
# keep scrollbar same width, ie don't resize!
master.columnconfigure(1, weight=0)
master.rowconfigure(0, weight=1)

# y-scrollbar
scrolly = Scrollbar(master, orient=VERTICAL)
scrolly.grid(row=0, column=1, sticky=N+S)

# listbox
listy = Listbox(master)
listy.grid(row=0, column=0, sticky=N+S+E+W)

# content
for m in calendar.month_name:
    listy.insert(END, m)
for d in calendar.day_name:
    listy.insert(END, d)

# bind scrollbar to listbox
listy.config(yscrollcommand=scrolly.set)
scrolly.config(command=listy.yview)

if __name__ == '__main__':
    master.mainloop()
You could do this with the packer geometry manager, by using pack(expand=YES, fill=BOTH) for the listbox and pack(fill=Y) for the scrollbar. The trick is expand which causes the listbox to resize, but not the scrollbar.
#! /usr/bin/env python

from Tkinter import *
from ttk import *
import calendar

root = Tk()
root.title('Listy')

master = Frame(root)
master.pack(expand=True, fill=BOTH)

# y-scrollbar
scrolly = Scrollbar(master, orient=VERTICAL)
scrolly.pack(side=RIGHT, fill=Y)

# listbox
listy = Listbox(master)
listy.pack(side=LEFT, expand=YES, fill=BOTH)

# content
for m in calendar.month_name:
    listy.insert(END, m)
for d in calendar.day_name:
    listy.insert(END, d)

# bind scrollbar to listbox
listy.config(yscrollcommand=scrolly.set)
scrolly.config(command=listy.yview)

if __name__ == '__main__':
    master.mainloop()

ttk Notebook demo for Py2

There is a very nice ttk Notebook demo on a very cleverly named blog called Py in my eye. Note: there are very few differences between the Python 3 version of this demo and the Python 2 version, other than
For a Python 2 version of Jane's demo, see this Gist. To help myself understand what was going on, I forced myself to decompose that stellar example into this super easy demo:
#! /usr/bin/env python

from Tkinter import *
from ttk import *

root = Tk() # create a top-level window

master = Frame(root, name='master') # create Frame in "root"
master.pack(fill=BOTH) # fill both sides of the parent

root.title('EZ') # title for top-level window
# quit if the window is deleted
root.protocol("WM_DELETE_WINDOW", master.quit)

nb = Notebook(master, name='nb') # create Notebook in "master"
nb.pack(fill=BOTH, padx=2, pady=3) # fill "master" but pad sides

# create each Notebook tab in a Frame
master_foo = Frame(nb, name='master-foo')
Label(master_foo, text="this is foo").pack(side=LEFT)
# Button to quit app on right
btn = Button(master_foo, text="foo", command=master.quit)
btn.pack(side=RIGHT)
nb.add(master_foo, text="foo") # add tab to Notebook

# repeat for each tab
master_bar = Frame(nb, name='master-bar')
Label(master_bar, text="this is bar").pack(side=LEFT)
btn = Button(master_bar, text="bar", command=master.quit)
btn.pack(side=RIGHT)
nb.add(master_bar, text="bar")

# start the app
if __name__ == "__main__":
    master.mainloop() # call master's Frame.mainloop() method.
    #root.destroy() # if mainloop quits, destroy window
Some notes:
  • The original demo puts the notebook in a frame, in another frame inside the top-level window, but you can just go nb->frame->root, and skip the extra frame. Not sure what you gain or lose.
  • If you want to see the demo decomposed, I converted the original demo as a script in the Gist.
  • You don't have to call Tk() to create a top-level window, Frame will do it for you. Then you can access the window via the master attribute of Frame.
  • If you call the Frame's quit() method, it will destroy the window for you, so the last line, root.destroy(), is not necessary.
  • If you don't bind the "WM_DELETE_WINDOW" protocol to Frame's quit() method, you will get a traceback when root.destroy() is called, saying that it can't destroy the window because it's already been deleted.
  • Use fill=BOTH if your labels and buttons are smaller than the parents they occupy and you want them to extend to both sides.
  • All of these demos are included in your Python distribution. On MS Windows it is here: C:\Python27\tcl\tk8.5\demos
Enjoy!!!

Wednesday, June 19, 2013

Dates and Datetimes


NumPy Datetimes

NumPy has datetimes, called datetime64 to avoid confusion with the Python datetime module and class. But it only accepts ISO 8601 formats for text entries, e.g. 2013-06-19T16:14:32.00-0700. It will also take a Python datetime.datetime() or numpy.datetime64() as an argument, but NumPy will always shift the date/time to the local timezone. If the Python datetime.datetime() object is naive (i.e. no tzinfo) then NumPy will assume it is UTC (Zulu, GMT or +0000). Calling numpy.datetime64().item() will return the UTC equivalent Python datetime.datetime() object.

Examples with np.datetime64 dtype:

>>> import numpy as np
>>> from datetime import datetime
>>> np.datetime64(datetime.today().isoformat())
numpy.datetime64('2013-06-19T16:17:27.612000-0700')

Examples with np.array:

>>> dt = np.dtype([('dates', 'datetime64[D]'), ('dni', float)])
>>> data = [('2001-01-01', 834.34),
...         ('2001-01-02', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.date(2001, 1, 1), 834.34),
       (datetime.date(2001, 1, 2), 635.12)],
      dtype=[('dates', '<M8[D]'), ('dni', '<f8')])

Repeat that with a datetime using Zulu time.

>>> dt = np.dtype([('dates', 'datetime64[m]'), ('dni', float)])
>>> data = [('2001-01-01T00:30Z', 834.34),
...         ('2001-01-01T01:30Z', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.datetime(2001, 1, 1, 0, 30), 834.34),
       (datetime.datetime(2001, 1, 1, 1, 30), 635.12)],
      dtype=[('dates', '<M8[m]'), ('dni', '<f8')])

Repeat that with a datetime using a zero UTC offset (-0000) for Zulu.

>>> dt = np.dtype([('dates', 'datetime64[m]'), ('dni', float)])
>>> data = [('2001-01-01T00:30-0000', 834.34),
...         ('2001-01-01T01:30-0000', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.datetime(2001, 1, 1, 0, 30), 834.34),
       (datetime.datetime(2001, 1, 1, 1, 30), 635.12)],
      dtype=[('dates', '<M8[m]'), ('dni', '<f8')])
n.b.: NumPy converts strings for you, so you don't have to use np.datetime64 to cast them as datetime64 dtypes. It also converts them to Python datetime.datetime or datetime.date, depending on your date units, and shifts them to GMT (or Zulu) time. NumPy seems to handle dates and datetimes with the default units of day, e.g.: [D], but for structured arrays you must specify the datetime units, e.g.: [D], [m], [s] or [ms] (see datetime units), in addition to datetime64 as the dtype, or NumPy gives you this cryptic error:
ValueError: Cannot create a NumPy datetime other than NaT with generic units
Thanks to this answer on SO for unriddling that puzzle. If you make a NumPy datetime with nothing, you'll discover that NaT means "Not a time". In addition you may get this error, which is a bit more informative.
TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind'
This is because datetime.datetime uses microseconds [us] as its default, but NumPy uses days [D]. Specify the dtype using [us] or some form of seconds units, e.g.: [s], [ms], and it should work.

Matplotlib.dates.datestr2num()

This is an undocumented function that uses dateutil to convert a string to the floating point number that matplotlib uses to represent dates, similar to MATLAB and Excel. The function can also be imported via pylab.
>>> import pylab
>>> import pytz
>>> pst = pytz.timezone('US/Pacific')
>>> some_date = pylab.datestr2num('1998-1-1 12:15-0800')
>>> pylab.num2date(some_date, pst)
datetime.datetime(1998, 1, 1, 12, 15, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
>>> same_date = pylab.datestr2num('1/1/1998 12:15-0800')
>>> pylab.num2date(same_date, pst)
datetime.datetime(1998, 1, 1, 12, 15, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
Pretty nifty! Works better than I thought! In fact I like it way better than NumPy or Python for that matter. Note how using pytz helps matplotlib set the timezone; the tz module from dateutil can also be used to set tzinfo. Also note that if we hadn't set the UTC offset in the string, then it would have output 4:15 AM instead of 12:15, since it would have assumed GMT. Also matplotlib is pretty smart about determining the format; the default is month/day/year.

Python datetime

The lame way to do this is with Python's datetime.strptime(), but it doesn't support the %z directive for UTC offset (that's only for strftime() functions of datetime, date and time instances), and it only has an abstract class tzinfo for timezone, which can be replaced by pytz.
>>> dt = datetime.strptime('1998-1-1 12:15', '%Y-%m-%d %H:%M')  # naive datetime instance
>>> print dt
1998-01-01 12:15:00
>>> new_dt = datetime(*dt.timetuple()[0:6], tzinfo=pst)  # aware datetime instance
>>> print new_dt
1998-01-01 12:15:00-08:00
>>> new_dt.toordinal()
729390
The ordinal of the datetime covers only the date part (whole days), whereas matplotlib's date number also includes the fractional sub-day portion. The strptime() classmethod lets you set the format, which is very nice.
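To make the difference between an ordinal and a float day number concrete, here is a minimal sketch using only the standard library. fractional_day_number is a hypothetical helper, not a matplotlib function; I believe it follows the same "days plus fraction of a day" idea as matplotlib's date numbers, but treat the exact correspondence as an assumption.

```python
from datetime import datetime

def fractional_day_number(dt):
    """Hypothetical helper: Gregorian ordinal plus the elapsed fraction of the day."""
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    return dt.toordinal() + seconds / 86400.0

dt = datetime.strptime('1998-1-1 12:15', '%Y-%m-%d %H:%M')

whole_days = dt.toordinal()               # the time of day is dropped
with_fraction = fractional_day_number(dt) # same day count, plus 12:15 as a fraction

print(whole_days)
print(with_fraction)
```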

Time

When working with times only, Python will assume 1900-01-01, while NumPy assumes 1970 and matplotlib will default to today's date. But it's actually hard to create a time-only value in NumPy, as you can with Python's datetime.time. Maybe there is a correct way to do it, but I could only make datetime.date and datetime.datetime with numpy.datetime64.

NumPy time example:

>>> np.array(datetime.time(8, 30), dtype='datetime64[m]')
Could not convert object to NumPy datetime
The only way I could do it was with timedelta64.
>>> t = np.timedelta64(8, 'h') + np.timedelta64(30, 'm')
>>> print t
510 minutes
>>> np.array(t, dtype='datetime64')
array(datetime.datetime(1970, 1, 1, 8, 30), dtype='datetime64[m]')
So you can see, NumPy counts from 1970 (the Unix epoch), so that becomes the year! As I said, Python reverts to 1900. For these examples I have to use datetime.time, so reimport datetime by itself. Also I assume that pytz was imported and pst is a pytz.timezone instance of 'US/Pacific' as in the sections above.

Python datetime.time example:

>>> import datetime
>>> t = datetime.time(8, 30, tzinfo=pst)
>>> t.strftime('%Y-%m-%d %H:%M %Z')
'1900-01-01 08:30 US/Pacific'
The timezone name, %Z, worked but I couldn't get %z to show the UTC offset. But both worked for datetime.datetime.
>>> dt = datetime.datetime.strptime('8:30','%H:%M')
>>> dt.replace(tzinfo=pst).strftime('%m/%d/%Y %H:%M:%S %z (%Z)')
'01/01/1900 08:30:00 -0800 (PST)'
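The 1900-01-01 default is easy to check without pytz. This is a small sketch using naive objects only; combine() is shown as one way to attach a real date instead of relying on the default, and the variable names are just for illustration.

```python
import datetime

t = datetime.time(8, 30)

# A time has no date, so the date directives fall back to 1900-01-01.
formatted = t.strftime('%Y-%m-%d %H:%M')

# strptime() fills in the same default date when the string has none.
parsed = datetime.datetime.strptime('8:30', '%H:%M')

# combine() attaches an explicit date to the time.
combined = datetime.datetime.combine(datetime.date(1998, 1, 1), t)

print(formatted)
print(parsed)
print(combined)
```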
Finally, as I said, matplotlib assumes the current date. Notice too that it knows that it's currently daylight saving time!

matplotlib time example:

>>> md = pylab.num2date(pylab.datestr2num('8:30 -0700'), pst)
>>> md
datetime.datetime(2013, 6, 19, 8, 30, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
You can use datetime.replace() to swap out the date for whatever you want.
>>> print md.replace(1998, 1, 1)
1998-01-01 08:30:00-07:00
This works for tzinfo too.
>>> print dt.replace(tzinfo=pst)  # aware datetime instance
1998-01-01 12:15:00-08:00

ISO 8601 Format

All of the datetime classes, and therefore the datetimes that matplotlib returns, have an isoformat() method.
>>> md.isoformat()
'2013-06-19T08:30:00-07:00'

Timezone

Hope you noticed that there is a lot of timezone info.
>>> t.tzname()
'US/Pacific'
>>> md.tzname()
'PDT'
Please see the pytz and dateutil packages for complete details on using timezones, as there are package specific methods other than replacing tzinfo. For example, pytz exposes the localize method to create a datetime directly from a timezone object.
>>> pst.localize(datetime.datetime(2013,4,20,12,30))
datetime.datetime(2013, 4, 20, 12, 30, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
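Since tzinfo itself is only an abstract base class, it may help to see the smallest concrete subclass. This is a sketch, not a replacement for pytz or dateutil: FixedOffset is a hypothetical class with a constant UTC offset and no daylight saving rules, so it would be wrong for 'US/Pacific' in summer.

```python
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """Hypothetical minimal tzinfo: a constant UTC offset with no DST rules."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)   # never any daylight saving adjustment

    def tzname(self, dt):
        return self._name

pst_fixed = FixedOffset(-8, 'PST')
naive = datetime(1998, 1, 1, 12, 15)
aware = naive.replace(tzinfo=pst_fixed)   # same wall-clock time, now aware

iso = aware.isoformat()
stamp = aware.strftime('%H:%M %z (%Z)')
print(iso)
print(stamp)
```

With a fixed offset, replace(tzinfo=...) is safe because there is only one possible offset; a timezone with DST rules needs something like localize to pick the right one.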

Python time

This is a separate module for timing CPU operations. Also IPython has its own magical time, which can be called using %time. You can use time.clock() to measure how fast code runs on most platforms, and time.sleep() will make it pause. Wow! I hope I can remember all of this!
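A rough sketch of timing code with the standard library follows. Note that time.clock() was removed in Python 3.8, so this sketch uses timeit.default_timer, which exists in both Python 2 and 3; busy_work is a hypothetical workload made up for the example.

```python
import time
import timeit

def busy_work(n=100000):
    """Hypothetical workload: sum the first n squares."""
    return sum(i * i for i in range(n))

start = timeit.default_timer()    # a suitable timer for the platform
time.sleep(0.05)                  # pause for about 50 milliseconds
result = busy_work()
elapsed = timeit.default_timer() - start

# timeit calls the function repeatedly and returns the total time in seconds.
total = timeit.timeit(busy_work, number=10)

print('elapsed: %.3f s' % elapsed)
print('10 runs: %.3f s' % total)
```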

Saturday, June 15, 2013

[Python] read formatted input

I have come to the same conclusion as this blog and this "physics forum thread" (is this a real forum? or a copy of another forum?).
There is no Python equivalent of C/C++ fscanf or MATLAB fscanf, sscanf or textscan.
Here are some alternatives that I have found.
  1. numpy.genfromtxt() does more or less exactly the same thing. It reads strings, via StringIO, and files. There is a nice section on importing data in the NumPy User Guide.
    • instead of format specifiers like '%8f%4s%2d' use delimiter=(8, 4, 2) and set dtype=(float, str, int). Voila!
    • But genfromtxt() does so much more! Using dtypes you can also set field names. There are options for skipping headers and footers; see the documentation.
  2. parse 1.6.1 offers parse(), the opposite of format() on PyPI. I haven't tried it, and I wish there was more documentation, specifically examples of multiple parsed tokens, but it does seem to be a python version of textscan, but for strings only.
  3. The re module in the standard Python reference is an obvious choice to parse tokens from strings. There is even a section on simulating scanf that offers recipes for %f and other formatters.
  4. For simple delimiters, one can use either of the following:
    • csv module from the standard Python reference
    • numpy.loadtxt() which has the added advantage of reading in data as NumPy arrays.
    • str.split() obviously
There are probably many other methods, but for MATLAB converts, once they move from disbelief and denial onto acceptance, it's a pretty straightforward issue to resolve.
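As a rough illustration of two of the standard-library alternatives above, here is a sketch that reads a fixed-width record by slicing and a token-based record with re. parse_fixed_width and the sample strings are made up for this example; the regular expression follows the spirit of the scanf recipes in the re documentation rather than copying them.

```python
import re

def parse_fixed_width(line, widths, converters):
    """Hypothetical helper: split a line into fixed-width fields and convert each."""
    fields = []
    start = 0
    for width, convert in zip(widths, converters):
        fields.append(convert(line[start:start + width].strip()))
        start += width
    return fields

# Roughly what the format '%8f%4s%2d' would read.
record = '  3.1416 abc42'
fixed = parse_fixed_width(record, (8, 4, 2), (float, str, int))

# Token-based parsing: one regex group per scanf-style token.
pattern = re.compile(r'temp=\s*([-+]?\d+(?:\.\d*)?)\s+(\S+)\s+station\s+(\d+)')
match = pattern.match('temp= 21.5 C station 7')
temperature, unit, station = match.groups()
scanned = (float(temperature), unit, int(station))

print(fixed)
print(scanned)
```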

Wednesday, June 12, 2013

use generators, yield, xrange and obj.iter*() whenever possible

Python has a very cool feature. Java also has it, but Java isn't cool. Probably C and its variants have it, and I'm sure that super cool Ruby has it too. What about MATLAB, anyone? Bueller? ... Bueller?

iterators

An iterator uses less memory and is generally faster than an iterable.

Generators

Instead of returning a list from a function, turn it into an iterator by using yield.

iterable

def listy(x=5):
    return range(x)

iterator

def genny(x=5):
    idx = 0
    while idx < x:
        yield idx
        idx += 1
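Here is a short sketch of how the two versions behave when used. It is written with range and list() so it runs on Python 3 as well, and naturals is a hypothetical endless generator added to show why laziness matters: a list version of it could never return.

```python
from itertools import islice

def listy(x=5):
    return list(range(x))   # builds the whole list before returning

def genny(x=5):
    idx = 0
    while idx < x:
        yield idx           # hands back one value, then pauses here
        idx += 1

def naturals():
    """Hypothetical endless generator of 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

same_items = list(genny(5)) == listy(5)
first_three = list(islice(naturals(), 3))   # only three values are ever computed

print(same_items)
print(first_three)
```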

xrange

The example above is the same as the difference between range and xrange.

iterable

>>> listy = range(5)
>>> print listy
[0, 1, 2, 3, 4]

iterator

>>> for idx in xrange(5):
...     print idx,
0 1 2 3 4

Generator Expression

Just like you can make a list on the fly with a list comprehension, you can make a generator on the fly with a generator expression!

iterable

>>> listy = [idx for idx in range(5)]
>>> print listy
[0, 1, 2, 3, 4]

iterator

>>> animals = {'dog': 'spot', 'cat': 'felix'}
>>> genny = ('%s is a %s' % (name, animal) for animal, name in animals.iteritems())
>>> for animal_name in genny:
...     print animal_name,
spot is a dog felix is a cat
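To put a rough number on the memory claim, this sketch compares the size of a list comprehension with the size of the equivalent generator expression. sys.getsizeof only measures the container object itself, not the integers inside it, so treat the figures as indicative; the value of n is arbitrary.

```python
import sys

n = 100000
squares_list = [x * x for x in range(n)]   # all n items exist at once
squares_gen = (x * x for x in range(n))    # items are produced on demand

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small and independent of n

total = sum(squares_gen)                   # consumes the generator

print(list_size > gen_size)
print(total == sum(squares_list))
```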

obj.iter*()

This actually links to a great section in the tutorial on looping techniques, where you will see examples of xrange(), enumerate(), and iteritems(). You can see in the generator expression above that a dictionary has a method called iteritems() that produces an iterator (generator in Pythonish) instead of an iterable (like a Python list).

iterable

>>> animals = {'dog': 'spot', 'cat': 'felix'}
>>> listy = animals.items()
>>> print listy
[('dog', 'spot'), ('cat', 'felix')]

iterator

>>> animals = {'dog': 'spot', 'cat': 'felix'}
>>> genny = animals.iteritems()
>>> for animal in genny:
...     print animal,
('dog', 'spot') ('cat', 'felix')

Last Word

Just like in Java, an iterator is a one-use container. Each time its next() method is called it advances to the next item until it reaches the end, and then it raises a StopIteration exception. When the loop catches the exception, it exits gracefully. In order to reuse it you would have to create a new generator, but if you find that you need to use it multiple times, then you are better off using an iterable instead. In Summary ...
  • iterator: one time use, faster and uses less memory, good when you only need to iterate through items one time
  • iterable: slower, uses more memory, good if you need to use any list methods or if you need to iterate through the same items many times.
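The one-time-use behaviour summarized above can be seen in a few lines. This is a sketch with made-up variable names, using the built-in next() function, which works on both Python 2.6+ and Python 3.

```python
genny = (idx for idx in range(3))
first_pass = list(genny)     # consumes every item
second_pass = list(genny)    # nothing left the second time

fresh = iter(range(2))
a = next(fresh)
b = next(fresh)
try:
    next(fresh)              # past the end
    exhausted = False
except StopIteration:
    exhausted = True         # a for loop catches this for you

listy = list(range(3))
reusable = (list(listy), list(listy))   # a list can be walked repeatedly

print(first_pass)
print(second_pass)
print(exhausted)
```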

Tuesday, June 11, 2013

Religion can be philosophy, aristocracy or imaginary friend

Disclaimer: I know this is a touchy subject, and I am not the most tactful person, so please
stop reading now!
if you have strong views on religion. What I am about to express is my opinion, and is not meant to influence or offend anyone. If you are already offended then I apologize, and hope that by turning back now you can avert any more offense.

Another theory; religion could be dissected as either a philosophy or an aristocracy. Wait, hear me out. I know that is way oversimplifying something so complex and evidently intricately woven into the human condition. But I am expressly thinking about understanding the will of each religion's deity/deities - what is done with those instructions is a different topic.

So let's consider, hypothetically, that a religion has a deity or some deities. This is essentially what defines a religion, right? That it has gods? Perhaps a religion has no gods, merely guidelines that were divined by a group of humans that are now revered for their amazing insight. That sounds a bit like a philosophical cult which is my first proposition, but I'm getting ahead of myself. Now that we've established a god or gods, how does information exchange occur?

  1. There is an elite class of god listeners who alone hear god's messages and then repeat them to the rest of that god's followers.
  2. All of that god's followers attempt to divine their god's meaning and then share and debate their theories to come up with some consensus.
Number one is clearly a form of aristocracy because the god has divined who shall be the people who receive the message and make decisions for others just as a monarch is generally chosen through some cosmic means. However if the god-listener class is elected it might possibly be considered some form of democracy. More likely the god-listener class takes that right through an exertion of their power (either by force or through influence), which might make it either a tyranny or oligarchy. Perhaps these types of institutions are all called republics - a small group represents a larger group. But it can go terribly wrong if the larger group doesn't question the validity of the small group. I wonder is it a sin to question the pope? Or a pastor's interpretation of the bible. Even a simple Sunday morning comic strip has multiple interpretations; isn't it more likely that a literary work of unknown origin transcribed multiple times may have ambiguous meaning? I'm not saying that we can't ultimately come to a consensus, there is meaning everywhere that we can all agree on, e.g. that murder is generally bad is agreed by all. What I mean is that unquestioning acceptance of religious, governmental or scientific dogma is both lazy and very dangerous. As Socrates said in the Apology, "an unexamined life is not worth living."

Number two sounds like a philosophy to me. I like it.

OK, let's take this a few steps further. Now replace god or gods with some other belief. Say god = the universe? Or gods = scientific theories, because let's be frank, no matter how much proof we have, even acceptance of a law is still merely just a belief. We believe that electrons tunnel through energy barriers because we have seen so much evidence that suggests convincingly that it may be true. But Einstein and Copernicus and Galileo can attest to the fact that even "scientific" theories and laws evolve and shift as new evidence comes to light. So I digress, my point with this last exercise is that religion has many societal parallels.

I also realized, while talking with my wife about suffering and grief, that even if you can't hear your deity's or deities' message, when you are in need, merely believing that they exist and love you may be a solace, and I like that too. The universe is a cold hard place, and it's always nice to have a friend.

Equality increases self esteem

Similar to my post on "the second law of infodynamics" this post proposes another completely hypothetical theory of human social interaction.
Equality increases self esteem.
Right now my son loves wearing pink, and said today, "I'm wearing a dress." Next week it will be a different color. He is completely innocent. These issues seem trivial to us today in the socially enlightened 21st century. In fact we view it as a victory over the absurd and old-fashioned dogma about distinct gender roles.

So let's examine that dogma. What was its motivation? I propose that it was a defensive coping mechanism. To cope with what? What could possibly happen if a boy did wear a dress? Or a woman was a combatant? Or a same sex marriage occurred? Did that mean that I might start wearing dresses, because secretly I wanted to but my society told me it was wrong so I felt insecure and bad about myself? If I was secure about my individuality, why would I care what another person did? Merely being irked or irritated is not a reason for outrage, is it? No, there has to be a deeper reason. Our prejudices are manifestations of our inward fears. We are racist because we seek to dehumanize and thereby justify the luxuries we take for granted at the expense of others' suffering. We are sexist and homophobic for the same reasons.

But what happens when we remove these barriers? Then we don't have to be defensive. There is nothing to cope with. We can feel good about ourselves whoever we are. Equality increases our self esteem.

Monday, June 10, 2013

quantities and units

[UPDATED 2013-07-24] add buckingham.py

Main Contenders

Looking at doing calculations with units? Let's see what's out there. Start by doing a quick Google search with Python + Units. The first site that looks like a match is Python Units.

Python units 0.06

  • Last updated: 2013-2-25
  • Download: PyPI
  • Documentation: None [1]
  • Repository: Bitbucket
  • Last commit: 2013-02-24
  • Owner: Aran Donohue
Then there are a few SO hits and some personal blog entries similar to pp. Python Quantities seems to be a recurring theme.

Python quantities 0.10.1

A little further down is a new contender called Pint.

Python Pint 0.2

With some digging a few more packages pop up. A relative newcomer is Python-numericalunits.

Python numericalunits 1.11

  • Last updated: 2013-02-21
  • Download: PyPI
  • Documentation: None [1]
  • Repository: Github
  • Last commit: 2013-02-22
  • Owner: Steve Byrnes
One package that was really hard to find (I only saw it in an SO post) was Unum.

Python Unum 4.1.1

  • Last updated: 2010-06-19
  • Download: PyPI
  • Documentation: linked to from here
  • Repository: Bitbucket
  • Last commit: 2012-03-25
  • Owners: Chris MacLeod, Pierre Denis

Others

There are probably several others, but I think these are the main contenders. I found some by using search within PyPI, eg: magnitude-0.9.1  (c. 2007). Several are listed in this SO question including buckingham.py. Finally, DimPy (c. 2008) just randomly appeared way down the list when I Googled how to add a new unit to quantities, which is possible, but not well documented.
>>> US_cent = pq.UnitCurrency('cent', 1, u_symbol=u'¢')
>>> US_dollar = pq.UnitCurrency('dollar', 100 * US_cent,
...                               'cent', u_symbol=u'$')
>>> cost = 10 * US_cent / pq.kWh
>>> print cost

SciPy.constants

I think it's important to note that SciPy does have many physical constants and conversion-factors to SI units. In fact it's a bit disappointing to see such a flagrant violation of the DRY principle with numerous physical constants and CODATA files floating around. But SciPy does not really have a good representation of units and a framework for using units in calculations.

Usage

Most of the packages work the same way: multiplication by the units creates a new class instance of the units. Here is a snippet from Pint's documentation:
>>> distance = 24.0 * ureg.meter
>>> print(distance)
24.0 meter
>>> time = 8.0 * ureg.second
>>> print(time)
8.0 second
>>> speed = distance / time
>>> print(speed)
3.0 meter / second
The exception to this pattern is Python-units which uses a call to create objects.
>>> meters = unit('m')
>>> distance = meters(10)
Python-quantities is the only package with dependencies; it depends on NumPy, which really doesn't matter to me. Pint also supports NumPy arrays, which is important.
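To show what "multiplication by the units creates a new instance" means mechanically, here is a toy sketch. Quantity is a hypothetical class written for this post; it is not how Pint, quantities or any of the other packages are actually implemented, and it does no unit conversion at all.

```python
class Quantity(object):
    """Hypothetical toy quantity: a magnitude plus a dict of unit exponents."""

    def __init__(self, magnitude, units=None):
        self.magnitude = magnitude
        self.units = dict(units or {})   # unit name -> exponent

    def _combine(self, other, sign):
        # Add (multiply) or subtract (divide) the exponents of each unit.
        units = dict(self.units)
        for name, power in other.units.items():
            units[name] = units.get(name, 0) + sign * power
            if units[name] == 0:
                del units[name]
        return units

    def __mul__(self, other):
        if isinstance(other, Quantity):
            return Quantity(self.magnitude * other.magnitude, self._combine(other, +1))
        return Quantity(self.magnitude * other, self.units)

    __rmul__ = __mul__   # so that 24.0 * meter works too

    def __truediv__(self, other):
        if isinstance(other, Quantity):
            return Quantity(self.magnitude / other.magnitude, self._combine(other, -1))
        return Quantity(self.magnitude / other, self.units)

    __div__ = __truediv__   # Python 2 spelling of the same operator

    def __repr__(self):
        parts = ['%s**%d' % item for item in sorted(self.units.items())]
        return '%s %s' % (self.magnitude, ' * '.join(parts))

meter = Quantity(1.0, {'meter': 1})
second = Quantity(1.0, {'second': 1})

distance = 24.0 * meter      # a plain number times a unit gives a new Quantity
time = 8.0 * second
speed = distance / time
print(speed)
```

The real packages add the hard parts on top of this idea: conversion between compatible units, unit registries, and NumPy array support.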

Snap Decision

Difficult to compare and decide without trying them all out. Who has time for that? So I think units and numericalunits are both too undocumented for my taste. Unum looks like it is unsupported and/or not active anymore. That leaves Pint and quantities. Pint looks really slick, I like their design principles and it looks like their 0.3 release is coming out soon. It looks like quantities has been around for a while, there are both positive and negative reviews, although to be fair that post about temperature conversions from C to F is the main reason SciPy doesn't have support for units conversions although it does have a great constants class with units. So I think I'll try quantities first, but keep my eye on Pint too. I hope to have a part II with some comparisons between these two soon.

Footnote

[1] There is some documentation for both units and numericalunits on their PyPI sites.

Tuesday, June 4, 2013

Sphinx with NumPyDoc and Consolidated Fields

[UPDATE 2014-01-31] This is a major update - Sphinx-1.3 now packages Napoleon, allowing you to use Google or NumPy style documentation and have them produce Sphinx formatted documentation.

Sphinx documentation is awesome as is, although IMHO the unformatted docstring is not easy to read (e.g. using help(my_fun) to get help on my_fun will show the Sphinx ReST roles and directives). A couple of cool tweaks are consolidated fields and the NumPyDoc extension. For contrast I've also included the Google recommended Python documentation style for unformatted docstrings.

Monday, June 3, 2013

XLRD vs OPENPYXL, Round II

[UPDATE 2013-12-02] The major issue discussed in this post, RE: charts not read, worksheets skipped and out of order, was resolved and pulled into the latest release 1.7.0, along with many other bug fixes. With this latest version, I think that OpenPyXL can be considered the dominant OOXML (post 2007) Excel reader and writer. Note that OpenPyXL is the default Excel reader for Pandas, the rapidly growing Python data analysis toolset.

This is a continuation of the previous post on reading Excel from Python. Uh, I might have called it too early! XLRD pulls ahead, but will it win the bout? Read on ...

Reading the contents

Assume we have a sample Excel spreadsheet with 3 worksheets and 2 charts on sheets. In OpenPyXL, you load the workbook, but right away you notice something wrong with the sheet names. Where's 'Sheet3'?
>>> wb_openpyxl = load_workbook(sample)
>>> wb_openpyxl.get_sheet_names()
['Sheet1',
 'Sheet2',
 'Sheet3']
Loading the sheets with XLRD gets it right.
>>> wb_xlrd = open_workbook(sample)
>>> wb_xlrd.sheet_names()
[u'Sheet1',
 u'Sheet2',
 u'Sheet3']
XLRD returns the sheets in the same order as they are visible in the actual spreadsheet, and omits the charts which don't actually contain any data. Unfortunately OpenPyXL can't tell charts from sheets just yet, and is actually naming some of the sheets incorrectly after the charts.
'Sheet1' --> 'Sheet1'
'Sheet2' --> 'Sheet2'
'ChartA' --> 'Sheet3'
This would be OK, since all of the sheets are there, and you could use the sheets' indices, but if you only know their names and not the order, then this is an issue. It has been reported in issues #179, #165 and #209. Unfortunately, this issue affects the optimized reader as well. I sent a pull request with a proposed fix for it that has already been merged with master. This issue was resolved and pulled into the current release, OpenPyXL-1.7.0.

Reading Cells

OpenPyXL can use the Excel format, EG: 'A3' or by row & column.
>>> ws1_openpyxl = wb_openpyxl.get_sheet_by_name('Sheet1')
>>> ws1_openpyxl.cell('A3').value
XLRD only reads cells by (row, column).
>>> ws1_xlrd = wb_xlrd.sheet_by_name('Sheet1')
>>> ws1_xlrd.cell_value(2, 0)
Both can let you slice the data, but OpenPyXL also allows ranges using Excel format.
>>> ws1_openpyxl.range('A1:C2')
A weird thing about the optimized reader in OpenPyXL is that it only allows reading sheet contents using the iter_rows() function, which in a way defeats the purpose of the optimized reader, since you have to read in all of the columns in each row!
>>> all_rows = [r for r in ws1_openpyxl.iter_rows()]

The Winner

I think XLRD wins this round, because even though its documentation is sparse, it's not rocket science, and it gets the worksheets relatively quickly and, more importantly, correctly! The screw up with the charts is kind of a non-starter for OpenPyXL.

And another thing occurred to me during Round II. XLRD can open any Excel spreadsheet dating back to like 1995, but OpenPyXL is only for Excel 2007 and newer, which if you didn't know is a zipped XML file.
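If you want to see the zipped XML for yourself, here is a minimal sketch using only the standard library. It assumes the usual OOXML layout where the sheet list lives in xl/workbook.xml; the sheet_names() helper is my own, and the in-memory archive below is a stripped-down stand-in, not a complete workbook that Excel would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML part.
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'


def sheet_names(xlsx_file):
    """Return sheet names in workbook order from an .xlsx path or file object."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read('xl/workbook.xml'))
    return [sheet.get('name') for sheet in root.iter('{%s}sheet' % MAIN_NS)]


# Stand-in archive holding only the one part that sheet_names() reads.
WORKBOOK_XML = (
    '<workbook xmlns="%s"><sheets>'
    '<sheet name="Sheet1" sheetId="1"/>'
    '<sheet name="Sheet2" sheetId="2"/>'
    '<sheet name="Sheet3" sheetId="3"/>'
    '</sheets></workbook>' % MAIN_NS
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as archive:
    archive.writestr('xl/workbook.xml', WORKBOOK_XML)
buf.seek(0)

names = sheet_names(buf)
```

With a real file you would pass the path instead of the buffer, EG: sheet_names('sample.xlsx'), where the filename is just a placeholder.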

Finally, even though XLRD doesn't let you use the easy Excel cell reference notation, it is generally faster. And the iter_rows() limitation for the optimized reader in OpenPyXL is a bit annoying, since you're forced to read in many columns that you might not have wanted to read!

please start your project major version at zero

This post is probably a repeat of something on Coding Horror, but in case it hasn't been stated before, I'd like to make the case for starting your version numbering at zero. Why? Well, because some code is available before it has been thoroughly tested and depends on user feedback to determine when the code has enough of the bugs worked out to consider it mature. During this time its version number may be changing rapidly, but the one thing that distinguishes a fledgling package from a robust one is the zero in front of the version number. It says, hey I was just born. I might not be complete. I might have bugs. I might break or die. I'll let you all know when I get the code equivalent of a bar-mitzvah by changing my zero to a one.
if major_version:
    print "I'm mature.",
    print "This is release #%d." % major_version
else:
    print "I'm newish,",
    print "so I might still have some big issues."
Is that so hard?
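For anyone who wants to run the check above against a real version string, here is a small sketch. It assumes a plain dotted version like '0.3' or '1.7.0'; the function names are hypothetical, and anything fancier (release candidates, epochs) is out of scope.

```python
def major_version(version):
    """Return the leading integer of a dotted version string such as '0.3'."""
    return int(version.split('.')[0])


def describe(version):
    """Say whether a package claims to be mature, based on its major version."""
    major = major_version(version)
    if major:
        return "I'm mature. This is release #%d." % major
    return "I'm newish, so I might still have some big issues."


newish = describe('0.3')
mature = describe('1.7.0')
```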