Friday, April 22, 2016

Uncertainty Unwrapped

Please read my disclaimer.

I'm proud to announce UncertaintyWrapper at the Cheese Shop. This work was supported by my employer, SunPower Corp. and is currently offered with a standard 3-clause BSD license. The documentation, source code and releases are also available our SunPower org GitHub page.

So what does uncertainty_wrapper do? Let's say you have a Python function, to calculate solar position, and the function uses a C/C++ library via the Python ctypes library. Or maybe you just have a really complicated set of calculations that you repeat 8760 times, and you want it to run super fast, so you don't want it to calculate derivatives and uncertainty at every internal step, just the final output. Oh and by the way, you want all 8760 calculations vectorized, _ie_: done concurrently as much as possible.

Heres an example using PVLIB of just the first 24 hours.

import numpy as np  # v1.11.0
import pandas as pd  # v0.18.0
import pytz  # v2016.1
import pvlib  # v0.3.2
from uncertainty_wrapper import unc_wrapper_args  # v0.4.1

PST = pytz.timezone('US/Pacific')  # Pacific Standard Time
times = pd.DatetimeIndex(start='2015/1/1', end='2015/1/2', freq='1h', tz=PST)  # date range

# arguments for the number of observations
# new in UncertaintyWrapper==0.4.1, jagged arrays are okay
latitude, longitude, pressure, altitude, temperature = 37., -122., 101325., 0., 22.

# standard deviation of 1% assuming normal distribution
covariance = np.tile(np.diag([0.0001] * 5), (times.size, 1, 1))  # tile this for the number of observations

@unc_wrapper_args(1, 2, 3, 4, 5)
# indices specify positions of independent variables:
# 1: latitude, 2: longitude, 3: pressure, 4: altitude, 5: temperature
def spa(times, latitude, longitude, pressure, altitude, temperature):
    # location class only used prior to pvlib-0.3
    dataframe = pvlib.solarposition.spa_c(times, latitude, longitude, pressure, altitude, temperature)
    retvals = dataframe.to_records()
    zenith = retvals['apparent_zenith']
    zenith = np.where(zenith<90, zenith, np.nan)
    azimuth = retvals['azimuth']
    return zenith, azimuth

ze, az, cov, jac = spa(times, latitude, longitude, pressure, altitude, temperature, __covariance__=covariance)
df = pd.DataFrame({'zenith': ze, 'az': az}, index=times)  # easier to view as dataframe
print df
#                                    az     zenith
# 2015-01-01 00:00:00-08:00  349.297715        NaN
# 2015-01-01 01:00:00-08:00   40.210628        NaN
# 2015-01-01 02:00:00-08:00   66.719304        NaN
# 2015-01-01 03:00:00-08:00   80.930185        NaN
# 2015-01-01 04:00:00-08:00   90.852887        NaN
# 2015-01-01 05:00:00-08:00   99.212426        NaN
# 2015-01-01 06:00:00-08:00  107.181217        NaN
# 2015-01-01 07:00:00-08:00  115.450451        NaN
# 2015-01-01 08:00:00-08:00  124.564183  84.113440
# 2015-01-01 09:00:00-08:00  135.023137  74.984664
# 2015-01-01 10:00:00-08:00  147.247403  67.475783
# 2015-01-01 11:00:00-08:00  161.371578  62.273878
# 2015-01-01 12:00:00-08:00  176.922804  60.008978
# 2015-01-01 13:00:00-08:00  192.742327  61.017538
# 2015-01-01 14:00:00-08:00  207.519768  65.144340
# 2015-01-01 15:00:00-08:00  220.494108  71.839001
# 2015-01-01 16:00:00-08:00  231.600910  80.422988
# 2015-01-01 17:00:00-08:00  241.184075  89.948123
# 2015-01-01 18:00:00-08:00  249.726361        NaN
# 2015-01-01 19:00:00-08:00  257.751550        NaN
# 2015-01-01 20:00:00-08:00  265.873170        NaN
# 2015-01-01 21:00:00-08:00  275.014534        NaN
# 2015-01-01 22:00:00-08:00  287.078877        NaN
# 2015-01-01 23:00:00-08:00  307.283646        NaN
# 2015-01-02 00:00:00-08:00  348.921385        NaN

# covariance at 8AM
idx = 8
print times[idx]
# Timestamp('2015-01-01 08:00:00-0800', tz='US/Pacific', offset='H')
nf = 2  # number of dependent variables: [ze, az]
print cov[(nf * idx):(nf * (idx + 1)), (nf * idx):(nf * (idx + 1))]
# [[ 0.6617299  -0.6152971 ]
#  [-0.6152971   0.62483904]]

# standard deviation
print np.sqrt(cov[(nf * idx), (nf * idx)]) / ze[idx]
# 0.0096710802029002577

# Jacobian at 9AM
nargs = 5  # number of independent args
print jac[nf*(idx-1):nf*idx, nargs*(idx-1):nargs*idx]
# [[  5.56456716e-01  -6.45065654e-01  -1.37538277e-06   0.00000000e+00    4.72409055e-04]
#  [  8.29163154e-02   6.47436098e-01   0.00000000e+00   0.00000000e+00    0.00000000e+00]]

First this tells us that the standard deviation of the zenith is 1% if the input has a standard deviation of 1%. That's reasonable. This also tells that zenith is more sensitive to latitude and longitude than pressure or temperature and more sensitive to latitude than azimuth is.

Thursday, November 5, 2015

Wrangling Django ArrayField Migrations

Unfortunately you can't depend on makemigrations to generate the correct SQL to migrate and cast data from a scalar field to a PostgreSQL ARRAY. But Django provides a nifty RunSQL that's also described in this post, "Down and Dirty - 9/25/2013" by Aeracode, the original creator of South predecessor of Django migrations. That post even mentions using RunSQL to alter a column using CAST.

The issue and trick to migrating a column to an ArrayField is given by PostgreSQL in the traceback, which says:

column "my_field " cannot be cast automatically to type double precision[]
HINT:  Specify a USING expression to perform the conversion.
Further hints can be found by rtfm and searching the internet, such this stackoverflow Q&A. My procedure was to use makemigrations to get the state_operations and then wrap each one into a RunSQL migration operation.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import migrations, models
import datetime
from django.utils.timezone import utc
import django.contrib.postgres.fields
import simengapi_app.models
import django.core.validators

class Migration(migrations.Migration):

    dependencies = [
        ('my_app', '0XYZ_auto_YYYYMMDD_hhmm'),

    operations = [
            ALTER TABLE my_app_mymodel
            ALTER COLUMN "my_field"
            TYPE double precision[]
            USING array["my_field"]::double precision[];
                        base_field=models.FloatField(), default=list,
                        verbose_name=b'my field', size=None

Tuesday, October 20, 2015

REST-ful revelations

I've started using Django REST Framework, and it is simply magic!

Here is a technique I've used to input lists of primitive types and serializers with many=True

from functools import partial
from rest_framework import viewsets
from rest_framework.response import Response
from rest_framework import status
from my_app.serializers import MyNestedModelSerializer

class MyNestedModelViewSet(viewsets.ViewSet):
    serializer_class = MyNestedModelSerializer

    def create(self, request):
        serializer = self.serializer_class(
        # get the submodel list serializer since it can't render/parse html
        submodel_list_serializer = serializer.fields['submodels']
        # make a partial function by setting the submodel list serializer
        partial_get_value = partial(custom_get_value, submodel_list_serializer)
        # monkey patch submodel_list_serializer.get_value() with partial function
        submodel_list_serializer.get_value = partial_get_value
        if serializer.is_valid():
            simulate_data =
            # do stuff ...
            return Response(, status=status.HTTP_201_CREATED)
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

The function `custom_get_value()` uses JSON to parse the input:

def custom_get_value(serializer, dictionary):
    if serializer.field_name not in dictionary:
        if getattr(serializer.root, 'partial', False):
            return empty
    # We override the default field access in order to support
    # lists in HTML forms.
    if html.is_html_input(dictionary):
        listval = dictionary.getlist(serializer.field_name)
        if len(listval) == 1 and isinstance(listval[0], basestring):
            # get only item in value list, strip leading/trailing whitespace
            listval = listval[0].strip()
            # add brackets if missing so that it's a JSON list
            if not (listval.startswith('[') and listval.endswith(']')):
                listval = '[' + listval + ']'
            # try to deserialize JSON string
                listval = json.loads(listval)
            except ValueError as err:
                # return original string and log error
            # set the field with the new value list
            dictionary.setlist(serializer.field_name, listval)
        val = dictionary.getlist(serializer.field_name, [])
        if len(val) > 0:
            # Support QueryDict lists in HTML input.
            return val
        return html.parse_html_list(dictionary, prefix=serializer.field_name)
    return dictionary.get(serializer.field_name, empty)

Thursday, May 21, 2015

Dr. Horrible's Sing Along Blog

Love struck super villain (Neil Patrick Harris) loses to super hero (Nathan Fillion) musical. This just never get’s old. Almost as good as the official Star Wars trailer.

Thursday, May 14, 2015

[MATLAB] Do *not* use `obj.empty` to preallocate object array

FYI: You do not need to use `obj.empty` to preallocate an object array.

In fact as soon as you assign a value to any element in the object array it grows the array to that size, which allocates (or reallocates) RAM for the new object array, therefore defeating the point of preallocating space.

From Empty Arrays section of OOP documentation:

“If you make an assignment to a property value, MATLAB calls the SimpleClass constructor to grow the array to the require size:”

Instead if you want to preallocate space for an object array, grow the array once by assigning the last object first. This requires the class to have a no-arg constructor. Each time you grow your array you will reallocate RAM for it, wasting time and space, so do it once with the max expected size of the array. See Initialize Object Arrays and Initializing Arrays of Handle Objects in the OOP documentation.

  >> S(max_size) = MyClass(args)

Another option is to preallocate any other container like a cell array (best IMHO), structure or containers.Map and then fill in the class objects as they are created. An advantage to this is you don’t have to subclass matlab.mixin.Heterogeneous to group different classes together.

  >> S = cell(max_size); args = {1,2,3;4,5,6;7,8,9};
  >> for x = 1:size(args,1), S(x) = MyClass(args{x,:});end

The only time to use an empty object is if you want it as a default for the situation where nothing gets instantiated, and you need the it be an instance of the class. Of course any empty array will do this, IE: '', [] and {} are also empty.

  >> S = MyClass.empty
  >> if blah,S = MyClass(args);end
  >> if isa(S, 'MyClass') && isempty(S),do stuff; end

Another reason might be to clear defaults if the constructor is called recursively, although obj.delete will do the same thing.

I hope this helps someone; it definitely helped me understand the odd nature of MATLAB. This behavior is because everything in MATLAB is an array, even a scalar is a <1x1 double> read the C-API mxArray for external references and mwArray for compiled/deployed MATLAB for more info.

MATLAB = Matrix Laboratory
Class definitions didn’t appear until 2008. Other languages like C++, Java, Python and Ruby are object first. So the empty method is meant to duplicate the ability to be empty similar to other MATLAB datatypes such as double, cell, struct, etc. IMO outside of MATLAB it's a very artificial and somewhat meaningless construct.

Wednesday, April 22, 2015

Git big media on Windows

[UPDATE 2015-06-09] I have switched to git-fat-0.5.0 [2015-05-06]; the fork on PyPI maintained on GitHub by Alan Braithwaite of Cyan Inc. Unfortunately, I could not figure out how to clone a git-media repo, therefore git-media sucks.

Git-Fat - just works

Basically this comes with everything you need to work on Windows, Mac or Linux. It uses rsync for transport, and the rest is mostly written in Python but does depend on some libraries that are standard in Linux and have mature ports in Windows and Mac. One of the major benefits of git-fat over git-media is that it uses a .gitfat config file which updates your .git/config when git fat init is run. This is similar to git submodules and makes repos portable. In general there's more functionality and features than git-media. For example, you can list the files managed by git-fat, check for orphans and pull data from or push data to your remote storage. The only catch is that the wheel file at PyPI has metatags for win32 not amd64. This is easy to fix, but I think there are a couple of use cases that might differ from how the distribution was implemented.


If you look at a Linux install, the repository has a symlink to in bin called git-fat. Why not just bootstrap git-fat if we only really need one file. Just dump everything in a single folder, change the file name to git-fat and make sure there's a shebang that uses #! /usr/bin/env python which git seems to prefer, then stick it on your path. This works for both msys-git and Windows cmd.


msysgit comes with Git Bash, a posix shell which includes many Linux libraries ported to Windows, such as gawk and ssh. Unfortunately it does not come with rsync, however you can get rsync from the msys source either from mingw-w64 (that's where I got it), from msys2, from the original mingw project, from mingw builds and from lots of places. You could even get it from cygwin. I usually stick files like this in my local bin folder which is always first on my path in git bash. You'll need to also grab the iconv, intl, lzma and popt msys libraries which rsync depends on. Anyway, since you have these libraries, you don't need the ones bundled in the wheel, however, git-fat is written to look for those bundled files if it detects that your platform is Windows, so just comment out those lines. You will need to change awk to gawk, since awk is a shell script that calls gawk. Again you can bootstrap this file, ie: put it in your local bin folder or install it into your Python site-packages and/or scripts folder.


This is the way I ended up using it. You can download my version here and install it with pip. I put the windows libraries into the site-packages git-fat folder instead of in scripts and then in the git-fat script, added the site-packages git-fat folder to the shell's path. Then I called the git-fat module as a script by adding a file to it which basically imports and calls it using Python with the -m option, but you could just as easily call the module as a script. This just keeps these extra libraries bundled together rather than dumping them into the scripts folder with everything else. Also since I mostly use git bash it doesn't put git-fat's libraries ahead of git's since they both use gawk and ssh.


Usage is extremely easy compared to git-media, which is a plus! Note these instructions are for msysgit git bash. For Windows cmd window replace git fat with git-fat everywhere. Both methods should work fine.

  1. Clone a repo that uses git-fat: git clone my://remote/repo.git
  2. At this point there are only placeholders for your files with the same names, but just sha numbers that tell git-fat which file to grab from your remote storage
  3. Run git fat init which sets up the filters and smudges that tell your local repo how to use git-fat with the .gitattributes file which is part of the repo already.
  4. Run git fat pull which downloads your files from the remote storage specified in the .gitfat, which is also already in the repo
  5. Run git fat list to see a list of managed files
  6. Run git fat status to see a list of orphans waiting to be pulled/pushed?

Creating a repo and setting it up to use git-fat is also easy. There is great help in the readme at PyPI and the readme at GitHub.

  1. Create a .gitfat file that specifies where rsync should store files. Note there are no indents. A windows UNC path seems to work fine.
    remote = //server/share/repo/fat
  2. Create a .gitattributes file to specify which files to store at the remote
  3. Commit the .gitfat and .gitattribute files
  4. Run git fat init to set up your local .git/config
  5. Hack, commit, push, etc.
  6. Run git fat push to send stuff to your remote

Git-Media - sucky

Finally time to install Ruby. You're going to need it if you want to use git-media which let's you mix big biinary files within your git repo, but store them in some remote host, which could be google-drive, amazon s3, another server via ssh/scp or a network share. Why don't you want to store big binary files in your git repo? Since Git stores each revision instead of deltas, that means that it will quickly blow up as you make new commits.


Super easy, they recommend 2.1, no admin rights required, unzips into c:\ just like python, I checked all of the options: tk/tcl, add ruby to path and what was the last option? Then I ran gem update.

git-media gem

gem install git-media trollup right_aws

right_aws OpenSSL::Digest issue

There is a tiny issue with right_aws where it outputs the message:

Digest::Digest is deprecated; use Digest
which is easily fixed by following the comments in git-media issue #3 or right_aws pull request #176.

git-media setup

The readme on the github overview page has everything you need to know.

other large file storage

Fork me on GitHub