Wednesday, June 19, 2013

Dates and Datetimes


NumPy Datetimes

NumPy has datetimes, called datetime64 to avoid confusion with the Python datetime module and class, but it only accepts ISO 8601 formats for text entries, e.g. 2013-06-19T16:14:32.00-0700. It will also take a Python datetime.datetime() or numpy.datetime64() as an argument, but NumPy will always shift the date/time to the local timezone. If the Python datetime.datetime() object is naive (i.e. no tzinfo), then NumPy will assume it is UTC (Zulu, GMT or +0000). Calling numpy.datetime64().item() will return the UTC-equivalent Python datetime.datetime() object.

Examples with np.datetime64 dtype:

>>> import numpy as np
>>> from datetime import datetime
>>> np.datetime64(datetime.today().isoformat())
numpy.datetime64('2013-06-19T16:17:27.612000-0700')
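
For comparison, here is a minimal sketch of the same round trip, assuming Python 3 and NumPy 1.11 or later. In those releases datetime64 is timezone-naive and stores exactly what it is given, which differs from the local-time shifting described above. The variable names are mine, not from the original session.

```python
import numpy as np
from datetime import datetime

# Build a datetime64 from an ISO 8601 string without a UTC offset.
stamp = np.datetime64('2013-06-19T16:14:32')

# item() hands back a plain, naive datetime.datetime at second resolution.
as_python = stamp.item()

# A naive datetime.datetime can also be passed straight to datetime64.
round_trip = np.datetime64(datetime(2013, 6, 19, 16, 14, 32))

print(stamp.dtype)          # units are inferred from the string
print(as_python)
print(round_trip == stamp)  # comparison works across different units
```

The string has seconds but no fractional part, so NumPy picks second units, while the value built from datetime.datetime carries microsecond units.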

Examples with np.array:

>>> dt = np.dtype([('dates', 'datetime64[D]'), ('dni', float)])
>>> data = [('2001-01-01', 834.34),
...         ('2001-01-02', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.date(2001, 1, 1), 834.34),
       (datetime.date(2001, 1, 2), 635.12)],
      dtype=[('dates', '<M8[D]'), ('dni', '<f8')])

Repeat that with a datetime using Zulu time.

>>> dt = np.dtype([('dates', 'datetime64[m]'), ('dni', float)])
>>> data = [('2001-01-01T00:30Z', 834.34),
...         ('2001-01-01T01:30Z', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.datetime(2001, 1, 1, 0, 30), 834.34),
       (datetime.datetime(2001, 1, 1, 1, 30), 635.12)],
      dtype=[('dates', '<M8[m]'), ('dni', '<f8')])

Repeat that with a datetime using a zero UTC offset (-0000) for Zulu.

>>> dt = np.dtype([('dates', 'datetime64[m]'), ('dni', float)])
>>> data = [('2001-01-01T00:30-0000', 834.34),
...         ('2001-01-01T01:30-0000', 635.12)]
>>> npdata = np.array(data, dt)
>>> npdata
array([(datetime.datetime(2001, 1, 1, 0, 30), 834.34),
       (datetime.datetime(2001, 1, 1, 1, 30), 635.12)],
      dtype=[('dates', '<M8[m]'), ('dni', '<f8')])
N.B.: NumPy converts strings for you, so you don't have to use np.datetime64 to cast them as datetime64 dtypes. It also converts them to Python datetime.datetime or datetime.date, depending on your date units, and shifts them to GMT (or Zulu) time. NumPy seems to handle dates and datetimes with default units of days, e.g. [D], but for structured arrays you must specify the datetime units, e.g. [D], [m], [s] or [ms] (see datetime units), in addition to datetime64 as the dtype, or NumPy gives you this cryptic error:
ValueError: Cannot create a NumPy datetime other than NaT with generic units
Thanks to this answer on SO for unriddling that puzzle. If you make a NumPy datetime with nothing, you'll discover that NaT means "Not a Time". In addition you may get this error, which is a bit more informative:
TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind'
This is because datetime.datetime uses microseconds [us] as its default, but NumPy uses days [D]. Specify the dtype using [us] or some other seconds-based units, e.g. [s] or [ms], and it should work.
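
The fix for both errors can be sketched as follows, assuming Python 3 and a recent NumPy (np.isnat needs 1.13 or later). The field names come from the examples above; the variable missing is just an illustration.

```python
import numpy as np
from datetime import datetime

# Explicit units on the datetime64 field, chosen to match datetime.datetime.
dt = np.dtype([('dates', 'datetime64[us]'), ('dni', float)])

# Python datetimes carry microsecond resolution, so [us] casts cleanly.
data = [(datetime(2001, 1, 1, 0, 30), 834.34),
        (datetime(2001, 1, 1, 1, 30), 635.12)]
npdata = np.array(data, dtype=dt)

# The "nothing" value for datetime64 is NaT, "Not a Time".
missing = np.datetime64('NaT')

print(npdata['dates'])
print(npdata['dates'][1] - npdata['dates'][0])  # a timedelta64 of one hour
print(np.isnat(missing))
```

Choosing [us] keeps the full resolution of the Python objects; coarser units such as [m] are the natural choice when the input is text like the ISO strings above.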

matplotlib.dates.datestr2num()

This is an undocumented function that uses dateutil to convert a string to the floating-point number that matplotlib uses to represent dates, similar to MATLAB and Excel. The function can also be imported via pylab.
>>> import pylab
>>> import pytz
>>> pst = pytz.timezone('US/Pacific')
>>> some_date = pylab.datestr2num('1998-1-1 12:15-0800')
>>> pylab.num2date(some_date, pst)
datetime.datetime(1998, 1, 1, 12, 15, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
>>> same_date = pylab.datestr2num('1/1/1998 12:15-0800')
>>> pylab.num2date(same_date, pst)
datetime.datetime(1998, 1, 1, 12, 15, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
Pretty nifty! Works better than I thought! In fact I like it way better than NumPy or Python for that matter. Note how using pytz helps matplotlib set the timezone; the tz module from dateutil can also be used to set tzinfo. Also note that if we hadn't set the UTC offset in the string, the output would have been 4:15 AM instead of 12:15, since GMT would have been assumed. matplotlib is also pretty smart about determining the format; for ambiguous dates the default is month/day/year.

Python datetime

The lame way to do this is with Python's datetime.strptime(), but in Python 2 it doesn't support the %z directive for UTC offset (that's only for the strftime() methods of datetime, date and time instances), and Python only has an abstract class, tzinfo, for timezones, which pytz can supply.
>>> dt = datetime.strptime('1998-1-1 12:15', '%Y-%m-%d %H:%M')  # naive datetime instance
>>> print(dt)
1998-01-01 12:15:00
>>> new_dt = datetime(*dt.timetuple()[0:6], tzinfo=pst)  # aware datetime instance
>>> print(new_dt)
1998-01-01 12:15:00-08:00
>>> new_dt.toordinal()
729390
The ordinal of the datetime covers only the date part, whereas matplotlib's date number also includes the fractional sub-day portion. The strptime() classmethod lets you set the format, which is very nice.
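
To make the difference concrete, here is a small sketch assuming Python 3, where strptime() does accept %z. The helper fractional_ordinal is hypothetical and only illustrates the idea of ordinal plus day fraction. It is not matplotlib's own conversion, which also normalizes to UTC and, in recent releases, counts from a different epoch.

```python
from datetime import datetime, timedelta

def fractional_ordinal(moment):
    """Hypothetical helper: Gregorian ordinal plus the elapsed fraction of the day."""
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    day_fraction = (moment - midnight) / timedelta(days=1)
    return moment.toordinal() + day_fraction

# Python 3 accepts %z in strptime(), giving an aware datetime directly.
aware = datetime.strptime('1998-01-01 12:15 -0800', '%Y-%m-%d %H:%M %z')

print(aware.utcoffset())
print(aware.toordinal())          # date part only
print(fractional_ordinal(aware))  # date part plus 12:15 as a fraction of a day
```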

Time

When working with times only, Python will assume 1900-01-01, while NumPy assumes 1970 and matplotlib defaults to today's date. It's actually hard to create a time-only value in NumPy like Python's datetime.time. Maybe there is a correct way to do it, but I could only make the equivalents of datetime.date and datetime.datetime with numpy.datetime64.

NumPy time example:

>>> np.array(datetime.time(8, 30), dtype='datetime64[m]')
Could not convert object to NumPy datetime
The only way I could do it was with timedelta64.
>>> t = np.timedelta64(8, 'h') + np.timedelta64(30, 'm')
>>> print(t)
510 minutes
>>> np.array(t, dtype='datetime64')
array(datetime.datetime(1970, 1, 1, 8, 30), dtype='datetime64[m]')
So you can see, NumPy falls back to 1970, the year of the Unix epoch it counts from! As I said, Python reverts to 1900. For the next examples I have to use datetime.time, so reimport datetime by itself. I also assume that pytz was imported and that pst is a pytz.timezone instance of 'US/Pacific', as in the sections above.
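
A slightly more explicit variant of the timedelta64 trick, sketched under the assumption of Python 3 and a recent NumPy, is to add the duration to a chosen anchor date rather than casting it. The name epoch is mine.

```python
import numpy as np

# Time of day expressed as a duration: 8 hours plus 30 minutes.
t = np.timedelta64(8, 'h') + np.timedelta64(30, 'm')

# Anchor the duration on the Unix epoch explicitly instead of relying on a default.
epoch = np.datetime64('1970-01-01T00:00')
stamp = epoch + t

print(t)      # the duration, held in minutes
print(stamp)  # 08:30 on the anchor date
```

Any other anchor date works the same way, which makes the choice of 1970 visible in the code instead of implicit.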

Python datetime.time example:

>>> import datetime
>>> t = datetime.time(8, 30, tzinfo=pst)
>>> t.strftime('%Y-%m-%d %H:%M %Z')
'1900-01-01 08:30 US/Pacific'
The timezone name, %Z, worked, but I couldn't get %z to show the UTC offset. Both worked for datetime.datetime, though.
>>> dt = datetime.datetime.strptime('8:30','%H:%M')
>>> dt.replace(tzinfo=pst).strftime('%m/%d/%Y %H:%M:%S %z (%Z)')
'01/01/1900 08:30:00 -0800 (PST)'
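
One plausible explanation for the missing %z (my assumption, not something verified against the pytz version used here) is that a bare datetime.time has no date, so a DST-aware timezone cannot tell which offset applies. With a fixed-offset tzinfo the offset is always known. A sketch, assuming Python 3 and the standard library's datetime.timezone in place of pytz:

```python
from datetime import time, timedelta, timezone

# Fixed UTC-8 offset; it needs no date to know its own offset.
fixed_pst = timezone(timedelta(hours=-8), 'PST')

t = time(8, 30, tzinfo=fixed_pst)

print(t.strftime('%H:%M %z (%Z)'))
print(t.utcoffset())
```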
Finally, as I said, matplotlib assumes the current date. Notice too that it knows it's currently daylight saving time!

matplotlib time example:

>>> md = pylab.num2date(pylab.datestr2num('8:30 -0700'), pst)
>>> md
datetime.datetime(2013, 6, 19, 8, 30, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
You can use datetime.replace() to swap out the date for whatever you want.
>>> print(md.replace(1998, 1, 1))
1998-01-01 08:30:00-07:00
This works for tzinfo too.
>>> print(dt.replace(tzinfo=pst))  # aware datetime instance
1900-01-01 08:30:00-08:00
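
Keep in mind that replace() only swaps fields; it never converts. The sketch below assumes Python 3 and uses fixed-offset stand-ins (pdt and pst here are datetime.timezone objects, not the pytz zone from above) to contrast it with astimezone().

```python
from datetime import datetime, timedelta, timezone

pdt = timezone(timedelta(hours=-7), 'PDT')
pst = timezone(timedelta(hours=-8), 'PST')

md = datetime(2013, 6, 19, 8, 30, tzinfo=pdt)

# replace() returns a new object; the original is untouched.
moved = md.replace(year=1998, month=1, day=1)  # new date, same wall time and offset
relabeled = md.replace(tzinfo=pst)             # same wall time, different instant
converted = md.astimezone(pst)                 # same instant, different wall time

print(moved)
print(relabeled - md)   # relabeling shifted the instant by one hour
print(converted)
```

This is also why the swapped-in January date above still prints with the -07:00 offset: the tzinfo was carried over unchanged.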

ISO 8601 Format

All of the datetime classes, and therefore matplotlib's dates too, have an isoformat() method.
>>> md.isoformat()
'2013-06-19T08:30:00-07:00'
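
Going the other way is also possible. A minimal sketch, assuming Python 3.7 or later (where datetime.fromisoformat() exists) and a fixed -07:00 offset standing in for the pytz zone:

```python
from datetime import datetime, timedelta, timezone

md = datetime(2013, 6, 19, 8, 30, tzinfo=timezone(timedelta(hours=-7)))

text = md.isoformat()
# fromisoformat() reads back what isoformat() wrote, offset included.
parsed = datetime.fromisoformat(text)

print(text)
print(parsed == md)
```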

Timezone

Hope you noticed that there is a lot of timezone info.
>>> t.tzname()
'US/Pacific'
>>> md.tzname()
'PDT'
Please see the pytz and dateutil packages for complete details on using timezones, as there are package-specific methods other than replacing tzinfo. For example, pytz exposes the localize method to create an aware datetime directly from a timezone object.
>>> pst.localize(datetime.datetime(2013, 4, 20, 12, 30))
datetime.datetime(2013, 4, 20, 12, 30, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
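
As an alternative to pytz (not what this post uses), Python 3.9 and later ship the zoneinfo module. The sketch below assumes a time zone database is available on the system or via the tzdata package, and uses 'America/Los_Angeles' as the zone name.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+, needs a time zone database

pacific = ZoneInfo('America/Los_Angeles')

# With zoneinfo the zone can be passed as tzinfo; the offset follows the date.
spring = datetime(2013, 4, 20, 12, 30, tzinfo=pacific)
winter = datetime(2013, 1, 20, 12, 30, tzinfo=pacific)

print(spring.tzname(), spring.utcoffset())
print(winter.tzname(), winter.utcoffset())
```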

Python time

This is a separate module for time access, handy for timing code. IPython also has its own timing magic, which can be called using %time. You can use time.clock() to measure how fast code runs on most platforms, and time.sleep() will make it pause. Wow! I hope I can remember all of this!
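
A minimal timing sketch, assuming Python 3, where time.clock() was removed in 3.8 and time.perf_counter() is the usual substitute. The helper timed is hypothetical.

```python
import time

def timed(func, *args):
    """Hypothetical helper: run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed(sum, range(100000))
time.sleep(0.01)  # pause for roughly ten milliseconds

print(result)
print('took %.6f seconds' % elapsed)
```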
