Pandas is a great tool for analyzing large data sets, especially time-series data. It can read most common data formats, such as Excel workbooks and comma-separated values, but not MATLAB mat-files. SciPy, however, can read mat-files with scipy.io.loadmat, so combining the two packages gets the job done.
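As a quick illustration of what SciPy hands back, the sketch below writes a throwaway mat-file with savemat and reads it with loadmat. The variable name x and the temporary path are made up for the example. Note that loadmat reads MATLAB v4 through v7.2 files; v7.3 files are HDF5-based and need a different reader, such as h5py.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a throwaway mat-file holding one hypothetical variable, x.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
savemat(path, {"x": np.arange(3.0)})

# loadmat returns a plain dict mapping variable names to NumPy arrays,
# plus metadata entries whose keys start with "__" (e.g. "__header__").
mat = loadmat(path)
variables = {k: v for k, v in mat.items() if not k.startswith("__")}
print(list(variables))       # ['x']
print(variables["x"].shape)  # (1, 3): MATLAB has no 1-D arrays, so x comes back 2-D
```

The 2-D shape is worth remembering: even scalars and vectors come back as 2-D arrays by default.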
Here's an example of a mat-file that has a single variable, called measuredData, that contains a MATLAB structure with a timeStamps field, several time-series data fields (voltage, current and temperature) and some other fields that are irrelevant. There is also a field called numIntervals that contains the number of intervals in the time-series data sets. The structure itself has only one element.
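If you don't have such a file handy, here is a minimal sketch that writes a synthetic measured_data.mat with the same layout so the loading code can be tried out. All values are made up, the extra field description is a hypothetical stand-in for the "irrelevant" fields, and the timeStamps rows are assumed to be MATLAB datevec rows [year, month, day, hour, minute, second], which is the format the loading code expects.

```python
import numpy as np
from scipy.io import loadmat, savemat

n = 4  # hypothetical number of intervals

# Assumption: one MATLAB datevec row [Y, M, D, h, m, s] per interval.
time_stamps = np.array([[2015, 6, 1, 12, 0, 0],
                        [2015, 6, 1, 12, 15, 0],
                        [2015, 6, 1, 12, 30, 0],
                        [2015, 6, 1, 12, 45, 0]], dtype=float)

rng = np.random.default_rng(0)
measured = {
    "timeStamps": time_stamps,
    "voltage": rng.normal(12.0, 0.1, (n, 1)),     # N x 1 column vectors
    "current": rng.normal(2.0, 0.05, (n, 1)),
    "temperature": rng.normal(25.0, 0.5, (n, 1)),
    "numIntervals": n,
    "description": "synthetic test data",         # hypothetical metadata field
}

# savemat stores a dict value as a 1x1 MATLAB struct.
savemat("measured_data.mat", {"measuredData": measured})

# Quick check: reading it back gives a 1x1 structured array.
mdata = loadmat("measured_data.mat")["measuredData"]
print(mdata.shape)  # (1, 1)
```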
```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from scipy.io import loadmat  # the SciPy function that loads mat-files

mat = loadmat('measured_data.mat')  # load the mat-file into a dict
mdata = mat['measuredData']  # the variable stored in the mat-file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"

# * SciPy reads structures in as structured NumPy arrays of dtype object.
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience, make a dictionary of the data using the names from the dtype.
# * Since the structure has only one element but is 2-D, index it at [0, 0].
ndata = {n: mdata[n][0, 0] for n in mdtype.names}

# Reconstruct the columns of the data table from just the time series.
# Use the number of intervals to test whether a field is a column or metadata.
num_intervals = int(ndata['numIntervals'].squeeze())
columns = [n for n, v in ndata.items() if v.size == num_intervals]

# Each timeStamps row is assumed to be a MATLAB datevec
# [year, month, day, hour, minute, second]. datetime() needs integers,
# so the (possibly fractional) seconds are added as a timedelta.
index = [datetime(*map(int, ts[:5])) + timedelta(seconds=float(ts[5]))
         for ts in ndata['timeStamps']]

# Now make a data frame, setting the time stamps as the index.
# ravel() flattens each field to 1-D, so column or row vectors both work.
df = pd.DataFrame(np.column_stack([ndata[c].ravel() for c in columns]),
                  index=index, columns=columns)
```
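Building the index with a list comprehension is fine for modest data sets, but the datevec conversion can also be vectorized with pandas. The sketch below is one way to do it, assuming the same N x 6 datevec layout with whole-number date parts and possibly fractional seconds; the helper name datevec_to_index is hypothetical.

```python
import numpy as np
import pandas as pd


def datevec_to_index(datevecs):
    """Convert an N x 6 array of MATLAB datevec rows to a DatetimeIndex.

    Assumes rows of [year, month, day, hour, minute, second].
    """
    dv = np.asarray(datevecs, dtype=float)
    # pandas assembles calendar dates from columns named year/month/day.
    dates = pd.DatetimeIndex(pd.to_datetime(pd.DataFrame({
        "year": dv[:, 0].astype(int),
        "month": dv[:, 1].astype(int),
        "day": dv[:, 2].astype(int),
    })))
    # Time of day as float seconds, so fractional seconds are kept.
    seconds = dv[:, 3] * 3600 + dv[:, 4] * 60 + dv[:, 5]
    return dates + pd.to_timedelta(seconds, unit="s")


stamps = np.array([[2015, 6, 1, 12, 0, 0.0],
                   [2015, 6, 1, 12, 15, 30.5]])
print(datevec_to_index(stamps))
```

The result can be passed directly as the index argument of pd.DataFrame.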