Carousel - A Python Model Simulation Framework
I want to introduce Carousel, a project that my employer, SunPower has been supporting for use in prediction models. Carousel is an extensible framework for mathematical models that handles generic routines such as loading and saving data, generating reports, converting units, propagating uncertainty and running simulations so developers can focus on creating complex algorithms that are easy to share and maintain.
Mathematical models consist of algorithms glued together with generic routines. While the algorithms may sometimes be unique and complex, the rest of the code is often simple and routine. Sometimes mathematical models developed by teams of developers over time become difficult to update because there is no framework for how new data, calculations and outputs are integrated into the existing models. Carousel allows developers to focus on creating complex mathematical models that are robust and easy to maintain by abstracting generic routines and establishing a simple but extensible framework.
A Carousel basic model consists of 5 built in layers:
Carousel is extensible by creating more advanced models and layers. A Carousel model is a collection of layers. Carousel layers share a common base class. Each layer also has a corresponding object and a registry where objects are stored. All layers have a load method that loads all of the layer objects specified into the model. When a model is loaded it loads the objects specified for each layer.
Consider a load shifting algorithm for residential or commercial rooftop solar power. The model might have a performance calculation, a load calculation, a cost calculation and an optimization algorithm that determines how home or business appliances are operated to minimize overall yearly cost of the system.
The performance calculation contains several formulas which require input data from an internal database of solar panel parameters and an online API of weather conditions, so the user creates a data source and reader for each of these. There are some data readers already included in Carousel and once a data reader is created it can be reused in many different projects. Maybe the user submits a pull request to Carousel to add the new API and database reader. The load calculation contains formulas for how the appliances are used. The input data for the appliances are entered into a generic worksheet so the user creates a data source for appliances and uses the
XLRDReader to collect the data for each appliance from their worksheets.
The user organizes the formulas into 4 modules, that correspond to each of the calculations, but some formulas are reused since they are generic. For example, data frame summation formulas are used with different time series to create daily, monthly and annual outputs. The user maps out the calculations and specifies their interdependence to other formulas. For example, the cost calculation depends on the load and performance calculations and the optimization algorithm depends on the cost.
The user specifies each output name, initial value and other attributes. Specifications for each layer can be in a JSON parameter file or directly in the code as class attributes; Carousel will interpret either at runtime when it creates the model. Finally the user creates a simulation which in this case is unique because instead of marching through time or space, the simulation iterates over potential load shifting solutions from the algorithm. The user decides which data or outputs to log during the simulation and which to save in reports. Now that the model is created, the user loads the model and sends it the "start" command. After the simulation is complete, the user can examine the outputs and their variances. The outputs will have been automatically converted to the units specified in the model.
The data layer handles all inputs to the models. The data layer object is a data source. Each data source has a data reader. A data source is a document, API, database or other place from which input data for the model can be obtained. The data source and reader provide a framework for specifying how data is acquired. For example a data source for stock market prices might be a public API. An implementation of the stock market API data source specifies the names and attributes of each input data that will be read from the API and how the data reader should read them. The data source is similar to a deserializer because it describes how the data from the source should be interpreted by the model and creates an object in the data registry.
The formula layer handles operations on input data that generate new outputs. It differs from the calculation layer which handles how formulas are combined together. The formula layer object is a formula, and each formula has a formula importer. For example the Python formula importer can import formulas that are written as Python functions.
Calculations are combinations of formulas. Each calculation also has a map of what data and output are used as arguments and what outputs the return values will be. Calculations also implement calculators. Currently there is a static calculator and a dynamic calculator, but new calculators can be implemented that can be reused in other models. The calculation also implements indexing into data and output registries in order retrieve items by index or at a specific time.
Outputs are just like data except they don't need a reader because they are only generated from calculations. Each output is like a serializer because it determines how output objects will be reported or displayed to the user.
The simulation layer determines the order of calculations based on the calculation dependencies. It first executes all of the static calculations and then loops over dynamic calculations, displaying logs and saving periodic reports as specified.
The model is a low level class that can be extended to add new layers or implement new simulation commands. Currently only the basic model is implemented. A new model might contain a post processing layer that generates plots and reports from outputs.
Every layer has a dictionary called a registry that contains all of the layer objects and metadata corresponding to the layer attributes. The registry implements a register method that doesn't allow an item to registered more than once. Each layer registry is subclassed from the base registry so that specific layer attributes can be associated to each key. For example, data sources and outputs have a
variance attribute while formulas have an
Running Model Simulations
After a model has been described using the framework, it can be loaded. Then any or all of the model simulations can be executed from the model. The simulation specifies
commands to the model that user sends using the models
command method. Currently the basic model can execute the simulation
Units and Uncertainty
Carousel uses the Pint units wrapper to convert units as specified. Uncertainty is propagated using the UncertaintyWrapper package which was developed for Carousel. It can wrap Python extensions and non-linear algorithms without changing any code. It propagates covariance and sensitivity across all formulas.
A basic version of Carousel is ready now. There is an example of a photovoltaic module performance model in the documentation online at GitHub. Some ideas for new features are listed in the Carousel wiki on GitHub.
- Data validation
- Reuse 3rd party serializer/deserializer for data layer
- Model integrity check
- Database data reader
- REST API data reader
- Online repository to share data readers, simualtions, formulas and layers
- Automatic solver selection
- Post processing layer
- Testing tools
- Concurrency and speedups
- Remote process and Carousel client
Source, Docs, Issues and Wiki
Carousel was presented at the 5th Sandia PVPMC Workshop hosted by EPRI in Santa Clara in May 2016
Carousel and UncertaintyWrapper were developed with the support of SunPower Corp. They are distributed with a BSD 3-clause license.