CSV on the Web: Working with EnergyPlus results
Introduction
The ABCE Open Data Project is studying how different types of data can be published and shared in open, transparent and reusable ways in line with the FAIR data guidelines. This work considers the case of simulation or modelling data, when an engineering model generates an output file of prediction results based on a given set of inputs.
Here the EnergyPlus building simulation software is used. A simulation is run using the ‘1ZoneUncontrolled.idf’ input file, taken from the Examples folder of the EnergyPlus installation. The weather file ‘USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw’ is used. Once the simulation is run there are a number of different output files. In this work we consider the CSV file of prediction results, which is named ‘eplusout.csv’.
The eplusout.csv results file.
The ‘eplusout.csv’ file is a CSV file of results from an EnergyPlus simulation run. The file contains a number of columns each relating to the results variables as shown below:
The first column is always the ‘Date/Time’ column and the remaining columns vary depending on what is asked for in the .idf input file. In our case the second column has the header ‘Environment:Site Outdoor Air Drybulb Temperature [C] (hourly)’. This means:
- The variable is about the ‘Environment’, i.e. the outdoor or external conditions.
- The quantity is the ‘Site Outdoor Air Drybulb Temperature’, which is a type of air temperature quantity that we can either measure or calculate.
- The units are ‘C’, i.e. degrees Celsius.
- The time interval is ‘hourly’, i.e. for this variable predictions are made for every hour.
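The four parts of the header described above can be pulled apart programmatically. The following is a minimal sketch in Python, assuming every results header follows the same 'Key:Quantity [Units] (Interval)' shape as the example; the function name and the dictionary keys are my own choices for illustration and are not part of EnergyPlus.

```python
import re

# Pattern for headers of the form 'Key:Quantity [Units] (Interval)'.
# This shape is assumed from the single example header quoted above.
HEADER_PATTERN = re.compile(
    r"^(?P<key>[^:]+):(?P<quantity>.+?)\s*"
    r"\[(?P<units>[^\]]*)\]\s*\((?P<interval>[^)]+)\)\s*$"
)


def parse_header(header):
    """Split an EnergyPlus column header into its four parts."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        raise ValueError(f"Unrecognised header format: {header!r}")
    return {name: value.strip() for name, value in match.groupdict().items()}


parts = parse_header(
    "Environment:Site Outdoor Air Drybulb Temperature [C] (hourly)"
)
print(parts)
```

A header that does not fit the assumed shape (for example 'Date/Time') raises a ValueError rather than being silently mis-parsed.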
The rows in the CSV file are the simulation predictions for each time stamp. The file first contains the results for a single day (24 rows, the winter design day on 21st December) and then the results for another single day (24 rows, the summer design day on 21st July). Following this there are 8,760 further rows which represent the predictions on an hour-by-hour basis for a complete year.
Two interesting things to note about this CSV file are:
- The timestamps (column 1) are in a non-standard format. No year value is provided and midnight is recorded with a time of ‘24:00:00’ (rather than the more standard ‘00:00:00’).
- Where variables have time intervals such as ‘daily’ or ‘monthly’, this results in missing values for the intervening hourly rows. So a variable recorded daily will have a single result, then 23 rows of blank or missing data, and then the next daily result.
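If standard timestamps are needed for analysis, the non-standard values can be converted after reading the file, leaving the CSV itself untouched. Here is a rough sketch in Python, assuming the timestamps are laid out as month/day followed by hours:minutes:seconds. The year is not in the file, so the caller has to supply one (2021 below is an arbitrary choice), and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def parse_energyplus_timestamp(text, year):
    """Convert an EnergyPlus 'MM/DD HH:MM:SS' timestamp to a datetime.

    `year` must be supplied by the caller because the file does not record one.
    An hour of 24 is interpreted as 00:00:00 on the following day.
    """
    date_part, time_part = text.split()
    month, day = (int(value) for value in date_part.split("/"))
    hour, minute, second = (int(value) for value in time_part.split(":"))
    if hour == 24:
        # Roll '24:00:00' over to midnight at the start of the next day.
        return datetime(year, month, day, 0, minute, second) + timedelta(days=1)
    return datetime(year, month, day, hour, minute, second)


print(parse_energyplus_timestamp(" 12/21  24:00:00", year=2021))
print(parse_energyplus_timestamp(" 07/21  13:00:00", year=2021))
```

Using a timedelta for the rollover means that '24:00:00' on the last day of a month or year lands correctly on the first day of the next.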
Converting the result to CSVW format
The ‘eplusout.csv’ file is fairly understandable but we can improve this by converting the file to CSVW format (see here for an Introduction to CSVW). This will place the header data into a more formal and easier to access format which will allow us to analyse the data in a quicker and easier fashion.
The two principles for converting the data to CSVW format are:
- To not make changes to the initial CSV data unless necessary. This makes the process much easier and more reproducible, and means that someone who understands EnergyPlus output CSV files but doesn’t understand CSVW will still be able to use the data.
- To create a CSVW metadata file (a metadata.json file) to accompany the CSV data. This metadata file will contain more information about the data as a whole and about the individual columns.
Creating the metadata.json file
To create the metadata.json file we follow the process outlined in the blog post CSV on the Web: Creating descriptive metadata files.
Table Description object
To create a metadata.json file for the ‘eplusout.csv’ file, we first create a Table Description object as follows:
This follows the standard format for a Table Description metadata object. The metadata.json file will be placed in the same folder as the ‘eplusout.csv’ file, so we can use a relative reference for the ‘url’ property. The Dublin Core vocabulary is used to provide additional information about the data.
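As an illustration of the general shape of such an object, here is a sketch built as a Python dictionary and printed as JSON. The Dublin Core values are placeholders that I have made up for this sketch and are not the values in the published metadata file; the nesting of the columns inside a ‘tableSchema’ property follows the CSVW metadata vocabulary.

```python
import json

# Illustrative sketch only: the Dublin Core values below are placeholders,
# not the values used in the published metadata file.
table_description = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "eplusout.csv",  # relative reference: same folder as metadata.json
    "dc:title": "EnergyPlus simulation results (placeholder title)",
    "dc:description": "Placeholder description of the simulation run.",
    "tableSchema": {
        "columns": []  # Column Description objects are added here
    },
}

print(json.dumps(table_description, indent=2))
```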
The Date/Time Column Description object
Next we need to provide the Column Description objects for the ‘columns’ property. Here is the Column Description object for the ‘Date/Time’ column:
The ‘titles’ property here matches the header text in the CSV file. Because the timestamp used in EnergyPlus results files is a non-standard format, the ‘datatype’ given here is a string and an explanation of why this is the case is given in the ‘rdfs:comment’ property. The ‘dc:description’ and ‘schema:variableMeasured’ properties are included to match the method used for the remaining columns in the dataset.
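A sketch of what a Column Description object with these five properties could look like is given below, again as a Python dictionary printed as JSON. The property names come from the text above, but the wording of the comment, description and variable values is my own assumption and may differ from the published file.

```python
import json

# Illustrative sketch: property names follow the text, but the values of
# 'rdfs:comment', 'dc:description' and 'schema:variableMeasured' are assumed.
date_time_column = {
    "titles": "Date/Time",  # matches the header text in the CSV file
    "datatype": "string",   # not a standard date-time format
    "rdfs:comment": (
        "Stored as a string because the timestamp has no year value "
        "and records midnight as 24:00:00."
    ),
    "dc:description": "Date/Time",
    "schema:variableMeasured": "Date/Time",
}

print(json.dumps(date_time_column, indent=2))
```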
The outdoor temperature Column Description object
The Column Description object for the second column in the CSV file is:
Here the ‘titles’ property matches the header text in the CSV file. The ‘dc:description’ property is given the header text as well. In the EnergyPlus documentation there is further information about what this particular variable represents and this is linked to using the ‘dc:references’ property. Note that the value of the ‘dc:references’ property is another object with an ‘@id’ property. This means that the ‘dc:references’ value is a URL (in this case http://bigladdersoftware.com/…). Following the advice in the CSV Primer (and as explained in a previous blog post on Working with Units of Measure) we can include a formal statement about the units of this variable using the http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure property and the QUDT vocabulary (here degrees Celsius are represented using the QUDT URL http://qudt.org/vocab/unit/DEG_C).
Finally the schema.org vocabulary was used to provide a more formal description of the information within the column header. ‘schema:variableMeasured’ is used to describe the variable (in this case this includes the feature of interest, ‘Environment’, and the quantity under study, ‘Site Outdoor Air Drybulb Temperature’). ‘schema:unitText’ provides another opportunity to state the units of the variable, in this case as a simple text string (‘C’). ‘schema:duration’ is used to formally describe the time interval of the variable (in this case ‘H1’ refers to ‘one hour’ i.e. hourly time intervals - see here for more details).
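Putting the properties described above together, a Column Description object for this column might look something like the sketch below (a Python dictionary printed as JSON). The documentation URL is left truncated exactly as it appears in the text, and the exact string used for ‘schema:variableMeasured’ is an assumption on my part.

```python
import json

HEADER = "Environment:Site Outdoor Air Drybulb Temperature [C] (hourly)"
UNIT_MEASURE = "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"

# Illustrative sketch of the properties discussed in the text.
outdoor_temperature_column = {
    "titles": HEADER,
    "dc:description": HEADER,
    # An object with '@id' marks the value as a URL rather than plain text.
    # The URL is shown truncated, as in the text above.
    "dc:references": {"@id": "http://bigladdersoftware.com/…"},
    UNIT_MEASURE: {"@id": "http://qudt.org/vocab/unit/DEG_C"},
    # Assumed format: feature of interest and quantity joined by a colon.
    "schema:variableMeasured": "Environment:Site Outdoor Air Drybulb Temperature",
    "schema:unitText": "C",
    "schema:duration": "H1",
}

print(json.dumps(outdoor_temperature_column, indent=2, ensure_ascii=False))
```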
The remaining Column Description objects
The same process was completed for the remaining columns in the ‘eplusout.csv’ file. As there were many columns in the file, the process was automated using a Python script written in a Jupyter Notebook (available to view on GitHub here).
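To give a flavour of how this kind of automation can work, here is a simplified sketch. It is not the script from the notebook: the function name, the lookup tables and the regular expression are my own assumptions, and the lookup tables contain only the unit and interval mentioned in this post. A one-line string stands in for the header row of ‘eplusout.csv’.

```python
import csv
import io
import json
import re

# Assumed header shape: 'Key:Quantity [Units] (Interval)'.
HEADER_PATTERN = re.compile(
    r"^(?P<key>[^:]+):(?P<quantity>.+?)\s*"
    r"\[(?P<units>[^\]]*)\]\s*\((?P<interval>[^)]+)\)\s*$"
)
UNIT_MEASURE = "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
# Lookup tables hold only the entries mentioned in this post; a real script
# would need one entry per unit and interval found in the file.
QUDT_UNITS = {"C": "http://qudt.org/vocab/unit/DEG_C"}
DURATIONS = {"hourly": "H1"}


def column_description(header):
    """Build a Column Description object from one CSV header cell."""
    header = header.strip()
    column = {"titles": header, "dc:description": header}
    match = HEADER_PATTERN.match(header)
    if match is None:
        # Headers without units and interval, such as 'Date/Time'.
        column["datatype"] = "string"
        column["schema:variableMeasured"] = header
        return column
    parts = match.groupdict()
    column["schema:variableMeasured"] = f"{parts['key']}:{parts['quantity']}"
    column["schema:unitText"] = parts["units"]
    if parts["units"] in QUDT_UNITS:
        column[UNIT_MEASURE] = {"@id": QUDT_UNITS[parts["units"]]}
    if parts["interval"] in DURATIONS:
        column["schema:duration"] = DURATIONS[parts["interval"]]
    return column


# Stand-in for the first line of 'eplusout.csv' (header row only).
sample_csv = io.StringIO(
    "Date/Time,Environment:Site Outdoor Air Drybulb Temperature [C] (hourly)\n"
)
headers = next(csv.reader(sample_csv))
columns = [column_description(header) for header in headers]
print(json.dumps(columns, indent=2))
```

Only the header row is read, so the data rows of the CSV file are never modified, in keeping with the first principle above.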
The final CSVW dataset was placed on the Loughborough University Research Repository here: https://figshare.com/s/464885898d0041bfa8fd
An example of analysis using the CSVW format
Once the metadata.json file is created, this can be used in the analysis of the data in the CSV file.
An example of this has been created and is available in a Jupyter Notebook on GitHub here. In this example the metadata.json file is used to filter the columns in the CSV file and return only those variables which are measured in degrees Celsius and which have an hourly time interval.
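The filtering step can be sketched roughly as follows. This is a simplified illustration rather than the code in the notebook: the metadata is a cut-down dictionary written inline (in practice it would be loaded from the metadata.json file), the third column is hypothetical and included only so that something gets filtered out, and the function name is my own.

```python
UNIT_MEASURE = "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
DEG_C = "http://qudt.org/vocab/unit/DEG_C"

# Cut-down, illustrative metadata. The third column is hypothetical and is
# included only to show a column being filtered out.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "eplusout.csv",
    "tableSchema": {
        "columns": [
            {"titles": "Date/Time", "datatype": "string"},
            {
                "titles": "Environment:Site Outdoor Air Drybulb Temperature [C] (hourly)",
                UNIT_MEASURE: {"@id": DEG_C},
                "schema:duration": "H1",
            },
            {
                "titles": "Example Zone:Example Energy Quantity [J] (hourly)",
                UNIT_MEASURE: {"@id": "http://qudt.org/vocab/unit/J"},
                "schema:duration": "H1",
            },
        ]
    },
}


def select_titles(metadata, unit_url, duration):
    """Return the titles of columns with the given unit and time interval."""
    selected = []
    for column in metadata["tableSchema"]["columns"]:
        # Columns without a unit (such as 'Date/Time') yield None here.
        unit = column.get(UNIT_MEASURE, {}).get("@id")
        if unit == unit_url and column.get("schema:duration") == duration:
            selected.append(column["titles"])
    return selected


hourly_celsius = select_titles(metadata, DEG_C, "H1")
print(hourly_celsius)
```

The returned titles match the CSV header text, so they can be used directly to pick out columns when reading the CSV file, for example with the usecols argument of pandas.read_csv.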
The resulting plot is:
Summary
This blog post has shown a method for creating a CSVW metadata.json file to accompany an EnergyPlus results CSV file. This demonstrates that EnergyPlus results can be shared using the CSVW format, making the data more understandable and reusable.
The approach taken here shows just one method of creating the CSVW files; there are many other ways that this could be done. There is no 'right' method here, but over time it's possible that the EnergyPlus community could reach an agreement on a best practice approach for creating the metadata files.
Further information
- The GitHub page with the script which created the metadata.json file: https://github.com/stevenkfirth/stevenfirth/tree/main/csv-on-the-web-working-with-energyplus-results
- The GitHub page with the analysis example: https://github.com/building-energy/ABCE_Open_Data_Project/tree/main/internal_test_datasets/simulation
- The final dataset on the Loughborough University Research Repository: https://figshare.com/s/464885898d0041bfa8fd
- My blog posts on CSVW