# codema-dev projects

Download, wrangle & explore all Irish energy datasets used by the codema-dev team.

⚠️ Some projects use closed-access datasets for which you will need permission from the codema-dev team! Email us at codema-dev@codema.ie
Run the projects in your browser by clicking on the following buttons:

⬅️ click me to launch a workspace

**Binder**

- Binder can take a few minutes to set up this workspace; click `Build logs > show` to view the build progress.
- Once launched, open a project's `README.md` file via `Open With > Notebook` and run all cells.
- Binder runs this code in the cloud for free with the help of NumFocus; if you find this useful, consider donating to them.
- Binder launch links are generated via https://jupyterhub.github.io/nbgitpuller/link.html

**Gitpod**

- Once launched, open a project's guide via `README.md > Open Preview`.
- Change into a project directory:

```bash
cd NAME-OF-PROJECT
```

- If `(/workspace/projects/venv)` disappears from your prompt, your Terminal no longer has access to the dependencies required to run the projects, so reactivate it by running:

```bash
conda activate /workspace/projects/venv
```

- Open a new Terminal via `≡ > Terminal > New Terminal`.
## 💻 Running locally

Either install the `environment.yml` of a project via Anaconda Navigator, or create and activate the environment in your Terminal:

```bash
conda env create --file environment.yml && conda activate NAME-OF-ENVIRONMENT
```

See the project's `environment.yml` to view the environment name.

## ⚠️ Accessing closed-access data
Create a `.env` file in your project directory containing:

```
AWS_ACCESS_KEY_ID = "AKIA...."
AWS_SECRET_ACCESS_KEY = "KXY6..."
```
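These credentials can be loaded into environment variables before a project runs. A minimal sketch of how that works, assuming nothing about the projects' actual loading mechanism (they may well use a library such as python-dotenv; the parser and demo value below are purely illustrative):

```python
import os
from pathlib import Path


def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY = "VALUE" lines from a .env file into os.environ."""
    for line in Path(path).read_text().splitlines():
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"')


# Demo: write an example .env, then load it
Path(".env").write_text('AWS_ACCESS_KEY_ID = "AKIA-EXAMPLE"\n')
load_dotenv_minimal()
print(os.environ["AWS_ACCESS_KEY_ID"])  # AKIA-EXAMPLE
```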
## ❓ FAQ

- **`botocore.exceptions.NoCredentialsError: Unable to locate credentials`** — you need credentials to access the closed-access data; see "Accessing closed-access data" above.
- **`ModuleNotFoundError`** — a dependency is missing from the environment; install it via `conda install NAME` or `pip install NAME`, and raise an issue on our Github.

All raw data is saved on both Google Drive and Amazon s3. Amazon s3 is easier to query from within code than Google Drive, as it is possible to authenticate via environment variables and so avoid a username/password login step. Google Drive is still used for all data manipulated by Excel or QGIS. Amazon s3 also enables the sharing of data between projects by storing intermediate datasets, which in most cases change only infrequently.
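Authenticating via environment variables means tools such as boto3 or pandas (via s3fs) can read from s3 without any interactive login. A hedged sketch — the credential values and bucket path below are placeholders, not the real codema-dev bucket:

```python
import os

# boto3 / s3fs pick these variables up automatically from the environment,
# so no username/password step is needed (values are placeholders only).
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA-EXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "EXAMPLE-SECRET"

# With credentials set, a read like the following (hypothetical bucket)
# would then work without prompting:
# import pandas as pd
# df = pd.read_csv("s3://codema-dev-example/raw/small-area-statistics.csv")
print("AWS_ACCESS_KEY_ID" in os.environ)  # True
```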
All code is saved to GitHub, which uses git for version control: updating, reverting, branching, merging etc.
Getting code up and running on your local machine can be somewhat involved. Code engines such as Binder or Gitpod enable running this code on cloud machines for free. They automate the building of the required installations using configuration files: `environment.yml` for Binder, and `.gitpod.yml` + `.gitpod.Dockerfile` for Gitpod.
All Python packages are installed (mostly) from the conda-forge channel using the conda package manager.
| Package | Use | Equivalent-To | Example-Use |
|---|---|---|---|
| pandas | Data wrangling, visualisation & analysis | Microsoft Excel | Estimating annual residential heat loss by combining columns and constants |
| GeoPandas | Geodata wrangling, visualisation & analysis | QGIS | Linking small areas to postcode boundaries |
| Ploomber | To specify and execute all of the steps that need to be run in order to generate the output datasets or visualisations | - | Downloading and cleaning building data, and plotting district heating viability on a map |
| seaborn | Plotting charts and maps | QGIS | Plotting building energy ratings |
| bokeh | Plotting interactive charts and maps | Tableau | Plotting district heating viability on a map |
| NetworkX | Graph analysis | - | Finding the nearest substation to each region along the nearest electricity line |
| Scikit Learn | Machine learning | - | Clustering substations via intersubstation distances |
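The pandas "Example-Use" above can be sketched as follows. This is a minimal illustration of combining columns and constants, assuming hypothetical column names and U-values, not the real dataset schema:

```python
import pandas as pd

# Illustrative building stock: areas (m²) and U-values (W/m²K) per dwelling
buildings = pd.DataFrame({
    "wall_area_m2": [100.0, 80.0],
    "roof_area_m2": [50.0, 40.0],
    "wall_uvalue_w_per_m2k": [2.1, 0.35],
    "roof_uvalue_w_per_m2k": [0.4, 0.25],
})

# Fabric heat loss coefficient (W/K) = Σ area × U-value, column-wise
buildings["heat_loss_w_per_k"] = (
    buildings["wall_area_m2"] * buildings["wall_uvalue_w_per_m2k"]
    + buildings["roof_area_m2"] * buildings["roof_uvalue_w_per_m2k"]
)
print(buildings["heat_loss_w_per_k"].tolist())  # [230.0, 38.0]
```

The same operation in Excel would require dragging a formula down hundreds of thousands of rows; in pandas it is a single vectorised expression.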
| Package | Use | Equivalent-To | Example-Use |
|---|---|---|---|
| Microsoft Excel | Data wrangling, visualisation & analysis | pandas | Estimating waste heat source potential |
| Google Sheets | Data wrangling, visualisation & analysis | pandas | “” |
| QGIS | Geodata wrangling, visualisation & analysis | GeoPandas | Plotting report-ready images of district heating viability |
| Tableau | Plotting charts and maps | QGIS | Plotting residential fuel poverty & hosting it online on Tableau Public |
Jekyll is used to generate the website from simple text (or Markdown) files and a pre-defined template. It creates the necessary HTML, CSS & JavaScript files. GitHub Pages is used to build and deploy the website from the files generated by Jekyll.
In previous years all data wrangling was performed solely using Microsoft Excel. Although this is useful for small datasets, it soon becomes a burden when working with multiple, large datasets.
For example, when generating the previous residential energy estimates it was necessary to create up to 16 separate workbooks for each local authority each containing as many as 15 sheets, as the datasets were too large to fit into a single workbook. Although each workbook performed the same logic to clean and merge datasets, changing this logic meant changing all of the separate workbooks one at a time.
Moving to open-source scripting tools enabled using logic written down in scripts (or text files) to wrangle and merge data files, thus separating data from the logic operating on it. This means that if any dataset is updated, re-generating outputs is as simple as running a few scripts. Furthermore, these scripts can be shared without sharing the underlying datasets.
Criteria: a tool capable of modelling the retrofit of hundreds of thousands of buildings to estimate energy & carbon savings, BER rating improvements and costs.
EnergyPLAN is an energy system model that works well for comparing aggregated demand against renewable supply profiles. It doesn't, however, model individual buildings, and instead requires aggregated inputs for building energy demands.
SEAI's Dwelling Energy Assessment Procedure (DEAP) Excel model, EnergyPlus and RC_BuildingSimulator can model individual buildings using simple physics-based simulations, but are difficult to scale. As a result, it is necessary to create a limited number of representative archetypes (<100) in order to use them to model building stocks. At present, archetype creation for these models is a long, manual process. To avoid this limitation, some scripting libraries were experimented with to see if this process could be sped up:
- DEAP: pycel enables replacing individual building characteristics specified in a DEAP Excel model via a Python process; however, as of January 2020 the pycel library didn't support all operations performed in the DEAP spreadsheet.
- EnergyPlus: eppy enables replacing building characteristics, and geomeppy geometry-specific characteristics, via Python. As of September 2020 these libraries are better suited to parameterising existing models than to creating them from scratch.
- RC_BuildingSimulator is a Python library and so can be easily scaled. This library wasn't used as it is not actively maintained, is cumbersome to adapt to this use case, and, as it is not widely used, would require some validation of its accuracy.
CityEnergyAnalyst also models individual buildings using physics-based simulations, but is designed for district-level simulations. However, it is tied to Openstreetmaps as a data source for building geometries and ages, and to Swiss building standards by building age for archetypes. As of October 2020 Openstreetmaps coverage of Dublin was not as complete as in Switzerland, and decoupling CityEnergyAnalyst from it proved difficult.
| Tool | Barrier |
|---|---|
| EnergyPLAN | Modelling building energy demands |
| DEAP | Scaling building energy demands |
| EnergyPlus | “” |
| RC_BuildingSimulator | Adaptation & validation for the Dublin building stock |
| CityEnergyAnalyst | Poor data quality for Dublin buildings |
As a consequence, we developed rc-building-model, which re-implements the DEAP model in Python. This model was tested and validated against the DEAP Excel model for individual buildings, and implemented to easily and rapidly scale to the Dublin building stock.
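The advantage of a Python re-implementation is that DEAP-style formulas can be vectorised over the whole stock at once. A hedged sketch of the idea, not the actual rc-building-model API — DEAP estimates ventilation heat loss (W/K) as roughly 0.33 × air changes per hour × dwelling volume, where 0.33 comes from the volumetric heat capacity of air; the input values below are made up:

```python
import numpy as np

# Illustrative per-dwelling inputs for three buildings
air_changes_per_hour = np.array([0.5, 0.7, 1.2])
building_volume_m3 = np.array([250.0, 300.0, 180.0])

# One vectorised expression evaluates the formula for every dwelling at once,
# instead of once per building in an Excel workbook
ventilation_heat_loss_w_per_k = 0.33 * air_changes_per_hour * building_volume_m3
print(ventilation_heat_loss_w_per_k.round(2).tolist())  # [41.25, 69.3, 71.28]
```

The same pattern scales unchanged from three dwellings to the hundreds of thousands in the Dublin stock.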
## Keeping environment.yml up to date

This `environment.yml` is built by merging the `environment.yml` from each project. Binder & GitPod use it to create a sandbox environment in which all dependencies are installed.

To update this file run:

```bash
conda env create --file environment.meta.yml --name codema-dev-projects-meta
conda activate codema-dev-projects-meta
invoke merge-environment-ymls
```

- `conda env create` creates a virtual environment by reading `environment.meta.yml`, in which `invoke` is defined as a dependency.
- `invoke` then runs the function `merge_environment_ymls` from `tasks.py`, which merges the `environment.yml` from each project and from `environment.meta.yml` into a single `environment.yml`.
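The merge step can be sketched as follows. This is an illustrative stand-in, not the real `merge_environment_ymls` from `tasks.py`: it assumes each `environment.yml` has already been parsed (e.g. via PyYAML) into a dict, and simply unions channels and dependencies while preserving order:

```python
def merge_environments(name: str, *envs: dict) -> dict:
    """Union the channels & dependencies of parsed environment.yml dicts."""
    merged: dict = {"name": name, "channels": [], "dependencies": []}
    for env in envs:
        for channel in env.get("channels", []):
            if channel not in merged["channels"]:
                merged["channels"].append(channel)
        for dep in env.get("dependencies", []):
            if dep not in merged["dependencies"]:
                merged["dependencies"].append(dep)
    return merged


# Demo with two hypothetical project environments
project_a = {"channels": ["conda-forge"], "dependencies": ["pandas", "seaborn"]}
project_b = {"channels": ["conda-forge"], "dependencies": ["pandas", "geopandas"]}
merged = merge_environments("codema-dev-projects", project_a, project_b)
print(merged["dependencies"])  # ['pandas', 'seaborn', 'geopandas']
```

The real task additionally has to handle pinned versions and `pip:` sub-lists, which this sketch ignores.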
To speed up Binder builds, Binder reads the codema-dev/projects dependencies from a separate repository, codema-dev/projects-sandbox. You must also update the `environment.yml` there with your newly generated `environment.yml` to keep Binder up to date!
Every time any file is changed, Binder rebuilds the entire repository and reinstalls the dependencies. By keeping the environment and the content separate, Binder only reinstalls dependencies when the dependencies change. This means that it no longer has to download & resolve dependency conflicts, which can take ~20 minutes.