Cambridge UK temperature forecast R models
Time series and other models for Cambridge UK temperature forecasts in R
If you like CambridgeTemperatureModel, give it a star, or fork it and contribute!
Summary of forecast error for the mean, naive, simple exponential smoothing and Holt
exponential smoothing methods at horizons of up to 24 hours (48 half-hourly forecasts).
Further details are in the Daily forecast baselines
section below.
NOTE: The data set has been substantially updated and cleaned since this graph was created.
Requires R version 3.2.0 or higher.
To install the required libraries in an R session:
```r
install.packages("caret", dependencies = c("Depends", "Suggests"))
install.packages("data.table")
install.packages("stationaRy")
install.packages("lubridate")
install.packages("forecast")
install.packages("prophet")
install.packages("suncalc")
install.packages("ggplot2")
install.packages("tseries")
```

Clone the repository:

```sh
git clone https://github.com/makeyourownmaker/CambridgeTemperatureModel
cd CambridgeTemperatureModel
```

The R files can be run in sequence, or the saved R session image can be loaded.

Run the files in sequence in an R session:

```r
setwd("CambridgeTemperatureModel")
source("1-load.R", echo = TRUE)
source("2-clean.R", echo = TRUE)
```

Or load the R session image:

```r
setwd("CambridgeTemperatureModel")
load("data/CambridgeTemperatureModel.RData")
```

The Digital Technology Group in the Cambridge University
Computer Laboratory maintain a weather station.
I live close to this weather station. When I started looking at this data, the UK Met Office
were updating forecasts every 2 hours. I thought I could produce a more frequent
nowcast (a one-step-ahead forecast)
using time series or statistical learning methods. Day-long forecasts
are of secondary interest. Temperature and rainfall are the primary variables of
interest. Unfortunately, the rain sensor has issues.
I have no affiliation with Cambridge University, the Computer Laboratory or the Digital Technology Group.
The weather measurements include the following variables.
Variables | Units |
---|---|
Temperature | Celsius (°C) * 10 |
Dew Point | Celsius (°C) * 10 |
Humidity | Percent |
Pressure | mBar |
Wind Speed Mean | Knots * 10 |
Wind Bearing Mean | Degrees |
Timestamp | Date Hours |
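Temperature, dew point and mean wind speed are stored as values scaled by 10. A small sketch of converting them back to natural units, assuming hypothetical column names `temperature`, `dew_point` and `wind_speed` (check the names in the loaded data):

```r
# Hypothetical column names; the repository's data may use different ones.
to_natural_units <- function(df) {
  df$temp_c      <- df$temperature / 10  # (°C * 10) -> °C
  df$dew_point_c <- df$dew_point / 10    # (°C * 10) -> °C
  df$wind_knots  <- df$wind_speed / 10   # (knots * 10) -> knots
  df
}

obs <- data.frame(
  temperature = c(153L, -21L),
  dew_point   = c(98L, -45L),
  wind_speed  = c(52L, 0L)
)
converted <- to_natural_units(obs)
print(converted)
```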
Dew point is the temperature at which air, at constant pressure, can no longer hold all the
water vapour it contains.
There are known issues with the sunlight and rain sensors. These measurements are not included for now.
Measurements are recorded every 30 minutes.
The data start on 2008-08-01, when the weather station was moved to its current
location.
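Because observations should arrive every 30 minutes, gaps can be found by comparing the recorded timestamps with a complete half-hourly grid. A minimal base R sketch; the helper name `missing_half_hours()` and the example timestamps are hypothetical:

```r
# Return the half-hourly time points between the first and last recorded
# timestamps that have no observation. `timestamps` is a POSIXct vector.
missing_half_hours <- function(timestamps) {
  grid <- seq(min(timestamps), max(timestamps), by = "30 min")
  grid[!grid %in% timestamps]
}

obs_times <- as.POSIXct(
  c("2008-08-01 00:00", "2008-08-01 00:30", "2008-08-01 01:30"),
  tz = "UTC"
)
gaps <- missing_half_hours(obs_times)
print(gaps)
```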
Unfortunately, the data are quite noisy and usually have a couple of hundred
missing observations each year. The following cleaning steps are performed:
The most recent cleaned data have no missing values.
Data older than 2021-04-26 have had less cleaning.
Outlier exclusion has been fairly conservative.
Some problems may remain in the data, such as short- and/or long-term sensor drift
or periods of anomalously high variance.
The following figure shows an older cleaned temperature time series.
A visual inspection indicates a lack of trend.
The ADF (augmented Dickey-Fuller) and
KPSS (Kwiatkowski-Phillips-Schmidt-Shin) tests in the exploratory data analysis
file (described in the Files subsection below) indicate that this time
series is stationary.
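The two tests have opposite null hypotheses: ADF assumes a unit root (non-stationarity), while KPSS assumes stationarity. Stationarity is therefore supported when ADF rejects its null and KPSS does not. A minimal sketch, assuming the tseries package's `adf.test()` and `kpss.test()`; the helper `both_suggest_stationary()`, the 0.05 threshold and the synthetic series are hypothetical:

```r
# ADF null: unit root        -> a small p-value suggests stationarity.
# KPSS null: level stationary -> a large p-value is consistent with stationarity.
both_suggest_stationary <- function(adf_p, kpss_p, alpha = 0.05) {
  adf_p < alpha && kpss_p >= alpha
}

if (requireNamespace("tseries", quietly = TRUE)) {
  set.seed(42)
  # Synthetic stand-in: 30 days of half-hourly data with a daily cycle plus noise
  time_idx <- seq_len(48 * 30)
  x <- 100 + 30 * sin(2 * pi * time_idx / 48) + rnorm(length(time_idx), sd = 10)
  # Both tests warn when the p-value lies outside their lookup tables
  adf <- suppressWarnings(tseries::adf.test(x))
  kpss <- suppressWarnings(tseries::kpss.test(x, null = "Level"))
  print(both_suggest_stationary(adf$p.value, kpss$p.value))
}
```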
Cambridge Airport weather measurements from the ISD (NOAA's Integrated Surface Database)
were used to find outliers in the Computer Laboratory measurements and to replace
missing values. The stationaRy
package was used to download the ISD data. Unfortunately, there are no pressure
measurements in the airport observations. The ISD data are somewhat cleaner
than the Computer Laboratory data. Data cleaning and limited interpolation were applied
to the Cambridge Airport data before they were used to replace NAs in the Computer
Laboratory data.
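The cleaning described above can be sketched in base R as two steps: interpolate only short gaps in the reference (airport) series, then treat Computer Laboratory readings that disagree strongly with it as outliers and fill all remaining NAs from it. This is an illustration under stated assumptions, not the repository's code: both series are assumed to be aligned on the same half-hourly grid, and the helper names, `max_gap = 2` and the 50 (5 °C) threshold are hypothetical.

```r
# Hypothetical helpers and thresholds; temperatures in °C * 10.

# Linear interpolation across runs of at most `max_gap` consecutive NAs only.
interpolate_short_gaps <- function(x, max_gap = 2) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  known <- which(!is.na(x))
  for (i in which(runs$values & runs$lengths <= max_gap)) {
    s <- starts[i]
    e <- ends[i]
    if (s > 1 && e < length(x)) {  # only gaps with observations on both sides
      x[s:e] <- approx(known, x[known], xout = s:e)$y
    }
  }
  x
}

# Flag lab readings differing from the airport by more than `max_abs_diff`,
# set them to NA, then fill every NA from the airport series.
fill_from_airport <- function(lab, airport, max_abs_diff = 50) {
  suspect <- !is.na(lab) & !is.na(airport) & abs(lab - airport) > max_abs_diff
  lab[suspect] <- NA
  missing <- is.na(lab)
  lab[missing] <- airport[missing]  # may remain NA where the airport is NA too
  lab
}

airport <- interpolate_short_gaps(c(101, NA, 104, 105, 107))
lab <- fill_from_airport(c(100, 102, NA, 400, 106), airport)
print(lab)
```

Long gaps are deliberately left as NA, which matches the "limited interpolation" described above; the threshold trades off missed outliers against discarding genuine local differences between the two sites.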
The following table shows accuracy metrics for baseline nowcast methods:
Method | RMSE | MAE | MAPE |
---|---|---|---|
Mean temperature | 64.46 | 52.63 | 249.91 |
Persistent temperature | 6.26 | 4.13 | **9.49** |
Simple exponential smoothing | 6.05 | 4.03 | 9.81 |
Holt exponential smoothing | **5.62** | **3.94** | 10.25 |
These results are from older partially cleaned observations.
These metrics are calculated in the baselines file briefly
described in the Files subsection. Numbers in bold indicate
the lowest value for each metric.
The three accuracy metrics:

- RMSE: root mean squared error
- MAE: mean absolute error
- MAPE: mean absolute percentage error

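The three metrics can be written directly in base R. This is a sketch of the standard definitions, not necessarily the exact code in the baselines file. Because temperatures are stored as °C * 10 and can be at or near zero, MAPE can become very large or infinite for this data.

```r
# `actual` and `pred` are numeric vectors of equal length.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
# MAPE in percent; infinite when an actual value is exactly zero
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

actual <- c(100, 200)
pred   <- c(110, 190)
print(c(rmse = rmse(actual, pred), mae = mae(actual, pred), mape = mape(actual, pred)))
```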
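The four baseline methods:

- Mean temperature: every forecast is the mean of the historical (training) observations
- Persistent temperature: every forecast is the last observed temperature (the naive method)
- Simple exponential smoothing
- Holt exponential smoothing: exponential smoothing with an additional trend component
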
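A minimal sketch of one-step-ahead forecasts from the four baselines over a test set, using only base R. Assumptions: the smoothing parameters are estimated once on the training data with `stats::HoltWinters()` and then held fixed while stepping through the test data; the function name `one_step_baselines()` is hypothetical.

```r
one_step_baselines <- function(train, test) {
  y <- c(train, test)
  idx <- length(train) + seq_along(test)  # positions of the test points in y

  alpha_ses <- unname(HoltWinters(ts(train), beta = FALSE, gamma = FALSE)$alpha)
  holt      <- HoltWinters(ts(train), gamma = FALSE)
  alpha_h   <- unname(holt$alpha)
  beta_h    <- unname(holt$beta)

  ses_fc <- holt_fc <- numeric(length(y))
  level_s <- y[1]                          # SES level
  level_h <- y[2]; trend_h <- y[2] - y[1]  # Holt level and trend
  for (t in 2:length(y)) {
    # SES: the forecast for time t is the level after observing y[t - 1]
    ses_fc[t] <- level_s
    level_s <- alpha_ses * y[t] + (1 - alpha_ses) * level_s
    if (t >= 3) {
      # Holt: the forecast is level + trend; then update both with y[t]
      holt_fc[t] <- level_h + trend_h
      new_level <- alpha_h * y[t] + (1 - alpha_h) * (level_h + trend_h)
      trend_h <- beta_h * (new_level - level_h) + (1 - beta_h) * trend_h
      level_h <- new_level
    }
  }

  data.frame(
    mean  = rep(mean(train), length(test)),  # mean of the training data
    naive = y[idx - 1],                      # persistence: last observed value
    ses   = ses_fc[idx],
    holt  = holt_fc[idx]
  )
}

set.seed(1)
x <- 100 + cumsum(rnorm(300))  # synthetic stand-in for temperatures
fc <- one_step_baselines(x[1:250], x[251:300])
print(head(fc))
```

Each column can then be scored against the test data with RMSE, MAE and MAPE.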
The following graph shows RMSE values for baseline daily forecast methods:
The SES and naive methods give almost identical results.
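Error at each horizon from 1 to 48 half-hour steps can be computed with rolling forecast origins. A minimal base R sketch for the naive method; the function name and the choice of origins are hypothetical. SES reduces to the naive method when its smoothing parameter equals 1, so near-identical results suggest a fitted parameter close to 1.

```r
# RMSE by horizon for the naive method: from each origin o, every horizon h
# is forecast as y[o]. Horizons running past the end of y are skipped.
naive_rmse_by_horizon <- function(y, origins, max_h = 48) {
  sq_err <- matrix(NA_real_, nrow = length(origins), ncol = max_h)
  for (i in seq_along(origins)) {
    o <- origins[i]
    h <- seq_len(min(max_h, length(y) - o))
    sq_err[i, h] <- (y[o + h] - y[o])^2
  }
  sqrt(colMeans(sq_err, na.rm = TRUE))
}

# Usage, e.g. one origin per day after a 30-day warm-up:
# naive_rmse_by_horizon(y, origins = seq(48 * 30, length(y) - 48, by = 48))
print(naive_rmse_by_horizon(c(0, 1, 2, 3, 4), origins = 1:2, max_h = 2))
```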
Two different Holt-Winters exponential smoothing implementations failed.
Sadly, the double seasonal Holt-Winters exponential smoothing implementation in
the forecast package (dshw()) is not suitable when the data contain zeros or negative numbers.
Standard (single-seasonal) ARIMA models are not suitable
for this temperature data because of its multiple seasonality, which is explained
next.
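In general, a time series can be decomposed into trend, seasonal and remainder components.
The Cambridge temperature data contain two seasonal components:

- daily, with a period of 48 half-hourly observations
- yearly, with a period of about 17,532 half-hourly observations (365.25 * 48)
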
The next two figures show the daily and yearly components found using the
prophet package. The prophet
code is briefly described in the Files subsection.
The daily and yearly components show smooth cyclic change as expected. The
vertical axis shows percent change in temperature.
Prophet models are robust to missing data and
shifts in the trend, and typically handle outliers well. Yearly, weekly
and daily seasonality, plus holiday effects, can be accommodated. Seasonal
components are represented using Fourier terms. Prophet models work
best with time series that have strong seasonal effects and several seasons
of historical data. Stan is used to fit the models.
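Fourier terms represent each seasonal cycle with sine/cosine pairs at multiples of its base frequency. The sketch below builds such regressors for the daily (48) and yearly (365.25 * 48) periods and fits them by ordinary least squares with `lm()`. It is a plain linear regression on synthetic data to illustrate the representation, not prophet's Stan-based model; the helper `fourier_terms()` and the choice of `K` are hypothetical.

```r
# K sine/cosine pairs for a cycle of length `period` (in time steps).
fourier_terms <- function(time_idx, period, K) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * time_idx / period), cos(2 * pi * k * time_idx / period))
  }))
  colnames(out) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2), "_", period)
  out
}

set.seed(1)
n <- 48 * 365 * 2  # two years of synthetic half-hourly observations
time_idx <- seq_len(n)
temp <- 100 +
  60 * sin(2 * pi * time_idx / (365.25 * 48)) +  # yearly cycle
  40 * cos(2 * pi * time_idx / 48) +             # daily cycle
  rnorm(n, sd = 10)

X <- cbind(
  fourier_terms(time_idx, 48, K = 3),           # daily seasonality
  fourier_terms(time_idx, 365.25 * 48, K = 2)   # yearly seasonality
)
fit <- lm(temp ~ X)
print(summary(fit)$r.squared)
```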
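Two prophet models were built:

- logistic growth with automatically placed changepoints
- logistic growth with 50 changepoints

In both cases, a floor of -150 and a cap of 400 (that is, -15 °C and 40 °C, since
temperatures are stored as °C * 10) were used for logistic growth.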
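A changepoint is a time point at which the statistical properties of a series (for prophet,
the trend growth rate) differ before and after that point. By default, prophet places 25
potential changepoints, spread evenly over the first 80% of the series, and uses a sparse prior
so that most of them produce little or no change in the trend.
Additive seasonality is assumed for both models.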
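In prophet's R interface, logistic growth reads its bounds from `cap` and `floor` columns of the input data frame, and the number of potential changepoints is set with `n.changepoints`. The sketch below shows how the two models above might be specified. The helper names, the input data frame `df` (with prophet's required `ds` and `y` columns) and the explicit seasonality settings are assumptions, not the repository's code.

```r
# Add the logistic growth bounds used above (°C * 10).
add_logistic_bounds <- function(df, floor = -150, cap = 400) {
  df$floor <- floor
  df$cap <- cap
  df
}

# Requires the prophet package; fitting is done by Stan and can be slow.
fit_logistic_prophet <- function(df, n_changepoints = 25) {
  prophet::prophet(
    add_logistic_bounds(df),
    growth = "logistic",
    n.changepoints = n_changepoints,  # 25 is prophet's default
    yearly.seasonality = TRUE,
    daily.seasonality = TRUE,
    seasonality.mode = "additive"
  )
}

# Usage:
# m_auto <- fit_logistic_prophet(df)
# m_50   <- fit_logistic_prophet(df, n_changepoints = 50)

toy <- data.frame(
  ds = as.POSIXct("2020-01-01", tz = "UTC") + 1800 * 0:3,
  y  = c(10, 12, 11, 9)
)
print(add_logistic_bounds(toy))
```

Any future data frame passed to `predict()` for these models also needs `cap` and `floor` columns.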
The accuracy results for one step ahead forecasts:
Method | RMSE | MAE | MAPE |
---|---|---|---|
Logistic growth, automatic changepoints | 28.82 | 25.88 | 50.25 |
Logistic growth, 50 changepoints | 28.66 | 25.80 | 50.13 |
These results are from older partially cleaned observations.
Using more changepoints showed little to no improvement.
These errors are substantially higher than those of all the baseline one-step-ahead
forecasts except the mean temperature method. It's possible that using more data would
improve the yearly seasonal component and, in turn, the nowcasts.
The prophet models may perform better for daily forecasts. Unfortunately,
daily forecast cross-validation will be quite time-consuming to run.
The forecast package supports multi-seasonal models using the
TBATS method (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components).
The tbats() function uses a trigonometric representation of seasonality instead of conventional
seasonal indices. It also automatically performs a Box-Cox transformation
of the time series, if required. It can be very slow to estimate, especially with
multiple seasonal time series. The tbats() function does not support additional
regressors.
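A hedged sketch of a multi-seasonal TBATS fit with the forecast package's `msts()` and `tbats()`, using the daily and yearly periods in half-hour steps. The input vector `temp` and the helper `fit_tbats()` are hypothetical, and the fit is not run here because it can be very slow.

```r
# Seasonal periods in half-hour steps
seasonal_periods <- c(daily = 48, yearly = 365.25 * 48)

# `temp` is a numeric vector of half-hourly temperatures (°C * 10).
fit_tbats <- function(temp) {
  y <- forecast::msts(temp, seasonal.periods = unname(seasonal_periods))
  forecast::tbats(y)
}

# Usage (can take a long time on several years of data):
# fit <- fit_tbats(temp)
# forecast::accuracy(forecast::forecast(fit, h = 1))  # training-set metrics
print(seasonal_periods)
```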
Unfortunately, cross-validation fails. See the source code described in the Files
subsection for details, and
this unanswered Stack Overflow question.
FWIW, here are the training-set accuracy metrics for one-step-ahead forecasts:
Method | RMSE | MAE | MAPE |
---|---|---|---|
TBATS | 5.7 | 3.8 | Inf |
These results are from older partially cleaned observations.
These results are not comparable with those of the baseline methods, which are
calculated on a separate test data set.
The infinite MAPE value comes from the MAPE calculation in the forecast package,
which permits division by zero: any observation of exactly 0 °C with a non-zero
forecast error makes the percentage error infinite. Other implementations add
one to the denominator to avoid this behavior.
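A small base R comparison of the two behaviours. `mape_pct()` follows the usual definition and `mape_plus_one()` is one variant that adds one to the denominator. Both names are hypothetical, and the variant is shown only as an illustration of the idea.

```r
# Standard MAPE (percent): infinite if any actual value is 0 with a non-zero error
mape_pct <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

# Variant: add one to the absolute actual value in the denominator
mape_plus_one <- function(actual, pred) {
  100 * mean(abs(actual - pred) / (abs(actual) + 1))
}

actual <- c(0, 10)  # e.g. 0.0 °C and 1.0 °C stored as °C * 10
pred   <- c(1, 11)
print(c(standard = mape_pct(actual, pred), plus_one = mape_plus_one(actual, pred)))
```

The variant stays finite, but its values are no longer true percentage errors, so they should not be compared directly with standard MAPE figures.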
These files demonstrate how to build models for the Cambridge UK temperature data:
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.