2018 [Julia v1.0] machine learning (linear regression & kernel-ridge regression) examples on the Boston housing dataset
This code and its results are “correct” in the sense that they were reviewed by a well-informed PhD at UCL on 12 December 2018.
Julia 1.0 was released during JuliaCon in August 2018, two months before I started looking into machine learning. Dr David Barber once expressed, in a lecture, his faith in Julia’s future success, so I decided to pick up Julia for a serious piece of coursework on supervised learning. To my surprise, my friends working on the same coursework, including some of the most distinguished students, chose Python, MATLAB and R. From my recent experience, however, Julia is fast(er?) and (more?) expressive. More importantly, I had fun playing with it.
[1] means the example can be found in the .jl file whose name starts with “1”.
A\b
.X\y
.alpha_hat
“-like variables, but Boston housing is a classic dataset described in detail on the University of Toronto’s website. The data was originally published by Harrison, D. and Rubinfeld, D.L., ‘Hedonic prices and the demand for clean air’, J. Environ. Economics & Management, vol. 5, pp. 81-102, 1978.
Dataset Naming:
The name for this dataset is simply boston. It has two prototasks: nox, in which the nitrous oxide level is to be predicted; and price, in which the median value of a home is to be predicted. However, here, I am using everything to predict price.
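As a rough illustration of what “using everything to predict price” can look like with linear regression, here is a minimal sketch that fits least squares with Julia’s backslash operator. It runs on synthetic stand-in data, not the Boston dataset; the sample size `n`, the names `Xb`, `w_true` and `w_hat`, and the noise level are all assumptions made up for the example. Only the number of predictors (13) comes from the attribute list.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(1)                  # synthetic data only, NOT the Boston dataset

n, d = 100, 13                   # n is arbitrary; 13 predictors as in the attribute list
X = randn(n, d)                  # stand-in feature matrix (hypothetical)
w_true = randn(d)                # hypothetical "true" weights
y = X * w_true .+ 2.0 .+ 0.1 .* randn(n)   # stand-in target playing the role of price

Xb = hcat(ones(n), X)            # prepend a column of ones for the bias term
w_hat = Xb \ y                   # least-squares fit: minimises norm(Xb * w - y)

mse = mean((Xb * w_hat .- y) .^ 2)   # training mean squared error
println("training MSE = ", mse)
```

For a tall matrix such as `Xb`, `\` returns the least-squares solution rather than solving an exact system, which is why no explicit normal equations are needed here.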
Miscellaneous Details:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - median value of owner-occupied homes in $1000’s

“The Boston housing data set as a “.mat” file is located at Prof. Mark Herbster’s website (UCL); otherwise, please go to the URL above to retrieve it as a text file.”
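To make the attribute list concrete, the sketch below separates the 13 predictors from the target MEDV in a data matrix. It assumes the columns of the matrix appear in the same order as the list above, which may not hold for a particular file; `COLUMNS`, `split_features_target` and the placeholder matrix `data` are hypothetical names introduced only for illustration.

```julia
# Attribute names in the order of the list above (an assumption about the file
# layout); MEDV, the price to predict, comes last.
const COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Split a matrix whose columns follow COLUMNS into predictors X and target y.
function split_features_target(data::AbstractMatrix; target::String = "MEDV")
    size(data, 2) == length(COLUMNS) ||
        throw(ArgumentError("expected $(length(COLUMNS)) columns"))
    j = findfirst(==(target), COLUMNS)
    j === nothing && throw(ArgumentError("unknown column: $target"))
    X = data[:, setdiff(1:length(COLUMNS), j)]   # every column except the target
    y = data[:, j]                               # the target column
    return X, y
end

data = rand(5, 14)               # placeholder values, NOT real housing data
X, y = split_features_target(data)
println(size(X), " ", size(y))
```

Passing `target = "NOX"` would give the split for the other prototask instead.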