Tips and Tricks to setup a cloud machine for Analytics and Data Science with R, RStudio and Shiny Servers, Python and JupyterLab
Author: Luca Valnegri
Last Updated: 01 November 2021
Last Addition: OSRM Routing Server
If you’ve always wanted to have:
the following notes will help you!
This tutorial is quite lengthy, as it's been written with plenty of details useful for the very novice. If you just want the step-by-step list, a sort of cloud server setup cheat-sheet, it's more convenient for you to follow this document instead.
check your email and validate your new account
enter username and password
click Account (bottom left) > Settings > Security (tab) > Secure your account > Enable Two-Factor Authentication
in both cases, remember to generate the backup codes, and save them in some secure place
- image: Ubuntu 20.04 (LTS) x64
- size: RAM 2GB, Power 1CPU, Storage 50GB, Transfer 2TB, cost $10 monthly
- region: London
- authentication: Password (we'll move to SSH keys later)
- click Create
SSH stands for Secure SHell, a cryptographic network protocol that allows secure access over an otherwise unsecured network. All traffic over an SSH connection is encrypted, which makes it difficult for these communications to be intercepted and read.
Any VPS can be accessed with a typical user/password exchange, but it's also possible to set up SSH keys that identify trusted computers without the need for passwords. For additional security, it's also possible to add a passphrase to the key pair, which acts as a password to access the key itself.
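As a minimal sketch of that key-based workflow (the comment string, usrname, ip_address and the xxxx port are placeholders to adapt to your own setup), you would generate a key pair on your local machine and copy the public part to the server:

```bash
# generate a key pair locally, protected by a passphrase if you wish
ssh-keygen -t ed25519 -C "my-laptop"
# copy the public key to the server (use -p only if you changed the SSH port)
ssh-copy-id -p xxxx usrname@ip_address
# from now on you can log in without typing the account password
ssh -p xxxx usrname@ip_address
```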
Windows has no embedded SSH client by default. Many clients can be downloaded for free; one of the most famous is PuTTY, but we are going to use the more feature-rich MobaXterm, which is free for personal use and allows, among other functionalities, sftp, tunnelling, multi-tabbing and saving sessions.
- click Settings towards the far right of the button bar, then click the tab Terminal. Uncheck the option Paste using right-click, then click OK. Now you can paste content in any terminal window using the standard SHIFT+INS key combination (but you can't copy and paste using the more common CTRL+C and CTRL+V). In addition, the right mouse button exposes a quite extensive actions menu.
- click Session in the upper left button bar, then SSH in the upper left button bar of the new window
- type in root when asked to login as, then copy the password you received with the email and paste it into the terminal. Notice that by default Linux systems do not give any feedback from the password field, so don't try to paste again and again only because you feel the need to see some feedback: just paste the password once and hit enter!
Both Linux distros and macOS have a built-in SSH client that can be used to connect to remote servers:
- on macOS, the Terminal client is located in the Applications > Utilities folder. Double-click on the icon to start it.
- on Linux, open a terminal with CTRL+ALT+T. At the prompt you would type in general: ssh usrname@ip_address. At the moment there is no other user than root, so to connect to your droplet just type: ssh root@ip_address
If the IP address and the user name are correctly recognized, the system then prompts to enter the password associated with the specified user.
curl -sSL https://repos.insights.digitalocean.com/install.sh | sudo bash
After a few minutes, you'll start to see a bunch of graphs and KPIs populating your droplet's dashboard.
Run date to test if the timezone is correct. If it doesn't show the correct time and/or desired timezone, run the following commands, then enter the correct zone for your location. Notice that if you leave the timezone as UTC, there will be no automatic switch between winter and summer time (the timezones for the UK are GMT from November to March, and BST from April to October).
dpkg --configure -a
dpkg-reconfigure tzdata
apt-get update
apt-get -y full-upgrade
apt-get -y autoremove
If during the above upgrading session a window pops up and asks for any changes, be sure to accept the choice to keep the local version currently installed.
apt-get -y install apt-transport-https software-properties-common htop file nano dos2unix man-db ufw git-core libgit2-dev libauthen-oath-perl openssh-server build-essential libsocket6-perl dirmngr
shutdown -r now
You should now wait a few seconds, to give the server sufficient time to reboot, then reconnect. If you're using MobaXterm you can simply press the R key every now and then until you're asked to log in again.
If you're on a different service than Digital Ocean, it'd also be a good idea to disable the boot menu, or reduce the time it shows up:
nano /etc/default/grub
If the file already contains the line GRUB_TIMEOUT=0, then you don't need any changes and you can exit the editor pressing the key combination CTRL+x. Otherwise, add or change the following lines:
GRUB_TIMEOUT=3
GRUB_RECORDFAIL_TIMEOUT=3
Save the file with CTRL+x, then y, followed by Enter, then run:
update-grub
You can sometimes find yourself in a situation where you don't have sufficient memory to run your scripts, or even to install some packages, and for whatever reason you can't upsize the RAM of your machine. You can add what is called swap space, which essentially uses some storage as if it were memory.
You can see if the system has any configured swap by typing:
swapon --show
If you don't get back any output, your system doesn't have any swap space available. You can instead verify the memory activity using either the free -h command or the more complete top or htop utilities.
Before starting the operation, you should run the df command to be sure you have the necessary storage space for the planned swap file. Although there are many opinions about the appropriate size of a swap space, it really depends on application requirements; generally, an amount equal to or double the amount of RAM on the system is a good starting point. The following are the instructions to add 1GB of swap space; you can easily change the value to add less or more than that (G stands for gigabyte, M for megabyte).
- create a file called swapfile of the desired size:
fallocate -l 1G /swapfile
- restrict access to the file to the root user only:
chmod 600 /swapfile
- mark the file as swap space:
mkswap /swapfile
- enable the swap file:
swapon /swapfile
- to make the change permanent, add a reference to the swap file in the /etc/fstab file:
echo '/swapfile none swap sw 0 0' | tee -a /etc/fstab
- finally, tune how the system uses the new swap space by appending a couple of kernel parameters to the sysctl configuration:
echo '
###########################
# ADDED BY user
vm.swappiness=10
vm.vfs_cache_pressure=40
###########################
' | tee -a /etc/sysctl.conf
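The two parameters above control how aggressively the system swaps (vm.swappiness, lower means swap less) and how readily it drops filesystem caches (vm.vfs_cache_pressure). They are read from /etc/sysctl.conf at boot; to apply them immediately you can reload the configuration, a standard sysctl feature:

```bash
# reload /etc/sysctl.conf without rebooting
sudo sysctl -p
# verify the current values
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/vfs_cache_pressure
```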
Notice that Digital Ocean, like many other VPS providers, highly discourages the creation of swap space, a practice often used to keep down the size, and hence the cost, of the droplet. This is due to the fact that their infrastructure is all made up of SSD storage, which is highly degraded by the continuous read/write access typical of swapping. Besides, upgrading the droplet generally leads to much better results.
Run locale to check your current locale (the default at OS installation is en_US.UTF-8). Other useful commands are:
- locale -a : list all the installed locales
- sudo locale-gen <locale> : generate and install a new locale (see here for the available strings)
- `` to change a locale value on a temporary basis
- sudo nano /etc/default/locale and add or modify the instruction(s) therein to change locale on a permanent basis (needs a fresh login)
The Linux system is well known for its strong management of users, and of file and directory permissions and ownership. In particular, it's an absolute no-no to routinely use the default admin user, called root: getting even a single command slightly wrong could be a disaster (a stray rm -rf / will completely wipe your system disk with no possibility of return), and anyone stealing your password would gain complete control of the machine.
It's customary instead to use a group called sudo, whose members can act as temporary admins.
- create a new user: adduser usrname
- add the new user to the sudo group: usermod -aG sudo usrname
- switch to the new user: su - usrname
- if you need to act as root: sudo su
- to go back to the normal user (you can also use CTRL+D as a shortcut): exit
From now on you should forget there exists a user called root, and try not to run the command sudo su unless you actually need to. Always use usrname instead, running admin tasks through the sudo prefix.
If you need to change a user’s password, run the command:
sudo passwd usrname
then enter the new password. Notice that only the root user, or a sudoer, can change the password of another user.
If you want instead to completely delete a user, you need to properly log in as root, or switch to the root user:
sudo su -
then run the command:
userdel -r usrname
You would drop the -r
option if you want to keep the user’s home directory.
One of the main problems beginners encounter when they start using Linux, and the Shiny Server in particular, relates to the much dreaded file permissions. Briefly explained, everything in Linux is a file, and each file admits three operations: read, write, execute. These can be carried out by three (groups of) users: the owner of the file, any user belonging to one or more specific groups, and all the other users. When you list the content of a directory, using for example the ls -l command, you can see all the permissions in the form of nine binary digits attached to each file, where 0 means not permitted and 1 means permitted. These digits must be read in groups of three (see also the picture below): the first three (from the left) are the operations allowed to the owner, the next three are for the group, the last three for the others. Besides the binary mode, there is also a more common octal mode that simply collapses each group of three digits into its octal value.
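For example, assuming a hypothetical script file, the octal mode 754 corresponds to the binary triplets 111 101 100, i.e. read/write/execute for the owner, read/execute for the group, read only for the others:

```bash
# set permissions using the octal notation (rwx r-x r--)
chmod 754 myscript.sh
# the listing shows the same permissions in symbolic form
ls -l myscript.sh
# -rwxr-xr-- 1 usrname public ... myscript.sh
```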
Having said that, why do things become problematic? Well, because you usually develop an application using RStudio in your own home directory, which you can access because it's yours. When you're done, you then copy your code to the location where the Shiny Server reads its files. But you quickly discover that… you can't! as that directory is owned by the shiny user connected to the Shiny Server, and you can't access it. You could think that copying it using sudo would do the trick, and it will, but then shiny can't access those files because they are owned by root! Moreover, besides the code, a data application usually needs data, often lots of different data, and these data need to be stored somewhere they can be read by shiny for the app to actually work. All of the above often ends up with duplications, missed or wrong updates, and so on.
There are a few different solutions, each with its own ups and downs. The solution proposed here will become more practical when using docker containers to deploy shiny applications. We simply define a public group, containing the shiny user plus each user interested in the development of shiny applications, and a subfolder somewhere in the filesystem to use as a public repository for the group itself. This repository will also contain a subfolder dedicated to the R packages.
Let's start creating the public group, and adding to it the user usrname we've just created (any other user can be added afterwards in the same way):
sudo groupadd public
sudo usermod -aG public usrname
We can now create the public repository. I decided to go for /usr/local/share/public
, but feel free to change the location as you wish.
sudo mkdir /usr/local/share/public/
sudo chgrp -R public /usr/local/share/public/
sudo chmod -R 2775 /usr/local/share/public/
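For reference, in the 2775 mode above the leading 2 is the setgid bit: every file created inside the folder will inherit the public group instead of the creator's primary group, which is exactly what we want for a shared repository. You can verify the resulting flags (the s in the group triplet) with:

```bash
# the group permissions should read rws, e.g. drwxrwsr-x
ls -ld /usr/local/share/public/
```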
To add the above path to a system variable, so that you can recall it without typing it in full, run the following command:
export PUB_PATH="/usr/local/share/public"
after which the path can be retrieved using the $PUB_PATH variable (try for example echo $PUB_PATH). It's worth noting that the above command is only a temporary solution, as the content of PUB_PATH is lost with a reboot. To add the path as a permanent system variable, first open for editing the file that stores the system-wide environment variables:
sudo nano /etc/environment
then add the following line at the end:
PUB_PATH="/usr/local/share/public"
Save the file, using CTRL+x ==> y ==> Enter, then sudo reboot to make sure the above changes have been applied.
Once you've decided the actual location, you have to build some structure in it, and that task mostly depends on your future projects. Any of the subdirectories can be created with the generic command:
mkdir -p /usr/local/share/public/newsubdir
or, if you’ve included the public path in the system environment:
mkdir -p $PUB_PATH/newsubdir
A possible quicker way to build a complete structure at once is to create a loop over a list of subdirs conveniently saved in a text file, as in the following example:
- create a text file subdirs.lst in some directory in your home folder with the list of subdirs to be created, each on its own line
- create a script subdirs.sh in the same directory as the previous subdirs.lst:
#!/bin/sh
while read SDIR
do
mkdir -p $PUB_PATH/$SDIR
done < `dirname $0`/subdirs.lst
- make the script executable: chmod +x ~/path/to/subdirs.sh
- run it: ~/path/to/subdirs.sh
- check the result: ls $PUB_PATH -l
You can use an online service to speed things up a bit, and automate the above process in case you plan to use multiple VPS. I saved two example files subdirs.lst and subdirs.sh in the repository, but you should create two of your own, using whichever service you prefer, and change the below commands accordingly. The complete process is outlined below:
mkdir -p ~/scripts/subs
cd ~/scripts/subs
wget https://raw.githubusercontent.com/lvalnegri/workshops-setup_cloud_analytics_machine/master/subdirs.lst
wget https://raw.githubusercontent.com/lvalnegri/workshops-setup_cloud_analytics_machine/master/subdirs.sh
chmod +x subdirs.sh
./subdirs.sh
You can now list the content of the public folder ls $PUB_PATH -l
to verify that the operation has been successful.
Notice that some services, like Pastebin, return text files in DOS format, which uses as line separator a carriage return followed by a linefeed (usually abbreviated as CRLF), typical of Windows machines, instead of a single linefeed character (LF), like all other modern operating systems. This means that once you've downloaded them, you have to convert every line separator before processing. The following code runs the same example as above, but using two files subdirs.lst and subdirs.sh hosted on Pastebin:
mkdir -p ~/scripts/subs
cd ~/scripts/subs
wget -O subdirs.lst https://pastebin.com/raw/0sTpFmyu
wget -O subdirs.sh https://pastebin.com/raw/Sb5Qdgtu
dos2unix *
chmod +x subdirs.sh
./subdirs.sh
You can follow a similar approach if you plan to partition your home folder in lots of subfolders.
The first step in hardening the security of your new VPS is to block any remote access for the root
user:
sudo nano /etc/ssh/sshd_config
then find the line:
PermitRootLogin yes
and change yes to no. Save the file (CTRL+x ==> y ==> Enter), then restart the SSH service:
sudo systemctl restart ssh
Log back in as the new user, and let's change the standard SSH port 22 to some random integer number xxxx between 1024 and 65535:
- open the configuration file: sudo nano /etc/ssh/sshd_config
- find the line Port 22 and change 22 into the number xxxx you've settled upon, then save the file (CTRL+x ==> y ==> Enter). If there's a hashtag # at the start of the line, meaning that the line is a comment and so not to be processed by the system, delete it.
- restart the service: sudo systemctl restart ssh
- without closing the current session, open a new session and check that you can connect through the new xxxx port, but not through the standard 22. If anything does not sound right, close the new session and fix the configuration using the original session.

Lastly, let's add to the system an antivirus and a firewall. Starting with the antivirus, we're going to use the ClamAV package, being open source and particularly suited for Ubuntu Server installations.
sudo apt-get -y install clamav clamav-daemon
sudo systemctl stop clamav-freshclam
sudo freshclam
sudo systemctl start clamav-freshclam
For the available options, run clamscan --help or read the manual with man clamscan. To scan some directories recursively, listing only the infected files, run for example:
clamscan -i -r /home/ /usr/local/share/public/
The following are the exit return codes:
- 0: no virus found
- 1: virus(es) found
- 2: some error(s) occurred
To prevent a full scan from hogging the machine, you can either cap the CPU available to the process or lower its priority:
cpulimit -z -e clamscan -l 50 & clamscan -ir /
nice -n 15 clamscan && clamscan -ir /
Let's now proceed with the firewall. We're using the ufw package, which is included by default in the Ubuntu installation. Notice that as soon as it is enabled, everything is blocked by default, so proceed carefully, always keeping an active session open.
- enable the firewall, answering yes to the question: sudo ufw enable
- allow the SSH port (the xxxx port if it's been changed, or the standard 22 if you inadvertently haven't followed my above suggestion to change it): sudo ufw allow xxxx
- rate-limit the SSH service, to protect against brute-force attempts (you can also limit other services in the same way, like FTP): sudo ufw limit SSH
Now, using a different session, without closing the current one you're working on, test that the new user is still capable of ssh-ing into the machine. You can review the active rules at any time with:
sudo ufw status
The following table lists the default ports for the main services used in this document. For a more comprehensive list of default ports used by various well-known services see this Wikipedia article .
service | default port |
---|---|
HTTP | 80 |
HTTPS | 443 |
SSH | 22 |
FTP | 21 |
SFTP | 22* |
WEBMIN | 10000 |
RSTUDIO | 8787 |
SHINY | 3838 |
JUPYTER | 8888 |
MYSQL | 3306 |
POSTGRES | 5432 |
SQLSERVER | 1433 |
MONGODB | 27017 |
NEO4J | 7474 |
REDIS | 6379* |
HIVE | |
HBASE | |
INFLUXDB | 8086 |
NEXTCLOUD | |
CALIBRE | |
A short list of some of the most used commands for the standard firewall ufw
is the following:
- ufw enable : turn the firewall on
- ufw disable : turn the firewall off
- ufw status (or ufw status verbose ) : list the active rules
- ufw app list : list the available application profiles
- ufw allow app_profile : allow the ports defined by an application profile
- ufw allow XXXX/tcp , ufw allow XXXX/udp : allow a single port for tcp or udp traffic
- ufw allow XXXX:YYYY : allow a range of ports
- ufw allow from www.xxx.yyy.zzz : allow any traffic coming from a specific IP address
- ufw delete rule : delete a rule, where rule is any of the literals you have previously specified in an allow command (for example ufw delete allow XXXX )
- ufw deny XXXX : deny traffic on a port
- ufw reset : restore the firewall to its default state
All the above commands must obviously be launched as a sudoer.
If anything happens and you can't log in anymore through remote SSH, most VPS and Cloud providers allow users to open a shell from the dashboard account. On Digital Ocean, head for the droplet dashboard: at the top right there is a Console button which allows you to log in directly using password authentication. You often need to actually click into the console window before it becomes active. Notice that in this screen keyboard shortcuts usually don't work: to paste the root password from the clipboard you need to use the right-click menu.
Moreover, if you forget the root password, or you've never set it, head again for the droplet dashboard, and from its left menu click on the Access item. There you can find the magic button to Reset Root Password
. Once you log in, if not asked by the system itself, you should reset the password using the following commands:
sudo -i
passwd
While powerful and efficient, the command line can get annoying and troublesome for some people: it's just nicer to work with a simple and intuitive graphic interface to manage the system. Here comes Webmin, a web-based interface for remote system administration of Unix systems. In particular, Webmin removes the need to manually edit configuration files. We'll touch here only a couple of settings, but nested inside its cavernous menus there are thousands of possibilities.
Before installing Webmin, I want you to follow me on a very quick introduction about Linux software distributions, or distros, and software management.
Let’s move on now with the Webmin installation:
echo -e "\n# WEBMIN\ndeb http://download.webmin.com/download/repository sarge contrib\n" | sudo tee -a /etc/apt/sources.list
wget http://www.webmin.com/jcameron-key.asc
sudo apt-key add jcameron-key.asc
sudo apt-get update
sudo apt-get install -y webmin
sudo ufw allow 10000
- open your browser at https://ip_address:10000 (notice that plain http does not work). Don't worry for now about the warnings about security, we'll solve this problem later
- redirect all http calls to the encrypted https protocol: check Yes for Webmin > Webmin Configuration > SSL Encryption > Redirect non-SSL requests to SSL mode?, then Save
- change the default port to some random integer number xxxx of your choice between 1024 and 65535, but obviously different from the one you previously chose for the SSH service: Webmin > Webmin Configuration > Ports and Addresses > Listen on IPs and ports > Listen on port. Also check the options: Accept IPv6 connections? and Listen for broadcasts on UDP port. When you're done with the changes click Save. After the last change has been saved, the website will go down and an error message about lost connection will appear. This is normal, as the port has changed and the browser can't reconnect to the server anymore
- restart the service: sudo service webmin restart
- allow the new xxxx port through the firewall: sudo ufw allow xxxx
- delete the rule for the old default port: sudo ufw delete allow 10000
- recall the page from the browser using the new xxxx port

It's a safer choice to add Two-Factor Authentication to all web services that offer it. To do it with your new Webmin system configuration manager:
- go to Webmin > Webmin Configuration > Two-Factor Authentication and choose your preferred provider, then click Save
- click Webmin > Webmin Users > Two-Factor Authentication > Enroll For Two-Factor Authentication, and follow the instructions. If you choose the Google authenticator, you now have to open the app on your phone, click the plus sign, and scan the barcode. From now on, you need user, password and the Google app temporary token to enter the Webmin manager. You should try it now to check it actually works!

How boring and annoying is it to always have to remember an IP address? Enter domain names! For the current purpose, there's no point in spending any money to own a fancy domain. Head instead to Freenom World to grab a free one! The catch here is that the choice of Top-Level Domain is restricted to the set: tk, ga, ml, cf and gq. Moreover, the free offer lasts for one year only, but it is renewable. Notice that no credit card is required.
Anyway, once you’re on the Freenom landing page:
- click Get it Now next to the domain you want, and then move to the checkout page
- click Use DNS, then the tab Use your own DNS, and in the two textboxes labelled Nameserver insert respectively: ns1.digitalocean.com and ns2.digitalocean.com
Once you own a domain, head to your account on the Digital Ocean website, then:
- click Manage > Networking from the main menu, then enter the tab Domains
- in the Enter domain textbox write the domain name you've just bought, then click Add Domain, and you should be moved to the DNS record page (if not, and you've instead been routed to the list of domains, click the domain you want to manage)
- in the HOSTNAME textbox enter @, in the WILL DIRECT TO textbox choose the server to associate with the domain, then click Create Record
- repeat the previous step, this time entering www in the HOSTNAME textbox

Now you should simply wait from a few seconds to a few hours, depending on how fast the global system will update your changes, and if you head to http://hostname you should see the same content as http://ip_address. Currently, though, the only content you can check is the Webmin service, but we'll soon add lots of bells and whistles!
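If you want to check whether the new records have propagated, a quick lookup from your local machine is enough (hostname here stands for the domain you registered):

```bash
# both should return the droplet's IP address once propagation is complete
dig +short hostname
dig +short www.hostname
```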
At this point in time, it'd be useful to save the current state of the machine, called a snapshot, so that if something happens in the future it's always possible to revert back to the current situation in a few minutes, with a click from the droplet dashboard. Moreover, we could also build other similar, but slightly different, droplets using this snapshot as a starting point, instead of going through the entire droplet creation process again. Notice though that this is not a proper backup, as you can't choose any single element of the machine to restore: it's an all or nothing situation.
To snapshot a droplet:
- switch the machine off: sudo shutdown now
- from the droplet dashboard, open the Snapshots tab, enter a memorable name in the textbox, then click Take Snapshot
Notice that storing a snapshot is not free, but charged at a rate of $0.05 per GB per month.
To restore a snapshot on the droplet it was created from:
- click Manage > Snapshots from the main menu on the left to see all the snapshots you've created
- open the More dropdown menu on the far right of the desired snapshot box
- click Restore Droplet
In case you want to create an entirely new droplet from a snapshot:
- when choosing the image, select it from the Snapshots tab. Notice that you won't find this tab if you've never created any snapshots.
- proceed as usual, then click Create
As we noticed above, because R is a fast-moving project, the latest stable version of the R software is not always available from the official Ubuntu repositories. To install the latest version we need to add to the system list of trusted repositories the address of the external repository maintained by CRAN, together with its public key that allows the package management system to recognize it as a trusted source.
add the CRAN repository address to the system list:
echo -e "\n# CRAN REPOSITORY\ndeb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/\n" | sudo tee -a /etc/apt/sources.list
Notice that the above command:
- presumes that your Ubuntu release is 20.04 LTS, codenamed focal. If your release is different, replace focal with the correct adjective, using this list as a reference. In particular, the previous 18.04.x LTS version is named bionic.
- points to the mirror cloud.r-project.org, the generic redirection service sponsored by RStudio, but it's also possible to switch to a static closer location (according to the chosen VM region, not the user's location!) using this list.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo apt-get update
sudo apt-get install -y r-base r-base-dev
We now add a constant to the R environment, associated with the path of our public repository. Open the R environment file for editing:
sudo nano $(R RHOME)/etc/Renviron
then add the following line at the end:
PUB_PATH = '/usr/local/share/public'
Once saved, the path can be retrieved from any R script with:
Sys.getenv('PUB_PATH')
In the same way as above, you can add other important constants to the R environment file:
- the following line defines the default location of the R packages library:
R_LIBS_USER = '/usr/local/share/public/R_library'
In a more advanced scenario, where you need to store for example more than one version of a package, you can also think about declaring multiple directories as R libraries, even defining a dedicated library for every app or particular use. In that case, though, you're probably better off using a package built for that very purpose: packrat.
- if you happen to load many packages in a single session and hit the error maximal number of DLLs reached..., you need to tell R that it can open more than the default 100 shared libraries (DLLs) in a single session. The line to add in the configuration file is the following:
R_MAX_NUM_DLLS = 1000
Notice that you can write down a smaller number, but 1,000 is the biggest number that you can set the variable to without R complaining.
- if you plan to install packages listed as remotes in the DESCRIPTION file associated with a private repo on GitHub, it is convenient to set here your key:
GITHUB_PAT = WRITE_YOUR_KEY_HERE
In a similar way, if you use cloud services or APIs and you need keys to access them, do NOT put them in your scripts, but save them here using a recognizable name, and then recall them in the scripts using Sys.getenv.
While useful, Renviron can only store key = value pairs that set variables at launch. But you can also use another file called Rprofile.site, in the same location, that accepts any kind of valid R commands. For example, you can set options that you know you would always set in a specific way, or load the same library(ies) you use every time you work with R. A particular set of options useful when installing or updating packages is the following:
options(
repos = c(CRAN = 'https://cran.rstudio.com/'),
download.file.method = 'libcurl',
Ncpus = 6
)
where Ncpus should be modified according to the number of cores of your machine.
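If you're unsure how many cores are available on the droplet, a quick check from the terminal is enough:

```bash
# print the number of processing units available
nproc
```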
Create a folder called software in your home directory and move into it:
cd
mkdir software
cd software
Please note that the command below downloads the 64bit version available at the time of writing, and presumes that your OS version is at least Ubuntu 18.04 LTS (Bionic). It's worth verifying the newest version by visiting this page (scroll down to the Server section, and copy the link for the Ubuntu 18/Debian 10 (64-bit) installer), and in case substitute where needed.
wget -O rstudio.deb https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.4.1717-amd64.deb
sudo apt install -y ./rstudio.deb
sudo ufw allow 8787
It's advisable to change the default port 8787 to some random integer number xxxx (obviously different from the above choices for SSH and Webmin):
- open the configuration file: sudo nano /etc/rstudio/rserver.conf
- add the following line: www-port=xxxx
- restart the server: sudo service rstudio-server restart
- allow the new port through the firewall: sudo ufw allow xxxx
- delete the rule for the default 8787 port:
sudo ufw delete allow 8787
You should now give yourself some time to play around with the configuration, which you can find under the menu: Tools > Global Options. You can find a file rstudio.conf in the repository of this workshop that lists most of the changes I usually apply as soon as I install the software. You should also take some time to build your personal snippets library, clicking the button Edit snippets... at the bottom of the Code > Editing window. You can read more about it on the dedicated RStudio documentation page.
Before using git, you first have to inform git about the user, running from the terminal the following commands:
git config --global user.name "name here"
git config --global user.email "email here"
You can check that the above changes have been applied running the following command:
git config --list
If you now want to create a new project, which will be connected to a new repository, it’s much simpler to create the repository first, and then add it as a new project in RStudio as a clone from that hosted version control project. When creating a new repository, remember to:
- add a README file
- choose a LICENCE
- add a .gitignore file

To add an existing repository to RStudio:
- open the projects dropdown menu in the upper right corner of RStudio (it should read Project: (None)), then click New Project, Version Control, and finally Git (this is going to work for both GitHub and BitBucket)
- fill in the repository details, then click Create Project. Notice that if the repository is private, you have to insert your username and password to start cloning the repo. If you're using GitHub, it's a smart choice not to use the access password when dealing with RStudio projects, but to create instead a GitHub token to use in place of the password. You should limit the token scope only to Access public repositories or Full control of private repositories, depending on your needs. You could also generate a specific RSA key from the Git/SVN section of the Tools > Global Options menu, and add it to your GitHub account using the New SSH key button under the SSH and GPG keys section of the Settings menu.

Before installing the Shiny Server, it is usually suggested you first install the shiny and rmarkdown packages in the R system. This is actually not necessary for the correct functioning of the Shiny Server, but it ensures that its landing page loads completely correctly, showing the shiny app and the rmarkdown document on the right side of the screen.
Enter R from the Linux terminal:
R
If you're asked to use a personal library, answer yes. You should be prompted with the path we used above. Answer yes again.
install.packages(c('rmarkdown', 'shiny'))
You can then quit R with q(), answering n when prompted, to avoid saving this session.
cd ~/software
wget -O shiny.deb https://download3.rstudio.org/ubuntu-14.04/x86_64/shiny-server-1.5.16.958-amd64.deb
sudo apt install -y ./shiny.deb
sudo ufw allow 3838
As done before, it's advisable to change the default port 3838 to some random integer number xxxx:
- open the configuration file: sudo nano /etc/shiny-server/shiny-server.conf
- change 3838 to xxxx in the line: listen xxxx
- restart the server: sudo service shiny-server restart
- allow the new port through the firewall: sudo ufw allow xxxx
- delete the rule for the default 3838 port: sudo ufw delete allow 3838
sudo usermod -aG public shiny
To allow the group to deploy apps, change owner and permissions of the Shiny Server root directory:
cd /srv/shiny-server
sudo chown -R usrname:public .
sudo chmod g+w .
sudo chmod g+s .
Please notice that the dots in the above statements are not a typo: they refer to the current directory.
To deploy an app, create a dedicated subfolder in the Shiny Server root, then copy the app code into it:
mkdir /srv/shiny-server/<APP-NAME>
cp -R /home/usrname/<APP-PATH>/* /srv/shiny-server/<APP-NAME>/
There is a repository on the WeR GitHub website called shinyapps. At the time of writing these notes, it contains at least one app (subfolder), called uk_petitions, that lets you easily download the data regarding any of the petitions created under the current UK government, and then draw a choropleth map of the provenance of the corresponding subscribers, using the leaflet package.
If you haven't installed any packages yet, besides shiny and rmarkdown, let's install the ones needed for the app to run correctly. We first need to install some system dependencies though.
sudo apt-get install -y curl libssl-dev libcurl4-gnutls-dev
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install -y gdal-bin libgdal-dev libgeos++-dev libudunits2-dev libnode-dev libjq-dev libcairo2-dev libxt-dev
You can now enter R, then install the required packages:
install.packages('devtools')
library(devtools)
pkgs <- c(
'Cairo', 'classInt', 'colourpicker', 'data.table', 'DT', 'jsonlite', 'leaflet', 'leaflet.extras',
'RColorBrewer', 'readxl', 'rgdal', 'rgeos', 'shinyjs', 'shinyWidgets'
)
install.packages(pkgs, dep = TRUE)
The above process is going to take some time, possibly half an hour, so go and grab a cup of coffee to keep you happy.
When finished, let’s create a directory for the app in the Shiny Server repository:
mkdir /srv/shiny-server/uk_petitions
To copy the app code into the above folder, we first need to create a new project in RStudio Server, cloning the repository directly in the user home folder.
Once the repo has been pulled on the server, run the following simple command to actually copy the code:
cp ~/path/to/repo/app/* /srv/shiny-server/uk_petitions/
You can now open a browser and head to http://ip_address/uk_petitions to see the app up and running!
sudo apt-get install curl libssl-dev libcurl4-gnutls-dev libxml2-dev
sudo apt-get install -y libmysqlclient-dev
sudo apt-get install libsasl2-dev
sudo apt-get install -y libhiredis-dev
sudo apt-get install -y libsodium-dev
sudo apt-get install -y libharfbuzz-dev libfribidi-dev
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install -y gdal-bin libgdal-dev
sudo apt-get install -y libgeos++-dev
sudo apt-get install -y libudunits2-dev
sudo apt-get install -y libprotobuf-dev protobuf-compiler libnode-dev libjq-dev
sudo apt-get install -y libcairo2-dev
sudo apt-get install -y libxt-dev
sudo apt-get install -y libgsl0-dev
sudo apt-get install -y libgmp3-dev
sudo apt-get install -y libcgal-dev libglu1-mesa-dev
sudo apt-get install -y libglpk-dev
sudo apt-get install -y libmagick++-dev
sudo apt-get install -y cargo
rJava:
sudo apt-get install -y default-jdk
sudo apt-get update
sudo R CMD javareconf
When dealing with many packages, the manual approach is tedious and, most importantly, very prone to errors. A more efficient and safer way to install multiple packages is to store the list of packages in a simple text file, check and drop from the list the ones that are already installed, then install only the remaining ones.
On the WeR GitHub website you can find a couple of lists of suggested packages, which you should now clone:
- r_packages_min.lst relates to packages found directly on the CRAN website
- r_packages_gh.lst lists packages that are found only on GitHub, still to be released on CRAN (and possibly never to be), and that have therefore to be installed using the functionalities provided by the devtools package.

Feel free to delete any package from any of the above lists, or add any other you need for your job. In the repository there is also a markdown file that contains a brief description of most of the packages contained in the above lists, divided by their (main) application in the Data Science workflow. A star after the name indicates that the package is (still) not available on CRAN, and must be installed from its GitHub repository using the devtools package. When possible, some links to available resources have also been included.
It’s important to note that if you want to install ALL the packages in the above CRAN list, you do need to up the RAM of your machine to a minimum of 4GB, at least for the time necessary to install all the packages and their dependencies (you actually need even 8GB if you want to install also the prophet package, that requires the RStan probabilistic language).
To accomplish the resizing:
sudo shutdown now
- from the droplet dashboard, click Resize from the left menu
- check CPU and RAM only, then choose the 4GB/2vCPUs size (or the 8GB/4vCPUs, depending on your needs)
- click Resize at the bottom, and wait for the task to finish

To install the packages in the above lists, enter first the R software from the Linux terminal, then source the R script with the same name of the list you're interested in. For example, to install the minimal set of CRAN packages in the r_packages_min.lst list, run:
source('r_packages_min.R')
while to install the GitHub only packages contained in the r_packages_gh.lst
list, run:
source('r_packages_gh.R')
You should try to keep the SSH connection open during the whole installation, to prevent the script from breaking. If anything happens and the script stops suddenly, before running the script again you should look into the library repository and delete, if present, any folder whose name starts with 00LOCK-, together with the folders they refer to. For example, if the script breaks while installing the xxx package, there will be a (temporary) folder called 00LOCK-xxx and the actual package folder called xxx.
cd /usr/local/share/public/R_library
ls 00* -l
rm -rf 00LOCK-*
rm -rf xxx
Once the script has finished, don’t be scared by all the warnings (for now!), but instead exit R without saving the session, then re-enter R and run the same script again. At the end of this second run, that’s the right time to look at all the warnings that R has probably thrown out (most of them have probably disappeared). Read carefully if there are any warnings about packages having non-zero exit status, those are the packages that have not been installed. If that’s the case, scroll back until you find the errors in the log, usually in red bold ink, and act accordingly. It is often a lack of one or more Linux libraries that needs to be installed in the Ubuntu system, before installing the packages (you should only look at the libraries meant for the Debian systems). If you can’t get over the error(s), just google the entire feedback. Often, limiting the timespan of the search in the previous year only is a good move, as the same error could be related to different past situations.
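As a purely hypothetical example of that troubleshooting loop: suppose the log shows that the xml2 package failed with a non-zero exit status because a system header was missing; you would install the corresponding Ubuntu library and retry only that package:

```bash
# install the missing system dependency (example only)
sudo apt-get install -y libxml2-dev
# retry the single failed package from the command line
Rscript -e "install.packages('xml2')"
```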
Some packages, like iheatmapr
, need also R dependencies from the Bioconductor repository, that have to be installed with their own package manager:
install.packages('BiocManager')
BiocManager::install('S4Vectors')
BiocManager::install('graph')
BiocManager::install('Biobase')
Finally, take note also that starting with version 3.5, and even more so after version 4, some R internals have changed so much that all packages need to be rebuilt to work properly, and some of them have even been removed from CRAN because of issues that have to be fixed to pass all the due checks.
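One possible way to force that rebuild after upgrading R is to ask R to reinstall every package that was built under a previous version (run it from a user belonging to the public group; it can take a long while):

```bash
# rebuild all installed packages whose build version differs from the running R
Rscript -e "update.packages(ask = FALSE, checkBuilt = TRUE)"
```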
Nginx is a free, open-source, high-performance HTTP server that also works as a proxy, load balancer, and reverse proxy. It's been developed with the clear intention of running on small resources, yet with the capacity to handle a large volume of concurrent connections. For these reasons, it is a great alternative to the more commonly used Apache web server.
- if the Apache web server is present, stop it and remove it:
sudo systemctl stop apache2
sudo apt-get remove -y apache*
- if any other service is currently listening on port 80, edit its configuration file again to change the port to something else, as 80 has to be dedicated to the web server.
- open port 80 for unencrypted traffic: sudo ufw allow 80
sudo apt-get install -y nginx
sudo systemctl status nginx
To test that the service is actually working, enter the server_ip or hostname directly into the browser’s address bar, and you should see the default Nginx landing page.
To facilitate the management of the content of the website (i.e. copying, editing and deleting files, plus accessing, making and removing directories) in /var/www/html/, the default root directory of the default nginx website, without elevating privileges with sudo, you need to change some permissions:
- sudo chown -R $USER /var/www/html/ : set your user as the owner of all the files and directories in /var/www/html
- sudo find /var/www/html -type d -exec chmod u+rwx {} + : set read, write and execute permissions on each folder, so that your user can access the folders and go into them (a folder needs to be flagged as executable for this to happen)
- sudo find /var/www/html -type f -exec chmod u+rw {} + : set all the files in any directory to have read and write permissions for your user (but not execute)

We're going to install PHP-FPM, a FastCGI implementation, as an alternative to the more common PHP usually installed alongside the Apache Web Server.
sudo apt install -y php-fpm php-mysql
sudo nano /etc/nginx/sites-available/default
then add index.php to the following line:
index index.html index.htm index.nginx-debian.html;
and add (or uncomment) the following blocks:
location ~ \.php$ {
include snippets/fastcgi-php.conf;
# With php-fpm (or other unix sockets):
fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
# With php-cgi (or other tcp sockets):
# fastcgi_pass 127.0.0.1:9000;
}
location ~ /\.ht {
deny all;
}
Check which PHP version is actually installed by listing the content of /var/run/php/, and in case it's different from the 7.4 above, modify the code accordingly.
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl status nginx
To test the PHP processor, create a new file:
sudo nano /var/www/html/info.php
with content:
<?php phpinfo();
After opening the page http://hostname/info.php you should be greeted with a horrible php welcome page listing lots of stuff, with the php and Ubuntu versions running on the system at the top.
sudo nano /etc/nginx/sites-enabled/default
Outside the server directive, add the following lines:
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
Then, inside the server directive, add the following location block for the RStudio Server, replacing yyyy with the correct port:
location /rstudio/ {
proxy_pass http://127.0.0.1:yyyy/;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
}
and this one for the Shiny Server, replacing xxxx with the correct port:
location /shiny/ {
proxy_pass http://127.0.0.1:xxxx/;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
rewrite ^(/shiny/[^/]+)$ $1/ permanent;
}
Finally, find the line:
server_name _;
and replace the underscore _ with the domain name of your choice (keep the semicolon though!). Then test the new configuration and reload the server; if you get any errors, reopen the file and check for typos, then test it again, until you get a successful feedback:
sudo nginx -t
sudo systemctl reload nginx
Once you're happy with the changes, you can delete from the firewall the rules related to the Shiny Server port xxxx and the RStudio Server port yyyy, as both services are now reachable through the web server.
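A quick sketch of that cleanup, using the same xxxx and yyyy placeholders chosen above:

```bash
# the Shiny and RStudio ports no longer need to be exposed directly
sudo ufw delete allow xxxx
sudo ufw delete allow yyyy
sudo ufw status
```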
We will use Let’s Encrypt to obtain a free SSL certificate.
Open port 443 to allow SSL/TLS encrypted traffic through the firewall:
sudo ufw allow 443
sudo apt-get install certbot
sudo apt-get install -y python3-certbot-nginx
- request the certificates:
sudo certbot --nginx -d hostname.tld -d www.hostname.tld
where hostname.tld has to be substituted with the true hostname of your choice
- when prompted, choose to redirect all http calls to their https siblings

Note that every certificate has an expiry date, so that:
- to renew the certificates (including any requested with the --certonly option) run: sudo certbot renew
- to test the renewal process without actually modifying any certificate, add the --dry-run option: sudo certbot renew --dry-run
On the other hand, if you no longer need a certificate, or if the certificate has been compromised, you should revoke it and issue a new request. To delete all the certificates for a specified website hostname.tld run:
sudo certbot delete --cert-name hostname.tld
Alternatively, you can run instead:
sudo certbot delete
then select the one(s) you want to get rid of from the proposed list.
As a reference, when a new certificate is issued, it is stored in the /etc/letsencrypt/live directory, while the /etc/letsencrypt/archive folder stores copies of the live certificates.
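If you ever need a quick overview of what has been issued and when it expires, certbot can list everything it manages:

```bash
# show domains, expiry dates and paths of all managed certificates
sudo certbot certificates
```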
As we’ve seen before, from the server’s point of view a Shiny app is nothing more than a subfolder in the Shiny Server base folder, which by default is /srv/shiny-server
. Using Nginx capabilities, it’s easy to add a basic form of authentication to any shiny app, where basic means that the system simply asks for a user and password, checking that the user exists and that the password is associated with it (for more capabilities, like grouping users, associate functionalities with users, or tracing behaviour, it is possible to use some convenient R packages or you can actually build your self-designed layer on top of the app itself).
The htpasswd utility we are going to use is provided by the apache2-utils library, which you possibly need to install:
sudo apt update && sudo apt upgrade
sudo apt install -y apache2-utils
We are now in a position to create users and passwords. We are going to save a file for each app in a dedicated subfolder of our public repo PUB_PATH:
htpasswd -c $PUB_PATH/shiny_server/pwds/appname.pwds username
Once the command has been issued, you'll be asked to provide the associated password twice.
Notice the -c option, which instructs the system to create the appname.pwds file; if the file already exists, it is rewritten and truncated. In case you want to add another entry to an existing file, you need to leave out the -c option.
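As a small sketch of that difference (alice and bob are hypothetical users, appname is the placeholder used throughout this section):

```bash
# first user: -c creates (or overwrites!) the password file
htpasswd -c $PUB_PATH/shiny_server/pwds/appname.pwds alice
# further users: no -c, so the existing file is only updated
htpasswd $PUB_PATH/shiny_server/pwds/appname.pwds bob
```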
Other useful options are:
- -v : verify a password for a user
- -n : display the result of the hashed password without updating the file
- -b : include the password in clear in the command, after the username; this is not a recommended action, as the password can be seen by anyone behind you looking at the screen
- -B : force bcrypt encryption of the password instead of the less secure default MD5 (try not to use the other, much more insecure, options: -d CRYPT, -s SHA, -p plain text!)
- -D : delete the specified user from the appname.pwds file

Open the Nginx configuration for editing:
sudo nano /etc/nginx/sites-available/default
and add the following code anywhere inside the server directive:
location /shiny/appname/ {
auth_basic "Username and Password are required";
auth_basic_user_file /usr/local/share/public/shiny_server/pwds/appname.pwds;
}
Notice that you should have one, and only one, of the above location blocks for each appname, although the referenced file appname.pwds could be the same for more than one app.
sudo nginx -t
sudo systemctl reload nginx
The following are the location and names of the configuration and log files:
- /etc/nginx : the Nginx parent directory that contains all the server configuration files
- /etc/nginx/nginx.conf : the main Nginx configuration file
- /etc/nginx/sites-available/ : stores the server blocks; the configuration files in here are not used until they are linked into the sites-enabled directory
- /etc/nginx/sites-enabled/ : stores the links to the configuration files in the sites-available directory that Nginx actually serves
- /etc/nginx/snippets/ : stores configuration fragments that can be included anywhere in the Nginx configuration; if you use specific configuration segments repeatedly, they can be added to this directory
- /var/log/nginx/ : the Nginx parent directory for the server log files
- /var/log/nginx/access.log : stores all the requests to the web server (it has to be configured to do that)
- /var/log/nginx/error.log : Nginx errors are recorded in this file
- /var/www/html/ : the default directory for the content of the website(s)

The default Nginx installation has only one server block enabled, with document root set to /var/www/html/.
It is possible to add as many blocks as desired as follows:
sudo mkdir -p /var/www/newdomain.com
sudo nano /var/www/newdomain.com/index.html
<html>
<head>
<title>Welcome to the "newdomain.com" nginx webserver!</title>
</head>
<body bgcolor="white" text="black">
<center><h1>newdomain.com is working!</h1></center>
</body>
</html>
create a new server block:
sudo nano /etc/nginx/sites-available/newdomain.com.conf
and add the following content:
server {
listen 80;
listen [::]:80;
server_name newdomain.com www.newdomain.com;
root /var/www/newdomain.com;
index index.html;
location / {
try_files $uri $uri/ =404;
}
}
- enable the new block by linking it into the sites-enabled directory: sudo ln -s /etc/nginx/sites-available/newdomain.com.conf /etc/nginx/sites-enabled/newdomain.com.conf
- check the configuration and restart the server:
sudo nginx -t
sudo systemctl restart nginx
Finally, open the browser and check that newdomain.com is working as desired.

Although Python is often automatically installed on Ubuntu, take a moment to confirm that version 3.8 is already installed on the system, by issuing the following command: python3 -V. In a similar way, the pip package manager is usually installed on Ubuntu, but take a moment to confirm that version 20 or above is installed, by issuing the command: pip3 -V.
You should also check how many versions of Python you've ended up with, using the whereis python command. Often, you get no response when trying to run the plain python command; you can point it to one of your binaries using the command alias python="/path/to/python3.x", where x is the version you want to run.
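A short sketch of those checks (the path to the 3.8 binary is an assumption, adjust it to whatever whereis reports):

```bash
# list every python binary the system knows about
whereis python3
# check the default interpreter version
python3 -V
# point the bare `python` command at a specific binary for this shell session
alias python="/usr/bin/python3.8"
python -V
```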
In any case, run the following commands to install both last versions:
sudo apt-get update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install -y python3.8
sudo apt-get install -y python3-pip
We are now in a position to install all the top packages needed for a decent data science stack. Some of these packages require the following libraries to be installed beforehand on the system:
sudo apt-get install -y openmpi-bin
If any error shows up when installing pytorch with the command below, you should first ensure that the wheel versions match your version of Python, and in case adjust them accordingly.
pip install torch==1.8.0+cpu torchvision==0.9.0+cpu torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
sudo apt-get install -y libopenblas-dev
sudo apt install -y caffe-cpu
It’s possible to install the above packages one by one when needed, but you can also install all of them at once, as follows:
cd ~/scripts
wget -O python_libraries.lst https://raw.githubusercontent.com/lvalnegri/workshops-setup_cloud_analytics_machine/master/python_libraries.lst
run the following command:
python3 -m pip install --user -r python_libraries.lst
IPython is an interactive command-line interface for the python language. Jupyter offers an interactive web interface to many languages, including Python, R, and C. JupyterLab is the next-generation web-based user interface for Project Jupyter. JupyterLab is served from the same server and uses the same notebook document format as the classic Jupyter Notebook, but it will eventually replace it.
As a prerequisite, you need to install the Node.js runtime and its npm package manager. Currently, if you use in any way the V8 JavaScript engine, or any R package depending on it (like rmapshaper), you need to install nodejs using the newer snap package system instead of the classic apt:
sudo apt install snapd
sudo snap install node --classic --channel=14
create a new python environment:
sudo apt install python3-venv
sudo python3 -m venv /opt/jupyterhub/
By default, a python environment has its own interpreter executable and directory for installing packages, and is therefore isolated from other packages installed on the rest of the system. If you prefer instead that your jupyter server share the global libraries, you need to add the option --system-site-packages to the previous command. You can always change your mind later, and set to true the value of the boolean include-system-site-packages in the pyvenv.cfg configuration file of the environment, stored in the root directory of the environment itself, in this case /opt/jupyterhub/.
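A one-liner sketch of that later change of mind (it simply flips the flag in the environment's config file, assuming the default key/value layout that venv writes):

```bash
# let the environment see the system-wide site packages
sudo sed -i 's/include-system-site-packages = false/include-system-site-packages = true/' /opt/jupyterhub/pyvenv.cfg
```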
install the server:
sudo /opt/jupyterhub/bin/python3 -m pip install wheel
sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
sudo /opt/jupyterhub/bin/python3 -m pip install ipywidgets
generate the default configuration file:
sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
cd /opt/jupyterhub/etc/jupyterhub/
sudo /opt/jupyterhub/bin/jupyterhub --generate-config
open the configuration file for editing:
sudo nano /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py
then set the followings:
c.Spawner.default_url = '/lab'
c.JupyterHub.base_url = '/jupyter'
c.JupyterHub.bind_url = 'http://:8000/jupyter'
Notice that the above options are already present in the file, most of them commented out.
create the configuration file for the service, then open it for editing:
sudo mkdir -p /opt/jupyterhub/etc/systemd
sudo nano /opt/jupyterhub/etc/systemd/jupyterhub.service
and add the following code:
[Unit]
Description=JupyterHub
After=syslog.target network.target
[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/bin"
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py
[Install]
WantedBy=multi-user.target
initialize and start the service:
sudo ln -s /opt/jupyterhub/etc/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service
sudo systemctl daemon-reload
sudo systemctl enable jupyterhub.service
sudo systemctl start jupyterhub.service
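To verify that the hub came up correctly, the standard systemd tooling is enough:

```bash
# current state of the service
sudo systemctl status jupyterhub.service
# recent log lines, useful if the service failed to start
sudo journalctl -u jupyterhub.service -n 50
```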
set up a reverse proxy, adding the following location
directive in the server
block of the Nginx configuration file:
location /jupyter/ {
proxy_pass http://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Scheme $scheme;
proxy_buffering off;
}
check the validity of your changes, then restart the Nginx server:
sudo nginx -t
sudo systemctl reload nginx.service
You can now browse to the desired https://ip_address/jupyter to see your JupyterHub is actually up and running correctly. Here you should find a barebone login page, where you need to enter your usual Linux username and password. When logged in, you will be presented with the JupyterLab interface, with the file browser pane on the left showing the contents of your home directory on the server.
More information about JupyterHub can be found here.
The Jupyter Notebook Server depends on the IPython kernel functionality. However, many other languages, in addition to Python, may be used in the notebook, mostly developed by the community or third parties. See this GitHub repo for a fairly exhaustive list.
install the R kernel:
R
install.packages('IRkernel')
# choose between one of the following
IRkernel::installspec() # install for the current user only
IRkernel::installspec(user = FALSE) # install system-wide
Also add the usual shortcuts for the assignment <- and the pipe %>% operators:
jupyter labextension install @techrah/text-shortcuts
install the C kernel:
sudo /opt/jupyterhub/bin/python3 -m pip install jupyter-c-kernel
sudo /opt/jupyterhub/bin/install_c_kernel
We’ve already installed RStudio Server allowing us to run R scripts on any machine anywhere accessing it from the browser. But it’s highly possible that R won’t be the only language you use for coding, and while notebooks — being them about Python, R or Javascript — are absolutely great for both interactive programming and for data analysis and visualization, they lack lots of functionalities of a proper IDE.
We are going to install code-server, the open source server version of the famous Microsoft Visual Studio Code desktop IDE.
curl -fOL https://github.com/cdr/code-server/releases/download/v3.8.0/code-server_3.8.0_amd64.deb
sudo dpkg -i code-server_3.8.0_amd64.deb
sudo systemctl enable --now code-server@$USER
By default, the server runs on port 8080, and access is granted to each user with a password stored in the file ~/.config/code-server/config.yaml. It's better to edit this file, changing both the port and the password.
You could now reach the server directly at ip_address:8080 (or the port you've just set), but a better way is to add an entry for it in the nginx server block we've already started for the RStudio and Shiny Servers.
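A minimal sketch of such an entry, following the same pattern used above for RStudio (the /vscode/ path is an arbitrary choice and zzzz stands for the port you set in config.yaml):

```nginx
location /vscode/ {
    proxy_pass http://127.0.0.1:zzzz/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $host;
}
```

After testing and reloading Nginx, the IDE should be reachable at http://hostname/vscode/.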
sudo apt-get -y install mysql-server
Then run the secure installation utility, skipping the first question, inserting a strong new password for root, and finally answering Yes to all the remaining questions:
sudo mysql_secure_installation
sudo mysql -u root -p
Change the authentication method for the root user:
ALTER USER 'root'@'localhost' IDENTIFIED WITH caching_sha2_password BY 'ENTER-ROOT-PASSWORD-HERE';
FLUSH PRIVILEGES;
then check, with the following query, that the plugin value for the root user is actually caching_sha2_password, and that there is no more entry for root with the unsecure auth_socket plugin. If that's not the case, rerun the previous ALTER statement for all the values of host associated with root:
SELECT user, authentication_string, plugin, host FROM mysql.user;
create at least two new agnostic users: one to be used in development, with full privileges:
CREATE USER 'devs'@'localhost' IDENTIFIED BY 'pwd';
GRANT ALL PRIVILEGES ON *.* TO 'devs'@'localhost';
FLUSH PRIVILEGES;
and one to be used by the shiny apps, with read-only privileges. Notice that it is really necessary for the shiny user to have both the localhost and the % entries below, to be able to connect from anywhere as shiny. Moreover, if the exact IP address of the machine the shiny user is going to query from is known beforehand, then that IP should be used in the statements instead of the percent sign:
CREATE USER 'shiny'@'localhost' IDENTIFIED BY 'pwd';
GRANT SELECT ON *.* TO 'shiny'@'localhost';
CREATE USER 'shiny'@'%' IDENTIFIED BY 'pwd';
GRANT SELECT ON *.* TO 'shiny'@'%';
FLUSH PRIVILEGES;
In a similar way, it is possible to create additional personal users. See here for a list of all possible specifications for the privileges.
Finally, exit the MySQL server:
exit
If you've created, as above, a user with potential remote access, you also have to:
- open the MySQL port in the firewall: sudo ufw allow 3306
- open the configuration file: sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
- find the line:
bind-address 127.0.0.1
and change it to:
bind-address 0.0.0.0
We're now in a position to store credentials in a way that avoids people seeing passwords in clear in scripts. Open the configuration file:
sudo nano /etc/mysql/my.cnf
and add one group for each set of credentials, in the following form:
[groupname]
host = ip_address
user = usrname
password = 'password'
database = dbname
restart the server:
sudo service mysql restart
If needed, you can also tweak some general server settings in the same file:
sudo nano /etc/mysql/my.cnf
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
default-storage-engine=MYISAM
local_infile=1
- library(RMySQL) : load the library
- dbc <- dbConnect(MySQL(), host = 'hostname', username = 'usrname', password = 'pwd', dbname = 'dbname') : connect to the server using credentials in clear (not safe to put in scripts!)
- dbc <- dbConnect(MySQL(), group = 'grpname') : connect to the server using credentials taken from the configuration file
- dbGetQuery(dbc, 'strSQL') : run an arbitrary query and return the result (usually a SELECT query)
- dbReadTable(dbc, 'tblname') : read a complete table
- dbSendQuery(dbc, 'strSQL') : run an arbitrary query (without returning the result)
- dbWriteTable(dbc, 'tblname', dfname, row.names = FALSE, append = TRUE) : write a dataframe to the database as the specified table; it is possible either to append or overwrite the values
- dbRemoveTable(dbc, 'tblname') : drop a table
- dbDisconnect(dbc) : close the connection. This is a very important step, to avoid filling up the pool of connections and being rejected the next time you ask for one. If that should happen, run (carefully!) the following command:
for(x in dbListConnections(MySQL())) dbDisconnect(x)
This step requires a Web server and a PHP processor to be already installed on the system.
cd ~/software
wget http://dbninja.com/download/dbninja.tar.gz
sudo mkdir /var/www/html/sql
sudo tar -xvzf dbninja.tar.gz -C /var/www/html/sql --strip-components=1
sudo ls /var/www/html/sql/_users/
Hide the ... . Click Save.
This step requires a Web server and a PHP processor to be already installed on the system.
sudo apt install -y phpmyadmin
sudo ln -s /usr/share/phpmyadmin /var/www/html/
cd /var/www/html/
sudo mv phpmyadmin yoururl
it’s recommended to add a security layer; we first create a password file:
sudo htpasswd -c /etc/nginx/phpmyadmin.pwd username
answering twice with your desired password, and then add a directive in the server block in the nginx config file /etc/nginx/sites-available/default
:
location /yoururl/ {
auth_basic "Please provide username and password";
auth_basic_user_file /etc/nginx/phpmyadmin.pwd;
}
Remember to check the file and reload the server:
sudo nginx -t
sudo systemctl reload nginx
If you now open the page at https://hostname.tld/yoururl you should be greeted by a preliminary access window, asking for the username and password just saved in the password file, before entering the desired database user and password.
MS SQL Server is a relational database system by Microsoft that has been available for Linux since 2016.
sudo wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2017.list)"
If after the update the following, or similar, error message appears:
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY EB3E94ADBE1229CF
you have to manually add that key to the trusted keyset of the apt packaging system:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EB3E94ADBE1229CF
If the key that your system is missing differs, simply replace the key at the end of the above command with your key, and run it.
sudo apt-get update
sudo apt-get install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup
systemctl status mssql-server
After the server installation, we also need to install some additional tools to connect to the server and run T-SQL statements:
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/prod.list)"
sudo apt-get install -y mssql-tools unixodbc-dev
sqlcmd -S localhost -U SA -P 'password'
if you get a
sqlcmd: command not found
error, then you need to create a symlink to the sqlcmd binary:
sudo ln -sfn /opt/mssql-tools/bin/sqlcmd /usr/bin/sqlcmd
1> SELECT name FROM sys.databases
2> go
name
-------------------------------------------------------------------------------------
master
tempdb
model
msdb
test
(5 rows affected)
for a more extensive check, let’s create a test table in the tempdb database, add some records, query their existence, then finally drop the table:
1> USE tempdb
2> CREATE TABLE test (id INT, name NVARCHAR(50), quantity INT)
3> INSERT INTO test VALUES (1, 'one', 10)
4> INSERT INTO test VALUES (2, 'two', 200)
5> INSERT INTO test VALUES (3, 'three', 3000)
6> go
Changed database context to 'tempdb'.
(1 rows affected)
(1 rows affected)
(1 rows affected)
1> SELECT * FROM test
2> go
id name quantity
----------- -------------------------------------------------- -----------
1 one 10
2 two 200
3 three 3000
(3 rows affected)
1> DROP TABLE test
2> go
1> exit
To connect to SQL Server from remote machines you need first to open the TCP port where SQL Server listens for connections. By default, this port is set to 1433, but we’re going to change it at once for security reasons. To change the port, run first the following command, replacing the xxxx string with your desired integer number:
sudo /opt/mssql/bin/mssql-conf set network.tcpport xxxx
then restart the server:
systemctl restart mssql-server.service
Finally, open the port in the firewall:
sudo ufw allow xxxx
When connecting from remote, you now have to specify the port beside the IP address:
sqlcmd -S ipaddress,port -U usrname -P 'password'
From inside R you have multiple possibilities to connect to sql server.
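One option, sketched below, is the DBI + odbc pair; it assumes the odbc R package and the Microsoft "ODBC Driver 17 for SQL Server" are installed, and uses the same ipaddress,port form seen above for sqlcmd:
library(DBI)
# open a connection to the remote SQL Server instance (values below are placeholders)
dbc <- dbConnect(
    odbc::odbc(),
    driver   = 'ODBC Driver 17 for SQL Server',
    server   = 'ipaddress,port',
    database = 'master',
    uid      = 'SA',
    pwd      = 'password'
)
# same check performed above with sqlcmd
dbGetQuery(dbc, 'SELECT name FROM sys.databases')
dbDisconnect(dbc)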
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
add the repository to the apt sources:
echo -e "\n# MONGODB\ndeb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse\n" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install mongodb-org
mongo --host 127.0.0.1:27017
if the service refuses to start (for example after an unclean shutdown), remove the stale lock file and restart it:
sudo rm /var/lib/mongodb/mongod.lock
sudo service mongod restart
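To query MongoDB from inside R, a minimal sketch using the mongolite package (collection and database names below are just placeholders) could be:
library(mongolite)
# connect to a collection on the local server (it is created on first insert)
m <- mongo(collection = 'test', db = 'testdb', url = 'mongodb://127.0.0.1:27017')
# insert a small dataframe, then read all documents back
m$insert(data.frame(x = 1:3, y = c('a', 'b', 'c')))
m$find('{}')
# drop the test collection
m$drop()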
Neo4j is an extremely popular graph database used to store and query connected data. Rather than having foreign keys and select statements, it uses edges and graph traversals to query the data. This method of querying data is extremely powerful in any situation where data is best represented as items that have relationships with other items in the dataset, such as social networks, biology, and chemistry.
Neo4j is implemented in Java, so you’ll need to have the Java Runtime Environment (JRE) installed. You can check it using the command: java -version
. If the feedback is negative:
add the repository to the apt sources:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Once you’ve installed java, you can proceed with Neo4j:
wget --no-check-certificate -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
add the repository to the apt sources:
echo -e "\n# NEO4J\ndeb http://debian.neo4j.org/repo stable/\n" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install neo4j
You can now head to http://ip_address:7474/browser/ to access the neo4j dashboard, using the default username and password neo4j
and neo4j
. You will be prompted to set a new password. If you find yourself in trouble logging in the first time, try to delete the file /var/lib/neo4j/data/dbms/auth
and restart the server before trying to access again.
If the browser refuses to connect, edit the configuration file:
sudo nano /etc/neo4j/neo4j.conf
and uncomment (or add) the line:
dbms.connectors.default_listen_address=0.0.0.0
sudo service neo4j restart
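To query Neo4j from inside R, a minimal sketch using the neo4r package (assuming you have already changed the default password) could be:
library(neo4r)
# open a connection object to the local server (the password is a placeholder)
con <- neo4j_api$new(url = 'http://127.0.0.1:7474', user = 'neo4j', password = 'your-password')
con$ping()                                      # should return 200 when the server is reachable
call_neo4j('MATCH (n) RETURN count(n);', con)   # count the nodes in the (still empty) graph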
Redis (GitHub repo) is a distributed in-memory key-value storage engine that persists on disk, and supports different kinds of abstract data structures. You can walk through the most important features of the Redis engine at the Try Redis demonstration website.
We need to install Redis as non-root user, and to accomplish that task we must build and install the package from source.
install first the build and test dependencies:
sudo apt update
sudo apt install build-essential tcl
move to the software directory we’ve already created above, then create a dedicated subdirectory and move into it:
cd ~/software
mkdir redis
cd redis
curl -O http://download.redis.io/redis-stable.tar.gz
tar xzvf redis-stable.tar.gz
cd redis-stable
cd deps
sudo make hiredis jemalloc linenoise lua geohash-int
cd ..
make
make test
sudo make install
create a dedicated redis system user and group, with no home directory:
sudo adduser --system --group --no-create-home redis
sudo mkdir /var/lib/redis
sudo chown redis:redis /var/lib/redis
sudo chmod 770 /var/lib/redis
sudo mkdir /etc/redis
sudo cp ~/software/redis/redis-stable/redis.conf /etc/redis
sudo nano /etc/redis/redis.conf
change supervised no to supervised systemd
change dir ./ to dir /var/lib/redis
create a systemd unit file for the new Redis service:
sudo nano /etc/systemd/system/redis.service
add the following text:
[Unit]
Description=Redis In-Memory Data Store
After=network.target
[Service]
User=redis
Group=redis
ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf
ExecStop=/usr/local/bin/redis-cli shutdown
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl start redis
sudo systemctl stop redis
sudo systemctl restart redis
sudo systemctl enable redis
systemctl status redis
redis-cli
127.0.0.1:6379> ping
with expected result
PONG
127.0.0.1:6379> set mykey "Hello, World!"
127.0.0.1:6379> get mykey
with expected results respectively
OK
and
Hello, World!
127.0.0.1:6379> exit
to test the connectivity from inside R, let’s first load the redux package (no need to enter R as sudo
at this time):
library(redux)
create a redis_api object:
r <- redux::hiredis()
set a key using the redis_api object, with expected result [Redis: OK]:
r$SET('mykey', 'Hello, World!')
read the key back, with expected result [1] "Hello, World!":
r$GET('mykey')
q()
Being of the columnar type, MonetDB is somewhat slow in writing data, but it is faster than most other database systems in reading data. This is important for data science use, as it is often the case that some data, like Census data, are stored only once in a decade but read thousands of times.
While you can install the full MonetDB client/server programs and use the companion MonetDB.R package, we’ll focus here on the lighter embedded R version, which does not require installing any separate software before usage.
Once a connection is started, permanent or temporary, the various commands are the ones already familiar from RMySQL, as both packages rely on the same DBI interface.
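As a sketch, assuming the MonetDBLite package is installed (it may need to be installed from its GitHub repository, as it has been archived on CRAN), an embedded session looks like the following (the database directory is just a placeholder):
library(DBI)
# open (or create) an embedded database stored in the given directory
dbc <- dbConnect(MonetDBLite::MonetDBLite(), '/usr/local/share/public/monetdb')
dbWriteTable(dbc, 'mtcars', mtcars)
dbGetQuery(dbc, 'SELECT COUNT(*) FROM mtcars')
dbDisconnect(dbc, shutdown = TRUE)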
TBD
Docker is a platform
Containers are:
Container vs Virtual Machine
docker hub and images
sudo apt install -y apt-transport-https ca-certificates curl gnupg software-properties-common lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
apt-cache policy docker-ce
sudo apt -y install docker-ce docker-ce-cli containerd.io
sudo systemctl status docker
sudo docker version
sudo docker run hello-world
returns a bunch of info about the Docker daemon:
sudo docker info
There are three main types of objects in the Docker world: images, containers, and volumes.
For each of the above types, there is a set of commands starting with docker, followed by the type, the action, some (optional) parameters, and the actual object the command applies to. In practice, for both images and containers the only command for which you do need to specify the type is prune, while the following commands change entirely:
docker image ls becomes docker images
docker image rm becomes docker rmi
docker container ls becomes docker ps
To get quick help for any command use the --help parameter without any scope (do not use the now deprecated -h parameter).
Image commands:
build: Build an image from a Dockerfile (see below)
history: Show the history of an image
import: Import the contents from a tarball to create a filesystem image
inspect: Display detailed information on one or more images
load: Load an image from a tar archive or STDIN
ls: List images (==> docker images)
prune: Remove unused images (==> you can’t shorten: docker image prune)
pull: Pull an image or a repository from a registry
push: Push an image or a repository to a registry (you first need docker login accountname)
rm: Remove one or more images (you can shorten changing the command: docker rmi)
save: Save one or more images to a tar archive (streamed to STDOUT by default)
tag: Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
Container commands:
attach: Attach local standard input, output, and error streams to a running container
commit: Create a new image from a container’s changes
cp: Copy files/folders between a container and the local filesystem
create: Create a new container
diff: Inspect changes to files or directories on a container’s filesystem
exec: Run a command in a running container
export: Export a container’s filesystem as a tar archive
inspect: Display detailed information on one or more containers
kill: Kill one or more running containers
logs: Fetch the logs of a container
ls: List containers (you can shorten changing the command: docker ps)
pause: Pause all processes within one or more containers
port: List port mappings or a specific mapping for the container
prune: Remove all stopped containers (==> you can’t shorten: docker container prune)
rename: Rename a container
restart: Restart one or more containers
rm: Remove one or more containers
run: Run a command in a new container
start: Start one or more stopped containers
stats: Display a live stream of container(s) resource usage statistics
stop: Stop one or more running containers
top: Display the running processes of a container
unpause: Unpause all processes within one or more containers
update: Update configuration of one or more containers
wait: Block until one or more containers stop, then print their exit codes
Volume commands:
create: Create a volume
inspect: Display detailed information on one or more volumes (==> docker inspect)
ls: List volumes
prune: Remove all unused local volumes
rm: Remove one or more volumes
docker pull [account/][repository:]imgname
To run a container from an image:
docker run imgname
The run command is usually called with a few additional parameters:
-it: to use interactive mode
-p int-port:ext-port: to map the internal port int-port of the container to the external port ext-port of the host
-v int-path:ext-path: to map the internal folder int-path of the container to the external folder ext-path of the host
--rm: delete the container once stopped
--name: assign a name to the container
-d: run the container detached, in the background
-f label: filter the output by label (used with listing commands such as docker ps)
sudo docker ps
sudo docker stop contname
sudo docker stop $(sudo docker ps -q)
sudo docker stop $(sudo docker ps -q -f )
docker inspect -f "{{.NetworkSettings.Networks.nat.IPAddress }}" contname
A Dockerfile is a script that contains a collection of (Dockerfile) instructions and operating system commands (typically Linux commands), that will be automatically executed in sequence in the docker environment for building a new docker image.
Below are some of the most used dockerfile instructions:
FROM registry/image:tag: The base image used to start the build process of a new image. This command must be at the top of the Dockerfile
MAINTAINER: Optional, it defines the full name and email address of the image creator
RUN: Used to execute a command during the build process of the docker image
COPY: Copy a file from the host machine to the new docker image
ADD: Allows to copy a file from the internet using its url, or extract a tar file from the source directly into the Docker image
ENV: Define an environment variable
EXPOSE: Associates a specific port to enable networking between the container and the host
CMD: Used for executing commands when we build a new container from the docker image
WORKDIR: Sets the path where the command, defined with the above CMD, is to be executed
VOLUME: Enable access/linked directory between the container and the host machine
ENTRYPOINT: Define the default command that will be executed when a container is created with the image
USER: Sets the UID (or the user name) which is to run the container at the start
LABEL: Allows to add a label to the Docker image
While not mandatory, you should always call your Dockerfile as Dockerfile. Don’t use any extension, just leave it null. In particular, do not change the name of the dockerfile if you want to use the autobuilder at Docker Hub.
You should never put multiple Dockerfiles for different images in the same directory; use instead a separate directory for each of them. If need be otherwise, start using Docker Compose.
Before we create a Dockerfile and build an image from it, we need to make a new directory from which to work. After that we can move into it and open a new test file for editing:
mkdir -p ~/Docker/dockerbuild
cd ~/Docker/dockerbuild
nano Dockerfile
then enter the following:
FROM ubuntu:latest
MAINTAINER Luca l.valnegri@datamaps.co.uk
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y build-essential
When you’re confident the Dockerfile is complete, or you simply want to test it, you can now build the image from that file (the dot at the end is not a mistake):
docker build -t "imgname:Dockerfile" .
where imgname
is the name we want to give to the image.
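Once the build has finished, you can quickly check the result by starting a throwaway interactive container from the new image:
docker run -it --rm imgname:Dockerfile /bin/bash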
Let’s say we now want to build a new image with R and some packages on it (or Python, or whatever else can be based on the previous image). It’s not efficient to start again from the ubuntu:latest image; instead we can exploit the image we’ve just built as the new base:
FROM imgname:Dockerfile
MAINTAINER Luca l.valnegri@datamaps.co.uk
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y r-base r-base-dev
We can now build additional images, based on the previous R image, each taking care of different things. For example, we can have different images for:
The following is an example of Dockerfile that creates a minimal image, capable to run a RStudio/Shiny server connected with the public shared repository on the host as described above:
# Download base image ubuntu 20.04 or use "latest" instead. See https://hub.docker.com/_/ubuntu
FROM ubuntu:20.04
ARG PUB_GRP=public
ARG PUB_PATH=/usr/local/share/$PUB_GRP
ARG USERNAME=datamaps
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN \
# Update software repository + Upgrade system
apt update && apt -y full-upgrade \
# Install potentially missing basic commands
&& apt install -y --no-install-recommends \
apt-utils \
apt-transport-https \
build-essential \
dos2unix \
git-core \
libgit2-dev \
libauthen-oath-perl \
libsocket6-perl \
man-db \
nano \
openssh-server \
software-properties-common \
ufw \
gnupg \
wget \
# Install R packages dependencies
&& apt-get install -y \
curl \
libcairo2-dev \
libcurl4-gnutls-dev \
libssl-dev \
libxml2-dev \
libxt-dev \
pandoc \
pandoc-citeproc \
xtail \
# cleaning
&& apt -y autoremove && apt clean \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds
RUN \
# Create a new "public" group
groupadd $PUB_GRP && \
# Create a new directory to be used by the "public" group and connected with the similar host public dir
mkdir -p $PUB_PATH/R_library \
&& chgrp -R $PUB_GRP $PUB_PATH \
&& chmod -R 2775 $PUB_PATH
RUN \
# add CRAN repository to apt
echo -e "\n# CRAN REPOSITORY\ndeb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/\n" | tee -a /etc/apt/sources.list \
# add public key of CRAN maintainer
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
# Update package manager
&& apt update \
# Install R
&& apt install -y r-base r-base-dev \
# Add configurations to .Rprofile
&& echo '
#####################################################
### ADDED BY DOCKERFILE
PUB_PATH = '/usr/local/share/public'
R_LIBS_USER = '/usr/local/share/public/R_library'
R_MAX_NUM_DLLS = 1000
#####################################################
' | tee -a $(R RHOME)/etc/Renviron \
# Install devtools, shiny, and rmarkdown packages
&& su - -c "R -e \"install.packages(c('devtools', 'shiny', 'rmarkdown'), repos='https://cran.rstudio.com/')\""
RUN \
# download and install RStudio Server
wget -O rstudio.deb https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.4.1103-amd64.deb \
&& apt -y install ./rstudio.deb \
&& rm rstudio.deb \
# download and install Shiny Server
&& wget -O shiny.deb https://download3.rstudio.org/ubuntu-14.04/x86_64/shiny-server-1.5.16.958-amd64.deb \
&& apt -y install ./shiny.deb \
&& rm shiny.deb \
# add "shiny" to the "public" group
&& usermod -aG public shiny
# install R packages using R script (plus cleaning)
# RUN Rscript -e "install.packages()" \
# && rm -rf /tmp/downloaded_packages/ /tmp/*.rds
RUN \
# Create a new user datamaps
useradd --create-home --home-dir /home/$USERNAME --no-log-init --shell /bin/bash --groups $PUB_GRP $USERNAME \
# add user and "public" group as owners of the shiny directory
&& cd /srv/shiny-server \
&& chown -R $USERNAME:$PUB_GRP . \
&& chmod g+w . \
&& chmod g+s .
# Volume configuration
VOLUME ["/usr/local/share/public"]
# pass control to user "datamaps"
USER $USERNAME
WORKDIR /home/$USERNAME
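As a sketch, assuming the above file is saved as Dockerfile in the current directory, the image could be built and run with the host public share mounted on the declared volume (image and container names below are just placeholders; the RStudio and Shiny services would still need their ports published and the services started):
# build the image from the Dockerfile in the current directory
docker build -t datamaps/analytics .
# run it detached, mapping the host public share onto the container volume
docker run -dit --name analytics \
    -v /usr/local/share/public:/usr/local/share/public \
    datamaps/analytics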
sudo docker pull selenium/standalone-firefox
sudo docker run -d -p 4445:4444 selenium/standalone-firefox
start a container with a mapping between some host directory and the guest browser download directory:
docker run -d -p 4445:4444 -v /home/usrname/some/path:/home/seluser/Downloads selenium/standalone-firefox
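From inside R, a minimal sketch to drive the containerised browser with the RSelenium package (assuming the container above is listening on host port 4445) could be:
library(RSelenium)
# connect to the Selenium server exposed by the container
rd <- remoteDriver(remoteServerAddr = 'localhost', port = 4445L, browserName = 'firefox')
rd$open()
rd$navigate('https://www.r-project.org/')
rd$getTitle()
rd$close()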
docker pull osrm/osrm-backend
mkdir $PUB_PATH/osrm
cd $PUB_PATH/osrm
download the geographic file(s) you need, for example from Geofabrik. I usually work with British and Italian clients, so I download both the British and Italian files:
wget http://download.geofabrik.de/europe/britain-and-ireland-latest.osm.pbf
wget http://download.geofabrik.de/europe/italy-latest.osm.pbf
With multiple files, it’s better first merging them in a unique file:
sudo apt install -y osmium-tool
osmium cat italy-latest.osm.pbf britain-and-ireland-latest.osm.pbf -o italy_uk.osm.pbf
before starting the server, we need to pre-process the previous extract(s) using a specific profile, depending on the means of transport you want to use. Each profile’s characteristics are stored in a Lua file, describing which streets and manoeuvres are permitted, and the corresponding speeds. There are three generic profiles already provided by the devs inside the image (i.e., three Lua files already written out in the /opt/ directory), related to the three main profiles of general interest: car, foot, bike. But you can modify them, or add your own, before proceeding, using the following set of commands:
run an interactive container from the osrm/osrm-backend image:
sudo docker run -it osrm/osrm-backend /bin/bash
modify an existing profile:
nano /opt/car.lua
or add a new one:
cp /path/to/profile.lua /opt/profile.lua
exit
docker ps -a
docker commit cont_id img_name
docker images
docker login
docker push repo/imgname:tag
unfortunately, the OSRM server does not currently support multiple profiles, so you have to build one specific server for each profile you need, then run each on its own separate host port (the default port inside the container is 5000; I usually use 5001, 5002 and 5003 as host ports for the three standard profiles). Let’s start with the car profile. Considering it will take some time to process all of it, it’s advisable to use the screen utility, so that you can run all of them simultaneously without worrying about connection drop-outs.
mkdir car
cd car
cp ../italy_uk.osm.pbf .
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-extract -p /opt/car.lua /data/italy_uk.osm.pbf
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-partition /data/italy_uk.osrm
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-customize /data/italy_uk.osrm
docker run -t -i -p 5001:5000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/italy_uk.osrm
If you had modified the image to change the profile(s), you need to change in the above (and below) set of commands the name of the image (osrm/osrm-backend
) accordingly.
you can then process the other profiles, the commands are exactly the same but must be fired in their own (different) directories, changing the host port when running the final container:
cd ..
mkdir foot
cd foot
cp ../italy_uk.osm.pbf .
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/italy_uk.osm.pbf
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-partition /data/italy_uk.osrm
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-customize /data/italy_uk.osrm
docker run -t -i -p 5002:5000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/italy_uk.osrm
cd ..
mkdir bike
cd bike
cp ../italy_uk.osm.pbf .
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-extract -p /opt/bicycle.lua /data/italy_uk.osm.pbf
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-partition /data/italy_uk.osrm
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-customize /data/italy_uk.osrm
docker run -t -i -p 5003:5000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/italy_uk.osrm
At this point you should have three docker containers running on three different ports for three different profiles.
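As a quick sanity check, you can query the routing HTTP API directly; the sketch below asks the car server on port 5001 for a route between two example points in London, passed as lon,lat pairs:
curl 'http://127.0.0.1:5001/route/v1/driving/-0.1276,51.5072;-0.0877,51.5195?overview=false'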
You can now easily use some R code to calculate for example isochrones:
function(x, brk, rsl, prf = 1) # 1-car, 2-foot, 3-bike
osrm::osrmIsochrone(sc = x, breaks = brk, res = rsl, returnclass = 'sf', osrm.server = paste0('http://127.0.0.1:500', prf, '/'))
or the shortest path between two locations:
function(xs, xd, prf = 1) # 1-car, 2-foot, 3-bike
osrm::osrmRoute(src = xs, dst = xd, overview = 'full', returnclass = 'sf', osrm.server = paste0('http://127.0.0.1:500', prf, '/')) |> sf::st_transform(4326)
When doing geo-analytics, you often need, for example, to geocode thousands of addresses, if not hundreds of thousands or even millions, and you want the process obviously to be an automated backend operation. We all know that Google Maps is the gold standard for this job, but it’s fairly expensive out of its free quota, and its API conditions are quite strict, as you are supposed only to geocode addresses you will be displaying in conjunction with a Google map. Moreover, it doesn’t easily accept bulk geocoding.
Here comes Nominatim, a private geoserver based on open source efforts. The official instructions for installing Nominatim are fairly complete, but brief in places and a bit scattered around different pages, and some steps must be changed or reordered in order to get ASAP to the end of the installation, and ready to geocode!
Luckily for us, a group of good guys has teamed together to build a Docker image and take us out of that nightmare. Now it’s only a matter of three (yes, THREE!) instructions (and a few hours, or days, depending on the machine and the extent of your geography) to have a geoserver up and running.
ssh-keygen -t ed25519 -C "githubmst@master-i.com"
cat ~/.ssh/id_ed25519.pub
cd ~/software/
git clone git@github.com:mediagis/nominatim-docker.git
then cd into the latest version:
cd nominatim-docker/<version>
Notice the XXXX placeholder in the command below, which should be changed to the (host) port number you want to use:
docker run -it --rm \
-e PBF_URL=https://download.geofabrik.de/europe/britain-and-ireland-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/britain-and-ireland-updates/ \
-e IMPORT_WIKIPEDIA=true \
-e NOMINATIM_PASSWORD=<INSERT-A-PASSWORD-HERE> \
-p XXXX:8080 \
--name nominatim \
mediagis/nominatim:3.7
From inside R, using tmap and friends, a simple line of code would be:
tmaptools::geocode_OSM(address, server = 'http://127.0.0.1:XXXX')
Samba, a re-implementation of the popular SMB/CIFS protocol (Server Message Block/Common Internet File System), is a stable and free server application that allows sharing of files and print services across a network. Once installed on a central Linux server, shared files can be accessed seamlessly from both Linux and Windows systems.
sudo apt update && sudo apt install samba
whereis samba
mkdir $PUB_PATH/samba
sudo nano /etc/samba/smb.conf
add settings:
[sambashare]
comment = Samba on Ubuntu (or whatever else you prefer to appear in the connection details)
path = /usr/local/share/public/samba
browsable = yes
read only = no
write list = username ...
valid users = username ...
guest ok = no
Notice that:
username must be an already existing system user, even though the Samba password will be different. You can set up a system user with no home directory and no login access as:
sudo adduser --no-create-home --disabled-password --disabled-login username
sambashare can be substituted with any other name, and it’s not directly related to the folder.
You can find more options here.
add a Samba password for any system users:
sudo smbpasswd -a username
sudo service smbd restart
sudo ufw allow samba
You can now access the share from Linux or macOS at smb://ip-address/sambashare, or from Windows at \\ip-address\sambashare
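From another Linux machine, the share can also be mounted on the filesystem (assuming the cifs-utils package is installed; the mount point below is just an example):
sudo mkdir -p /mnt/sambashare
sudo mount -t cifs //ip-address/sambashare /mnt/sambashare -o username=username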
Nextcloud is a free Open Source alternative to commercial cloud storage services, like DropBox or OneDrive, that allows its users ready access to personal or professional files, documents, images, music, and videos wherever they are, without having to concede information to any 3rd party company.
Nextcloud is a fork of the much older ownCloud project, is written in PHP and JavaScript, and supports many database systems as back-end. In order to keep files synchronized between the server and other devices, Nextcloud also provides desktop applications for Windows, Linux, and Mac, plus mobile apps for Android and iOS. Nextcloud is not just a Dropbox clone, it provides additional features like Calendar, Contacts, Schedule tasks, and streaming media.
You first need to install the web server Nginx, the database system MySQL, and the PHP preprocessor, already described in the preceding paragraphs.
System fonts are stored under /usr/share/fonts/. After copying new font files there, adjust ownership and permissions:
sudo chown -R root:public /usr/share/fonts/
sudo chmod -R 644 /usr/share/fonts/*
sudo chmod 755 /usr/share/fonts/
sudo apt-get install fontconfig
install the R package extrafont
sudo su
R
install.packages('extrafont')
q()
exit
sudo fc-cache -fv > /dev/null
open R as sudoer, load the extrafont package and import the newly installed fonts (this takes time…):
sudo su
R
library(extrafont)
font_import()
q()
exit
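Once imported, the fonts can be used in graphics; a minimal sketch with ggplot2 (assuming it is installed, and that "Verdana" was among the imported fonts) could be:
library(ggplot2)
library(extrafont)
loadfonts()   # register the imported fonts with R's graphics devices
ggplot(mtcars, aes(wt, mpg)) +
    geom_point() +
    theme(text = element_text(family = 'Verdana'))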
cd /usr/share/fonts/
sudo mkdir google
cd google
sudo wget https://github.com/google/fonts/archive/master.zip
sudo unzip master.zip
sudo rm master.zip
sudo apt-get install ttf-mscorefonts-installer
the fonts will be installed in /usr/share/fonts/truetype/msttcorefonts
cd /usr/share/fonts/
sudo mkdir windows
copy the fonts from the Windows folder C:\Windows\fonts to a temporary folder in the shared repository /usr/local/share/public/fonts, then:
sudo cp /usr/local/share/public/fonts/* /usr/share/fonts/windows/
rm -rf /usr/local/share/public/fonts/*
Open MobaXTerm, then follow these steps:
Tools >
MobaKeyGen >
(leave parameters as default) >
Generate >
Move the mouse around in the big empty area over the **Generate** button >
insert a password twice in the textboxes called **passphrase** >
Save both public and private keys >
Close
ssh-keygen
The two keys are saved in /home/usrname/.ssh with the displayed names (usually id_rsa.pub and id_rsa for the public and the private key respectively). Both files should be copied somewhere safe, and the private key promptly deleted from the server. The public key is simple text that can be shared with anyone, and can easily be read with a simple cat command if you need to paste its content.
getting help:
help
man cmdname
shutdown
reboot
pwd
ls
cd
mkdir
rmdir
cp /path/to/origin/fname /path/to/destination
mv /path/to/origin/fname /path/to/destination
rm /path/to/origin/fname
rmdir /path/to/origin
cat fname
less fname
more fname
head fname
tail fname
touch fname
nano fname
find fname
history
df
whoami
adduser usrname
usermod -aG sudo usrname
passwd usrname
su usrname
sudo
exit
logout
chmod
chown
update
upgrade
dist-upgrade
autoremove
clean
install
/etc/apt/sources.list
Locations to fetch packages from
ps
top
kill
I’m not a devOps or sysAdmin, and most of this document has been built over years of experience trying to overcome the problem of the hour. So it’s very possible that some steps here are not the very best way of performing the tasks they refer to.
If anyone has any comments on anything in this document, I’d love to hear about it!