OSM Database Replication Tools
Tools for creating replication feeds from the main OSM database.
These tools are only useful if you run an OSM database like the main central
OSM database!
You need a C++17 compliant compiler. GCC 8 and later as well as clang 7 and
later are known to work.
You also need the following libraries:
Libosmium (>= 2.15.0)
https://osmcode.org/libosmium
Debian/Ubuntu: libosmium2-dev
Fedora/CentOS: libosmium-devel
Protozero (>= 1.6.3)
https://github.com/mapbox/protozero
Debian/Ubuntu: libprotozero-dev
Fedora/CentOS: protozero-devel
boost-filesystem (>= 1.55)
https://www.boost.org/doc/libs/1_55_0/libs/filesystem/doc/index.htm
Debian/Ubuntu: libboost-filesystem-dev
Fedora/CentOS: boost-devel
boost-program-options (>= 1.55)
https://www.boost.org/doc/libs/1_55_0/doc/html/program_options.html
Debian/Ubuntu: libboost-program-options-dev
Fedora/CentOS: boost-devel
bz2lib
http://www.bzip.org/
Debian/Ubuntu: libbz2-dev
Fedora/CentOS: bzip2-devel
zlib
https://www.zlib.net/
Debian/Ubuntu: zlib1g-dev
Fedora/CentOS: zlib-devel
Expat
https://libexpat.github.io/
Debian/Ubuntu: libexpat1-dev
Fedora/CentOS: expat-devel
cmake
https://cmake.org/
Debian/Ubuntu: cmake
Fedora/CentOS: cmake
yaml-cpp
https://github.com/jbeder/yaml-cpp
Debian/Ubuntu: libyaml-cpp-dev
Fedora/CentOS: yaml-cpp-devel
libpqxx (version 6)
https://github.com/jtv/libpqxx/
Debian/Ubuntu: libpqxx-dev
Fedora/CentOS: libpqxx-devel
Pandoc
(Needed to build documentation, optional)
https://pandoc.org/
Debian/Ubuntu: pandoc
Fedora/CentOS: pandoc
gettext
(envsubst command for tests)
Debian/Ubuntu: gettext-base
Fedora/CentOS: gettext
PostgreSQL database and pg_virtualenv
Debian/Ubuntu: postgresql-common, postgresql-server-dev-all
Fedora/CentOS: postgresql-server, postgresql-server-devel
On Linux systems most of these libraries are available through your package
manager; see the list above for the package names. Check the versions, though:
if the packaged version is not new enough, you’ll have to install from source.
This is most likely the case for Libosmium.
Use CMake to build in the usual way, for instance:
mkdir build
cd build
cmake ..
cmake --build .
If there are several versions of PostgreSQL installed on your system, you
might have to set the PG_CONFIG variable to the full path like so:
cmake -DPG_CONFIG=/usr/lib/postgresql/14/bin/pg_config ..
If you don’t want to build the PostgreSQL plugin, set BUILD_PLUGIN to OFF:
cmake -DBUILD_PLUGIN=OFF ..
If you only want to build the PostgreSQL plugin:
cd postgresql-plugin
mkdir build
cd build
cmake ..
cmake --build .
You need a PostgreSQL database with wal_level=logical and
max_replication_slots set to at least 1.
There is an unofficial test/structure.sql provided in this repository to
set up an OSM database for testing. Do not use it for production; use the
official way of installing an OSM database instead.
(The original for this file is in the openstreetmap-website repository at
https://github.com/openstreetmap/openstreetmap-website/raw/master/db/structure.sql)
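The two database settings named above live in postgresql.conf. As a minimal sketch, the relevant fragment could look like this; the value 1 is just the stated minimum, and changing wal_level requires a server restart to take effect.

```conf
# postgresql.conf (fragment)
wal_level = logical            # required for logical replication
max_replication_slots = 1      # at least 1; raise it if other slots are in use
```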
There are several commands in the build/src directory. They can all be
called with -h or --help to see how they are run. They all need a common
config file; a template is in osmdbt-config.yaml. The config file is found
automatically if it is in the current directory; use -c or --config to
set a different path.
There are man pages for all commands in man and an overview page in
man/osmdbt.md. If you have pandoc installed they will be built when running
make.
To run the tests after the build, call ctest.
To create a Debian/Ubuntu package, call debuild -I.
The Debian package will contain the executables and the man pages. It will
not contain the PostgreSQL plugin, because that needs to be built for the
exact PostgreSQL version you have.
First set up the configuration file and make sure you can access the database
by running:
osmdbt-testdb
Then enable replication:
osmdbt-enable-replication
After that get current log files once every minute (or whatever you want the
update interval to be). Use cron or something like it to handle this:
osmdbt-get-log --catchup
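If cron is used, the once-a-minute call could be scheduled with a crontab entry along these lines. This is only a sketch: the config path /etc/osmdbt/osmdbt-config.yaml is a placeholder, and -c is only needed when the config file is not in the working directory. Cron usually runs with a minimal PATH, so the full path to the command may be needed as well.

```crontab
# m h dom mon dow  command
* * * * * osmdbt-get-log -c /etc/osmdbt/osmdbt-config.yaml --catchup
```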
To create an OSM change file from the log, call
osmdbt-create-diff
To disable replication, use:
osmdbt-disable-replication
For more details see the individual man pages.
This section describes how everything is supposed to work in detail. The
whole process is controlled from one shell script run once per minute. It
looks something like this:
#!/bin/sh
set -e
osmdbt-catchup        # step 1
osmdbt-get-log        # step 2
# step 3: optionally copy log file(s) to other hosts
osmdbt-catchup        # step 4
osmdbt-create-diff    # step 5
If there are complete log files left over from a crash, they will be in the
log_dir directory and named *.log. osmdbt-catchup is called without command
line arguments. It finds those left-over log files and tells the PostgreSQL
database the largest of the LSNs so that the database can “forget” all
changes before that.
If there was no crash, no such log files are found and osmdbt-catchup does
nothing.
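Picking the largest LSN means comparing LSNs as numbers, not as strings. The following is a minimal POSIX shell sketch of that idea, not the actual implementation of osmdbt-catchup. It assumes log file names of the form osm-repl-TIMESTAMP-lsn-HEX-HEX.log, where the two parts after "-lsn-" are hexadecimal; the function names lsn_to_int and largest_lsn_file are made up for illustration.

```shell
# lsn_to_int NAME: print the LSN encoded in the file name as a decimal integer.
lsn_to_int() {
    lsn=${1##*-lsn-}     # drop everything up to and including "-lsn-"
    lsn=${lsn%.log}      # drop the .log suffix, leaving e.g. 0-1924DE0
    hi=${lsn%%-*}        # high part (hex)
    lo=${lsn##*-}        # low part (hex)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# largest_lsn_file DIR: print the *.log file in DIR with the largest LSN.
# Prints nothing if DIR contains no *.log files.
largest_lsn_file() {
    best=
    best_val=-1
    for f in "$1"/*.log; do
        [ -e "$f" ] || continue          # unmatched glob stays literal
        val=$(lsn_to_int "$f")
        if [ "$val" -gt "$best_val" ]; then
            best=$f
            best_val=$val
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    fi
}
```

Calling largest_lsn_file with the log_dir as argument prints the file whose LSN would be reported to the database. A plain alphabetical sort would get this wrong whenever the hex parts differ in length.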
Now osmdbt-get-log is called, which creates a log file in the log_dir named
something like osm-repl-2020-03-18T14:18:49Z-lsn-0-1924DE0.log. The file is
first created with the suffix .new, synced to disk, then renamed, and the
directory is synced.
If any of these steps fails or if the host crashes, a .new file might be
left around, which should be flagged for the sysadmin to take care of. The
file can be removed without losing data, but the circumstances should be
reviewed in case there is some systematic problem.
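The write, sync, rename, sync sequence is a common way to make a file appear either complete or not at all. The sketch below illustrates the pattern in shell; it is not how osmdbt-get-log is implemented internally. The function name write_log_atomically is hypothetical, and calling sync with a file or directory argument assumes GNU coreutils 8.24 or later.

```shell
# write_log_atomically DIR NAME: store stdin as DIR/NAME so that a crash
# leaves either no DIR/NAME at all or a complete one.
write_log_atomically() {
    dir=$1
    name=$2
    cat > "$dir/$name.new"              # 1. write under the temporary .new name
    sync "$dir/$name.new"               # 2. flush the file contents to disk
    mv "$dir/$name.new" "$dir/$name"    # 3. rename into place (same file system)
    sync "$dir"                         # 4. flush the directory entry
}
```

If the process dies before step 3, only the .new file exists, which matches the left-over .new files described above.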
All files named *.log in the log_dir can now be copied (using scp, rsync, or
similar) to a separate host for safekeeping. These will only be used if
the local host crashes and log files on its disk are lost. In this case
manual intervention is necessary.
Now osmdbt-catchup is called to catch up the database to the log file just
created in step 2.
If the system crashes in step 2, 3, or 4, a log file might be left around
without the database being updated. In this case step 1 of the next cycle
will pick this up and do the database update.
Now osmdbt-create-diff is called, which reads any log files in the log_dir
and creates replication diff files. Files are first created in the tmp_dir
directory and then moved into place in the changes_dir and its
subdirectories. osmdbt-create-diff will also read the state.txt file in the
changes_dir and create a new one. See the manual page for osmdbt-create-diff
for the details on how this is done exactly.
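As an illustration of how files might be spread over subdirectories of the changes_dir, the sketch below maps a sequence number to a relative path. It assumes the layout commonly used for OSM replication feeds, where the sequence number is zero-padded to nine digits and split into three groups of three; whether osmdbt-create-diff does exactly this should be checked in its manual page. The function name seq_to_path is hypothetical.

```shell
# seq_to_path SEQ: print the relative path (without file suffix) for a
# sequence number below 1000000000, e.g. 4000123 -> 004/000/123.
seq_to_path() {
    padded=$(printf '%09d' "$1")   # zero-pad to nine digits
    a=${padded%??????}             # first three digits
    rest=${padded#???}             # remaining six digits
    b=${rest%???}                  # middle three digits
    c=${rest#???}                  # last three digits
    echo "$a/$b/$c"
}
```

The input must be a plain decimal number without leading zeros, because printf would otherwise interpret it as octal.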
osmdbt-get-log
, osmdbt-fake-log
, and osmdbt-catchup
userun_dir/osmdbt-log
making sure that only one ofosmdbt-create-diff
uses a different PID/lock file, it can runosmdbt-create-diff
can handle any number of log files, so if it is nottmp_dir
, before they arechanges_dir
, a file osmdbt-create-diff.lock
is createdtmp_dir
. This is removed after osmdbt-create-diff
finished moving.done
osmdbt-create-diff
When run in production you should regularly
log_dir
named *.log.done
)To make sure everything runs smoothly, the age of the PID files can be checked
(should never be more than a few seconds) and the existence of older (more
than a minute or so) log files named *.log.new
.
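The two monitoring checks could be scripted roughly as follows. This is a sketch, not part of osmdbt: the function names are made up, the -maxdepth and -mmin options of find and the -c option of stat assume GNU findutils and coreutils, and the directory and PID file paths have to be taken from your own configuration.

```shell
# stale_new_files LOG_DIR [MINUTES]: list *.log.new files in LOG_DIR that
# were last modified more than MINUTES minutes ago (default: 1).
stale_new_files() {
    find "$1" -maxdepth 1 -type f -name '*.log.new' -mmin "+${2:-1}"
}

# file_age_seconds FILE: print the age of FILE in seconds, e.g. for a PID file.
file_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1")       # modification time as seconds since epoch
    echo $(( now - mtime ))
}
```

A monitoring job could raise an alert when stale_new_files prints anything, or when file_age_seconds reports more than a few seconds for an existing PID file.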
Copyright (C) 2021-2022 Jochen Topf (jochen@topf.org)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses.
This program was written and is maintained by Jochen Topf (jochen@topf.org).