项目作者: openstreetmap

项目描述 :
OSM Database Replication Tools
高级语言: C++
项目地址: git://github.com/openstreetmap/osmdbt.git
创建时间: 2020-02-10T17:04:00Z
项目社区:https://github.com/openstreetmap/osmdbt

开源协议:Other

下载


OSM Database Tools

Tools for creating replication feeds from the main OSM database.

These tools are only useful if you run an OSM database like the main central
OSM database!

Prerequisites

You need a C++17 compliant compiler. GCC 8 and later as well as clang 7 and
later are known to work.

You also need the following libraries:

  1. Libosmium (>= 2.15.0)
  2. https://osmcode.org/libosmium
  3. Debian/Ubuntu: libosmium2-dev
  4. Fedora/CentOS: libosmium-devel
  5. Protozero (>= 1.6.3)
  6. https://github.com/mapbox/protozero
  7. Debian/Ubuntu: libprotozero-dev
  8. Fedora/CentOS: protozero-devel
  9. boost-filesystem (>= 1.55)
  10. https://www.boost.org/doc/libs/1_55_0/libs/filesystem/doc/index.htm
  11. Debian/Ubuntu: libboost-filesystem-dev
  12. Fedora/CentOS: boost-devel
  13. boost-program-options (>= 1.55)
  14. https://www.boost.org/doc/libs/1_55_0/doc/html/program_options.html
  15. Debian/Ubuntu: libboost-program-options-dev
  16. Fedora/CentOS: boost-devel
  17. bz2lib
  18. http://www.bzip.org/
  19. Debian/Ubuntu: libbz2-dev
  20. Fedora/CentOS: bzip2-devel
  21. zlib
  22. https://www.zlib.net/
  23. Debian/Ubuntu: zlib1g-dev
  24. Fedora/CentOS: zlib-devel
  25. Expat
  26. https://libexpat.github.io/
  27. Debian/Ubuntu: libexpat1-dev
  28. Fedora/CentOS: expat-devel
  29. cmake
  30. https://cmake.org/
  31. Debian/Ubuntu: cmake
  32. Fedora/CentOS: cmake
  33. yaml-cpp
  34. https://github.com/jbeder/yaml-cpp
  35. Debian/Ubuntu: libyaml-cpp-dev
  36. Fedora/CentOS: yaml-cpp-devel
  37. libpqxx (version 6)
  38. https://github.com/jtv/libpqxx/
  39. Debian/Ubuntu: libpqxx-dev
  40. Fedora/CentOS: libpqxx-devel
  41. Pandoc
  42. (Needed to build documentation, optional)
  43. https://pandoc.org/
  44. Debian/Ubuntu: pandoc
  45. Fedora/CentOS: pandoc
  46. gettext
  47. (envsubst command for tests)
  48. Debian/Ubuntu: gettext-base
  49. Fedora/CentOS: gettext
  50. PostgreSQL database and pg_virtualenv
  51. Debian/Ubuntu: postgresql-common, postgresql-server-dev-all
  52. Fedora/CentOS: postgresql-server, postgresql-server-devel

On Linux systems most of these libraries are available through your package
manager, see the list above for the names of the packages. But make sure to
check the versions. If the packaged version available is not new enough, you’ll
have to install from source. Most likely this is the case for Libosmium.

Building

Use CMake to build in the usual way, for instance:

  1. mkdir build
  2. cd build
  3. cmake ..
  4. cmake --build .

If there are several versions of PostgreSQL installed on your system, you
might have to set the PG_CONFIG variable to the full path like so:

  1. cmake -DPG_CONFIG=/usr/lib/postgresql/14/bin/pg_config ..

If you don’t want to build the PostgreSQL plugin set BUILD_PLUGIN to OFF.

  1. cmake -DBUILD_PLUGIN=OFF ..

If you only want to build the PostgreSQL plugin:

  1. cd postgresql-plugin
  2. mkdir build
  3. cd build
  4. cmake ..
  5. cmake --build .

Database Setup

You need a PostgreSQL database with

  • config option wal_level=logical,
  • config option max_replication_slots set to at least 1,
  • a user with REPLICATION attribute, and
  • a database containing an OSM database where this user has access.

There is an unofficial test/structure.sql provided in this repository to
set up an OSM database for testing. Do not use it for production, use the
official way of installing an OSM database instead.

(The original for this file is in the openstreetmap-website repository at
https://github.com/openstreetmap/openstreetmap-website/raw/master/db/structure.sql)

Running

There are several commands in the build/src directory. They all can be
called with -h or --help to see how they are run. They all need a common
config file, a template is in osmdbt-config.yaml. This will be found
automatically if it is in the current directory, use -c or --config to
set a different path.

Documentation

There are man pages for all commands in man and an overview page in
man/osmdbt.md.

If you have pandoc installed they will be built when running make.

Tests

To run the tests after build call ctest.

Debian Package

To create a Debian/Ubuntu package, call debuild -I.

The Debian package will contain the executables and the man pages. It will
not contain the PostgreSQL plugin, because that needs to be built for the
exact PostgreSQL version you have.

Usage

First set up the configuration file and make sure you can access the database
by running:

  1. osmdbt-testdb

Then enable replication:

  1. osmdbt-enable-replication

After that get current log files once every minute (or whatever you want the
update interval to be). Use cron or something like it to handle this:

  1. osmdbt-get-log --catchup

To create an OSM change file from the log, call

  1. osmdbt-create-diff

To disable replication, use:

  1. osmdbt-disable-replication

For more details see the individual man pages.

How it works in detail

This section describes how everything is supposed to work in detail. The
whole process is controlled from one shell script run once per minute. It
looks something like this:

  1. #!/bin/sh
  2. set -e
  3. osmdbt-catchup
  4. osmdbt-get-log
  5. # optionally copy log file(s) to other hosts
  6. osmdbt-catchup
  7. osmdbt-create-diff

1. Catch up old log files

If there are complete log files left over from a crash, they will be in the
log_dir directory and named *.log.

osmdbt-catchup is called without command line arguments. It finds those
left-over log files and tells the PostgreSQL database the largest of the LSNs
so that the database can “forget” all changes before that.

If there was no crash, no such log files are found and osmdbt-catchup does
nothing.

2. Create log file

Now osmdbt-get-log is called which creates a log file in the log_dir named
something like osm-repl-2020-03-18T14:18:49Z-lsn-0-1924DE0.log. The file is
first created with the suffix .new, synced to disk, then renamed and the
directory is synced.

If any of these steps fail or if the host crashes, a .new file might be
left around, which should be flagged for the sysadmin to take care of. The
file can be removed without loosing data, but the circumstances should be
reviewed in case there is some systematic problem.

3. Copy log file to separate host (optional)

All files named *.log in the log_dir can now be copied (using scp or
rsync or so) to a separate host for safekeeping. These will only be used if
the local host crashes and log files on its disk are lost. In this case
manual intervention is necessary.

4. Catch up database to new log file

Now osmdbt-catchup is called to catch up the database to the log file just
created in step 2.

If the system crashes in step 2, 3, or 4 a log file might be left around
without the database being updated. In this case step 1 of the next cycle
will pick this up and do the database update.

5. Creating diff file

Now osmdbt-create-diff is called which reads any log files in the log_dir
and creates replication diff files. Files are first created in the tmp_dir
directory and then moved into place in the changes_dir and its
subdirectories. osmdbt-create-diff will also read the state.txt in the
changes_dir file and create a new one. See the manual page for
osmdbt-create-diff for the details on how this is done exactly.

Log files and lock files

  • The programs osmdbt-get-log, osmdbt-fake-log, and osmdbt-catchup use
    the same PID/lock file run_dir/osmdbt-log making sure that only one of
    them is running.
  • The program osmdbt-create-diff uses a different PID/lock file, it can run
    in parallel to the other programs, but only one copy of it will run.
  • osmdbt-create-diff can handle any number of log files, so if it is not
    run for a while it will recover by reading all log files it finds and
    creating one replication diff file with the (sorted) data from all of them.
  • All programs will write files under different names or in separate
    directories, sync the files and only then move them into place atomically.
    After that directories are synced.
  • After diff and state files are created in the tmp_dir, before they are
    moved to the changes_dir, a file osmdbt-create-diff.lock is created
    in tmp_dir. This is removed after osmdbt-create-diff finished moving
    all state and diff files to their final place and appending the .done
    suffix to the log file(s). If this is left lying around, osmdbt-create-diff
    will not run any more and the sysadmin needs to intervene.

External processing needed

When run in production you should regularly

  • remove old log files marked as done (files in log_dir named *.log.done)
  • remove log file copies you might have made on a separate host

To make sure everything runs smoothly, the age of the PID files can be checked
(should never be more than a few seconds) and the existence of older (more
than a minute or so) log files named *.log.new.

License

Copyright (C) 2021-2022 Jochen Topf (jochen@topf.org)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses.

Authors

This program was written and is maintained by Jochen Topf (jochen@topf.org).