Data retrieval, simple validation, Chouette import and export
Marduk orchestrates the timetable data import pipeline.
Marduk receives datasets from the following channels:
Input data are primarily NeTEx datasets. GTFS datasets are still in use but will ultimately be migrated to NeTEx.
Marduk performs basic validation checks on the input files (check that the file is a valid zip archive, simple data format check) and then initiates the import workflow:
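The zip archive check mentioned above can be illustrated with a minimal sketch based on the JDK's java.util.zip classes. The class name ZipArchiveCheck, the sampleZip helper, and the rule that an archive with zero entries is rejected are assumptions made for this example; this is not Marduk's actual validation code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class ZipArchiveCheck {

    /** Returns true if the bytes can be read as a zip archive with at least one entry. */
    static boolean isValidZip(byte[] content) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(content))) {
            byte[] buffer = new byte[8192];
            int entries = 0;
            while (zip.getNextEntry() != null) {
                // Read each entry to the end so that corrupt compressed data is detected.
                while (zip.read(buffer) != -1) {
                    // discard the bytes, only readability matters here
                }
                entries++;
            }
            // Assumption: an archive without any entry is treated as invalid.
            return entries > 0;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    /** Builds a small in-memory zip archive used as sample input. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("sample.xml"));
            zip.write("<sample/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(isValidZip(sampleZip()));
        System.out.println(isValidZip("not a zip".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Reading every entry to the end is slower than only inspecting the first header, but it also catches archives that are truncated or corrupted partway through.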
Some NeTEx validation rules are time-dependent, in particular those that rely on external reference data such as the Norwegian Stop Place Register.
The imported datasets must be revalidated periodically to guarantee that they still refer to valid stop places. Revalidation also allows pruning expired data, such as trips whose effective date is in the past.
Marduk schedules a nightly revalidation of every dataset, which triggers a regeneration of each NeTEx export file. Expired data are removed from the new exports.
In addition to orchestrating NeTEx data export, Marduk also triggers an export of GTFS data (Damu).
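The pruning of expired data described above can be sketched as a simple date filter. The Trip class, its effectiveDate field, and the choice to keep trips whose date is today are hypothetical simplifications for this example; the real NeTEx validity model is richer and the actual pruning logic is not shown in this document.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model, for illustration only.
class ExpiredTripPruner {

    /** Minimal stand-in for a trip: an identifier and an effective date. */
    static final class Trip {
        final String id;
        final LocalDate effectiveDate;

        Trip(String id, LocalDate effectiveDate) {
            this.id = id;
            this.effectiveDate = effectiveDate;
        }
    }

    /** Keeps only the trips whose effective date is today or later. */
    static List<Trip> pruneExpired(List<Trip> trips, LocalDate today) {
        return trips.stream()
                .filter(trip -> !trip.effectiveDate.isBefore(today))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        List<Trip> trips = List.of(
                new Trip("expired", today.minusDays(1)),
                new Trip("current", today.plusDays(7)));
        for (Trip trip : pruneExpired(trips, today)) {
            System.out.println(trip.id);
        }
    }
}
```

Passing the reference date as a parameter, rather than calling LocalDate.now() inside the filter, keeps the function deterministic and easy to test.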
Marduk merges the NeTEx datasets containing flexible timetables generated in NPlan (Uttu and Enki) with those generated in Chouette.
OpenTripPlanner relies on OpenStreetMap data to calculate the first/last leg of a journey (walk from start point or to destination point).
Marduk schedules a nightly download of OpenStreetMap data that in turn is used by OpenTripPlanner to build an updated street graph.
A minimal local setup requires a database, a Google PubSub emulator and access to a providers repository service (Baba).
Marduk uses a database to store the history of imported file names and checksums.
This history is used by the idempotent filter to reject files that have already been imported.
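The idea behind the idempotent filter can be sketched as follows. This is an in-memory illustration only: a HashSet stands in for the database table, the class name ImportHistoryFilter is hypothetical, and treating a file as a duplicate only when both its name and its SHA-256 checksum match an earlier import is an assumption, not a description of Marduk's actual rule.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the database-backed import history.
class ImportHistoryFilter {

    private final Set<String> seen = new HashSet<>();

    /** Computes the SHA-256 checksum of the content as a lowercase hex string. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true and records the file if it is new; returns false if it was already seen. */
    boolean accept(String fileName, byte[] content) {
        // Assumption: the history key combines the file name and the checksum.
        return seen.add(fileName + ":" + sha256Hex(content));
    }

    public static void main(String[] args) {
        ImportHistoryFilter filter = new ImportHistoryFilter();
        byte[] content = "timetable".getBytes(StandardCharsets.UTF_8);
        System.out.println(filter.accept("dataset.zip", content));
        System.out.println(filter.accept("dataset.zip", content));
    }
}
```

Storing the history in a database rather than in memory, as Marduk does, lets the filter survive application restarts.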
A Docker PostgreSQL database instance can be used for local testing:
docker run -p 5432:5432 --name marduk-database -e POSTGRES_PASSWORD=myPostgresPassword postgres:13
A test database can be created with the following commands from the psql client:
create database marduk;
create user marduk with password 'mypassword';
ALTER ROLE marduk SUPERUSER;
The database configuration is specified in the Spring Boot application.properties file:
# Datasource
spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://localhost:5432/marduk
spring.datasource.username=marduk
spring.datasource.password=mypassword
spring.flyway.enabled=true
When the property spring.flyway.enabled=true is set, the database will be auto-created at application startup.
See https://cloud.google.com/pubsub/docs/emulator for details on how to install the Google PubSub emulator.
The emulator is started with the following command:
gcloud beta emulators pubsub start
The emulator listens on port 8085 by default.
The emulator port must be set in the Spring Boot application.properties file as well:
spring.cloud.gcp.pubsub.emulatorHost=localhost:8085
camel.component.google-pubsub.endpoint=localhost:8085
Access to the providers database is configured in the Spring Boot application.properties file:
providers.api.url=http://localhost:11101/services/providers/
The application.properties file used in unit tests (src/test/resources/application.properties) can be used as a template.
The Kubernetes configmap helm/marduk/templates/configmap.yaml can also be used as a template.
Run the following command to generate the Spring Boot jar:
mvn package
The application can then be started with:
java -Xmx500m -Dspring.config.location=/path/to/application.properties -Dfile.encoding=UTF-8 -jar target/marduk-0.0.1-SNAPSHOT.jar