Fourth-generation web archive workflow system
This is the fourth generation of the PANDAS web archiving workflow system. While it is open source,
it is not yet documented or packaged for use outside our (NLA)
infrastructure.
PANDAS provides a user interface for curators to perform website selection, collection building,
scheduled crawling and quality assurance using various web crawlers (Heritrix, Browsertrix, HTTrack).
PANDAS requires the following tools to be installed:
On RHEL/CentOS 7 install with:
yum install -y epel-release
yum install -y ghostscript httrack java-11-openjdk-devel python36 python36-devel python36-setuptools
pip install pywb
PANDAS is known to work with Oracle, PostgreSQL and MariaDB. It may work with other databases that are supported by
Hibernate and that support sequences and recursive CTEs.
MySQL is assumed not to work at present because PANDAS relies on sequences. H2 is used for automated tests but is not
recommended for production due to bugs in its CTE support.
Compile with:
mvn package
Run the package jar with:
java -jar ui/target/pandas-admin.jar
Set the following environment variables:
## Database details
#PANDAS_DB_URL=
#PANDAS_DB_USER=
#PANDAS_DB_PASSWORD=
## Webapp details
#PORT=3001
#CONTEXT_PATH=/admin
## Path to store lucene indexes
#DATA_PATH=/tmp/data
## OpenID Connect authentication (optional)
#OIDC_URL=
#OIDC_CLIENT_ID=
#OIDC_CLIENT_SECRET=
## Browsertrix backend
#BROWSERTRIX_WORKERS=4
#BROWSERTRIX_PAGE_LIMIT=1000
#BROWSERTRIX_USER_AGENT_SUFFIX=
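As a rough sketch of how these variables might be wired together, the shell functions below check the database settings and then start the jar. The function names `check_pandas_env` and `launch_pandas` are hypothetical and not part of PANDAS. Treating the three `PANDAS_DB_*` variables as mandatory is an assumption. The fallback values simply mirror the commented examples above.

```shell
# Hypothetical launcher sketch; nothing here ships with PANDAS.

# Report each database variable that is unset or empty.
# Assumption: all three PANDAS_DB_* variables are required.
check_pandas_env() {
    missing=0
    for name in PANDAS_DB_URL PANDAS_DB_USER PANDAS_DB_PASSWORD; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Export the settings and start the packaged jar.
launch_pandas() {
    check_pandas_env || return 1
    export PANDAS_DB_URL PANDAS_DB_USER PANDAS_DB_PASSWORD
    # Fallbacks copied from the commented examples above.
    export PORT="${PORT:-3001}"
    export CONTEXT_PATH="${CONTEXT_PATH:-/admin}"
    export DATA_PATH="${DATA_PATH:-/tmp/data}"
    exec java -jar ui/target/pandas-admin.jar
}
```

After setting the database variables in the current shell, calling `launch_pandas` from the repository root would start the UI. If any of the three database variables is missing, it prints the missing names and returns instead.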
PANDAS can use Keycloak for single sign-on. It may be possible to adapt it to other OpenID Connect auth servers, although
account editing and access roles currently make use of Keycloak-specific features.
Versions known to work:
Configure Keycloak in this way:
Create a new client for each application (e.g. pandas-ui, pandas-gatherer, pandas-delivery, bamboo etc).
/*
Press save.
https://localhost:8443/auth/realms/pandas
If you want to be able to manage Keycloak users from within PANDAS, you’ll need to grant it the manage-users permission.
Then set OIDC_ADMIN_URL to the same value as OIDC_URL in the PANDAS UI environment:
OIDC_ADMIN_URL=http://keycloak.example/auth/realms/pandas
SAVE_USER_TO_KEYCLOAK=true
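One way to sanity-check a realm URL before using it for OIDC_URL or OIDC_ADMIN_URL is to request its OpenID Connect discovery document, which the OpenID Connect Discovery specification places under `/.well-known/openid-configuration` relative to the issuer. The sketch below assumes `curl` is installed. The helper names `oidc_discovery_url` and `check_oidc_issuer` are hypothetical and not part of PANDAS or Keycloak.

```shell
# Build the standard OpenID Connect discovery URL for an issuer.
oidc_discovery_url() {
    # Strip one trailing slash so the result never contains "//.well-known".
    issuer="${1%/}"
    printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Fetch the discovery document; needs curl and network access to the server.
# A non-zero exit status suggests the URL is wrong or the server is unreachable.
check_oidc_issuer() {
    curl --fail --silent --show-error "$(oidc_discovery_url "$1")"
}
```

For example, `check_oidc_issuer "$OIDC_URL"` should print a JSON document if the realm URL is reachable and correct.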