GMQL Packages and installation
GMQL (GenoMetric Query Language) runs on GDMS, the Genomic Data Management System. This manual helps you install GDMS so that you can start writing GMQL scripts.
For more information about the GDMS architecture and GDMS packages, see GMQL.
- A guide for Apache Spark installation can be found on the [Spark documentation page](https://spark.apache.org/docs/2.2.0/).
Maven installation (version 3 or greater):
You can use this command in terminal (Ubuntu/Debian):
sudo apt-get install maven
Or go to the Maven installation web page.
Git installation:
You can use this command in terminal (Ubuntu/Debian):
sudo apt-get install git
Or see the Git installation website.
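Before continuing, it can help to confirm that the command-line prerequisites are actually on the PATH. The following is a minimal sketch, not part of the GMQL package; the `missing_tools` helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GMQL): print the name of every
# tool from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# git and mvn are the two command-line prerequisites named above.
missing=$(missing_tools git mvn)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "git and mvn are both available."
    # The first line of the version report should show Maven 3 or greater.
    mvn -version | head -n 1
fi
```

The sketch only reports what is missing; it does not install anything and does not check Spark or Hadoop.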
Set the engine configurations before running the shell installation.
For a cluster installation (see engine configurations), make sure that your Hadoop installation is configured and running.
Download the GMQL package using the following terminal command:
git clone https://github.com/DEIB-GECO/GMQL_Package.git
Alternatively, download the tar archive and extract it.
Install GMQL by running the following commands from the directory that contains the GMQL package:
cd GMQL_Package
sh ./install.sh
The installer pulls the latest GMQL code from the master branch of the GMQL repository, compiles it with Maven, and finally copies the JARs to the lib/ directory.
The package's bin/ folder contains the following shell executables:
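Since the last step of the installer is copying JARs into lib/, one rough way to tell whether it finished is to count the JAR files there. This is a sketch under assumptions: `count_jars` and the `GMQL_HOME` variable are hypothetical names, and the number of JARs a successful install produces is not stated here, so the check only distinguishes "some" from "none".

```shell
#!/bin/sh
# Hypothetical helper: count the .jar files found under <package dir>/lib.
# Prints 0 when the lib/ directory does not exist.
count_jars() {
    find "$1/lib" -type f -name '*.jar' 2>/dev/null | wc -l | tr -d ' '
}

# GMQL_HOME is an assumed variable; it defaults to the clone directory name.
GMQL_HOME=${GMQL_HOME:-GMQL_Package}
jars=$(count_jars "$GMQL_HOME")
if [ "$jars" -gt 0 ]; then
    echo "Found $jars jar(s) in $GMQL_HOME/lib"
else
    echo "No jars in $GMQL_HOME/lib; install.sh may not have completed"
fi
```

Run it from the directory that contains GMQL_Package, or set `GMQL_HOME` to the package location first.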
bin/repositoryManager RegisterUser
For information about the repository manager see [repository shell APIs](https://github.com/DEIB-GECO/GMQL/blob/master/docs/SHELL_API.md).
GMQL-Submit:
This executable submits a GMQL script to the GDMS engine without constructing a GDMS repository. For example code, see GMQL examples. In this case the selection reads from dataset directories and the materialization writes to output directories. This is useful for trying GDMS without installing repository management, but it is not recommended for long-term, multi-user use of GDMS, where the large number of generated datasets makes it easy for users to lose track of them.
Local execution example:
bin/GMQL-Submit -scriptpath /home/$USER/GMQL_Package/examples/GMQL_Submit_Example_LOCAL.gmql
Another example, which reads from HDFS and stores the result in HDFS:
bin/GMQL-Submit -scriptpath /home/user/GMQL/examples/GMQL_Submit_Example_HDFS.gmql
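A mistyped script path is an easy mistake with commands like the two above, so a small guard around the submission can be useful. The sketch below is an illustration only: `submit_script` and the `GMQL_SUBMIT` override are hypothetical names, and the only option passed to the executable is the `-scriptpath` flag shown in the examples.

```shell
#!/bin/sh
# Hypothetical wrapper: submit a GMQL script only if the file is readable.
# GMQL_SUBMIT is an assumed override for the executable to call; it
# defaults to bin/GMQL-Submit, relative to the installation directory.
submit_script() {
    script=$1
    if [ ! -r "$script" ]; then
        echo "GMQL script not found: $script" >&2
        return 1
    fi
    "${GMQL_SUBMIT:-bin/GMQL-Submit}" -scriptpath "$script"
}
```

From the installation directory, a call might look like `submit_script "/home/$USER/GMQL_Package/examples/GMQL_Submit_Example_LOCAL.gmql"`. Setting `GMQL_SUBMIT=bin/GMQL-Submit-R` would route the same guard to the repository-based executable.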
GMQL-Submit-R:
This executable submits a GMQL script to the GDMS engine using the GDMS repository. For example code, see GMQL examples. In this case the selection uses dataset names from the repository and the materialization writes to datasets in the repository. This makes it simpler to track generated datasets and manage the data in the system.
The following example reads datasets from the repository and writes the result to the repository:
bin/GMQL-Submit-R -scriptpath /home/user/GMQL/examples/GMQL_Submit_Repository_Example.gmql
The datasets mentioned in the code (ann and exp) should be added to the repository first using the repositoryManager command. You can find sample data in the examples/data/ann and examples/data/exp folders.
We provide a shell script that adds four datasets to the GDMS repository and another shell script that runs four GMQL scripts on those datasets.
This script adds four datasets to the repository; run it from the GDMS installation folder:
examples/createInputDataSets.sh
This script runs four GMQL scripts, and the results will be found in the repository; run it from the GDMS installation folder:
examples/runScriptExamples.sh
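Because the second script depends on the datasets created by the first, the two can be chained so that the queries never run against a repository that is missing its inputs. This is a minimal sketch; `run_examples` is a hypothetical helper name, and it assumes both example scripts are executable, as the direct invocations above suggest.

```shell
#!/bin/sh
# Hypothetical helper: run both example scripts from the installation
# folder given as $1, stopping if dataset creation fails.
run_examples() {
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$1" || exit 1
        ./examples/createInputDataSets.sh || exit 1
        ./examples/runScriptExamples.sh
    )
}
```

For example, `run_examples /home/$USER/GMQL_Package` would run both steps and return a non-zero status if either one fails.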