CRISPRroots: RNA-seq based on-target and off-target assessment of Cas9-mediated edits
CRISPRroots v.1.3: CRISPR–Cas9-mediated
edits with accompanying RNA-seq data assessed for on-target and off-target sites
The CRISPR/Cas9 genome editing tool can be used to study genomic variants and gene knockouts.
By combining CRISPR/Cas9 mediated editing with transcriptomic analyses it is possible to measure
the effects of genome alterations on gene expression. In such experiments it is crucial to understand
not only if the editing was successful but also if the observed differential gene expression is the result
of intended genome edits and not that of unwanted off-target effects. Potential off-targets sites that
CRISPR/Cas9 may have edited need therefore to be verified by sequencing. However, a CRISPR/Cas9
gRNA may have hundreds (or thousands) potential off-target binding sites in a genome. How to decide
which of them should be validated with higher priority? The RNA-seq data that can be sequenced
as part of a CRISPR/Cas9 experiment contains information about the sequence and expression level
of potential off-target sites/genes located in transcribed regions. Here we preset CRISPRroots, a
method that combines CRISPR/Cas9 and guide RNA binding properties, gene expression changes,
and sequence variants between edited and non-edited cells, to discover and rank potential off-targets.
The method is described in the corresponding publication (see below).
The CRISPRroots pipeline can be executed via
Snakemake.
All other software requirements are satisfied by the CRISPRroots container, available at Docker Hub:
docker pull gcorsi1993/crisprroots
The container is provided to Snakemake using the option --use-singularity
.
The pipeline was tested in a x86 64 GNU/Linux environment with Ubuntu v.18.04.1 (or newer)
installed.
CRISPRroots is available to download from
https://rth.dk/resources/crispr/.
A test dataset is also provided on the same website.
After downloading and un-packing the software and the test dataset, we recommend
to explore the directory structure of the test dataset.
resources
# folder with reference filesQPRT_DEL268T_chr16_10M-40M
# folder with data examplesmake_config.py
# setup scriptthe script make_config.py
can be used to automatically set the paths to the data, resources, code directory and singularity in the configuration file config.yaml
located inCRISPRroots_test_dataset/QPRT_DEL268T_chr16_10M-40M
.
To run the script, you need python3.
To setup the config file for the test dataset, run:
$ cd CRISPRroots_test_dataset
$ python3 make_config.py --CRISPRroots <path_to_CRISPRroots> --singularity <path_to_singularity>
The config file contains the parameters defined for the execution of the various steps of the pipeline.
A copy of the config file for the CRISPRroots test dataset is provided together with
the CRISPRroots software package, under resources
. This file can be used as template to create the configuration file
for your own dataset.
For a complete list of parameters, options, and usage examples for CRISPRroots please read the
CRISPRroots_Manual.pdf included in the CRISPRroots software folder.
You can run the pipeline from within the directory containing the config file (in the test dataset this is
the subfolder QPRT_DEL268T_chr16_10M-40M) as follows:
$ cd QPRT_DEL268T_chr16_10M-40M # example with test dataset
$ snakemake -s <path_to_CRISPRroots>/run.smk --cores <int> --use-singularity --singularity-args
"--bind <global_path_to_QPRT_DEL268T_chr16_10M-40M>:<global_path_to_QPRT_DEL268T_chr16_10M-40M>
--bind <global_path_to_resources>:<global_path_to_resources>
--bind <global_path_to_CRISPRroots-1.3>:<global_path_to_CRISPRroots-1.3>" --dry-run
$ snakemake -s <path_to_CRISPRroots>/run.smk --cores <int> --use-singularity --singularity-args
"--bind <global_path_to_QPRT_DEL268T_chr16_10M-40M>:<global_path_to_QPRT_DEL268T_chr16_10M-40M>
--bind <global_path_to_resources>:<global_path_to_resources>
--bind <global_path_to_CRISPRroots-1.3>:<global_path_to_CRISPRroots-1.3>"
We recommend to first run Snakemake with --dryrun
.
This displays what will be done without executing it and highlights if any input file is missing.
In the commands above, --cores
specifies the maximum number of cores used in parallel by Snakemake.
The option --singularity-args
is necessary to provide snakemake all the parameters to be passed to the singularity run. These are usually the --bind
parameters that link local folders to the singularity.
The pre-computed results/reports for the test dataset are available in the folder
CRISPRroots_test_dataset/QPRT_DEL268T_chr16_10M-40M/pre-computed.
Hint: Snakemake allows to visualize the jobs as a graph (directed acyclic graph, or DAG),
highlighting the jobs completed and those to be run in different ways.
To create an svg plot of your DAG, run the following command:
snakemake -s <path_to_CRISPRroots>/run.smk --dag | dot -Tsvg > dag.svg
To learn more about the DAG and the visualization of jobs please visit the Snakemake tutorial
at https://snakemake.readthedocs.io/en/stable/tutorial/basics.html (Step 4: Indexing read alignments and visualizing the DAG of jobs).
The pipeline can also be used to accomplish only specific tasks.
For example, to only perform the the pre-processing of the dataset and read mapping,
you can run CRISPRroots with the rule flag preproc_and_map
at the end:
$ snakemake -s <path_to_CRISPRroots>/run.smk --cores <int> --use-singularity --singularity-args
"--bind <global_path_to_QPRT_DEL268T_chr16_10M-40M>:<global_path_to_QPRT_DEL268T_chr16_10M-40M>
--bind <global_path_to_resources>:<global_path_to_resources>
--bind <global_path_to_CRISPRroots-1.3>:<global_path_to_CRISPRroots-1.3>" preproc_and_map
The rule preproc_and_map only executes the pre-processing and mapping steps.
Relevant predefined target rules include:
<path_to_results_folder>/2_sortaligned/<sample name>/Aligned.Sorted.bam
<path_to_results_folder>/preproc/
<path_to_results_folder>/6_GATK_variants/<sample name>/variants_filtered.vcf
<path_to_report_folder>/on_target_knockin.xlsx
;<path_to_report_folder>/on_target_knockout.xlsx
<path_to_results_folder>/6_GATK_variants/variated_genome.fa
<path_to_results_folder>/2-1_RSeQC_libtype/
<path_to_results_folder>/2_sortaligned/
<path_to_report_folder>/report/mapping_stats.xlsx
<path_to_results_folder>/preproc/
<path_to_report_folder>/report/multiqc_samples_stats.xlsx
NB: In the test dataset, <path_to_results_folder>
and <path_to_report_folder>
correspond to the
subfolders results and report in that will be generated inside
CRISPRroots_test_dataset/QPRT_DEL268T_chr16_10M-40M/ after completing the pipeline.
A useful flag for the execution of specific tasks is --notemp
.
This avoids removing output files defined as temporary in the pipeline
(e.g. partially processed reads). It can be convenient to use it when executing only a
part of the pipeline, to avoid the removal of temporary files that will need to be recreated
if required by a subsequent execution of the pipeline.
An example of how to set up CRISPRroots
to run it with the Slurm Workload Manager is given in the test directory as cluster_run.sh
. Configurations are given in cluster_config.yaml
. To test the cluster_run.sh
script on your system, first you need to fill in the fields:
<set_global_path_to_run.smk>
: path to the run.smk script in the CRISPRroots-1.3 folder--singularity-args "<set_singularity_args_eg._binds>"
: all your singularity arguments,Once the script is complete, you can run it simply as:
./cluster_run.sh
You can add the name of a target rule to run only a part of the pipeline
./cluster_run.sh [target_rule]
The pipeline’s output files are collected in two folders: report and results.
The report folder contains the main output, including the candidate off-targets and the knockin/knockout
assessment. Results regarding differential expression and processing statistics (data quality and mapping) are also present.
The results folder contains all the output files of the pipeline that are not marked as temp
(use --notemp
to persist all files).
Please refer to the CRISPRroots_Manual.pdf included in the CRISPRroots software folder for a complete description fo the
output and its content.
Copyright 2021 by the contributors:
Giulia Corsi giulia@rth.dk, Veerendra Gadekar veer@rth.dk
GNU GENERAL PUBLIC LICENSE
This is a free software: you can redistribute it and/or modify it under the terms of the GNU
General Public License, either version 3 of the License, or (at your option) any later version.
See http://www.gnu.org/licenses/.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
If you use CRISPRroots in your publication please cite:
CRISPRroots: on- and off-target assessment of RNA-seq data in CRISPR-Cas9 edited cells
Corsi GI, Gadekar VP, Gorodkin J, Seemann SE. Nucleic Acids Research (2021, in press).
In case of problems or bug reports, please contact software+crisprroots@rth.dk