Investigating the resistome, bacterial composition, and mobilome in hospital wastewaters in Metro Manila using a shotgun metagenomics approach
This study provides an initial report of antibiotic resistance genes (ARGs), antibiotic-resistant bacteria (ARBs), and mobile genetic elements (MGEs) in influent hospital wastewater (HWW) from three hospitals using shotgun metagenomic sequencing.
This guide describes how to run the analysis pipeline, which is written in Snakemake.
This Snakemake pipeline requires the package manager Conda and the workflow management system Snakemake. Additional dependencies not handled by Snakemake are described in Section 1.3.
$ curl -sL \
"https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > \
"Miniconda3.sh"
$ bash Miniconda3.sh
$ conda update conda
$ rm Miniconda3.sh
$ conda install wget
$ conda config --add channels conda-forge
$ conda update -n base --all
$ conda install -n base mamba
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
This creates an isolated environment containing the latest Snakemake. To activate it:
$ conda activate snakemake
To test snakemake:
$ snakemake --help
Install git and gawk. The pipeline uses gawk in the database filtering stage.
$ mamba install git
$ mamba install gawk
Download ATTACK-AMR from the online repository, or clone it from the command line:
$ git clone https://github.com/bioinfodlsu/attack_amr_pipeline
At minimum, the pipeline requires (1) metagenomic sequences (sample sequences can be downloaded here) and (2) reference databases (CARD, Kraken2, ISFinder, PlasmidFinder, and INTEGRALL).
Note: For CARD, only the nucleotide_fasta_protein_homolog_model.fasta file was used. For Kraken2, the Standard-16 database was used for taxonomic analysis. The CARD, ISFinder, PlasmidFinder, and INTEGRALL FASTA files were renamed to card.fasta, ISFinder.fasta, PlasmidFinder.fasta, and integrall.fasta, respectively.
Place the sequences and downloaded databases in the following directories:
- Metagenomic Sequences: ~/data
- CARD: ~/card_db
- Kraken: ~/kraken2_db
- ISFinder: ~/ISFinder_db
- PlasmidFinder: ~/PlasmidFinder_db
- INTEGRALL: ~/integrall_db
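Before running the pipeline, it can help to confirm that these directories exist. The following is a minimal sketch and not part of ATTACK-AMR: the function name check_inputs and its base-directory argument are hypothetical, and the base defaults to the home directory only because the paths above are written relative to ~.

```shell
# Hypothetical helper (not part of ATTACK-AMR): report which of the
# expected input directories are missing under a base directory.
# Usage: check_inputs [base_dir]   (base_dir defaults to $HOME, an assumption)
check_inputs() {
  local base="${1:-$HOME}"
  local missing=0
  local d
  for d in data card_db kraken2_db ISFinder_db PlasmidFinder_db integrall_db; do
    if [ ! -d "$base/$d" ]; then
      echo "missing directory: $base/$d"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every directory was found.
  [ "$missing" -eq 0 ]
}
```

Calling check_inputs with no argument checks the home directory; pass another path if your databases live elsewhere.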
With the snakemake conda environment activated, you can call the pipeline from the top-level directory of ATTACK-AMR:
$ cd attack_amr_pipeline
$ snakemake --use-conda --cores all
If you encounter errors related to conda environments, use the following command instead:
$ snakemake --use-conda --cores all --conda-frontend conda
Outputs are stored in the top-level directory of ATTACK-AMR. The following outputs should be present:
ARG (CARD):
- card_db/card_length.txt
- card_out/ARG_genemat.txt
Taxonomic (Kraken2):
- kreport2mpa_norm/merged_metakraken_abundance_table.txt
MGE:
- ISFinder_db/IS_length.txt
- ISFinder_out/ISFinder_genemat.txt
- PlasmidFinder_db/PlasmidFinder_length.txt
- PlasmidFinder_out/PlasmidFinder_genemat.txt
- integrall_db/integrall_length.txt
- integrall_out/integrall_genemat.txt
Before running the R analysis notebooks, place all of the above output files in one directory, ideally the directory that contains the notebooks.
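Gathering the outputs can be scripted. The sketch below is an illustration only: the function name collect_outputs and both of its arguments are hypothetical, and it simply copies the nine output files listed above into a single destination directory, warning about any that are absent.

```shell
# Hypothetical helper (not part of ATTACK-AMR): copy the pipeline outputs
# into one directory so the R notebooks can find them.
# Usage: collect_outputs <pipeline_dir> <destination_dir>
collect_outputs() {
  local src="$1"
  local dest="$2"
  local status=0
  local f
  mkdir -p "$dest" || return 1
  for f in \
    card_db/card_length.txt \
    card_out/ARG_genemat.txt \
    kreport2mpa_norm/merged_metakraken_abundance_table.txt \
    ISFinder_db/IS_length.txt \
    ISFinder_out/ISFinder_genemat.txt \
    PlasmidFinder_db/PlasmidFinder_length.txt \
    PlasmidFinder_out/PlasmidFinder_genemat.txt \
    integrall_db/integrall_length.txt \
    integrall_out/integrall_genemat.txt
  do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dest/"
    else
      # Keep going, but remember that something was missing.
      echo "missing output: $src/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, running collect_outputs . notebooks from the top-level directory of ATTACK-AMR would copy the files next to the notebooks, assuming the notebooks folder is the intended destination.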
Before running the notebooks in this repository, ensure you have prepared the following files for your samples:
- metadata.csv: metadata describing your samples
- bases_number.csv: number of bases in your samples
- card_drug_class.txt: CARD drug class information with the columns Gene and Class, retrieved from the CARD website
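A quick sanity check on card_drug_class.txt can catch a mislabeled header before the notebooks run. This is a rough sketch under stated assumptions: the function name has_gene_class_header is hypothetical, and it assumes the file starts with a header row whose fields are separated by tabs or commas, which the guide does not specify.

```shell
# Hypothetical check (not part of ATTACK-AMR): succeed if the first line of
# the given file names both a Gene column and a Class column.
# Assumes a header row with tab- or comma-separated fields.
has_gene_class_header() {
  local header
  # Put each header field on its own line; drop any Windows carriage return.
  header=$(head -n 1 "$1" | tr -d '\r' | tr '\t,' '\n\n')
  printf '%s\n' "$header" | grep -qx 'Gene' &&
    printf '%s\n' "$header" | grep -qx 'Class'
}
```

The function returns a non-zero status when either column name is missing, so it can be combined with a message of your choice, for example has_gene_class_header card_drug_class.txt || echo "check the header".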
Separate analysis notebooks are provided for ARG analysis, taxonomic analysis, and MGE analysis. They are located in the notebooks folder of the repository, are written in R, and produce the data visualizations.
Some R library dependencies may need to be installed first, using this command:
install.packages('<insert name of library>')