METABOLIC-container

This is the second attempt at containerizing the METABOLIC software for easy deployment; the previous attempt, by tin6150, can be found here. We use the miniconda3 base image and install the dependencies with conda (original yml contributed by Dr. Daan Speth), which improves the image's long-term maintainability.

Building the container

We recommend using the pre-built images to save yourself the trouble of building.

For docker users:

docker pull magicprotoss/metabolic:20260326

For apptainer users:

apptainer build metabolic-4.0.sif docker://magicprotoss/metabolic:20260326

If you still want to build the image yourself, follow these steps:

Make sure that docker and apptainer are installed, and that you have the privileges needed to run them.
Clone the repo to your local machine and cd into it.
Prepare the dbCAN2 and MEROPS databases following the steps in run_to_setup.sh, and save them to the current dir (you'll need a working METABOLIC install on your machine to do so).

Build the container

sudo docker build --build-arg GTDBTK_VER=<GTDB_tk_version> -t metabolic-container .
sudo apptainer build metabolic-4.0.sif docker-daemon://metabolic-container:latest
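
The build steps above can be wrapped in a small script. This is only a sketch: the function names (check_tools, build_metabolic) and the RUN switch are made up for illustration and are not part of the repo, and the value passed as GTDBTK_VER (2.6.1 here) is an assumption about the expected format.

```shell
#!/bin/sh
# Sketch of the manual build; nothing here ships with the repo.

# Return 0 only if every tool named in the arguments is on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Issue the two build commands for a given GTDB-Tk version.
# RUN defaults to "echo", so the commands are only printed;
# call with RUN= (empty) to actually execute them.
build_metabolic() {
    gtdbtk_ver=$1
    ${RUN-echo} sudo docker build --build-arg "GTDBTK_VER=$gtdbtk_ver" -t metabolic-container .
    ${RUN-echo} sudo apptainer build metabolic-4.0.sif docker-daemon://metabolic-container:latest
}

check_tools docker apptainer || echo "install the missing tools first" >&2
build_metabolic 2.6.1
```

Printing the commands first makes it easy to confirm the version string before starting a long build.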

Database versions

The METABOLIC-4.0 software needs the following databases to run.

These files are distributed alongside the software on GitHub; when the container is built, the latest versions are pulled together with the executables:

METABOLIC_hmm_db (Commits on Mar 31, 2022)
METABOLIC_template_and_database (Commits on Jun 12, 2023)
Motif (Commits on Sep 3, 2021)

These 2 databases are downloaded when running run_to_setup.sh. Since they are not updated regularly, we have packed them into the container:

dbCAN2 (Aug 18, 2021; note that the download link is currently broken)
MEROPS (Feb 22, 2023)

These 2 databases are very large and updated frequently. We recommend downloading them separately and bind mounting them (-B <local_path>:<container_path>) into the container when running the software:

kofam_scan (ftp://ftp.genome.jp/pub/db/kofam). Note that you can run the 'hmmpress_kegg_db' convenience script once, before the first run, to prepare the binary version of the .hmm files and reduce the runtime.
NMDC (National Microbiology Data Center, 国家微生物科学数据中心) mirror: ftp://download.nmdc.cn/Kofam
GTDB. The database version should match the GTDB-Tk version. We packed GTDB-Tk v2.6.1 into the container, so the database version should be R226 in this instance. For the full reference, please check the GTDB-Tk docs.
NMDC (National Microbiology Data Center, 国家微生物科学数据中心) mirror: ftp://download.nmdc.cn/tools/meta/gtdb

To download from FTP sites on your PC, you need an FTP client such as FileZilla.
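
On a server without a graphical client, a command-line download is one alternative. The sketch below assumes GNU wget 1.x built with FTP support is available; the function name fetch_ftp_dir, the FETCH switch, and the ./kofam_scan destination are illustrative only.

```shell
#!/bin/sh
# Mirror an FTP directory into a local folder with wget.
# FETCH defaults to "echo", so the command is only printed;
# call with FETCH= (empty) to start the download.
# Note: wget recreates the remote directory tree (e.g. pub/db/kofam)
# underneath the destination folder.
fetch_ftp_dir() {
    url=$1
    dest=$2
    ${FETCH-echo} wget --recursive --no-parent --no-host-directories \
        --continue --directory-prefix="$dest" "$url"
}

fetch_ftp_dir ftp://ftp.genome.jp/pub/db/kofam/ ./kofam_scan
```

The --continue option lets an interrupted transfer resume, which helps with databases of this size.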

Usage

We recommend using apptainer. To test whether the container works, run the commands below (note that this will produce a METABOLIC_out folder in your current directory):

# METABOLIC-G
apptainer exec \
    -e \
    -B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
    -B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
    -B <path_to_METABOLIC_test_files>:/opt/METABOLIC/METABOLIC_test_files \
    <path_to_metabolic-4.0.sif> \
    METABOLIC-G.pl -test true
# METABOLIC-C
apptainer exec \
    -e \
    -B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
    -B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
    -B <path_to_METABOLIC_test_files>:/opt/METABOLIC/METABOLIC_test_files \
    <path_to_metabolic-4.0.sif> \
    METABOLIC-C.pl -test true

The METABOLIC software has 3 main scripts:

METABOLIC-G.pl for genome-wide profiling (TL;DR: everything except relative abundance)
METABOLIC-C.pl for community-wide profiling (TL;DR: everything above + relative abundance)
METABOLIC-C.2nd_run.pl for re-running the METABOLIC-C.pl analysis repeatedly (TL;DR: you want per-sample MW-scores to plot error bars)

The detailed usage can be found on the author's wiki.

Now let's explain what else needs to be done when you run it inside a container:

apptainer exec \
    -e \
    -B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
    -B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
    <path_to_metabolic-4.0.sif> \
    METABOLIC-*.pl <OPTIONS>

The -e flag tells apptainer not to pass the host's environment variables into the container. For example, if a different version of R is installed on the host and its R_LIBS_USER and R_LIBS_SITE variables leak into the container, the plotting scripts might break.

The -B flag maps host folders into the container. In this example, the first -B line maps the GTDB R226 database and the second maps the kofam_scan database; your data folder can be mapped with a third -B line in the same way.

Apptainer also binds your current dir into the container, so there is no need to change the file paths in the Reads_pairs.txt file. Bear in mind that apptainer does not support soft links: cd to the real path beforehand, otherwise your $HOME path will be bound as the working dir instead.

For docker users, however, the default working dir inside the container is /data, so use '/data/your_fastq_folder/your.fastq' in Reads_pairs.txt instead of the path on the host system ('<path_to_your_data>/your_fastq_folder/your.fastq').
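
Both path issues can be handled with a few lines of shell. This is a minimal sketch: enter_real_path and to_container_paths are hypothetical helper names, and the sed rewrite assumes the host data folder is what gets mounted at /data and that its path contains no '|' character.

```shell
#!/bin/sh
# Apptainer: switch to the physical (symlink-free) form of the current dir.
# `pwd -P` prints the working directory with all soft links resolved.
enter_real_path() {
    cd "$(pwd -P)" || return 1
}

# Docker: print a copy of a Reads_pairs.txt-style file in which the host
# data folder prefix is replaced by /data, the working dir in the container.
#   $1  host data folder, without a trailing slash
#   $2  file listing the fastq paths
to_container_paths() {
    host_dir=$1
    reads_file=$2
    sed "s|$host_dir/|/data/|g" "$reads_file"
}
```

For example, run enter_real_path right before apptainer exec, or redirect the output of to_container_paths into a new file and pass that file to the docker run.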

Put the path to metabolic-4.0.sif after the flags, followed by your METABOLIC command as in the tutorials.

Note that the output folder path is also resolved inside the container. For simplicity, use a relative path (e.g. METABOLIC_out): the folder is created in the <path_to_your_data> folder for docker, or in the current dir for apptainer, and you can move it afterwards.
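
To avoid retyping the flags, the apptainer call can be wrapped in a shell function. This is a sketch under assumptions: the function name metabolic, the RUN switch, and the three variables holding placeholder paths are all illustrative, and the -help option is taken from the upstream METABOLIC usage notes.

```shell
#!/bin/sh
# Placeholder locations; point these at your own database folders and image.
GTDBTK_DB=${GTDBTK_DB:-/path/to/gtdbtk_db}
KOFAM_DB=${KOFAM_DB:-/path/to/kofam_scan_db}
METABOLIC_SIF=${METABOLIC_SIF:-/path/to/metabolic-4.0.sif}

# Run any METABOLIC script inside the container with the standard flags.
# RUN defaults to "echo", so the command is only printed;
# call with RUN= (empty) to execute it.
metabolic() {
    ${RUN-echo} apptainer exec \
        -e \
        -B "$GTDBTK_DB:/metabolic_dbs/gtdbtk" \
        -B "$KOFAM_DB:/metabolic_dbs/kofam_scan" \
        "$METABOLIC_SIF" \
        "$@"
}

metabolic METABOLIC-G.pl -help
```

Everything after the function name is passed through unchanged, so the same wrapper serves all three scripts.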

There are a few tricks we want to share:

Keep the sample names (*.fastq file names), MAG names (*.fasta), and especially the contig IDs in the MAGs (fasta headers) as simple as you can; special characters in any of these may cause problems. Check the contig IDs for duplicates as well: it's very rare, but it happens from time to time.
METABOLIC can get stuck while formatting the output table if you throw too many genomes at it. We recommend splitting the genomes into sub-folders (e.g. 500 genomes per folder) and running the Perl scripts multiple times to reduce the run time.
METABOLIC-C.2nd_run.pl doesn't do much computing. If your server has enough CPUs and RAM, you can soft link the outputs of METABOLIC-C.pl into separate folders and use GNU parallel to run multiple instances of METABOLIC-C.2nd_run.pl at once instead of waiting for a loop to finish.
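
The first two tricks can be sketched in shell. The helper names duplicate_contig_ids and split_genomes are hypothetical, the contig ID is assumed to be the header text up to the first whitespace, and genomes are copied rather than linked so that every batch folder is self-contained.

```shell
#!/bin/sh
# Print contig IDs that occur more than once across the given FASTA files.
duplicate_contig_ids() {
    grep -h '^>' "$@" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Copy the *.fasta files of one folder into numbered sub-folders
# (batch_001, batch_002, ...) holding at most batch_size genomes each.
#   $1  folder with the genomes
#   $2  folder in which the batches are created
#   $3  genomes per batch (default: 500)
split_genomes() {
    src=$1
    dest=$2
    batch_size=${3:-500}
    count=0
    for genome in "$src"/*.fasta; do
        [ -e "$genome" ] || continue   # skip when the glob matches nothing
        batch=$(printf 'batch_%03d' $((count / batch_size + 1)))
        mkdir -p "$dest/$batch"
        cp "$genome" "$dest/$batch/"
        count=$((count + 1))
    done
}
```

Each batch folder can then be passed to a separate METABOLIC run; an empty result from duplicate_contig_ids means no repeated IDs were found.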

Offline installation

For users having trouble pulling images or downloading the databases, we also host the pre-built metabolic-4.0.sif file on BaiduNetdisk (百度网盘).

Citation

The source of the METABOLIC software is at https://github.com/AnantharamanLab/METABOLIC

If you found this tool useful, please consider citing the original work, which can be found at Microbiome:

Zhou, Z., Tran, P.Q., Breister, A.M. et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10, 33 (2022). https://doi.org/10.1186/s40168-021-01213-8
