This is the second attempt at containerizing the METABOLIC software for easy deployment; the previous attempt, by tin6150, can be found here. We used the miniconda3 base image and installed the dependencies with conda (the original yml was contributed by Dr. Daan Speth), which improves the image's long-term maintainability.
We recommend using the pre-built images to save trouble.
For docker users:
docker pull magicprotoss/metabolic:20260326
For apptainer users:
apptainer build metabolic-4.0.sif docker://magicprotoss/metabolic:20260326
If you still want to build the image yourself, please continue with the following steps:
-
Make sure that Docker and Apptainer are installed and that you have the privileges to run them
-
Clone the repo to your local machine and cd into it
-
Prepare the dbCAN2 and MEROPS databases following the steps in run_to_setup.sh and save them to the current directory (you'll need a working METABOLIC installation on your machine to do so).
-
Build the container
sudo docker build --build-arg GTDBTK_VER=<GTDB_tk_version> -t metabolic-container .
sudo apptainer build metabolic-4.0.sif docker-daemon://metabolic-container:latest
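If you rebuild the image often, the two commands above can be kept together in a small Bash function. This is a hypothetical sketch rather than a script from the repository: `build_metabolic`, `_run` and the `DRY_RUN` switch are names invented here, while the image name and .sif file name are taken from the commands above.

```shell
# Hypothetical convenience wrapper (not shipped with the repo) around the two
# build commands above. Set DRY_RUN=1 to print the commands instead of running them.
_run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

build_metabolic() {
  # The GTDB-Tk version is the only required argument, e.g. 2.6.1.
  local ver="${1:?usage: build_metabolic <GTDB_tk_version>}"
  # Step 1: build the Docker image from the Dockerfile in the current directory.
  _run sudo docker build --build-arg "GTDBTK_VER=${ver}" -t metabolic-container . || return 1
  # Step 2: convert the local Docker image into an Apptainer .sif file.
  _run sudo apptainer build metabolic-4.0.sif docker-daemon://metabolic-container:latest
}
```

For example, `DRY_RUN=1 build_metabolic 2.6.1` prints both commands without running them; dropping `DRY_RUN=1` performs the build.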
The METABOLIC-4.0 software needs the following databases to run:
These files are distributed alongside the software on GitHub; when the container is built, the latest versions are pulled together with the executables.
-
METABOLIC_hmm_db (Commits on Mar 31, 2022)
-
METABOLIC_template_and_database (Commits on Jun 12, 2023)
-
Motif (Commits on Sep 3, 2021)
These 2 databases (dbCAN2 and MEROPS) are downloaded when running run_to_setup.sh; since they are not updated regularly, we've packed them into the container.
These 2 databases (kofam_scan and GTDB) are very large and updated frequently, so we recommend downloading them separately and bind mounting them (-B <local_path>:<container_path>) into the container when running the software.
-
kofam_scan (ftp://ftp.genome.jp/pub/db/kofam). Note that the first time, you can run the 'hmmpress_kegg_db' convenience script to prepare the binary versions of the .hmm files, which reduces runtime.
NMDC (National Microbiology Data Center, 国家微生物科学数据中心) mirror: ftp://download.nmdc.cn/Kofam
-
GTDB: the database version should match the GTDB-Tk version. We packed GTDB-Tk v2.6.1 into the container, so the database release should be R226 in this instance. For full reference, please check the GTDB-Tk docs.
NMDC (National Microbiology Data Center, 国家微生物科学数据中心) mirror: ftp://download.nmdc.cn/tools/meta/gtdb
To download from FTP sites on your PC, you need an FTP client such as FileZilla.
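On a headless server, wget can stand in for a GUI FTP client. The sketch below is hypothetical (`fetch_kofam` and `DRY_RUN` are invented names): it downloads `ko_list.gz` and `profiles.tar.gz` from the kofam FTP directory and unpacks them. Whether the bind-mounted kofam_scan folder should have exactly this layout is an assumption, so compare it with run_to_setup.sh. The optional second argument for a mirror assumes the mirror uses the same file names, which we have not verified for NMDC.

```shell
# Hypothetical helper: fetch the KOfam files with wget instead of a GUI FTP
# client. Assumption: the folder you bind mount should contain ko_list and
# profiles/ as distributed on ftp.genome.jp; check run_to_setup.sh first.
fetch_kofam() {
  local dest="${1:?usage: fetch_kofam <dest_dir> [base_url]}"
  local base="${2:-ftp://ftp.genome.jp/pub/db/kofam}"
  local f
  mkdir -p "$dest" || return 1
  for f in ko_list.gz profiles.tar.gz; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "wget -c -P ${dest} ${base}/${f}"
    else
      # -c resumes partial downloads; -P saves into the destination folder.
      wget -c -P "$dest" "${base}/${f}" || return 1
    fi
  done
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    return 0
  fi
  # Unpack: ko_list.gz -> ko_list, profiles.tar.gz -> profiles/
  gunzip -f "${dest}/ko_list.gz" || return 1
  tar -xzf "${dest}/profiles.tar.gz" -C "$dest" && rm -f "${dest}/profiles.tar.gz"
}
```

After unpacking, the 'hmmpress_kegg_db' script mentioned above can be run once to prepare the binary .hmm files.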
We recommend using Apptainer. To test whether the container works, run the commands below (note that this will produce a METABOLIC_out folder in your current directory):
# METABOLIC-G
apptainer exec \
-e \
-B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
-B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
-B <path_to_METABOLIC_test_files>:/opt/METABOLIC/METABOLIC_test_files \
<path_to_metabolic-4.0.sif> \
METABOLIC-G.pl -test true
# METABOLIC-C
apptainer exec \
-e \
-B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
-B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
-B <path_to_METABOLIC_test_files>:/opt/METABOLIC/METABOLIC_test_files \
<path_to_metabolic-4.0.sif> \
METABOLIC-C.pl -test true
The METABOLIC software has 3 main scripts:
-
METABOLIC-G.pl for genome-wide profiling (TL;DR: everything except relative abundance)
-
METABOLIC-C.pl for community-wide profiling (TL;DR: everything above + relative abundance)
-
METABOLIC-C.2nd_run.pl for running 2 repeatedly (TLDR: You want per-sample MW-scores to plot an error bar)
Detailed usage instructions can be found on the author's wiki.
Now let's explain what else needs to be done when you're running it inside a container:
apptainer exec \
-e \
-B <path_to_gtdbtk_database>:/metabolic_dbs/gtdbtk \
-B <path_to_kofam_scan_database>:/metabolic_dbs/kofam_scan \
-B <path_to_your_data> \
<path_to_metabolic-4.0.sif> \
METABOLIC-*.pl <OPTIONS>
The -e flag tells Apptainer not to pass your host's environment variables into the container. For example, if you have a different version of R installed on the host and R_LIBS_USER and R_LIBS_SITE leak into the container, the plotting scripts might break.
The -B flag maps folders into the container: in this example, the first line maps the GTDB R226 database, the second the kofam_scan database, and the third your data folder. Apptainer also binds your current directory into the container, so there is no need to change the file paths in the Reads_pairs.txt file (bear in mind that Apptainer does not support soft links here: cd to the real path beforehand, otherwise your $HOME path will be bound as the working directory instead). For Docker users, however, the default working directory inside the container is /data, so you should use '/data/your_fastq_folder/your.fastq' in Reads_pairs.txt instead of the path on the host system ('<path_to_your_data>/your_fastq_folder/your.fastq').
Then put the path to metabolic-4.0.sif after the flags, followed by your METABOLIC command as in the tutorials.
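To avoid retyping the bind mounts, the command can be wrapped in a Bash function. This is a minimal sketch under stated assumptions: the function `metabolic` and the variables `METABOLIC_SIF`, `GTDBTK_DB`, `KOFAM_DB` and `DATA_DIR` are hypothetical names, and `cd -P .` is our way of following the advice above to work from the real (non-symlinked) path.

```shell
# Hypothetical wrapper around the apptainer command above.
# Required: METABOLIC_SIF, GTDBTK_DB, KOFAM_DB. Optional: DATA_DIR.
metabolic() {
  local sif="${METABOLIC_SIF:?set METABOLIC_SIF to the .sif path}"
  # -e: do not pass host environment variables (e.g. R_LIBS_USER) inside.
  local args=(exec -e
    -B "${GTDBTK_DB:?set GTDBTK_DB}:/metabolic_dbs/gtdbtk"
    -B "${KOFAM_DB:?set KOFAM_DB}:/metabolic_dbs/kofam_scan")
  # Optionally bind the data folder at the same path inside the container.
  if [[ -n "${DATA_DIR:-}" ]]; then
    args+=(-B "$DATA_DIR")
  fi
  # cd -P resolves soft links in the working directory before Apptainer binds
  # it; the subshell keeps the caller's directory unchanged.
  ( cd -P . && apptainer "${args[@]}" "$sif" "$@" )
}
```

For example: `METABOLIC_SIF=$HOME/metabolic-4.0.sif GTDBTK_DB=<path_to_gtdbtk_database> KOFAM_DB=<path_to_kofam_scan_database> metabolic METABOLIC-C.pl <OPTIONS>`.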
Note that the output folder path must also be accessible inside the container. For simplicity, you can just use a relative path (e.g. METABOLIC_out); this creates the folder in <path_to_your_data> (Docker) or in the current directory (Apptainer), and you can move it afterwards.
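Docker users can follow the same pattern. The sketch below is hypothetical: `metabolic_docker` and `to_docker_paths` are invented names. It assumes the pre-built image tag shown at the top and the same /metabolic_dbs mount points as the Apptainer example, and it mounts <path_to_your_data> at /data so that relative output paths such as METABOLIC_out land there. `to_docker_paths` writes a copy of Reads_pairs.txt with the host prefix replaced by /data/.

```shell
# Hypothetical Docker counterparts (names are made up for this sketch).
# Assumes the pre-built image tag from above, the same /metabolic_dbs mount
# points as the apptainer example, and <path_to_your_data> mounted at /data.
metabolic_docker() {
  local data="${1:?usage: metabolic_docker <host_data_dir> <script> [options...]}"
  shift
  local host
  # docker -v needs an absolute host path; resolve it (and any soft links).
  host="$(cd "$data" && pwd -P)" || return 1
  docker run --rm \
    -v "${GTDBTK_DB:?set GTDBTK_DB}:/metabolic_dbs/gtdbtk" \
    -v "${KOFAM_DB:?set KOFAM_DB}:/metabolic_dbs/kofam_scan" \
    -v "${host}:/data" \
    magicprotoss/metabolic:20260326 "$@"
}

# Rewrite host paths in Reads_pairs.txt to their /data equivalents.
# Writes a new file; the original is left untouched.
to_docker_paths() {
  local prefix="${1:?usage: to_docker_paths <host_data_dir> <in> <out>}"
  local in="${2:?}" out="${3:?}" line rep="/data/"
  prefix="${prefix%/}/"   # normalise to exactly one trailing slash
  # Literal (non-regex) replacement of every occurrence of the prefix.
  while IFS= read -r line || [[ -n "$line" ]]; do
    printf '%s\n' "${line//"$prefix"/$rep}"
  done < "$in" > "$out"
}
```

For example, `to_docker_paths <path_to_your_data> Reads_pairs.txt Reads_pairs.docker.txt` creates a container-side copy, which you then pass to `metabolic_docker <path_to_your_data> METABOLIC-C.pl <OPTIONS>`.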
There are a few tricks we want to share:
-
Keep sample names (*.fastq file names), MAG names (*.fasta file names), and especially contig IDs in MAGs (FASTA headers) as simple as possible; special characters in these names may cause problems. Also check the contig IDs for duplicates: it's super rare, but it happens from time to time.
-
METABOLIC can get stuck at formatting the output table if you give it too many genomes. We recommend splitting the genomes into sub-folders (e.g. 500 genomes per folder) and running the Perl scripts once per folder to reduce run time.
-
METABOLIC-C.2nd_run.pl doesn't do much computing. If your server has enough CPUs and RAM, you can soft link the outputs of METABOLIC-C.pl into separate folders and use GNU parallel to run multiple instances of METABOLIC-C.2nd_run.pl at once instead of waiting for a loop to finish.
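The naming tip can be checked before launching a run. This hypothetical Bash/awk sketch (`check_fasta_ids` is an invented name) flags contig IDs that are duplicated across MAGs, as well as file names and contig IDs containing characters outside a conservative set. The allowed set `[A-Za-z0-9_.-]` is our assumption, not a rule stated by METABOLIC.

```shell
# Hypothetical pre-flight check, not part of METABOLIC. Reports contig IDs
# that occur more than once across all given FASTA files, plus file names and
# contig IDs containing characters outside [A-Za-z0-9_.-] (an assumed safe
# set; adjust it to taste). Exit status is 1 if anything was reported.
check_fasta_ids() {
  LC_ALL=C awk '
    FNR == 1 {
      base = FILENAME; sub(/.*\//, "", base)     # file name without directory
      if (base ~ /[^A-Za-z0-9_.-]/) { print "special_file\t" FILENAME; bad = 1 }
    }
    /^>/ {
      id = substr($1, 2)                          # header up to first whitespace
      if (id ~ /[^A-Za-z0-9_.-]/) { print "special_id\t" FILENAME "\t" id; bad = 1 }
      if (++seen[id] == 2) { print "duplicate\t" id; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}
```

For example, `check_fasta_ids genomes/*.fasta` prints one tab-separated line per problem and returns a non-zero status if any were found.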
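The batching tip can also be scripted. This is a hypothetical sketch: `split_genomes` is an invented name, and the `*.fasta` pattern and the batch_N folder names are assumptions. Note that it moves files rather than copying them.

```shell
# Hypothetical helper: move the genome files of one folder into numbered
# sub-folders (batch_1, batch_2, ...) holding at most N files each (default
# 500, as suggested above). The *.fasta pattern is an assumption; change it to
# match your file extension. Files are moved, not copied.
split_genomes() {
  local src="${1:?usage: split_genomes <genome_dir> <out_dir> [per_folder]}"
  local out="${2:?}" per="${3:-500}"
  local i=0 f batch
  for f in "$src"/*.fasta; do
    [[ -e "$f" ]] || continue          # no match: the glob stays literal
    batch="$out/batch_$(( i / per + 1 ))"
    mkdir -p "$batch" || return 1
    mv -- "$f" "$batch/" || return 1
    i=$(( i + 1 ))
  done
  echo "$i genomes split into $(( (i + per - 1) / per )) folder(s) under $out"
}
```

For example, `split_genomes genomes genome_batches 500`, then run METABOLIC-G.pl or METABOLIC-C.pl once per `genome_batches/batch_*` folder.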
For users having trouble pulling images or downloading the databases, we also host the pre-built metabolic-4.0.sif file on BaiduNetdisk (百度网盘).
The source of the METABOLIC software is at https://github.com/AnantharamanLab/METABOLIC
If you find this tool useful, please consider citing the original work, published in Microbiome:
Zhou, Z., Tran, P.Q., Breister, A.M. et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10, 33 (2022). https://doi.org/10.1186/s40168-021-01213-8