10 changes: 5 additions & 5 deletions .github/workflows/github_build.yml
@@ -18,7 +18,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.8]
python-version: [3.12]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
@@ -44,17 +44,17 @@ jobs:
echo "TEST fdog.setup"
fdog.setup -d /home/runner/work/fDOG/fDOG/dt --woFAS
echo "TEST fdog.checkData"
fdog.checkData -s /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir -c /home/runner/work/fDOG/fDOG/dt/coreTaxa_dir -a /home/runner/work/fDOG/fDOG/dt/annotation_dir --reblast
fdog.checkData -s /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir -c /home/runner/work/fDOG/fDOG/dt/coreTaxa_dir -a /home/runner/work/fDOG/fDOG/dt/annotation_dir --reblast --ignoreAnno
echo "TEST fdog.showTaxa"
fdog.showTaxa
echo "TEST fdog.run"
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@3 --fasOff --group mammalia
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@qfo24_02 --fasOff --group mammalia
mkdir seeds
path=$(fdog.setup -d ./ --getSourcepath); a="1 2 3"; for i in ${a[@]}; do cp $path/data/infile.fa seeds/$i.fa; done
echo "TEST fdogs.run"
fdogs.run --seqFolder seeds --jobName test_multi --refspec HUMAN@9606@3 --fasOff --searchTaxa PARTE@5888@3,THAPS@35128@3 --hmmScoreType sequence
fdogs.run --seqFolder seeds --jobName test_multi --refspec HUMAN@9606@qfo24_02 --fasOff --searchTaxa PARTE@5888@qfo24_02,THAPS@35128@qfo24_02 --hmmScoreType sequence
echo "TEST fdog.addTaxon"
head /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir/HUMAN@9606@3/HUMAN@9606@3.fa > hm.fa
head /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir/HUMAN@9606@qfo24_02/HUMAN@9606@qfo24_02.fa > hm.fa
fdog.addTaxon -f hm.fa -i 9606 -o ./ -c -a
ls
- name: Deploy
12 changes: 6 additions & 6 deletions README.md
@@ -1,7 +1,7 @@
# fDOG - Feature-aware Directed OrtholoG search
[![published in: MBE](https://img.shields.io/badge/published%20in-MBE-ff69b4)](https://doi.org/10.1093/molbev/msaf120)
[![PyPI version](https://badge.fury.io/py/fdog.svg)](https://pypi.org/project/fdog/)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Build Status](https://travis-ci.com/BIONF/fDOG.svg?branch=master)](https://travis-ci.com/BIONF/fDOG)
![Github Build](https://github.com/BIONF/fDOG/workflows/build/badge.svg)

# Table of Contents
@@ -19,7 +19,7 @@

# How to install

*fDOG* tool is distributed as a python package called *fdog*. It is compatible with [Python ≥ v3.7](https://www.python.org/downloads/).
*fDOG* tool is distributed as a python package called *fdog*. It is compatible with [Python ≥ v3.12](https://www.python.org/downloads/).

## Install the fDOG package
You can install *fdog* using `pip`:
@@ -59,7 +59,7 @@ You will get a warning if any of the dependencies are not ready to use, please s
*fdog* will run smoothly with the provided sample input file 'infile.fa' if everything is set correctly.

```
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@3
fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@qfo24_02
```
The output files with the prefix `test` will be saved at your current working directory.
You can have an overview about all available options with the command
@@ -71,15 +71,15 @@ Please find more information in [our wiki](https://github.com/BIONF/fDOG/wiki) t

# fDOG data set

Within the data package we provide a set of 78 reference taxa. They can be automatically downloaded during the setup. This data comes "ready to use" with the *fdog* framework. Species data must be present in the three directories listed below:
Within the data package we provide a set of [81 reference taxa](https://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/QfO_release_2024_02.tar.gz). They will be automatically downloaded during the setup. This data comes "ready to use" with the *fdog* framework. Species data must be present in the three directories listed below:

* searchTaxa_dir (Contains sub-directories for proteome fasta files for each species)
* coreTaxa_dir (Contains sub-directories for BLAST databases made with `makeblastdb` out of your proteomes)
* annotation_dir (Contains feature annotation files for each proteome)

For each species/taxon there is a sub-directory named in accordance to the naming schema ([Species acronym]@[NCBI ID]@[Proteome version])
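
As a minimal sketch of how a directory name following this schema decomposes, assuming the three parts are always separated by `@` (the `TaxonName` type and the `parse_taxon_name` helper are hypothetical and not part of *fdog*):

```python
from typing import NamedTuple


class TaxonName(NamedTuple):
    """Hypothetical container for the three parts of a taxon directory name."""
    acronym: str
    ncbi_id: int
    version: str


def parse_taxon_name(name: str) -> TaxonName:
    """Split '[Species acronym]@[NCBI ID]@[Proteome version]' into its parts."""
    parts = name.split("@")
    if len(parts) != 3:
        raise ValueError(f"Expected ACRONYM@NCBI_ID@VERSION, got: {name!r}")
    acronym, ncbi_id, proteome_version = parts
    # int() raises ValueError if the NCBI ID is not numeric.
    return TaxonName(acronym, int(ncbi_id), proteome_version)


human = parse_taxon_name("HUMAN@9606@qfo24_02")
```

With the reference species used in this change, `human.ncbi_id` is the integer `9606` and `human.version` is the string `qfo24_02`.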

*fdog* is not limited to those 78 taxa. If needed the user can manually add further gene sets (multiple fasta format) using provided functions.
*fdog* is not limited to those 81 reference taxa. If needed the user can manually add further gene sets (multiple fasta format) using provided functions.

## Adding a new gene set into fDOG
For adding **one gene set**, please use the `fdog.addTaxon` function:
@@ -112,7 +112,7 @@ _**NOTE:** After adding new taxa into *fdog*, you should [check for the validity
Any bug reports or comments, suggestions are highly appreciated. Please [open an issue on GitHub](https://github.com/BIONF/fDOG/issues/new) or be in touch via email.

# How to cite
Ebersberger, I., Strauss, S. & von Haeseler, A. HaMStR: Profile hidden markov model based search for orthologs in ESTs. BMC Evol Biol 9, 157 (2009), [doi:10.1186/1471-2148-9-157](https://doi.org/10.1186/1471-2148-9-157)
Tran V, Langschied F, Muelbaier H, Dosch J, Arthen F, Balint M, Ebersberger I. 2025. Feature architecture-aware ortholog search with fDOG reveals the distribution of plant cell wall-degrading enzymes across life. Molecular Biology and Evolution:msaf120. https://doi.org/10.1093/molbev/msaf120

# Contributors
- [Ingo Ebersberger](https://github.com/ebersber)
7 changes: 3 additions & 4 deletions fdog/addTaxa.py
@@ -26,12 +26,11 @@
from Bio import SeqIO
import multiprocessing as mp
from tqdm import tqdm
from ete3 import NCBITaxa
import re
import shutil
from datetime import datetime
import time
from pkg_resources import get_distribution
from importlib.metadata import version, PackageNotFoundError
from collections import OrderedDict

import fdog.libs.zzz as general_fn
@@ -66,8 +65,8 @@ def parse_map_file(mapping_file, folIn):


def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.')
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.')
required = parser.add_argument_group('required arguments')
optional = parser.add_argument_group('optional arguments')
required.add_argument('-i', '--input', help='Path to input folder', action='store', default='', required=True)
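
The hunks in this change import `PackageNotFoundError` alongside `version`, but the lines shown here never catch it. A minimal sketch of how the two could be combined, assuming a fallback string is acceptable when the package metadata is missing (for example, when running from an uninstalled source checkout); the helper name `get_package_version` and the `"unknown"` default are hypothetical:

```python
from importlib.metadata import version, PackageNotFoundError


def get_package_version(package: str, default: str = "unknown") -> str:
    """Return the installed version of `package`, or `default` if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # No distribution metadata found for this name.
        return default


# Storing the result under a different name (as the diff does with
# `fdog_version`) avoids shadowing the imported `version` function.
fdog_version = get_package_version("fdog")
description = "You are running fDOG version " + str(fdog_version) + "."
```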
6 changes: 3 additions & 3 deletions fdog/addTaxon.py
@@ -26,16 +26,16 @@
import shutil
import multiprocessing as mp
from datetime import datetime
from pkg_resources import get_distribution
from importlib.metadata import version, PackageNotFoundError

import fdog.libs.zzz as general_fn
import fdog.libs.tree as tree_fn
import fdog.libs.addtaxon as add_taxon_fn


def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.')
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.')
required = parser.add_argument_group('required arguments')
optional = parser.add_argument_group('optional arguments')
required.add_argument('-f', '--fasta', help='FASTA file of input taxon', action='store', default='', required=True)
25 changes: 14 additions & 11 deletions fdog/checkData.py
@@ -26,14 +26,12 @@
import subprocess
import shutil
from Bio import SeqIO
from ete3 import NCBITaxa
from ete4 import NCBITaxa
import re
from datetime import datetime
import multiprocessing as mp
from tqdm import tqdm
from pkg_resources import get_distribution
from Bio.Blast.Applications import NcbiblastpCommandline

from importlib.metadata import version, PackageNotFoundError

import fdog.libs.zzz as general_fn
import fdog.libs.blast as blast_fn
@@ -176,13 +174,18 @@ def run_check_fasta(checkDir, replace, delete, concat):

def check_blastdb(args):
""" Check for outdated blastdb """
(query, taxon, coreTaxa_dir, searchTaxa_dir) = args
blast_db = '%s/%s/%s' % (coreTaxa_dir, taxon, taxon)
query, taxon, coreTaxa_dir, searchTaxa_dir = args
blast_db = f"{coreTaxa_dir}/{taxon}/{taxon}"

try:
blastp_cline = NcbiblastpCommandline(query = query, db = blast_db)
stdout, stderr = blastp_cline()
except:
result = subprocess.run(
["blastp", "-query", query, "-db", blast_db],
capture_output=True, text=True, check=True
)
return(result.stdout)
except subprocess.CalledProcessError as e:
return([query, blast_db])

fai_in_genome = "%s/%s/%s.fa.fai" % (searchTaxa_dir, taxon, taxon)
fai_in_blast = "%s/%s/%s.fa.fai" % (coreTaxa_dir, taxon, taxon)
# check if fai_in_blast is a valid symlink
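
As this hunk reads, the new `try` branch returns `result.stdout` on success, whereas the removed code only returned on failure and otherwise fell through to the `.fai` checks. If that reading is right and the fall-through is still wanted, one possible shape is sketched below. The helper name `blastdb_is_usable` and the injectable `run` parameter are hypothetical; `run` exists only so the sketch can be exercised without BLAST+ installed.

```python
import subprocess


def blastdb_is_usable(query, blast_db, run=subprocess.run):
    """Return True if blastp exits successfully for `query` against `blast_db`."""
    cmd = ["blastp", "-query", query, "-db", blast_db]
    try:
        run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        # blastp ran but reported an error (e.g. incompatible DB version).
        return False
    except FileNotFoundError:
        # The blastp executable is not on PATH.
        return False
    return True


def check_blastdb_sketch(query, taxon, coreTaxa_dir, run=subprocess.run):
    """Report a failing BLAST DB early, otherwise continue with later checks."""
    blast_db = f"{coreTaxa_dir}/{taxon}/{taxon}"
    if not blastdb_is_usable(query, blast_db, run=run):
        return [query, blast_db]
    # ... the .fai symlink checks would follow here ...
    return None
```

Catching `FileNotFoundError` separately is an assumption about desired behavior: `subprocess.run` raises it when the executable is missing, which `CalledProcessError` alone would not cover.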
@@ -418,8 +421,8 @@ def run_check(args):
return(caution)

def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.')
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.')
parser.add_argument('-s', '--searchTaxa_dir', help='Path to search taxa directory (e.g. fdog_dataPath/searchTaxa_dir)', action='store', default='')
parser.add_argument('-c', '--coreTaxa_dir', help='Path to blastDB directory (e.g. fdog_dataPath/coreTaxa_dir)', action='store', default='')
parser.add_argument('-a', '--annotation_dir', help='Path to feature annotation directory (e.g. fdog_dataPath/annotation_dir)', action='store', default='')
1 change: 0 additions & 1 deletion fdog/libs/addtaxon.py
@@ -20,7 +20,6 @@
from pathlib import Path
from Bio import SeqIO
import subprocess
from ete3 import NCBITaxa
import re
from datetime import datetime
from collections import OrderedDict
29 changes: 14 additions & 15 deletions fdog/libs/blast.py
@@ -17,7 +17,6 @@

import os
import sys
from Bio.Blast.Applications import NcbiblastpCommandline
import xml.etree.ElementTree as ET
import subprocess

@@ -29,21 +28,21 @@ def do_blastsearch(
""" Perform blastp search for a query fasta file
Return an XML string contains blast result
"""
filter = 'no'
if lowComplexityFilter == True:
filter = 'yes'
filter_value = "yes" if lowComplexityFilter else "no"
try:
blastp_cline = NcbiblastpCommandline(
query = query, db = blast_db, evalue = evalBlast, seg = filter,
max_target_seqs = 10, outfmt = 5)
stdout, stderr = blastp_cline()
return(stdout)
except:
sys.exit(
'ERROR: Error running blastp search for %s against %s\n%s'
% (query, blast_db, NcbiblastpCommandline(
query = query, db = blast_db, evalue = evalBlast, seg = filter,
max_target_seqs = 10, outfmt = 5)))
cmd = [
"blastp",
"-query", query,
"-db", blast_db,
"-evalue", str(evalBlast),
"-seg", filter_value,
"-max_target_seqs", "10",
"-outfmt", "5"
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
return result.stdout
except subprocess.CalledProcessError as e:
sys.exit(f"ERROR: Error running BLASTP search for {query} against {blast_db}\n{e.stderr}")
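
The replacement above spells out each `blastp` flag by hand where `NcbiblastpCommandline` previously took keyword arguments. A minimal sketch of that keyword-to-flag mapping, assuming every option is passed as `-name value` (the helper `blastp_args` is hypothetical and only builds the argument list; it does not run BLAST):

```python
def blastp_args(**options):
    """Build a blastp argument list from keyword options.

    Each keyword `name=value` becomes the pair `-name`, `str(value)`,
    in the order the keywords were given.
    """
    cmd = ["blastp"]
    for name, value in options.items():
        cmd.extend([f"-{name}", str(value)])
    return cmd


cmd = blastp_args(
    query="infile.fa",
    db="HUMAN@9606@qfo24_02",
    evalue=0.00001,
    seg="no",
    max_target_seqs=10,
    outfmt=5,
)
```

Passing the result to `subprocess.run` as a list (rather than a single shell string) means paths containing spaces need no extra quoting.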


def parse_blast_xml(blast_output):
2 changes: 1 addition & 1 deletion fdog/libs/corecompile.py
@@ -19,7 +19,7 @@
import os
import shutil
from pathlib import Path
from ete3 import NCBITaxa
from ete4 import NCBITaxa
from Bio import SeqIO
import time

22 changes: 10 additions & 12 deletions fdog/libs/preparation.py
@@ -17,10 +17,10 @@

import sys
import os
import subprocess
from pathlib import Path
from Bio import SeqIO
from Bio.Blast.Applications import NcbiblastpCommandline
from ete3 import NCBITaxa
from ete4 import NCBITaxa

import fdog.libs.zzz as general_fn
import fdog.libs.fasta as fasta_fn
@@ -107,17 +107,15 @@ def check_input(args):

def check_blast_version(corepath, refspec):
""" Check if blast DBs in corepath is compatible with blastp version """
fdog_path = os.path.realpath(__file__).replace('/libs/preparation.py','')
query = fdog_path + '/data/infile.fa'
blast_db = '%s/%s/%s' % (corepath, refspec, refspec)
fdog_path = os.path.realpath(__file__).replace('/libs/preparation.py', '')
query = os.path.join(fdog_path, 'data', 'infile.fa')
blast_db = os.path.join(corepath, refspec, refspec)
try:
blastp_cline = NcbiblastpCommandline(
query = query, db = blast_db)
stdout, stderr = blastp_cline()
except:
sys.exit(
'ERROR: Error running blast (probably conflict with BLAST DBs versions)\n%s'
% (NcbiblastpCommandline(query = query, db = blast_db)))
cmd = ["blastp", "-query", query, "-db", blast_db]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as e:
sys.exit(f"ERROR: Error running BLAST (probably conflict with BLAST DB versions)\n{e.stderr}")


def check_ranks_core_taxa(corepath, refspec, minDist, maxDist):
""" Check if refspec (or all core taxa) have a valid minDist and maxDist tax ID
14 changes: 8 additions & 6 deletions fdog/libs/tree.py
@@ -16,7 +16,7 @@
#######################################################################

import re
from ete3 import NCBITaxa
from ete4 import NCBITaxa

import fdog.libs.zzz as general_fn

@@ -57,8 +57,9 @@ def get_ancestor(id1, id2, ncbi):
Return dictionary {ancestor_id: ancestor_rank}
"""
tree = ncbi.get_topology([id1, id2], intermediate_nodes = False)
ancestor = tree.get_common_ancestor(id1, id2).name
return(ncbi.get_rank([ancestor]))
ancestor_name = tree.common_ancestor(id1, id2)
ancestor_id = int(ancestor_name.name)
return(ncbi.get_rank([ancestor_id]))


def check_common_ancestor(ref_id, ancestor, minDist, maxDist, ncbi):
@@ -68,6 +69,7 @@ def check_common_ancestor(ref_id, ancestor, minDist, maxDist, ncbi):
"""
ref_lineage = ncbi.get_lineage(ref_id)
(min_ref, max_ref) = get_rank_range(ref_lineage, minDist, maxDist, ncbi)
ancestor = int(ancestor)
if not ancestor in ref_lineage:
return(0)
ancestor_index = len(ref_lineage) - ref_lineage.index(ancestor) - 1
@@ -78,7 +80,7 @@ def check_common_ancestor(ref_id, ancestor, minDist, maxDist, ncbi):

def remove_clade(tree, node_id):
""" Remove a clade from a tree """
removed_clade = tree.search_nodes(name = str(node_id))[0]
removed_clade = list(tree.search_nodes(name = str(node_id)))[0]
removed_node = removed_clade.detach()
return(tree)

@@ -96,12 +98,12 @@ def get_leaves_dict(spec_lineage, tree, min_index, max_index):
for i in range(len(spec_lineage)):
if i >= min_index and i <= max_index:
curr_node = spec_lineage[i]
node = tree.search_nodes(name = str(curr_node))
node = list(tree.search_nodes(name = str(curr_node)))
if len(node) > 0:
for leaf in node:
node_dict[spec_lineage[i]] = []
for t in leaf.traverse():
if t.is_leaf():
if t.is_leaf:
if not t.name in already_added:
already_added.append(t.name)
node_dict[spec_lineage[i]].append(t.name)
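
The `tree.py` hunks wrap `search_nodes(...)` in `list(...)` and drop the call parentheses from `is_leaf`, which suggests that in ete4 the former yields results lazily and the latter is a property. The sketch below illustrates why both changes matter using a small stand-in `Node` class; it is hypothetical and does not use or reproduce the ete4 API.

```python
class Node:
    """Stand-in tree node: lazy search_nodes, is_leaf as a property."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    @property
    def is_leaf(self):
        return not self.children

    def traverse(self):
        yield self
        for child in self.children:
            yield from child.traverse()

    def search_nodes(self, name):
        # Generator expression: results are produced lazily.
        return (node for node in self.traverse() if node.name == name)


tree = Node("root", [Node("9606"), Node("10090")])

# A generator supports neither indexing nor len().
try:
    tree.search_nodes(name="9606")[0]
    indexable = True
except TypeError:
    indexable = False

# Materializing it first restores list semantics.
first = list(tree.search_nodes(name="9606"))[0]

# With a property, `node.is_leaf` is already the boolean; no call is needed.
leaves = [node.name for node in tree.traverse() if node.is_leaf]
```

The reverse mistake is worth watching for when porting in the other direction: where `is_leaf` is a method, writing `if t.is_leaf:` without parentheses tests the bound method object, which is always truthy.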
6 changes: 3 additions & 3 deletions fdog/mergeOutput.py
@@ -21,7 +21,7 @@
from os import listdir as ldir
import argparse
import yaml
from pkg_resources import get_distribution
from importlib.metadata import version, PackageNotFoundError
from Bio import SeqIO

def createConfigPP(phyloprofile, domains_0, ex_fasta, directory, out):
@@ -37,8 +37,8 @@ def createConfigPP(phyloprofile, domains_0, ex_fasta, directory, out):


def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.')
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.')
parser.add_argument('-i', '--input',
help='Input directory, where all single output (.extended.fa, .phyloprofile, _forward.domains, _reverse.domains) can be found',
action='store', default='', required=True)
6 changes: 3 additions & 3 deletions fdog/removefDog.py
@@ -20,7 +20,7 @@
import argparse
import subprocess
import shutil
from pkg_resources import get_distribution
from importlib.metadata import version, PackageNotFoundError

import fdog.setupfDog as setupfDog_fn

@@ -48,8 +48,8 @@ def query_yes_no(question, default='yes'):


def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.')
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.')
parser.add_argument('--all', help='Remove fdog together with all files/data within the installed fdog directory', action='store_true', default=False)
args = parser.parse_args()
data = args.all
8 changes: 4 additions & 4 deletions fdog/runMulti.py
@@ -25,8 +25,8 @@
import shutil
import multiprocessing as mp
from tqdm import tqdm
from ete3 import NCBITaxa
from pkg_resources import get_distribution
from ete4 import NCBITaxa
from importlib.metadata import version, PackageNotFoundError
import time

import fdog.libs.zzz as general_fn
@@ -161,8 +161,8 @@ def join_outputs(outpath, jobName, seeds, keep, silentOff):


def main():
version = get_distribution('fdog').version
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(version) + '.',
fdog_version = version("fdog")
parser = argparse.ArgumentParser(description='You are running fDOG version ' + str(fdog_version) + '.',
epilog="For more information on certain options, please refer to the wiki pages "
"on github: https://github.com/BIONF/fDOG/wiki")
required = parser.add_argument_group('Required arguments')