Skip to content

Conversation

@martin-steinegger
Copy link
Member

No description provided.

{"resultDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::resultDb },
{"alignmentDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::alignmentDb }}},
{"alignproteome", alignproteome, &par.alignproteome, COMMAND_ALIGNMENT,
"Within-result all-vs-all gapped local alignment",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would improve this text.

{"alignproteome", alignproteome, &par.alignproteome, COMMAND_ALIGNMENT,
"Within-result all-vs-all gapped local alignment",
NULL,
"Martin Steinegger <[email protected]>",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add you @Gyuuul2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file looks incomplete

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

}
}
if (dataMode & USE_LOOKUP || dataMode & USE_LOOKUP_REV) {
Debug(Debug::INFO) << "ReadLookup file: " << dataFileName << "\n"; //gyuri
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this print out.

}
char* lookupDataChar = (char *) lookupData.getData();
size_t lookupDataSize = lookupData.size();
Debug(Debug::INFO) << "Lookup Data size is " << lookupDataSize << "\n"; //gyuri
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this print out.

size_t lookupDataSize = lookupData.size();
Debug(Debug::INFO) << "Lookup Data size is " << lookupDataSize << "\n"; //gyuri
lookupSize = Util::ompCountLines(lookupDataChar, lookupDataSize, threads);
Debug(Debug::INFO) << "Lookup size is " << lookupSize << "\n"; //gyuri
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this print out.

commons/BaseMatrix.cpp
commons/Command.cpp
commons/CommandCaller.cpp
# commons/ClusterSpecies.cpp
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it.


static size_t resultToBuffer(char * buffer, const result_t &result, bool addBacktrace, bool compress = true, bool addOrfPosition = false);

static size_t resultToBuffer_str(char * buffer, const result_t &result, bool addBacktrace, bool compress, bool addOrfPosition=false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?


DBReader<unsigned int> tdbr(par.db1.c_str(), par.db1Index.c_str(), par.threads, DBReader<unsigned int>::USE_DATA|DBReader<unsigned int>::USE_INDEX);
// DBReader<unsigned int> tdbr(par.db1.c_str(), par.db1Index.c_str(), par.threads, DBReader<unsigned int>::USE_DATA|DBReader<unsigned int>::USE_INDEX);
DBReader<unsigned int> tdbr(par.db1.c_str(), par.db1Index.c_str(), par.threads, DBReader<unsigned int>::USE_DATA|DBReader<unsigned int>::USE_INDEX|DBReader<unsigned int>::USE_LOOKUP);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is not needed anymore

EvalueComputation evaluer(tdbr.getAminoAcidDBSize(), subMat, gapOpen, gapExtend);
const size_t flushSize = 100000000;
size_t iterations = static_cast<int>(ceil(static_cast<double>(dbr_res.getSize()) / static_cast<double>(flushSize)));
Debug(Debug::INFO) << "Number of iterations: " << iterations << " Size of linclust dbr_res : " << dbr_res.getSize() << '\n';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

}
};

struct __attribute__((__packed__)) memberProteinEntry{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MemberProteinEntry

size_t getSize() const;

unsigned int getProteomeTotalLen(size_t id); //gyuri
unsigned int getProteomeTotalLen(size_t id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would name this getSourceTotalLen or getSetTotalLen to be consistent with the .source naming

}

// void setEasyAlignproteomeMustPassAlong(Parameters *p){

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to implement this whenever you use the createParameterString(..., true) parameter. Or it won't get passed along.

par.filenames.pop_back();

CommandCaller cmd;
cmd.addVariable("ALIGNPROTEOME_PAR", par.createParameterString(par.alignproteome,true).c_str()); // what?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLUSTER_PAR, THREADS_PAR etc are missing

hash = FileUtil::getHashFromSymLink(tmpDir + "/latest");
}
tmpDir = FileUtil::createTemporaryDirectory(tmpDir, hash);
par.filenames.pop_back();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to pop_back() the output path too and pass it as an environment variable. Or else it will also be passed to createdb.

) : sequenceReader(NULL), index(NULL) {
int targetDbtype = FileUtil::parseDbType(dataName.c_str());
if (Parameters::isEqualDbtype(targetDbtype, Parameters::DBTYPE_INDEX_DB)) {
index = new DBReader<unsigned int>(dataName.c_str(), (dataName + ".index").c_str(), 1, DBReader<unsigned int>::USE_DATA|DBReader<unsigned int>::USE_INDEX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened here?

const unsigned int ProteomeId = lookup[i].fileNumber;
const unsigned int ProteinId = lookup[i].id;
ProteomeList[ProteomeId].addSeqLen(tProteinDB.getSeqLen(ProteinId));
if (ProteomeList[ProteomeId].proteomeKey == -1){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: please add spaces around if:
if[space](...)[space]{

std::vector<unsigned int> memberKeys;
memberKeys.reserve(50); // store key for every protein in a cluster

std::vector<bool> isProteomeInCluster(localMemCount.size(), false); ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double ;

also careful about bool vectors. They are not multi-threading friendly. Here its fine, as you don't do any multi-threading

if (memberKeys.size() > 1) { //If not a singleton cluster
ClusterEntry eachClusterRep(memberKeys.size());
//init MemberProteinEntry and add it to memberProteins vector
for (auto& eachMemberKey : memberKeys){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: missing space after the for

for (size_t idx = 0; idx < ProteomeList.size(); idx++){
if (ProteomeList[idx].isCovered()) {
continue;
}else{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: missing spaces around else

DBReader<unsigned int>::LookupEntry* tLookup = tProteinDB.getLookup();
const size_t tLookupSize = tProteinDB.getLookupSize();
unsigned int totalProteomeNumber = tLookup[tLookupSize - 1].fileNumber;
std::vector<ProteomeEntry> ProteomeList(totalProteomeNumber + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: should be lowercase: proteomeList


int gapOpen, gapExtend;
BaseMatrix *subMat;
subMat = new SubstitutionMatrix(par.scoringMatrixFile.values.aminoacid().c_str(), 2.0, par.scoreBias);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the explicit allocation here:

SubstitutionMatrix subMat(par.scoringMatrixFile.values.aminoacid().c_str(), 2.0, par.scoreBias);
...

#pragma omp critical
{
for (size_t idx=0; idx < localMemCount.size(); idx++){
ProteomeList[idx].incrementMemCount(localMemCount[idx]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use __sync_fetch_and_add inside incrementMemCount you don't need the critical

#pragma omp critical
{
for (size_t idx = 0; idx < ProteomeList.size(); idx++) {
ProteomeList[idx].addSeqId(localSeqIds[idx]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you might need the critical as atomic floats can be tricky to do correctly

proteomeClustWriter.close();
proteinClustWriter.close();
tProteinDB.close();
delete subMat;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you rewrite the subMat init above, you can remove this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants