[mergeMSTs] Problems with mst and query #24

@eseiler


Hey there,

While using the mergeMSTs branch, I ran into some trouble with mst and query.

mst

mantis mst doesn't seem to work.

It tries to load the eqclass_rr.cls files:

mantis/src/mst.cc

Lines 33 to 34 in 7406e8f

eqclass_files =
        mantis::fs::GetFilesExt(prefix.c_str(), mantis::EQCLASS_FILE);

This will later lead to a segmentation fault because the files do not exist.
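Rather than segfaulting later, mst could check up front that the eqclass files exist and stop with a readable message. The sketch below is only an illustration under assumptions: files_with_suffix is a hypothetical stand-in for mantis::fs::GetFilesExt built on std::filesystem, require_eqclass_files is a hypothetical helper, and the suffix "eqclass_rr.cls" is assumed to be what mantis::EQCLASS_FILE matches.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for mantis::fs::GetFilesExt: collect the regular files
// in `dir` whose names end with `suffix`. A missing directory yields an empty list.
std::vector<std::string> files_with_suffix(const std::string& dir,
                                           const std::string& suffix) {
  std::vector<std::string> out;
  std::error_code ec;
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
    const std::string name = it->path().filename().string();
    if (it->is_regular_file(ec) && name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
      out.push_back(it->path().string());
    }
  }
  return out;
}

// Hypothetical guard: fail fast with a message instead of crashing later
// when the eqclass files were never written or were deleted by `mantis build`.
bool require_eqclass_files(const std::vector<std::string>& files,
                           const std::string& prefix) {
  if (files.empty()) {
    std::cerr << "mst: no eqclass_rr.cls files found under " << prefix
              << " (they may have been removed by `mantis build`)\n";
    return false;
  }
  return true;
}
```

In mst this would sit right after the file lookup, e.g. `if (!require_eqclass_files(files_with_suffix(prefix, "eqclass_rr.cls"), prefix)) return 1;`.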

mantis build always deletes the eqclass_rr.cls files at the end:

mantis/src/mst.cc

Lines 729 to 737 in 7406e8f

if (opt.remove_colorClasses && !opt.keep_colorclasses) {
    for (auto &f : mantis::fs::GetFilesExt(opt.prefix.c_str(), mantis::EQCLASS_FILE)) {
        std::cerr << f.c_str() << "\n";
        if (std::remove(f.c_str()) != 0) {
            std::cerr << "Unable to delete file " << f << "\n";
            std::exit(1);
        }
    }
}

mantis build doesn't have an option to toggle this behavior.
Changing qopt.remove_colorClasses = true; to qopt.remove_colorClasses = false; here fixes the issue:

qopt.prefix = bopt.out;
qopt.numThreads = bopt.numthreads;
qopt.remove_colorClasses = true;
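Instead of flipping the hard-coded value, build could forward a user choice into the MST options. This is a sketch under assumptions: BuildOpts and MSTOpts are hypothetical structs modeled on the bopt/qopt fields quoted above, and BuildOpts::keep_colorclasses is a hypothetical new switch (mantis build has no such flag today). deletes_colorclasses mirrors the condition guarding the deletion loop in mst.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical build options; keep_colorclasses is an assumed new CLI switch.
struct BuildOpts {
  std::string out;
  uint32_t numthreads{1};
  bool keep_colorclasses{false};
};

// Hypothetical MST options, modeled on the qopt fields in the issue.
struct MSTOpts {
  std::string prefix;
  uint32_t numThreads{1};
  bool remove_colorClasses{true};
  bool keep_colorclasses{false};
};

// Forward the user's choice instead of hard-coding remove_colorClasses = true.
MSTOpts make_mst_opts(const BuildOpts& bopt) {
  MSTOpts qopt;
  qopt.prefix = bopt.out;
  qopt.numThreads = bopt.numthreads;
  qopt.remove_colorClasses = !bopt.keep_colorclasses;
  return qopt;
}

// Same condition as the one guarding the deletion loop in mst.cc.
bool deletes_colorclasses(const MSTOpts& opt) {
  return opt.remove_colorClasses && !opt.keep_colorclasses;
}
```

Keeping the current default (delete) preserves today's disk-saving behavior, while users who want to run mst or non-bulk query afterwards can opt out.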

query

The default non-bulk query only works if the eqclass_rr.cls files are present and -1 is used:

mantis query -1 -k 20 -p index/ reads.fasta

For the eqclass_rr.cls files to exist, the fix above is needed and mst must have been run with -k.

Alternatively, bulk mode (-b) works without the eqclass_rr.cls files, so mst can also be run with -d.

mantis query -b -k 20 -p index/ reads.fasta

The problem in non-bulk query seems to be that findSamples is called for every query sequence:

mantis/src/mstQuery.cc

Lines 492 to 498 in 7406e8f

while (ipfile >> read) {
    mstQuery.reset();
    mstQuery.parseKmers(numOfQueries, read, indexK);
    mstQuery.findSamples(cdbg, cache_lru, &rs, queryStats, 1);
    output_results(mstQuery, opfile, sampleNames, queryStats, 1);
    numOfQueries++;
}

The function then accesses cdbg.get_current_cqf()->keybits():

uint64_t ksize{cdbg.get_current_cqf()->keybits()}, numBlocks{cdbg.get_numBlocks()};

This works fine for the first query, but for the second one there is no CQF to access because it has been replaced with an invalid one:

cdbg.replaceCQFInMemory(invalid);
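The failure pattern is state that one call tears down and the next call still relies on. A minimal model, assuming nothing about mantis beyond the two lines quoted above: CQF, CdBG, find_samples_buggy and find_samples_fixed are hypothetical mock names, and the null check with an exception stands in for what would be a null dereference in the real code. The workaround shape captures keybits once, before the loop, and passes it in.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>

// Hypothetical mock of a loaded CQF block.
struct CQF {
  uint64_t keybits;
};

// Hypothetical mock of the cdbg object holding the currently loaded block.
struct CdBG {
  std::unique_ptr<CQF> current;
  CQF* get_current_cqf() { return current.get(); }
  void load_block(uint64_t keybits) { current = std::make_unique<CQF>(CQF{keybits}); }
  // Models replaceCQFInMemory(invalid): the loaded block is dropped.
  void replace_with_invalid() { current.reset(); }
};

// Buggy shape: reads keybits from whatever CQF is loaded right now,
// then invalidates it at the end of the call.
uint64_t find_samples_buggy(CdBG& cdbg) {
  CQF* cqf = cdbg.get_current_cqf();
  if (cqf == nullptr) throw std::runtime_error("no CQF loaded");
  const uint64_t ksize = cqf->keybits;
  cdbg.replace_with_invalid();
  return ksize;
}

// Workaround shape: the caller captures keybits once and passes it in.
uint64_t find_samples_fixed(CdBG& cdbg, uint64_t ksize) {
  cdbg.replace_with_invalid();
  return ksize;
}
```

The first call to find_samples_buggy succeeds and the second throws, matching the "works for the first query only" symptom; find_samples_fixed can be called repeatedly.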

I tried loading the first block (block 0) at the beginning of findSamples and passing the keybits as an extra parameter.
But then there is an out-of-bounds access at:

allQueries[q][numSamples]++;
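It is not stated which index is out of range. Two guesses worth ruling out, both assumptions rather than findings: the query id q keeps growing (numOfQueries in the loop) while reset() shrinks the container, or numSamples is used as an index into a row with exactly numSamples slots. Temporarily swapping operator[] for .at() in a debug build turns the silent write into a std::out_of_range exception at the offending access. QueryCounts below is a hypothetical model, not mantis's actual data layout.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model of per-query sample counters (not mantis's real layout).
// reset() clears all rows, but callers may keep passing a growing query id.
struct QueryCounts {
  std::vector<std::vector<uint64_t>> allQueries;

  void reset() { allQueries.clear(); }

  // Allocate one row of `numSamples` counters for a new query.
  void add_query(uint64_t numSamples) { allQueries.emplace_back(numSamples, 0); }

  // Bounds-checked increment: .at() throws std::out_of_range where
  // operator[] would silently write past the end.
  void bump(uint64_t q, uint64_t sample) { allQueries.at(q).at(sample)++; }
};
```

With a row of 3 counters, bump(0, 3) throws (index equals row length), and after reset() plus one add_query, bump(1, 0) throws (stale query id).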
