TY - GEN
T1 - Fast and accurate genome anchoring using fuzzy hash maps
AU - Healy, John
AU - Chambers, Desmond
PY - 2011
Y1 - 2011
N2 - Although hash-based approaches to sequence alignment and genome assembly are long established, their utility is predicated on the rapid identification of exact k-mers from a hash-map or similar data structure. We describe how a fuzzy hash-map can be applied to quickly and accurately align a prokaryotic genome to the reference genome of a related species. Using this technique, a draft genome of Mycoplasma genitalium, sampled at 1X coverage, was accurately anchored against the genome of Mycoplasma pneumoniae. The fuzzy approach to alignment ordered and orientated more than 65% of the reads from the draft genome in under 10 seconds, with an error rate of <1.5%. Without sacrificing execution speed, fuzzy hash-maps also provide a mechanism for error tolerance and variability in k-mer-centric sequence alignment and assembly applications.
AB - Although hash-based approaches to sequence alignment and genome assembly are long established, their utility is predicated on the rapid identification of exact k-mers from a hash-map or similar data structure. We describe how a fuzzy hash-map can be applied to quickly and accurately align a prokaryotic genome to the reference genome of a related species. Using this technique, a draft genome of Mycoplasma genitalium, sampled at 1X coverage, was accurately anchored against the genome of Mycoplasma pneumoniae. The fuzzy approach to alignment ordered and orientated more than 65% of the reads from the draft genome in under 10 seconds, with an error rate of <1.5%. Without sacrificing execution speed, fuzzy hash-maps also provide a mechanism for error tolerance and variability in k-mer-centric sequence alignment and assembly applications.
UR - http://www.scopus.com/inward/record.url?scp=80052935774&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19914-1_21
DO - 10.1007/978-3-642-19914-1_21
M3 - Conference contribution
AN - SCOPUS:80052935774
SN - 9783642199134
T3 - Advances in Intelligent and Soft Computing
SP - 149
EP - 156
BT - 5th International Conference on Practical Applications of Computational Biology and Bioinformatics (PACBB 2011)
A2 - Rocha, Miguel
A2 - Corchado Rodriguez, Juan
A2 - Fdez-Riverola, Florentino
A2 - Valencia, Alfonso
ER -