Fast and accurate genome anchoring using fuzzy hash maps

John Healy, Desmond Chambers

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

Although hash-based approaches to sequence alignment and genome assembly are long established, their utility is predicated on the rapid identification of exact k-mers from a hash-map or similar data structure. We describe how a fuzzy hash-map can be applied to quickly and accurately align a prokaryotic genome to the reference genome of a related species. Using this technique, a draft genome of Mycoplasma genitalium, sampled at 1X coverage, was accurately anchored against the genome of Mycoplasma pneumoniae. The fuzzy approach to alignment ordered and orientated more than 65% of the reads from the draft genome in under 10 seconds, with an error rate of <1.5%. Without sacrificing execution speed, fuzzy hash-maps also provide a mechanism for error tolerance and variability in k-mer-centric sequence alignment and assembly applications.
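
The abstract does not specify how the fuzzy hash-map is built, so the following is only an illustrative sketch of the general idea of mismatch-tolerant k-mer lookup, not the authors' method. It swaps in a simple pigeonhole scheme: each reference k-mer is indexed under both of its halves, so any query k-mer with at most one mismatch must share at least one half exactly with its target and can then be verified by Hamming distance. The function names (`build_index`, `fuzzy_lookup`) and the one-mismatch limit are assumptions made for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(reference, k):
    """Index every k-mer of the reference under each of its two halves.

    Hypothetical helper: keys are (half_number, half_sequence) and
    values are the start positions of k-mers containing that half.
    """
    half = k // 2
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        kmer = reference[pos:pos + k]
        index[(0, kmer[:half])].append(pos)
        index[(1, kmer[half:])].append(pos)
    return index


def fuzzy_lookup(index, reference, kmer):
    """Return sorted reference positions matching kmer with <= 1 mismatch.

    With one allowed mismatch and two halves, at least one half of the
    query is error-free, so the candidate set cannot miss a true hit.
    """
    k = len(kmer)
    half = k // 2
    candidates = set(index.get((0, kmer[:half]), []))
    candidates |= set(index.get((1, kmer[half:]), []))
    return sorted(
        p for p in candidates
        if hamming(reference[p:p + k], kmer) <= 1
    )


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"          # toy reference, not real genome data
    idx = build_index(ref, 6)
    print(fuzzy_lookup(idx, ref, "ACGAAC"))  # query with one substitution
```

An exact hash-map would reject the query above outright because of its single substitution; tolerating such differences at lookup time is the kind of error tolerance the abstract attributes to the fuzzy approach.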

Original language: English
Title of host publication: 5th International Conference on Practical Applications of Computational Biology and Bioinformatics (PACBB 2011)
Editors: Miguel Rocha, Juan Corchado Rodriguez, Florentino Fdez-Riverola, Alfonso Valencia
Pages: 149-156
Number of pages: 8
Publication status: Published - 2011

Publication series

Name: Advances in Intelligent and Soft Computing
Volume: 93
ISSN (Print): 1867-5662
