ACME: Efficient Parallel Motif Extraction from Very Long Sequences

ACME is an advanced parallel motif extractor. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup in parallel execution. ACME is the only method that: (i) scales to gigabyte-long sequences (e.g., entire human genome); (ii) scales to large alphabets (e.g., English alphabet for Wikipedia); (iii) supports interesting types of motifs (e.g., supermaximal motifs) with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud and supercomputers. Compared to the current state of the art, ACME reduces the extraction time for an exact-length query from almost 4 hours to 7 minutes on a typical workstation; can handle 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.



Relevant Data Sets

Human Genome (2.6GB, 8MB), Protein (32MB) and English text from Wikipedia (1GB)

Relevant Publications

Majed Sahli, Essam Mansour, Panos Kalnis: ACME: Efficient Parallel Motif Extraction from Very Long Sequences. Technical Report (PDF)