kmerDB
Jan 2024
Overview
kmerDB is a database that consolidates genomic and proteomic k-mer sequence information across all species in GenBank and UniProt. It supports rapid species identification, comparative genomic studies, and evolutionary analysis.
Features
- Comprehensive Coverage: Encompasses k-mer data from all species in major sequence databases
- Dual Coverage: Includes both genomic (DNA) and proteomic (amino acid) sequences
- Fast Queries: Optimized data structures enable rapid k-mer lookups
- Species Identification: Enables efficient molecular diagnostics and species authentication
- 100-fold Compression: Novel compression procedures reduce data storage requirements dramatically
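To make the lookup and species-identification features concrete, here is a minimal sketch of a k-mer index in Python. It is an illustration only and not kmerDB's actual implementation or interface: the function names (`kmers`, `build_index`, `identify`), the toy sequences, and the choice of k = 4 are all assumptions made for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(sequences, k):
    """Map each k-mer to the set of species whose sequence contains it."""
    index = defaultdict(set)
    for species, seq in sequences.items():
        for kmer in kmers(seq, k):
            index[kmer].add(species)
    return index


def identify(query, index, k):
    """Count, per species, how many distinct query k-mers occur in it."""
    hits = defaultdict(int)
    for kmer in set(kmers(query, k)):
        # .get avoids creating empty entries for unseen k-mers
        for species in index.get(kmer, ()):
            hits[species] += 1
    return dict(hits)


# Hypothetical toy data; real inputs would be full genomes or proteomes.
toy = {"species_A": "ACGTACGTGA", "species_B": "TTGACCAGTA"}
index = build_index(toy, k=4)
scores = identify("CGTACG", index, k=4)
print(scores)
```

In this toy case all three k-mers of the query occur only in `species_A`, so it is the only species that receives a score. The same pattern works for amino acid sequences, since the code treats sequences as plain strings.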
Technical Implementation
The database was built with novel compression procedures that achieve a 100-fold reduction in data storage while maintaining query performance. This makes it feasible to store and analyze k-mer information from across the entire tree of life.
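The compression procedures themselves are not described on this page. As a general illustration of how k-mer data can be stored compactly, the sketch below packs a DNA k-mer into an integer using 2 bits per base. This is a standard technique that is assumed here for illustration, not the method kmerDB uses, and on its own it gives roughly a 4-fold saving over one-byte-per-base text rather than the 100-fold reduction reported. The names `pack` and `unpack` are hypothetical.

```python
# 2-bit codes for the four DNA bases (an illustrative convention).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value, k):
    """Recover the k-mer of length k from its packed integer."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


packed = pack("GATTACA")
restored = unpack(packed, 7)
print(packed, restored)
```

The length `k` has to be stored or known separately, because leading `A` bases encode as zero bits and would otherwise be lost. Ambiguous bases such as `N` are not handled in this sketch and would raise a `KeyError`.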
Applications
- Species identification and authentication
- Comparative genomics
- Evolutionary studies
- Molecular diagnostics
- Environmental monitoring
- Food authentication
Publications
Mouratidis, I., Baltoumas, F. A., Chantzi, N., et al. (2024). kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Computational and Structural Biotechnology Journal, 23.

Authors
Ioannis Mouratidis
(he/him)
Senior Research Engineer/Scientist Associate
Machine learning and genomics researcher with 35 publications (10 as first or senior author).
Co-founded an AI-driven cancer biomarker startup, authored grants securing $4M+ in competitive funding,
and currently lead a 5-member team focused on developing novel computational methods and testing
the capabilities and safety profiles of biological foundation models.