kmerDB | Ioannis Mouratidis

kmerDB

Overview

kmerDB is a comprehensive database that consolidates the genomic and proteomic k-mers (subsequences of length k) of every species in GenBank and UniProt. The resource supports rapid species identification, comparative genomic studies, and evolutionary analysis.

Features

Comprehensive Coverage: Encompasses k-mer data from all species in major sequence databases
Dual Coverage: Includes both genomic (DNA) and proteomic (amino acid) sequences
Fast Queries: Optimized data structures enable rapid k-mer lookups
Species Identification: Enables efficient molecular diagnostics and species authentication
100-fold Compression: Novel compression procedures reduce storage requirements by two orders of magnitude
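Species identification from k-mers can be illustrated with a toy lookup. The sketch below is an assumption-laden illustration, not kmerDB's query interface: it extracts the distinct k-mers of a query, intersects them with a hypothetical in-memory index mapping each species to its k-mer set, and ranks species by the fraction of query k-mers they contain. The names `kmers`, `identify`, `references`, `species_A`, and `species_B` are hypothetical, and the reference sequences are made up.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify(query: str, index: dict[str, set[str]], k: int) -> list[tuple[str, float]]:
    """Rank species by the fraction of the query's k-mers found in each species' set."""
    q = kmers(query, k)
    if not q:  # query shorter than k: nothing to match
        return []
    scores = [(species, len(q & ks) / len(q)) for species, ks in index.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy reference sequences, not real genomes.
references = {
    "species_A": "ACGTACGTTGCA",
    "species_B": "TTGGCCAATTGG",
}
k = 4
index = {name: kmers(seq, k) for name, seq in references.items()}
print(identify("ACGTTGCA", index, k))  # → [('species_A', 1.0), ('species_B', 0.0)]
```

Because `kmers` only slices strings, the same code applies unchanged to amino acid sequences, mirroring the genomic and proteomic coverage described above. A production index would use compact on-disk structures rather than Python sets.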

Technical Implementation

The database was built with novel compression procedures that achieve a 100-fold reduction in data size while maintaining query performance, making it feasible to store and analyze k-mer information from across the entire tree of life.
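The exact compression procedures are not described on this page. As a hedged illustration of generic building blocks often used for compact k-mer storage, the sketch below packs each DNA k-mer into an integer at 2 bits per base and stores a k-mer set as a presence/absence bitmap with one bit per possible k-mer. This is a common technique chosen for illustration and is not claimed to be kmerDB's method. The base-to-bits mapping and the function names `pack`, `unpack`, `to_bitmap`, and `contains` are assumptions.

```python
# Assumed convention: 2 bits per nucleotide, A=00, C=01, G=10, T=11.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack(kmer: str) -> int:
    """Encode a DNA k-mer as an integer using 2 bits per base."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, k: int) -> str:
    """Decode a 2-bit-packed integer back into a k-mer of length k."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def to_bitmap(kmer_set: set[str], k: int) -> bytearray:
    """Presence/absence bitmap with one bit per possible DNA k-mer (4**k bits)."""
    bitmap = bytearray((4 ** k + 7) // 8)
    for kmer in kmer_set:
        i = pack(kmer)
        bitmap[i >> 3] |= 1 << (i & 7)
    return bitmap


def contains(bitmap: bytearray, kmer: str) -> bool:
    """O(1) membership test; assumes kmer has the same length k used to build the bitmap."""
    i = pack(kmer)
    return bool(bitmap[i >> 3] & (1 << (i & 7)))


bm = to_bitmap({"ACGT", "TTGC"}, k=4)
print(pack("ACGT"), unpack(27, 4))               # → 27 ACGT
print(contains(bm, "ACGT"), contains(bm, "GGGG"))  # → True False
```

A bitmap has a fixed size of 4^k bits regardless of genome size (for k = 12, 16,777,216 bits, or 2 MiB), so it is compact when a species contains a large fraction of all possible k-mers; sparse sets are usually stored more cheaply as sorted packed integers.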

Applications

Species identification and authentication
Comparative genomics
Evolutionary studies
Molecular diagnostics
Environmental monitoring
Food authentication

Publications

Mouratidis, I., Baltoumas, F. A., Chantzi, N., et al. (2024). kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Computational and Structural Biotechnology Journal, 23.

Last updated on Jan 2024


Authors

Ioannis Mouratidis (he/him)

Research Engineer

Research engineer focused on AI safety, alignment, and AIxBio security. I study how capabilities and values emerge during model training and build scalable interventions to align frontier models, drawing on deep expertise in biosecurity and biological foundation models. 38 publications (12 as first or senior author) and 3 patents in large-scale data analysis and ML for biology; co-founded an AI-driven cancer-diagnostics startup and authored grants securing $4M+ in competitive funding.
