kmerDB


Overview

kmerDB is a comprehensive database that consolidates genomic and proteomic k-mer sequence information for every species in GenBank and UniProt. It supports rapid species identification, comparative genomic studies, and evolutionary analysis.

Features

Comprehensive Coverage: Encompasses k-mer data from all species in GenBank and UniProt
Dual Coverage: Includes both genomic (DNA) and proteomic (amino acid) sequences
Fast Queries: Optimized data structures enable rapid k-mer lookups
Species Identification: Enables efficient molecular diagnostics and species authentication
100-fold Compression: Novel compression procedures reduce storage requirements roughly 100-fold
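To make the species-identification idea concrete, here is a minimal sketch of how k-mer lookups can assign a query sequence to a species. It assumes a toy in-memory index built from short reference strings and scores species by how many of the query's k-mers occur in that species alone. The function names (`kmers`, `build_index`, `identify`) and the species labels are hypothetical illustrations, not kmerDB's interface or scoring method.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct length-k substrings of seq (works for DNA or protein)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hypothetical index: map each k-mer to the species whose reference contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for species, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(species)
    return index


def identify(read: str, index: dict[str, set[str]], k: int) -> dict[str, int]:
    """Count, per species, the read's k-mers that occur in that species only."""
    hits: dict[str, int] = defaultdict(int)
    for km in kmers(read, k):
        owners = index.get(km)  # .get() avoids inserting empty entries into the defaultdict
        if owners is not None and len(owners) == 1:
            (species,) = owners
            hits[species] += 1
    return dict(hits)


# Toy usage with made-up sequences (assumption: real references would be whole genomes/proteomes).
refs = {"species_A": "ACGTACGTTG", "species_B": "TTGACCATGA"}
idx = build_index(refs, k=4)
print(identify("GTACGTT", idx, k=4))  # → {'species_A': 4}
```

Because `kmers` treats its input as a plain string, the same sketch applies to amino acid sequences; only the alphabet, and therefore the number of possible k-mers, changes.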

Technical Implementation

The database was built using novel compression procedures that reduce data size about 100-fold while preserving query performance. This makes it practical to store and analyze k-mer information from the entire tree of life.
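The paragraph above does not describe kmerDB's actual compression procedure, so the sketch below should not be read as that method. It illustrates one generic reason fixed-length k-mer sets can be stored compactly: packing each DNA k-mer into an integer at 2 bits per nucleotide and recording presence with a single bit per possible k-mer. The `KmerBitmap` class and `encode` helper are hypothetical. As an example of the arithmetic, a bitmap for all 4^12 DNA 12-mers occupies 4^12 / 8 bytes = 2 MiB, no matter how many 12-mers a genome contains.

```python
VALID = frozenset("ACGT")
NUC = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Pack a DNA k-mer into an integer using 2 bits per nucleotide."""
    code = 0
    for base in kmer:
        code = (code << 2) | NUC[base]
    return code


class KmerBitmap:
    """Hypothetical presence/absence bitmap with one bit per possible DNA k-mer (4**k bits)."""

    def __init__(self, k: int):
        self.k = k
        self.bits = bytearray((4 ** k + 7) // 8)  # round up to whole bytes

    def add_sequence(self, seq: str) -> None:
        """Set the bit for every k-mer in seq, skipping k-mers with N or other ambiguity codes."""
        seq = seq.upper()
        for i in range(len(seq) - self.k + 1):
            kmer = seq[i:i + self.k]
            if set(kmer) <= VALID:
                idx = encode(kmer)
                self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, kmer: str) -> bool:
        """Constant-time membership test: compute the index and check one bit."""
        kmer = kmer.upper()
        if len(kmer) != self.k or not set(kmer) <= VALID:
            return False
        idx = encode(kmer)
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))

    def count_present(self) -> int:
        """Number of distinct k-mers recorded as present."""
        return sum(bin(b).count("1") for b in self.bits)


bm = KmerBitmap(k=3)
bm.add_sequence("ACGTN")
print("ACG" in bm, "AAA" in bm, bm.count_present())  # → True False 2
```

A dense bitmap only pays off when k is small enough for 4^k bits to fit in memory. Larger k, or the 20-letter amino acid alphabet, would call for sparse encodings such as sorted integer arrays, and this sketch does not address that case.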

Applications

Species identification and authentication
Comparative genomics
Evolutionary studies
Molecular diagnostics
Environmental monitoring
Food authentication

Publications

Mouratidis, I., Baltoumas, F. A., Chantzi, N., et al. (2024). kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Computational and Structural Biotechnology Journal, 23.

Last updated on Jan 2024


Authors

Ioannis Mouratidis (he/him)

Senior Research Engineer/Scientist Associate

Machine learning and genomics researcher with 35 publications (10 as first or senior author). Co-founded an AI-driven cancer biomarker startup, authored grants securing $4M+ in competitive funding, and currently leads a 5-member team focused on developing novel computational methods and testing the capabilities and safety profiles of biological foundation models.
