Genomic Data Compression Tool
Jan 2025
Overview
A compression tool, written in C++ with Python bindings, that is optimized for multiple genomic file formats. It produces files 10-20% smaller and compresses 50-70% faster than standard genomic compression tools.
Performance
- 10-20% smaller file sizes compared to standard genomic compression tools
- 50-70% faster compression, enabling real-time analysis
- Multiple format support: Handles various genomic data formats
- Lossless compression: Maintains data integrity for scientific applications
Technical Approach
The tool exploits domain-specific knowledge of how genomic data is structured to improve both compression ratio and speed. The core is implemented in C++ for low-level performance optimization, while Python bindings allow easy integration into bioinformatics pipelines.
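To illustrate what "domain-specific knowledge" can mean in practice, here is a minimal sketch of one well-known technique: packing nucleotides into 2 bits each, which works because a DNA alphabet of A, C, G, T needs far less than the 8 bits a text byte uses. This is not the tool's actual codec. The function names `pack_bases` and `unpack_bases` are hypothetical, and the sketch assumes the input contains only uppercase A/C/G/T (real data also has N, ambiguity codes, and quality scores, which a real tool must handle).

```python
# Hypothetical illustration of 2-bit nucleotide packing.
# Assumption: input contains only uppercase A, C, G, T.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack a sequence into 2 bits per base, prefixed with a 4-byte length."""
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | _CODE[base]
        # Left-align a short final chunk so unpacking reads bases in order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(blob: bytes) -> str:
    """Invert pack_bases, using the stored length to drop padding bases."""
    n = int.from_bytes(blob[:4], "big")
    bases = []
    for byte in blob[4:]:
        for shift in (6, 4, 2, 0):
            bases.append(_BASE[(byte >> shift) & 3])
    return "".join(bases[:n])


if __name__ == "__main__":
    seq = "ACGTTGCA" * 100 + "AC"
    packed = pack_bases(seq)
    # The round trip must be exact for the scheme to count as lossless.
    assert unpack_bases(packed) == seq
    print(len(seq), "bases ->", len(packed), "bytes")
```

The round-trip assertion is the important part: any lossless scheme should reproduce its input byte for byte. A production compressor would typically layer further modeling and entropy coding on top of a representation like this.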
Impact
This compression tool enables:
- Reduced storage costs for large-scale genomic projects
- Faster data transfer and backup operations
- Real-time compression for sequencing pipelines
- More efficient cloud-based genomic analysis
Applications
- Large-scale sequencing projects
- Genomic data archiving
- Cloud-based bioinformatics platforms
- Real-time sequencing data processing

Authors
Ioannis Mouratidis
(he/him)
Senior Research Engineer/Scientist Associate
Machine learning and genomics researcher with 35 publications (10 first or senior author).
Co-founded an AI-driven cancer biomarker startup, authored grants securing $4M+ in competitive funding,
and currently lead a 5-member team focused on developing novel computational methods and testing
the capabilities and safety profiles of biological foundation models.