Projects | Ioannis Mouratidis

Genomic Data Compression Tool

Wed, 01 Jan 2025 00:00:00 +0000

Overview

This compression tool, developed in C++ and Python, is optimized for multiple genomic file formats. It reduces storage requirements and shortens compression time relative to existing solutions, as quantified below.

Performance

10-20% smaller file sizes compared to standard genomic compression tools
50-70% shorter compression times, enabling real-time analysis
Multiple format support: Handles various genomic data formats
Lossless compression: Maintains data integrity for scientific applications

Technical Approach

The tool leverages domain-specific knowledge about genomic data structure to achieve superior compression ratios and speeds. Implementation in C++ provides low-level performance optimization while Python bindings enable easy integration into bioinformatics pipelines.
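As a rough illustration of what "domain-specific knowledge" can mean for genomic data, the sketch below packs a DNA string into 2 bits per base instead of 8. This is a generic textbook technique, not the tool's actual codec; the names `pack` and `unpack` are hypothetical, and the sketch assumes the input contains only A, C, G, and T.

```python
# Illustrative only: 2-bit nucleotide packing, NOT the tool's actual codec.
# Assumes the sequence contains only the four bases A, C, G, T.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; `length` discards the padding bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "GATTACAGG"
packed = pack(seq)
restored = unpack(packed, len(seq))  # lossless round trip
```

Packing alone gives a fourfold reduction over one byte per base. Real genomic formats also carry ambiguous bases, quality scores, and metadata, which this sketch ignores.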

Impact

This compression tool enables:

Reduced storage costs for large-scale genomic projects
Faster data transfer and backup operations
Real-time compression for sequencing pipelines
More efficient cloud-based genomic analysis

Applications

Large-scale sequencing projects
Genomic data archiving
Cloud-based bioinformatics platforms
Real-time sequencing data processing

Zseeker

Wed, 01 Jan 2025 00:00:00 +0000

Overview

Zseeker is an open-source Python tool designed for optimized detection of Z-DNA forming sequences in large genomic datasets. Z-DNA is an alternative left-handed DNA structure implicated in gene regulation, genome instability, and various biological processes.

Features

High Performance: Optimized algorithms enable analysis of entire genomes
Accuracy: Improved detection accuracy compared to previous methods
Scalability: Designed to handle large-scale genomic datasets
Easy to Use: Simple Python interface with clear documentation
Open Source: Freely available for academic and commercial use

Technical Details

Zseeker implements advanced algorithms for identifying sequences capable of forming Z-DNA structures based on sequence composition and thermodynamic properties. The tool is optimized for both speed and accuracy, making it suitable for genome-wide analyses.
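To make the sequence-composition side of this concrete: Z-DNA forms preferentially in tracts where purines (A, G) and pyrimidines (C, T) alternate, such as repeated CG. The sketch below finds such tracts in a sequence. It is a deliberately simplified stand-in, not Zseeker's algorithm or API; it ignores thermodynamic scoring entirely, and the function name `alternating_tracts` and the `min_len` threshold are hypothetical.

```python
# Illustrative only: finds purine/pyrimidine-alternating runs.
# This is NOT Zseeker's scoring method; names and threshold are hypothetical.
PURINES = set("AG")
PYRIMIDINES = set("CT")


def _alternates(a: str, b: str) -> bool:
    """True if one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (
        a in PYRIMIDINES and b in PURINES
    )


def alternating_tracts(seq: str, min_len: int = 8):
    """Return (start, end, tract) for each maximal alternating run
    of at least `min_len` bases; `end` is exclusive."""
    seq = seq.upper()
    tracts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # The run continues only while adjacent bases switch class.
        continues = i < len(seq) and _alternates(seq[i - 1], seq[i])
        if not continues:
            if i - start >= min_len:
                tracts.append((start, i, seq[start:i]))
            start = i
    return tracts


hits = alternating_tracts("AAAACGCGCGCGCGTTTT")
```

Note that the reported tract includes the flanking A and T, because they also satisfy the alternation rule. A production tool would weight dinucleotide steps differently rather than treat all alternations as equal.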

Use Cases

Genome-wide Z-DNA mapping
Regulatory element identification
Genome stability studies
Comparative genomics of non-B DNA structures
Disease-associated variant analysis

Publications

Wang, G., Mouratidis, I., Provatas, K., et al. (2025). ZSeeker: an optimized algorithm for Z-DNA detection in genomic sequences. Briefings in Bioinformatics, 26(3).

kmerDB

Mon, 01 Jan 2024 00:00:00 +0000

Overview

kmerDB is a comprehensive database that consolidates genomic and proteomic k-mer sequence information across all species in GenBank and UniProt. This resource enables rapid species identification, comparative genomic studies, and evolutionary analysis.

Features

Comprehensive Coverage: Encompasses k-mer data from all species in major sequence databases
Dual Coverage: Includes both genomic (DNA) and proteomic (amino acid) sequences
Fast Queries: Optimized data structures enable rapid k-mer lookups
Species Identification: Enables efficient molecular diagnostics and species authentication
100-fold Compression: Novel compression procedures reduce data storage requirements dramatically
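The k-mer lookup and species identification features above can be pictured with a small in-memory sketch: extract every overlapping k-mer from each reference sequence, index k-mers by species, then count how many query k-mers each species shares. This is a toy model under stated assumptions, not kmerDB's schema, storage format, or query interface; the function names and the example species labels are hypothetical.

```python
# Illustrative only: a toy k-mer index, NOT kmerDB's actual data structures.
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict, k: int):
    """Map each k-mer to the set of species whose sequence contains it."""
    index = defaultdict(set)
    for species, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(species)
    return index


def identify(query: str, index, k: int) -> dict:
    """Count, per species, the distinct query k-mers found in that species."""
    hits = defaultdict(int)
    for kmer in set(kmers(query, k)):
        for species in index.get(kmer, ()):
            hits[species] += 1
    return dict(hits)


# Hypothetical toy references; real entries would be whole genomes or proteomes.
references = {"species_a": "ACGTACGTGA", "species_b": "TTTTGGGGCC"}
index = build_index(references, k=4)
matches = identify("GTACGT", index, k=4)
```

The same functions work unchanged on amino acid strings, since they treat sequences as plain text.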

Technical Implementation

The database was built using advanced compression algorithms achieving 100-fold data reduction while maintaining query performance. This enables storage and analysis of k-mer information from the entire tree of life.

Applications

Species identification and authentication
Comparative genomics
Evolutionary studies
Molecular diagnostics
Environmental monitoring
Food authentication

Publications

Mouratidis, I., Baltoumas, F. A., Chantzi, N., et al. (2024). kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Computational and Structural Biotechnology Journal, 23.

Neomer Diagnostics

Sat, 01 Jan 2022 00:00:00 +0000

Overview

I co-founded Neomer Diagnostics in 2022 and served as Chief Technical Officer, working to translate patented nullomer research into a clinical cancer detection platform. The company developed machine learning pipelines for detecting cancer from liquid biopsies.

Role & Achievements

As CTO, I:

Developed an ML pipeline in Bash, Julia, and Python, scheduled with Slurm, for cancer detection from liquid biopsies
Achieved AUCs ranging from 0.89 to 0.94 for lung and ovarian cancers
Established regulatory roadmap for clinical validation and FDA approval
Secured $850K in translational research funding
Led technical team and coordinated with clinical partners

Technology

The platform leveraged sequences absent from the human genome (nullomers) as biomarkers for cancer detection. Machine learning models were trained on cell-free DNA and RNA data from liquid biopsies to distinguish cancer patients from healthy controls.
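The definition of a nullomer can be shown on a toy scale: enumerate every possible DNA string of length k and keep those that never occur in a reference sequence. This is a minimal sketch of the definition only, not the company's pipeline; the function names are hypothetical, and a realistic analysis would use a full genome, longer k, and both strands.

```python
# Illustrative only: brute-force nullomer enumeration on a toy sequence.
# NOT the platform's pipeline; feasible only for small k.
from itertools import product


def observed_kmers(seq: str, k: int) -> set:
    """Collect every length-k substring present in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(seq: str, k: int) -> list:
    """Return all length-k DNA strings absent from `seq`, in lexicographic order."""
    present = observed_kmers(seq, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer not in present]


absent = nullomers("ACGT", k=2)  # 16 possible 2-mers minus the 3 observed
```

Because the number of candidate strings grows as 4^k, practical implementations track observed k-mers compactly rather than enumerating all candidates as strings.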

Clinical Applications

Early cancer detection
Cancer screening in high-risk populations
Monitoring treatment response
Detecting minimal residual disease

Funding & Recognition

Secured $850K in translational research funding
Patent portfolio covering nullomer-based diagnostics
Partnerships with clinical institutions

Period

January 2022 - May 2023