brunobrise/genomx

GenomX: Advanced Offline Genomic Analytics

Overview

GenomX is an offline-first genomic analysis engine that runs entirely on the local machine. It parses, visualizes, and evaluates human DNA data strictly on-premise. Because it depends on no cloud computing APIs and no third-party data services, genomic data never has to leave the machine, which keeps it out of reach of data brokers.

The software ingests everything from standard consumer microarray exports to Whole Genome Sequencing (WGS) datasets, and matches the variants it finds against locally stored biomedical registries in real time.

⚖️ Disclaimer: Not Medical Advice

This software is provided for research, educational, and informational purposes only.

  • Not a Medical Device: GenomX is not a clinical tool or a medical device. It is not intended to diagnose, treat, cure, or prevent any disease or medical condition.
  • No Physician-Patient Relationship: Use of this software does not establish a physician-patient relationship.
  • Consult Professionals: Always seek the advice of a board-certified genetic counselor or a qualified physician regarding any genomic findings. Never disregard professional medical advice or delay in seeking it because of information generated by this software.
  • Accuracy: Genomic science is an evolving field. The interpretations provided by this tool are based on public datasets (ClinVar) and may be incomplete, outdated, or potentially inaccurate.
  • Liability: The authors and contributors assume no liability for any health decisions or actions taken based on the output of this software.

Core Architectural Modules

1. Universal Ingestion Router

GenomX avoids format lock-in by parsing datasets regardless of the laboratory that produced them.

  • Consumer Microarray Matrices (.csv): Parses standard consumer arrays (23andMe, MyHeritage, etc.), drops no-call markers, and normalizes single nucleotide polymorphisms (SNPs) into isolated memory chunks.
  • Advanced WGS Processing (src/vcf_parser.py): A Variant Call Format (VCF) parser for large WGS files. It keeps only records with FILTER=PASS and decodes the probabilistic quality fields into usable values.
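
The two ingestion paths above can be sketched as follows. This is a minimal illustration, not the code in src/vcf_parser.py: the function names, the assumed column order (rsid, chromosome, position, genotype), and the set of no-call encodings are all assumptions. The quality conversion uses the standard Phred relation, where a QUAL of Q corresponds to an error probability of 10^(-Q/10).

```python
import csv
import io

# Assumed no-call encodings; real exports may use others.
NO_CALLS = {"--", "", "NA"}


def read_microarray(handle):
    """Yield (rsid, chrom, pos, genotype), skipping comments, headers and no-calls."""
    rows = csv.reader(line for line in handle if not line.startswith("#"))
    for row in rows:
        if len(row) < 4 or row[0].strip().lower() == "rsid":
            continue  # blank line, short row, or header row
        rsid, chrom, pos, genotype = (field.strip() for field in row[:4])
        if genotype in NO_CALLS:
            continue
        yield rsid, chrom, int(pos), genotype.upper()


def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-qual / 10)


def read_vcf_pass(handle):
    """Yield (chrom, pos, ref, alts, error_prob) for records with FILTER=PASS."""
    for line in handle:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        chrom, pos, _id, ref, alt, qual, flt = fields[:7]
        if flt != "PASS":
            continue
        error_prob = None if qual == "." else phred_to_error_prob(float(qual))
        yield chrom, int(pos), ref, alt.split(","), error_prob
```

Both readers are generators, so a large file is streamed record by record instead of being loaded into memory at once.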

Acquiring Advanced Datasets (WGS): To use the VCF analytical engine at its full capability (covering ~100% of the 3.2 billion human base pairs), users are advised to obtain 30x-coverage Whole Genome Sequencing data. Providers include:

  • Nebula Genomics (Cryptographic privacy, pristine VCF exports).
  • Dante Labs (European high-throughput sequencing hub).
  • Sequencing.com (Developer-centric raw-data extraction).

2. Pathological Sweep Engine (src/global_sweep.py)

Matches variants by physical genomic coordinate rather than by conventional string matching.

  • Correlates the user's locally stored genome against over 8 million registry lines from the NIH ClinVar Database.
  • Runs a multi-stage strand-flip resolution step (normalizing plus-strand and minus-strand reporting) to suppress false-positive disease assertions.
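
A coordinate lookup with strand-flip handling might look like the sketch below. It is an illustration only and does not reproduce src/global_sweep.py: the function names, the dictionary layout keyed by (chromosome, position), and the decision to skip palindromic A/T and C/G sites (whose strand cannot be inferred from the alleles alone) are assumptions. It handles single-nucleotide alleles only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(alleles):
    """Return the complementary-strand alleles for a set of single bases."""
    return {allele.translate(COMPLEMENT) for allele in alleles}


def resolve_strand(genotype, ref, alt):
    """Classify a genotype as 'forward', 'flipped', 'ambiguous', or None (mismatch)."""
    observed = set(genotype)
    expected = {ref, alt}
    if complement(expected) == expected:
        return "ambiguous"  # A/T or C/G site: both strands look identical
    if observed <= expected:
        return "forward"
    if complement(observed) <= expected:
        return "flipped"
    return None


def sweep(genotypes, registry):
    """Return (key, label, strand) for sites where the genotype carries the alt allele.

    genotypes: {(chrom, pos): "AG"}
    registry:  {(chrom, pos): (ref, alt, label)}  # hypothetical layout
    """
    hits = []
    for key, genotype in genotypes.items():
        entry = registry.get(key)
        if entry is None:
            continue
        ref, alt, label = entry
        strand = resolve_strand(genotype, ref, alt)
        if strand in (None, "ambiguous"):
            continue  # do not assert anything we cannot orient
        alleles = genotype if strand == "forward" else genotype.translate(COMPLEMENT)
        if alt in alleles:
            hits.append((key, label, strand))
    return hits
```

Skipping unresolved sites trades some sensitivity for fewer false positives, which matches the stated goal of the strand-flip step.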

3. Polygenic Risk Formulation (src/advanced_prs_engine.py)

Goes beyond monogenic (single-gene) screening to model the cumulative effect of many variants.

  • Executes weighted statistical modeling for complex conditions (e.g., Cardiovascular Disease, Type 2 Diabetes).
  • Tracks paternal and maternal macro-lineage chromosomal markers.
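
A common way to express a weighted polygenic score is as the sum of per-variant effect weights multiplied by the number of effect alleles carried. The sketch below shows that form under stated assumptions: the function names and the rsID-keyed dictionaries are hypothetical, and nothing here is taken from src/advanced_prs_engine.py.

```python
def effect_allele_dosage(genotype, effect_allele):
    """Count copies (0, 1 or 2) of the effect allele in a diploid genotype string."""
    return genotype.count(effect_allele)


def polygenic_score(dosages, weights):
    """Return (score, variants_used) as the weighted sum of effect-allele dosages.

    dosages: {rsid: 0 | 1 | 2}
    weights: {rsid: effect size}  # hypothetical weight table
    Variants missing from the genotype data are skipped rather than imputed.
    """
    score = 0.0
    used = 0
    for rsid, beta in weights.items():
        dosage = dosages.get(rsid)
        if dosage is None:
            continue
        score += beta * dosage
        used += 1
    return score, used
```

Returning the number of variants actually used makes it visible when a score rests on only a fraction of the weight table, as can happen with sparse microarray data.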

4. Air-Gap Initializer (utils/init_offline_databases.py)

Prepares the environment for air-gapped use. It downloads the multi-gigabyte pathobiological registries and global mapping coordinates to the local drive once, after which analysis can run in an isolated, offline environment.
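
Before running offline, it can help to confirm that the expected local files are actually present. The sketch below is one way to do that; the function name and the file names passed to it are hypothetical and are not taken from utils/init_offline_databases.py.

```python
from pathlib import Path


def missing_databases(root, required):
    """Return the names in `required` that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A caller could abort early with a clear message when the returned list is non-empty, instead of failing partway through an analysis.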

Frontend Visualization Interface (index.html)

  • Physical Architectural Mapping: An integrated karyotype ideogram draws the 23 chromosome pairs and places detected variants at their physical coordinates.
  • High-Density Database Engine: Instantaneous local searching across hundreds of complex phenotype markers.
  • EHR PDF Export: Uses native CSS print styles to export standardized reports as PDF.

Usage

  1. Run Analysis & Generate Report: Place your DNA file (CSV or VCF) in the data/ directory, then run:

    python3 utils/generate_local_report.py

    This consolidates all pathological, polygenic, and lineage results into a single local payload that is kept out of version control.

  2. View Dashboard (the open command is for macOS; on other systems, open index.html in a browser):

    open index.html

Security & Privacy

  • Zero Telemetry: No data ever leaves your machine.
  • No Remote Auth: No login required.
  • Local Databases: All clinical references are stored locally.

License

Distributed under the MIT License. See LICENSE for more information.