Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Updated Jan 19, 2023 - Jupyter Notebook
This project aims to build a Retrieval-Augmented Generation (RAG) engine to provide context-aware recommendations based on user queries.
Builds a local Spark standalone cluster on Docker with MinIO integration
A quick look at the Iceberg tables that underpin an Iceberg data lake
A quick look at the Delta tables that underpin Delta Lake
This project implements my master’s thesis on building a scalable, ACID-compliant data lakehouse architecture for IoT and industrial workloads in an AWS-native environment.
A quick look at the Hudi tables that underpin a Hudi data lake
🚀 Automate nightly builds of MinIO Community Edition binaries and Docker images for easy access to the latest releases.