This project, developed by Eduardo Passos, implements a real-time streaming data pipeline using Apache Airflow, RabbitMQ, and MongoDB within a Dockerized environment. It fetches data from an external API, processes it, and stores it for further analysis or real-time applications.
- Data Source
  - Purpose: Fetches data from an external API (https://randomuser.me/api/).
  - Benefits: Provides fresh, real-time data for processing; a minimal fetch sketch follows this list.
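As an illustration of the fetch step (the function name here is hypothetical, not necessarily what this repo uses), a call to the API with `requests` might look like this:

```python
import requests

def fetch_user():
    """Fetch one random user profile from the randomuser.me API."""
    response = requests.get("https://randomuser.me/api/", timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    # The API nests profiles under a "results" list
    return response.json()["results"][0]

if __name__ == "__main__":
    user = fetch_user()
    print(user["name"], user["email"])
```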
- Airflow Scheduler
  - Purpose: Triggers DAGs (Directed Acyclic Graphs) to execute at scheduled intervals.
  - Benefits: Ensures timely, regular data fetching and keeps the data flow consistent; see the DAG sketch after this list.
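A minimal sketch of such a scheduled DAG, assuming Airflow 2.x (the `dag_id`, schedule, and callable are illustrative, not this repo's actual definitions):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_and_publish():
    ...  # placeholder for the fetch/publish logic

# The scheduler triggers one run per minute of this DAG
with DAG(
    dag_id="user_stream",           # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="* * * * *",  # every minute; adjust to taste
    catchup=False,
) as dag:
    PythonOperator(
        task_id="fetch_and_publish",
        python_callable=fetch_and_publish,
    )
```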
- Airflow Worker
  - Purpose: Executes the tasks defined in the DAGs, such as data fetching and processing.
  - Benefits: Enables scalable, parallel processing of data, as sketched below.
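To show where that parallelism comes from: tasks with no mutual dependency can be picked up by separate workers at the same time. A toy fan-out (task names hypothetical) might look like:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def noop():
    pass  # stand-in for real processing

with DAG("parallel_demo", start_date=datetime(2024, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    fetch = PythonOperator(task_id="fetch", python_callable=noop)
    clean = PythonOperator(task_id="clean", python_callable=noop)
    enrich = PythonOperator(task_id="enrich", python_callable=noop)
    # clean and enrich share no dependency, so separate workers
    # can execute them concurrently once fetch finishes
    fetch >> [clean, enrich]
```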
- RabbitMQ
  - Purpose: Acts as a message broker that decouples data processing from data storage.
  - Benefits: Provides a reliable, scalable message queue, ensuring that data is not lost between processing steps; a publishing sketch follows this list.
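A minimal publisher using `pika` could look like the following; the hostname `rabbitmq` and queue name `users` are assumptions to match a typical docker-compose service name, not values confirmed by this repo:

```python
import json
import pika

# Host and queue names are assumptions; match your compose setup
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="users", durable=True)  # survive broker restarts

def publish(record: dict) -> None:
    """Send one record to the queue as JSON."""
    channel.basic_publish(
        exchange="",
        routing_key="users",
        body=json.dumps(record),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )

publish({"email": "jane.doe@example.com"})
connection.close()
```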
- Data Processor
  - Purpose: Consumes messages from RabbitMQ and processes or formats them as needed.
  - Benefits: Offers flexibility in data processing and allows additional processing layers to be added; see the consumer sketch below.
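The consuming side might be sketched like this, again with `pika` and the same assumed host and queue names; acknowledging only after successful handling is what keeps messages from being lost:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="users", durable=True)

def handle(ch, method, properties, body):
    record = json.loads(body)
    # ... format/transform the record here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

channel.basic_qos(prefetch_count=1)  # one unacked message per consumer
channel.basic_consume(queue="users", on_message_callback=handle)
channel.start_consuming()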
- MongoDB
  - Purpose: Stores the processed data.
  - Benefits: Offers a scalable, flexible NoSQL database for storing large volumes of data with varied structures; a storage sketch follows this list.
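Persisting a processed record with `pymongo` is a one-liner once a collection handle exists; the hostname `mongo` and the database/collection names below are assumptions, not confirmed by this repo:

```python
from pymongo import MongoClient

# Hostname and names are assumptions; adjust to your compose setup
client = MongoClient("mongodb://mongo:27017/")
collection = client["pipeline"]["users"]

def store(record: dict) -> None:
    """Persist one record; MongoDB accepts varied structures as-is."""
    collection.insert_one(record)

store({"email": "jane.doe@example.com", "country": "BR"})
```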
- Airflow Webserver
  - Purpose: Provides a user interface for monitoring and managing the Airflow DAGs.
  - Benefits: Simplifies monitoring and day-to-day operation of the data pipeline.
By combining these tools, this real-time streaming data pipeline processes and stores data efficiently while remaining scalable, reliable, and easy to monitor and manage.
