Streaming Data Pipeline

Description

This project serves as an example for teaching the HWE Course at 1904Labs.

A Kafka producer publishes reviews to the Kafka topic reviews.
A Spark Streaming application consumes reviews from that topic. Each review contains a customer_id.
The Spark Streaming application uses this customer_id to join each review with the matching customer record retrieved from HBase.
The Spark Streaming application then writes the enriched record to HDFS.
Hive is used to query the enriched data in HDFS.
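
The enrichment step described above can be sketched in plain Python, without Spark, Kafka, or HBase. This is only an illustration of the join logic, not the project's actual code. It assumes reviews arrive as JSON strings, uses an in-memory dictionary as a stand-in for the HBase table, and drops reviews that have no matching customer. The names CUSTOMERS, enrich_review, and enrich_batch, and every field other than customer_id, are hypothetical.

```python
import json
from typing import Optional

# Hypothetical stand-in for the HBase table, keyed by customer_id.
# In the real pipeline this lookup would be a read against HBase.
CUSTOMERS = {
    "c-001": {"name": "Ada Example", "state": "MO"},
}


def enrich_review(raw_message: str, customers: dict) -> Optional[dict]:
    """Join one review with its customer record using customer_id.

    Assumes the Kafka message value is a JSON object (an assumption;
    the real topic may use another format). Returns None when no
    customer record matches, which mimics an inner join.
    """
    review = json.loads(raw_message)
    customer = customers.get(review.get("customer_id"))
    if customer is None:
        return None
    # The enriched record carries the review fields plus the customer record.
    return {**review, "customer": customer}


def enrich_batch(messages, customers):
    """Enrich a batch of messages, keeping only reviews that matched."""
    enriched = []
    for message in messages:
        record = enrich_review(message, customers)
        if record is not None:
            enriched.append(record)
    return enriched
```

In the real application the enriched records would be written to HDFS rather than returned in a list. Whether unmatched reviews should be dropped or kept with empty customer fields is a design choice this sketch does not settle.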