Second generation of the data warehouse, with improved processing latency and data quality.
Data flow: MySQL -> Kafka -> intermediate Avro files -> Hive ACID files
Caches DML binlog events (insert, update, delete) and writes them out as Avro files every 5 minutes.
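
The caching step can be pictured as a buffer that is flushed once per interval. The following is only a minimal sketch under stated assumptions: the class name BinlogEventBuffer, the event dictionaries, and the sink callback are hypothetical, and the sink stands in for whatever actually serializes one Avro file. A real implementation would likely also flush on a timer rather than only when a new event arrives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

FLUSH_INTERVAL_SECONDS = 5 * 60  # the 5-minute interval from the description


@dataclass
class BinlogEventBuffer:
    """Buffers DML events and hands them to a sink once per interval."""
    sink: Callable[[List[Dict]], None]  # hypothetical: would write one Avro file
    interval: float = FLUSH_INTERVAL_SECONDS
    _events: List[Dict] = field(default_factory=list)
    _window_start: float = 0.0

    def add(self, event: Dict, now: float) -> None:
        # The window starts with the first event after a flush.
        if not self._events:
            self._window_start = now
        self._events.append(event)
        if now - self._window_start >= self.interval:
            self.flush()

    def flush(self) -> None:
        # Hand the whole batch to the sink and start a fresh buffer.
        if self._events:
            self.sink(self._events)
            self._events = []


# Usage: timestamps are passed in explicitly so the sketch is deterministic.
written: List[List[Dict]] = []
buf = BinlogEventBuffer(sink=written.append)
buf.add({"op": "insert", "id": 1}, now=0.0)
buf.add({"op": "update", "id": 1}, now=120.0)
buf.add({"op": "delete", "id": 1}, now=300.0)  # interval elapsed, batch is flushed
```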
Customized Hive streaming-mutation APIs.
Includes:
1. Convert Avro files to ACID ORC files.
2. Adjust and convert values to compatible data types.
3. Implement transaction features.
4. Batch put/get of recordId to/from HBase.
Used by the streaming-mutation program and for data repair.
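
Item 4 above (batch put/get of recordId) can be illustrated without HBase. This is a sketch under assumptions, not the project's API: InMemoryRecordIdStore is a dictionary standing in for the HBase table, the (transaction id, bucket id, row id) tuple is an assumed shape for a record identifier, and lookup_record_ids and its batch_size parameter are hypothetical names. The point is only that keys are grouped so that each round trip carries many of them.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed shape of a record identifier: (transaction id, bucket id, row id).
RecordId = Tuple[int, int, int]


class InMemoryRecordIdStore:
    """Stand-in for the HBase table: maps a primary key to its recordId."""

    def __init__(self) -> None:
        self._rows: Dict[str, RecordId] = {}
        self.round_trips = 0  # counts batch calls, to show the effect of batching

    def put_batch(self, rows: Dict[str, RecordId]) -> None:
        self.round_trips += 1
        self._rows.update(rows)

    def get_batch(self, keys: List[str]) -> Dict[str, Optional[RecordId]]:
        self.round_trips += 1
        return {k: self._rows.get(k) for k in keys}


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_record_ids(store: InMemoryRecordIdStore, keys: List[str],
                      batch_size: int = 2) -> Dict[str, Optional[RecordId]]:
    """Fetch recordIds for many keys using one call per batch."""
    found: Dict[str, Optional[RecordId]] = {}
    for chunk in chunked(keys, batch_size):
        found.update(store.get_batch(chunk))
    return found


store = InMemoryRecordIdStore()
store.put_batch({"pk1": (1, 0, 0), "pk2": (1, 0, 1), "pk3": (1, 1, 0)})
result = lookup_record_ids(store, ["pk1", "pk2", "pk3", "pk4"], batch_size=2)
```

A key with no stored recordId comes back as None, which a caller could treat as "row not yet in the warehouse".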
Converts the intermediate Avro files and loads them into the warehouse.
1. Insert the intermediate Avro files into Hive.
2. Handle duplicated and delayed data.
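
One possible way to handle duplicated and delayed events is sketched below. It is an illustration, not the project's method. It assumes each event carries a primary key ("pk") and a monotonically increasing binlog position ("pos"), and that the position last applied for each key is known. All of these field names and the function dedupe_and_order are hypothetical.

```python
from typing import Dict, List


def dedupe_and_order(events: List[Dict], applied_pos: Dict[str, int]) -> List[Dict]:
    """Drop repeated or already-applied events, then sort by binlog position."""
    seen = set()
    kept = []
    for event in events:
        key = (event["pk"], event["pos"])
        if key in seen:
            continue  # exact duplicate within this batch
        if event["pos"] <= applied_pos.get(event["pk"], -1):
            continue  # delayed event that is older than what was already applied
        seen.add(key)
        kept.append(event)
    # Events may arrive out of order; apply them in binlog order.
    return sorted(kept, key=lambda e: e["pos"])


events = [
    {"pk": "a", "pos": 5, "op": "update"},
    {"pk": "a", "pos": 3, "op": "insert"},
    {"pk": "a", "pos": 5, "op": "update"},  # duplicate delivery
    {"pk": "b", "pos": 2, "op": "insert"},  # already applied earlier
    {"pk": "b", "pos": 7, "op": "delete"},
]
ordered = dedupe_and_order(events, applied_pos={"b": 2})
```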
Maintenance tools.
Includes:
1. Load recordId into HBase.
2. Data deduplication.
3. Hive table compaction.
4. HBase record inspection.
5. Create a Hive transactional table from a MySQL table.
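
As an illustration of tool 5, the sketch below builds a Hive DDL statement from a list of MySQL column definitions. The type mapping, the function names hive_type and hive_acid_ddl, and the default bucket count are assumptions made for the example. The generated clauses (CLUSTERED BY ... INTO n BUCKETS, STORED AS ORC, 'transactional'='true') reflect the general requirements for Hive ACID tables, not necessarily the exact DDL this project emits.

```python
import re
from typing import List, Tuple

# Assumed mapping; anything not listed falls back to STRING.
MYSQL_TO_HIVE = {
    "tinyint": "TINYINT", "smallint": "SMALLINT", "int": "INT", "bigint": "BIGINT",
    "float": "FLOAT", "double": "DOUBLE",
    "varchar": "STRING", "char": "STRING", "text": "STRING",
    "datetime": "TIMESTAMP", "timestamp": "TIMESTAMP", "date": "DATE",
}


def hive_type(mysql_type: str) -> str:
    """Map a MySQL column type such as 'bigint(20)' to a Hive type."""
    t = mysql_type.strip().lower()
    if t.startswith("decimal"):
        return t.upper()  # keep precision and scale, e.g. DECIMAL(10,2)
    match = re.match(r"[a-z]+", t)
    base = match.group(0) if match else ""
    return MYSQL_TO_HIVE.get(base, "STRING")


def hive_acid_ddl(table: str, columns: List[Tuple[str, str]],
                  bucket_column: str, buckets: int = 8) -> str:
    """Build a CREATE TABLE statement for a bucketed, transactional ORC table."""
    cols = ",\n".join(f"  `{name}` {hive_type(typ)}" for name, typ in columns)
    return (
        f"CREATE TABLE `{table}` (\n{cols}\n)\n"
        f"CLUSTERED BY (`{bucket_column}`) INTO {buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


ddl = hive_acid_ddl(
    "orders",
    [("id", "bigint(20)"), ("amount", "decimal(10,2)"), ("created_at", "datetime")],
    bucket_column="id",
    buckets=4,
)
print(ddl)
```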