This repository is a collection of programs implementing big data concepts and algorithms, many of them direct implementations of well-known research papers. It includes the following:
- Classification Based on Associations (CBA)
- A-Close
- Improved Apriori implementation using hashing
- Improved Apriori implementation using partition based approach
- Improved Apriori implementation using transaction reduction
- CHARM
- Dynamic Itemset Counting
- Equivalence Class Lattice Traversal (ECLAT)
- Maximal Frequent Itemset Algorithm (MAFIA)
- Pincer Search
- PySpark programs: a collection of basic programs written with PySpark
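
As an illustration of one of the listed techniques, here is a minimal Python sketch of Apriori with transaction reduction. It is not the code in this repository. It assumes each transaction is a collection of hashable items and that `min_support` is an absolute count, and the function name `apriori_transaction_reduction` is hypothetical. Before each counting pass, any transaction that contains no frequent (k-1)-itemset is dropped, because it cannot contain a frequent k-itemset. Calling it on a list such as `[{"I1", "I2", "I5"}, {"I2", "I4"}, ...]` with `min_support=2` returns a dict mapping each frequent itemset (as a `frozenset`) to its support count.

```python
from itertools import combinations


def apriori_transaction_reduction(transactions, min_support):
    """Hypothetical sketch (not this repository's code): return
    {frozenset(itemset): support_count} for every itemset whose support is
    at least min_support, which is assumed to be an absolute count."""
    db = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Transaction reduction: a transaction containing no frequent
        # (k-1)-itemset cannot contain a frequent k-itemset, so drop it.
        db = [t for t in db if any(f <= t for f in frequent)]

        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                cand = prev[i] | prev[j]
                if len(cand) == k and all(
                    frozenset(sub) in frequent for sub in combinations(cand, k - 1)
                ):
                    candidates.add(cand)

        # Count each candidate over the reduced database.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```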

The `generate_itemsets.py` script can be used to generate custom datasets for these programs.
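
The options and output format of `generate_itemsets.py` are not described here, so the following is only a hypothetical sketch of generating a random transaction dataset, not that script. The names `generate_transactions` and `to_lines`, the `I1`..`In` item labels, and the one-transaction-per-line, space-separated text format are all assumptions. Passing a `seed` makes the dataset reproducible, and the result of `"\n".join(to_lines(...))` could be written to a file for testing the algorithms listed above.

```python
import random


def generate_transactions(num_transactions, num_items, max_items_per_transaction, seed=None):
    """Hypothetical helper: return a list of transactions, each a sorted list of
    distinct item labels "I1".."I<num_items>" with 1..max_items_per_transaction items."""
    if not 1 <= max_items_per_transaction <= num_items:
        raise ValueError("max_items_per_transaction must be between 1 and num_items")
    rng = random.Random(seed)  # seeded generator so datasets are reproducible
    items = [f"I{i}" for i in range(1, num_items + 1)]
    transactions = []
    for _ in range(num_transactions):
        size = rng.randint(1, max_items_per_transaction)  # inclusive bounds
        # sample() picks distinct items, so no transaction repeats an item
        chosen = rng.sample(items, size)
        transactions.append(sorted(chosen, key=lambda s: int(s[1:])))
    return transactions


def to_lines(transactions):
    """Format each transaction as one space-separated line (an assumed format)."""
    return [" ".join(t) for t in transactions]
```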