- Final Project (20%)
- Paper presentation (80%)
Frequent pattern mining is a powerful tool for analyzing the shopping behavior of customers. In this homework, students are asked to mine frequent patterns from the given data.
The goals are:
- Implement the Apriori algorithm
- Evaluate the scalability of the Apriori algorithm and understand the reasons behind the observed behavior
- Improve the performance of the original Apriori algorithm by designing additional mechanisms
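For orientation, the level-wise idea behind Apriori can be sketched as below. This is a minimal illustration, not a reference solution. The function name `apriori`, the toy `baskets` data, and the in-memory list-of-sets input are assumptions made for this example; the format of Data.txt and Music.txt is not described here, so file parsing is left out. `min_support` is taken to be a fraction of the number of transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    min_support is assumed to be a fraction of the transaction count.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One pass over the data: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    frequent = frequent_among(singles)
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = frequent_among(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data, only for trying the function out.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
```

Every level rescans all transactions for every candidate, which is why the running time grows quickly as `min_support` decreases and the number of candidates increases.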
- Group Size: 2-3 people
- Grouping Deadline: May 6th, 2024
- Grouping Form Link
- Due Date: 23:59:59 on June 10th, 2024. Submit the slides and the Apriori algorithm code on Tronclass. Late submissions are not accepted
- Presentation Dates: June 11th to June 14th
- Presentation Groups:
- 13:00-16:00 on June 11th, Group 1 to Group 10 (ES705)
- 09:00-12:00 on June 12th, Group 11 to Group 20 (ES705)
- 16:00-19:00 on June 13th, Group 21 to Group 30 (EB109)
- 09:00-12:00 on June 14th (ES705): alternate session for students who cannot attend their scheduled presentation time due to conflicts
- Presentation Duration: No more than 20 minutes
- Presentation Format:
- Live demo of the program
- Prepared slide presentation covering data preprocessing, algorithm, quantitative and qualitative analysis, data output, and division of project roles and contributions
- (10%) Implement the Apriori algorithm using packages with min_support=0.05 on Data.txt and collect the final frequent itemsets S
- (5%) Implement the Apriori algorithm without packages with min_support=0.05 on Data.txt and collect the final frequent itemsets S
- (5%) Implement the Apriori algorithm without packages with min_support=0.0003, min_support=0.0006, and min_support=0.0009, respectively, on Music.txt and collect the final frequent itemsets S
- (Bonus) Implement the Apriori algorithm in R with min_support=0.05 on Data.txt and collect the final frequent itemsets S
- (Bonus) Design mechanisms to further improve the performance of the original Apriori algorithm, e.g., TID lists, bitmaps, or FP-Growth. Provide the improved program, and describe the implementation details and the differences between your improved algorithm and the original Apriori algorithm as clearly as possible
- (Bonus) Apply the Apriori algorithm to a real database, and analyze meaningful patterns in the shopping behavior of customers
- Plagiarizing code from the internet for the tasks that must be implemented without packages will result in a score of zero for the project
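To illustrate what a TID-list mechanism from the bonus task might look like, here is a minimal sketch under stated assumptions. The function name `frequent_itemsets_tidlist` and the list-of-sets input are hypothetical, and `min_support` is again taken as a fraction of the transaction count. Each itemset keeps the set of transaction IDs that contain it, so the support of a larger itemset comes from a set intersection instead of another scan of the data. For brevity the sketch skips the subset-prune step of the original algorithm; the output is unchanged because every support is computed exactly.

```python
from itertools import combinations


def frequent_itemsets_tidlist(transactions, min_support):
    """Return {frozenset(itemset): support} using a vertical (TID-list) layout."""
    n = len(transactions)

    # Vertical layout: single-item itemset -> IDs of transactions containing it.
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(frozenset([item]), set()).add(tid)

    frequent = {
        s: tids for s, tids in tidlists.items() if len(tids) / n >= min_support
    }
    result = {s: len(tids) / n for s, tids in frequent.items()}

    k = 2
    while frequent:
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(frequent.items(), 2):
            c = a | b
            if len(c) != k or c in next_level:
                continue
            # Transactions containing a | b are exactly those containing both a and b.
            tids = tids_a & tids_b
            if len(tids) / n >= min_support:
                next_level[c] = tids
        frequent = next_level
        result.update({s: len(tids) / n for s, tids in frequent.items()})
        k += 1
    return result
```

Compared with the horizontal scan in the original algorithm, this trades memory (one ID set per frequent itemset) for faster support counting, which is one of the differences the bonus report could quantify.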
- Deadline for Paper Selection: May 6th, 2024
- Students must fill out the form with the paper's title, conference/journal name, and publication date.
- Submission Due Date: 23:59:59 on June 10th, 2024. Submit the presentation slides on Tronclass. Late submissions are not accepted
- Note:
- Selection must be from the provided list of conferences and journals
- Ensure no duplication of paper topics in the form
- The paper must be from the last three years
- The selected paper must be closely related to the design of algorithms in the data science or data mining domain only; papers involving Machine Learning or Deep Learning techniques are not accepted
- Do not copy and paste the paper's content into your slides; prepare your presentation appropriately
Should you have any inquiries or concerns, please contact the teaching assistants:
林書帆 M11217028@yuntech.edu.tw
黃建智 M11217029@yuntech.edu.tw