- [ ] run existing scripts for data cleaning and removing redundancy
- [ ] filter out large tandem repeats
- [ ] cluster them
- [ ] BLAST representatives vs. all tandem repeats BLAST database (x3-10 monomers)
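
A minimal sketch of the size filter and the "x3-10 monomers" step, under stated assumptions. It assumes that "filter out" means dropping repeats whose monomer is longer than some threshold, and that "x3-10" means each monomer is concatenated 3 to 10 times before being added to the BLAST database. The function names (`drop_large`, `expand_monomers`, `to_fasta`), the `max_len` parameter, and the `_xN` naming scheme are hypothetical and not taken from the existing scripts.

```python
from typing import Dict, Iterable, Iterator, Tuple


def drop_large(repeats: Dict[str, str], max_len: int) -> Dict[str, str]:
    """Keep only repeats whose monomer length is at most max_len.

    Assumption: "filter out large tandem repeats" means removing them.
    """
    return {name: seq for name, seq in repeats.items() if len(seq) <= max_len}


def expand_monomers(
    monomers: Dict[str, str],
    copies: Iterable[int] = range(3, 11),
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs with each monomer repeated n times.

    Assumption: the default range(3, 11) covers the "x3-10 monomers" arrays.
    """
    copy_counts = list(copies)  # materialize so it can be reused per monomer
    for name, seq in monomers.items():
        for n in copy_counts:
            yield f"{name}_x{n}", seq * n


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialize (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    # Toy input; real monomers would come from the cleaned, clustered data.
    repeats = {"TR1": "ACGT", "TR2": "A" * 500}
    kept = drop_large(repeats, max_len=100)
    print(to_fasta(expand_monomers(kept)), end="")
```

The resulting FASTA text could then be written to a file and used to build the database that the cluster representatives are searched against.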