My corpus is too big to fit in one large file; my computer runs out of memory when I try. Is it possible to run this code on multiple files, or to run it using an iterator?