Conversation
parser = etree.XMLParser(remove_blank_text=True)
rows = []
for _, row in df.iterrows():
Maybe we could merge this for loop with the next one? Otherwise it seems we are duplicating work we have already done (the lines look the same).
Yeah, it looks messy, but it works, and its resource use is minuscule compared to get_page_counts. Also, to refactor it I would need to look into how this script works in more detail, which would take a lot of time.
Haha, okay, thanks! :)
mandlilaast
left a comment
A few small comments here and there, mainly proposed as suggestions.
Thank you for the code, and if you agree with my comments, please feel free to make the changes :)
mandlilaast
left a comment
Yup, looks good! :)
Green light from me!
Also switched to tqdm and added the ability to concatenate all samples into one CSV.
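For readers who have not seen the change, here is a minimal sketch of what wrapping the per-sample loop in tqdm and concatenating all samples into one CSV could look like. It is not the code from this PR: the function name concat_samples, the sample names, and the doc_id/pages columns are hypothetical, and it uses the standard-library csv module instead of the script's DataFrame handling so that it runs on its own. tqdm is treated as optional and falls back to plain iteration if it is not installed.

```python
import csv
import io

try:
    from tqdm import tqdm  # progress bar, if installed
except ImportError:
    def tqdm(iterable, **kwargs):  # fallback: plain iteration, no progress bar
        return iterable


def concat_samples(samples, out):
    """Write every sample's rows to one CSV, tagging each row with its sample name.

    `samples` maps a (hypothetical) sample name to CSV text with a header row.
    Returns the number of data rows written.
    """
    writer = None
    total = 0
    for name, text in tqdm(samples.items(), desc="samples"):
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            if writer is None:
                # Create the writer lazily so the header reuses the input columns.
                writer = csv.DictWriter(out, fieldnames=["sample", *reader.fieldnames])
                writer.writeheader()
            writer.writerow({"sample": name, **row})
            total += 1
    return total


# Made-up data, for illustration only.
samples = {
    "sample_a": "doc_id,pages\n1,12\n2,7\n",
    "sample_b": "doc_id,pages\n3,4\n",
}
buffer = io.StringIO()
n_rows = concat_samples(samples, buffer)
combined = buffer.getvalue()
```

Adding a sample column keeps each row traceable to its source once everything sits in a single file; the sketch assumes all samples share the same columns.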