Skip to content

fix: bugs in sample pages new script#42

Merged
ninpnin merged 6 commits intodevfrom
sample-pages-fix
Mar 13, 2026
Merged

fix: bugs in sample pages new script#42
ninpnin merged 6 commits intodevfrom
sample-pages-fix

Conversation

@ninpnin
Copy link
Copy Markdown
Contributor

@ninpnin ninpnin commented Feb 27, 2026

Also switched to tqdm and added the ability to concatenate all samples into one CSV.

@ninpnin ninpnin requested a review from mandlilaast February 27, 2026 10:53

parser = etree.XMLParser(remove_blank_text=True)
rows = []
for _, row in df.iterrows():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could merge this and the next for loop together. Otherwise it seems that we are a bit duplicating our already done work (seems to be same lines)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it looks messy but it works and is also miniscule in terms of resource use compared to get_page_counts. Also I would need to look into how this script works in more detail to refactor it, which takes a lot of time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, okay, thanks! :)

Copy link
Copy Markdown
Contributor

@mandlilaast mandlilaast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments here and there, proposed them mainly as suggestions.

But thank you for the code, and if you agree with my questions, please feel free to change :)

@ninpnin ninpnin requested a review from mandlilaast March 11, 2026 14:13
Copy link
Copy Markdown
Contributor

@mandlilaast mandlilaast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looks good! :)
Green light from me!

@ninpnin ninpnin merged commit fbf3737 into dev Mar 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants