The regex is too relaxed and can remove text that should be kept. For example, in P<0.005 ... desired text ... x>= 5, everything from the < to the > is treated as a tag, so the desired text is removed.
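A minimal sketch of the problem. It assumes the current regex behaves like the common <[^>]*> pattern, because the actual pattern isn't shown here. STRICT_TAG_RE is a hypothetical, tighter alternative: it only treats < as the start of a tag when a tag name follows it.

```python
import re

# Assumed current behaviour: anything from "<" to the next ">" counts as a tag.
NAIVE_TAG_RE = re.compile(r"<[^>]*>")

# Hypothetical stricter pattern: "<" must be followed by an optional "/" and a
# letter (the tag name), and the tag body may not contain another "<" or ">".
STRICT_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")

text = "P<0.005 ... desired text ... x>= 5"
print(NAIVE_TAG_RE.sub("", text))   # P= 5
print(STRICT_TAG_RE.sub("", text))  # P<0.005 ... desired text ... x>= 5
print(STRICT_TAG_RE.sub("", "<b>bold</b>"))  # bold
```

The stricter pattern is still a heuristic. Input such as x<y and z>w would still be stripped, which is one argument for using a real HTML parser instead of a regex.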
Spacing is also not preserved well enough. For example, <ul><li>item 1</li><li>item 2</li></ul> is currently cleaned to item 1item 2 instead of item 1 item 2, or to \nitem 1\nitem 2 as bleach produces, which is preferable.
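One way to preserve spacing, sketched with the standard library's html.parser rather than a regex, is to emit a newline wherever a line-breaking tag starts. The class name TextExtractor, the function html_to_text, and the contents of LINE_BREAK_TAGS are illustrative assumptions, not an existing API.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that should start a new line in the plain text.
LINE_BREAK_TAGS = {"br", "div", "li", "p", "tr"}


class TextExtractor(HTMLParser):
    """Collects text content, inserting a newline where a line-break tag starts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lowercased.
        if tag in LINE_BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(repr(html_to_text("<ul><li>item 1</li><li>item 2</li></ul>")))
# '\nitem 1\nitem 2'
```

Because <ul> is left out of LINE_BREAK_TAGS, only each <li> contributes a newline. That reproduces the \nitem 1\nitem 2 output described above.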
We would need to add a flag for removing the content between certain tags, such as <script> and <style>, because bleach only removes the tags themselves and not the content between them. We'll make this list of tags configurable, as bleach does for its tags.
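A minimal sketch of the proposed flag, again built on html.parser. The names strip_tags, strip_content_tags, and DEFAULT_STRIP_CONTENT_TAGS are hypothetical. Text inside any configured tag is dropped, and all other text is kept.

```python
from html.parser import HTMLParser

# Hypothetical default; the list is meant to be configurable by the caller.
DEFAULT_STRIP_CONTENT_TAGS = frozenset({"script", "style"})


class ContentStrippingExtractor(HTMLParser):
    """Keeps text content but drops everything inside the configured tags."""

    def __init__(self, strip_content_tags=DEFAULT_STRIP_CONTENT_TAGS):
        super().__init__(convert_charrefs=True)
        self.strip_content_tags = {t.lower() for t in strip_content_tags}
        self.skip_depth = 0  # > 0 while inside a tag whose content is dropped
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.strip_content_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.strip_content_tags and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_tags(html, strip_content_tags=DEFAULT_STRIP_CONTENT_TAGS):
    parser = ContentStrippingExtractor(strip_content_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html = "<p>Hello</p><script>alert('x')</script><style>p {color: red}</style> world"
print(strip_tags(html))  # Hello world
print(strip_tags("<script>x()</script>ok", strip_content_tags=()))  # x()ok
```

Passing an empty collection falls back to tag-only stripping, which is bleach's current behaviour as described above.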