All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Branding information typo within setup.py
- Spurious command in Makefile recipe
- Added `python_requires` clause to setup.py to prevent installation on unsupported platforms
- Include information in README about the `setuptools` version needed to properly package with `python_requires` information
- Conda packaging support along with information in README about new installation method
- `pyconll` version is now housed in .version file so that this version only needs to be changed in one place before release.
- Use slots on Token and Sentence class for more efficient memory usage with large amounts of objects
- Remove source fields on Token and Sentence. These were not an explicit part of the public API so this is not considered a breaking change.
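The memory effect of slots mentioned above can be illustrated with a minimal sketch. The class names `PlainToken` and `SlottedToken` are hypothetical and are not the library's actual classes; the sketch only assumes standard Python `__slots__` behavior.

```python
class PlainToken:
    """Hypothetical token that stores attributes in a per-instance __dict__."""

    def __init__(self, form, lemma):
        self.form = form
        self.lemma = lemma


class SlottedToken:
    """Hypothetical token that declares its attributes up front via __slots__."""

    __slots__ = ("form", "lemma")

    def __init__(self, form, lemma):
        self.form = form
        self.lemma = lemma


plain = PlainToken("dogs", "dog")
slotted = SlottedToken("dogs", "dog")

# A slotted instance has no per-instance __dict__, which is where the
# memory savings come from when many objects are alive at once.
has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# Assigning an attribute that was not declared in __slots__ is rejected.
try:
    slotted.source = "raw line"
    rejected = False
except AttributeError:
    rejected = True
```

One consequence of this design is that undeclared attributes can no longer be attached to instances, so any field that should exist has to be listed explicitly.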
- Solved `math.inf` issue with python 3.4 where it does not exist
- The example `reannotate_ngrams.py` was out of sync with the function return type
- `find_nonprojective_deps` was added to look for non-projective dependencies within a sentence
- `find_ngrams` in the `util` module did not properly handle case-insensitive matching.
- `conllable` is now properly included in wildcard imports from `pyconll`.
- Issue when loading a CoNLL file over a network if the file contained UTF-8 characters. `requests` by default assumes ASCII encoding on HTTP responses.
- The Token columns deps and feats were not properly sorted by attribute (either numeric index or case invariant lexicographic sort) on serialization
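The two orderings named in the entry above can be sketched as follows. The helpers `sort_feat_names` and `sort_dep_indices` are hypothetical and not part of the library; the sketch also assumes that dependency indices are plain integers stored as strings, which may not cover every case in real data.

```python
def sort_feat_names(names):
    """Order feature names lexicographically, ignoring case (hypothetical helper)."""
    return sorted(names, key=str.lower)


def sort_dep_indices(indices):
    """Order index strings numerically rather than as text (hypothetical helper).

    Assumes every index is an integer written as a string.
    """
    return sorted(indices, key=int)


feat_names = ["Number", "case", "Gender"]
dep_indices = ["10", "2", "4"]

# A plain sorted() would place uppercase names before lowercase ones
# and would order "10" before "2".
sorted_feats = sort_feat_names(feat_names)
sorted_deps = sort_dep_indices(dep_indices)
```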
- Clearer and more concise documentation
- `find_ngrams` now returns the matched tokens as the last element of the yielded tuple.
- Document and paragraph ids on Sentences
- Line numbers on Tokens and Sentences
- Equality comparison on Tokens and Sentences. These types are mutable and implementing equality (with no hash overriding) causes issues for API clients.
- `SentenceTree` module. This functionality was moved to the Sentence class method `to_tree`.
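The concern about equality on mutable types noted above can be shown with a small sketch. The class `Item` is hypothetical; the behavior shown is standard Python 3, where defining `__eq__` without `__hash__` makes instances unhashable.

```python
class Item:
    """Hypothetical mutable type that defines equality but not hashing."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Item) and self.value == other.value


# Python sets __hash__ to None on a class that defines __eq__ alone.
hash_disabled = Item.__hash__ is None

# As a result, instances can no longer be placed in sets or used as dict keys.
try:
    {Item(1)}
    usable_in_set = True
except TypeError:
    usable_in_set = False
```

Client code that previously stored such objects in a set or used them as dictionary keys would start failing, which is one way this kind of change can surprise callers.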
- `to_tree` method on `Sentence` that returns the Tree representing the Sentence dependency structure
- Updates to `requirements.txt` to patch Jinja2 and requests
- Parsing of underscores for the form and lemma fields would automatically default to None, rather than the intended behavior.
- On Windows, CoNLL-U files were loaded with the platform default encoding of Windows-1252, even though CoNLL-U is UTF-8. This is now fixed.
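The general remedy for the Windows encoding issue above is to pass the encoding explicitly rather than rely on the platform default. The following is a minimal sketch under that assumption; `read_conllu_text` is a hypothetical helper, not the library's loading function.

```python
import os
import tempfile


def read_conllu_text(path):
    """Read a file as UTF-8 regardless of the platform default (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Write a small UTF-8 sample to a temporary file for the demonstration.
sample = "1\tcafé\tcafé\n"
fd, sample_path = tempfile.mkstemp(suffix=".conllu")
with os.fdopen(fd, "wb") as handle:
    handle.write(sample.encode("utf-8"))

text = read_conllu_text(sample_path)
os.remove(sample_path)
```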
- Getting Started page on the documentation to make it easier for newcomers
- Versioning on docs page which had not been properly updated
- Some documentation errors
- `requests` version used in `requirements.txt` was insecure and was updated to a newer version
- The `pyconll.tree` module was not properly included before in `setup.py`
- `pylint` to build process
- `Conllable` abstract base class to mark CoNLL serializable components
- Tree data type construction of a sentence
- Linting patches suggested by `pylint`.
- Removed `_end_line_number` from `Sentence` constructor. This is an internal patch, as this parameter was not meant to be used by callers.
- New, improved, and clearer documentation
- Update of `requests` dependency due to security flaw
- Removed test packages from final shipped package.
- There is now a FormatError to help make debugging easier if the internal data of a Token is put into an invalid state. This error will be seen on running `Token#conll`.
- Certain token fields with empty values were not output when calling `Token#conll` and were instead ignored. This situation now causes a FormatError.
- Stricter parsing and validation of general CoNLL guidelines.
- `DEPS` parsing was broken before and assumed that there was less information than is actually possible in the UD format. This means that now `deps` is a tuple with cardinality 4.
- Fixed issue with submodules not being packaged in build
- Ability to easily load CoNLL files from a network path (url)
- Some parsing validation. Previously, errors were not caught up front and could surface unexpectedly later.
- Sentence slicing had an issue before if either the start or end was omitted.
- More documentation and examples.
- Conll is now a `MutableSequence`, so it supports the methods that python defines for that type in addition to those it implements itself.
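What a `MutableSequence` provides for free can be sketched with the standard library alone. `SentenceList` below is a hypothetical stand-in, not the library's `Conll` class: it implements only the five required methods, and `collections.abc.MutableSequence` supplies the rest.

```python
from collections.abc import MutableSequence


class SentenceList(MutableSequence):
    """Hypothetical container implementing only the required abstract methods."""

    def __init__(self):
        self._items = []

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)


sentences = SentenceList()

# append, extend, reverse, index, and membership tests are all mixin
# methods inherited from MutableSequence, not defined above.
sentences.append("first")
sentences.extend(["second", "third"])
sentences.reverse()

contents = list(sentences)
position = sentences.index("second")
contains_first = "first" in sentences
```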
- Some small bug fixes with parsing the token dicts.
- Issues with documentation since docstrings were not in RST. Fixed by using napoleon sphinx extension
- A little more docs
- More README info
- Better examples
- Installation issues again with wheel when using `pip`.
- Installation issues when using `pip`
- More documentation
- Util package for convenient and common logic
- Documentation which can be found here.
- Small documentation changes on methods.
- Everything. This is the first release of this package. The most notable absence is documentation which will be coming in a near-future release.