Releases: NRC-ILT/g2p

v2.3.1

26 Jan 21:53
c2b12e9

🐛 Bug Fixes

  • 3669b0f - remove assertion to allow for digraphs in is_word_character (commit by @roedoejet)
  • e4f0473 - tests: make test_tokenizer parallelizable by pre-empting race condition (commit by @joanise)
  • 8012629 - tests: make test_utils parallelizable and compatible with hatch/pytest (commit by @joanise)

✅ Tests

v2.3.0

27 Nov 23:02

✨ New Features

🐛 Bug Fixes

  • d7c4411 - deps: migrate panphon and editdistance to their ilt forks (PR #465 by @joanise)
  • b53372f - tests: moving import to function means patching from elsewhere now (commit by @roedoejet)
  • 16e64ad - do not let cache hits and misses alter coverage reports (commit by @joanise)
  • 836f8df - remove redundant and incorrect er rule in deu mapping (commit by @joanise)
  • 06ed526 - build: use major.minor only for Heroku to get auto patch bumps (PR #460 by @joanise)
  • 93bb134 - ci: apply several recommended security fixes (PR #461 by @joanise)
  • f8a9bba - fix all mypy --check-untyped-defs errors outside tests/ (commit by @joanise)
  • 9197b1a - fix or ignore all mypy --check-untyped-defs errors in tests/ (commit by @joanise)
  • c75bca6 - first pass fix NFD normalization by Aidan (commit by @joanise)
  • 2aec48e - further fix NFD norm to handle diacritic reordering (commit by @joanise)
  • 6b151ae - ci: stabilize license checking for setuptools (commit by @joanise)
  • 250c8f1 - apply feedback from review (commit by @joanise)
  • 7a985e5 - ci: set permissions on docs.yml to publish to gh-pages (commit by @joanise)
  • 0ad7118 - only update and test schemas with pydantic<2.9 (commit by @joanise)
  • bdf608a - ci: give publish workflow required permissions (commit by @joanise)
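
The two NFD fixes listed above (c75bca6 and 2aec48e) concern Unicode canonical decomposition, which also puts stacked combining marks into a canonical order. The sketch below uses only Python's standard `unicodedata` module to show that behaviour. It is not g2p's code, and the sample strings are invented for illustration.

```python
import unicodedata
from typing import List


def nfd(text: str) -> str:
    """Return the NFD (canonical decomposition) form of text."""
    return unicodedata.normalize("NFD", text)


def codepoints(text: str) -> List[str]:
    """List the code points of text in U+XXXX notation, for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Precomposed "é" (U+00E9) decomposes into "e" + COMBINING ACUTE ACCENT (U+0301).
decomposed = nfd("\u00e9")

# Marks with different combining classes are reordered: U+0323 (dot below,
# class 220) is placed before U+0301 (acute, class 230), whatever order
# they were typed in.
typed = "a\u0301\u0323"
reordered = nfd(typed)

print(codepoints(decomposed))
print(codepoints(typed), "->", codepoints(reordered))
```

Because of this reordering, a rule written against one mark order may fail to match text typed in another order unless both sides are normalized the same way.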

⚡ Performance Improvements

♻️ Refactors

✅ Tests

  • bc8922e - yes, I still run test by calling ./test_neural.py (commit by @joanise)
  • 3edfb52 - a little more test coverage for missing neural dependencies (commit by @joanise)

🔧 Chores

Older changes

  • patches relating to tokenization by @joanise in #434
  • Add ces ipa - update by @kubicra in #432
  • Dev.ej/tokenize in studio by @joanise in #436
  • Rebase the deu branch and add the deu-ipa -> eng-ipa mapping by @joanise in #440
  • Connect ces to arpabet via eng-ipa by @joanise in #439
  • Dev.ej/moh final vowel length by @joanise in #435
  • Refresh our production environment for Heroku by @joanise in #444
  • sal-apa extension to better support Secwepemctsín by @joanise in #442
  • Remove the broken IPA->moh mapping, and have moh->IPA delete straggling loose colons by @joanise in #443

New Contributors

Full Changelog: v2.2.2...v2.3.0

v2.2.2

04 Jun 20:48
a654383

🐛 Bug Fixes

✅ Tests

v2.2.1

03 Apr 18:14
aca5701

✨ New Features

  • 9dbd466 - add Halkomelem APA/Hul’q’umi’num’ mapping (commit by @goodzack)
  • ed987b3 - chain equiv for kwk-umista; handle case better for all kwk mappings (commit by @joanise)
  • 403b0a3 - kwk broad IPA mapping for a more phonemic IPA representation (commit by @joanise)

🐛 Bug Fixes

  • f1dac49 - name ikt "Inuktut, Western" so it sorts next to Inuktitut (PR #410 by @joanise)
  • bc7e599 - app.py assumes a Transducer, just type ignore the mypy error (commit by @joanise)
  • 121cb92 - test: make test_studio more robust by waiting for meaningful effects (commit by @joanise)
  • 65305d4 - kwk digraphs in Umista use U+0315 not U+0313 (commit by @joanise)
  • 6b03886 - kwk-umista-equiv had a typo for y\u0313 -> 'y (commit by @joanise)
  • fea19a1 - regenerate kwk-ipa -> eng-ipa (commit by @joanise)
  • 5a5e570 - adjust kwk mapping in collaboration with Daisy R (commit by @joanise)
  • 8f1abe7 - kwk: regen kwk-ipa -> eng-ipa (commit by @joanise)
  • dac2976 - revert kwk umista->ipa to stricter IPA (commit by @joanise)
  • e06fe72 - call it phonemic IPA, not broad, by AP-PL-EJ consensus (commit by @joanise)
  • aca5701 - hur->hur-apa is from orthog to APA, not vice versa (PR #423 by @joanise)

♻️ Refactors

  • 0be84f5 - change 'hur_orthog' name to 'hur' (commit by @roedoejet)
  • 2b77e4e - test: name the 4 test case values instead of using indices (commit by @joanise)
  • 2ac44b6 - remove explicit indices from kwk when they are monotonic (commit by @joanise)
  • 9d4cdd4 - rename kwk*-con mappings kwk*-equiv (commit by @joanise)
  • a98ef00 - explicitly declare that kwk mappings rely on NFD norm_form (commit by @joanise)
  • eafdde0 - chain kwk-broad-ipa from kwk-ipa to make it DRY (commit by @joanise)

✅ Tests

  • 8b67292 - make sure the SCM_PRETEND_VERSION is up to date before publishing (commit by @joanise)
  • 085639a - refactor test_studio and make it more efficient (PR #420 by @deltork)
  • 2af6a73 - refactor test_studio to use expect.to_have_attribute for out_lang (commit by @joanise)
  • 57a959a - kwk: some test data for Umista text (commit by @joanise)
  • 4d33f9e - update tests for broad and strict kwk IPA (commit by @joanise)

🔧 Chores

v2.2.0

12 Nov 22:15
b3ee783

✨ New Features

  • 1262cbb - add --quiet option to tests/run.py and refactor the runners (commit by @joanise)
  • c419518 - add a lexicon-based tokenizer, esp. for English (commit by @joanise)

🐛 Bug Fixes

  • 419507e - indent only the first line in click indented paragraphs (commit by @joanise)

⚡ Performance Improvements

  • 24a28e0 - prevent quadratic time cost of degenerate inputs for lexicon-based tok (commit by @joanise)

♻️ Refactors

  • cf38989 - tests: quiet and reformat some test suites (commit by @joanise)
  • 5682125 - simplify merge_if_same_label to clearer merge_same_type_tokens (commit by @joanise)
  • d662622 - move merge_non_word_tokens and split_non_word_tokens to utils (commit by @joanise)
  • 163bc39 - import utils as a whole instead of each function (commit by @joanise)
  • c3d73bf - change tokens from a custom dict to a Token class (PR #406 by @joanise)

✅ Tests

  • 2b8a803 - heroku: exercise the real Heroku server command in CI (commit by @joanise)
  • 0b2c83c - better unit testing for mappings.utils (commit by @joanise)

🔧 Chores

  • b2bd476 - migrate the pre-commit config to 4.x style (commit by @joanise)

v2.1.1

17 Sep 12:59
53c78f1

This is primarily a performance improvement patch: it reduces the memory footprint by about 45MB, and shortens the initial load time, by:

  • using a more compact in-memory structure for the English lexicon, and
  • replacing the heavy-weight networkx library by a tiny custom class implementing only the algorithms used.
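
To illustrate the second point, here is a minimal sketch of what "a tiny custom class implementing only the algorithms used" could look like, assuming the only operations needed are adding edges and finding a shortest path. The class name `TinyDiGraph` and its methods are hypothetical and are not taken from g2p's `network_lite`. The node names echo mappings mentioned elsewhere in these notes and are used purely as sample data.

```python
from collections import deque
from typing import Dict, List


class TinyDiGraph:
    """Hypothetical minimal directed graph: edges plus a BFS shortest path."""

    def __init__(self) -> None:
        self._successors: Dict[str, List[str]] = {}

    def add_edge(self, source: str, target: str) -> None:
        """Add a directed edge, creating both nodes if needed."""
        self._successors.setdefault(source, []).append(target)
        self._successors.setdefault(target, [])

    def shortest_path(self, source: str, target: str) -> List[str]:
        """Return a path with the fewest edges; raise ValueError if none exists."""
        if source not in self._successors or target not in self._successors:
            raise ValueError(f"unknown node: {source!r} or {target!r}")
        previous: Dict[str, str] = {}
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                # Walk the predecessor links back to the source.
                path = [node]
                while path[-1] != source:
                    path.append(previous[path[-1]])
                return path[::-1]
            for successor in self._successors[node]:
                if successor not in seen:
                    seen.add(successor)
                    previous[successor] = node
                    queue.append(successor)
        raise ValueError(f"no path from {source!r} to {target!r}")


graph = TinyDiGraph()
graph.add_edge("ces", "ces-ipa")
graph.add_edge("ces-ipa", "eng-ipa")
graph.add_edge("eng-ipa", "eng-arpabet")
print(graph.shortest_path("ces", "eng-arpabet"))
```

A class of this size has no third-party imports, which is one plausible reason such a replacement would shorten load time.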

✨ New Features

🐛 Bug Fixes

  • 8929608 - add [tool.setuptools_scm] in pyproject.toml to please the build system (commit by @joanise)
  • 208a8e0 - deps: pydantic 2.9 changes our schemas, so block it (commit by @joanise)
  • 16668b2 - enable type-checking and fix things (commit by @dhdaines)
  • 6ab8545 - make sure self.rules is always the type we say it is (commit by @dhdaines)
  • 3eee1a6 - seeing match_pattern or intermediate_form is an error (commit by @dhdaines)
  • bbcd1e8 - avoid unnecessarily requiring a schema update (commit by @joanise)

⚡ Performance Improvements

  • e605ae5 - compact lexicon entries to take less RAM (commit by @joanise)
  • 96abff3 - replace networkx by network_lite throughout reduces memory footprint and load time (commit by @joanise)

♻️ Refactors

✅ Tests

  • d03aabb - carefully cover compact lexicon corner cases (commit by @joanise)

🔧 Chores

v2.1.0

23 Aug 13:53

💥 BREAKING CHANGES

  • due to 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines):

    /api/v1 error status code for validation errors is always 422, no longer 400 or 404
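
    For client code that used to check for 400 or 404, a sketch of handling the new status follows. It assumes the response body has FastAPI's default validation-error shape (a JSON object whose "detail" is a list of items with "loc" and "msg"); g2p may customize this, so treat the field names and the sample body as assumptions.

```python
import json
from typing import List


def validation_messages(status_code: int, body: str) -> List[str]:
    """Return readable messages for a 422 response, or [] for any other status."""
    if status_code != 422:
        return []
    detail = json.loads(body).get("detail", [])
    if isinstance(detail, str):
        # Some handlers return a plain string instead of a list of items.
        return [detail]
    return [
        ".".join(str(part) for part in item.get("loc", [])) + ": " + item.get("msg", "")
        for item in detail
    ]


# Invented sample body in FastAPI's default shape (assumption, not g2p output).
sample_body = json.dumps(
    {"detail": [{"loc": ["path", "in_lang"], "msg": "unknown language", "type": "enum"}]}
)
print(validation_messages(422, sample_body))
```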

✨ Major New Features

✨ New Features

  • 36e4dcc - switch to hatch and dynamic versioning (commit by @dhdaines)
  • e0a0219 - build: autogenerate requirements.txt with hatch-pip-compile (commit by @dhdaines)
  • 1fe3385 - add a G2P_LOGLEVEL environment variable (commit by @dhdaines)
  • bd33314 - add redirections for backward compatibility (commit by @dhdaines)
  • 74c5c47 - new API supporting textual alignments (commit by @dhdaines)
  • 7909e6e - Add sal-apa generic mapping for APA-based Salish writing systems (commit by @joanise)
  • 077afc2 - add logic to auto-delete as_is support in g2p 3 (commit by @joanise)
  • d4bffad - g2p convert accepts - for stdin and linux /dev/ pipes (commit by @joanise)
  • f0cf073 - g2p convert now accepts --file option to read a file (commit by @joanise)
  • a938917 - bump the current major.minor version to 2.1 (commit by @joanise)

🐛 Bug Fixes

v2.0.0

19 Mar 21:07

💥 BREAKING CHANGES

  • Mapping configuration files have changed, and the programmatic API has changed.
    Please visit the migration guide for information on how to update 1.x mappings to g2p 2.x and other changes.

  • due to 1d8e4fb - switch to pydantic 2 (commit by @roedoejet):
    Requires Python 3.7 (dropped support for Python 3.6).

✨ New Features

🐛 Bug Fixes

⚡ Performance Improvements

  • a5f51b7 - only create APP when it is really needed (commit by @joanise)
  • 0b8d773 - defer a whole bunch of expensive imports from the CLI (commit by @joanise)
  • 978153b - remove the app from the cli to make the CLI faster (commit by @joanise)

♻️ Refactors

Release v1.1.20230822

22 Aug 18:17

1.1.20230822 (2023-08-22)

Features

  • deps: make dependencies dependent on the Python version (6e68140)
  • clm (Klallam) mapping to g2p (882925a)
  • moh: update moh mappings (14e8bc6)

Bug Fixes

  • bisect_left does not accept key before Python 3.10 (cbb9fb2)
  • updating flask means updating socketio means updating socket.io.js (785f668)
  • deps: make sure engineio and socketio are all compatible (600b2ec)
  • have generate-mapping create files that pass pre-commit hooks (f6494a9)
  • the egg syntax is deprecated, use the at syntax instead (697abcb)
  • deps: lock dnspython to compatible 2.3.0 (e4eaa96)
  • ^ and $ are null-length so require separate sorting for creating fixed-width lookbehind (1ef573b)
  • error with missing apostrophe (8e55e44)
  • mapping: fix bug in haa mapping and add test suite lookbehind construction (a9e5e69)
  • moh: change name of language to Kanien'kéha (e3ab8c3)
  • studio: pin hands on table to 12.4 (b7df593)
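
The first fix in this list (cbb9fb2) refers to the `key` parameter of `bisect.bisect_left`, which was added in Python 3.10. One common workaround on older versions is to bisect over a precomputed list of keys, as sketched below. The function name and sample data are hypothetical, and this is not necessarily how g2p solved it.

```python
import bisect
from typing import List, Sequence, Tuple


def find_first(entries: Sequence[Tuple[str, str]], word: str) -> int:
    """Index of the first entry whose word is >= word; works before Python 3.10."""
    # On Python 3.10+ this could be written as:
    #     bisect.bisect_left(entries, word, key=lambda entry: entry[0])
    keys: List[str] = [entry[0] for entry in entries]  # O(n); cache it if reused
    return bisect.bisect_left(keys, word)


# Invented sample data, sorted by the first element of each pair.
entries = [("apple", "a"), ("banana", "b"), ("cherry", "c")]
print(find_first(entries, "banana"))
```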

Performance Improvements

  • build only in_seq or mappings as needed for alignments (4e6de3b)
  • store lexicon alignments as strings to save memory (6543214)
  • store lexicon k:v entries as joined strings, even less RAM (b984c42)
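
As a rough illustration of the last item, a lexicon can be held as one sorted list of joined "key<TAB>value" strings and searched by bisection, so each entry costs a single string object rather than a key, a value, and a container. This is a sketch under stated assumptions, not g2p's implementation: the separator, function names, and sample entries are all hypothetical, and keys are assumed never to contain the separator or any character that sorts before it.

```python
import bisect
from typing import Dict, List, Optional

SEP = "\t"  # assumed absent from keys; sorts before letters and other printable characters


def build_compact(entries: Dict[str, str]) -> List[str]:
    """Join each key and value into one string and sort the result."""
    return sorted(f"{key}{SEP}{value}" for key, value in entries.items())


def lookup(compact: List[str], key: str) -> Optional[str]:
    """Return the value stored for key, or None if key is absent."""
    prefix = key + SEP
    index = bisect.bisect_left(compact, prefix)
    if index < len(compact) and compact[index].startswith(prefix):
        return compact[index][len(prefix):]
    return None


# Invented sample entries.
compact = build_compact({"hello": "HH AH L OW", "help": "HH EH L P", "he": "HH IY"})
print(lookup(compact, "help"))
```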

Tests

  • add unit test case mimicking #130 to confirm it works on Windows (b413089)
  • exercise the short -h option in unit testing (40db7fc)

Build Systems

  • bump gunicorn to latest version, just published (01234c7)
  • bump Heroku runtime to 3.10.12 as per Heroku warning (7f249d9)
  • force Heroku to bump python to 3.10.11, and docs (a0b9c03)

Continuous Integration

  • only run the full matrix test on release (f02f1ff)
  • reorganize CI test suites (c04c660)
  • run matrix tests on push to main too since that gets deployed (2622913)

Documentation

  • tell the user they need python 3.7 if they try to run studio with older (50852d8)
  • update phoneset (5eb14b1)

Code Refactoring

  • apply dhd feedback to remove dead code and unflatten the alignment (324e1a2)

Release v1.1.20230511

11 May 18:01

1.1.20230511 (2023-05-11)

⚠ BREAKING CHANGES

  • make_g2p(in, out) used to not tokenize, now it does, and its tok_lang argument is deprecated
  • g2p convert now tokenizes by default
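
To make the effect of these two changes concrete, the toy sketch below contrasts converting a whole string with converting only its word tokens. Everything in it is hypothetical: the tokenizer is a simple letters-versus-everything-else split, the "rule" just reverses its input, and none of it reproduces g2p's tokenizer or the `make_g2p` signature.

```python
import re
from typing import Callable, List, Tuple

# Letters form word tokens; every other run of characters is a non-word token.
TOKEN_RE = re.compile(r"(?P<word>[^\W\d_]+)|(?P<other>[\W\d_]+)")


def tokenize(text: str) -> List[Tuple[str, bool]]:
    """Split text into (token, is_word) pairs using a toy rule."""
    return [(m.group(0), m.lastgroup == "word") for m in TOKEN_RE.finditer(text)]


def convert(text: str, rule: Callable[[str], str], tokenize_first: bool = True) -> str:
    """Apply rule to each word token, or to the whole text when not tokenizing."""
    if not tokenize_first:
        return rule(text)
    return "".join(rule(tok) if is_word else tok for tok, is_word in tokenize(text))


def reverse(text: str) -> str:
    """Stand-in for a real mapping: reverse the input."""
    return text[::-1]


print(convert("hi, you!", reverse))                        # word tokens only
print(convert("hi, you!", reverse, tokenize_first=False))  # whole string
```

With tokenization, punctuation and spacing pass through untouched and rules see one word at a time, which is why code relying on the old untokenized behaviour can produce different output after this release.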

Features

  • expose the tokenize option to api/v1/g2p (3f572c4)
  • g2p convert now tokenizes by default (4d67902)
  • make_g2p now tokenizes by default and has new signature (ecfe2ca)

Bug Fixes

  • adjust all calls to make_g2p to its new signature (bea7cec)
  • g2p needs to update both generated .pkl and .json files (2be51f8), closes #237
  • remove --path option to g2p convert, which does not work anyway (f99774f)
  • use the more canonical DeprecationWarning to flag deprecation (e8a8a4d)
  • mappings: output should not be escaped (5bd3250)

Documentation

  • add tokenize arg for api/v1/g2p to swagger.json (d2f226f)

Continuous Integration

  • make test_studio.py fast enough to run on each push (5fa2a01)
  • remove unused coveralls, make our omit compat with coverage 7.x (3f9d2df)

Tests

  • exercise api/v1/g2p with and without tokenize (c64322f)
  • improve coverage of error situations in CLI (0b3f5ee)

Code Refactoring

  • make Tokenizer the base class name, and declare to return types (7c8e8f1)
  • move deprecation and version checking code to their own file (e61daa4)
  • remove dead code in app.py, increase test cov and speed up tests (07e87d6)