Releases: NRC-ILT/g2p
v2.3.1
🐛 Bug Fixes
- 3669b0f - remove assertion to allow for digraphs in is_word_character (commit by @roedoejet)
- e4f0473 - tests: make test_tokenizer parallelizable by pre-empting race condition (commit by @joanise)
- 8012629 - tests: make test_utils parallelizable and compatible with hatch/pytest (commit by @joanise)
v2.3.0
✨ New Features
- 1a99fe1 - yid: initial commit with Yiddish rules (commit by @roedoejet)
- d4d3264 - yid: completed the Yiddish mapping in collaboration with Avery (commit by @joanise)
- 0562200 - yid: generated yid-ipa -> eng-ipa mapping (commit by @joanise)
- 9c711d3 - add support for neural models (commit by @roedoejet)
- 82ce14b - update schema (commit by @roedoejet)
- 98cd5f3 - cli: add neural flag to command line conversion (commit by @roedoejet)
- b8a742b - add neural english mapping (commit by @roedoejet)
🐛 Bug Fixes
- d7c4411 - deps: migrate panphon and editdistance to their ilt forks (PR #465 by @joanise)
- b53372f - tests: moving import to function means patching from elsewhere now (commit by @roedoejet)
- 16e64ad - do not let cache hits and and misses alter coverage reports (commit by @joanise)
- 836f8df - remove redundant and incorrect er rule in deu mapping (commit by @joanise)
- 06ed526 - build: use major.minor only for Heroku to get auto patch bumps (PR #460 by @joanise)
- 93bb134 - ci: apply several recommended security fixes (PR #461 by @joanise)
- f8a9bba - fix all mypy --check-untyped-defs errors outside tests/ (commit by @joanise)
- 9197b1a - fix or ignore all mypy --check-untyped-defs errors in tests/ (commit by @joanise)
- c75bca6 - first pass fix NFD normalization by Aidan (commit by @joanise)
- 2aec48e - further fix NFD norm to handle diacritic reordering (commit by @joanise)
- 6b151ae - ci: stabilize license checking for setuptools (commit by @joanise)
- 250c8f1 - apply feedback from review (commit by @joanise)
- 7a985e5 - ci: set permissions on docs.yml to publish to gh-pages (commit by @joanise)
- 0ad7118 - only update and test schemas with pydantic<2.9 (commit by @joanise)
- bdf608a - ci: give publish workflow required permissions (commit by @joanise)
⚡ Performance Improvements
- ea055b1 - move heavy import into make_g2p body (commit by @roedoejet)
- 1bb0071 - ci: use uv to install Python packages faster (commit by @joanise)
- 3733a47 - ci: cache playwright binaries (commit by @joanise)
♻️ Refactors
- eea75de - round symbolSizes to 2 digits, removing meaningless precision (commit by @joanise)
- 617139f - test: move neural tests to their own suite (commit by @roedoejet)
- 71733f5 - change neural exception (commit by @roedoejet)
- e3f4d1a - move neural deps out of dev (commit by @roedoejet)
- 59cacb2 - change from dp to ilt-deep-phonemizer (commit by @roedoejet)
- 3549213 - tests: set neural tests up similarly to other lang tests (commit by @roedoejet)
- ded17a9 - tests: remove now irrelevant py<3.8 guard in test_studio.py (commit by @joanise)
✅ Tests
- bc8922e - yes, I still run test by calling ./test_neural.py (commit by @joanise)
- 3edfb52 - a little more test coverage for missing neural dependencies (commit by @joanise)
🔧 Chores
- b72c8b8 - bump g2p version in .SETUPTOOLS_SCM_PRETEND_VERSION (commit by @joanise)
- 9045b24 - g2p udpate (commit by @joanise)
- e17b0fd - g2p update to add yid mappings (commit by @joanise)
- e7c9c87 - remove 3.7 support (commit by @roedoejet)
- cd8d879 - bump torch (commit by @roedoejet)
- d1171c2 - bump ilt-deep-phonemizer (commit by @roedoejet)
- d9338f8 - minor improvements from review (commit by @roedoejet)
- 810960c - pre-commit run --all (commit by @joanise)
- ac59f25 - g2p update (commit by @joanise)
Older changes
- patches relating to tokenization by @joanise in #434
- Add ces ipa - update by @kubicra in #432
- Dev.ej/tokenize in studio by @joanise in #436
- Rebase the deu branch and add the deu-ipa -> eng-ipa mapping by @joanise in #440
- Connect ces to arpabet via eng-ipa by @joanise in #439
- Dev.ej/moh final vowel length by @joanise in #435
- Refresh our production environment for Heroku by @joanise in #444
- sal-apa extension to better support Secwepemctsín by @joanise in #442
- Remove the broken IPA->moh mapping, and have moh->IPA delete straggling loose colons by @joanise in #443
Full Changelog: v2.2.2...v2.3.0
v2.2.2
v2.2.1
✨ New Features
- 9dbd466 - add Halkomelem APA/Hul’q’umi’num’ mapping (commit by @goodzack)
- ed987b3 - chain equiv for kwk-umista; handle case better for all kwk mappings (commit by @joanise)
- 403b0a3 - kwk broad IPA mapping for a more phonemic IPA representation (commit by @joanise)
🐛 Bug Fixes
- f1dac49 - name ikt "Inuktut, Western" so it sorts next to Inuktitut (PR #410 by @joanise)
- bc7e599 - app.py assumes a Transducer, just type ignore the mypy error (commit by @joanise)
- 121cb92 - test: make test_studio more robust by waiting for meaningful effects (commit by @joanise)
- 65305d4 - kwk digraphs in Umista use U+0315 not U+0313 (commit by @joanise)
- 6b03886 - kwk-umista-equiv had a typo for y\u0313 -> 'y (commit by @joanise)
- fea19a1 - regenerate kwk-ipa -> eng-ipa (commit by @joanise)
- 5a5e570 - adjust kwk mapping in collaboration with Daisy R (commit by @joanise)
- 8f1abe7 - kwk: regen kwk-ipa -> eng-ipa (commit by @joanise)
- dac2976 - revert kwk umista->ipa to stricter IPA (commit by @joanise)
- e06fe72 - call it phonemic IPA, not broad, by AP-PL-EJ consensus (commit by @joanise)
- aca5701 - hur->hur-apa is from orthog to APA, not vice versa (PR #423 by @joanise)
♻️ Refactors
- 0be84f5 - change 'hur_orthog' name to 'hur' (commit by @roedoejet)
- 2b77e4e - test: name the 4 test case values instead of using indices (commit by @joanise)
- 2ac44b6 - remove explicit indices from kwk when they are monotonic (commit by @joanise)
- 9d4cdd4 - rename kwk*-con mappings kwk*-equiv (commit by @joanise)
- a98ef00 - explicitly declare that kwk mappings rely on NFD norm_form (commit by @joanise)
- eafdde0 - chain kwk-broad-ipa from kwk-ipa to make it DRY (commit by @joanise)
✅ Tests
- 8b67292 - make sure the SCM_PRETEND_VERSION is up to date before publishing (commit by @joanise)
- 085639a - refactor test_studio and make it more efficient (PR #420 by @deltork)
- 2af6a73 - refactor test_studio to use expect.to_have_attribute for out_lang (commit by @joanise)
- 57a959a - kwk: some test data for Umista text (commit by @joanise)
- 4d33f9e - update tests for broad and strict kwk IPA (commit by @joanise)
🔧 Chores
- 5f0adf9 - bump version in .SETUPTOOLS_SCM_PRETEND_VERSION (commit by @joanise)
- 09a0590 - remove obsolete network.pkl file (commit by @joanise)
- 8e38a52 - update pre commit configuration (commit by @joanise)
- 09736d1 - g2p update (commit by @joanise)
- a476bfa - g2p update to roll in latest kwk changes (commit by @joanise)
v2.2.0
✨ New Features
- 1262cbb - add --quiet option to tests/run.py and refactor the runners (commit by @joanise)
- c419518 - add a lexicon-based tokenizer, esp. for English (commit by @joanise)
⚡ Performance Improvements
- 24a28e0 - prevent quadratic time cost of degenerate inputs for lexicon-based tok (commit by @joanise)
♻️ Refactors
- cf38989 - tests: quiet and reformat some test suites (commit by @joanise)
- 5682125 - simplify merge_if_same_label to clearer merge_same_type_tokens (commit by @joanise)
- d662622 - move merge_non_word_tokens and split_non_word_tokens to utils (commit by @joanise)
- 163bc39 - import utils as a whole instead of each function (commit by @joanise)
- c3d73bf - change tokens from a a custom dict to a Token class (PR #406 by @joanise)
✅ Tests
- 2b8a803 - heroku: exercise the real Heroku server command in CI (commit by @joanise)
- 0b2c83c - better unit testing for mappings.utils (commit by @joanise)
v2.1.1
This is primarily a performance improvement patch. It reduces the memory footprint by about 45 MB, and shortens the initial load time, by:
- using a more compact in-memory structure for the English lexicon, and
- replacing the heavy-weight networkx library by a tiny custom class implementing only the algorithms used.
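The second point, a small class that implements only the graph algorithms actually needed, can be sketched roughly as follows. This is an illustrative sketch, not the actual `network_lite` code: it assumes that finding a shortest path through a directed graph of mappings is the main requirement, and the class layout and the node names used in the usage note are assumptions.

```python
from collections import deque
from typing import Dict, List


class DiGraph:
    """Minimal directed graph: nodes, edges, and breadth-first shortest path."""

    def __init__(self) -> None:
        # Adjacency lists keep insertion order, so results are deterministic.
        self._successors: Dict[str, List[str]] = {}

    def add_edge(self, source: str, target: str) -> None:
        self._successors.setdefault(source, [])
        self._successors.setdefault(target, [])
        if target not in self._successors[source]:
            self._successors[source].append(target)

    def shortest_path(self, source: str, target: str) -> List[str]:
        """Return the nodes on a shortest path, or raise ValueError if none."""
        if source not in self._successors or target not in self._successors:
            raise KeyError("unknown node")
        previous = {source: None}  # node -> node it was reached from
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for successor in self._successors[node]:
                if successor not in previous:
                    previous[successor] = node
                    queue.append(successor)
        raise ValueError(f"no path from {source} to {target}")
```

For example, after adding the edges `deu -> deu-ipa`, `deu-ipa -> eng-ipa` and `eng-ipa -> eng-arpabet`, calling `shortest_path("deu", "eng-arpabet")` returns the four nodes in order. Breadth-first search is enough here because the edges are unweighted.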
✨ New Features
- 966a057 - allow panphon 0.21 where possible (commit by @joanise)
- aa9de1c - g2p show-mappings to display language names too (commit by @joanise)
- c70f30f - network_lite with minimal DiGraph class (commit by @joanise)
- 123e27b - add full type signatures to DiGraph (commit by @dhdaines)
- 6eb29ac - revamp schema versioning and update-schema (commit by @joanise)
🐛 Bug Fixes
- 8929608 - add [tool.setuptools_scm] in pyproject.toml to please the build system (commit by @joanise)
- 208a8e0 - deps: pydantic 2.9 changes our schemas, so block it (commit by @joanise)
- 16668b2 - enable type-checking and fix things (commit by @dhdaines)
- 6ab8545 - make sure self.rules is always the type we say it is (commit by @dhdaines)
- 3eee1a6 - seeing match_pattern or intermediate_form is an error (commit by @dhdaines)
- bbcd1e8 - avoid unnecessarily requiring a schema update (commit by @joanise)
⚡ Performance Improvements
- e605ae5 - compact lexicon entries to take less RAM (commit by @joanise)
- 96abff3 - replace networkx by network_lite throughout reduces memory footprint and load time (commit by @joanise)
♻️ Refactors
- d1b3437 - simplify shortest_path code (commit by @dhdaines)
- e2def43 - only declare the SCM pretend version in one place (commit by @joanise)
v2.1.0
💥 BREAKING CHANGES
- due to 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines): /api/v1 error status code for validation errors is always 422, no longer 400 or 404
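A client that has to work against servers from both before and after this change could treat all three codes as a rejected request. The sketch below is only an illustration of that idea: the function name and the three result labels are hypothetical, and which endpoints return which code is not specified in these notes.

```python
# Status codes that may signal a rejected request, across server versions.
# 422 is the code these notes describe for the FastAPI-based /api/v1;
# 400 and 404 are kept for servers that predate that change (an assumption
# about how a cautious client might behave, not documented g2p behaviour).
VALIDATION_ERROR_CODES = frozenset({400, 404, 422})


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse, client-side category."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in VALIDATION_ERROR_CODES:
        return "validation_error"
    return "other_error"
```

Keeping the set in one place means a client that only targets newer servers can shrink it to `{422}` without touching the calling code.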
✨ Major New Features
- 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines)
- 605ccd3 - reimplement Studio app with FastAPI (commit by @dhdaines)
- c214c6f - add /api/v2 to studio but also make it standaloneable (commit by @dhdaines)
✨ New Features
- 36e4dcc - switch to hatch and dynamic versioning (commit by @dhdaines)
- e0a0219 - build: autogenerate requirements.txt with hatch-pip-compile (commit by @dhdaines)
- 1fe3385 - add a G2P_LOGLEVEL environment variable (commit by @dhdaines)
- bd33314 - add redirections for backward compatibility (commit by @dhdaines)
- 74c5c47 - new API supporting textual alignments (commit by @dhdaines)
- 7909e6e - Add sal-apa generic mapping for APA-based Salish writing systems (commit by @joanise)
- 077afc2 - add logic to auto-delete as_is support in g2p 3 (commit by @joanise)
- d4bffad - g2p convert accepts - for stdin and linux /dev/ pipes (commit by @joanise)
- f0cf073 - g2p convert now accepts --file option to read a file (commit by @joanise)
- a938917 - bump the current major.minor version to 2.1 (commit by @joanise)
🐛 Bug Fixes
- 1cc2afe - ci: eventlet 0.36.0 considered harmful (commit by @dhdaines)
- d6004f9 - style: bump black to 24.3.0 to fix black's first CVE (commit by @joanise)
- 05f51f9 - do not try to send whole lexicon over the wire (commit by @dhdaines)
- 49ad2ff - port 5000 is used by MacOS on external interfaces (commit by @dhdaines)
- 629209b - test: use 127.0.0.1 explicitly to avoid ipv6 confusion (commit by @dhdaines)
- d105e5f - allow other mapping arguments, use on-disk alignments (commit by @dhdaines)
- b29b23f - ci: eventlet 0.36.0 considered harmful (commit by @dhdaines)
- baef8fd - ci: remove bogus sleep (commit by @dhdaines)
- 52b3bfd - needed apply-longest-first for atj (since the beginning (commit by @dhdaines)
- d9a07e5 - do not copy the input mapping filename when generating (commit by @dhdaines)
- ea04262 - do not try to generate mappings for empty outputs (commit by @dhdaines)
- f50768e - g2p convert should not add newline when input is a file (commit by @joanise)
- 561817c - deps: specific anti-dependency on broken coloredlogs version (commit by @dhdaines)
- 9f92f65 - deps: use optional dependencies correctly (for docs too) (commit by @dhdaines)
- c8cba5f - test: no longer require flask needlessly for some tests (commit by @dhdaines)
- 1a602ca - build: various build fixes (commit by @dhdaines)
- 9543c96 - deps: old versions of eventlet are also broken (commit by @dhdaines)
- 4e6c3ab - docs: add install link for hatch (commit by @dhdaines)
- 656f07a - ci: ensure version matches schema (commit by @dhdaines)
- 4e23d76 - docs: mention conda (commit by @dhdaines)
- 38d5290 - build: add a hook to make sure we have g2p/_version.py on heroku (commit by @dhdaines)
- 1bba827 - update API for newer FastAPI (commit by @dhdaines)
- 5922f6f - get Studio working with FastAPI (commit by @dhdaines)
- 98a07f1 - restore compatible 404 response and enable api tests (commit by @dhdaines)
- 89bd9b3 - deps: fix deps for api (commit by @dhdaines)
- cfc50c6 - update prod environment and workflow (commit by @dhdaines)
- ebc16ff - now need python 3.8 on windows (commit by @dhdaines)
- 0a7c78b - not sure why we need to disable sendfile (commit by @dhdaines)
- 4bcd948 - remove fastapi-socketio (commit by @dhdaines)
- d5d2086 - make the g2p library tests still run on Python 3.7 (commit by @joanise)
- f55e6bb - ci: make coverage work again (commit by @dhdaines)
- 9bc3855 - test: fix coverage (commit by @dhdaines)
- ff6c92d - more specific dependency to avoid gnashing of teeth (commit by @dhdaines)
- 2d68577 - deps: correct the gunicorn dependency... again (commit by @dhdaines)
- 5e3c0f1 - split /langs and /nodes as they are not the same thing (commit by @dhdaines)
- a88df6a - build: depend on gitlint-core, not gitlint (commit by @joanise)
- f126e1d - studio: studio is same-origin so no CORS, also add debug option (commit by @dhdaines)
- 9f88fbf - studio: make deleting entire input work right (commit by @dhdaines)
- 2a18cdd - ci: enable G2P_STUDIO_DEBUG to satisfy coverage (commit by @dhdaines)
- 30b572a - normalize ó in mohawk (commit by @MENGZHEGENG)
- e6a1280 - app: do not rely on running at the g2p root dir (commit by @joanise)
- 627ca2e - tests: silence the logs in test_api_resources tests (commit by @joanise)
- 54fc772 - deps: pin panphon to 0.19-0.20 as 0.21 breaks many things (commit by @dhdaines)
- 3323eb4 - ci: remove stale job dependency in pythonpublish w...
v2.0.0
💥 BREAKING CHANGES
- Mapping configuration files have changed, and the programmatic API has changed. Please visit the migration guide for information on how to update 1.x mappings to g2p 2.x and other changes.
- due to 1d8e4fb - switch to pydantic 2 (commit by @roedoejet): Requires Python 3.7 (dropped support for Python 3.6).
✨ New Features
- fd33a26 - cli: add update-schema command (commit by @roedoejet)
- f85c4f2 - use json for network as well (commit by @dhdaines)
- b01ec23 - upgrade networkx now that we can (commit by @dhdaines)
- 9fe200d - schema: update schema generation to include dialect spec by default (commit by @roedoejet)
- a04aeff - add case preservation option to mappings (commit by @roedoejet)
- c31c66b - g2p-studio also needs to support preserve_case (commit by @joanise)
- 7447fe6 - make x caron equiv to x dot below in clm (commit by @joanise)
- d4fdc8c - str: accept space+comb-cedilla or space+comb-comma as equiv to cedilla (commit by @joanise)
🐛 Bug Fixes
- 20e3bcb - pkl: remove generated default date (commit by @roedoejet)
- 22644e7 - studio: refactor to 'rules' instead of 'mapping' key (commit by @roedoejet)
- 30dc282 - ci: require 3.8 for windows ci (commit by @roedoejet)
- 1df2dfd - add miscellaneous style fixes and typos (commit by @roedoejet)
- 5ccd595 - update: prevent loading all the mappings multiple times (commit by @roedoejet)
- 45d5ecf - tests: fix studio tests (commit by @roedoejet)
- 16e4869 - restore Python 3.7 compatibility (commit by @joanise)
- 060a8aa - use more generic variable names (commit by @dhdaines)
- ac2d42d - deps: back off networkx dep for python 3.7 (commit by @dhdaines)
- fa27730 - crg: fix various rule feeding and ordering bugs for Michif (commit by @joanise)
- 007aef5 - crg: manually clean up crg-ipa -> eng-ipa (commit by @joanise)
- 0e9271a - test: fix failure in test failure (commit by @dhdaines)
- 15d5b64 - test file could have arbitrary extra fields (commit by @dhdaines)
- 25f4713 - output a compatible config-g2p.yaml though some filenames change (commit by @dhdaines)
- 32fe87c - add config_only option to export_to_dict (commit by @dhdaines)
- b5f9747 - um, yes, model_dump() exists (commit by @dhdaines)
- 2e5e560 - do not exclude defaults, just inappropriate keys for config (commit by @dhdaines)
- 9975100 - add missing double vowel vowels to crg (commit by @dhdaines)
- 83b6c1c - cursèd unicode g strikes again (commit by @dhdaines)
- f766a66 - remove werkzeug lock since it is no longer necessary (commit by @joanise)
- 1c7792f - correct the unit testing output for g2p mapping errors (commit by @joanise)
- 996a060 - remove unused kwargs in transducer call (commit by @roedoejet)
- d1aa6dd - sort rules without explicit indices (commit by @roedoejet)
- d768d74 - detect incompatible case_sensitive+preserve_case instances (commit by @joanise)
- 35868bb - preserve indices through prevent-feeding intermediate form (commit by @joanise)
- 01ff75e - fix coverage issues and grepping for slow imports (commit by @joanise)
- 251739a - deps: lock numpy<2 because 2.0.0 is coming and has breaking changes (commit by @joanise)
- 27d0d2d - rename crj and crl "East Cree, Nor/Southern" so they sort nicely (commit by @joanise)
- 17519d8 - y in oka should go to /j/, palatal glide, not /y/ (commit by @joanise)
- b52a819 - issue a fatal error when reading an empty mapping (commit by @joanise)
- 95bf4be - app: errors in mappings should just trigger console warnings (commit by @joanise)
- 5993242 - str: cedilla is now the default glottal stop character (commit by @joanise)
- d18d17a - publish schemas only for major.minor, ignoring .patch (commit by @joanise)
- f2a7563 - assertEquals is removed from Python 3.12 (commit by @joanise)
- 5592659 - close xlsx workbook after reading (commit by @joanise)
- 7f34057 - loading xlsx workbooks should not fail on empty cells (commit by @joanise)
⚡ Performance Improvements
- a5f51b7 - only create APP when it is really needed (commit by @joanise)
- 0b8d773 - defer a whole bunch of expensive imports from the CLI (commit by @joanise)
- 978153b - remove the app from the cli to make the CLI faster (commit by @joanise)
♻️ Refactors
- eec8e82 - massive refactor to pydantic (commit by @roedoejet)
- 1d8e4fb - switch to pydantic 2 (commit by @roedoejet)
- a753e07 - config: require a 'mappings' key (commit by @roedoejet)
- 006d370 - in_char and out_char to rule_input and rule_output (commit by @roedoejet)
- b448523 - change to config-g2p.yaml (commit by @roedoejet)
- 5a67040 - change langs.pkl to langs.json (commit by @roedoejet)
- 5b259ff - separate data and path for rules, abbreviations, and alignments (commit by @roedoejet)
- ddefe77 - make mapping.rules the only way to get to the rules (commit by @joanise)
- [090145e](090145eff53470e8a23...
Release v1.1.20230822
1.1.20230822 (2023-08-22)
Features
- deps: make dependencies dependant on the Python version (6e68140)
- clm (Klallam) mapping to g2p (882925a)
- moh: update moh mappings (14e8bc6)
Bug Fixes
- bisect_left does not accept key before Python 3.10 (cbb9fb2)
- updating flask means updating socketio means updating socket.io.js (785f668)
- deps: make sure engineio and socketio are all compatible (600b2ec)
- have generate-mapping create files that pass pre-commit hooks (f6494a9)
- the egg syntax is deprecated, use the at syntax instead (697abcb)
- deps: lock dnspython to compatible 2.3.0 (e4eaa96)
- ^ and $ are null-length so require separate sorting for creating fixed-width lookbehind (1ef573b)
- error with missing apostrophe (8e55e44)
- mapping: fix bug in haa mapping and add test suite lookbehind construction (a9e5e69)
- moh: change name of language to Kanien'kéha (e3ab8c3)
- studio: pin hands on table to 12.4 (b7df593)
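The fix about `^` and `$` in the list above relates to a constraint of Python's `re` module: every alternative inside a single lookbehind must match the same number of characters, and anchors match zero. A minimal sketch of one way to satisfy that constraint, by grouping alternatives by width and emitting one lookbehind per group, follows. The function name and the restriction to literal strings plus the two anchors are assumptions made for illustration; this is not the code g2p uses.

```python
import re
from collections import defaultdict

ANCHORS = {"^", "$"}  # zero-width assertions, kept unescaped


def build_lookbehind(alternatives):
    """Build a regex fragment that matches after any of the alternatives.

    Each alternative is either an anchor ("^" or "$") or a literal string.
    Alternatives are grouped by match width so that every lookbehind
    passed to `re` is fixed-width.
    """
    by_width = defaultdict(list)
    for alternative in alternatives:
        if alternative in ANCHORS:
            by_width[0].append(alternative)
        elif alternative:  # skip empty strings
            by_width[len(alternative)].append(re.escape(alternative))
    groups = [
        "(?<=" + "|".join(members) + ")"
        for _, members in sorted(by_width.items())
    ]
    return "(?:" + "|".join(groups) + ")"


# Matches "x" at the start of the string, or after "a", or after "bc".
pattern = re.compile(build_lookbehind(["^", "a", "bc"]) + "x")
```

By contrast, putting all three alternatives into one lookbehind, as in `(?<=^|a|bc)x`, is rejected by `re.compile` because the alternatives have widths 0, 1 and 2.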
Performance Improvements
- build only in_seq or mappings as needed for alignments (4e6de3b)
- store lexicon alignments as strings to save memory (6543214)
- store lexicon k:v entries as joined strings, even less RAM (b984c42)
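The idea behind storing key/value entries as joined strings can be sketched as follows: a sorted list of plain strings needs far fewer Python objects than a dict of tuples or lists, and lookups can still be done by binary search. This is a sketch under stated assumptions, not the g2p implementation: the tab separator, the function names and the sample entries are all hypothetical, and keys are assumed never to contain the separator. It also avoids the `key=` argument of `bisect_left`, which is only available from Python 3.10.

```python
from bisect import bisect_left

SEPARATOR = "\t"  # assumed never to occur inside a key


def build_lexicon(pairs):
    """Join each (key, value) pair into one string and sort the result."""
    return sorted(f"{key}{SEPARATOR}{value}" for key, value in pairs)


def lookup(lexicon, key):
    """Return the value stored for `key`, or None if it is absent."""
    probe = key + SEPARATOR
    # The first entry >= probe is the entry for `key`, if there is one.
    index = bisect_left(lexicon, probe)
    if index < len(lexicon) and lexicon[index].startswith(probe):
        return lexicon[index][len(probe):]
    return None
```

The trade-off is a slightly slower lookup (binary search plus a string slice) in exchange for a more compact in-memory structure.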
Tests
- add unit test case mimicking #130 to confirm it works on Windows (b413089)
- exercise the short -h option in unit testing (40db7fc)
Build Systems
- bump gunicorn to latest version, just published (01234c7)
- bump Heroku runtime to 3.10.12 as per Heroku warning (7f249d9)
- force Heroku to bump python to 3.10.11, and docs (a0b9c03)
Continuous Integration
- only run the full matrix test on release (f02f1ff)
- reorganize CI test suites (c04c660)
- run matrix tests on push to main too since that gets deployed (2622913)
Documentation
- tell the user they need python 3.7 if they try to run studio with older (50852d8)
- update phoneset (5eb14b1)
Code Refactoring
- apply dhd feedback to remove dead code and unflatten the alignment (324e1a2)
Release v1.1.20230511
1.1.20230511 (2023-05-11)
⚠ BREAKING CHANGES
- make_g2p(in, out) used to not tokenize, now it does, and its tok_lang argument is deprecated
- g2p convert now tokenizes by default
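Callers who want to find leftover uses of the deprecated `tok_lang` argument can turn deprecation warnings into errors while testing. The sketch below uses a stand-in function rather than the real `make_g2p`: the stub's name, its `tokenize` keyword, its return value and the warning text are assumptions for illustration only. What it does show is the standard `warnings` mechanism behind a `DeprecationWarning`.

```python
import warnings


def make_g2p_stub(in_lang, out_lang, tok_lang=None, *, tokenize=True):
    """Hypothetical stand-in, NOT the real g2p.make_g2p."""
    if tok_lang is not None:
        warnings.warn(
            "tok_lang is deprecated; tokenization is now on by default",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
    return {"in_lang": in_lang, "out_lang": out_lang, "tokenize": tokenize}


def call_strictly(func, *args, **kwargs):
    """Call `func`, raising if it emits a DeprecationWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)
```

With this in place, `call_strictly(make_g2p_stub, "fra", "eng-arpabet", tok_lang="fra")` raises `DeprecationWarning`, while the same call without `tok_lang` returns normally.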
Features
- expose the tokenize option to api/v1/g2p (3f572c4)
- g2p convert now tokenizes by default (4d67902)
- make_g2p now tokenizes by default and has new signature (ecfe2ca)
Bug Fixes
- adjust all calls to make_g2p to its new signature (bea7cec)
- g2p needs to update both generated .pkl and .json files (2be51f8), closes #237
- remove --path option to g2p convert, which does not work anyway (f99774f)
- use the more canonical DeprecationWarning to flag deprecation (e8a8a4d)
- mappings: output should not be escaped (5bd3250)
Documentation
- add tokenize arg for api/v1/g2p to swagger.json (d2f226f)
Continuous Integration
- make test_studio.py fast enough to run on each push (5fa2a01)
- remove unused coveralls, make our omit compat with coverage 7.x (3f9d2df)
Tests
- execise api/v1/g2p with and without tokenize (c64322f)
- improve coverage of error situations in CLI (0b3f5ee)