From f8f688bbd287135f0a96ab759145f8a128a0b9f2 Mon Sep 17 00:00:00 2001
From: Davy Chen
Date: Mon, 16 Feb 2026 11:38:45 +0800
Subject: [PATCH 1/3] # Add More Information

1. Explain this dictionary
---
 README | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/README b/README
index ea31103..a22d0e8 100644
--- a/README
+++ b/README
@@ -30,3 +30,54 @@ and corrections to us (air+cmudict@cs.cmu.edu) for consideration in a
 subsequent version. All submissions will be reviewed and approved by
 the current maintainer, Alex Rudnicky at Carnegie Mellon.
 
+
+About ARPABET
+-------------
+
+ARPABET (also known as ARPAbet) is a phonetic transcription code
+developed by Advanced Research Projects Agency (ARPA) as a part of
+their Speech Understanding Research project in the 1970s. The name
+ARPABET is directly derived from ARPANET, the precursor to the modern
+Internet, as both were projects funded by ARPA (now DARPA).
+
+The CMU Pronouncing Dictionary uses ARPABET as its phonetic notation
+system because:
+
+1. ARPABET was specifically designed for American English speech
+   recognition and synthesis applications.
+
+2. It uses a simple ASCII-based encoding system that is both
+   machine-readable and human-readable, making it ideal for
+   computational linguistics applications.
+
+3. It provides a standardized way to represent the approximately 39
+   phonemes of American English using 1-2 letter codes, with stress
+   markers (0, 1, 2) for vowels.
+
+4. Its historical connection to speech research makes it well-suited
+   for speech technology applications, which is the primary purpose
+   of this dictionary.
+
+
+Speech Synthesis Applications
+------------------------------
+
+The CMUdict ARPABET encoding is widely used in various speech
+synthesis systems:
+
+Azure Speech Services: Microsoft Azure's English speech synthesis
+services support ARPABET phonetic notation (alongside SAPI phonetics)
+for precise pronunciation control in Speech Synthesis Markup Language
+(SSML). This allows developers to specify exact pronunciations using
+the same phonetic system as CMUdict.
+
+For more information on phonetic sets in Azure Speech Services, see:
+https://docs.azure.cn/en-us/ai-services/speech-service/speech-ssml-phonetic-sets
+
+
+Additional Resources
+--------------------
+
+NPM Package: A JavaScript/Node.js implementation of the CMU
+Pronouncing Dictionary is available:
+https://www.npmjs.com/package/cmu-pronouncing-dictionary

From 9e8d850d47e8c26ed6f3f15683af72c34d0ee175 Mon Sep 17 00:00:00 2001
From: Davy Chen
Date: Mon, 16 Feb 2026 11:45:36 +0800
Subject: [PATCH 2/3] # Add More Information

1. CMU homepage
2. sapi to ipa mapping
---
 README | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/README b/README
index a22d0e8..3dc629b 100644
--- a/README
+++ b/README
@@ -1,12 +1,12 @@
-
-CMUdict
--------
+# CMUdict
 
 CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free
 pronouncing dictionary of English, suitable for uses in speech
 technology and is maintained by the Speech Group in the School of
 Computer Science at Carnegie Mellon University.
 
+**Official Homepage:** http://www.speech.cs.cmu.edu/cgi-bin/cmudict
+
 The Carnegie Mellon Speech Group does not guarantee the accuracy of
 this dictionary, nor its suitability for any specific purpose. In
 fact, we expect a number of errors, omissions and inconsistencies to
@@ -31,8 +31,7 @@ subsequent version. All submissions will be reviewed and approved by
 the current maintainer, Alex Rudnicky at Carnegie Mellon.
 
 
-About ARPABET
--------------
+## About ARPABET
 
 ARPABET (also known as ARPAbet) is a phonetic transcription code
 developed by Advanced Research Projects Agency (ARPA) as a part of
@@ -59,8 +58,7 @@ system because:
    of this dictionary.
 
 
-Speech Synthesis Applications
-------------------------------
+## Speech Synthesis Applications
 
 The CMUdict ARPABET encoding is widely used in various speech
 synthesis systems:
@@ -75,8 +73,61 @@ For more information on phonetic sets in Azure Speech Services, see:
 https://docs.azure.cn/en-us/ai-services/speech-service/speech-ssml-phonetic-sets
 
 
-Additional Resources
---------------------
+## Phoneme Mapping to IPA
+
+### Microsoft SAPI Phoneme to IPA Symbol Mapping (American English)
+
+For applications that need to convert between ARPABET/SAPI phonemes (used in CMUdict)
+and IPA (International Phonetic Alphabet) symbols, here is a mapping example:
+
+```python
+SAPI_TO_IPA = {
+    # Vowels
+    "AA": "ɑ",
+    "AE": "æ",
+    "AH": "ʌ",
+    "AO": "ɔ",
+    "AW": "aʊ",
+    "AY": "aɪ",
+    "EH": "ɛ",
+    "ER": "ɝ",
+    "EY": "eɪ",
+    "IH": "ɪ",
+    "IY": "i",
+    "OW": "oʊ",
+    "OY": "ɔɪ",
+    "UH": "ʊ",
+    "UW": "u",
+    # Consonants
+    "B": "b",
+    "CH": "tʃ",
+    "D": "d",
+    "DH": "ð",
+    "F": "f",
+    "G": "ɡ",
+    "HH": "h",
+    "JH": "dʒ",
+    "K": "k",
+    "L": "l",
+    "M": "m",
+    "N": "n",
+    "NG": "ŋ",
+    "P": "p",
+    "R": "ɹ",
+    "S": "s",
+    "SH": "ʃ",
+    "T": "t",
+    "TH": "θ",
+    "V": "v",
+    "W": "w",
+    "Y": "j",
+    "Z": "z",
+    "ZH": "ʒ",
+}
+```
+
+
+## Additional Resources
 
 NPM Package: A JavaScript/Node.js implementation of the CMU
 Pronouncing Dictionary is available:
 https://www.npmjs.com/package/cmu-pronouncing-dictionary

From cd3eb13bd4e013ba9f170db12a5e7ef792decd23 Mon Sep 17 00:00:00 2001
From: Davy Chen
Date: Mon, 16 Feb 2026 11:47:40 +0800
Subject: [PATCH 3/3] # Rename Readme To MD File

1. As title
---
 README => README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename README => README.md (100%)

diff --git a/README b/README.md
similarity index 100%
rename from README
rename to README.md
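
Usage sketch for the `SAPI_TO_IPA` table added in PATCH 2/3 (not part of the patch series itself). This is a minimal example, assuming pronunciations arrive as space-separated ARPABET symbols with an optional trailing stress digit on vowels, as in CMUdict entries. The helper name `arpabet_to_ipa` is hypothetical, the dictionary below is only an excerpt of the full table, and stress is simply discarded rather than rendered as IPA stress marks.

```python
import re

# Excerpt of the SAPI_TO_IPA table from PATCH 2/3; a real caller would
# use the full table. Only the symbols needed for the example are listed.
SAPI_TO_IPA = {
    "AH": "ʌ",
    "OW": "oʊ",
    "HH": "h",
    "L": "l",
}


def arpabet_to_ipa(pronunciation: str) -> str:
    """Convert a space-separated ARPABET string to an IPA string.

    Hypothetical helper: trailing stress digits (0, 1, 2) are stripped,
    and a KeyError is raised for any symbol missing from the table.
    """
    ipa_symbols = []
    for phone in pronunciation.split():
        base = re.sub(r"[012]$", "", phone)  # drop the stress marker, if any
        ipa_symbols.append(SAPI_TO_IPA[base])
    return "".join(ipa_symbols)


print(arpabet_to_ipa("HH AH0 L OW1"))  # prints: hʌloʊ
```

Raising `KeyError` on unknown symbols is a deliberate choice in this sketch so that gaps in the table surface immediately. A caller that prefers lenient behavior could use `SAPI_TO_IPA.get(base, base)` instead.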