Rather than having one generic element with a type attribute to describe whether it is a respelling, a rhyme, or IPA, it makes more sense to have three explicit elements, one for each. That also means two of them can stay text-only elements, while the third contains an element for each syllable:
one element for stressed syllables
one element for voiced syllables
one element for unstressed, unvoiced syllables
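
A rough sketch of how this layout might look. Every element name here (pronunciation, respelling, rhyme, ipa, stressed, voiced, unstressed) is a placeholder invented for illustration, not a name taken from the actual schema. It also assumes that the respelling is the element that holds the per-syllable children, and that the rhyme and IPA elements are the text-only ones.

```xml
<!-- All element names below are hypothetical placeholders. -->
<pronunciation>
  <!-- Assumption: the respelling carries one child element per syllable. -->
  <respelling>
    <unstressed>buh</unstressed>
    <stressed>nan</stressed>
    <unstressed>uh</unstressed>
    <!-- A voiced syllable would use its own element (e.g. <voiced>)
         in the same position as the children above. -->
  </respelling>
  <!-- Assumption: rhyme and IPA stay text-only. -->
  <rhyme>bandana</rhyme>
  <ipa>bəˈnænə</ipa>
</pronunciation>
```

With explicit elements, a consumer can select the form it wants by element name instead of filtering a generic element on its type attribute, and a schema can allow syllable children in one element while restricting the other two to plain text.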