We receive non-UTF-8 characters in raw data files. R cannot display these characters as readable text, so they must be fixed manually before we contact HIP registrants. For example, the city value "CA\xd1ON CITY" in a file from Colorado contains the byte \xd1 (the Latin-1 encoding of Ñ). To make it human-readable, we replace \xd1 with N to get "CANON CITY". This is done by opening the raw file, making the change, saving the file, and re-running read_hip().
Sometimes it is not obvious what a non-UTF-8 character should be changed to. First names, last names, and street names are particularly variable. To ensure we make the correct change, each escape sequence (a hexadecimal byte value such as \xd1) must be checked and replaced by hand, which is time-consuming.
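Because the same few bytes tend to recur across many records, one way to reduce the manual checking is to list each distinct offending byte once, alongside what it would mean as a Latin-1 character. The sketch below assumes Latin-1 is the source encoding (Windows-1252 is a common alternative), and the function name audit_bytes() is hypothetical, not part of any existing package.

```r
# Hypothetical helper: lists each distinct non-ASCII byte found in strings that
# are not valid UTF-8, with its Latin-1 reading, so each byte is reviewed once.
# Assumes the raw files are Latin-1 encoded.
audit_bytes <- function(x) {
  # Keep only the strings whose bytes do not form valid UTF-8
  bad <- x[!is.na(x) & !validUTF8(x)]
  # Collect every byte of those strings as integers (integer(0) if none)
  codes <- as.integer(unlist(lapply(bad, function(s) as.integer(charToRaw(s)))))
  # Non-ASCII bytes only, each listed once
  codes <- sort(unique(codes[codes > 127L]))
  # Interpret each byte as one Latin-1 character and re-encode it as UTF-8
  latin1 <- vapply(codes,
                   function(b) iconv(rawToChar(as.raw(b)), from = "latin1", to = "UTF-8"),
                   character(1))
  data.frame(byte = sprintf("\\x%02x", codes), latin1 = latin1,
             stringsAsFactors = FALSE)
}

# Usage: returns one row, pairing the byte \xd1 with the Latin-1 character it encodes
audit_bytes(c("CA\xd1ON CITY", "DENVER"))
```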
Create a function that can automatically replace non-UTF-8 characters after reading in raw data, so that:
- Raw data files do not need to be manually edited
- Non-UTF-8 character conversion is automated and fully reproducible
- A full list of non-UTF-8 character replacements can be reviewed
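A minimal sketch of such a function follows. It assumes that the invalid bytes are Latin-1 (pass from = "CP1252" if the files turn out to be Windows-1252) and that read_hip() returns a data frame. The name fix_non_utf8() and its list return value are hypothetical. Only strings that fail validUTF8() are re-encoded, and every change is recorded in a log so the replacements can be reviewed.

```r
# Hypothetical helper: the name, arguments, and return shape are illustrative.
# Assumes bytes that are not valid UTF-8 come from a Latin-1 encoded file.
fix_non_utf8 <- function(df, from = "latin1") {
  # Empty, typed log that collects one row per replaced value
  log <- data.frame(column = character(), row = integer(),
                    original = character(), replacement = character(),
                    stringsAsFactors = FALSE)
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.character(x)) next                 # only text columns can hold bad bytes
    bad <- which(!is.na(x) & !validUTF8(x))    # rows whose bytes are not valid UTF-8
    if (length(bad) == 0) next
    # Reinterpret the raw bytes in the assumed encoding and re-encode as UTF-8
    fixed <- iconv(x[bad], from = from, to = "UTF-8")
    log <- rbind(log, data.frame(column = col, row = bad,
                                 original = x[bad], replacement = fixed,
                                 stringsAsFactors = FALSE))
    x[bad] <- fixed
    df[[col]] <- x
  }
  list(data = df, log = log)
}

# Usage with an in-memory example standing in for read_hip() output
raw <- data.frame(id = 1:2, city = c("CA\xd1ON CITY", "DENVER"),
                  stringsAsFactors = FALSE)
res <- fix_non_utf8(raw)
res$log  # one row: column "city", row 1, original and replacement values
```

Re-encoding yields UTF-8 values such as "CAÑON CITY" rather than the plain-ASCII "CANON CITY" produced by the manual fix. If ASCII is preferred, a transliteration step such as stringi::stri_trans_general(x, "Latin-ASCII") could be applied to the replacements; base R's iconv(x, to = "ASCII//TRANSLIT") also transliterates, but its output varies by platform.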