[Bug] workiq ask -q mangles non-ASCII input on Windows when the OEM code page is not UTF-8 (CJK locales unusable) #149

Description

@yoshihiro-matsumoto

Summary

On Windows hosts whose OEM code page is not UTF-8 (e.g. Japanese = CP932, Simplified Chinese = CP936, Korean = CP949, Traditional Chinese = CP950), every workiq ask -q "<non-ASCII question>" invocation reaches the WorkIQ executable with the question misdecoded. The model then receives garbled text (classic UTF-8-as-CP932 mojibake such as 縲後ユ繧ケ繝域律譛ャ隱槭・ for 「テスト日本語」) and either refuses to answer or hallucinates a "this looks corrupted, please resend" response.

This makes WorkIQ effectively unusable for any prompt containing non-ASCII characters on default Japanese / Chinese / Korean Windows installs.

Environment

  • OS: Windows 11 (Japanese system locale, OEM code page = CP932 / Shift-JIS)
  • @microsoft/workiq: 1.0.0 (clean npm i @microsoft/workiq into an empty folder)
  • PowerShell: 7.6.2
  • chcp reports 932

Reproduction

Minimal repro using only the published npm package — no wrappers, no IDE integrations involved:

mkdir C:\temp\workiq-test; cd C:\temp\workiq-test
npm init -y
npm i @microsoft/workiq
.\node_modules\.bin\workiq.cmd ask -q "次の文字列をそのままエコーバックしてください: 「テスト日本語」"

Expected: the model echoes back 「テスト日本語」.

Actual:

縲後ユ繧ケ繝域律譛ャ隱槭・

(The recurring 縲, 繧 and 繝 characters in the output above are the unambiguous signature of UTF-8 bytes decoded as CP932.)
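The signature can be checked independently of WorkIQ. The following is a minimal sketch, not part of the original report: it encodes the test string as UTF-8 and decodes those bytes with Python's built-in cp932 codec. The function name decode_utf8_as is hypothetical. The final bytes of this string do not form a complete CP932 sequence, so the tail may render differently from one decoder to another. The sketch therefore uses errors="replace" and only the leading characters should be compared with the Actual output above.

```python
def decode_utf8_as(text: str, wrong_codec: str = "cp932") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec.

    This simulates a program that receives UTF-8 bytes but interprets
    them through a legacy code page (assumption: cp932 by default).
    """
    raw = text.encode("utf-8")
    # errors="replace" keeps going when the bytes do not form a valid
    # sequence in the wrong codec instead of raising UnicodeDecodeError.
    return raw.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    garbled = decode_utf8_as("「テスト日本語」")
    # The leading characters should match the start of the Actual output.
    print(garbled)
```

If the leading characters agree with what WorkIQ returns, that supports the UTF-8-read-as-CP932 reading of the bug rather than some other corruption.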

The same garbled output is produced when invoking the bundled binary directly, bypassing the Node shim entirely:

.\node_modules\@microsoft\workiq\bin\win-x64\workiq.exe ask -q "次の文字列をそのままエコーバックしてください: 「テスト日本語」"

縲後ユ繧ケ繝域律譛ャ隱槭・

So the issue is not in the npm shim, not in Node's child_process, and not in any wrapper — it's in workiq.exe itself.
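As a control experiment, the same argument can be passed from the same PowerShell session to a different child process. The sketch below is an illustration rather than part of the original repro; the file name show_argv.py and the helper code_points are hypothetical. It prints the Unicode code points of each argument in plain ASCII, so the console's output encoding cannot affect the result.

```python
import sys


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points (ASCII only)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # Usage (hypothetical file name):
    #   python show_argv.py "「テスト日本語」"
    for index, arg in enumerate(sys.argv[1:], start=1):
        print(f"argv[{index}]: {code_points(arg)}")
```

If the printed code points start with U+300C U+30C6 for the string above, the argument left PowerShell intact, which would be consistent with the fault lying in workiq.exe.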

Root cause

PowerShell 7 passes the argument to CreateProcessW as UTF-16 (the OS-native form). However, the WorkIQ binary appears to re-interpret the incoming argument bytes through a legacy code page (the console code page from GetConsoleCP(), or the system OEM or ANSI code page from GetOEMCP() / GetACP()) rather than reading them as UTF-16 from GetCommandLineW() (or, equivalently, normalizing them to UTF-8 at startup). On a Japanese Windows host all of these are CP932 by default, so CP932 becomes the working text encoding for argv, and UTF-8 byte sequences are decoded as Shift-JIS and mangled.

The same root cause likely applies to stdin reads when stdin is redirected: unless explicitly overridden, .NET's Console.In defaults to Encoding.GetEncoding(GetConsoleCP()), and on Windows the console code page defaults to the OEM code page.

Suggested fix

At process startup, pin the I/O encodings to UTF-8:

Console.InputEncoding  = new UTF8Encoding(false);
Console.OutputEncoding = new UTF8Encoding(false);

For argv specifically, use Environment.GetCommandLineArgs() (which on .NET 5+ reads from GetCommandLineW() and round-trips UTF-16 cleanly) rather than any path that re-decodes argv bytes through the OEM code page.

This is standard practice for locale-agnostic .NET CLI tools and matches what every modern cross-platform CLI (Node, Python 3.7+, Go, Rust) assumes by default.

Impact

  • All non-ASCII prompts via workiq ask -q are broken on CJK Windows installs
  • Silent data corruption when users pass non-ASCII names, file paths, or quoted Teams/Outlook content as part of the prompt
  • User-side workarounds (re-encoding to OEM bytes before invoking) are fragile and locale-specific

Workaround for affected users (today)

None that round-trips full Unicode. Re-encoding the question to the local OEM code page before invocation gets ASCII + the local script through, but loses anything outside that code page (emoji, surrogate-pair kanji like 𠮷, characters from other scripts).
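To see which characters the re-encoding workaround would drop for a given prompt, a small check along these lines may help. It is a sketch using Python's built-in codecs; the function name unencodable_chars is hypothetical, and cp932 is assumed as the local code page (substitute cp936, cp949 or cp950 as appropriate).

```python
def unencodable_chars(text: str, codec: str = "cp932") -> list[str]:
    """Return the characters in text that the given code page cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            # No mapping in the target code page: this character would be
            # lost or replaced by the re-encoding workaround.
            lost.append(ch)
    return lost


if __name__ == "__main__":
    # Print code points rather than the characters themselves so the
    # console encoding cannot interfere with the report.
    for ch in unencodable_chars("テスト日本語 𠮷 😀"):
        print(f"U+{ord(ch):04X}")
```

An empty result means the prompt fits within the code page; anything listed cannot survive the workaround.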
