Open
Conversation
This paves the way for deriving fields in Options structs directly from OsStr, e.g. allowing for non-UTF-8 file paths. Commands and short/long options still need to be valid UTF-8.
This requires Opt::LongWithArg to use Cow (or (Os)String), &OsStr cannot be shortened on Windows. This in turn means the enum cannot be Copy, and should therefore be passed by reference whenever possible.
This does not happen automatically when a later function is generic (e.g. From::from).
Three types of parsing support that have &str counterparts: from_os_str (named and unnamed) and try_from_os_str (named). The fourth type of &str-based parsing (unnamed try_from_str using std::str::FromStr) does not have an &OsStr counterpart.
The same OsStrExt is present on some platforms that are not considered to be in the Unix family. Using more fine-grained cfg-attributes, these platforms can also support --valid-utf8=invalid-utf8. I have verified that it compiles for the x86_64-fortanix-unknown-sgx and wasm32-wasi targets, but not that it runs correctly.
Author
|
I think this PR is mostly finished, with the following caveats:
|
jirutka
added a commit
to jirutka/gumdrop
that referenced
this pull request
Jan 12, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a start towards allowing arguments that aren't UTF-8. At the moment there's just the groundwork to show that argument parsing is also possible based on
&OsStr. It's not finished yet, but I'd like to get some feedback and discuss some design decisions beforehand. The API ofgumdropitself has to change quite a bit, but I think users that just usederive(Options)shouldn't need to notice much of the change.Related to #15.
What becomes
&OsStrorAsRef<OsStr>?The parser accepts now
AsRef<OsStr>instead ofAsRef<str>, but normal string (slices) implement this too, so that isn't too big of a change. All commands, short/long options stay&str, but free arguments and arguments to long options become&OsStrbefore being parsed ingumdrop_derive.The functions
gumdrop::{parse_args, parse_args_default}currently accept&[String]. I think this either can be kept the same but accepting&[S] where S: AsRef<OsStr>or two new functionsgumdrop::{parse_args_os, parse_args_os_default}that accept&[OsString]can be made.In
gumdrop_derivethe parsing options based on&strcan remain, and the arguments can be converted from&OsStrto&strfirst, yielding an error if that's not possible (without this PR, this would yield a panic). For this purpose I made a doc-hidden but public functiongumdrop::to_str, I'm not sure if that's the best approach. In addition to parsing based on&str, parsing options based on&OsStrcan be added (basically the whole point of this PR):parse(from_os_str = "...")forfn(&OsStr) -> Tparse(try_from_os_str = "...")forfn(&OsStr) -> Result<T, E> where E: Displayparse(from_os_str)usesstd::convert::From::fromparse(try_from_str)has a goodOsStrequivalentLimitations
If available, platform-specific
OsStrExtis used to look ahead and parse arguments starting from the beginning. Otherwise, the only API available to get the inner data of theOsStristo_string_lossy, which checks the entire string for valid UTF-8 and allocates aStringif that isn't the case, replacing invalid UTF-8 with the replacement character. This means an option like--valid-utf8=invalid-utf8cannot be parsed intoLongWithArg(&str, &OsStr), because there is no way to shorten the&OsStrtoinvalid-utf8while keeping the invalid UTF-8 characters. However,--valid-utf8 invalid-utf8(with a space instead of equal sign) can be parsed correctly on all platforms, and most platforms (especially in terms of usage) implement a form ofOsStrExt.At the moment I crudely check with
cfg(unix)for unix's version ofOsStrExt, but this should be extended to other platforms that export the sameOsStrExtbut not all other unix stuff.Windows has a different
OsStrExttrait, which cannot turn--valid-utf8=invalid-utf8intoLongWithArg(&str, &OsStr), but can turn it intoLongWithArg(Cow<str>, Cow<OsStr>). This is because Windows uses 16-bit code points internally, which translates to that you can't shorten a&OsStr, but you can create a newOsStringthat contains only part of the original&OsStr.