Preserving original unsplit value for the command fieldtype

Background: I am working on an Elastic Integration to assist with ingesting Dissect data into Elastic via Elastic Agent. (The Elastic ingest pipelines would also allow improved parsing of documents written via rdump's elasticsearch writer.) I've been going over the Dissect output of various functions and figuring out how it could best be mapped to ECS and other fields helpful to analysts. During this process, I've come across a few things that I'd like to bring up for discussion, to determine whether I am missing a flag for a command or whether a feature request is needed.

I'm opening an issue regarding a proposal to add a feature to the command fieldtype: an original property that preserves the exact input string before any splitting or normalization takes place. If there is already something like this and I'm just not seeing it, please let me know.

## Observed Problem

While testing the `runkeys` function, I noticed the command executable and args being split incorrectly due to what I think are missing quotes in the original value. This results in a structure like:

```json
"command": {"executable": "%ProgramFiles%\\Windows", "args": ["Mail\\wab.exe /Upgrade"]}
```
When one would expect:

```json
"command": {"executable": "%ProgramFiles%\\Windows Mail\\wab.exe", "args": ["/Upgrade"]}
```

Looking into it, I see that this is a recognized issue in the docstring of the `command` FieldType:

https://github.com/fox-it/flow.record/blob/a594c193b1050b361b48f76d502385f22bcbbbac/flow/record/fieldtypes/__init__.py#L755-L870
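
For illustration, the split point can be reproduced with `shlex.split` alone, which the fieldtype calls internally. This is a minimal sketch and not the fieldtype itself: it only shows where whitespace splitting cuts an unquoted path, and it does not reproduce how the record ends up with a single joined argument.

```python
import shlex

unquoted = r"%ProgramFiles%\Windows Mail\wab.exe /Upgrade"
quoted = r'"%ProgramFiles%\Windows Mail\wab.exe" /Upgrade'

# posix=False mirrors how the fieldtype calls shlex for Windows paths.
unquoted_tokens = shlex.split(unquoted, posix=False)
quoted_tokens = shlex.split(quoted, posix=False)

# Without quotes, the path is cut at its first space.
print(unquoted_tokens[0])

# With quotes, the path stays in one token. Non-POSIX mode keeps the
# quote characters, so they are stripped here for display.
print(quoted_tokens[0].strip('"'))
```

So the value stored in the registry would need to be quoted for the split to come out as expected, which is not something an analyst can control after the fact.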

## Proposed Solution

I could join the executable string and the args array to approximate the original value; however, I feel there would be benefit in having access to the original, unmodified, unsplit, unstripped value (especially when dealing with forensics). I've read about various whitespace-padding and null-injection techniques used to store payloads in registry values.
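
To make the concern concrete, here is a small sketch of why reassembly is only an approximation. The `rejoin` helper is hypothetical and simply strips, splits, and joins with single spaces, the way a consumer of the split fields might; the padded value is made up for the example.

```python
import shlex


def rejoin(value: str) -> str:
    """Strip, split, and rejoin a command string with single spaces (hypothetical helper)."""
    tokens = shlex.split(value.strip(), posix=False)
    return " ".join(tokens)


clean = r"%ProgramFiles%\Windows Mail\wab.exe /Upgrade"
# Same command with leading, inner, and trailing whitespace padding.
padded = "  " + clean.replace(" /Upgrade", "    /Upgrade") + "   "

# The clean value survives the round trip.
print(rejoin(clean) == clean)
# The padded value collapses to the clean one, so the padding is lost.
print(rejoin(padded) == padded)
```

Any padding in the stored value disappears after the round trip, which is exactly the kind of detail I would want to keep for forensic purposes.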

Would it be reasonable to include something like a `command.raw` or `command.original` field to provide the original string? I would think it would look something like this in the Dissect JSON output:

```json
"command": {"executable": "%ProgramFiles%\\Windows", "args": ["Mail\\wab.exe /Upgrade"], "original": "%ProgramFiles%\\Windows Mail\\wab.exe /Upgrade"}
```

Or would it be better to provide it as a separate field altogether?

```json
"raw_command": "%ProgramFiles%\\Windows Mail\\wab.exe /Upgrade"
```
Having this original value would provide forensic accuracy and also allow me to map it to the ECS [registry.data.strings](https://www.elastic.co/docs/reference/ecs/ecs-registry#field-registry-data-strings) field without worrying about reassembly. Adding this as a new field would also hopefully avoid breaking users' current workflows that rely on `command.executable` and `command.args`.
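
As a rough sketch of the mapping I have in mind: the real integration would do this in an Elastic ingest pipeline, so the Python below is only an illustration. The `to_ecs` helper is hypothetical, and `raw_command` is the proposed field, which does not exist today.

```python
def to_ecs(record: dict) -> dict:
    """Map a Dissect-style record to a partial ECS document (hypothetical helper)."""
    doc: dict = {}
    raw = record.get("raw_command")  # proposed field, not part of current output
    if raw is not None:
        # ECS defines registry.data.strings as an array of strings;
        # a single-string registry value becomes a one-element list.
        doc["registry"] = {"data": {"strings": [raw]}}
    return doc


ecs_doc = to_ecs({"raw_command": "%ProgramFiles%\\Windows Mail\\wab.exe /Upgrade"})
print(ecs_doc)
```

With the original value available, the mapping is a straight copy and needs no knowledge of how the command was split.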

I could try to work out an implementation and submit a PR if this field would be seen as a positive addition. I'm not sure what the ideal implementation would be, though.
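
One possible shape, as a minimal sketch only: `CommandSketch` is a hypothetical stand-in and not flow.record code. It assumes the untouched input is captured before stripping and splitting, and that `original` is a new attribute alongside the existing ones.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandSketch:
    """Hypothetical stand-in for the command fieldtype; not flow.record code."""

    original: str
    executable: str
    args: tuple[str, ...]

    @classmethod
    def parse(cls, value: str, *, posix: bool = False) -> CommandSketch:
        # Keep the input untouched; only the working copy is stripped and split.
        stripped = value.strip()
        if not stripped:
            return cls(original=value, executable="", args=())
        executable, *args = shlex.split(stripped, posix=posix)
        return cls(original=value, executable=executable.strip("'\" "), args=tuple(args))


cmd = CommandSketch.parse("  %ProgramFiles%\\Windows Mail\\wab.exe /Upgrade ")
print(repr(cmd.original))
print(cmd.executable, cmd.args)
```

The existing `_pack`/`_unpack` round trip goes through the reconstructed `raw` property, so I assume a real implementation would also need to serialize the original string for it to survive writing and reading records.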

    class command(FieldType):
        """The command fieldtype splits a command string into an ``executable`` and its arguments.

        Args:
            value: the string that contains the command and arguments
            path_type: When specified it forces the command to use a specific path type

        Example:

            .. code-block:: text

                'c:\\windows\\malware.exe /info' -> windows_path('c:\\windows\\malware.exe) ['/info']
                '/usr/bin/env bash' -> posix_path('/usr/bin/env') ['bash']

                # In this situation, the executable path needs to be quoted.
                'c:\\user\\John Doe\\malware.exe /all /the /things' -> windows_path('c:\\user\\John')
                ['Doe\\malware.exe /all /the /things']
        """

        __executable: path
        __args: tuple[str, ...]

        __path_type: type[path]

        def __init__(self, value: str = "", *, path_type: type[path] | None = None):
            if not isinstance(value, str):
                raise TypeError(f"Expected a value of type 'str' not {type(value)}")

            raw = value.strip()

            # Detect the kind of path from value if not specified
            self.__path_type = path_type or type(path(raw.lstrip("\"'")))

            self.executable, self.args = self._split(raw)

        def __repr__(self) -> str:
            return f"(executable={self.executable!r}, args={self.args})"

        def __eq__(self, other: object) -> bool:
            if isinstance(other, command):
                return self.executable == other.executable and self.args == other.args
            if isinstance(other, str):
                return self.raw == other
            if isinstance(other, (tuple, list)):
                return self.executable == other[0] and self.args == (*other[1:],)

            return False

        def _split(self, value: str) -> tuple[str, tuple[str, ...]]:
            if not value:
                return "", ()

            executable, *args = shlex.split(value, posix=self.__path_type is posix_path)
            return executable.strip("'\" "), (*args,)

        def _pack(self) -> tuple[str, int]:
            path_type = TYPE_WINDOWS if self.__path_type is windows_path else TYPE_POSIX
            return self.raw, path_type

        @classmethod
        def _unpack(cls, data: tuple[str, int]) -> command:
            raw_str, path_type = data
            if path_type == TYPE_POSIX:
                return command(raw_str, path_type=posix_path)
            if path_type == TYPE_WINDOWS:
                return command(raw_str, path_type=windows_path)
            # default, infer type of path from str
            return command(raw_str)

        @property
        def executable(self) -> path:
            return self.__executable

        @property
        def args(self) -> tuple[str, ...]:
            return self.__args

        @executable.setter
        def executable(self, val: str | path | None) -> None:
            self.__executable = self.__path_type(val)

        @args.setter
        def args(self, val: str | tuple[str, ...] | list[str] | None) -> None:
            if val is None:
                self.__args = ()
                return

            if isinstance(val, str):
                self.__args = tuple(shlex.split(val, posix=self.__path_type is posix_path))
            elif isinstance(val, list):
                self.__args = tuple(val)
            else:
                self.__args = val

        @property
        def raw(self) -> str:
            exe = str(self.executable)

            if " " in exe:
                exe = shlex.quote(exe)

            result = [exe]
            # Only quote on posix paths as shlex doesn't remove the quotes on non posix paths
            if self.__path_type is posix_path:
                result.extend(shlex.quote(part) if " " in part else part for part in self.args)
            else:
                result.extend(self.args)
            return " ".join(result)

        @classmethod
        def from_posix(cls, value: str) -> command:
            return command(value, path_type=posix_path)

        @classmethod
        def from_windows(cls, value: str) -> command:
            return command(value, path_type=windows_path)
