Fix four RCE / path-traversal vulnerabilities in Model Maker (Fixes #6267) by kennethkcox · Pull Request #1 · kennethkcox/mediapipe

kennethkcox · 2026-04-06T21:03:31Z

Summary

Four security vulnerabilities identified by static analysis and manually confirmed against MediaPipe source are addressed here. All fixes are minimal and scoped to the vulnerable call sites.

Finding 1 — Arbitrary code execution via `torch.load()` (`pytorch_converter.py:36`)

torch.load() was called without weights_only=True, allowing a malicious .pt checkpoint to execute arbitrary Python via __reduce__ payloads during deserialization.

- self._model = torch.load(model_path, map_location=torch.device("cpu"))
+ self._model = torch.load(
+     model_path, map_location=torch.device("cpu"), weights_only=True
+ )

PyTorch 2.6 changed the default to weights_only=True, but making it explicit is correct defence-in-depth regardless of installed version.
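
The mechanism behind this finding can be reproduced with the standard library alone. The sketch below is not PyTorch code: it uses plain pickle, a harmless eval payload, and a hypothetical AllowlistUnpickler (a name introduced only for this illustration) to show why unpickling runs attacker-chosen callables and how an allowlist on globals, conceptually similar to what weights_only=True enforces, stops it.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious checkpoint object (illustrative only)."""

    def __reduce__(self):
        # Whatever callable is returned here is invoked by the unpickler.
        # A real attack would return something like os.system; eval of a
        # constant expression keeps this demonstration harmless.
        return (eval, ("6 * 7",))


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical restricted unpickler: only allowlisted globals resolve."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def restricted_loads(data):
    """Unpickle bytes while refusing any global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Serialising the payload records a reference to eval plus its argument.
blob = pickle.dumps(Payload())
```

Calling pickle.loads(blob) evaluates the expression during deserialization, whereas restricted_loads(blob) raises pickle.UnpicklingError before anything runs. Plain containers of numbers still load, because they reference no globals at all.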

Finding 2 — RCE via Keras Lambda layer deserialization (`image_classifier.py:233`, `text_classifier.py:489`, `model_util.py:71`)

tf.keras.models.load_model() was called with a user-supplied saved_model_path. A SavedModel containing a Lambda layer executes the Lambda body unconditionally during loading. safe_mode=True does not block this for the SavedModel format — it only applies to the Keras-native .keras config format.

Fix for the user-supplied restore paths (the actual attack surface): since _create_model() already builds the model architecture before the load call, replace load_model() with load_weights(), which loads tensor values only and never reconstructs graph nodes or Lambda functions.

# image_classifier.py / text_classifier.py
- classifier._model = tf.keras.models.load_model(saved_model_path, compile=False)
+ classifier._model.load_weights(
+     os.path.join(saved_model_path, 'variables', 'variables')
+ )
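
The pattern in this fix (build the architecture in trusted code, then restore numeric values only) can be sketched without TensorFlow. Everything below is hypothetical and for illustration: TinyClassifier, its variable names, and the JSON checkpoint format are assumptions, not MediaPipe or Keras APIs. The point is that the loader accepts numbers for variables it already knows about and has no path by which the file could introduce new layers or code.

```python
import json


class TinyClassifier:
    """Hypothetical model whose architecture is defined in code, not in the file."""

    def __init__(self):
        # Variables are created by trusted code before any file is read.
        self.weights = {"dense/kernel": [0.0, 0.0], "dense/bias": [0.0]}

    def load_weights(self, text):
        """Restore values from a JSON mapping of variable name to float list."""
        loaded = json.loads(text)
        if not isinstance(loaded, dict) or set(loaded) != set(self.weights):
            raise ValueError("checkpoint variables do not match the architecture")
        for name, values in loaded.items():
            if len(values) != len(self.weights[name]):
                raise ValueError(f"shape mismatch for {name}")
        # Only plain floats are copied in; nothing from the file is executed.
        for name, values in loaded.items():
            self.weights[name] = [float(v) for v in values]
```

A checkpoint that names an unknown variable, or that omits one, is rejected before any value is assigned, so a failed load leaves the model unchanged.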

Defence-in-depth for the internal utility: load_keras_model() in model_util.py is only called with hardcoded MediaPipe-controlled URLs (VGG19 perceptual loss model, gesture embedder), never with user input. safe_mode=True is added there for the .keras-format protection it provides, and the docstring documents its scope limitation.

Finding 3 — Path traversal (tar slip) during archive extraction (`file_util.py:77`)

tarf.extractall(tmpdir) was called without a filter, allowing a crafted archive to write files outside the extraction directory (e.g. ../model_maker_cache/<hash>_metadata.yaml).

- tarf.extractall(tmpdir)
+ if sys.version_info >= (3, 11, 4):
+     tarf.extractall(tmpdir, filter='data')
+ else:
+     abs_tmpdir = os.path.realpath(tmpdir) + os.sep
+     for member in tarf.getmembers():
+         member_path = os.path.realpath(os.path.join(tmpdir, member.name))
+         if not member_path.startswith(abs_tmpdir):
+             raise RuntimeError(f'Unsafe path in archive: {member.name}')
+     tarf.extractall(tmpdir)

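filter='data' (added in Python 3.12 and backported to 3.11.4, as well as to security releases of older branches) rejects members that would land outside the destination, links that point outside it, and special files such as devices and pipes; leading slashes on absolute paths are stripped. The fallback covers Python versions below 3.11.4 with an explicit path-boundary check on each member name.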
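
As an alternative to comparing version tuples, the filter can be detected by feature. The sketch below is not the code in this PR: safe_extract and make_archive are hypothetical helpers, and the fallback is deliberately stricter than the one in the diff (it also rejects link and special members, which a name-only check does not inspect).

```python
import io
import os
import tarfile


def safe_extract(tarf, dest):
    """Extract an open TarFile into dest, raising ValueError on unsafe members."""
    if hasattr(tarfile, "data_filter"):
        # Interpreter ships the extraction filters, so let tarfile enforce them.
        try:
            tarf.extractall(dest, filter="data")
        except tarfile.FilterError as exc:
            raise ValueError(f"Unsafe member in archive: {exc}") from exc
        return

    # Fallback for interpreters without extraction filters.
    root = os.path.realpath(dest)
    for member in tarf.getmembers():
        if not (member.isfile() or member.isdir()):
            raise ValueError(f"Unsupported member type: {member.name}")
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"Unsafe path in archive: {member.name}")
    tarf.extractall(dest)


def make_archive(name, payload=b"x"):
    """Build a one-file tar archive in memory (helper for trying the function)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tarf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tarf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

Using os.path.commonpath instead of a string-prefix test lets an archive entry that resolves to the destination directory itself pass the check, while still rejecting anything that resolves outside it.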

Finding 4 — YAML deserialization RCE via chained tar slip (`cache_files.py:106`)

yaml.load(f, Loader=yaml.FullLoader) was used to read cache metadata. FullLoader permits Python-specific tags (e.g. !!python/object/new:subprocess.Popen) in older PyYAML versions and is exploitable as the landing target for a chained tar-slip attack (Finding 3 delivers the payload; Finding 4 executes it). yaml.safe_load() restricts output to plain Python primitives and accepts no Python object tags.

- metadata = yaml.load(f, Loader=yaml.FullLoader)
+ metadata = yaml.safe_load(f)
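
The difference is easy to observe with PyYAML directly. In the sketch below, read_cache_metadata and the sample keys are hypothetical (the real metadata schema is not shown in this PR), and the extra mapping check is an assumption added for illustration rather than part of the fix.

```python
import yaml


def read_cache_metadata(text):
    """Parse cache metadata, accepting only plain YAML types."""
    # safe_load builds only basic Python types (dict, list, str, int, ...)
    # and raises on Python-specific tags such as !!python/object/apply.
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("cache metadata must be a YAML mapping")
    return data


# Benign metadata (hypothetical keys) versus a tagged payload that would
# ask an unsafe loader to call a Python function during parsing.
BENIGN = "size: 3\nname: train\n"
TAGGED = "!!python/object/apply:os.getcwd []\n"
```

read_cache_metadata(BENIGN) returns an ordinary dict, while read_cache_metadata(TAGGED) raises a yaml.YAMLError subclass because the safe loader has no constructor for the Python tag.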

Files changed

| File | Change |
| --- | --- |
| `mediapipe/tasks/python/genai/converter/pytorch_converter.py` | `weights_only=True` |
| `mediapipe/model_maker/python/core/utils/model_util.py` | `safe_mode=True` + docstring |
| `mediapipe/model_maker/python/vision/image_classifier/image_classifier.py` | `load_weights()` |
| `mediapipe/model_maker/python/text/text_classifier/text_classifier.py` | `load_weights()` |
| `mediapipe/model_maker/python/core/utils/file_util.py` | tar filter + `import sys` + docstring |
| `mediapipe/model_maker/python/core/data/cache_files.py` | `yaml.safe_load()` |

Changes identified and implemented with Claude Code (claude-sonnet-4-6).

Finding 1 (pytorch_converter.py): add weights_only=True to torch.load() to prevent arbitrary code execution via __reduce__ payloads in .pt files.

Finding 2 (model_util.py, image_classifier.py, text_classifier.py):
- Add safe_mode=True to load_keras_model() as defense-in-depth for .keras format (safe_mode does not protect SavedModel format).
- Replace tf.keras.models.load_model() with model.load_weights() in load_image_classifier() and load_bert_classifier(), where the model architecture is already built by _create_model(). This eliminates Lambda-layer deserialization entirely for the user-supplied model restore paths that were the described attack surface.

Finding 3 (file_util.py): use tarfile filter='data' (Python 3.11.4+) to block path-traversal (tar slip) attacks during archive extraction, with a manual path-validation fallback for older Python versions.

Finding 4 (cache_files.py): replace yaml.load(..., Loader=yaml.FullLoader) with yaml.safe_load() to prevent YAML deserialization RCE. FullLoader still allows construction of arbitrary Python objects (e.g. subprocess.Popen) via !!python/object/new tags; SafeLoader does not.

https://claude.ai/code/session_01GqJo5CJ7UF2Toyr2PcjziK

- file_util.py: add `import sys` (stdlib), move `import requests` to third-party block (blank-line separated), replace try/except TypeError version detection with explicit sys.version_info >= (3, 11, 4) check, update Raises docstring to document the new RuntimeError on unsafe paths.
- model_util.py: add Note section to load_keras_model() docstring documenting safe_mode scope and the recommended load_weights() pattern for SavedModel-format callers.

https://claude.ai/code/session_01GqJo5CJ7UF2Toyr2PcjziK

kennethkcox · 2026-04-06T21:07:27Z

cc @google-ai-edge/mediapipe-team (if exists) / maintainers

Ready for review—fixes google-ai-edge#6267 RCEs with zero regressions. CLA ready if needed. PoCs in issue.

claude added 2 commits April 6, 2026 20:54

kennethkcox mentioned this pull request Apr 13, 2026

RCE via unsafe deserialization in Model Maker (torch.load, Keras, tar slip, yaml.FullLoader) google-ai-edge/mediapipe#6267

Open
