
Fix four RCE / path-traversal vulnerabilities in Model Maker (Fixes #6267) #1

Open
kennethkcox wants to merge 2 commits into master from fix/6267-rces

@kennethkcox

Owner

Fixes google-ai-edge#6267

Summary

Four security vulnerabilities identified by static analysis and manually confirmed against MediaPipe source are addressed here. All fixes are minimal and scoped to the vulnerable call sites.


Finding 1 — Arbitrary code execution via torch.load() (pytorch_converter.py:36)

torch.load() was called without weights_only=True, allowing a malicious .pt checkpoint to execute arbitrary Python via __reduce__ payloads during deserialization.

- self._model = torch.load(model_path, map_location=torch.device("cpu"))
+ self._model = torch.load(
+     model_path, map_location=torch.device("cpu"), weights_only=True
+ )

PyTorch 2.6 changed the default to weights_only=True. Earlier releases default to False, so passing it explicitly is correct defence-in-depth regardless of the installed version.


Finding 2 — RCE via Keras Lambda layer deserialization (image_classifier.py:233, text_classifier.py:489, model_util.py:71)

tf.keras.models.load_model() was called with a user-supplied saved_model_path. A SavedModel containing a Lambda layer executes the Lambda body unconditionally during loading. safe_mode=True does not block this for the SavedModel format — it only applies to the Keras-native .keras config format.

Fix for the user-supplied restore paths (the actual attack surface): since _create_model() already builds the model architecture before the load call, replace load_model() with load_weights(), which loads tensor values only and never reconstructs graph nodes or Lambda functions.

# image_classifier.py / text_classifier.py
- classifier._model = tf.keras.models.load_model(saved_model_path, compile=False)
+ classifier._model.load_weights(
+     os.path.join(saved_model_path, 'variables', 'variables')
+ )
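load_weights() is given the checkpoint prefix stored inside the SavedModel. That is the path without the .index and .data-NNNNN-of-NNNNN suffixes TensorFlow writes next to it. A hypothetical helper like the one below could check those files exist first. A wrong saved_model_path would then fail with a clear message rather than an error from deep inside TensorFlow. The function name is illustrative and not part of this PR.

```python
import glob
import os


def saved_model_variables_prefix(saved_model_path: str) -> str:
    """Returns the TF checkpoint prefix inside a SavedModel directory.

    Hypothetical helper: verifies the checkpoint index and at least one
    data shard exist before the prefix is handed to load_weights().
    """
    prefix = os.path.join(saved_model_path, "variables", "variables")
    if not os.path.isfile(prefix + ".index"):
        raise FileNotFoundError(f"No checkpoint index at {prefix}.index")
    # Shards are named like variables.data-00000-of-00001.
    if not glob.glob(glob.escape(prefix) + ".data-*-of-*"):
        raise FileNotFoundError(f"No checkpoint data shards for {prefix}")
    return prefix
```

The call site would then read classifier._model.load_weights(saved_model_variables_prefix(saved_model_path)). As in the diff, this assumes the architecture built by _create_model() matches the one that was saved.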

Defence-in-depth for the internal utility: load_keras_model() in model_util.py is only called with hardcoded MediaPipe-controlled URLs (VGG19 perceptual loss model, gesture embedder), never with user input. safe_mode=True is added there for the .keras-format protection it provides, and the docstring documents its scope limitation.
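safe_mode only protects the .keras archive format. If load_keras_model() ever had to accept caller-supplied paths, one option would be to classify the path first and refuse anything safe_mode cannot vet. The sketch below is a hypothetical helper, not part of this PR. It assumes a .keras file is a zip archive with config.json at its root, and a SavedModel is a directory holding saved_model.pb (or saved_model.pbtxt).

```python
import os
import zipfile


def model_format(path: str) -> str:
    """Hypothetical helper: classifies a model path by its on-disk layout."""
    if os.path.isdir(path):
        # SavedModel directories carry saved_model.pb (or .pbtxt) at the root.
        for marker in ("saved_model.pb", "saved_model.pbtxt"):
            if os.path.isfile(os.path.join(path, marker)):
                return "saved_model"
        return "unknown"
    # Keras v3 .keras files are zip archives with config.json at the root.
    if path.endswith(".keras") and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            if "config.json" in zf.namelist():
                return "keras_v3"
    return "unknown"
```

A caller handling untrusted input could refuse any path for which model_format() does not return "keras_v3" before calling load_model(..., safe_mode=True).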


Finding 3 — Path traversal (tar slip) during archive extraction (file_util.py:77)

tarf.extractall(tmpdir) was called without a filter, allowing a crafted archive to write files outside the extraction directory (e.g. ../model_maker_cache/<hash>_metadata.yaml).

- tarf.extractall(tmpdir)
+ if sys.version_info >= (3, 11, 4):
+     tarf.extractall(tmpdir, filter='data')
+ else:
+     abs_tmpdir = os.path.realpath(tmpdir) + os.sep
+     for member in tarf.getmembers():
+         member_path = os.path.realpath(os.path.join(tmpdir, member.name))
+         if not member_path.startswith(abs_tmpdir):
+             raise RuntimeError(f'Unsafe path in archive: {member.name}')
+     tarf.extractall(tmpdir)

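filter='data' (Python 3.11.4+) blocks absolute paths, .. traversal, symlinks escaping the tree, and dangerous member types. The fallback covers older Python versions with an explicit path-boundary check on member names. It does not inspect symlink or hard-link targets, so it is weaker than filter='data' against link-based escapes.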
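Below is a sketch of a stricter variant with the same goal as the diff above. It feature-detects PEP 706 filters with hasattr(tarfile, 'data_filter') rather than comparing version numbers, because the filters were also backported to the 3.8.17, 3.9.17 and 3.10.12 security releases. Its fallback refuses link and special-file members outright, since a symlink extracted first can redirect a later member outside the tree. safe_extractall is a hypothetical name, not the code merged here.

```python
import os
import tarfile


def safe_extractall(tarf: tarfile.TarFile, dest: str) -> None:
    """Extracts tarf into dest, refusing members that could escape dest.

    Hypothetical helper, not the code in this PR.
    """
    if hasattr(tarfile, "data_filter"):
        # PEP 706 filters are available: let the stdlib enforce the rules.
        tarf.extractall(dest, filter="data")
        return

    root = os.path.realpath(dest)
    for member in tarf.getmembers():
        # Symlinks, hard links and device/FIFO members are the usual way to
        # escape via a second member, so the fallback refuses them outright.
        if not (member.isfile() or member.isdir()):
            raise RuntimeError(f"Refusing non-regular member: {member.name}")
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise RuntimeError(f"Unsafe path in archive: {member.name}")
    tarf.extractall(dest)
```

With filters available, a rejected member raises a tarfile.FilterError subclass. On the fallback path it raises RuntimeError, matching the diff's behaviour.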


Finding 4 — YAML deserialization RCE via chained tar slip (cache_files.py:106)

yaml.load(f, Loader=yaml.FullLoader) was used to read cache metadata. FullLoader permits Python-specific tags (e.g. !!python/object/new:subprocess.Popen) in older PyYAML versions and is exploitable as the landing target for a chained tar-slip attack (Finding 3 delivers the payload; Finding 4 executes it). yaml.safe_load() restricts output to plain Python primitives and accepts no Python object tags.

- metadata = yaml.load(f, Loader=yaml.FullLoader)
+ metadata = yaml.safe_load(f)
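yaml.safe_load() already limits the result to standard YAML types. If the cache code also wants to pin the expected shape, a small post-load check is one option. The hypothetical check_plain_metadata() below accepts only mappings with string keys, lists, and str/int/float/bool/None scalars. It rejects everything else, including the dates and bytes that safe_load can also produce. It is a sketch, not part of this PR.

```python
from typing import Any

# Scalar types accepted in cache metadata (bool is a subclass of int).
_ALLOWED_SCALARS = (str, int, float, bool, type(None))


def check_plain_metadata(obj: Any) -> Any:
    """Returns obj unchanged, or raises TypeError on unexpected types.

    Hypothetical post-load check for the output of yaml.safe_load().
    """
    if isinstance(obj, _ALLOWED_SCALARS):
        return obj
    if isinstance(obj, list):
        for item in obj:
            check_plain_metadata(item)
        return obj
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError(f"non-string key in metadata: {key!r}")
            check_plain_metadata(value)
        return obj
    raise TypeError(f"unexpected type in metadata: {type(obj).__name__}")
```

Usage would be metadata = check_plain_metadata(yaml.safe_load(f)).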

Files changed

| File | Change |
|---|---|
| mediapipe/tasks/python/genai/converter/pytorch_converter.py | weights_only=True |
| mediapipe/model_maker/python/core/utils/model_util.py | safe_mode=True + docstring |
| mediapipe/model_maker/python/vision/image_classifier/image_classifier.py | load_weights() |
| mediapipe/model_maker/python/text/text_classifier/text_classifier.py | load_weights() |
| mediapipe/model_maker/python/core/utils/file_util.py | tar filter + import sys + docstring |
| mediapipe/model_maker/python/core/data/cache_files.py | yaml.safe_load() |

Changes identified and implemented with Claude Code (claude-sonnet-4-6).

claude added 2 commits April 6, 2026 20:54
Finding 1 (pytorch_converter.py): add weights_only=True to torch.load()
to prevent arbitrary code execution via __reduce__ payloads in .pt files.

Finding 2 (model_util.py, image_classifier.py, text_classifier.py):
- Add safe_mode=True to load_keras_model() as defense-in-depth for
  .keras format (safe_mode does not protect SavedModel format).
- Replace tf.keras.models.load_model() with model.load_weights() in
  load_image_classifier() and load_bert_classifier(), where the model
  architecture is already built by _create_model(). This eliminates
  Lambda-layer deserialization entirely for the user-supplied model
  restore paths that were the described attack surface.

Finding 3 (file_util.py): use tarfile filter='data' (Python 3.11.4+)
to block path-traversal (tar slip) attacks during archive extraction,
with a manual path-validation fallback for older Python versions.

Finding 4 (cache_files.py): replace yaml.load(..., Loader=yaml.FullLoader)
with yaml.safe_load() to prevent YAML deserialization RCE. In older PyYAML
releases FullLoader allows construction of arbitrary Python objects (e.g.
subprocess.Popen) via !!python/object/new tags; SafeLoader does not.

https://claude.ai/code/session_01GqJo5CJ7UF2Toyr2PcjziK

- file_util.py: add `import sys` (stdlib), move `import requests` to
  third-party block (blank-line separated), replace try/except TypeError
  version detection with explicit sys.version_info >= (3, 11, 4) check,
  update Raises docstring to document the new RuntimeError on unsafe paths.
- model_util.py: add Note section to load_keras_model() docstring
  documenting safe_mode scope and the recommended load_weights() pattern
  for SavedModel-format callers.

https://claude.ai/code/session_01GqJo5CJ7UF2Toyr2PcjziK
@kennethkcox

Owner Author

cc @google-ai-edge/mediapipe-team (if exists) / maintainers

Ready for review: fixes google-ai-edge#6267 RCEs with zero regressions. CLA ready if needed. PoCs in issue.


Development

Successfully merging this pull request may close these issues.

RCE via unsafe deserialization in Model Maker (torch.load, Keras, tar slip, yaml.FullLoader)
