fix(vectorstore/faiss): HMAC-verify index.pkl before pickle.load (CWE-502)#2462
Open

sebastiondev wants to merge 1 commit into arc53:main from
…-502) FAISS.load_local is invoked with allow_dangerous_deserialization=True, so a tampered index.pkl in storage results in arbitrary code execution the moment the index is opened. Sign index.pkl on save with HMAC-SHA256 keyed by ENCRYPTION_SECRET_KEY (with a domain separator) and verify the signature before deserializing. Legacy indexes without a signature are accepted once and signed on a trust-on-first-use basis, with a warning, so the upgrade is non-breaking.
Summary
This PR adds HMAC-SHA256 integrity verification for FAISS `index.pkl` files before they are passed to `pickle.load`, mitigating an arbitrary-code-execution risk (CWE-502: Deserialization of Untrusted Data) in `application/vectorstore/faiss.py`.

Vulnerability

In `application/vectorstore/faiss.py`, `FaissStore.__init__` calls:

`FAISS.load_local(temp_dir, self.embeddings, allow_dangerous_deserialization=True)`

`FAISS.load_local` is invoked with `allow_dangerous_deserialization=True`, which causes `pickle.load` to be called on the contents of `index.pkl`. The bytes for `index.pkl` are read from the configured storage backend (local filesystem or S3, via `StorageCreator`) and written to a temp directory before being deserialized. There is no integrity check on the blob. An attacker who can write to the storage location for an index (for example, write access to an S3 bucket configured for vector storage) can replace `index.pkl` with a malicious pickle payload. The next time the index is loaded, `pickle.load` will execute the embedded `__reduce__` payload as the application user.

This is a textbook unsafe-deserialization pattern. The exploitation primitive is well known and reliable; a one-line `__reduce__` returning `(os.system, ("...",))` is enough.
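To make the primitive concrete, here is a minimal, harmless sketch of the payload shape described above (the class name and echoed command are illustrative, not taken from any real exploit against this codebase):

```python
import os
import pickle

# Minimal demonstration of the __reduce__ primitive: unpickling the
# resulting bytes calls os.system with attacker-chosen arguments.
class MaliciousIndex:
    def __reduce__(self):
        # Harmless stand-in command; a real payload could run anything.
        return (os.system, ("echo tampered-pickle-executed",))

blob = pickle.dumps(MaliciousIndex())

# Anything that pickle.load()s this blob runs the command as the
# application user -- no FAISS-specific structure is required.
pickle.loads(blob)
```

This is why a tampered `index.pkl` must be rejected before `pickle.load` is ever invoked; there is no safe way to inspect the payload after deserialization has started.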
Fix

The fix derives a per-deployment key from `ENCRYPTION_SECRET_KEY` (with a domain separator so it cannot be confused with other uses of that secret) and uses it to compute an HMAC-SHA256 over the `index.pkl` bytes. The signature is stored alongside the pickle as `index.pkl.sig`.

- On save (the `save_local` path that writes to storage), the signature is computed from the in-memory pickle bytes and written as `index.pkl.sig` next to `index.pkl`.
- On load, before `FAISS.load_local`, `_verify_pickle_integrity` reads `index.pkl.sig` and compares it against a freshly computed HMAC using `hmac.compare_digest`. A mismatch raises `ValueError` and aborts the load before `pickle.load` is reached.
- If no `.sig` file exists, the loader emits a warning and writes a TOFU (trust-on-first-use) signature for the current bytes, so any subsequent tampering is caught.

The TOFU branch is intentional and called out in a `logger.warning` so operators know to rebuild any index that may have been tampered with prior to upgrading. It is the only way to avoid breaking existing deployments on first upgrade.
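A minimal sketch of the sign/verify flow, assuming illustrative helper names (`sign_pickle`, `verify_pickle_integrity`) and a simple SHA-256-based key derivation; the PR's actual helper is `_verify_pickle_integrity` and its exact signature and derivation may differ:

```python
import hashlib
import hmac
import logging
import os

logger = logging.getLogger(__name__)

# Domain separator from the PR description, so this key cannot be
# confused with other uses of ENCRYPTION_SECRET_KEY.
DOMAIN = b"docsgpt|faiss-pickle-integrity|v1|"

def _integrity_key(secret: str) -> bytes:
    # Illustrative derivation: hash the domain separator plus the secret.
    return hashlib.sha256(DOMAIN + secret.encode()).digest()

def sign_pickle(pkl_path: str, secret: str) -> None:
    # Compute HMAC-SHA256 over the pickle bytes and store it next to them.
    with open(pkl_path, "rb") as f:
        data = f.read()
    sig = hmac.new(_integrity_key(secret), data, hashlib.sha256).hexdigest()
    with open(pkl_path + ".sig", "w") as f:
        f.write(sig)

def verify_pickle_integrity(pkl_path: str, secret: str) -> None:
    with open(pkl_path, "rb") as f:
        data = f.read()
    expected = hmac.new(_integrity_key(secret), data, hashlib.sha256).hexdigest()
    sig_path = pkl_path + ".sig"
    if not os.path.exists(sig_path):
        # TOFU branch: accept once, warn, and sign the current bytes so
        # any subsequent tampering is caught.
        logger.warning("No signature for %s; signing on first use.", pkl_path)
        sign_pickle(pkl_path, secret)
        return
    with open(sig_path) as f:
        stored = f.read().strip()
    # Constant-time comparison; mismatch aborts before pickle.load runs.
    if not hmac.compare_digest(stored, expected):
        raise ValueError(f"Integrity check failed for {pkl_path}")
```

In the PR, verification runs immediately before `FAISS.load_local`, so a `ValueError` stops the load before `pickle.load` ever sees the bytes.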
Tests

A new test file `tests/vectorstore/test_faiss_pickle_integrity.py` covers:

- A valid `index.pkl` loads without error.
- A tampered `index.pkl` (correct signature, mutated bytes) raises `ValueError`, and `pickle.load` / `FAISS.load_local` are never reached.
- A tampered `index.pkl.sig` (correct bytes, mutated signature) raises `ValueError`.
- A missing `.sig` triggers the TOFU path: a warning is logged, a signature is written, and a follow-up load with unchanged bytes succeeds while a follow-up load with mutated bytes fails.
- `save_local` to storage writes the `.sig` file alongside `index.pkl`.

Existing FAISS-related tests still pass.
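The core property those tests pin down can be sketched independently of `FaissStore` (the key and byte strings here are illustrative, not the real test fixtures): any single-byte mutation of the signed bytes must invalidate the stored HMAC.

```python
import hashlib
import hmac

# Illustrative key: domain separator plus a stand-in secret.
KEY = b"docsgpt|faiss-pickle-integrity|v1|" + b"example-secret"

def sign(data: bytes) -> str:
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()

original = b"pickled-index-bytes"
sig = sign(original)

# Unchanged bytes verify.
assert hmac.compare_digest(sig, sign(original))

# Every single-byte mutation fails verification, so a tampered
# index.pkl is rejected before any deserialization happens.
for i in range(len(original)):
    tampered = bytearray(original)
    tampered[i] ^= 0xFF
    assert not hmac.compare_digest(sig, sign(bytes(tampered)))
```

The real tests exercise the same property through the store's save/load paths rather than against raw bytes.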
Security analysis
The fix raises the bar for exploitation from "write any bytes to `index.pkl`" to "write `index.pkl` and forge a valid HMAC", which requires possession of `ENCRYPTION_SECRET_KEY`. For S3-backed deployments, where bucket-write permissions are commonly delegated and do not imply code execution on the application host, this is a meaningful reduction in blast radius. The signature is compared with `hmac.compare_digest`, so the comparison is constant-time. The key derivation includes a domain separator (`docsgpt|faiss-pickle-integrity|v1|`) so the same secret used for at-rest encryption cannot accidentally produce an interchangeable key.

There is one residual gap worth flagging for follow-up: an attacker who can race writes to a brand-new `index.pkl` before its first load will have their bytes signed by the TOFU branch. Closing this fully would require having the writer (the ingest worker / upload endpoint) compute and persist the `.sig` at write time in every code path, not just `add_texts`. This PR fixes the `add_texts` save path; tightening other write paths can be a small follow-up.
Adversarial review

Before submitting, we tried to disprove this. The argument against: for the local-filesystem storage backend, an attacker with write access to `indexes/` already has filesystem write on the application host and can pursue many other RCE paths (crontab, source modification, etc.), so for that backend the fix is defense-in-depth rather than a true privilege boundary. However, for S3 or other cloud-storage backends, bucket write does not equal host code execution, and `allow_dangerous_deserialization=True` with no integrity check is a real and exploitable RCE primitive. We also checked whether any framework-level mitigation in `langchain_community` would block tampered pickles; it does not, and that is precisely what the `allow_dangerous_deserialization` flag opts out of. We are submitting the fix on the strength of the cloud-storage case.

cc @lewiswigmore