fix(security): sanitize telemetry data + add SHA-256 verification for model downloads #1723
Open
MAXDVVV wants to merge 3 commits into openinterpreter:main
… credentials

Added sanitization layer that:
- Strips absolute file paths (Unix/Windows) from all telemetry properties
- Redacts environment variable references ($HOME, %USERPROFILE%, etc.)
- Redacts values of sensitive keys (api_key, password, token, etc.)
- Recursively sanitizes nested dicts and lists

This prevents accidental leakage of local file paths, credentials, or other PII that may appear in exception stack traces sent via telemetry.
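The four behaviours listed in this commit message can be illustrated with a minimal sketch. It is not the PR's actual code: the regular expressions, the `SENSITIVE_KEYS` tuple, and the function names `sanitize_value` and `sanitize_properties` are assumptions chosen for illustration, and the path patterns only cover a few common roots and paths without spaces.

```python
import re

# Hypothetical key fragments; a key containing any of them has its value redacted.
SENSITIVE_KEYS = ("api_key", "password", "token", "secret")

# Assumed patterns, not taken from the PR. They stop at whitespace, quotes and colons,
# so paths containing spaces are only partially matched.
UNIX_PATH = re.compile(r"/(?:home|Users|root|tmp|var|etc|opt|usr)/[^\s'\":]+")
WINDOWS_PATH = re.compile(r"[A-Za-z]:\\[^\s'\":]*")
ENV_REF = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?|%[A-Za-z_][A-Za-z0-9_]*%")


def sanitize_value(value):
    """Sanitize a single value, recursing into dicts, lists and tuples."""
    if isinstance(value, str):
        value = WINDOWS_PATH.sub("<path>", value)
        value = UNIX_PATH.sub("<path>", value)
        return ENV_REF.sub("<env>", value)
    if isinstance(value, dict):
        return sanitize_properties(value)
    if isinstance(value, (list, tuple)):
        return [sanitize_value(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


def sanitize_properties(properties):
    """Return a sanitized copy of a telemetry properties dict."""
    clean = {}
    for key, value in properties.items():
        if any(fragment in str(key).lower() for fragment in SENSITIVE_KEYS):
            clean[key] = "<redacted>"
        else:
            clean[key] = sanitize_value(value)
    return clean
```

Substring matching on keys is deliberately over-eager in this sketch: a key such as `tokens_used` would also be redacted, which trades some telemetry detail for a lower chance of leaking a secret.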
New utility module that:
- Computes SHA-256 checksums of downloaded model files
- Verifies against expected hashes when available
- Warns users when no hash is available for verification
- Automatically removes files that fail integrity checks

This addresses the risk of tampered or corrupted model files being downloaded and executed without any integrity verification.
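A rough sketch of the behaviour described in this commit message follows. The function name `verify_model_integrity()` comes from the PR, but its signature, return value, and the `compute_sha256` helper are assumptions; the actual module may be structured differently.

```python
import hashlib
import os


def compute_sha256(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so multi-GB models are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_integrity(path, expected_sha256=None):
    """Assumed contract: return True if the file may be used, False if it was removed."""
    actual = compute_sha256(path)
    if expected_sha256 is None:
        # No reference hash: warn and print the computed one for manual checking.
        print(f"Warning: no SHA-256 available for {path}. Computed: {actual}")
        return True
    if actual == expected_sha256.strip().lower():
        return True
    # Mismatch: delete the file so it cannot be executed later.
    os.remove(path)
    return False
```

Reading in fixed-size chunks keeps memory use constant regardless of model size, which matters for files in the multi-GB range.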
- Import and call verify_model_integrity() after each wget.download()
- Add 'sha256' field to all model entries (None for now; maintainers should populate with verified hashes from HuggingFace)
- If hash verification fails, the corrupted file is automatically removed and download_model() returns None
- When no hash is provided, a warning is printed with the computed hash so users can verify manually

This prevents execution of tampered or corrupted model files that are downloaded from external sources without any integrity check.
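The control flow this commit message describes might look roughly like the sketch below. Everything except the `sha256` field and the "return None on failure" behaviour is an assumption: the entry fields `file_name` and `url`, the `fetch` parameter (a stand-in for `wget.download()` so the example has no third-party dependency), and the signature of `download_model()` are hypothetical.

```python
import hashlib
import os


def download_model(entry, models_dir, fetch):
    """Download one model entry and return its path, or None if verification fails.

    entry: dict with hypothetical 'file_name' and 'url' keys plus an optional 'sha256'.
    fetch: callable(url, dest_path) that performs the actual download.
    """
    dest = os.path.join(models_dir, entry["file_name"])
    fetch(entry["url"], dest)

    # Compute the SHA-256 of what actually landed on disk, chunk by chunk.
    digest = hashlib.sha256()
    with open(dest, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    actual = digest.hexdigest()

    expected = entry.get("sha256")
    if expected is None:
        # Non-blocking path: warn, show the hash, and let the download stand.
        print(f"Warning: no SHA-256 for {entry['file_name']}. Computed: {actual}")
        return dest
    if actual != expected.lower():
        os.remove(dest)  # never leave a file that failed verification on disk
        return None
    return dest
```

Returning `None` lets the caller skip the later `chmod +x` and execution steps for a file that did not verify.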
Summary
This PR addresses two security concerns in Open Interpreter:
1. Telemetry Data Sanitization (telemetry.py)

Problem: The send_telemetry() function sends properties to PostHog without any sanitization. When exception stack traces or other debug data are included in telemetry properties, they may inadvertently leak local file paths, credentials, or other PII.

Fix: Added a _sanitize_properties() layer that:
- Strips absolute file paths (Unix /home/user/... and Windows C:\Users\...) → <path>
- Redacts environment variable references ($HOME, %USERPROFILE%) → <env>
- Redacts values of sensitive keys (api_key, password, token, etc.) → <redacted>
- Recursively sanitizes nested dicts and lists

2. Model Download Integrity Verification (local_setup.py + new download_security.py)

Problem: Model files (multi-GB executables) are downloaded via wget.download() from HuggingFace URLs with zero integrity verification. A network-level attacker (MITM) or compromised CDN could serve tampered model files that are then marked executable (chmod +x) and run.

Fix:
- New download_security.py module with verify_model_integrity() function
- Added sha256 field to all model entries in the model list (currently None; maintainers should populate with verified hashes)

Files Changed
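| File | Change |
| --- | --- |
| interpreter/core/utils/telemetry.py | Sanitize telemetry properties before requests.post() |
| interpreter/terminal_interface/download_security.py | New module for SHA-256 integrity verification |
| interpreter/terminal_interface/local_setup.py | Call verify_model_integrity() after download |

Testing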
- … hashlib
- When sha256 is None, download proceeds with a warning (non-blocking)

Next Steps (for maintainers)
Populate the sha256 field for each model entry with verified checksums from HuggingFace. Example:

    {
        "name": "Llama-3.1-8B-Instruct",
        "sha256": "abc123...",  # from HuggingFace model card
        ...
    }