
fix(security): sanitize telemetry data + add SHA-256 verification for model downloads #1723

Open
MAXDVVV wants to merge 3 commits into openinterpreter:main from MAXDVVV:fix/telemetry-sanitize-and-model-hash-verification

MAXDVVV commented Apr 7, 2026

Summary

This PR addresses two security concerns in Open Interpreter:

1. Telemetry Data Sanitization (telemetry.py)

Problem: The send_telemetry() function sends properties to PostHog without any sanitization. When exception stack traces or other debug data are included in telemetry properties, they may inadvertently leak:

  • Absolute file paths revealing usernames and directory structures
  • Environment variable values
  • API keys, tokens, or passwords embedded in error messages

Fix: Added a _sanitize_properties() layer that:

  • Strips absolute file paths (Unix /home/user/... and Windows C:\Users\...) → <path>
  • Redacts environment variable references ($HOME, %USERPROFILE%) → <env>
  • Redacts values of sensitive keys (api_key, password, token, etc.) → <redacted>
  • Recursively processes nested dicts and lists
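
The four rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the regular expressions, the list of sensitive key markers, and the helper `_sanitize_string()` are assumptions made for the example; only the name `_sanitize_properties()` and the `<path>`, `<env>`, and `<redacted>` placeholders come from the description.

```python
import re

# Hypothetical patterns for illustration; the PR's real regexes may differ.
# Unix: a slash-led path with at least one directory segment. The lookbehind
# avoids matching inside URLs or in the middle of a word.
_UNIX_PATH = re.compile(r"(?<![\w/:<])/(?:[\w.\-]+/)+[\w.\-]*")
# Windows: drive letter, colon, backslash, then everything up to whitespace or a quote.
_WIN_PATH = re.compile(r"\b[A-Za-z]:\\[^\s'\"]*")
# Environment references such as $HOME, ${HOME}, or %USERPROFILE%.
_ENV_REF = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?|%[A-Za-z_][A-Za-z0-9_]*%")
# Assumed markers; any key containing one of these has its value redacted.
_SENSITIVE_MARKERS = ("api_key", "apikey", "password", "secret", "token")


def _sanitize_string(value):
    """Replace paths and environment references inside a single string."""
    value = _WIN_PATH.sub("<path>", value)
    value = _UNIX_PATH.sub("<path>", value)
    return _ENV_REF.sub("<env>", value)


def _sanitize_properties(value, key=None):
    """Recursively sanitize a telemetry payload (dicts, lists, strings)."""
    if key is not None and any(m in str(key).lower() for m in _SENSITIVE_MARKERS):
        return "<redacted>"
    if isinstance(value, dict):
        return {k: _sanitize_properties(v, k) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_sanitize_properties(v) for v in value]
    if isinstance(value, str):
        return _sanitize_string(value)
    return value  # numbers, booleans, None pass through unchanged


example = {
    "error": 'File "/home/alice/app/main.py", line 3',
    "api_key": "sk-123",
    "nested": {"cwd": "C:\\Users\\alice\\proj", "vars": ["$HOME", "%USERPROFILE%"]},
    "count": 3,
}
sanitized = _sanitize_properties(example)
```

Substring matching on key names is deliberately broad: in this sketch a key such as `tokens_used` would also be redacted, so a real implementation may prefer an exact allow or deny list.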

2. Model Download Integrity Verification (local_setup.py + new download_security.py)

Problem: Model files (multi-GB executables) are downloaded via wget.download() from HuggingFace URLs with zero integrity verification. A network-level attacker (MITM) or compromised CDN could serve tampered model files that get chmod +x and executed.

Fix:

  • New download_security.py module with verify_model_integrity() function
  • Computes SHA-256 hash of downloaded files
  • Verifies against expected hash when available
  • Warns users when no hash is available (prints computed hash for manual verification)
  • Automatically removes files that fail integrity checks
  • Added sha256 field to all model entries in the model list (currently None — maintainers should populate with verified hashes)
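
A rough sketch of the behavior listed above, using only `hashlib` and `os`. The signature of `verify_model_integrity()`, its boolean return value, the helper `compute_sha256()`, and the message wording are assumptions for illustration; the PR's module may be structured differently.

```python
import hashlib
import os


def compute_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-GB models are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_integrity(path, expected_sha256=None):
    """Return True if the file may be used, False if it failed verification.

    Hypothetical signature: assumes the caller passes the model entry's
    sha256 value (or None when the maintainers have not populated it yet).
    """
    actual = compute_sha256(path)
    if expected_sha256 is None:
        # Non-blocking: warn and show the computed hash for manual verification.
        print(f"Warning: no SHA-256 on record for {os.path.basename(path)}.")
        print(f"Computed SHA-256: {actual}")
        return True
    if actual == expected_sha256.strip().lower():
        return True
    # Mismatch: remove the file so it can never be made executable and run.
    os.remove(path)
    print(f"Integrity check failed for {os.path.basename(path)}; file removed.")
    return False
```

In a download flow, the call would sit between the download and any `chmod +x` step, so that a file failing the check is deleted before it can be executed.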

Files Changed

| File | Change |
| --- | --- |
| interpreter/core/utils/telemetry.py | Added sanitization layer before requests.post() |
| interpreter/terminal_interface/download_security.py | New: SHA-256 verification utility |
| interpreter/terminal_interface/local_setup.py | Import and call verify_model_integrity() after download |

Testing

  • Sanitization is regex-based with no external dependencies
  • Hash verification uses only stdlib hashlib
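  • No breaking changes while sha256 is None: the download proceeds with a warning (non-blocking) and existing behavior is preserved
  • Once a hash is populated, a mismatch removes the downloaded file and download_model() returns None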

Next Steps (for maintainers)

Populate the sha256 field for each model entry with verified checksums from HuggingFace. Example:

{
    "name": "Llama-3.1-8B-Instruct",
    "sha256": "abc123...",  # from HuggingFace model card
    ...
}
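
When populating these fields, a small consistency check can catch truncated or mistyped hashes before they cause every download of that model to fail verification. The sketch below assumes the model list is a list of dicts with `name` and `sha256` keys, as in the example above; the function name `invalid_sha256_entries()` is hypothetical and not part of the PR.

```python
import re

# A SHA-256 digest in hex is exactly 64 characters from 0-9 and a-f.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def invalid_sha256_entries(models):
    """Return the names of entries whose sha256 is neither None nor a valid hex digest."""
    bad = []
    for entry in models:
        value = entry.get("sha256")
        if value is None:
            continue  # not populated yet; verification falls back to a warning
        if not isinstance(value, str) or not _SHA256_HEX.fullmatch(value.lower()):
            bad.append(entry.get("name", "<unnamed>"))
    return bad


models = [
    {"name": "unpopulated", "sha256": None},
    {"name": "placeholder", "sha256": "abc123..."},
    {"name": "well-formed", "sha256": "0" * 64},
]
problems = invalid_sha256_entries(models)
```

Here only the entry named `placeholder` is reported, because `abc123...` is not a full 64-character hex digest.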

MAXDVVV added 3 commits April 7, 2026 22:56
… credentials

Added sanitization layer that:
- Strips absolute file paths (Unix/Windows) from all telemetry properties
- Redacts environment variable references ($HOME, %USERPROFILE%, etc.)
- Redacts values of sensitive keys (api_key, password, token, etc.)
- Recursively sanitizes nested dicts and lists

This prevents accidental leakage of local file paths, credentials, or
other PII that may appear in exception stack traces sent via telemetry.

New utility module that:
- Computes SHA-256 checksums of downloaded model files
- Verifies against expected hashes when available
- Warns users when no hash is available for verification
- Automatically removes files that fail integrity checks

This addresses the risk of tampered or corrupted model files being
downloaded and executed without any integrity verification.

- Import and call verify_model_integrity() after each wget.download()
- Add 'sha256' field to all model entries (None for now — maintainers
  should populate with verified hashes from HuggingFace)
- If hash verification fails, the corrupted file is automatically
  removed and download_model() returns None
- When no hash is provided, a warning is printed with the computed
  hash so users can verify manually

This prevents execution of tampered or corrupted model files that
are downloaded from external sources without any integrity check.