RSTR-DES-001 — Python pickle.loads on untrusted input
Summary
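pickle.loads (and pickle.load, Unpickler.load) deserializes arbitrary Python
objects. While doing so it imports and calls whatever callables the byte
stream names (this is how __reduce__ is honored), so loading attacker-controlled
bytes runs attacker-chosen code. A pickle byte string is a program; the only safe input is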
data you yourself produced and stored in a location only you can write
to. From the network, from a database row a user wrote, from a file
upload — never.
This is one of the most common Python RCE primitives.
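To make "a pickle byte string is a program" concrete, here is a minimal and deliberately harmless sketch. The Payload class is hypothetical, and sorted stands in for the callable an attacker would actually pick (for example os.system); the mechanism is the same.

```python
import pickle


class Payload:
    """Hypothetical class used only to build a demonstration pickle."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sorted([3, 1, 2])".
        # An attacker would name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# The loader does not rebuild a Payload; it calls the callable named in the
# stream and returns whatever that call returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Note that the receiving side never needs the Payload class: the stream only references sorted, so any process that calls pickle.loads on these bytes performs the call.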
Severity
Critical.
Languages
Python.
What rastray flags
import pickle
obj = pickle.loads(request.data) # ← flagged
obj = pickle.load(open('user_uploaded.pkl', 'rb')) # ← flagged
from pickle import Unpickler
obj = Unpickler(stream).load() # ← flagged
What rastray deliberately does not flag
json.loads(...), tomllib.loads(...), msgpack.unpackb(...): data-only formats.
dill, cloudpickle: also unsafe (same primitive); they have separate rules.
How to fix it
For data interchange, use JSON or MessagePack. For storing typed
Python objects, use pydantic / attrs / dataclasses with explicit
from_dict constructors:
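from pydantic import BaseModel

class Job(BaseModel):
    id: str
    payload: dict

# Parses the JSON body and validates field types (pydantic v2 API);
# raises pydantic.ValidationError on malformed or mistyped input.
job = Job.model_validate_json(request.data)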
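For projects that prefer to stay on the standard library, the same idea can be sketched with a dataclass and an explicit from_dict constructor. The field checks and the parse_job helper below are illustrative assumptions, not something rastray requires.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    id: str
    payload: dict

    @classmethod
    def from_dict(cls, raw):
        # Check every field explicitly; nothing is constructed implicitly.
        if not isinstance(raw, dict):
            raise ValueError("expected a JSON object")
        job_id = raw.get("id")
        payload = raw.get("payload")
        if not isinstance(job_id, str):
            raise ValueError("'id' must be a string")
        if not isinstance(payload, dict):
            raise ValueError("'payload' must be an object")
        return cls(id=job_id, payload=payload)


def parse_job(body: bytes) -> Job:
    # json.loads only ever yields dict, list, str, int, float, bool or None.
    return Job.from_dict(json.loads(body))


job = parse_job(b'{"id": "job-1", "payload": {"retries": 3}}')
```

Malformed JSON raises json.JSONDecodeError, which is a subclass of ValueError, so callers of this sketch can handle both failure modes with a single except ValueError.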
If you absolutely must use pickle (long-lived internal cache, no external surface), sign the payload with HMAC-SHA-256 and verify the tag, using a constant-time comparison, before unpickling. That moves the threat model from "anyone with write access to the channel can RCE you" to "anyone with the HMAC key can RCE you". That is better, but you should still demand a real reason to keep pickle.
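The sign-then-verify step might look like the sketch below. The function names dumps_signed and loads_signed and the tag-then-data layout are assumptions made for illustration; key storage and rotation are out of scope here.

```python
import hashlib
import hmac
import pickle
import secrets

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA-256 tag over the pickled bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def loads_signed(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    if len(blob) < TAG_SIZE:
        raise ValueError("blob too short to contain an HMAC tag")
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed; refusing to unpickle")
    return pickle.loads(data)


key = secrets.token_bytes(32)  # in practice, load this from a secret store
blob = dumps_signed({"cached": [1, 2, 3]}, key)
restored = loads_signed(blob, key)
```

The important property is ordering: pickle.loads is never reached unless the tag verifies, so a tampered or unsigned blob is rejected before any of it is interpreted.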