RSTR-XXE-002 — lxml XMLParser(resolve_entities=True)

Summary

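lxml.etree.XMLParser is constructed with resolve_entities=True, either explicitly or by relying on the default (True in lxml releases before 5.0), so external entities declared in the parsed XML are resolved. A malicious document can then read local files (for example file:///etc/passwd) or, if network access has been enabled with no_network=False, make outbound HTTP requests (SSRF) from the parser process.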

Severity

High.

Languages

Python.

What rastray flags

from lxml import etree

parser = etree.XMLParser(resolve_entities=True)    # ← flagged
parser = etree.XMLParser()                          # ← flagged (relies on the default; True before lxml 5.0)
tree   = etree.fromstring(payload, parser)

What rastray deliberately does not flag

  • etree.XMLParser(resolve_entities=False, no_network=True).
  • defusedxml.lxml.parse(...) / fromstring(...).
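
The flagged and unflagged cases above can be approximated with Python's ast module. The sketch below is not rastray's implementation: find_unsafe_xmlparser_calls is a hypothetical helper, and it simplifies the rule by looking only at the resolve_entities keyword (it ignores no_network and does not track how XMLParser was imported).

```python
import ast


def find_unsafe_xmlparser_calls(source: str) -> list[int]:
    """Return line numbers of XMLParser(...) calls that may resolve entities.

    Hypothetical helper for illustration; a call counts as safe only when it
    passes the literal resolve_entities=False.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `etree.XMLParser(...)` and a bare `XMLParser(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name != "XMLParser":
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        value = kwargs.get("resolve_entities")
        is_literal_false = isinstance(value, ast.Constant) and value.value is False
        if not is_literal_false:
            flagged.append(node.lineno)
    return sorted(flagged)


sample = (
    "from lxml import etree\n"
    "p1 = etree.XMLParser(resolve_entities=True)\n"
    "p2 = etree.XMLParser()\n"
    "p3 = etree.XMLParser(resolve_entities=False, no_network=True)\n"
)
print(find_unsafe_xmlparser_calls(sample))
```

A value that is not a literal (for example resolve_entities=flag) is treated as unsafe here, which errs on the side of reporting.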

How to fix it

Either harden the parser:

from lxml import etree

parser = etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    huge_tree=False,
    dtd_validation=False,
    load_dtd=False,
)
tree = etree.fromstring(payload, parser)

Or, easier, use defusedxml.lxml:

from defusedxml.lxml import fromstring
tree = fromstring(payload)

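defusedxml is the OWASP-recommended drop-in: it disables every XXE-relevant feature by default, and defusedxml.lxml mirrors the parsing functions of lxml.etree (lxml is a third-party package, not part of the standard library). Note that the defusedxml.lxml module has been marked deprecated since defusedxml 0.7, so the hardened parser above is the more durable option.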
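
To confirm that the hardened parser settings shown earlier in this section behave as intended, a small self-check along the following lines may help. It is a sketch under assumptions: parse_untrusted is a hypothetical wrapper name, the "secret" is a throwaway temporary file standing in for a target such as /etc/passwd, and lxml must be installed.

```python
import os
import tempfile
from pathlib import Path

from lxml import etree


def parse_untrusted(payload: bytes):
    """Parse XML with the hardened settings from this page (hypothetical helper)."""
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        huge_tree=False,
        dtd_validation=False,
        load_dtd=False,
    )
    return etree.fromstring(payload, parser)


# Throwaway file standing in for a sensitive target such as /etc/passwd.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("TOP-SECRET")
    secret_path = fh.name

try:
    # Classic XXE shape: an external entity pointing at a local file.
    payload = (
        '<!DOCTYPE r [<!ENTITY xxe SYSTEM "%s">]><r>&xxe;</r>'
        % Path(secret_path).as_uri()
    ).encode()
    serialized = etree.tostring(parse_untrusted(payload))
finally:
    os.unlink(secret_path)

# The file contents must not have been pulled into the parsed tree.
assert b"TOP-SECRET" not in serialized
```

Keeping a check like this in the test suite guards against a later refactor silently reverting the parser options.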

References