CRITICAL
Rule Definition
XML documents may optionally contain a Document Type Definition (DTD), which, among other features, allows XML entities to be defined. An external entity is defined by giving a URI instead of a literal substitution string. The XML parser may then retrieve the resource at that URI and embed its contents in the document for further processing.
By submitting an XML file that defines an external entity with a file:// URI, an attacker can make the processing application read the contents of a local file. For example, on Windows the URI file:///c:/winnt/win.ini designates the file C:\Winnt\win.ini, and on Unix-based systems file:///etc/passwd designates the password file. With other URI schemes such as http://, the attacker can force the application to send outgoing requests to servers the attacker cannot reach directly, which can be used to bypass firewall restrictions or to hide the source of attacks such as port scanning.
Once the content of the URI has been read, it is fed back into the application that is processing the XML. If the application echoes that data back (for example, in an error message or in its response), the file contents are exposed to the attacker.
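To make the mechanism concrete, here is a minimal sketch that uses only the standard library's xml.sax module, with external general entity processing deliberately switched on through feature_external_ges (the SAX parser no longer processes external general entities by default since Python 3.7.1). A temporary file stands in for a sensitive file such as /etc/passwd, and the names TextCollector, build_xxe_payload and parse_with_external_entities are hypothetical.

```python
import tempfile
import xml.sax
from io import BytesIO
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler that records all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def build_xxe_payload(file_uri: str) -> bytes:
    """Hypothetical attacker document: the entity &xxe; points at file_uri."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE data [<!ENTITY xxe SYSTEM "{file_uri}">]>\n'
        "<data>&xxe;</data>"
    ).encode("utf-8")


def parse_with_external_entities(xml_bytes: bytes) -> str:
    """UNSAFE on purpose: resolves external general entities, then returns the text."""
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; enabled here only to show the attack.
    parser.setFeature(feature_external_ges, True)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(BytesIO(xml_bytes))
    return "".join(handler.parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"  # stands in for e.g. /etc/passwd
        secret.write_text("top-secret-value")
        leaked = parse_with_external_entities(build_xxe_payload(secret.as_uri()))
        print(leaked)  # the file's contents now appear in the parsed text
```

If the application then returns the parsed text to the client, the attacker receives the file contents.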
Remediation
For lxml: create a parser with etree.XMLParser(resolve_entities=False) and pass it explicitly to the parsing function.
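For modules of the Python Standard Library (xml.*): they are safe against external entity expansion in Python 3.6 and later (xml.sax, for instance, stopped processing external general entities by default in Python 3.7.1), but the defusedxml or defusedexpat package is still recommended for any server code that parses untrusted XML data.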
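The standard-library behavior described above can be checked with a short sketch. It relies on what the Python documentation states for xml.etree.ElementTree: external entities are not expanded and a ParseError is raised instead, while internal entities (whose replacement text is written inline in the DTD) are still expanded. The names INTERNAL, EXTERNAL and parse_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text is written inline in the DTD.
INTERNAL = b'<!DOCTYPE d [<!ENTITY greet "hello">]><d>&greet;</d>'

# External entity: the replacement text lives at a URI (here a local file).
EXTERNAL = b'<!DOCTYPE d [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><d>&xxe;</d>'


def parse_text(xml_src: bytes) -> str:
    """Return the text of the root element as ElementTree parsed it."""
    return ET.fromstring(xml_src).text


if __name__ == "__main__":
    print(parse_text(INTERNAL))  # the internal entity is expanded
    try:
        parse_text(EXTERNAL)
    except ET.ParseError as exc:
        # The external entity is not fetched; the parser reports an error instead.
        print("rejected:", exc)
```

Because internal entities are still expanded, entity-based attacks other than XXE remain possible, which is one reason the rule still recommends defusedxml for server code.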
Violation Code Sample
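# lxml
import lxml.etree as etree
from fastapi import FastAPI, Request, Response  # assumption: the samples use FastAPI
app = FastAPI()
@app.post("/upload")
async def upload(request: Request):
    xml_src = await request.body()
    doc = etree.fromstring(xml_src)  # violation
    # Echo the parsed document back to the client.
    return Response(etree.tostring(doc), media_type="application/xml")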
#----------------------------------------------
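# Python standard library
import xml.etree.ElementTree as ET
from fastapi import FastAPI, Request, Response  # assumption: the samples use FastAPI
app = FastAPI()
@app.post("/upload")
async def upload(request: Request):
    xml_src = await request.body()
    doc = ET.fromstring(xml_src)  # violation
    # Echo the parsed document back to the client.
    return Response(ET.tostring(doc), media_type="application/xml")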
Fixed Code Sample
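# lxml
import lxml.etree as etree
from fastapi import FastAPI, Request, Response  # assumption: the samples use FastAPI
app = FastAPI()
@app.post("/upload")
async def upload(request: Request):
    xml_src = await request.body()
    parser = etree.XMLParser(resolve_entities=False)  # sanitization
    doc = etree.fromstring(xml_src, parser=parser)  # no violation
    return Response(etree.tostring(doc), media_type="application/xml")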
#----------------------------------------------
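# Python standard library
import defusedxml.ElementTree as ET  # <- safe library (pip install defusedxml)
from fastapi import FastAPI, Request, Response  # assumption: the samples use FastAPI
app = FastAPI()
@app.post("/upload")
async def upload(request: Request):
    xml_src = await request.body()
    doc = ET.fromstring(xml_src)  # no violation
    return Response(ET.tostring(doc), media_type="application/xml")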
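Where adding defusedxml is not an option, one stdlib-only alternative (a different technique from the one this rule prescribes, similar in spirit to defusedxml's forbid_dtd option) is to reject any untrusted document that contains a DOCTYPE before parsing it, since without a DTD no entities can be declared. This is a minimal sketch; safe_fromstring and DTDForbidden are hypothetical names.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical error raised when an untrusted document declares a DTD."""


def _reject_doctype(*_args):
    # Called by expat as soon as it sees <!DOCTYPE ...>; the exception aborts Parse().
    raise DTDForbidden("DOCTYPE declarations are not allowed")


def safe_fromstring(xml_src: bytes) -> ET.Element:
    """Parse untrusted XML only if it carries no DOCTYPE (and thus no entities)."""
    # Pass 1: a bare expat parser whose only job is to detect a DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_src, True)
    # Pass 2: no DTD was present, so parse normally with ElementTree.
    return ET.fromstring(xml_src)


if __name__ == "__main__":
    print(safe_fromstring(b"<a>hi</a>").text)
    try:
        safe_fromstring(b'<!DOCTYPE a [<!ENTITY x SYSTEM "file:///etc/passwd">]><a>&x;</a>')
    except DTDForbidden as exc:
        print("rejected:", exc)
```

The document is parsed twice, which is the cost of staying within the public ElementTree API; when defusedxml is available, its forbid_dtd parameter offers the same check.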
Reference
https://cwe.mitre.org/data/definitions/611.html
https://www.owasp.org/index.php/Top_10-2017_A4-XML_External_Entities_(XXE)
https://owasp.org/Top10/fr/A05_2021-Security_Misconfiguration/
https://docs.python.org/3/library/xml.html?highlight=external%20entity%20expansion
https://pypi.org/project/defusedxml/
Technical Criterion
CWE-611 - Improper Restriction of XML External Entity Reference
About CAST Appmarq
CAST Appmarq is by far the biggest repository of data about real IT systems. It's built on thousands of analyzed applications, made of 35 different technologies, by over 300 business organizations across major verticals. It provides IT Leaders with factual key analytics to let them know if their applications are on track.