
TensorFlow, a popular Python-based machine learning and artificial intelligence project developed by Google, has dropped support for YAML to patch a critical code execution vulnerability.

YAML, originally short for Yet Another Markup Language and now officially "YAML Ain't Markup Language," is a convenient choice among developers looking for a human-readable data serialization language for handling configuration files and data in transit.

Untrusted deserialization vulnerability in TensorFlow

Maintainers behind both TensorFlow and Keras, a wrapper project for TensorFlow, have patched an untrusted deserialization vulnerability that stemmed from unsafe parsing of YAML.

Tracked as CVE-2021-37678, the critical flaw enables attackers to execute arbitrary code when an application deserializes a Keras model provided in the YAML format.

Deserialization vulnerabilities typically occur when an application reads malformed or malicious data originating from untrusted sources.

After an application reads and deserializes the data, it may crash, resulting in a Denial of Service (DoS) condition, or worse, execute the attacker's arbitrary code.

This YAML deserialization vulnerability, rated a 9.3 in severity, was responsibly reported to TensorFlow maintainers by security researcher Arjun Shibu.

And the source of the flaw, you ask? The notorious "yaml.unsafe_load()" function in TensorFlow code:

[Image: Vulnerable yaml.unsafe_load function call in TensorFlow (GitHub)]

The "unsafe_load" function is known to deserialize YAML data rather liberally: it resolves all tags, "even those known to be unsafe on untrusted input."

This means "unsafe_load" should ideally only be called on input that comes from a trusted source and is known to be free of any malicious content.

Should that not be the case, attackers can exploit the deserialization mechanism to execute code of their choice by injecting a malicious payload into the YAML data that is yet to be deserialized.

An example Proof-of-Concept (PoC) exploit shared in the vulnerability advisory demonstrates just this:

from tensorflow.keras import models

payload = '''
!!python/object/new:type
args: ['z', !!python/tuple [], {'extend': !!python/name:exec }]
listitems: "__import__('os').system('cat /etc/passwd')"
'''
  
models.model_from_yaml(payload)
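By contrast, PyYAML's "safe_load" only builds plain data types and refuses Python-specific tags like the ones the PoC relies on. The following is a minimal sketch, assuming PyYAML is installed; the `untrusted` string is a harmless stand-in for the advisory's payload (it merely names the built-in `len`), and the configuration keys are made up for illustration.

```python
import yaml  # PyYAML, assumed to be installed

# Harmless stand-in for a malicious document: it uses a Python-specific tag,
# but only to reference the built-in len function.
untrusted = "!!python/name:builtins.len"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises
    # instead of resolving the tag to a Python object.
    rejected = True
    print("safe_load rejected the document:", type(exc).__name__)

# Ordinary data (mappings, lists, strings, numbers) still loads normally.
config = yaml.safe_load("layers: [64, 32]\nactivation: relu")
print(config)
```

The trade-off is that "safe_load" cannot reconstruct arbitrary Python objects, which is precisely the capability the exploit abuses.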

TensorFlow drops YAML altogether in favor of JSON

After the vulnerability was reported, TensorFlow decided to drop YAML support altogether and use JSON deserialization instead.

"Given that YAML format support requires a significant amount of work, we have removed it for now," say the project maintainers in the same advisory.

"The methods `Model.to_yaml()` and `keras.models.model_from_yaml` have been replaced to raise a `RuntimeError` as they can be abused to cause arbitrary code execution," also explain the release notes associated with the fix.

"It is recommended to use JSON serialization instead of YAML, or, a better alternative, serialize to H5."
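To illustrate why JSON is the less risky format at the parsing step, here is a minimal sketch using only Python's standard "json" module. The `architecture` dictionary is a hypothetical, simplified stand-in and not the real Keras schema; in Keras itself, the corresponding calls are `Model.to_json()` and `keras.models.model_from_json`.

```python
import json

# Hypothetical, simplified architecture description (not the real Keras schema).
architecture = {
    "class_name": "Sequential",
    "layers": [{"units": 64, "activation": "relu"}],
}

# Round trip: JSON parsing only ever yields dicts, lists, strings,
# numbers, booleans, and None.
serialized = json.dumps(architecture)
restored = json.loads(serialized)

# JSON has no tag syntax, so a YAML-style tag is simply a parse error.
try:
    json.loads("!!python/name:exec")
    parsed_payload = True
except json.JSONDecodeError:
    parsed_payload = False

print(restored == architecture, parsed_payload)
```

This only shows that the JSON parser has no mechanism for naming arbitrary Python callables; what an application does with the parsed data afterwards still deserves the usual scrutiny.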

It is worth noting that TensorFlow is neither the first nor the only project found to be using YAML's unsafe_load. The function's use is rather prevalent in Python projects.

GitHub shows thousands of search results referencing the function, with some developers proposing improvements:

[Image: Many repos on GitHub have used, and still use, YAML's unsafe_load function (GitHub)]
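Developers who want to check their own code for the same pattern could start with something like the sketch below. It uses Python's standard "ast" module to flag calls of the form `<something>.unsafe_load(...)`; the function name `find_unsafe_yaml_calls` and the sample source are hypothetical, and the check is deliberately naive (it would miss, for example, `yaml.load` called with an unsafe loader).

```python
import ast

def find_unsafe_yaml_calls(source):
    """Return sorted line numbers of calls that look like <x>.unsafe_load(...)."""
    hits = []
    # ast.parse only builds a syntax tree; the scanned code is never executed.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "unsafe_load"
        ):
            hits.append(node.lineno)
    return sorted(hits)

# Hypothetical sample source to scan.
sample = (
    "import yaml\n"
    "cfg = yaml.unsafe_load(open('c.yml'))\n"
    "ok = yaml.safe_load('a: 1')\n"
)
print(find_unsafe_yaml_calls(sample))  # prints [2]
```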

The fix for CVE-2021-37678 is expected to arrive in TensorFlow version 2.6.0, and will also be backported to prior versions 2.5.1, 2.4.3, and 2.3.4, the maintainers state.
