OpenAI's ChatGPT platform provides a great degree of access to the LLM's sandbox, allowing you to upload programs and files, execute commands, and browse the sandbox's file structure.

The ChatGPT sandbox is an isolated environment that allows users to interact with the it securely while being walled off from other users and the host servers.

It does this by restricting access to sensitive files and folders, blocking access to the internet, and attempting to restrict commands that can be used to exploit flaws or potentially break out of the sandbox.

Marco Figueroa of Mozilla's 0-day investigative network, 0DIN, discovered that it's possible to get extensive access to the sandbox, including the ability to upload and execute Python scripts and download the LLM's playbook. 

In a report shared exclusively with BleepingComputer before publication, Figueroa demonstrates five flaws, which he reported responsibly to OpenAI. The AI firm only showed interest in one of them and didn't provide any plans to restrict access further.

Exploring the ChatGPT sandbox

While working on a Python project in ChatGPT, Figueroa received a "directory not found" error, which led him to discover how much a ChatGPT user can interact with the sandbox.

Soon, it became clear that the environment allowed a great deal of access to the sandbox, letting you upload and download files, list files and folders, upload programs and execute them, execute Linux commands, and output files stored within the sandbox.

Using commands, such as 'ls' or 'list files', the researcher was able to get a listing of all directories of the underlying sandbox filesystem, including the '/home/sandbox/.openai_internal/,' which contained configuration and set up information.

Listing files and folders in the ChatGPT sandbox
Source: Marco Figueroa

Next, he experimented with file management tasks, discovering that he was able to upload files to the /mnt/data folder as well as download files from any folder that was accessible.

It should be noted that in BleepingComputer's experiments, the sandbox does not provide access to specific sensitive folders and files, such as the /root folder and various files, like /etc/shadow.

Much of this access to the ChatGPT sandbox has already been disclosed in the past, with other researchers finding similar ways to explore it.

However, the researcher found he could also upload custom Python scripts and execute them within the sandbox. For example, Figueroa uploaded a simple script that outputs the text "Hello, World!" and executed it, with the output appearing on the screen.

Executing Python code on the sandbox
Source: Figueroa

BleepingComputer also tested this ability by uploading a Python script that recursively searched for all text files in the sandbox.

Due to legal reasons, the researcher says he was unable to upload "malicious" scripts that could be used to try and escape the sandbox or perform more malicious behavior.​

It should be noted that while all of the above was possible, all actions were confined within the boundaries of the sandbox, so the environment appears properly isolated, not allowing an "escape" to the host system.

Figueroa also discovered that he could use prompt engineering to download the ChatGPT "playbook," which governs how the chatbot behaves and responds on the general model or user-created applets.

The researcher says that access to the playbook offers transparency and builds trust with its users as it illustrates how answers are created, it could also be used to reveal information that could bypass guardrails.

"While instructional transparency is beneficial, it could also reveal how a model's responses are structured, potentially allowing users to reverse-engineer guardrails or inject malicious prompts," explains Figueroa.

"Models configured with confidential instructions or sensitive data could face risks if users exploit access to gather proprietary configurations or insights," continued the researcher.

Accessing the ChatGPT playbook
Source: Figueroa

Vulnerability or design choice?

While Figueroa demonstrates that interacting with ChatGPT's internal environment is possible, no direct safety or data privacy concerns arise from these interactions.

OpenAI's sandbox appears adequately secured, and all actions are restricted to the sandbox environment.

That being said, the possibility of interacting with the sandbox could be the result of a design choice by OpenAI.

This, however, is unlikely to be intentional, as allowing these interactions could create functional problems for users, as the moving of files could corrupt the sandbox.

Moreover, accessing configuration details could enable malicious actors to better understand how the AI tool works and how to bypass defenses to make it generate dangerous content.

The "playbook" includes the model's core instructions and any customized rules embedded within it, including proprietary details and security-related guidelines, potentially opening a vector for reverse-engineering or targeted attacks.

BleepingComputer contacted OpenAI on Tuesday to comment on these findings, and a spokesperson told us they're looking into the issues.

Related Articles:

OpenAI confirms threat actors use ChatGPT to write malware

OpenAI's new ChatGPT Search Chrome extension feels like a search hijacker

ChatGPT-4o can be used for autonomous voice-based scams

New Google Pixel AI feature analyzes phone conversations for scams

Google says “Enhanced protection” feature in Chrome now uses AI