Apache Parquet

Canary Exploit tool for CVE-2025-30065 Apache Parquet Avro Vulnerability

Investigating a schema parsing concern in the parquet-avro module of Apache Parquet Java.
May 05, 2025

Introduction

On April 1st, 2025, CVE-2025-30065 was published. Rumors of a very high severity security issue with Apache Parquet had been swirling on various platforms for several days beforehand, leading to much consternation within the IT community.

F5 began receiving calls from worried customers asking questions about this vulnerability in their own systems as early as March 29th, three days before it was publicly disclosed. At this time very little was known about the issue, only that it was possibly very serious.

As it turned out, CVE-2025-30065 was issued as a CVSS 10.0 (Critical) vulnerability in Apache Parquet Java. Patches were immediately issued, customers were able to assess their exposure, and the attention seen previously began to wane.

We decided to take a closer look at this issue, because PoCs in circulation either did not work or appeared to us to be of little offensive utility.

  • CVE-2025-30065 is a CVSS score 10 (Critical) vulnerability in the Apache Parquet parquet-avro Maven module.
  • Prior to and just after the announcement of this CVE, rumors of its impact caused a great deal of concern, with many assuming that this was a deserialization vulnerability.
  • The vulnerability is somewhat difficult to trigger and only allows for arbitrary class loading and arbitrary class constructors with a single String parameter, which has limited utility to attackers.
  • F5 Labs has developed an easy-to-use tool that generates a “canary exploit” Parquet file, which can be used to test for this vulnerability and to confirm that a system has been patched and is not using a vulnerable configuration.

CVE-2025-30065 Canary Exploit

F5 Labs has created a tool that generates a Parquet/Avro file that will trigger object instantiation of a class that comes with Java (javax.swing.JEditorPane).1 Instantiating javax.swing.JEditorPane with a single String argument has the side effect of treating the String as a URL and making an HTTP GET request. By registering a canary URL and using that as the target URL, our tool allows for easy testing of the vulnerability, as well as assurance that it has been fixed by applying patches and proper configuration.
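For lab testing without an external canary service, the listening side of this idea can be sketched with the HTTP server bundled in the JDK. This is only an illustration and not part of the F5 Labs tool: the class name LocalCanary, the /canary path, and the hit counter are all hypothetical, and a real test would normally use a canary URL reachable from the system under test.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical local stand-in for a canary URL; not part of the F5 Labs tool.
final class LocalCanary implements AutoCloseable {
    private final HttpServer server;
    private final AtomicInteger hits = new AtomicInteger();

    LocalCanary() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/canary", exchange -> {
            hits.incrementAndGet();                // record the request
            exchange.sendResponseHeaders(204, -1); // reply with no body
            exchange.close();
        });
        server.start();
    }

    // URL that a canary payload would be pointed at.
    String url() {
        return "http://127.0.0.1:" + server.getAddress().getPort() + "/canary";
    }

    // Number of requests seen so far.
    int hits() {
        return hits.get();
    }

    @Override
    public void close() {
        server.stop(0);
    }

    public static void main(String[] args) throws Exception {
        try (LocalCanary canary = new LocalCanary()) {
            // Simulate the side effect: any HTTP GET to the canary URL counts as a hit.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(canary.url()).toURL().openConnection();
            conn.getResponseCode();
            System.out.println(canary.url() + " hits=" + canary.hits());
        }
    }
}
```

Given the behavior described above, a vulnerable reader that parses a canary file whose target is canary.url() would be expected to register a hit, while a patched and properly configured reader would not.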

You can find this “canary exploit” tool on our GitHub https://github.com/F5-Labs/parquet-canary-exploit-rce-poc-CVE-2025-30065. We have provided setup instructions that should work for Linux, Windows, and Mac.

Credit for the internals of our PoC goes to Mouad Kondah.2 Their write-up dated 2025-04-07 discusses this CVE and their PoC.3

We developed this tool because we at F5 Labs believe that tools which allow developers and security staff to quickly and robustly determine whether their code is vulnerable are of the greatest practical importance, allowing for a quick response and minimizing the disruption caused by these sorts of “critical” security issues. This is especially true in complex environments where a vulnerable library may live deep within a constellation of services that could be obscure to developers and security engineers. Tracing these sorts of dependencies is time-consuming and error-prone, and can lead to a lot of work only to find out that the vulnerability does not apply to a given environment. Tools such as this allow developers and security engineers to quickly assess whether further investigation is needed, and at what priority.

CVE-2025-30065 Timeline

This vulnerability is in the Apache Parquet Java library, specifically within the parquet-avro Maven module, and is due to unrestricted Java class references for java.lang.String coercion in versions 1.15.0 and earlier. Looking at the change logs, however, the details seem much less straightforward.

Apache Avro Improvement

In the separate Apache Avro project, an issue was filed on May 2nd 2024.4 This issue was titled "Restrict trusted packages in ReflectData and SpecificData" and was filed by Jean-Baptiste Onofré. It suggested that the lack of an allow list for these two packages would allow malicious payloads to be marshalled or unmarshalled.

The proposed solution was to set up an allow list, limiting allowed packages and using the system property org.apache.avro.TRUSTED_PACKAGES to list and change this behavior. It also suggested allowing a wildcard setting of org.apache.avro.TRUSTED_PACKAGES=* to allow all packages.
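As a rough illustration of the allow-list idea, the sketch below checks a class name against a comma-separated list of trusted packages, with * acting as the wildcard. It is not Avro's actual implementation. The class TrustedPackageCheck, the method isTrusted, the example default value, and the choice to also trust subpackages are all assumptions made for illustration; only the property name comes from the issue described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Apache Avro: illustrates a package allow list.
final class TrustedPackageCheck {
    // Property name taken from the Avro issue; the parsing below is an assumption.
    static final String PROPERTY = "org.apache.avro.TRUSTED_PACKAGES";

    // Returns true if className belongs to one of the comma-separated packages
    // (or one of their subpackages), or if the list contains the wildcard "*".
    static boolean isTrusted(String className, String trustedPackages) {
        if (trustedPackages == null || trustedPackages.trim().isEmpty()) {
            return false; // nothing is trusted when the list is empty
        }
        List<String> packages = Arrays.stream(trustedPackages.split(","))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.toList());
        if (packages.contains("*")) {
            return true; // wildcard: every package is allowed
        }
        int lastDot = className.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : className.substring(0, lastDot);
        for (String allowed : packages) {
            if (pkg.equals(allowed) || pkg.startsWith(allowed + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "java.lang,java.math" is only an example value for this sketch.
        String configured = System.getProperty(PROPERTY, "java.lang,java.math");
        System.out.println(isTrusted("java.math.BigDecimal", configured));
        System.out.println(isTrusted("javax.swing.JEditorPane", configured));
    }
}
```

The point of the design is that the decision is made on the package name before any class is loaded or instantiated, so an unexpected class is rejected without its constructor ever running.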

After some discussion, the change was eventually included in Apache Avro release 1.12.0 on August 5th, 2024.5

This was roughly eight months before the CVE was published, and to date no CVE has been issued for this feature enhancement in the Apache Avro project.

Adding the Improvement to Apache Parquet

On March 5th, 2025, user wgtmac opened an issue for the parquet-avro module used by Apache Parquet Java, suggesting that the same enhancement be added to the parquet-avro library as was added to the Apache Avro library. This change was completed March 7th, and incorporated into Apache Parquet version 1.15.1, which was released and announced on March 16th, 2025.

This was more than two weeks before the CVE was announced.

Sometime around March 29th, rumors of a vulnerability in Parquet began circulating, and many organizations were very concerned, but there wasn't any definitive information at this point available to most.

On April 1st, Gang Wu of the Apache Project sent a message to the Openwall oss-security mailing list announcing the CVE, listing vulnerable versions, and crediting Keyi Li of Amazon as the finder.6

Figure 1: A screenshot of the announcement of CVE-2025-30065 on the oss-security mailing list

From Improvement to Vulnerability?

It’s very important to understand exactly how this all happened.

Step 1. A developer for Apache Avro created a feature to add an allow list, controlling what Java packages can be used in Avro deserializations, citing the possibility that deserializing attacker-controlled Avro content could pose a risk to the host operating system. This was, essentially, a security improvement, and a means of allowing users to harden their use of Apache Avro.

Step 2. Eight months later, a developer from the Apache Parquet project added the same feature to the Apache Parquet Avro Maven module, again as a security improvement.

Step 3. Sometime during the next few weeks, a report of a vulnerability was submitted to the Apache Foundation, which then resulted in a CVE with a CVSS score of 10, apparently for the general risk that the previous two items were attempting to address. This risk is inherent to deserialization, and in this case only allows for arbitrary class loading and arbitrary class constructors with a single String parameter.

This is an atypical path to a CVE, to be sure. Whether a CVE is issued, and the CVSS score it is given, are entirely in the hands of the Apache Foundation as the CNA for these projects. Still, it does seem strange that a security enhancement resulted in a CVE, and inconsistent that Apache Parquet issued a CVE but Apache Avro did not, for essentially the same behavior and the same security improvement.

Avro and Parquet File Formats

To understand how this vulnerability works in detail, we should spend a little bit of time understanding the file formats that are involved.

Parquet is a column-oriented data storage format which is used by the Apache Hadoop ecosystem and hundreds of open-source projects.7 It provides, through modules such as Avro, data encoding and compression and is widely used in data science and big data pipelines.

Avro is a row-oriented serialization format which is also used by Hadoop and thousands of other projects.8 Avro container object files allow for the efficient serialization of data, along with a strictly enforced schema.

Most importantly for us, we can embed Avro container objects in Parquet files, and this is where the issue emerges. Reading a Parquet file with an Avro schema using the parquet-avro module leads to the instantiation of Java objects.

Types of Java Deserialization Bugs

Object serialization and deserialization are useful language features that allow for program state to be saved and loaded later. This process is not, in and of itself, a security issue, but when attacker-controlled data is loaded, and the deserialization of arbitrarily complex graphs of Java objects is supported, deserialization bugs can occur, and can lead directly to unanticipated code execution.

In Java, the most common form of this occurs when the core Java classes ObjectInputStream and ObjectOutputStream are used improperly, allowing malicious actors to chain together Java objects known as gadgets to accomplish their goals, up to and including remote code execution. A great resource for these gadgets is https://github.com/frohoff/ysoserial.

In CVE-2025-30065, however, this is not the case. Instead, malicious Avro content read by an out-of-the-box AvroParquetReader is constrained to primitive data types and collections. However, alternate representations of String objects are allowed. These are supported by passing a String as the single argument to the constructor of an arbitrary Java class.

This means that in this specific CVE, the malicious Avro content can only instantiate objects from classes that are already in the classpath of the target and which can take a single String argument in their constructor.

Therefore, the attacker must rely on there being classes that they can call which will have side effects that are useful to them.

This dramatically limits what the attacker can accomplish, unless they can control the classpath files of the target or exploit the behavior of a useful deserialization gadget already on the classpath.
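To make this constraint concrete, the sketch below uses plain Java reflection to do what the malicious content can cause: load a named class from the classpath and call its public constructor that takes a single String. This is an illustration of the primitive only, not the parquet-avro code path; the class StringCtorDemo and the method instantiate are hypothetical names.

```java
import java.lang.reflect.Constructor;

// Hypothetical illustration of the primitive; this is not parquet-avro code.
final class StringCtorDemo {
    // Loads className from the current classpath and invokes its public
    // single-String constructor with the supplied value.
    static Object instantiate(String className, String value)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor = clazz.getConstructor(String.class);
        return ctor.newInstance(value);
    }

    public static void main(String[] args) throws Exception {
        // A harmless class with a String constructor: no useful side effect.
        Object harmless = instantiate("java.math.BigDecimal", "12.50");
        System.out.println(harmless.getClass().getName() + " -> " + harmless);

        // A class without a public String constructor cannot be reached this way.
        try {
            instantiate("java.lang.Runtime", "ignored");
        } catch (NoSuchMethodException e) {
            System.out.println("no public String constructor: " + e.getMessage());
        }
    }
}
```

Anything an attacker gains therefore has to come from whatever the chosen constructor does with its String argument, which is why classes with network or file side effects are the interesting ones.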

Anecdotally, Java vulnerabilities also attract less adversarial attention: “[if it’s Java] that’s enough to put off some percentage of hackers” said Adam Boileau of Risky Business last week.9

All these factors make exploitation in the wild sound improbable, and if someone can modify your classpath files you already have much larger problems to deal with than this CVE.

Likelihood of Exploitation

Various exploitation scenarios for this CVE are possible, but all require that a malicious Parquet/Avro file be placed into an environment which will use the Apache Parquet Avro module to parse it. If you use Apache Parquet Java to parse Parquet files that include embedded Avro, then you should investigate patching.

Nevertheless, this is somewhat of a high bar for attackers. While Parquet and Avro are used widely, this issue requires a specific set of circumstances that isn’t all that likely in general. Even then, this CVE only allows attackers to trigger the instantiation of a Java object which then must have a side effect that is useful for the attacker. As noted above, this also seems somewhat unlikely to us.

That said, every environment is different. Some organizations may need to process Parquet files from untrusted sources, or even from trusted third parties that themselves may be compromised. We can speculate a number of attack scenarios:

An attacker might provide a bogus dataset packaged as Parquet and post it to a public repository of datasets, using a “watering hole” type approach to get victims to download and parse their malicious payload.

Attackers might also send the file to a wide audience in a social-engineering-style attack, urging recipients to download and use a malicious Parquet file under some pretext.

They may even, given enough information, take a spear-phishing approach and target specific developers identified to possibly be running a vulnerable stack.

This still would require that the target have exploitable gadgets included in their classpath. It might be reassuring to think that everyone is up to date and only uses modern libraries, but we think it’s reasonable to assume that this is not always the case.

While it may seem that we are downplaying the likelihood and impact of this CVE, it is important to note that Parquet is ubiquitous and is used in a lot of AI and ML pipelines. Checking your use of Parquet, and what tools you are using to parse it, is recommended.

It’s also important to note once again that serializing and deserializing objects is what Parquet is designed to do. This is its expected behavior and not a bug. The “patch” is the addition of an allow list which gives developers control over which Java packages are allowed to be used in serialization and deserialization operations.

Conclusion

CVE-2025-30065 is a critical severity vulnerability in the Apache Parquet Java parquet-avro Maven module with a CVSS score of 10. While originally feared to be a significant remote code execution (RCE) risk and causing widespread concern, our analysis has shown that exploiting this vulnerability is difficult, provides negligible value to attackers, and has limited impact. The issue stems from deserialization processes in Avro files embedded in Parquet files, allowing attackers constrained instantiation of Java objects from classes in the target’s existing classpath. However, it does not allow full attacker-controlled code execution.

The vulnerability exists due to the lack of restricted Java class references for string coercion during deserialization, which was mitigated by the addition of an allow list mechanism in parquet-avro.

F5 Labs developed a canary exploit tool to aid in testing environments for exposure to CVE-2025-30065 and assure patch effectiveness.

Recommendations

To mitigate these types of attacks, consider implementing the following security controls based on your specific circumstances:

  • Consider using the F5 Labs CVE-2025-30065 Canary Exploit tool to assess whether you’re vulnerable.
  • If vulnerable, upgrade to Apache Parquet Java version 1.15.1 and configure the org.apache.parquet.avro.SERIALIZABLE_PACKAGES system property to restrict which packages can be used in deserialization.
  • Additionally, avoid the use of the wildcard setting (*), as this negates the purpose of the allow list.
  • Conduct a thorough dependency review to identify whether older or vulnerable versions of parquet-avro are indirectly included via transitive dependencies.
  • Use the dependency management tools available to you, such as Maven’s ‘dependency:tree’ goal or Gradle’s ‘dependencies’ task, to audit your software's dependency stack.
  • Avoid processing Parquet files from outside your system.
  • Educate staff on the dangers of ingesting untrusted data sources however they may have received them.
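
The allow-list recommendation above can be backed by a small startup check. The sketch below reads the system property and reports whether its value contains the wildcard. It is an assumption-laden illustration, not part of parquet-avro: the class SerializablePackagesGuard and the method usesWildcard are hypothetical, the comma-separated format is assumed, and only the property name comes from the recommendation above.

```java
import java.util.Arrays;

// Hypothetical startup guard; only the property name comes from the text above.
final class SerializablePackagesGuard {
    static final String PROPERTY = "org.apache.parquet.avro.SERIALIZABLE_PACKAGES";

    // Returns true if the comma-separated value contains the bare wildcard "*".
    // A null (unset) value is reported as not using the wildcard.
    static boolean usesWildcard(String value) {
        if (value == null) {
            return false;
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .anyMatch("*"::equals);
    }

    public static void main(String[] args) {
        String configured = System.getProperty(PROPERTY);
        if (usesWildcard(configured)) {
            System.err.println(PROPERTY + " contains '*', which allows every package");
        } else {
            System.out.println(PROPERTY + " = " + configured);
        }
    }
}
```

System properties are typically supplied on the JVM command line, for example -Dorg.apache.parquet.avro.SERIALIZABLE_PACKAGES=com.example.model, where com.example.model is a placeholder for your own package names.
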
Authors & Contributors
Malcolm Heath (Author)
Principal Threat Researcher
Merlyn Albery-Speyer (Author)
Sr Cybersecurity Threat Researcher
Footnotes

1We believe this class was first used as a “deserialization gadget” in 2020 for exploitation of fastjson. See https://nvd.nist.gov/vuln/detail/cve-2020-10969.

2https://www.deep-kondah.com/author/mouad/

3https://www.deep-kondah.com/parquet-under-fire-a-technical-analysis-of-cve-2025-30065/

4https://issues.apache.org/jira/browse/AVRO-3985

5https://github.com/apache/avro/releases/tag/release-1.12.0

6https://www.openwall.com/lists/oss-security/2025/04/01/1

7https://mvnrepository.com/artifact/org.apache.parquet/parquet-avro/usages

8https://mvnrepository.com/artifact/org.apache.avro/avro

9https://www.youtube.com/watch?v=wke0U7WKI5o&t=1710s
