CRITICAL VULNERABILITY

CVE-2025-23298

Remote Code Execution in NVIDIA Merlin Transformers4Rec

CVSS Score: 9.8 (Critical) | CWE-502: Deserialization of Untrusted Data | NVIDIA Merlin

Executive Summary

Discovered by the Trend Micro Zero Day Initiative (ZDI) Threat Hunting Team, CVE-2025-23298 represents a critical vulnerability in the NVIDIA Merlin Transformers4Rec library. The vulnerability stems from unsafe deserialization practices in the model checkpoint loading functionality, specifically the use of Python's pickle module without proper safety controls.

What makes this finding particularly significant is how it highlights endemic security challenges facing the ML/AI ecosystem's reliance on Python's pickle serialization. Despite years of warnings from the security community, this class of vulnerability continues to plague machine learning frameworks.
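The danger of pickle comes from its ability to invoke arbitrary callables during deserialization. A minimal, harmless sketch (using eval on a constant expression to stand in for an attacker's payload) illustrates the mechanism:

```python
import pickle

class Exploit:
    def __reduce__(self):
        # pickle records this (callable, args) pair and calls it on load;
        # an attacker would use os.system or similar instead of the
        # harmless eval() shown here.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # executes eval("6 * 7") during unpickling
print(result)  # 42
```

Note that no Exploit instance is ever constructed on the loading side: simply calling pickle.loads on attacker-controlled bytes is enough to run the embedded callable.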

CVSS Score: 9.8 | Attack Type: Remote Code Execution | Privilege Level: Root

Real-World Impact

[Figure: Real-world impact of CVE-2025-23298, showing remote code execution, privilege escalation, data exfiltration, supply chain attacks, and lateral movement]

Source: Trend Micro Research - Comprehensive impact analysis of CVE-2025-23298

Technical Analysis

About NVIDIA Transformers4Rec

NVIDIA Transformers4Rec is part of the Merlin ecosystem, designed to leverage state-of-the-art transformer architectures for sequential and session-based recommendation tasks. It acts as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with Hugging Face Transformers.

  • Production Usage: widely deployed in e-commerce and content platforms for building recommendation systems
  • Integration: works with NVTabular for preprocessing and Triton Inference Server for deployment
  • Critical Component: an essential part of many ML pipelines in production environments

Affected Systems

Vulnerable Versions

All versions of NVIDIA Merlin Transformers4Rec prior to the security patch are affected by this vulnerability.

Affected Components:
  • load_model_trainer_states_from_checkpoint function
  • Model checkpoint loading functionality
  • PyTorch model state restoration
  • Training resumption mechanisms
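
The vulnerable pattern itself is simple: the checkpoint loader hands the file directly to a pickle-based deserializer. A simplified sketch of that pattern (illustrative only, not the actual Transformers4Rec implementation):

```python
import pickle

def load_checkpoint_unsafe(path):
    """Simplified sketch of the vulnerable pattern: pickle-based
    deserialization of an untrusted file executes any code an attacker
    has embedded in it (via __reduce__ and similar pickle hooks)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Any code path that reaches a call like this with an attacker-supplied file, such as a downloaded or shared checkpoint, is exploitable.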

Deployment Scenarios at Risk:

  • Production ML Pipelines: systems loading checkpoints for inference or continued training
  • Model Serving Infrastructure: Triton Inference Server deployments using Transformers4Rec models
  • Development Environments: data science workstations loading shared or downloaded models
  • Cloud ML Services: cloud-based recommendation systems built on the Merlin ecosystem

Mitigation & Remediation

Recommended Security Practices

1. Apply Security Patch Immediately

Update to the latest version of NVIDIA Merlin Transformers4Rec that includes the security fix. The patch modifies the checkpoint loading function to use safe deserialization methods.

pip install --upgrade transformers4rec

2. Use Safe Loading Parameters

When loading PyTorch models, always pass the weights_only=True parameter (available since PyTorch 1.13, and the default since PyTorch 2.6) so that only tensor data is deserialized, preventing arbitrary code execution:

torch.load(checkpoint_path, weights_only=True)

3. Validate Model Sources

Only load checkpoint files from trusted, verified sources. Implement cryptographic verification (checksums, digital signatures) for all model files before loading. Maintain an allowlist of approved model repositories and sources.
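
A cryptographic check can act as a simple gate in front of any deserialization. A minimal sketch using Python's standard library (the function name and the idea of an out-of-band manifest are illustrative):

```python
import hashlib

def verify_checkpoint(path: str, expected_sha256: str) -> bool:
    """Stream the file through SHA-256 and compare against a hash
    obtained out-of-band (e.g. from a signed release manifest)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Only invoke the model loader when the check passes; reject and quarantine the file otherwise.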

4. Implement Sandboxing

Load untrusted models in isolated, sandboxed environments with restricted permissions. Use containerization (Docker, Kubernetes) with security policies that limit system access. Consider using dedicated model loading services with minimal privileges.

5. Apply Principle of Least Privilege

Run ML services with minimal necessary permissions. Avoid running model loading processes as root or with elevated privileges. Use dedicated service accounts with restricted access to sensitive resources.

6. Monitor and Audit

Implement comprehensive logging for all model loading operations. Monitor for suspicious activities such as unexpected system calls or network connections during model loading. Regularly audit model sources and loading patterns.
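
Logging can be attached at the loading boundary itself. A sketch of an auditing wrapper (the names here are illustrative, not part of the library):

```python
import hashlib
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkpoint-audit")

def audited_load(path, loader):
    """Record provenance (path, size, SHA-256) before delegating to the
    real loader, so every load attempt leaves an auditable trail."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    logger.info("loading checkpoint path=%s size=%d sha256=%s",
                path, os.path.getsize(path), digest)
    return loader(path)
```

In practice, loader would be the framework's safe loading function; correlating these log entries with process and network telemetry helps spot exploitation attempts.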

7. Use Alternative Serialization Formats

Consider migrating to safer serialization formats like SafeTensors, which is designed specifically for ML models and doesn't allow arbitrary code execution:

from safetensors.torch import load_file
model_state = load_file("model.safetensors")

8. Network Segmentation

Isolate ML infrastructure from critical systems and sensitive data. Implement network segmentation to limit the blast radius of a potential compromise. Use firewalls and access controls to restrict communication between ML systems and other infrastructure.

Detection and Response

Organizations should implement detection mechanisms to identify potential exploitation attempts:

  • File Integrity Monitoring: monitor checkpoint files for unexpected modifications or suspicious metadata
  • Process Monitoring: watch for unusual child processes spawned during model loading operations
  • Network Traffic Analysis: detect unexpected network connections initiated during checkpoint loading
  • Behavioral Analysis: identify anomalous system behavior following model loading events

Stay Protected

Keep your ML infrastructure secure by staying informed about the latest vulnerabilities and security best practices. Subscribe to our security alerts and explore our comprehensive resources on AI security.