CRITICAL VULNERABILITY

CVE-2025-23298

Remote Code Execution in NVIDIA Merlin Transformers4Rec

CVSS Score: 9.8 (Critical) | CWE-502: Deserialization of Untrusted Data | NVIDIA Merlin

Executive Summary

Discovered by the Trend Micro Zero Day Initiative (ZDI) Threat Hunting Team, CVE-2025-23298 represents a critical vulnerability in the NVIDIA Merlin Transformers4Rec library. The vulnerability stems from unsafe deserialization practices in the model checkpoint loading functionality, specifically the use of Python's pickle module without proper safety controls.

What makes this finding particularly significant is how it highlights endemic security challenges facing the ML/AI ecosystem's reliance on Python's pickle serialization. Despite years of warnings from the security community, this class of vulnerability continues to plague machine learning frameworks.
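The danger of pickle comes from its ability to invoke arbitrary callables during deserialization. A minimal, harmless sketch (using eval on a constant expression to stand in for an attacker's payload) illustrates the mechanism:

```python
import pickle

class Exploit:
    def __reduce__(self):
        # pickle records this (callable, args) pair and calls it on load;
        # an attacker would use os.system or similar instead of the
        # harmless eval() shown here.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # executes eval("6 * 7") during unpickling
print(result)  # 42
```

Note that no Exploit instance is ever constructed on the loading side: simply calling pickle.loads on attacker-controlled bytes is enough to run the embedded callable.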

CVSS Score: 9.8 | Attack Type: Remote Code Execution | Privilege Level: Root

Real-World Impact

[Figure: Real-world impact of CVE-2025-23298, showing remote code execution, privilege escalation, data exfiltration, supply chain attacks, and lateral movement]

Source: Trend Micro Research - Comprehensive impact analysis of CVE-2025-23298

Technical Analysis

About NVIDIA Transformers4Rec

NVIDIA Transformers4Rec is part of the Merlin ecosystem, designed to leverage state-of-the-art transformer architectures for sequential and session-based recommendation tasks. It acts as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with Hugging Face Transformers.

  • Production Usage: widely deployed in e-commerce and content platforms for building recommendation systems
  • Integration: works with NVTabular for preprocessing and Triton Inference Server for deployment
  • Critical Component: an essential part of many ML pipelines in production environments

Affected Systems

Vulnerable Versions

All versions of NVIDIA Merlin Transformers4Rec prior to the security patch are affected by this vulnerability.

Affected Components:
  • load_model_trainer_states_from_checkpoint function
  • Model checkpoint loading functionality
  • PyTorch model state restoration
  • Training resumption mechanisms
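
The vulnerable pattern itself is simple: the checkpoint loader hands the file directly to a pickle-based deserializer. A simplified sketch of that pattern (illustrative only, not the actual Transformers4Rec implementation):

```python
import pickle

def load_checkpoint_unsafe(path):
    """Simplified sketch of the vulnerable pattern: pickle-based
    deserialization of an untrusted file executes any code an attacker
    has embedded in it (via __reduce__ and similar pickle hooks)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Any code path that reaches a call like this with an attacker-supplied file, such as a downloaded or shared checkpoint, is exploitable.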

Deployment Scenarios at Risk:

  • Production ML Pipelines: systems loading checkpoints for inference or continued training
  • Model Serving Infrastructure: Triton Inference Server deployments using Transformers4Rec models
  • Development Environments: data science workstations loading shared or downloaded models
  • Cloud ML Services: cloud-based recommendation systems built on the Merlin ecosystem

Mitigation & Remediation

Recommended Security Practices

1. Apply Security Patch Immediately

Update to the latest version of NVIDIA Merlin Transformers4Rec that includes the security fix. The patch modifies the checkpoint loading function to use safe deserialization methods.

pip install --upgrade transformers4rec

2. Use Safe Loading Parameters

When loading PyTorch models, always pass the weights_only=True parameter (available since PyTorch 1.13, and the default since PyTorch 2.6) so that only tensor data is deserialized, preventing arbitrary code execution:

torch.load(checkpoint_path, weights_only=True)

3. Validate Model Sources

Only load checkpoint files from trusted, verified sources. Implement cryptographic verification (checksums, digital signatures) for all model files before loading. Maintain an allowlist of approved model repositories and sources.
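
A cryptographic check can act as a simple gate in front of any deserialization. A minimal sketch using Python's standard library (the function name and the idea of an out-of-band manifest are illustrative):

```python
import hashlib

def verify_checkpoint(path: str, expected_sha256: str) -> bool:
    """Stream the file through SHA-256 and compare against a hash
    obtained out-of-band (e.g. from a signed release manifest)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Only invoke the model loader when the check passes; reject and quarantine the file otherwise.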

4. Implement Sandboxing

Load untrusted models in isolated, sandboxed environments with restricted permissions. Use containerization (Docker, Kubernetes) with security policies that limit system access. Consider using dedicated model loading services with minimal privileges.

5. Apply Principle of Least Privilege

Run ML services with minimal necessary permissions. Avoid running model loading processes as root or with elevated privileges. Use dedicated service accounts with restricted access to sensitive resources.

6. Monitor and Audit

Implement comprehensive logging for all model loading operations. Monitor for suspicious activities such as unexpected system calls or network connections during model loading. Regularly audit model sources and loading patterns.
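
Logging can be attached at the loading boundary itself. A sketch of an auditing wrapper (the names here are illustrative, not part of the library):

```python
import hashlib
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkpoint-audit")

def audited_load(path, loader):
    """Record provenance (path, size, SHA-256) before delegating to the
    real loader, so every load attempt leaves an auditable trail."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    logger.info("loading checkpoint path=%s size=%d sha256=%s",
                path, os.path.getsize(path), digest)
    return loader(path)
```

In practice, loader would be the framework's safe loading function; correlating these log entries with process and network telemetry helps spot exploitation attempts.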

7. Use Alternative Serialization Formats

Consider migrating to safer serialization formats like SafeTensors, which is designed specifically for ML models and doesn't allow arbitrary code execution:

from safetensors.torch import load_file
model_state = load_file("model.safetensors")

8. Network Segmentation

Isolate ML infrastructure from critical systems and sensitive data. Implement network segmentation to limit the blast radius of a potential compromise. Use firewalls and access controls to restrict communication between ML systems and other infrastructure.

Detection and Response

Organizations should implement detection mechanisms to identify potential exploitation attempts:

  • File Integrity Monitoring: monitor checkpoint files for unexpected modifications or suspicious metadata
  • Process Monitoring: watch for unusual child processes spawned during model loading operations
  • Network Traffic Analysis: detect unexpected network connections initiated during checkpoint loading
  • Behavioral Analysis: identify anomalous system behavior following model loading events

Stay Protected

Keep your ML infrastructure secure by staying informed about the latest vulnerabilities and security best practices. Subscribe to our security alerts and explore our comprehensive resources on AI security.