12,000+ API Keys and Passwords Exposed in Public AI Training Data

Dwain.B

28 Feb 2025

Major Security Risk as Sensitive Credentials Found in LLM Datasets

Security researchers have discovered more than 12,000 API keys, passwords, and other sensitive credentials embedded in publicly available datasets used to train large language models (LLMs). The exposed data includes AWS root keys, Slack webhooks, and Mailchimp API keys; any of these, if still valid, could let an attacker access the associated accounts or services. Experts warn that AI models trained on such data could inadvertently reproduce insecure coding practices, such as hard-coding credentials, and expose sensitive information. The discovery underscores the ongoing risk of hard-coded secrets in publicly accessible data.
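To make the idea of a hard-coded secret concrete, here is a minimal Python sketch of how credential-like strings can be spotted in text. It is not the method the researchers used. The three regular expressions are simplified assumptions based on the commonly seen shapes of AWS access key IDs, Slack webhook URLs, and Mailchimp API keys, and the names `PATTERNS`, `find_secrets`, and `sample` are hypothetical. The sample uses placeholder values, not real credentials.

```python
import re

# Simplified, illustrative patterns (assumptions). Real secret scanners use
# many more rules and usually verify that a match is a live credential.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_webhook": re.compile(
        r"https://hooks\.slack\.com/services/T[A-Za-z0-9]+/B[A-Za-z0-9]+/[A-Za-z0-9]+"
    ),
    "mailchimp_api_key": re.compile(r"\b[0-9a-f]{32}-us[0-9]{1,2}\b"),
}


def find_secrets(text):
    """Return (kind, matched_text) pairs for each credential-like string in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


# Placeholder values only: a front-end snippet with two hard-coded secrets.
sample = '''
const key = "AKIAIOSFODNN7EXAMPLE";
fetch("https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX");
'''

for kind, value in find_secrets(sample):
    print(kind, value)
```

A pattern match only shows that a string looks like a credential. Deciding whether it is actually exposed would require checking that the key is still valid, which this sketch does not attempt.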

Read more about this discovery on The Hacker News.