
The State of Secrets Sprawl 2025

Information Technology 2025-03-10 GitGuardian 杨框子

TABLE OF CONTENTS

AI-Enhanced Detection: Revealing the Full Scope of Credential Exposure
58% of All Detected Secrets Are Generic
GitHub’s Push Protection: A Promising Initiative, But Not a Silver Bullet
Private Repositories 8 Times More Likely To Contain Secrets
Fastest Growing Services
Mapping the SDLC: Where Leaks Happen
Collaboration Tools: The Overlooked Frontier of Secrets Sprawl
100,000+ Valid Secrets on Docker Hub
Copilot increases secrets incidence rate by 40%
Secrets Managers: Not a Complete Solution
Excessive Permissions Make Secret Leaks More Severe
Bridging the remediation gap
Understanding the Impact: Real-World Risks of Secrets Sprawl
About GitGuardian
Methodology

The State of Secrets Sprawl 2025
DATA ANALYSIS BY GITGUARDIAN

15% of commit authors leaked a secret
70% of valid secrets detected in public repositories in 2022 remain active today
4.6% of all public repositories contain a secret
35% of all private repositories contain hardcoded secrets
38% of incidents in collaboration and project management tools (Slack, Jira or Confluence) were classified as highly critical or urgent, compared to 31% in Source Code Management Systems (SCMs)

From day one, GitGuardian has been committed to protecting developer environments from secrets sprawl, a dedication that has established us as the #1 application on GitHub Marketplace. For over seven years, our real-time scanning of public GitHub events through our Good Samaritan program has enabled us to proactively notify developers when credentials are exposed. In 2024 alone, we sent 1.9 million pro bono alert emails to developers who inadvertently leaked sensitive credentials.

How Leaky Was 2024

Long-lived plaintext credentials have been involved in most breaches over the last several years. When valid credentials, such as API keys, passwords, and authentication tokens, leak, attackers at any skill level can gain initial access or perform rapid lateral movement through systems.
In 2024, we found 23,770,171 new hardcoded secrets added to public GitHub repositories. This figure represents a 25% surge in the total number of secrets from the previous year. This marks a substantial increase in the number of secrets found and continues the disturbing trend: secrets sprawl is steadily worsening over time.

Despite GitHub’s efforts to prevent certain credential leaks during the push stage, which did indeed reduce incidents involving specific secrets (secrets following known patterns for specific services), the platform’s measures have not effectively addressed the growing prevalence of generic secrets. It is within this category that we observed the most significant year-over-year surge in plaintext credentials.

The danger of the continued rise of secrets leakage is very real. Over the past 10 years, stolen credentials have been used in 31% of all breaches, according to Verizon’s 2024 Data Breach Investigations Report. It is an attacker’s favorite way to gain an initial foothold and to move laterally through environments. At the same time, IBM’s Cost of a Data Breach report makes it clear how time-consuming this issue is for the enterprise. Breaches involving stolen or compromised credentials take an average of 292 days to identify and remediate, more than any other attack vector.

AI-Enhanced Detection: Revealing the Full Scope of Credential Exposure

The 2025 State of Secrets Sprawl report marks a significant milestone in secrets detection, unveiling a more comprehensive picture of the secrets sprawl landscape. For the first time, thanks to our innovative machine learning models, such as the one powering False Positive Remover, GitGuardian can now confidently identify and validate more generic secrets.

Historically, GitGuardian took a conservative stance on generic secrets to avoid a large number of potential false positive results.
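The false-positive problem with generic secrets can be illustrated with a deliberately naive sketch. It is not GitGuardian's detection engine, which the report describes as ML-driven. The pattern, the function names (shannon_entropy, find_candidates), the entropy threshold and the sample values are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical pattern: any quoted value assigned to a "secret-looking" name.
# Unlike a service-specific key format, this has no fixed prefix or length.
ASSIGNMENT = re.compile(
    r"(?:password|passwd|secret|token)\s*[=:]\s*[\"']([^\"']+)[\"']",
    re.IGNORECASE,
)


def shannon_entropy(value: str) -> float:
    """Return bits of entropy per character, from character frequencies."""
    if not value:
        return 0.0
    total = len(value)
    counts = {ch: value.count(ch) for ch in set(value)}
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def find_candidates(text: str, min_entropy: float = 0.0) -> list[str]:
    """Return assigned values whose entropy is at least min_entropy."""
    return [
        value
        for value in ASSIGNMENT.findall(text)
        if shannon_entropy(value) >= min_entropy
    ]


# Made-up input: one plausible credential followed by three non-secrets
# (a documentation placeholder, a template variable, a redacted value).
SAMPLE = "\n".join(
    [
        'db_password = "Zq7!fK2#xP9v"',
        'password = "changeme"',
        'password = "${DB_PASSWORD}"',
        'token = "xxxxxxxxxxxx"',
    ]
)

if __name__ == "__main__":
    print(find_candidates(SAMPLE))                   # pattern match only
    print(find_candidates(SAMPLE, min_entropy=3.0))  # with entropy filter
```

With no threshold, the pattern flags all four values, three of which are not secrets. Raising the threshold to 3.0 bits per character drops the placeholder and the redacted value but still keeps the template variable, so a simple filter trades some false positives for the risk of missing weak but real passwords without fully solving either problem.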
Our secrets detection engine was intentionally calibrated for high precision, ensuring that when a secret was flagged, it was almost certainly a real secret. Any doubt meant leaving it out.

Our past focus was concentrated on the most commonly used enterprise-specific secrets, such as API keys and service-specific credentials, but these are just the tip of the iceberg. The true magnitude of the secrets sprawl problem lies in the vast ocean of generic secrets, such as usernames & passwords and unstructured credentials.

As an example, here’s a Base64 basic auth string:

    "Authorization": "Basic aW50ZXJuc2hpcDpjZGk="

Or an example of a database credential:

    connect_to_db(host="136.12.43.86", port=8130, username="root", password="m42ploz2wd")

This ML-driven shift not only enables us to find more secrets but also helps us categorize them much more effectively. In doing so, we ensure they are genuine secrets, strengthening both recall and precision. The result provides a more accurate, holistic understanding of how and where secrets are spreading.

The Department of The Treasury breach

In December 2024, Chinese state-