TABLE OF CONTENTS

Foreword
2023 Map of Leaks
    Industry Leaks
    Secrets Detectors
    Focus: GenAI Secrets Leaks
    Ranking File Extensions by Their Leakiness
What Happens After a Secret Leaks?
    Remediation Efforts
    Revoked secrets
    Zombie Leaks: a Hidden Threat
    DMCA Takedown Notices: a Last Resort to Stop Leaks?
AI for Secrets Detection
    How Good Can LLMs Be at Detecting Secrets?
    Powering Secrets Detection with AI: GitGuardian’s Approach
Are You Sure to Know Where Your Secrets Are?
    Unveiling Secret Exposures with HasMySecretLeaked
Solving Secrets Sprawl
    Awareness & Training
    Combining Secrets Detection & Management
    Preventing Leaks & Breaches
About GitGuardian
Appendix
    Definitions
    Methodology

The State of Secrets Sprawl 2024
DATA ANALYSIS BY GITGUARDIAN

Foreword

It is not a secret. Hard-coded credentials have long been a primary cause of security incidents in the software world. Yet, with the growing complexity of digital supply chains, secrets sprawl is the Achilles’ heel for organizations of all sizes and security postures.

GitGuardian has been at the forefront of identifying and reporting hard-coded secrets for the past four years. Remarkably, the incidence of publicly exposed secrets has quadrupled in this time, with a staggering 12.8 million occurrences detected on GitHub.com in the last year alone, a 28% increase from 2022.

“[In 2023] for the first time, compromised credentials took the top spot in root causes [of attacks].
In the first six months, compromised credentials accounted for 50% of root causes, whereas exploiting a vulnerability came in at 23%.”
Sophos’ 2023 Active Adversary Report

“49% of breaches by external actors involved Use of stolen credentials, while Phishing made up 12% of external attacks. Attackers used the Exploit vulnerability technique in 5% of breaches.”
Verizon’s 2023 Data Breach Investigations Report

The proliferation of 50 million new code repositories on GitHub, a 22% increase from last year, amplifies the risk of both accidental exposures and deliberate malicious acts. This reality underscores the vital need for companies to track and manage the exposure of their sensitive information. Too many remain vulnerable to breaches without the awareness or the means to mitigate them.

Our research sheds light on a concerning trend: 90% of exposed valid secrets remain active for at least five days after the author is notified.

This finding emphasizes a crucial lesson in code security: while detecting vulnerabilities is critical, the real challenge lies in remediation. Security, we believe, must be a shared responsibility across all stages of the Software Development Life Cycle (SDLC), not just the domain of specialized teams. Raising awareness about these seemingly minor lapses is essential for mitigating supply chain risks.

When 507 IT decision-makers were asked, “Have you ever been impacted by, or heard of secrets (API keys, username and passwords, encryption keys, etc.) leaking within your organization?” the responses highlighted widespread concern:

• 75% of respondents reported experiencing a secret leak.
• 60% reported leaks impacting the company or its employees.
• 47% identified “Hard-coded secrets” as key risk points in their software supply chain.
Voice of Practitioners: the State of Secrets in Appsec

As our world becomes increasingly digital, and as secrets continue to underpin the trustworthiness of digital systems, closing the remediation loop is imperative for a safer digital future.

How Leaky Was 2023?

All the metrics featured in this report have been filtered to ensure they accurately depict the current state of secrets sprawl. For a full description of our methodology, please refer to the appendix.

4.6% of active repositories leaked a secret in 2023
Of 67,243,678 active repositories, 3,066,304 leaked a secret.

More than 1 in 10 commit authors leaked a secret
Out of the 14,978,367 distinct authors who contributed, 1,749,398 (11.7%) leaked a secret.

1.8M pro-bono alert emails (+23.5%)
GitGuardian sent 1,854,834 pro-bono alert emails following the detection of an exposed secret for the first time. Out of these, only 33,242 secrets were revoked within 5 days. GitGuardian stops tracking the status of secrets after these 5 days.

2023 Map of Leaks

Industry Leaks

Using its proprietary algorithm, GitGuardian has traced many secret occurrences back to their respective companies. This capability extends to incidents where secrets were leaked outside company-owned repositories. Through this mapping process, GitGuardian provides data on the prevalence of leaks across different industries:

Unsurprisingly, the IT sector, which includes software vendors, accounts for 65.9% of all detected leaks.

Following IT, the education sector is responsible for 20% of the leaks, reflecting the growing digitization and reliance on technology wit