World’s Largest Breach Database

SpyCloud isn’t just a cybersecurity company; we’re a big data company.

Our solutions are backed by the largest repository of stolen credentials and PII in the world, recovered by our researchers early in the ATO timeline. Access to this massive collection of recaptured breach data enables enterprises to quickly identify and take action on exposed accounts, preventing those exposures from progressing to account takeovers.

The SpyCloud Difference

Current, Relevant, Truly Actionable Data

SpyCloud uses Human Intelligence (HUMINT) to quickly recover current breach data within days of the breach occurring. Our unique data cleansing and password cracking process reveals exposed credentials faster and with greater match rates. Not only is our breach database the cleanest, we provide the most data of any provider, with context and perspective to make it immediately actionable.

Timeline of a data breach showing what cybercriminals do with stolen credentials, starting with targeted account takeover attacks on high-value victims. Ultimately, stolen logins end up on the deep and dark web and are used in high-volume credential stuffing attacks.

Current Data Enabled by Early Recovery

SpyCloud’s security researchers recapture breached data earlier in the ATO timeline and share it with customers before it is used to cause harm, typically months or even years before anyone else. In many cases, we are the first to inform the affected victim organizations through our responsible disclosure process.

Relevant Data for Your Domain

SpyCloud collects data from all kinds of breaches, from private/sensitive sources and small web forums to large combo lists. This enriches our breach database with credentials and PII from millions of companies around the world, not just Fortune 1000 enterprises (though we have over 23 billion plaintext credential pairs for those companies alone!).

Truly Actionable Data to Protect Your Enterprise

Over the years, we have refined our proprietary password cracking methods to the point where, at any given time, 90% of all passwords in our database are in plaintext. This makes it simple for customers to identify exposed account matches quickly and automatically, and at a higher rate than with any other provider.

We also quickly cleanse and analyze the data and provide the full context of each record, including the source and breach description, in addition to the actual breached password.
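To illustrate why plaintext passwords raise match rates, here is a minimal sketch of how a customer might check recovered credentials against its own user store. It is not SpyCloud's API or implementation: the `BreachRecord` and `StoredUser` shapes, the field names, and the choice of PBKDF2-HMAC-SHA256 for the customer's password storage are all assumptions made for the example.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class BreachRecord:
    # Hypothetical record shape; field names are assumptions for this sketch.
    email: str
    password: str  # recovered plaintext password
    source: str    # name or description of the breach source


@dataclass
class StoredUser:
    # Hypothetical view of a customer's own user table.
    email: str
    salt: bytes
    password_hash: bytes  # PBKDF2-HMAC-SHA256 of the user's current password


def hash_password(password: str, salt: bytes) -> bytes:
    """Hash a password the way this hypothetical customer stores it."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def find_exposed_users(users, records):
    """Return (email, source) pairs whose exposed password is still in use."""
    by_email = {u.email.strip().lower(): u for u in users}
    exposed = []
    for rec in records:
        user = by_email.get(rec.email.strip().lower())
        if user is None:
            continue  # the record does not belong to one of our accounts
        candidate = hash_password(rec.password, user.salt)
        # Constant-time comparison of the candidate hash with the stored hash.
        if hmac.compare_digest(candidate, user.password_hash):
            exposed.append((user.email, rec.source))
    return exposed
```

Because the exposed password is available in plaintext, it can be re-hashed with each user's own salt, which would not be possible if only a foreign hash had been recovered.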

Data Collection Methods

We collect stolen and leaked artifacts using multiple techniques and from many sources. We acquire the most actionable data using a combination of Human Intelligence (HUMINT) and Applied Research (HUMAN+TECHNOLOGY).

Our team of researchers has been performing this type of tradecraft for years, and its members are among the most capable researchers in this area. Although we also use scanners and automated collection tools, the vast majority of useful data comes from HUMINT, which is our primary focus.

What kind of data does SpyCloud collect?

Since our core business is focused on providing solutions that prevent Account Takeover, we primarily look to acquire leaked or stolen assets in the criminal underground that contain:

  • User credentials: email/username and password
  • Highly enriched PII such as first and last names, addresses, phone numbers, dates of birth, SSNs, and over 200 data types that power fraud investigations

Data Cleansing & Enrichment Process

We collect a massive amount of breach data on a daily basis. After each digital asset is acquired, it is then put through a rigorous quality-control process to determine its value.


Given the different file formats we acquire, parsing is a huge challenge. We have invested heavily in technology that helps simplify this part of the process. Each of the data sources that goes through this phase is human-verified to ensure the fidelity of the information is maintained.
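As a rough illustration of the parsing problem, the sketch below handles only one simple case: combo-list lines that hold an email address and a password separated by a delimiter. The delimiter set and the `parse_line` helper are assumptions for the example, not a description of SpyCloud's parsing technology, and real dumps come in many more formats.

```python
import re

# Delimiters commonly seen between email and password in combo lists
# (an assumption for this sketch).
_DELIMITERS = re.compile(r"[:;|\t,]")


def parse_line(line: str):
    """Split one combo-list line into (email, password), or return None."""
    line = line.strip()
    if "@" not in line:
        return None  # this sketch only handles email-based credentials
    # Look for the first delimiter AFTER the "@" so that delimiter
    # characters inside the password itself are preserved.
    match = _DELIMITERS.search(line, line.index("@"))
    if match is None:
        return None
    email = line[:match.start()]
    password = line[match.end():]
    if not email or not password:
        return None
    return email, password
```

Lines that cannot be parsed return `None` rather than a guess, so they can be set aside for the kind of human verification described above.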


Once the data is parsed, we must classify each field into the correct category (e.g., determine which values are phone numbers and move them to the phone number column). This allows us to analyze and enrich the data in later steps. At this point, we can discard any records that do not contain passwords or high-value PII.
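Field classification of this kind can be sketched with a few regular expressions. The patterns, category names, and helper functions below are illustrative assumptions only; a production classifier would need far more robust rules than these.

```python
import re

# Illustrative patterns only. Order matters: an SSN or an ISO date would
# otherwise also satisfy the looser phone pattern.
_PATTERNS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("ssn", re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    ("date_of_birth", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("phone", re.compile(r"^\+?[\d\s().-]{7,20}$")),
]


def classify_value(value: str) -> str:
    """Return the first category whose pattern matches, else "unknown"."""
    text = value.strip()
    for category, pattern in _PATTERNS:
        if pattern.match(text):
            return category
    return "unknown"


def classify_row(values):
    """Group the raw values of one parsed record into named columns."""
    columns = {}
    for value in values:
        columns.setdefault(classify_value(value), []).append(value.strip())
    return columns
```

Values that match no pattern land in an "unknown" column instead of being forced into a category, which keeps later analysis from relying on a misclassified field.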


We compare each record to the billions of assets already in our system. During this step, we end up discarding ~60% of the files we collect, as most of them are duplicates from past breaches. This process ensures our customers are not inundated with extraneous alerts. Also, when we see that a user has the same credential exposed across multiple unique sources, we increment the “sighting” field to indicate that the same email/password combination was reused.
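The deduplication and sighting logic can be pictured with a small in-memory sketch. The `CredentialStore` class, its method names, and the choice to count sightings as the number of distinct sources are assumptions for illustration, not SpyCloud's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialEntry:
    email: str
    password: str
    sources: set = field(default_factory=set)

    @property
    def sighting(self) -> int:
        # One sighting per distinct source in which the pair appeared.
        return len(self.sources)


class CredentialStore:
    """Hypothetical in-memory store keyed on (normalized email, password)."""

    def __init__(self):
        self._entries = {}

    def ingest(self, email: str, password: str, source: str) -> bool:
        """Add a record; return True only if the credential pair is new."""
        key = (email.strip().lower(), password)
        entry = self._entries.get(key)
        if entry is None:
            self._entries[key] = CredentialEntry(key[0], password, {source})
            return True
        # Known pair: record the source. Adding a source that is already
        # present is a no-op, so exact duplicates do not inflate the count.
        entry.sources.add(source)
        return False

    def sighting(self, email: str, password: str) -> int:
        entry = self._entries.get((email.strip().lower(), password))
        return entry.sighting if entry else 0
```

In this sketch, a caller would raise an alert only when `ingest` returns `True`, while repeat appearances of a known pair update only the sighting count.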


The next important step is to check the authenticity of the breached data source. We validate every password dump using several different techniques to give our customers accurate context about the exposure of their credentials.


The last step is data enrichment, which lets our customers understand how many times these credentials have been seen before, how severe each individual record is, and what additional metadata we can combine from other sources.

We’re confident you’ll get more matches with SpyCloud. Let’s do a match rate test.

Interested in integrating SpyCloud data to enhance your solution?