How Do We Collect Data?
We collect stolen and leaked artifacts from many sources using multiple techniques. The most actionable data comes from Human Intelligence (HUMINT) and Applied Research (HUMAN+TECHNOLOGY).
Our researchers have been practicing this tradecraft for years and are among the most capable in the field. Although we also use scanners and automated collection tools, the vast majority of useful data comes from HUMINT, which remains our primary focus.
How Do We Cleanse the Data?
We collect a massive amount of data every day. Each digital asset we acquire goes through a rigorous quality-control process to determine its value. This process can be summarized in the following steps:
Parsing is a huge challenge given the sheer variety of file formats we acquire. We have invested heavily in technology that simplifies this part of the process, and every data source that goes through this phase is verified by a human to ensure the fidelity of the information is maintained.
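As a rough illustration of one parsing case, the Python sketch below splits "combo list" style lines (an identifier and a password joined by a delimiter) into fields. The delimiter set, the email pattern, and the `parse_line` function are hypothetical assumptions for illustration, not our production parser. Lines that do not parse return `None` so they can be routed to human review rather than guessed at.

```python
import re
from typing import Optional

# Hypothetical delimiters seen in "combo list" style dumps (an assumption).
DELIMITERS = (":", ";", "|", "\t", ",")
# Loose email check; used only to label the identifier, not to validate it.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_line(line: str) -> Optional[dict]:
    """Split one raw dump line into an identifier and a password.

    Returns None when the line has no usable identifier/password pair,
    so it can be sent to a human reviewer instead.
    """
    line = line.rstrip("\r\n")  # keep other whitespace: it may belong to the password
    # Passwords can contain delimiter characters too, so split on whichever
    # delimiter appears earliest in the line.
    found = [(line.find(d), d) for d in DELIMITERS if d in line]
    if not found:
        return None
    _, delim = min(found)
    ident, _, password = line.partition(delim)
    ident = ident.strip()
    if not ident or not password:
        return None
    kind = "email" if EMAIL_RE.match(ident) else "username"
    return {kind: ident, "password": password}
```

For example, `parse_line("bob|p:ss")` splits on the earlier `|` and keeps `p:ss` intact as the password.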
Normalization is another technically challenging part of the process. Once the data is parsed, we must correctly classify each field (e.g., date of birth, first name, password, salt). This lets us analyze and enrich the data in later steps. At this point, we can discard any records that do not contain passwords or high-value PII.
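A minimal Python sketch of normalization, assuming parsed records arrive as dictionaries keyed by raw column headers. The `FIELD_ALIASES` mapping, the `HIGH_VALUE_PII` set, and the `normalize` function are hypothetical; real classification would likely need to handle far more header variants than shown here.

```python
from typing import Optional

# Hypothetical mapping from raw column headers to canonical field names.
FIELD_ALIASES = {
    "email": "email", "e-mail": "email", "mail": "email",
    "user": "username", "username": "username", "login": "username",
    "pass": "password", "password": "password", "pwd": "password",
    "salt": "salt",
    "dob": "date_of_birth", "birthdate": "date_of_birth", "date of birth": "date_of_birth",
    "fname": "first_name", "firstname": "first_name", "first name": "first_name",
    "phone": "phone", "phone number": "phone",
}
# Hypothetical choice of which canonical fields count as high-value PII.
HIGH_VALUE_PII = {"date_of_birth", "phone"}


def normalize(raw: dict) -> Optional[dict]:
    """Map raw fields to canonical names; return None if the record should be discarded."""
    record = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(str(key).strip().lower())
        # Unrecognized columns and empty values are dropped in this sketch.
        if canonical and value not in (None, ""):
            record[canonical] = value
    has_password = "password" in record
    has_pii = any(name in record for name in HIGH_VALUE_PII)
    # Keep only records with a password or high-value PII.
    return record if has_password or has_pii else None
```

In this sketch a record holding only an email and a first name is discarded, while one holding an email and a date of birth is kept.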
We then de-duplicate the data by comparing each record against the billions of assets already in our system. During this step, we end up discarding ~60% of the files we collect, because most of them duplicate data from past breaches. Removing these duplicates keeps our customers from being inundated with extraneous alerts. We also increment a credential's “sighting” field when the same email/password combination appears in multiple unique sources, indicating that the combination was reused.
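The record-level sketch below illustrates de-duplication with a sighting counter, assuming a credential is identified by its lowercased email plus a SHA-256 digest of its password. The `CredentialIndex` class and its in-memory dictionary are hypothetical stand-ins for a real datastore: a pair seen again from the same source is a duplicate to discard, while the same pair from a new unique source increments its sighting count.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CredentialIndex:
    """Hypothetical in-memory stand-in for the store of previously seen credentials."""
    sources: dict = field(default_factory=dict)  # credential key -> set of source IDs

    @staticmethod
    def _key(email: str, password: str) -> tuple:
        # Emails compare case-insensitively; passwords are hashed so this index
        # never holds plaintext (an assumption made for the sketch).
        digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
        return (email.strip().lower(), digest)

    def ingest(self, email: str, password: str, source_id: str) -> str:
        """Classify one incoming record as 'new', 'duplicate', or 'sighting'."""
        key = self._key(email, password)
        seen = self.sources.get(key)
        if seen is None:
            self.sources[key] = {source_id}
            return "new"
        if source_id in seen:
            return "duplicate"  # already recorded from this source: discard, no alert
        seen.add(source_id)
        return "sighting"  # same pair in another unique source: reuse detected

    def sightings(self, email: str, password: str) -> int:
        """Number of unique sources in which this email/password pair appeared."""
        return len(self.sources.get(self._key(email, password), ()))
```

For example, ingesting the same pair from `breach-A` twice and then from `breach-B` yields `new`, `duplicate`, then `sighting`, leaving a sighting count of 2.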
Verifying the authenticity of breached data sources is the next important step. We validate every password dump using several techniques so that our customers get accurate context about the exposure of their credentials.
The final step is to enrich the data so our customers understand how many times these credentials have been seen before, the severity of each individual record, and any additional metadata we can combine from other sources.
What Kind of Data Do We Focus On Collecting?
Since our core business is focused on providing solutions that prevent Account Takeover, we primarily look to acquire leaked or stolen assets in the Underground that contain user credentials (email/username + password) or highly enriched PII.
What Kind of Data Do We NOT Focus On?
We pride ourselves on data that is actionable and relevant to protecting customer and employee account information. In keeping with that mission, we do not focus on acquiring spam lists (long lists of bare email addresses commonly used by spammers) or marketing databases (lists of employee names and email addresses) unless they contain passwords or useful PII.