What’s Inside the Massive Chinese Data Leak


Earlier this month, a security researcher discovered a massive new data leak of more than 4 billion records that appears to contain data on Chinese users. Cybernews reported that this leak may be “the largest single-source leak of Chinese personal data ever identified.”

The database consisted of a variety of data collections, each containing different types of PII and user data on Chinese citizens, including social media account data, financial data, employment data, government records, and vehicle registration data.

After parsing and normalizing the data obtained from this leak, SpyCloud Labs is breaking down exactly what’s in this breach and what we think the purpose of this massive database might be.

The Chinese data leak by the numbers

Security researchers, including our team at SpyCloud Labs, were only able to obtain a copy of a portion of the 4 billion records before the unsecured database appeared to be taken down. The raw data from the collections in this portion of the leak amounted to 1.8 billion records, which our team normalized, parsed, and deduplicated to extract 1.7 billion unique records containing 10.9 billion assets. A full breakdown of this data can be found in the table below.
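To make the normalize-and-deduplicate step concrete, here is a minimal Python sketch of one way it could work. It is an illustration only, not a description of SpyCloud's actual pipeline: the function names are hypothetical, and it assumes that an "asset" means one non-empty field value and that two records are duplicates when their normalized key-value pairs match exactly.

```python
import hashlib
import unicodedata


def normalize_value(value):
    """Unicode-normalize (NFKC), trim, and lowercase a single key or value."""
    return unicodedata.normalize("NFKC", str(value)).strip().lower()


def is_populated(value):
    """Treat None and whitespace-only values as empty."""
    return value is not None and str(value).strip() != ""


def record_fingerprint(record):
    """Hash the sorted, normalized key=value pairs so field order does not matter."""
    parts = sorted(
        f"{normalize_value(key)}={normalize_value(value)}"
        for key, value in record.items()
        if is_populated(value)
    )
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of each distinct normalized record."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


def count_assets(records):
    """Count non-empty field values (assumed definition of an 'asset')."""
    return sum(1 for record in records for value in record.values() if is_populated(value))
```

Hashing a canonical form of each record keeps memory use proportional to the number of unique records rather than their size, which matters at billions of rows; a production pipeline would likely add per-field rules (for example, phone number formatting) on top of this.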

Breakdown of data from the leak

So what do we think this huge Chinese database is?

As Cybernews reported (and we were able to corroborate by analyzing a copy of the data), this database appears to be a central aggregation point that pulls data from multiple sources into one place so it can be queried easily. Whoever parsed and normalized the data after aggregation appears to have done an imprecise job, which is reflected in key-value mismatches in some of the collections (see example in Image 1), as well as typos in the data labels (see example in Image 2).

Additionally, as you can see in the table above, the collections themselves have non-standard naming conventions. Collections with very similar categories of data also include inconsistent data asset types, indicating that this database was compiled from disparate sources.

Image 1: Sample of data from the leaked database showing gender data in the ‘card_id’ field and national ID numbers in the ‘gender’ field.
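A key-value mismatch like the one in Image 1 can be caught by validating each value against the shape its label implies. The sketch below is hypothetical and uses only the two field names from the caption. It assumes 18-character resident ID numbers with the standard weighted mod-11 check character, and an illustrative set of gender values that may not match what the leaked collections actually use.

```python
import re

ID_PATTERN = re.compile(r"^\d{17}[\dXx]$")
# Weights and check characters for the 18-character resident ID checksum.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"
# Assumed gender encodings, for illustration only.
GENDER_VALUES = {"男", "女", "m", "f", "male", "female"}


def check_digit(body17):
    """Compute the check character for the first 17 digits of an ID."""
    total = sum(int(digit) * weight for digit, weight in zip(body17, WEIGHTS))
    return CHECK_CODES[total % 11]


def looks_like_national_id(value):
    """True if the value has the 18-character shape and a valid check character."""
    text = str(value).strip()
    if not ID_PATTERN.match(text):
        return False
    return check_digit(text[:17]) == text[17].upper()


def looks_like_gender(value):
    return str(value).strip().lower() in GENDER_VALUES


def fix_swapped_fields(record, id_key="card_id", gender_key="gender"):
    """Return a copy with the two values swapped back if they are clearly reversed."""
    fixed = dict(record)
    id_value = record.get(id_key, "")
    gender_value = record.get(gender_key, "")
    if looks_like_gender(id_value) and looks_like_national_id(gender_value):
        fixed[id_key], fixed[gender_key] = gender_value, id_value
    return fixed
```

The swap is applied only when both values fail their own label and pass the other one, so records that are merely incomplete or malformed are left untouched rather than "corrected" on a guess.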

Image 2: Sample of data from the leaked database showing an ‘id_cardd’ field, which appears to be a misspelling.
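Label typos like the one in Image 2 are typically cleaned up by mapping each label to the closest entry in a list of expected labels. The following is a minimal sketch using Python's standard `difflib` module. The canonical label list and the 0.85 similarity cutoff are assumptions chosen for illustration, as is the guess that `id_cardd` was meant to be `id_card`.

```python
import difflib

# Hypothetical list of expected labels; a real schema would be much longer.
CANONICAL_LABELS = ("id_card", "gender", "name", "phone")


def canonical_label(label, known=CANONICAL_LABELS, cutoff=0.85):
    """Return the closest known label, or the original label if nothing is close."""
    cleaned = label.strip().lower()
    if cleaned in known:
        return cleaned
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else label


def normalize_labels(record, known=CANONICAL_LABELS, cutoff=0.85):
    """Rename near-miss labels to their canonical form without overwriting data."""
    # Labels that are already canonical are copied first so they always win.
    normalized = {key: value for key, value in record.items() if key in known}
    for key, value in record.items():
        if key in known:
            continue
        target = canonical_label(key, known, cutoff)
        if target in normalized:
            target = key  # collision: keep the original label rather than overwrite
        normalized[target] = value
    return normalized


# Example: the misspelled label is renamed, unknown labels are left alone.
cleaned = normalize_labels({"id_cardd": "example-value", "source": "example"})
```

A high cutoff keeps the matcher conservative, so a label is renamed only when it is nearly identical to an expected one; anything ambiguous is better left for manual review.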

Several factors point to the same conclusion: the types of data available in the various collections; the database’s apparent role as a centralized aggregation point for personal data obtained from a variety of sources and compiled for easier querying; the evidence of a somewhat slapdash parsing and data labeling process; and the fact that this giant repository of data was sitting totally exposed to the internet without any password protection. Taken together, we believe that this was a Chinese cybercriminal’s collection of breached and stolen data, likely serving as a backend for an SGK.

The SGK connection

SGK is an abbreviation of Shègōng kù, which translates to “social engineering library.” SGKs are essentially repositories of leaked and stolen PII created by Chinese-language threat actors, who compile hacked and leaked databases so that PII on Chinese citizens and users can be queried easily. Some are fully public, while others require engaging with an actor to gain the ability to search.

SpyCloud tracks dozens of these SGKs, primarily on Telegram and clearnet websites. Often these SGKs are also marketed alongside premium lookup services, in which corrupt insiders working in Chinese government security or law enforcement agencies, the banking sector, or the technology sector will obtain sensitive records for a higher price.

Image 3: Sample results from a basic SGK query. This SGK interfaces with users via a Telegram bot.

Observing chatter among Chinese users, particularly in some of the cybercriminal communities that we monitor, we found others who share our hypothesis about this database. Chinese-speaking netizens discussed the data leak on forums, Weibo, Telegram, Twitter/X, Reddit, YouTube comments, and Meta Threads, and a Chinese cybersecurity publication ran an article on it. All shared the opinion that this database was likely a dark market SGK database used for “opening boxes.”

“Opening boxes and hanging people” (开盒挂人) is a phrase commonly used in Chinese doxxing communities to describe the act of maliciously disclosing a victim’s personal information in an effort to incite other people to attack and abuse them. SGKs are often marketed as “box opening” services.

Image 4: Messages in a Chinese cybercriminal chat on Telegram speculating that this leak was due to someone not setting a login password for their SGK database.

Image 5: A response to a forum post sharing the Cybernews article about this data breach. The commenter posits that the database belonged to “those who sell opened boxes.”

Image 6: Screenshot of a CN-SEC article about the data leak. The headline translates to “Black market database leaked again, over 4 billion user records exposed.”

Key takeaways about this leak

The severity and extent of the Chinese threat actor ecosystem are often overlooked and underreported in Western media. The glimpse that this latest leak provides offers important and concerning insight into Chinese-language cybercrime TTPs.

At SpyCloud, we focus on giving organizations insights into stolen data to prevent follow-on attacks and disrupt the cycle of global cybercrime. To learn more about what you can do with our recaptured data and analytics, contact us today.
