Earlier this month, a security researcher discovered a massive new data leak of more than 4 billion records that appear to contain data on Chinese users. Cybernews reported that this leak may be “the largest single-source leak of Chinese personal data ever identified.”
The database consisted of multiple data collections, each containing different types of PII and user data on Chinese citizens, including social media account data, financial data, employment data, government records, and vehicle registration data.
After parsing and normalizing the data obtained from this leak, SpyCloud Labs is breaking down exactly what’s in this breach and what we think the purpose of this massive database might be.
The Chinese data leak by the numbers
Security researchers, including our team at SpyCloud Labs, were able to obtain only a portion of the 4 billion records before the unsecured database appeared to be taken down. The raw data in this portion amounted to 1.8 billion records, which our team parsed, normalized, and deduplicated into 1.7 billion unique records containing 10.9 billion assets. A full breakdown of this data can be found in the table below.
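SpyCloud has not published the details of its processing pipeline, so the following is only a minimal Python sketch of what normalizing and deduplicating leaked records can look like. It assumes each raw record is a dictionary of field names to JSON-compatible values, applies hypothetical normalization rules (trim and lowercase keys, trim string values, drop empty fields), and treats each non-empty field as one “asset,” which is an assumption about how assets are counted.

```python
import json
from typing import Iterable


def normalize_record(raw: dict) -> dict:
    """Hypothetical normalization: trim/lowercase keys, trim string values, drop empty fields."""
    normalized = {}
    for key, value in raw.items():
        clean_key = str(key).strip().lower()
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue  # an empty field carries no asset
        normalized[clean_key] = value
    return normalized


def dedupe(records: Iterable[dict]) -> tuple[list[dict], int]:
    """Return the unique normalized records and the total number of assets (non-empty fields)."""
    seen = set()
    unique = []
    asset_count = 0
    for raw in records:
        record = normalize_record(raw)
        # Canonical JSON (sorted keys) gives identical records an identical fingerprint.
        fingerprint = json.dumps(record, sort_keys=True, ensure_ascii=False)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        unique.append(record)
        asset_count += len(record)
    return unique, asset_count


# Usage with made-up sample records: the first two collapse into one after normalization.
raw = [
    {"Name": " Zhang San ", "phone": "13800000000"},
    {"name": "Zhang San", "phone": "13800000000", "email": ""},
    {"name": "Li Si", "phone": "13900000000"},
]
unique, assets = dedupe(raw)
print(len(unique), assets)  # 2 4
```

Keeping full fingerprint strings in memory is fine for a sketch; at billions of records, a real pipeline would more likely store fixed-size hashes or deduplicate inside a database or distributed job.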
So what do we think this huge Chinese database is?
As Cybernews reported (and we were able to corroborate by analyzing a copy of the data), this database appears to be a central aggregation point for data from many different sources, compiled so that all of it can be queried from one place. Whoever parsed and normalized the data after aggregation appears to have done an imprecise job: some collections contain key-value mismatches (see example in Image 1), and some data labels contain typos (see example in Image 2).
Additionally, as you can see in the table above, the collections themselves have non-standard naming conventions. Collections holding very similar categories of data also contain inconsistent data asset types, which indicates that this database was compiled from disparate sources.

Image 1: Sample of data from the leaked database showing gender data in the ‘card_id’ field and national ID numbers in the ‘gender’ field.
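The mismatch in Image 1, with gender values in ‘card_id’ and national ID numbers in ‘gender’, is the kind of error a normalization pass can detect and repair. The following is a minimal Python sketch under stated assumptions, not SpyCloud’s method: it assumes only the ‘card_id’ and ‘gender’ fields are involved, that ID numbers use the 18-character mainland China resident ID format (17 digits plus a check character that may be ‘X’, computed with the ISO 7064 MOD 11-2 scheme), and that gender is stored as one of a hypothetical set of short labels.

```python
import re

# Assumption: 18-character resident ID number (17 ASCII digits + check digit or 'X').
ID_PATTERN = re.compile(r"^[0-9]{17}[0-9X]$")
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # check character indexed by (weighted sum mod 11)
# Hypothetical set of gender labels a source collection might use.
GENDER_LABELS = {"男", "女", "m", "f", "male", "female"}


def looks_like_id(value: str) -> bool:
    """True if value matches the 18-character format and its check character is valid."""
    value = value.strip().upper()
    if not ID_PATTERN.match(value):
        return False
    total = sum(int(d) * w for d, w in zip(value[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == value[17]


def looks_like_gender(value: str) -> bool:
    """True if value is one of the assumed gender labels."""
    return value.strip().lower() in GENDER_LABELS


def fix_swapped_fields(record: dict) -> dict:
    """Swap 'card_id' and 'gender' when each field holds the other's kind of value."""
    card_id = str(record.get("card_id", ""))
    gender = str(record.get("gender", ""))
    if looks_like_gender(card_id) and looks_like_id(gender):
        record = {**record, "card_id": gender, "gender": card_id}
    return record


# Usage with a sample number that passes the checksum.
fixed = fix_swapped_fields({"card_id": "女", "gender": "11010519491231002X"})
print(fixed)  # {'card_id': '11010519491231002X', 'gender': '女'}
```

Requiring both the format match and a valid check character makes it less likely that an unrelated 18-character string gets swapped into ‘card_id’ by mistake.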

Our assessment rests on several factors: the types of data in the various collections; the database’s apparent role as a central aggregation point for personal data from many sources, compiled for easier querying; the evidence of a somewhat slapdash parsing and labeling process; and the fact that this giant repository was sitting exposed to the internet without any password protection. Taken together, these lead us to believe that this was a Chinese cybercriminal’s collection of breached and stolen data, likely serving as the backend for an SGK.
The SGK connection
SGK is an abbreviation of Shègōng kù, which translates to “social engineering library.” SGKs are essentially repositories of leaked and stolen PII created by Chinese-language threat actors; they combine hacked and leaked databases so that PII on Chinese citizens and users can be queried easily. Some are fully public, while others require engaging with an actor to gain the ability to search.
SpyCloud tracks dozens of these SGKs, primarily on Telegram and clearnet websites. SGK bots are often marketed alongside premium lookup services, in which corrupt insiders working in Chinese government security or law enforcement agencies, the banking sector, or the technology sector obtain sensitive records for a higher price.

Image 3: Sample results from a basic SGK query. This SGK interfaces with users via a Telegram bot.
Observing chatter among Chinese users, particularly in some of the cybercriminal communities that we monitor, we found other users who share our hypothesis about this database. Chinese-speaking netizens discussing the leak on forums, Weibo, Telegram, Twitter/X, Reddit, YouTube comments, and Meta Threads, as well as an article in a Chinese cybersecurity publication, all shared the opinion that this database was likely a dark-market SGK database used for “opening boxes.”
“Opening boxes and hanging people” (开盒挂人) is a phrase commonly used in Chinese doxxing communities to describe the act of maliciously disclosing a victim’s personal information in an effort to incite other people to attack and abuse them. SGKs are often marketed as “box opening” services.

Image 4: Messages in a Chinese cybercriminal chat on Telegram speculating that this leak was due to someone not setting a login password for their SGK database.

Image 5: A response to a forum post sharing the Cybernews article about this data breach. The commenter posits that the database belonged to “those who sell opened boxes.”

Key takeaways about this leak
The severity and extent of the Chinese threat actor ecosystem are often overlooked and underreported in Western media. This latest leak offers an important and concerning glimpse into Chinese-language cybercrime TTPs:
- This database contains billions of records from many different sources, including sensitive PII like national ID numbers, financial data, vehicle registration data, and social media user information
- We believe that this database is the backend to an SGK, which is a repository of easily queryable stolen PII on Chinese citizens and users
- This dataset is useful to threat actors seeking to carry out account takeover, financial fraud, or doxxing and harassment
At SpyCloud, we focus on giving organizations insights into stolen data to prevent follow-on attacks and disrupt the cycle of global cybercrime. To learn more about what you can do with our recaptured data and analytics, contact us today.