
Properly Cleaning and Gutting Your Phish: How Cybercriminals Are Vetting Victim Data


At SpyCloud Labs, we frequently deal with the problem of “junk” data – data that is clearly falsified, fabricated, or improperly entered, but is mixed into a database with high quality data. As it turns out, cybercriminals face a similar challenge. Over the past few months, we’ve observed threat actors using some interesting strategies to filter out “junk” data. Most recently, we’ve specifically observed phishing operators using a simple technique to ensure they are collecting legitimate phished data: a phishing gateway page.

This gateway page sits on a separate host from the full phishing site. It checks that user-submitted email addresses are correctly formatted and (optionally) match a phishing targeting list before passing the victim on to input their sensitive information.

By analyzing these gateway pages, it is sometimes possible to obtain the full phishing targeting lists for different phishing campaigns, which can be periodically updated as new phishing attacks are started, and used across multiple phishing sites.

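In this analysis, we break down our observations of this technique, including the validation techniques phishers use, the design of the gateway page itself, and how these insights can help defenders.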

Read on to see how bad actors are leveraging gateway pages to filter out unwanted and junk data from their phishing collections, as well as using them as an obfuscation tool to avoid being tracked by defenders.

Techniques Used in Phishing Attacks to Validate Stolen Data

Advanced email validation

There are multiple versions of this type of page, which appear to have been customized by operators over time, starting as early as July 2024. The versions include a basic regex validation which checks that an email follows a valid email address format, as well as a more advanced validation that checks the email against the targeting list for a phishing campaign.

Sometimes phishing actors use encoding or encryption to hide the URL of the targeting list. In rare cases, we have even observed operators deploy the gateway page with an API to check whether an inputted email address matches a targeted user.

Some pages include a toggle switch to determine how the email address should be validated as shown here:

Image 1: Snippet of code that determines validation method

If the email address fails the configured checks on this initial phishing gateway page, the user is not forwarded to the next phishing page. Different versions of the page have different redirect logic. For example, Image 2 shows a snippet of the redirect logic from one deployment of the page, covering what the page does if the user inputs an undesired email address. Each time a user inputs an email that does not match the target list, they see an error message stating that the email address they entered is invalid and prompting them to try again with a valid email. After four unsuccessful attempts, the user is redirected to a random Wikipedia page.

Image 2: Portion of the redirect logic for a gateway page showing what happens if the user inputs an email address that is not on the targeting list. [Source]
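The retry-then-decoy behavior described above can be modeled in a few lines. This is a minimal Python sketch for analysis purposes, not the operators' code (the observed pages are written in JavaScript). The class name, the next-stage URL, and the exact error wording are hypothetical, and it assumes the redirect fires on the fourth failed attempt.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 4  # the article describes a redirect after four failed attempts
DECOY_URL = "https://en.wikipedia.org/wiki/Special:Random"  # Wikipedia's random-article page
NEXT_STAGE_URL = "https://next-stage.example/"  # hypothetical placeholder


@dataclass
class GatewaySession:
    """Tracks failed submissions for one visitor (hypothetical model)."""

    failed_attempts: int = 0

    def submit(self, email: str, targets: set[str]) -> tuple[str, str]:
        """Return an (action, detail) pair for a single email submission."""
        if email.strip().lower() in targets:
            return ("redirect", NEXT_STAGE_URL)
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_ATTEMPTS:
            # Assumption: the decoy redirect happens on the fourth failure.
            return ("redirect", DECOY_URL)
        return ("error", "The email address you entered is invalid. Please try again.")
```

Modeling the flow this way makes it clear why an analyst who submits arbitrary addresses never sees the real phishing page: only a listed address reaches the next stage.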

When a user inputs an email that meets the required conditions (generally, either conforming to the correct email format or matching a specified targeting list), they are redirected to the next stage in the phishing process. This next stage varies widely depending on which phishers have set up the page. We have observed phishing operators place this gateway page in front of a variety of phishing pages, including custom pages as well as pages that appear to have been set up using popular phishing kits like Tycoon and Mamba (based on indicators such as URL structure, textual content, and behavioral analysis).

PhaaS Kit Page Patterns

Phishing pages set up by popular Phishing as a Service (PhaaS) kits often have recognizable patterns that allow us to identify them and associate them with the kit that was used to set them up.

Sekoia researchers noted in their blog on Mamba 2FA that Mamba phishing pages share a consistent structure, and Validin describes Tycoon pages as having a specific structure and motivational text.

Some phishers also set up additional filtering logic in redirects between the gateway page and the main phishing page. In one example, we observed phishers set up IP-based filtering so that even if you passed the email check on the gateway page, some IP addresses were blocked from accessing the main phishing page and were instead redirected to a random Wikipedia page.

IP-based filtering might be based on an IP’s geolocation so that only victims coming from targeted countries get through to the main phishing page, or to block known datacenter IP address ranges which are often used by bots, internet scanning services, or security researchers.
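A filter of this kind could look roughly like the sketch below. It is an illustration only: the blocked networks are reserved documentation ranges standing in for real datacenter ranges, the allowed-country set is hypothetical, and the country code is passed in by the caller because a real geolocation lookup would need an external database.

```python
import ipaddress

# Stand-ins for datacenter ranges (these are reserved documentation networks).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
ALLOWED_COUNTRIES = {"US", "GB"}  # hypothetical targeting choice


def allow_visitor(ip: str, country_code: str) -> bool:
    """Return True if the visitor would be let through to the main page."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False  # looks like a scanner, bot, or researcher range
    return country_code.upper() in ALLOWED_COUNTRIES
```

For defenders, the practical consequence is that a scan from cloud infrastructure may only ever see the decoy redirect, so a clean result from such a scan does not show that the page is benign.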

Basic validation

The gateway page appears to contain a basic email validation step that checks whether an email address that is input on an initial phishing landing page follows the standard email format.

In the example in Image 3, the gateway page uses a regular expression written in JavaScript to determine whether the user-inputted data follows the correct format of an email address. If the validateEmail function returns true, the user is redirected to the phishing page.

Image 3: Code snippet showing the source code of a gateway page that uses basic validation to determine if the user input matches the correct format for an email address. [Source]
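For readers without access to the screenshot, a format-only check of this kind can be approximated as follows. This is a Python sketch of the general technique, not the pattern from the observed page: the regular expression is an assumed, deliberately loose one, and the function name simply mirrors validateEmail.

```python
import re

# Assumed pattern: something@something.something with no spaces or extra "@".
EMAIL_FORMAT = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")


def validate_email(value: str) -> bool:
    """Return True if the input merely looks like an email address."""
    return EMAIL_FORMAT.fullmatch(value.strip()) is not None
```

A check like this only rejects obviously malformed input. Any well-formed address passes, which is why some operators add list-based validation on top.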

Validation against email lists

Bad actors also use this gateway page to validate an inputted email against a target list. The target list is always linked somewhere in the source code for the gateway page. Most commonly, we have seen lists hosted as external text files that simply contain a newline-separated list of email addresses. We have observed phishing operators update these text files periodically with new data as they send out additional phishing emails.

Image 4: Sample email list used by one of the phishing gateway pages to validate that the user-inputted email matches the phishing email targeting list.
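The list check itself is simple. The sketch below assumes the newline-separated text format described above and assumes case-insensitive matching, which the source does not confirm. The function names are hypothetical.

```python
def parse_target_list(raw_text: str) -> set[str]:
    """Turn a newline-separated list into a set of normalized addresses."""
    return {line.strip().lower() for line in raw_text.splitlines() if line.strip()}


def is_targeted(email: str, targets: set[str]) -> bool:
    """Return True if the submitted address appears in the targeting list."""
    # Assumption: comparison ignores case and surrounding whitespace.
    return email.strip().lower() in targets
```

For example, `is_targeted("Victim@Example.com", parse_target_list("victim@example.com\nother@example.org\n"))` evaluates to `True` under these assumptions.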

In many cases, we found that the targeting lists for the gateway pages could be easily downloaded from the URLs exposed in the phishing pages’ source code, providing a full list of target email addresses for the associated campaign. SpyCloud has recaptured 40 unique phishing campaign targeting lists from these gateway pages containing over 930,000 targeted email addresses from over 250,000 domains.
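Once a list like this has been recovered, a defender might first summarize it by domain and pull out the addresses that belong to their own organization. The following Python sketch shows one way to do that. The helper names are hypothetical, and malformed entries are skipped.

```python
from collections import Counter


def domain_counts(emails: list[str]) -> Counter:
    """Count how many listed addresses fall under each email domain."""
    counts: Counter = Counter()
    for email in emails:
        local, sep, domain = email.strip().lower().rpartition("@")
        if sep and local and domain:  # skip entries without a usable "@"
            counts[domain] += 1
    return counts


def addresses_for_domain(emails: list[str], domain: str) -> list[str]:
    """Return the sorted, de-duplicated addresses for a single domain."""
    suffix = "@" + domain.lower()
    return sorted({e.strip().lower() for e in emails if e.strip().lower().endswith(suffix)})
```

The per-domain output gives a concrete set of accounts to prioritize for password resets, monitoring, or user notification.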

Validation against an API

In one instance, we have even observed phishers create a simple custom API to handle validation requests against a list of target email addresses. The phishing gateway page simply submits an API request containing a user-inputted email address and the API responds with a true or false response.

Image 5: API response showing that the dummy email address test@abc.com is not in the phishers’ targeting list.
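The request and response exchange can be sketched as a single handler function. The JSON field names `email` and `valid` are assumptions made for illustration. The source only states that the API answers true or false for a submitted address.

```python
import json


def handle_validation_request(body: str, targets: set[str]) -> str:
    """Return a JSON true/false answer for one submitted email address."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"valid": False})
    # Hypothetical request shape: {"email": "<address>"}
    email = payload.get("email", "") if isinstance(payload, dict) else ""
    is_match = isinstance(email, str) and email.strip().lower() in targets
    return json.dumps({"valid": is_match})
```

Unlike a linked text file, this design keeps the targeting list on the server, so an analyst can test individual addresses but cannot download the full list from the page source.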

The Phishing Gateway Page

These gateway pages use Microsoft branding and have a simple user interface that prompts a user to input an email address. Many of the gateway pages that we have observed deployed in the wild also use various obfuscation and anti-bot techniques designed to make them harder to track and detect as malicious.

We have observed variations of this gateway page deployed by a wide range of phishing actors in front of many kinds of phishing pages, including custom pages and pages set up with multiple PhaaS kits. Because the gateway pages are so simple, many are hosted on easy-to-stand-up services such as GitHub Pages, AWS, Cloudflare Workers, and Glitch.

Phishing landing pages

The phishing gateway pages all appear very similar; they include the Microsoft logo and text prompting the user to enter an email address to “confirm your identity” so that they can access some sort of sensitive document or message. Image 6 contains a generic example of one of these pages. Sometimes the phishers also include an additional image as a background behind the user prompt. In Image 7, you can see what appears to be a blurred-out image of a Microsoft Outlook email inbox, and in Image 8, you can see an image containing branding for Cisco Hypershield, a security product for protecting hyperscale data centers.

Image 6: Fake Microsoft gateway page at mshtsrvcsacsklima-pvvinfra[.]jikida3326[.]workers[.]dev which prompts users to input their email address to access “sensitive information”.

Image 7: Gateway page at d92v3k9lcfrmi[.]cloudfront[.]net which has a blurred out image of a fake Outlook inbox in the background.

Image 8: Gateway page at develect[.]github[.]io/signatureverificationmandatorynowba133e514180048f8188beb509bde40115b25ba1566403e69021/ which has appropriated a background image containing branding for Cisco Hypershield.

Many of the pages also have a function to randomize the page title each time the page is loaded from a short list of options like Secure Your Access, Verify Your Credentials, and Identity Verification Needed.

Image 9: Code snippet from one of the phishing pages that contains a function to choose a random page title from an array of options each time the page is loaded.
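The title rotation amounts to a random choice from a fixed array. The Python sketch below uses the three titles quoted above. The function name and the optional `rng` parameter are hypothetical additions that make the behavior easy to test.

```python
import random
from typing import Optional

PAGE_TITLES = [
    "Secure Your Access",
    "Verify Your Credentials",
    "Identity Verification Needed",
]


def pick_page_title(rng: Optional[random.Random] = None) -> str:
    """Pick one title at random, as the page does on each load."""
    return (rng or random).choice(PAGE_TITLES)
```

Because the title changes between loads, a detection rule keyed to a single title would likely miss most loads. Matching against the whole set of known titles is more robust.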

The initial landing pages only ask the user to input their email address, which lets the operators gate all further input on their email validation logic. As mentioned above, when users input an email address that fails the validation logic, the pages generally re-prompt the user for a correct email address or redirect them to a different page, such as a random Wikipedia page. This way, the phishing operators only receive additional user-inputted data, such as passwords or credit card information, from the specific victims they are interested in targeting.

Obfuscation

These email validation pages are an obfuscation method in and of themselves because they add a step between the phishing email and the final phishing site. We have also observed many phishers use additional obfuscation tactics to make it more difficult for defenders to detect, scrape, and track these pages.

Often the phishers apply some basic obfuscation to hide the link to the email target list. The most basic version of this is simply base64 encoding the link to the email targeting list. We have also seen some phishers encrypt the target list URL but include the decryption key and the decryption function in the source code for the gateway page, as you can see in Image 10. In both cases, it is trivial to obtain the raw URL, either by decoding or decrypting it manually using a tool like CyberChef or by capturing the outgoing request using a tool like Chrome DevTools.

Image 10: Code snippet from a phishing gateway page that shows an encrypted URL as well as a function to decrypt the URL and the relevant decryption key.

We have also observed phishers go a step further and use base64 to encode the script containing most of the email validation logic. Within that encoded function, the URL for the email target list is also base64 encoded a second time.

Image 11: Screenshot of the CyberChef tool decoding a base64 encoded string revealing JavaScript code. The JavaScript code includes a second base64 encoded string which contains the URL for the email targeting list.
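The same two-layer decoding can be scripted instead of done by hand in CyberChef. The sketch below assumes the inner URL appears as a quoted base64 string literal inside the decoded script. The regular expression and function names are illustrative, and real pages may need a different pattern.

```python
import base64
import re

# Assumed shape: a quoted run of base64 characters at least 16 long.
B64_LITERAL = re.compile(r"""["']([A-Za-z0-9+/]{16,}={0,2})["']""")


def decode_b64(text: str) -> str:
    """Decode a base64 string to UTF-8 text."""
    return base64.b64decode(text).decode("utf-8")


def extract_inner_urls(outer_blob: str) -> list[str]:
    """Decode the outer script, then decode any embedded base64 URLs."""
    script = decode_b64(outer_blob)
    urls = []
    for candidate in B64_LITERAL.findall(script):
        try:
            decoded = decode_b64(candidate)
        except ValueError:  # covers bad padding and non-UTF-8 results
            continue
        if decoded.startswith(("http://", "https://")):
            urls.append(decoded)
    return urls
```

Layered encoding of this kind adds no real secrecy, since everything needed to reverse it ships with the page. It only defeats tools that look for plain-text URLs.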

In other instances, we have observed different phishing operators deploy versions of this phishing gateway page where they have used a JavaScript obfuscation tool to make their source code more difficult to understand. Additionally, we have seen versions with additional anti-bot functionality, including requiring users to click a button before they are shown the box to input their email address, mouse movement detection (which we show in Image 12), and requiring a wait time between email validation attempts.

Image 12: Code snippet from a phishing gateway page showing mouse movement detection. [Source]
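Two of these anti-bot checks, mouse movement detection and an enforced wait between attempts, can be modeled together as a small gate. This is a conceptual Python sketch, not the page's JavaScript. The thresholds and names are hypothetical, and the caller supplies the current time so the logic stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MOUSE_EVENTS = 5                 # hypothetical threshold
MIN_SECONDS_BETWEEN_ATTEMPTS = 3.0   # hypothetical cooldown


@dataclass
class InteractionGate:
    """Decides whether a validation attempt should be processed."""

    mouse_events: int = 0
    last_attempt_at: Optional[float] = None

    def record_mouse_move(self) -> None:
        self.mouse_events += 1

    def may_submit(self, now: float) -> bool:
        """Allow an attempt only after enough movement and enough waiting."""
        if self.mouse_events < MIN_MOUSE_EVENTS:
            return False  # no human-like pointer activity seen yet
        if (self.last_attempt_at is not None
                and now - self.last_attempt_at < MIN_SECONDS_BETWEEN_ATTEMPTS):
            return False  # attempts arriving too quickly look automated
        self.last_attempt_at = now
        return True
```

Checks like these mean a headless scraper that submits the form immediately may never trigger validation at all, so automated analysis may need to simulate pointer activity and pacing.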

How These Insights About Phishing Can Help Defenders

If you’re a defender, knowing an employee or consumer email is on a phishing campaign’s email list offers several benefits:

Key Takeaways

This phishing gateway page is a simple but effective tool that we have seen phishing operators use in order to filter out unwanted users, bots, and security scanning tools from reaching their phishing pages.

At SpyCloud, we specialize in recapturing stolen data from phishing campaigns, malware infections, and breaches to empower organizations with proactive defense levers. Phishing remains one of the most pervasive cyber threats to businesses, but understanding its mechanics and impact can help organizations stay ahead. By adopting SpyCloud’s proactive defenses, you can reduce your exposure to identity-based attacks that use phished data.

SpyCloud is at the forefront of phishing mitigation – helping businesses neutralize threats and protect employee and customer identities.
