
Security researchers have found that roughly eight out of ten websites with a search bar leak their visitors’ search terms to online advertisers such as Google.

This practice breaches users’ privacy and exposes sensitive information to a massive network of third parties, who can then use the data to deliver targeted advertisements or track users’ behavior on the web.

This data is shared among the network members or sold to more entities, leaving users unable to estimate their exposure or stop its dissemination.

While some websites may declare this practice in their privacy policy, visitors typically don’t read these documents and assume that what they enter in embedded search fields is isolated from big data brokers.

Crawling 1 million sites

To conduct this research, Norton Labs created a crawler capable of going past “interstitials” or other browsing disruptions and human-confirmation challenges to scan what happens on the top million websites.

The crawler located the search input on the visited sites, searched for the term “JELLYBEANS,” and then collected all network traffic.

Crawler function (Norton Labs)

The idea was to scrutinize the HTTP requests to see whether “JELLYBEANS” appeared anywhere in requests sent to third parties, which it did in 81.3% of the cases.

Each network request comprises the URL; the Referer (referrer) request header, which tells the receiving server the address of the page that initiated the request; and the payload, which typically contains browser fingerprint and clickstream data. Because a search results page often carries the query in its own address, the referrer header can expose the search term even when the requested URL does not.
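The check described above can be sketched in a few lines of Python. This is a minimal illustration, not Norton Labs' actual tooling: the function name `leak_locations`, the example hostnames, and the assumption that each captured request is already known to go to a third party are all hypothetical.

```python
from urllib.parse import unquote_plus

SEARCH_TERM = "JELLYBEANS"


def leak_locations(term, url, headers, body=b""):
    """Return which request parts ('url', 'referrer', 'payload') contain term.

    Matching is case-insensitive and done after URL-decoding, so that
    'q=jelly%62eans' style encodings are still caught.
    """
    needle = term.lower()
    found = set()

    if needle in unquote_plus(url).lower():
        found.add("url")

    # Header names are case-insensitive; on the wire it is spelled "Referer".
    referer = next((v for k, v in headers.items() if k.lower() == "referer"), "")
    if needle in unquote_plus(referer).lower():
        found.add("referrer")

    text = body.decode("utf-8", errors="replace")
    if needle in unquote_plus(text).lower():
        found.add("payload")

    return found


# Hypothetical third-party request captured after searching on shop.example.
example = leak_locations(
    SEARCH_TERM,
    "https://ads.example/collect?q=jellybeans",
    {"Referer": "https://shop.example/search?q=JELLYBEANS"},
    b'{"event": "pageview"}',
)
print(sorted(example))
```

For this made-up request the term is found in the URL and in the referrer header but not in the payload. A site would count as leaking if any of its third-party requests returned a non-empty set.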

Sample request header (left) and payload (right) (Norton Labs)

The results showed that most search term leaks came through the referrer header (75.8%) and the URL (71%), while payloads contained “JELLYBEANS” in 21.2% of the examined cases.

Search term leak results (Norton Labs)

In total, 81.3% of the one million visited sites leaked information to advertisers via at least one of the three inspected locations.

Norton Labs underlines that this figure should be taken as a lower bound, with the actual percentage likely being even higher.

For example, many payloads in the HTTP requests were obfuscated, so the analysis tools couldn’t identify the search string, but it might have been there.
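To illustrate why plain string matching undercounts, the sketch below also looks for the term inside Base64 and hex encoded tokens of a payload. It is an assumption-laden example rather than the researchers' method: `decoded_views` and `term_in_payload` are hypothetical helpers, and only these two common encodings are tried.

```python
import base64
import binascii
import re

# Runs of characters that could be Base64 (standard or URL-safe) or hex.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_-]{8,}")


def decoded_views(payload):
    """Yield the payload itself plus best-effort decodings of its tokens."""
    yield payload
    for token in TOKEN_RE.findall(payload):
        # Base64: restore any stripped padding, accept both alphabets.
        padded = token + "=" * (-len(token) % 4)
        try:
            raw = base64.urlsafe_b64decode(padded)
            yield raw.decode("utf-8", errors="ignore")
        except (binascii.Error, ValueError):
            pass
        # Hex: only succeeds for even-length strings of hex digits.
        try:
            yield bytes.fromhex(token).decode("utf-8", errors="ignore")
        except ValueError:
            pass


def term_in_payload(term, payload):
    """True if term appears in the payload or in any decoded token."""
    needle = term.lower()
    return any(needle in view.lower() for view in decoded_views(payload))


print(term_in_payload("JELLYBEANS", '{"d": "SkVMTFlCRUFOUw=="}'))
```

Payloads that are encrypted, compressed, or wrapped in a custom encoding would still slip past a check like this, which is consistent with the 81.3% figure being a floor rather than an exact measurement.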

As for disclosure of the data-sharing practice in privacy policies, the crawler found that only 13% mentioned “search terms” specifically, while 75% contained a generic statement about “sharing of user information with third parties.”

What users can do

Unfortunately, there’s not much that users can do about this problem besides setting their browsers to block all third-party trackers from loading on the websites they visit.

Also, searching on privacy-centric engines like DuckDuckGo or Brave Search, when possible, would be preferable to using embedded fields.
