I Stole a Wall Street Trick to Solve a Google Trends Data Problem

Published: March 9, 2026 at 03:08 PM EDT
11 min read

Source: Towards Data Science

“Google Trends is a god‑send for market research. If you want to understand interest in a particular term you can just look it up and see how it’s changing over time. This is the kind of data we could do some serious data science with—if the data were actually usable.”

In practice, Google Trends does exactly what its name suggests: it shows trends. The data is heavily normalised and regionalised, making it difficult to extract comparable figures for meaningful modelling—unless you have a few tricks up your sleeve.

Recap

In my previous post, Google Trends Is Misleading You – How to Do Machine Learning with Google Trends Data, we introduced the concept of chaining data across overlapping windows to overcome the granularity limitations of Google Trends.

What’s Next?

Today we’ll explore how to:

  1. Compare chained data across countries and regions
  2. Leverage those comparisons for actionable insights

By the end of this tutorial you’ll be able to:

  • Align time series from different geographic areas despite differing normalisation scales.
  • Build models that incorporate multi‑regional trend signals.
  • Generate real‑world business intelligence from what initially looks like “unusable” data.

Stay tuned—let’s turn Google Trends into a powerful, cross‑regional analytics tool!

Motivation: Comparing Motivation

Google Trends allows Trends data to be downloaded and reused with citation, so I downloaded five years of data on “motivation” and scaled it, giving one dataset per country that shows roughly how that country’s interest in motivation changes over time. My goal was to compare how motivated different countries are, but I have a problem: I don’t know whether a Google Trends score of 100 in the US represents more or fewer searches than a score of 100 in the UK, and my first idea for working that out fell flat. Let me explain.

The initial confusion

When I started this project I wasn’t a connoisseur of Google Trends and I quite naively tried typing in UK motivation, then adding a comparison and typing motivation again and changing the location to the US. I was confused as to why it was the same graph. I thought perhaps the UK and US were too similar, so I added Japan, and it wasn’t until I got to China that I realised the graph was changing all of the lines to be that country’s motivation.

Screenshot showing the same graph being re‑loaded three times
I thought I was changing countries. Turns out I was just reloading the same data three times.
Screenshot by the author. Data source: Google Trends (https://www.google.com/trends).

If I can’t get the countries on the same graph, then I can’t compare them—unless I find a more creative way…

A “genius” idea that didn’t work

My next brainwave came from looking at the US. If you scroll down on Google Trends you’ll see a sub‑region section showing the states in relative terms. The state with the highest search volume is set to 100 and the other states are scaled accordingly.

US search results for “motivation” scaled relatively by state
Screenshot by the author. Data source: Google Trends (https://www.google.com/trends).

I thought I was a genius: set the region to “worldwide”, note the numbers that come out for my countries of interest, and just multiply each country’s results by that number.

But I had misunderstood something fundamental again, and we need a bit of math to explain why.

The Maths Behind Google Trends Normalisation

I grabbed ninety days of data from the US and the UK (starting 24 April) on two separate Google Trends graphs. Both are scaled so the maximum is 100, but the peak occurs on a different day for each country.

US and UK graphs each scaled to 100
When 100 means something different on each side of the Atlantic.
Screenshot by the author. Data source: Google Trends.

Interest over time for “motivation” in the US and UK
Graph of US and UK showing interest over time searching for motivation over 90 days.
Screenshot by the author. Data source: Google Trends.

Because we’re looking at two different countries, the Google Trends scores are in fundamentally different units for each country—just like inches and centimetres are different units of measurement. Unlike inches‑to‑centimetres, we don’t know the conversion factor here.

Assume that on the worldwide graph the US is given a score of 100 and the UK a score of 50. The UK score of 50 means that the UK peak is 50 % of the US peak. At first glance this might suggest a conversion factor of ½, i.e., one US unit = 2 UK units. Let’s see why that’s not true.

Take a non‑peak day, say 30 April, with a hypothetical score of 70 in the US and 80 in the UK.

From the US perspective

\[
70\%\ \text{of US peak} = 0.70 \times 100\ \text{US units} = 0.70 \times 2 \times 100\ \text{UK units (if 1 US unit = 2 UK units)} = 140\ \text{UK units}
\]

From the UK perspective

\[
80\%\ \text{of UK peak} = 0.80 \times 100\ \text{UK units} = 80\ \text{UK units}
\]

Clearly, 140 UK units is not double 80 UK units: on this day the US figure is only 1.75 times the UK figure.

Just because the US peak is twice the UK peak doesn’t mean the US data are twice the UK data for the whole period!
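The point can be checked with a small sketch. The raw counts below are invented for illustration (they are not real search volumes), chosen so that the first day reproduces the hypothetical scores of 70 and 80 used above; `normalise` is a hypothetical helper that mimics scaling each series so its own peak is 100.

```python
def normalise(series):
    """Scale a series so that its own maximum becomes 100."""
    peak = max(series)
    return [100 * value / peak for value in series]

# Hypothetical raw daily search counts (illustrative only, not real data).
us_raw = [1400, 2000, 1600]
uk_raw = [800, 700, 1000]

us_score = normalise(us_raw)  # each country is scaled against its own peak
uk_score = normalise(uk_raw)

# Ratio of the two peaks versus the true ratio on each individual day.
peak_ratio = max(us_raw) / max(uk_raw)
daily_ratio = [us / uk for us, uk in zip(us_raw, uk_raw)]

print("US scores:", us_score)
print("UK scores:", uk_score)
print("Peak ratio:", peak_ratio)
print("Daily ratios:", daily_ratio)
```

In this toy example the peaks differ by a factor of 2, but the day-by-day ratio ranges from 1.6 to about 2.9, so a single peak-to-peak number says little about any particular day.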

So we can’t simply use worldwide ratios to compare different countries. What can we do?

Taking Inspiration from the Stock Market

The underlying science and methodologies we use in data science can translate across domains, so I’ll borrow an approach from finance.

The stock market is a place for buying and selling equity (shares) in a company. Shares represent partial ownership and often come with voting rights or dividends—a small bonus for being an owner. Stocks can be held by individuals, banks, hedge funds, or other private companies.

The stock market can be used as a measure of a country’s economic health. When stocks are going up, we’re in a…


Market Cycles and Economic Health

In theory, the stock market and a country’s economy prosper together. When the market starts to fall we enter a bear market, and things go less well. This is a huge simplification, since markets move according to human behaviour, which is notoriously difficult to understand, but for our purposes the generalisation holds: we can gain an understanding of a country’s economic health from its stock market.

Tracking the Market Through Indices

So how do we track the stock market as a whole? The obvious idea is to add up the prices of all shares on the exchange to get a single number for the market’s value. In reality we use indices.

You’ve probably heard of the S&P 500, an index built from the 500 biggest U.S. companies. It’s used to track the U.S. market because its constituents, being the largest firms, cover about 80 % of total market capitalisation (i.e., value) and are very liquid: their shares trade frequently, so their prices stay current.

Because it covers the majority of the market, the S&P 500 is a good representation of the whole market in a smaller collection of 500 stocks. Why 500?

  • The S&P 500 was introduced in 1957.
  • The number came down to computational power: at the time, a new electronic calculation method made it possible to include 500 stocks in the index. (Before that, indices were even smaller because they were calculated by hand.)

Source: S&P Global – “Where It All Began”

Why Estimate in This Big‑Data World?

Today we have the computational power to calculate the entire market if we want; a few thousand stocks are “small fry” in today’s big‑data world. However, it isn’t really necessary:

  • Adding smaller companies increases overhead in tracking them.
  • Some small‑cap stocks trade infrequently, so their data become stale.

The cons outweigh the pros.

The same basket idea appears across finance. Examples:

| Index | Composition |
|---|---|
| FTSE‑100 | 100 UK stocks |
| Commodity baskets | Groups of commodities (e.g., oil, agriculture) |
| CPI | Basket of goods to track price changes |

FTSE 100 – Screenshot by the author

If a basket of representative items can measure the entire stock market—or inflation—why not use a basket to track search volumes?

To use this concept, we need a set of the most commonly searched terms that can serve as an S&P‑500‑esque index for each country. Google Trends’ Year In Search provides a good source of basket candidates.

Daily Google Trends data for “Facebook”, built using my chaining methodology – Image by the author

Assume we have average search volumes for at least one country (e.g., the United States). We can:

  1. Average the scaling factors for a subset (or the whole) of the basket.
  2. Treat this average as “US Google‑Trends units → real‑world search volumes.”
  3. Use the derived factor to estimate absolute search volumes for any term, giving us a sense of motivation behind the searches.
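The three steps above can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: it assumes a term's scaling factor is its known average search volume divided by its average Trends score, and that all scores for the country sit on a common scale. Every number and name here (`known_avg_volume`, `avg_trends_score`, `motivation_score`) is hypothetical.

```python
from statistics import mean

# Hypothetical average daily search volumes for a few basket terms (not real data).
known_avg_volume = {"facebook": 5_000_000, "youtube": 4_000_000, "weather": 1_500_000}

# Hypothetical average Trends scores for the same terms, on one common scale.
avg_trends_score = {"facebook": 50.0, "youtube": 40.0, "weather": 12.0}

# Step 1: one scaling factor per term, then average them.
factors = {
    term: known_avg_volume[term] / avg_trends_score[term]
    for term in known_avg_volume
}

# Step 2: treat the average as "Trends units -> real-world searches" for this country.
us_factor = mean(factors.values())

# Step 3: estimate the absolute volume of any other term from its Trends score.
motivation_score = 3.0  # hypothetical score on the same scale
estimated_motivation_volume = us_factor * motivation_score

print(f"Average factor: {us_factor:,.0f} searches per Trends unit")
print(f"Estimated motivation volume: {estimated_motivation_volume:,.0f}")
```

The per-term factors disagree slightly (here 100,000 versus 125,000), which is the kind of estimation noise that averaging over a larger basket is meant to smooth out.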

Making Search Data Truly Comparable Across Countries

Caveats

  1. Representativeness of the basket – I was limited to nine items because of manual download constraints.
  2. Country‑specific popular terms – Some nations have huge search volumes for terms absent from my basket.
    • Example: Facebook and Instagram dominate in the US/UK, but WeChat is the Chinese equivalent.
    • I omitted WeChat because it isn’t representative globally, yet it is highly representative for China.

Scaling Beyond the Benchmark Country

Even if we can benchmark one country, how do we scale the others? Two obvious influencers:

| Factor | Reason |
|---|---|
| Population | More people → potentially more searches. |
| Internet penetration | Not everyone has internet access; the proportion of users varies by country. |

I obtained data on percentage of internet users per country. Multiplying this by the total population yields the absolute number of internet users per country.

\[
\text{Adjustment factor for any country} = \frac{\text{Internet users in country}}{\text{Internet users in the US}}
\]

Multiplying the US scaling factor by this adjustment gives an estimate of absolute search volume for any term in any country.
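As a rough sketch of that adjustment, with invented round numbers standing in for the real population and internet penetration figures (the helper names and the US scaling factor are also hypothetical):

```python
def internet_users(population, penetration):
    """Absolute number of internet users: population times share online."""
    return population * penetration

def adjustment_factor(country_users, us_users):
    """Internet users in the country relative to internet users in the US."""
    return country_users / us_users

# Hypothetical inputs for illustration only (not the author's dataset).
us_users = internet_users(300_000_000, 0.90)
uk_users = internet_users(60_000_000, 0.95)

uk_adjustment = adjustment_factor(uk_users, us_users)

# Hypothetical US factor: real searches per US Trends unit.
us_scaling_factor = 100_000.0
uk_scaling_factor = us_scaling_factor * uk_adjustment

print(f"UK adjustment factor: {uk_adjustment:.3f}")
print(f"UK searches per Trends unit: {uk_scaling_factor:,.0f}")
```

This treats every internet user as searching at the same rate regardless of country, which is a strong simplification rather than a measured fact.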

When the Maths Simplifies Itself


Because we want to compare countries and model motivation trends, we’re not looking at absolute search volumes for “motivation.”
If we did, we might conclude that the U.S. is less motivated than the UK simply because it searches for “motivation” more often. In reality, the larger population means more searches, not lower motivation.

How we solve this

We need to express search volume for “motivation” as a proportion of total search volume.
We already have a “basket of terms” that approximates overall search activity, so we can:

  1. Calculate the absolute search volume for each term in the basket.
  2. Sum those volumes to get the basket total.
  3. Divide the absolute “motivation” volume by the basket total.
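A minimal sketch of these three steps, assuming a single scaling factor per country is applied to every term; the function name, scores, and factor are hypothetical:

```python
def motivation_share(motivation_score, basket_scores, scale_factor):
    """Share of basket search volume taken up by 'motivation'."""
    # Step 1: absolute volume for each basket term.
    basket_volumes = [scale_factor * score for score in basket_scores]
    # Step 2: sum those volumes to get the basket total.
    basket_total = sum(basket_volumes)
    # Step 3: divide the absolute 'motivation' volume by the basket total.
    return (scale_factor * motivation_score) / basket_total

# Hypothetical Trends scores and scaling factor (illustrative only).
share = motivation_share(
    motivation_score=3.0,
    basket_scores=[50.0, 40.0, 10.0],
    scale_factor=1000.0,
)
print(f"Motivation as a share of the basket: {share:.2%}")
```

Because the result is a proportion, it can be compared between countries of very different sizes.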

Observation: When we perform this calculation, all of the scaling factors we previously applied cancel out.
In other words, the scaling work becomes unnecessary for the final proportion.
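To see why, here is a short derivation in notation of my own choosing (these symbols do not appear in the original analysis). Assume one scaling factor \(s_c\) per country \(c\), applied to every term; write \(g_c(\cdot)\) for a Trends score in country \(c\), and \(b_1, \dots, b_n\) for the basket terms. The proportion is then

\[
\frac{s_c \, g_c(\text{motivation})}{\sum_{i=1}^{n} s_c \, g_c(b_i)}
= \frac{s_c \, g_c(\text{motivation})}{s_c \sum_{i=1}^{n} g_c(b_i)}
= \frac{g_c(\text{motivation})}{\sum_{i=1}^{n} g_c(b_i)}
\]

The factor \(s_c\) appears in both numerator and denominator, so it drops out, whatever its value. The cancellation relies on the same factor being used for every term in the country; if different terms needed different factors, it would not hold exactly.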

Adjusting for reality: accounting for differences in internet access when estimating search volumes across countries.

Why the extra work still matters

If we had started the post with “let’s just add up the Google Trends scores of the basket and divide motivation by it,” readers would likely wonder, “Why? Is that even possible?”

Only after building the full scaling pipeline did we discover that the simple ratio works.

Additional benefit:
During the scaling process we accumulated many estimations, which introduced noise. By canceling out the scale factors, we effectively remove a lot of that noise.

Compounding errors in action.

Bottom line

Yes, we performed work that turned out to be unnecessary for the final calculation, but it was essential for:

  • Understanding the problem deeply.
  • Gaining confidence that our final metric is robust.

About Evil Works

At Evil Works we’re dedicated to improving the life of data scientists by:
