I Stole a Wall Street Trick to Solve a Google Trends Data Problem
Source: Towards Data Science
Google Trends: Turning “Raw” Data into Real Insights
“Google Trends is a god‑send for market research. If you want to understand interest in a particular term you can just look it up and see how it’s changing over time. This is the kind of data we could do some serious data science with—if the data were actually usable.”
In practice, Google Trends does exactly what its name suggests: it shows trends. The data is heavily normalised and regionalised, making it difficult to extract comparable figures for meaningful modelling—unless you have a few tricks up your sleeve.
Recap
In my previous post, Google Trends Is Misleading You – How to Do Machine Learning with Google Trends Data, we introduced the concept of chaining data across overlapping windows to overcome the granularity limitations of Google Trends.
What’s Next?
Today we’ll explore how to:
- Compare chained data across countries and regions
- Leverage those comparisons for actionable insights
By the end of this tutorial you’ll be able to:
- Align time series from different geographic areas despite differing normalisation scales.
- Build models that incorporate multi‑regional trend signals.
- Generate real‑world business intelligence from what initially looks like “unusable” data.
Stay tuned—let’s turn Google Trends into a powerful, cross‑regional analytics tool!
Motivation: Comparing Motivation
Google Trends allows Trends data to be downloaded and reused with citation, so I downloaded five years of data on motivation and scaled it. That gives one dataset of motivation searches per country, each showing roughly how that country’s interest in motivation changes over time. My goal was to compare how motivated different countries are, but I have a problem: I don’t know whether a Google Trends score of 100 in the US is bigger or smaller than a score of 100 in the UK, and my first idea for working that out fell flat. Let me explain.
The initial confusion
When I started this project I wasn’t a connoisseur of Google Trends. I quite naively typed in motivation with the location set to the UK, then added a comparison, typed motivation again and changed the location to the US. I was confused as to why the two lines showed the same graph. I thought perhaps the UK and US were too similar, so I added Japan, and it wasn’t until I got to China that I realised each change of location was switching all of the lines to that country’s motivation.

I thought I was changing countries. Turns out I was just reloading the same data three times.
Screenshot by the author. Data source: Google Trends (https://www.google.com/trends).
If I can’t get the countries on the same graph, then I can’t compare them—unless I find a more creative way…
A “genius” idea that didn’t work
My next brainwave came from looking at the US. If you scroll down on Google Trends you’ll see a sub‑region section showing the states in relative terms. The state with the highest search volume is set to 100 and the other states are scaled accordingly.

Screenshot by the author. Data source: Google Trends (https://www.google.com/trends).
I thought I was a genius: set the region to “worldwide”, note the numbers that come out for my countries of interest, and just multiply each country’s results by that number.
But I had misunderstood something fundamental again, and we need a bit of math to explain why.
The Maths Behind Google Trends Normalisation
I grabbed ninety days of data from the US and the UK (starting 24 April) on two separate Google Trends graphs. Both are scaled so the maximum is 100, but the peak occurs on a different day for each country.

When 100 means something different on each side of the Atlantic.
Screenshot by the author. Data source: Google Trends.

Graph of US and UK showing interest over time searching for motivation over 90 days.
Screenshot by the author. Data source: Google Trends.
Because we’re looking at two different countries, the Google Trends scores are in fundamentally different units for each country—just like inches and centimetres are different units of measurement. Unlike inches‑to‑centimetres, we don’t know the conversion factor here.
Assume that on the worldwide graph the US is given a score of 100 and the UK a score of 50. The UK score of 50 means that the UK peak is 50 % of the US peak. At first glance this might suggest a conversion factor of ½, i.e., one US unit = 2 UK units. Let’s see why that’s not true.
Take a non‑peak day, say 30 April, with a hypothetical score of 70 in the US and 80 in the UK.
From the US perspective
\[ 70\%\ \text{of US peak} = 0.70 \times 100\ \text{US units} = 0.70 \times 2 \times 100\ \text{UK units (if 1 US unit = 2 UK units)} = 140\ \text{UK units} \]
From the UK perspective
\[ 80\%\ \text{of UK peak} = 0.80 \times 100\ \text{UK units} = 80\ \text{UK units} \]
Clearly, 140 UK units is not double 80 UK units: on 30 April the US figure is only 1.75 times the UK figure.
Just because the US peak is twice the UK peak doesn’t mean the US data are twice the UK data for the whole period!
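The arithmetic above can be checked in a few lines. This is a minimal sketch that reuses the hypothetical scores from the example (70 for the US, 80 for the UK, and an assumed peak ratio of 2 UK units per US unit). None of these are real Google Trends values, and the variable names are made up for illustration.

```python
# Hypothetical numbers from the worked example; not real Google Trends data.
US_UNITS_TO_UK_UNITS = 2.0  # assumed: the US peak is twice the UK peak

us_score_30_apr = 70  # % of the US peak
uk_score_30_apr = 80  # % of the UK peak

# Express the US score in UK units using the assumed peak ratio.
us_in_uk_units = us_score_30_apr * US_UNITS_TO_UK_UNITS
uk_in_uk_units = float(uk_score_30_apr)

# Ratio of US to UK interest on this particular day.
day_ratio = us_in_uk_units / uk_in_uk_units

print(f"US: {us_in_uk_units:.0f} UK units, UK: {uk_in_uk_units:.0f} UK units")
print(f"US/UK ratio on 30 April: {day_ratio:.2f} (peak ratio: {US_UNITS_TO_UK_UNITS:.2f})")
```

The daily ratio differs from the peak ratio, which is why a single worldwide number cannot rescale a whole series.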
So we can’t simply use worldwide ratios to compare different countries. What can we do?
Taking Inspiration from the Stock Market
The underlying science and methodologies we use in data science can translate across domains, so I’ll borrow an approach from finance.
The stock market is a place for buying and selling equity (shares) in a company. Shares represent partial ownership and often come with voting rights or dividends—a small bonus for being an owner. Stocks can be held by individuals, banks, hedge funds, or other private companies.
The stock market can be used as a measure of a country’s economic health. When stocks are going up, we’re in a…
Market Cycles and Economic Health
In theory, the stock market and a country’s economy prosper together. When the market starts to fall we enter a bear market, and things go less well. This is a huge simplification, since markets move according to human behaviour, which is notoriously difficult to understand, but for our purposes the generalisation holds: we can gain an understanding of a country’s economic health based on its stock market.
Tracking the Market Through Indices
So how do we track the stock market as a whole? The obvious idea is to add up the prices of all shares on the exchange to get a single number for the market’s value. In reality we use indices.
You’ve probably heard of the S&P 500, an index built from the 500 biggest U.S. companies. It’s used to track the U.S. market because these largest firms cover about 80% of total market capitalisation (i.e., value) and are very liquid: their shares trade frequently and their prices update often.
Because it covers the majority of the market, the S&P 500 is a good representation of the whole market in a smaller collection of 500 stocks. Why 500?
- The S&P 500 was introduced in 1957.
- The number came down to computational power: at the time, a new electronic calculation method made it possible to include 500 stocks in the index. (Before that, indices were even smaller because they were calculated by hand.)
Source: S&P Global – “Where It All Began”
Why Estimate in This Big‑Data World?
Today we have the computational power to calculate the entire market if we want; a few thousand stocks are “small fry” in today’s big‑data world. However, it isn’t really necessary:
- Adding smaller companies increases overhead in tracking them.
- Some small‑cap stocks trade infrequently, so their data become stale.
The cons outweigh the pros.
The same basket idea appears across finance. Examples:
| Index | Composition |
|---|---|
| FTSE‑100 | 100 UK stocks |
| Commodity baskets | Groups of commodities (e.g., oil, agriculture) |
| CPI | Basket of goods to track price changes |

If a basket of representative items can measure the entire stock market—or inflation—why not use a basket to track search volumes?
Applying ETFs to Google Trends Data
To use this concept, we need a set of the most commonly searched terms that can serve as an S&P‑500‑esque index for each country. Google Trends’ Year In Search provides a good source of basket candidates.

Assume we have average search volumes for the basket terms in at least one country (e.g., the United States). We can:
- Divide each basket term’s known search volume by its Google Trends score to get a scaling factor for that term.
- Average the scaling factors for a subset (or the whole) of the basket.
- Treat this average as the conversion “US Google‑Trends units → real‑world search volumes.”
- Use the derived factor to estimate absolute search volumes for any term, including “motivation”.
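The steps above can be sketched as follows. This is only an illustration under stated assumptions: the basket terms, Trends scores and search volumes are invented, the function name `basket_scaling_factor` is hypothetical, and the scores are assumed to share one scale (for example, taken from a single Google Trends comparison).

```python
from statistics import mean

# Hypothetical inputs for illustration only; neither dictionary holds real data.
# Average Google Trends score of each basket term in the US (0-100 scale).
us_trends_scores = {"facebook": 80.0, "youtube": 100.0, "weather": 40.0}
# Assumed known average search volume of the same terms in the US.
us_search_volumes = {"facebook": 160e6, "youtube": 210e6, "weather": 76e6}


def basket_scaling_factor(scores, volumes):
    """Average searches-per-Trends-unit across the basket terms."""
    factors = [volumes[term] / scores[term] for term in scores]
    return mean(factors)


us_factor = basket_scaling_factor(us_trends_scores, us_search_volumes)

# Estimate the absolute volume of any other term from its Trends score.
motivation_score = 5.0  # hypothetical US Trends score for "motivation"
motivation_volume = motivation_score * us_factor

print(f"US factor: {us_factor:,.0f} searches per Trends unit")
print(f"Estimated 'motivation' volume: {motivation_volume:,.0f}")
```

Averaging the per-term factors, rather than trusting any single term, is what makes the basket behave like an index: one unusual term moves the estimate less.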
Making Search Data Truly Comparable Across Countries
Caveats
- Representativeness of the basket – I was limited to nine items because of manual download constraints.
- Country‑specific popular terms – Some nations have huge search volumes for terms absent from my basket.
  - Example: Facebook and Instagram dominate in the US/UK, but WeChat is the Chinese equivalent.
  - I omitted WeChat because it isn’t representative globally, yet it is highly representative for China.
Scaling Beyond the Benchmark Country
Even if we can benchmark one country, how do we scale the others? Two obvious influencers:
| Factor | Reason |
|---|---|
| Population | More people → potentially more searches. |
| Internet penetration | Not everyone has internet access; the proportion of users varies by country. |
I obtained data on percentage of internet users per country. Multiplying this by the total population yields the absolute number of internet users per country.
\[ \text{Adjustment factor for any country} = \frac{\text{Internet users in country}}{\text{Internet users in the US}} \]
Multiplying the US scaling factor by this adjustment gives a scaling factor for that country, which turns its Google Trends scores into estimates of absolute search volume for any term.
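A minimal sketch of this adjustment, assuming made-up population and internet-penetration figures and a hypothetical US scaling factor; the helper names `internet_users` and `adjustment_factor` are illustrative, not taken from any library.

```python
# Hypothetical figures for illustration; substitute real population and
# internet-penetration data before drawing any conclusions.
population = {"US": 330e6, "UK": 66e6}
internet_share = {"US": 0.90, "UK": 0.95}  # fraction of people online


def internet_users(country):
    """Absolute number of internet users = population x share online."""
    return population[country] * internet_share[country]


def adjustment_factor(country, benchmark="US"):
    """Internet users in `country` relative to the benchmark country."""
    return internet_users(country) / internet_users(benchmark)


us_factor = 2_000_000.0  # hypothetical US searches per Trends unit
uk_factor = us_factor * adjustment_factor("UK")

uk_motivation_score = 60.0  # hypothetical UK Trends score for "motivation"
uk_motivation_volume = uk_motivation_score * uk_factor

print(f"UK adjustment factor: {adjustment_factor('UK'):.3f}")
print(f"Estimated UK 'motivation' volume: {uk_motivation_volume:,.0f}")
```

The sketch assumes that searches per internet user are roughly the same in every country, which is the simplification the adjustment factor relies on.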
When the Maths Simplifies Itself
Because we want to compare countries and model motivation trends, we’re not looking at absolute search volumes for “motivation.”
If we did, we might conclude that the U.S. is less motivated than the UK simply because it searches for “motivation” more often. In reality, the larger population means more searches, not lower motivation.
How we solve this
We need to express search volume for “motivation” as a proportion of total search volume.
We already have a “basket of terms” that approximates overall search activity, so we can:
- Calculate the absolute search volume for each term in the basket.
- Sum those volumes to get the basket total.
- Divide the absolute “motivation” volume by the basket total.
Observation: When we perform this calculation, all of the scaling factors we previously applied cancel out.
In other words, the scaling work becomes unnecessary for the final proportion.
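To see why, write k for a country’s scaling factor and g for its Google Trends scores: the proportion is (k × g_motivation) / (k × sum of basket g), and k divides out. The sketch below checks this numerically. The scores and factors are invented, `motivation_share` is a hypothetical helper, and the scores are assumed to share one scale within the country.

```python
# Hypothetical Trends scores for one country, all assumed to share one scale.
basket_scores = {"facebook": 80.0, "youtube": 100.0, "weather": 40.0}
motivation_score = 5.0


def motivation_share(scale_factor):
    """Share of basket volume taken by 'motivation' after scaling."""
    motivation_volume = scale_factor * motivation_score
    basket_total = sum(scale_factor * s for s in basket_scores.values())
    return motivation_volume / basket_total


# Share computed straight from the raw Trends scores, with no scaling at all.
raw_share = motivation_score / sum(basket_scores.values())

# The share is the same whatever country scaling factor is applied.
for factor in (1.0, 422_222.0, 2_000_000.0):
    print(f"factor={factor:>12,.0f}  share={motivation_share(factor):.6f}")
print(f"raw share from Trends scores: {raw_share:.6f}")
```

Because the factor cancels, any error in estimating it (population, internet penetration, benchmark volumes) has no effect on the final proportion.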

Why the extra work still matters
If we had started the post with “let’s just add up the Google Trends scores of the basket and divide motivation by it,” readers would likely wonder, “Why? Is that even possible?”
Only after building the full scaling pipeline did we discover that the simple ratio works.
Additional benefit:
During the scaling process we accumulated many estimations, which introduced noise. Because the scale factors cancel out, much of that noise disappears with them.

Bottom line
Yes, we performed work that turned out to be unnecessary for the final calculation, but it was essential for:
- Understanding the problem deeply.
- Gaining confidence that our final metric is robust.
About Evil Works
At Evil Works we’re dedicated to improving the life of data scientists by:
- Showcasing real‑world projects – Read our blog
- Building tools to do data science better – Explore our products