How do geo-tests really work?
Apr 7, 2025
What are geo-tests?
Geo-testing, or matched market testing, is a handy way to gauge the incremental impact of your marketing activity. It helps you determine how much of the outcome is due to your actions versus what would have happened anyway. You can compare areas where you made a change to those where you didn't. It's a useful tool when you can't use the Randomized Controlled Trial method.
This post explores prominent geo-testing methodologies:
Pre-post analysis: Looks at changes in the target metric before and after the intervention in the same region.
Difference-in-Differences (DiD): Compares the changes in the target metric before and after the intervention between a treated region and a control region.
Synthetic Control: Constructs a "synthetic" version of the treated region as a counterfactual to estimate what would have happened without the intervention.
Bayesian structural time series (BSTS): Uses advanced statistical modeling to predict the counterfactual and estimate the causal impact of the intervention.
Each geo-test measures the incremental impact of a specific treatment (e.g., ad spend increase, pricing change, new creative/offer, different targeting) by comparing treated geographic regions (test markets) to untreated ones (control markets). These tests assess incrementality — how much of the outcome would not have happened without the treatment.
The key to geo-testing is selecting a methodology that accounts for confounders, natural trends, and noise to isolate the true causal effect of the intervention.
But first, RCTs are better than geo-tests!
Randomized controlled trials (RCTs) are the gold standard for establishing causal relationships because they isolate the effect of a specific intervention. The key is the random assignment of subjects (e.g. customers, users) to either a treatment group that receives the intervention or a control group that does not.
If you’ve run A/B tests on your website, you’ve likely executed an RCT. However, the usual “A/B tests” in ad platforms aren’t actual RCTs. More on that topic in a follow-up post.
This random assignment ensures the two groups are statistically equivalent on average, except for the intervention. Any observed outcome differences can be attributed to the treatment’s causal effect, not confounding factors.
In marketing, many variables can influence outcomes, including seasonality, competitor activity, media coverage, and market changes. All of these can move your metrics independently of the marketing intervention being tested.
With an RCT, random assignment lets you control for extraneous factors and isolate the true causal impact of your marketing campaign, promotion, ad creative, or intervention. This is impossible with observational data or non-randomized study designs.
While randomized controlled trials are the best method for understanding cause-and-effect relationships, they can be challenging to execute in marketing.
Often, you can't randomly assign users into test and control groups. Most ad platforms and marketing channels don't offer a self-service way to do this.
Even when platforms allow RCTs, additional requirements make them difficult. Meta, for example, offers a self-serve "conversion lift" feature built on an RCT design, but you need a CAPI integration set up and must maintain a high event match quality score. These technical prerequisites are barriers to executing a true RCT.
Geo-tests are a practical alternative for modern marketers when running a true RCT isn’t possible. Let’s explore the geo-testing methodologies.
1. Pre-Post Analysis
The simplest geo-testing method, pre-post analysis, compares the outcome metric (e.g., sales, conversions) in the treated region before and after the intervention. This shows any change in the metric over time in the treated area.
Pros
Pre-post analysis is straightforward. You're looking at how the metric changed in the test region before vs. after the treatment. No complicated statistical modeling is required.
Unlike other geo-testing methods, pre-post analysis doesn’t require a separate control region for comparison. You need only data from the treated region.
Cons
The major downside of pre-post analysis is that it cannot account for external factors, other than the intervention, that impact the outcome metric over time. For example, if you ran a promotional campaign in a region, any sales increase could be due to market trends, seasonal effects, competitor activity, or other confounding variables, not just your campaign. Without a control group, you can’t isolate the true causal effect.
Imagine you ran a new ad campaign in the Northeast US. If you looked at sales in that region before vs. after the campaign, you might see a 20% increase. However, if national sales also increased by 15% during that time due to broader economic conditions, the true impact of your campaign was only around 5%. Pre-post analysis would incorrectly attribute the full 20% increase to the campaign.
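To make that arithmetic concrete, here is a minimal Python sketch of a naive pre-post calculation; the sales figures are hypothetical and chosen to mirror the example above.

```python
# Naive pre-post analysis on hypothetical, illustrative numbers.
northeast_pre, northeast_post = 100_000, 120_000      # regional sales before/after the campaign
national_pre, national_post = 1_000_000, 1_150_000    # sales everywhere else over the same window

# Pre-post attributes the full regional change to the campaign.
pre_post_lift = (northeast_post - northeast_pre) / northeast_pre
print(f"Pre-post lift: {pre_post_lift:.0%}")          # 20%

# But the broader market also moved, so much of that "lift" is just the trend.
baseline_trend = (national_post - national_pre) / national_pre
print(f"National trend: {baseline_trend:.0%}")        # 15%
print(f"Trend-adjusted lift: {pre_post_lift - baseline_trend:.0%}")  # ~5%
```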
When to Use
Pre-post analysis is only appropriate when you can be confident there are no significant confounding factors and no viable control group exists. That combination is rare in real-world marketing contexts.
Pre-post can be useful as an initial exploration of the data, but you should never rely on it for causal claims about an intervention’s impact. Always follow it up with a more rigorous geo-testing method.
2. Difference-in-Differences (DiD)
The Difference-in-Differences (DiD) method estimates the causal impact of an intervention. DiD compares the change in the outcome variable (e.g. sales, conversions) before and after the treatment between a group that received the treatment and a group that did not.
The DiD method relies on the parallel trends assumption. This assumption states that in the absence of the treatment, the outcome trends would have been the same between the treatment and control groups. Any divergence in trends after the treatment can be attributed to its causal effect.
Here's how the DiD method works in more detail:
Define treatment and control groups: Identify two sets of geographic regions, markets, or entities. One group is a "treatment" group that received the intervention, and the other is a "control" group that did not.
Measure outcomes: Before and after the treatment, measure the outcome variable (e.g. sales, conversions) in both the treatment and control groups.
Calculate two differences:
The change in the treatment group’s outcome variable from the pre-period to the post-period.
The change in the control group’s outcome variable from the pre-period to the post-period.
Estimate the treatment effect: The DiD estimate of the treatment effect is the difference between these two differences. Specifically:
Treatment effect = [Treatment (post) - Treatment (pre)] - [Control (post) - Control (pre)]
This formula isolates the causal impact of the treatment by subtracting concurrent changes in the control group that are not attributable to the treatment.
The key assumption of the DiD method is that without the treatment, the treatment and control groups would have followed parallel trends in the outcome variable over time. This assumption can be tested by examining the pre-treatment trends in the two groups.
If the parallel trends assumption holds, the DiD estimate provides a robust and unbiased estimate of the treatment’s causal effect. However, if the trends in the two groups diverge before the treatment, the DiD method may produce biased results.
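To illustrate the formula, here is a minimal pandas sketch of the two-by-two DiD calculation. The data frame and its numbers are hypothetical; a real analysis would typically estimate the same quantity with a regression that includes group, period, and interaction terms.

```python
import pandas as pd

# Hypothetical weekly sales by market, flagged by group and period.
df = pd.DataFrame({
    "group":  ["treatment"] * 4 + ["control"] * 4,
    "period": ["pre", "pre", "post", "post"] * 2,
    "sales":  [100, 102, 130, 128,    # treated markets
               95, 97, 110, 108],     # control markets
})

# Average outcome in each of the four group x period cells.
cell_means = df.groupby(["group", "period"])["sales"].mean()

# Treatment effect = [Treatment (post) - Treatment (pre)] - [Control (post) - Control (pre)]
treated_change = cell_means["treatment", "post"] - cell_means["treatment", "pre"]
control_change = cell_means["control", "post"] - cell_means["control", "pre"]
did_estimate = treated_change - control_change
print(f"DiD estimate: {did_estimate:.1f}")  # treated lift net of the control-group trend
```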
Pros
The DiD method is straightforward to understand, which makes it an accessible approach.
Cons
If the treatment and control groups exhibit diverging trends in the outcome variable before the treatment, the DiD estimate will be biased.
While DiD can account for time-invariant differences between the treatment and control groups, it struggles with time-varying confounding factors that may impact the two groups differently over the campaign period, leading to potential bias.
The key pros of the DiD method are its simplicity and intuitive appeal. However, the cons highlight the importance of validating the parallel trends assumption and being mindful of potential time-varying confounders. Assessing these requirements is crucial for applying DiD successfully.
3. Synthetic Control Method
The Synthetic Control Method (SCM) constructs a "synthetic" version of the treated unit (e.g. a market, region, or entity) as a counterfactual - an estimate of what would have happened without the treatment.
Here's how the SCM approach works:
First, identify potential control markets for the synthetic control. These should be similar to the treated unit in pre-treatment characteristics and trends but must not have received the treatment.
SCM creates a “lookalike” version of the test market by combining several control markets. This is known as the Synthetic Control. It picks and weights those controls to best match how the test market behaved before the campaign started. This lookalike becomes the baseline to measure the campaign’s true impact.
The weights are chosen so the synthetic control replicates the pre-treatment behavior of the treated unit. This is done by minimizing the mean squared difference between the treated unit's and synthetic control's pre-treatment outcomes.
After constructing the synthetic control, you compare the post-treatment outcomes of the treated unit to the synthetic control. Any difference in the post-treatment period represents the estimated causal effect of the treatment.
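The weighting step can be sketched in a few lines of Python. This is a simplified illustration with hypothetical numbers; a production implementation would typically also match on covariates and use a dedicated synthetic control library.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical pre-treatment outcomes: rows are weeks, columns are control markets.
controls_pre = np.array([
    [100.0, 80.0, 120.0],
    [105.0, 82.0, 118.0],
    [110.0, 85.0, 125.0],
    [108.0, 84.0, 122.0],
])
treated_pre = np.array([101.0, 104.0, 109.0, 107.0])  # the test market, same weeks

def pretreatment_mse(weights):
    """Mean squared error between the treated series and the weighted controls."""
    return np.mean((treated_pre - controls_pre @ weights) ** 2)

n_controls = controls_pre.shape[1]
result = minimize(
    pretreatment_mse,
    x0=np.full(n_controls, 1.0 / n_controls),                    # start from equal weights
    bounds=[(0.0, 1.0)] * n_controls,                            # weights stay non-negative
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},  # and sum to one
)
weights = result.x

# Applying these weights to the controls' post-treatment outcomes gives the
# synthetic control baseline; the gap to the treated unit is the estimated effect.
print("Control market weights:", np.round(weights, 3))
```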
Pros
The key advantage of the SCM approach is that it doesn’t rely on the parallel trends assumption of the Difference-in-Differences (DiD) method. SCM can handle situations where the pre-treatment trends of the treated unit and potential control units are not perfectly parallel.
This makes SCM a great fit when your test market behaves a little differently than the others before launch. It’s especially useful when you’re testing something in just one region, like trying a new campaign in a single state or city. DiD, on the other hand, works better when you have lots of test and control groups to compare.
Cons
The main limitation of SCM is its computational intensity, especially with many potential control units.
A strong pre-treatment fit between the treated unit and the synthetic control is required to produce reliable treatment effect estimates.
The Synthetic Control Method provides a flexible and powerful approach for estimating causal impacts in quasi-experimental settings.
4. Bayesian Structural Time Series (BSTS)
BSTS models, popularized by Google's CausalImpact package, are effective for estimating the causal impact of an intervention or treatment. They can be seen as a generalization of the Synthetic Control Method. These models use Bayesian time series modeling to predict the counterfactual - what would have happened without the treatment.
The key idea behind BSTS is to model the observed time series as a combination of underlying components:
Trend: The long-term direction of the time series.
Seasonality: Periodic patterns or cycles in the data (e.g. weekly, monthly, yearly).
Regressors: Additional variables or control time series that help explain the outcome.
Here's how BSTS models work in detail:
First, you fit a BSTS model to the pre-treatment data, using the control regions/time series as regressors. This allows the model to learn the historical trend, seasonality, and relationship between the control variables and the outcome.
Once you’ve trained the model on the pre-treatment data, it can forecast the post-treatment period for the treated region, as if the treatment had never occurred. This forecast represents the counterfactual - the expected outcome without the treatment.
Finally, you compare the actual post-treatment outcome in the treated region to the forecasted counterfactual. Any divergence represents the estimated causal impact of the treatment.
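A minimal sketch of this workflow is shown below, using a Python port of Google's CausalImpact package (e.g. pycausalimpact) on simulated data. The package choice, the exact call signature, and the simulated series are assumptions for illustration, not Paramark's implementation.

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact  # assumes a Python port such as pycausalimpact

# Simulated daily data: the first column is the treated region's metric and the
# remaining column is a control region used as a regressor.
rng = np.random.default_rng(0)
control = 100 + np.cumsum(rng.normal(0, 1, 90))  # control region's series
treated = 1.2 * control + rng.normal(0, 1, 90)   # treated region tracks it pre-launch
treated[60:] += 10                               # simulated lift after the campaign starts
data = pd.DataFrame({"treated_sales": treated, "control_region": control})

# Fit on the pre-treatment window (rows 0-59), then forecast the counterfactual
# for the post-treatment window (rows 60-89) and compare it with the actuals.
impact = CausalImpact(data, pre_period=[0, 59], post_period=[60, 89])
print(impact.summary())  # estimated lift with credible intervals
impact.plot()            # actual vs. counterfactual forecast
```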
Pros
It captures complex time dynamics like trend and seasonality, common in marketing and business data.
It provides probabilistic estimates of the treatment effect and confidence intervals, indicating the uncertainty around the impact.
BSTS models can use relevant control time series as regressors to improve the accuracy of the counterfactual forecast.
Cons
The model assumptions, such as the structure of the time series components, must be validated.
The quality and relevance of the input control time series can greatly impact the model's performance.
Interpreting a BSTS model is more complex than simpler geo-testing methods, especially for non-statistical audiences.
BSTS models and the CausalImpact package provide a flexible and robust approach for estimating causal impacts in time series data with complex patterns and seasonality.
How does Paramark’s incrementality testing work?
Paramark uses the Bayesian Structural Time Series (BSTS) approach. Although SCM and BSTS were developed separately (SCM from econometrics, BSTS from Bayesian time series modeling), they are conceptually similar. Some research and practical frameworks consider BSTS a Bayesian generalization of synthetic control, especially when:
You model the treated unit as a function of a weighted combination of controls.
You allow time-varying effects and uncertainty.
Here’s an example of the overlap:
SCM: treated outcome ≈ weighted sum of static control outcomes
BSTS with regression: treated outcome ≈ regression on control outcomes + dynamic components + noise
BSTS can be considered as:
SCM + time dynamics + uncertainty modeling + prior distributions
Conclusion
Geo-testing methodologies vary in complexity and power. While Pre-Post and Difference-in-Differences are the most widely used methods, Synthetic Control and Bayesian time series models are gaining popularity for their flexibility and robustness.
Investing in rigorous methodology ensures your incrementality tests yield trustworthy insights. This separates true impact from noise and enables more effective marketing decisions.