Uncertainty Reduction in Customer Conversion Prediction
A Hierarchical Bayesian model with partial pooling that solved the perfect separation problem in e-commerce conversion data.
Overview
Predicting whether a customer will convert on an e-commerce platform sounds straightforward until 70% of your geographic segments have zero conversions. Standard logistic regression breaks entirely in this scenario, producing infinite coefficients and unstable estimates. This project tackles that problem head-on using a Hierarchical Bayesian approach with partial pooling.
Built with PyMC, this was my final project for STP 505.
The Problem: Perfect Separation
The dataset contained ~100,000 customer interactions with a binary conversion target (1.6% global conversion rate). When we split conversion rates by geography, a stark pattern emerged:
- Active cities (Lahore, Islamabad, Karachi): healthy conversion rates between 4.8% and 5.5%
- Dead cities (Faisalabad, Gujranwala, Multan, Peshawar, Quetta, Rawalpindi, Sialkot): zero conversions across tens of thousands of observations
For maximum likelihood estimation, zero conversions push the log-odds toward −∞. The optimizer drives those coefficients toward negative infinity, producing completely unstable, unusable estimates for exactly the cities a business most needs to understand.
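A minimal sketch of why separation breaks the MLE, using hypothetical counts in the spirit of the data above: the empirical log-odds the estimator is chasing is finite for an active city but diverges for a zero-conversion city.

```python
import math

def empirical_log_odds(conversions, total):
    """Log-odds of the observed conversion rate (the MLE target per group)."""
    p = conversions / total
    if p == 0.0:
        return float("-inf")  # the MLE drives this coefficient toward -infinity
    return math.log(p / (1 - p))

print(empirical_log_odds(550, 10_000))  # active city: ≈ -2.84, well behaved
print(empirical_log_odds(0, 20_000))    # dead city: -inf, no finite MLE exists
```

Any gradient-based fitter chasing that second target will keep pushing the coefficient downward without converging, which is exactly the instability described above.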
The Solution: Hierarchical Bayesian Modeling
We implemented four models in PyMC, each progressively adding hierarchical structure:
Model 1 — Flat Logistic Regression (Baseline)
A standard logistic regression ignoring all group effects. It confirmed the failure case: no valid estimates for the zero-conversion cities.
Model 2 — Hierarchical by Location
Introduced a random intercept per city, drawn from a shared hyper-prior. This is the key insight: instead of treating each city independently, the model borrows strength from the global average:
α_location ~ Normal(μ_α, σ_α)
logit(p_i) = α_location[i] + Σ_k β_k · x_ik
The hyper-prior pulls zero-conversion city estimates toward the global mean (shrinkage), producing valid, finite probability estimates.
Model 3 — Hierarchical by Lead Source
Same structure, but grouped by marketing channel (Email, Organic, Referral, Social Media) instead of location.
Model 4 — Crossed Hierarchical
Combined both location and lead source random effects simultaneously.
Results
Model comparison via LOO (Leave-One-Out Cross-Validation) and Bayesian stacking gave a clear winner:
Model 2 (Hierarchical Location) is the best model.
The crossed model (Model 4) performed nearly identically but received a stacking weight of zero: adding lead source effects contributed no predictive value, so the added complexity was entirely redundant.
Model 3 (Lead Source only) performed dramatically worse, with a LOO score 1,091 points higher than the winner's. Marketing channel is essentially noise compared to geography.
Shrinkage in Action
For the seven zero-conversion cities, the hierarchical model didn't output 0%: it shrank the estimates toward the global mean, producing valid conversion probabilities between 0.5% and 1.2%. That's a conservative but honest answer: we haven't seen a conversion here yet, but one is possible.
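The shrinkage mechanism can be illustrated with a simplified closed-form analogue. This beta-binomial sketch is not the project's logistic hierarchy, and `global_rate` and `prior_strength` are illustrative assumptions, but it shows the same qualitative behavior: sparse groups are pulled toward the global mean, data-rich groups barely move.

```python
def shrunk_rate(conversions, total, global_rate=0.016, prior_strength=500):
    """Posterior-mean conversion rate under a beta prior centered on the global rate."""
    a = global_rate * prior_strength        # pseudo-conversions from the prior
    b = (1 - global_rate) * prior_strength  # pseudo-non-conversions from the prior
    return (conversions + a) / (total + a + b)

print(shrunk_rate(0, 2_000))     # zero observed conversions -> small but nonzero estimate
print(shrunk_rate(550, 10_000))  # data-rich city stays close to its raw 5.5%
```

In the actual hierarchical model the amount of pooling is learned from the data (via σ_α) rather than fixed by hand, but the direction of the pull is the same.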
What Actually Drives Conversion?
Looking at the fixed effects across behavioral features:
- Pages Viewed: the single strongest positive predictor. More pages = stronger intent signal.
- Time on Site: surprisingly weak. A user leaving a tab open is not the same as a user actively browsing.
The takeaway: engagement depth matters far more than engagement duration.
Convergence Diagnostics
All models were sampled with 4 parallel chains, 1,000 tuning steps, and 2,000 draws per chain (8,000 total posterior samples). Convergence was assessed via Gelman-Rubin statistics (R̂) and Effective Sample Size (ESS):
- Models 2 & 4: R̂ = 1.0 for all hyperparameters and ESS consistently > 1,000, indicating excellent mixing
- Model 3: R̂ reached 1.03 with ESS ≈ 144, indicating slower mixing, consistent with lead source being a weak signal
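These diagnostics can be read off with ArviZ. The snippet below uses synthetic independent draws (4 chains × 2,000 draws of one parameter) as a stand-in for a real posterior, so its numbers are illustrative, not the project's:

```python
import numpy as np
import arviz as az

# Synthetic stand-in for posterior draws, shaped (chain, draw)
rng = np.random.default_rng(42)
idata = az.convert_to_inference_data(rng.normal(size=(4, 2000)))

summary = az.summary(idata)
# Independent, well-mixed draws: r_hat ≈ 1.0 and ess_bulk well above 1,000
print(summary[["r_hat", "ess_bulk"]])
```

Against a real `idata` from `pm.sample`, the same `az.summary` call reports R̂ and ESS per parameter, which is how thresholds like R̂ ≤ 1.01 are checked in practice.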
Key Takeaways
The main lesson from this project is that Hierarchical Bayesian modeling is a genuinely practical solution for real-world sparse-data problems. The flat model failed outright for 70% of the data; the hierarchical model handled the structural zeros gracefully.
For businesses using this kind of model, the implications are concrete:
- Focus on on-site engagement depth (pages viewed) as the primary lever for improving conversion
- Investigate logistical or regional barriers in zero-conversion cities: the model confirms there's latent potential there, just untapped