Uncertainty Reduction in Customer Conversion Prediction
A Hierarchical Bayesian model with partial pooling that solved the perfect separation problem in e-commerce conversion data.
Overview
Predicting whether a customer will convert on an e-commerce platform sounds straightforward until 70% of your geographic segments have zero conversions. Standard logistic regression breaks entirely in this scenario, producing infinite coefficients and unstable estimates. This project tackles that problem head-on using a Hierarchical Bayesian approach with partial pooling.
Built with PyMC, this was my final project for STP 505.
The Problem: Perfect Separation
The dataset contained ~100,000 customer interactions with a binary conversion target (1.6% global conversion rate). When we split conversion rates by geography, a stark pattern emerged:
- Active cities (Lahore, Islamabad, Karachi): healthy conversion rates between 4.8% and 5.5%
- Dead cities (Faisalabad, Gujranwala, Multan, Peshawar, Quetta, Rawalpindi, Sialkot): zero conversions across tens of thousands of observations
For maximum likelihood estimation, zero conversions push the log-odds toward −∞. The optimizer drives those coefficients toward negative infinity, producing completely unstable, unusable estimates for exactly the cities a business most needs to understand.
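A minimal sketch of why separation breaks the MLE, using hypothetical counts in the spirit of the data above: the empirical log-odds the estimator is chasing is finite for an active city but diverges for a zero-conversion city.

```python
import math

def empirical_log_odds(conversions, total):
    """Log-odds of the observed conversion rate (the MLE target per group)."""
    p = conversions / total
    if p == 0.0:
        return float("-inf")  # the MLE drives this coefficient toward -infinity
    return math.log(p / (1 - p))

print(empirical_log_odds(550, 10_000))  # active city: ≈ -2.84, well behaved
print(empirical_log_odds(0, 20_000))    # dead city: -inf, no finite MLE exists
```

Any gradient-based fitter chasing that second target will keep pushing the coefficient downward without converging, which is exactly the instability described above.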
The Solution: Hierarchical Bayesian Modeling
We implemented four models in PyMC, each progressively adding hierarchical structure:
Model 1 — Flat Logistic Regression (Baseline)
A standard logistic regression ignoring all group effects. It confirmed the failure case: no valid estimates for the zero-conversion cities.
Model 2 — Hierarchical by Location
Introduced a random intercept per city, drawn from a shared hyper-prior. This is the key insight: instead of treating each city independently, the model borrows strength from the global average:
α_location ~ Normal(μ_α, σ_α)
logit(p_i) = α_location[i] + Σ_k β_k · x_ik
The hyper-prior pulls zero-conversion city estimates toward the global mean (shrinkage), producing valid, finite probability estimates.
Model 3 — Hierarchical by Lead Source
Same structure, but grouped by marketing channel (Email, Organic, Referral, Social Media) instead of location.
Model 4 — Crossed Hierarchical
Combined both location and lead source random effects simultaneously.
Results
Model comparison via LOO (Leave-One-Out Cross-Validation) and Bayesian stacking gave a clear winner:
Model 2 (Hierarchical Location) is the best model.
The crossed model (Model 4) performed nearly identically but received a stacking weight of zero: adding lead source effects contributed no predictive value, so the added complexity was entirely redundant.
Model 3 (Lead Source only) performed dramatically worse, with a LOO score 1,091 points higher than the winner's. Marketing channel is essentially noise compared to geography.
Shrinkage in Action
For the seven zero-conversion cities, the hierarchical model didn't output 0%: it shrank the estimates toward the global mean, producing valid conversion probabilities between 0.5% and 1.2%. That's a conservative but honest answer: we haven't seen a conversion here yet, but one is possible.
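The shrinkage mechanism can be illustrated with a simplified closed-form analogue. This beta-binomial sketch is not the project's logistic hierarchy, and `global_rate` and `prior_strength` are illustrative assumptions, but it shows the same qualitative behavior: sparse groups are pulled toward the global mean, data-rich groups barely move.

```python
def shrunk_rate(conversions, total, global_rate=0.016, prior_strength=500):
    """Posterior-mean conversion rate under a beta prior centered on the global rate."""
    a = global_rate * prior_strength        # pseudo-conversions from the prior
    b = (1 - global_rate) * prior_strength  # pseudo-non-conversions from the prior
    return (conversions + a) / (total + a + b)

print(shrunk_rate(0, 2_000))     # zero observed conversions -> small but nonzero estimate
print(shrunk_rate(550, 10_000))  # data-rich city stays close to its raw 5.5%
```

In the actual hierarchical model the amount of pooling is learned from the data (via σ_α) rather than fixed by hand, but the direction of the pull is the same.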
What Actually Drives Conversion?
Looking at the fixed effects across behavioral features:
- Pages Viewed: the single strongest positive predictor. More pages = stronger intent signal.
- Time on Site: surprisingly weak. A user leaving a tab open is not the same as a user actively browsing.
The takeaway: engagement depth matters far more than engagement duration.
Convergence Diagnostics
All models were sampled with 4 parallel chains, 1,000 tuning steps, and 2,000 draws per chain (8,000 total posterior samples). Convergence was assessed via Gelman-Rubin statistics (R̂) and Effective Sample Size (ESS):
- Models 2 & 4: R̂ = 1.0 for all hyperparameters and ESS consistently > 1,000, indicating excellent mixing
- Model 3: R̂ reached 1.03 with ESS ≈ 144, indicating slower mixing, consistent with lead source being a weak signal
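These diagnostics can be read off with ArviZ. The snippet below uses synthetic independent draws (4 chains × 2,000 draws of one parameter) as a stand-in for a real posterior, so its numbers are illustrative, not the project's:

```python
import numpy as np
import arviz as az

# Synthetic stand-in for posterior draws, shaped (chain, draw)
rng = np.random.default_rng(42)
idata = az.convert_to_inference_data(rng.normal(size=(4, 2000)))

summary = az.summary(idata)
# Independent, well-mixed draws: r_hat ≈ 1.0 and ess_bulk well above 1,000
print(summary[["r_hat", "ess_bulk"]])
```

Against a real `idata` from `pm.sample`, the same `az.summary` call reports R̂ and ESS per parameter, which is how thresholds like R̂ ≤ 1.01 are checked in practice.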
Key Takeaways
The main lesson from this project is that Hierarchical Bayesian modeling is a genuinely practical solution for real-world sparse-data problems. The flat model failed outright for 70% of the data; the hierarchical model handled the structural zeros gracefully.
For businesses using this kind of model, the implications are concrete:
- Focus on on-site engagement depth (pages viewed) as the primary lever for improving conversion
- Investigate logistical or regional barriers in zero-conversion cities: the model confirms there's latent potential there, just untapped