Methodology
The math behind your number
Every percentage you see is the output of a conditional probability model built on government census and survey microdata: not opinion polls, not vibes, not crowd-sourced guesses.
The data
Primary sources — not proxies
Every statistical claim in this tool traces back to a public government dataset. No third-party polls. No app self-reports. The raw microdata is processed once, compiled into lookup tables, and served statically — so every calculation runs in your browser in milliseconds with no external API calls.
🇺🇸
US Census Bureau — ACS 5-Year (2022)
Age, income, marital status, education, nativity, and heritage distributions across 330M+ Americans. The ACS is the gold standard for US population statistics; it samples roughly 3.5 million addresses per year.
🏥
CDC NHANES (2017–2020)
Height and weight distributions measured by trained examiners using calibrated equipment — not self-reported. This is the only nationally representative source for actual physical measurements.
🏦
Federal Reserve SCF (2022)
Survey of Consumer Finances — the definitive source for US wealth distributions. Conducted every 3 years with oversampling of wealthy households to accurately capture the top tail.
📊
International: Statistics Korea, Statistics Bureau Japan, NBS China, ONS UK, Destatis, INSEE, CBS, INE, ISTAT, SCB, GUS
Country-specific equivalents covering height (NCD-RisC 2016 global study), income (Eurostat LCS), and wealth (ECB Household Finance and Consumption Survey 2021).
The model
Conditional independence — and when we break it
The core calculation is a product of marginal probabilities across each criterion. If you require someone who is in the top 15% for height, the top 20% for income, and has never been married, the naive estimate is:
// Naïve independence assumption:
P(match) = P(height) × P(income) × P(never_married) × …
This is the same approach used by every competitor tool. It works as a first approximation but misses the real structure of the data: people's traits are correlated with each other. We partially correct for this in four places.
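As a minimal sketch of the naïve baseline, the product of marginals can be written as below. The function name and the 0.30 never-married rate are illustrative placeholders, not values from the tool's lookup tables.

```python
from math import prod


def naive_match_probability(marginals):
    """Multiply marginal probabilities, assuming the criteria are independent."""
    for name, p in marginals.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return prod(marginals.values())


# Top 15% height, top 20% income, and a placeholder never-married rate.
criteria = {"height": 0.15, "income": 0.20, "never_married": 0.30}
p_match = naive_match_probability(criteria)
print(f"Naive estimate: {p_match:.4f}")
```

Because every factor is at most 1, each added criterion can only shrink the estimate; the corrections described next change the factors, not the multiplication itself.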
Age-conditional distributions
Relationship status, income, and net worth all shift dramatically with age. A 23-year-old who has 'never been married' is totally unremarkable; a never-married 42-year-old is relatively rare. Rather than applying a flat national marriage rate, we decompose the user's age range into six age buckets and apply bucket-specific rates, then take a population-weighted average across buckets.
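The bucket-weighted average can be sketched as follows. The bucket boundaries, populations, and never-married rates here are made-up placeholders, and weighting a partially covered bucket by its share of overlapping years is an assumption about how partial overlap might be handled.

```python
# (low_age, high_age, population_in_millions, never_married_rate)
# All values are illustrative placeholders, not ACS figures.
BUCKETS = [
    (18, 24, 30.0, 0.90),
    (25, 29, 22.0, 0.65),
    (30, 34, 22.0, 0.40),
    (35, 39, 21.0, 0.25),
    (40, 49, 40.0, 0.15),
    (50, 64, 62.0, 0.08),
]


def age_weighted_rate(age_min, age_max, buckets=BUCKETS):
    """Population-weighted average of bucket rates over an inclusive age range."""
    weighted_sum = 0.0
    total_weight = 0.0
    for low, high, population, rate in buckets:
        overlap_years = min(age_max, high) - max(age_min, low) + 1
        if overlap_years <= 0:
            continue  # bucket lies entirely outside the requested range
        # Assume people are spread evenly across the years within a bucket.
        weight = population * overlap_years / (high - low + 1)
        weighted_sum += weight * rate
        total_weight += weight
    if total_weight == 0:
        raise ValueError("age range does not overlap any bucket")
    return weighted_sum / total_weight


print(age_weighted_rate(25, 34))
```

A range that spans two equally sized buckets simply averages their rates; a range that clips a bucket counts only the clipped share of that bucket's population.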
Heritage-conditional physical and economic traits
Height, weight, income, wealth, and education distributions differ substantially across ancestral groups, not because of any inherent ranking, but because immigration selection effects, geographic concentration, and historical context all shift the distribution. When a heritage preference is selected, we replace the national height distribution and income/wealth multipliers with group-specific Gaussian height parameters and multipliers derived from CDC NHANES and ACS microdata.
Heritage-conditional nativity
This is where most tools go wrong. If someone selects 'East Asian heritage' and 'locally born only', a naïve model multiplies P(East Asian) × P(native_born_nationally) ≈ 0.006 × 0.87. But P(native_born | East Asian) ≈ 0.42 — less than half the overall rate. Treating it otherwise overestimates the locally-born East Asian pool by more than 2×. We apply heritage-group-specific nativity rates drawn from ACS nativity-by-ancestry tables.
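The arithmetic behind the "more than 2×" figure can be checked directly with the three rates quoted above. The variable names are illustrative.

```python
p_heritage = 0.006               # P(East Asian), as quoted above
p_native_national = 0.87         # P(native_born) across the whole population
p_native_given_heritage = 0.42   # P(native_born | East Asian)

# Naive: treats nativity as independent of heritage.
naive_pool = p_heritage * p_native_national

# Conditional: uses the heritage-specific nativity rate.
conditional_pool = p_heritage * p_native_given_heritage

# P(heritage) cancels, so the factor is just the ratio of the two nativity rates.
overestimate_factor = naive_pool / conditional_pool

print(f"naive={naive_pool:.5f} conditional={conditional_pool:.5f} "
      f"factor={overestimate_factor:.2f}x")
```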
Education–income joint probability
If you filter for 'bachelor's degree or higher' AND a minimum income, the two are not independent — educated people earn more. Multiplying P(bachelor's+) × P(income ≥ X) would overstate the rarity. Instead, when both filters are active, we use P(income ≥ X | education level, age) via an education income-ratio multiplier, and report P(bachelor's+ | age) separately. Together they yield the correct joint P(edu ∩ income | age).
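A small sketch of the chain rule at work, P(edu ∩ income | age) = P(edu | age) × P(income ≥ X | edu, age), compared against the independent product. All three probabilities below are hypothetical and chosen only to show the direction of the effect.

```python
def joint_edu_income(p_edu_given_age, p_income_given_edu_age):
    """Chain rule: P(edu and income | age) = P(edu | age) * P(income | edu, age)."""
    return p_edu_given_age * p_income_given_edu_age


# Hypothetical inputs for one age bucket.
p_bachelors = 0.35               # P(bachelor's+ | age)
p_income_overall = 0.20          # P(income >= X | age), ignoring education
p_income_given_bachelors = 0.38  # P(income >= X | bachelor's+, age)

independent_estimate = p_bachelors * p_income_overall
joint_estimate = joint_edu_income(p_bachelors, p_income_given_bachelors)

print(f"independent={independent_estimate:.3f} joint={joint_estimate:.3f}")
```

Whenever degree holders clear the income threshold more often than the population as a whole, the joint estimate is larger than the independent product, which is why the naïve multiplication overstates rarity.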
Physical traits
Height: parametric Gaussian over lookup tables
For the general population, height is read from a survival function table derived from NHANES measurements. When a specific heritage is selected, we switch to a parametric Gaussian with group-specific mean and standard deviation — because the group may be too small in the NHANES sample to produce a reliable empirical CDF at the tails. A continuity correction of ±0.5 inches is applied to account for discrete reporting of height in whole inches.
The error function (erf) is computed via the Abramowitz & Stegun polynomial approximation (formula 7.1.26, max absolute error 1.5 × 10⁻⁷).
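A sketch of how the Gaussian branch might look, using the Abramowitz & Stegun 7.1.26 coefficients. The mean and standard deviation in the example are placeholders, and applying the continuity correction as "minimum minus 0.5 inches" for an "at least" filter is an assumption about which direction the ±0.5 is applied.

```python
import math

# Abramowitz & Stegun formula 7.1.26 coefficients.
P = 0.3275911
A1, A2, A3, A4, A5 = (0.254829592, -0.284496736, 1.421413741,
                      -1.453152027, 1.061405429)


def erf_as(x):
    """Polynomial approximation of erf(x); odd symmetry handles negative x."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + P * x)
    poly = ((((A5 * t + A4) * t + A3) * t + A2) * t + A1) * t
    return sign * (1.0 - poly * math.exp(-x * x))


def normal_cdf(x, mean, sd):
    """Gaussian CDF expressed through erf."""
    return 0.5 * (1.0 + erf_as((x - mean) / (sd * math.sqrt(2.0))))


def p_height_at_least(min_inches, mean, sd):
    """P(height >= min_inches), treating a whole-inch value N as [N - 0.5, N + 0.5)."""
    return 1.0 - normal_cdf(min_inches - 0.5, mean, sd)


# Placeholder group parameters (inches), not NHANES-derived values.
print(p_height_at_least(72, mean=69.0, sd=2.9))
```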
Economic traits
Income and wealth: ratio-shifted survival functions
Raw income and wealth distributions are stored as empirical survival functions — P(income ≥ X) at discrete dollar thresholds. To condition on age, heritage, education, or city, we apply a multiplicative ratio to the threshold rather than to the probability, which is equivalent to a location shift in log-space and preserves the shape of the distribution.
adjusted_threshold = raw_threshold / ratio
P(income ≥ X | age, edu, city) = survival_fn(X / (age_ratio × edu_ratio × city_ratio))
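The threshold shift can be sketched against a toy survival table. The thresholds and probabilities are invented for illustration, and both the linear interpolation between table points and the clamping beyond the last threshold are assumptions rather than documented behavior.

```python
from bisect import bisect_right

# Toy empirical survival function: P(income >= threshold). Placeholder values.
THRESHOLDS = [0, 25_000, 50_000, 100_000, 200_000]
SURVIVAL = [1.00, 0.80, 0.55, 0.25, 0.07]


def survival_fn(x):
    """Look up P(income >= x), interpolating linearly between table points."""
    if x <= THRESHOLDS[0]:
        return SURVIVAL[0]
    if x >= THRESHOLDS[-1]:
        return SURVIVAL[-1]  # clamp: the toy table has no data past this point
    i = bisect_right(THRESHOLDS, x) - 1
    x0, x1 = THRESHOLDS[i], THRESHOLDS[i + 1]
    s0, s1 = SURVIVAL[i], SURVIVAL[i + 1]
    return s0 + (s1 - s0) * (x - x0) / (x1 - x0)


def p_income_at_least(x, age_ratio=1.0, edu_ratio=1.0, city_ratio=1.0):
    """Divide the threshold by the combined ratio, then read the base table."""
    return survival_fn(x / (age_ratio * edu_ratio * city_ratio))


print(p_income_at_least(100_000))                 # baseline population
print(p_income_at_least(100_000, edu_ratio=2.0))  # group earning 2x the baseline
```

A ratio above 1 lowers the effective threshold, so a higher-earning group clears the same dollar figure more often while the shape of the underlying table stays untouched.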
City wealth is dampened relative to city income using a 0.75 exponent — because high-cost cities consume the income premium through housing, so high earners in San Francisco don't accumulate proportionally more wealth than their Dallas counterparts.
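One way to read the 0.75 exponent is that the city's wealth ratio is its income ratio raised to that power. The sketch below assumes exactly that; the function name and the example ratios are hypothetical.

```python
WEALTH_DAMPENING_EXPONENT = 0.75  # exponent quoted above


def city_wealth_ratio(city_income_ratio):
    """Pull a city's income ratio toward 1.0 before applying it to wealth."""
    if city_income_ratio <= 0:
        raise ValueError("ratio must be positive")
    return city_income_ratio ** WEALTH_DAMPENING_EXPONENT


# A hypothetical city whose incomes run 40% above the national baseline.
print(city_wealth_ratio(1.40))  # roughly 1.29, a smaller premium than 1.40
```

The exponent shrinks premiums and discounts alike: a ratio above 1 moves down toward 1, and a ratio below 1 moves up toward 1.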
Attractiveness
Looks: a uniform percentile model
Attractiveness is not measurable from government data. We model it as a uniform distribution across [0, 100] percentiles. Selecting "Top 15%" means P(looks qualifies) = 0.15. This is the one place where the model is deliberately simple — because no dataset exists that would let us do better. We trust users to self-calibrate this slider based on their own honest assessment.
Peer-reviewed literature suggests that people reliably overestimate their own attractiveness by roughly one standard deviation — take that as you will.
Grading
The grade scale
The final probability is mapped to a letter grade based on how rare your match is within the total adult population of the target sex.
≥ 20% of population
At least one in five people qualifies. You'll find them without much difficulty.
5–20%
One in 5–20. More effort required, but a healthy dating pool remains.
1–5%
One in 20–100. You're in the territory where luck plays a meaningful role.
0.1–1%
One in 100–1,000. They exist — statistically all of them could fit in a stadium.
< 0.1%
Fewer than one in 1,000. The math has concerns.
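The banding above amounts to a threshold lookup, sketched below. The band labels are placeholders rather than the tool's actual letter grades, and assigning a value that sits exactly on a boundary to the higher band is an assumption.

```python
# (lower_bound, label), checked from most common to rarest.
# Labels are placeholders; the tool's actual letter grades are not reproduced here.
BANDS = [
    (0.20, "at least 20%"),
    (0.05, "5-20%"),
    (0.01, "1-5%"),
    (0.001, "0.1-1%"),
]


def grade_band(p_match):
    """Return the band label for a final match probability in [0, 1]."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability in [0, 1]")
    for lower_bound, label in BANDS:
        if p_match >= lower_bound:
            return label
    return "under 0.1%"


print(grade_band(0.009))
```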
Transparency
What we don't model — and why
Remaining correlations between traits
Height and weight are correlated (taller people weigh more on average). Income and wealth are correlated (r ≈ 0.65). Attractiveness and income are correlated (the 'beauty premium' is approximately 10–15% in wage data). We condition on several of these but not all — a fully joint multivariate model would require microdata with every variable measured on the same individual, which no single public dataset provides.
Geographic heterogeneity beyond city income
We adjust income and wealth thresholds for the selected city, but we don't yet adjust height or weight distributions by region (which do vary slightly), or relationship status by metro area (cities skew younger and more single).
Dating pool, not total population
We estimate the share of the total adult population meeting your criteria. The actual singles dating market is a subset of this: it excludes people in relationships, people not actively dating, and people outside your geography. The true number of findable matches is lower. We show a ceiling, not a floor.
Attractiveness measurement
There is no validated, large-scale measurement of physical attractiveness distributions. Our uniform model is an acknowledged simplification.
Run the numbers
Now that you know how it works, let the math tell you something true.