Chapter 07 · Build the list

Vendor category

Enrichment vendor tradeoffs — data quality at scale.

Enrichment is the data-quality layer underneath every other reference in the outbound stack. Per-attribute accuracy ranges from 40% at the low end (direct dials) to 95% at the high end (company firmographics), and the per-vendor differential is real and measurable. The operator who treats enrichment as a commodity purchase is the operator whose downstream bounce rate, reply rate, and meeting rate are bounded by the worst per-attribute accuracy in their data layer.

The premise

Every reference downstream of this chapter — email infrastructure, copy, LinkedIn, conference outreach — presupposes that the prospect record is correct. The email resolves to a working inbox. The first name matches what the recipient is called. The title reflects the recipient's actual role. The company is the one the recipient currently works at, not the one they left fourteen months ago. Each attribute has a per-vendor accuracy rate, and each accuracy gap propagates into the downstream conversion math.

The empirical relationship is mechanical: a list with 78% email accuracy delivers roughly 78% of the deliverability the underlying infrastructure would otherwise produce. The fourteen-point gap against a 92%-accuracy list is not absorbed by the sending stack; it shows up as a fourteen-point increase in bounce rate, which the email-cluster bounce chapter identifies as one of the strongest reputation signals at Gmail and Microsoft. A list with 70% role accuracy produces a copy layer whose personalization is wrong on 30% of sends. The empirical reply-rate impact, in our observation, is a 40 to 60% decline relative to the same copy run on a correctly enriched list.
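The arithmetic behind the bounce figures can be written out as a minimal sketch. It assumes a first-order model in which every inaccurate address hard-bounces and the sending stack adds no bounces of its own; the function name is illustrative and not taken from any vendor's tooling.

```python
def expected_bounce_points(email_accuracy_pct: int) -> int:
    """Bounce rate in percentage points, assuming every bad address bounces."""
    if not 0 <= email_accuracy_pct <= 100:
        raise ValueError("accuracy must be a percentage between 0 and 100")
    return 100 - email_accuracy_pct


# The two lists compared in the text.
low_quality = expected_bounce_points(78)
high_quality = expected_bounce_points(92)
gap = low_quality - high_quality

print(f"78% list: {low_quality}% bounce; 92% list: {high_quality}% bounce; gap: {gap} points")
```

Real bounce rates will differ from this model because some stale addresses are silently accepted and some valid ones bounce for unrelated reasons, so the result is best read as an upper-bound estimate.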

Enrichment is not, in this framing, a procurement decision. It is the upstream determinant of the ceiling on every conversion metric the operator measures downstream.

The four enrichment categories

The vendor landscape is structurally divided along the dimension of how the data is acquired and how often it is refreshed. The four categories produce meaningfully different data products at meaningfully different price points.

Major B2B data providers

Large, general-purpose B2B data platforms — typically the operator's first vendor and the one that produces the broadest coverage. These vendors build datasets through public-records ingestion, web scraping, contributory networks (users opting in to share their contact graph in exchange for free access), and direct sourcing. The dataset typically spans 200M to 700M contact records and 30M to 100M company records.

The strength is breadth: an operator targeting a generic ICP (VP of Sales at SaaS companies between $5M and $50M ARR, US-based) will find adequate coverage from any of the top three vendors in this category. The weakness is depth: in any specific vertical, region, or seniority band, the major providers are typically outperformed by a niche vendor with a tighter focus.

Niche enrichment vendors

Vendors focused on a specific vertical (healthcare, government contracting, manufacturing, real estate), region (DACH, Japan, MENA, Southeast Asia), or seniority band (C-suite only, technical-leader only). The dataset is smaller — 5M to 50M contacts is typical — but the per-attribute accuracy inside the niche is meaningfully higher than the major providers, and the long-tail coverage includes companies and roles the major providers miss entirely.

The operator targeting a non-US ICP will typically find that the major providers' coverage drops by 30 to 60% outside North America, and the per-attribute accuracy of the records they do have drops by a further 10 to 20 points. A niche enricher focused on the target region closes the gap.

Scraping-based enrichment services

Vendors that operate as on-demand scrapers — given a name, company, or URL, they retrieve the prospect's professional profile, employment history, and contact information by scraping public profile pages, then run a per-record verification pass. The dataset is not pre-built; the scrape is performed at query time, which produces a different freshness profile and a different legal posture (the scrape itself is the legal action, not the contribution to a static database).

The category exists in a legal regime defined by the post-hiQ v. LinkedIn rulings (see the LinkedIn-compliance chapter in the LinkedIn cluster). The Ninth Circuit's clarification that scraping publicly-accessible data does not, in itself, violate the Computer Fraud and Abuse Act produced the operating space for these vendors, but platform-specific terms-of-service exposure remains. Operators using scraping-based enrichment should understand the contractual posture, not just the statutory one.

AI-research enrichment

The newest category: vendors that, for each prospect, run an LLM-driven research pass and produce a per-prospect narrative — a paragraph or two summarizing recent activity, company news, inferred priorities, and a candidate personalization hook. The output is not structured firmographic data; it is a synthesized briefing.

The production cost is meaningfully higher per record — $0.10 to $0.50 per prospect is typical at the time of writing, versus $0.002 to $0.05 for structured enrichment — and the empirical signal value is unevenly distributed. On the top decile of prospects, the output materially lifts reply rates; on the bottom decile, the output is generic and adds no signal over a templated send. The operator's task is gating the AI-research spend to the prospects where the lift materializes.
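One way to gate the spend is to rank prospects by a priority score and buy AI research only for the top slice. The sketch below assumes the operator already has such a score; the `Prospect` type, the `priority_score` field, and the default fraction and per-record cost are all hypothetical placeholders to be replaced with the operator's own values.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    prospect_id: str
    priority_score: float  # hypothetical score from the operator's own model, 0.0 to 1.0


def gate_ai_research(prospects, top_fraction=0.10, cost_per_record=0.20):
    """Return (selected prospects, gated spend, ungated spend).

    Only the highest-scoring `top_fraction` of prospects is selected for
    AI research; the two spend figures allow a direct cost comparison.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(prospects, key=lambda p: p.priority_score, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction)) if ranked else 0
    selected = ranked[:keep]
    gated_spend = round(len(selected) * cost_per_record, 2)
    ungated_spend = round(len(ranked) * cost_per_record, 2)
    return selected, gated_spend, ungated_spend


if __name__ == "__main__":
    demo = [Prospect(f"p{i}", i / 1000) for i in range(1000)]
    chosen, gated, ungated = gate_ai_research(demo)
    print(f"{len(chosen)} prospects selected, ${gated} gated vs ${ungated} ungated")
```

The useful part of the design is that the fraction is a parameter: the operator can widen or narrow the gate as reply-rate data shows where the lift actually stops.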

Per-attribute accuracy ranges

Per-attribute accuracy is the metric that determines downstream conversion. Vendor marketing materials report aggregate accuracy; the operator running production campaigns should track accuracy per attribute. The empirical ranges, drawn from operator audits across the four vendor categories:

Attribute	Low end	High end	Decay driver
Work email	65%	92%	Job changes, role transitions, domain migrations
Direct dial	40%	75%	Remote work, mobile-only roles, gatekeeping
Role / seniority	70%	90%	Internal promotions, title inflation, reorgs
Company firmographics	80%	95%	Acquisitions, headcount drift, funding stage

The spread is not random. Email and direct dial decay fastest because they are tied to individual employment, which churns at a 20 to 30% annual rate in the typical target population. Role and seniority decay at the rate of internal promotion and reorganization, roughly 15 to 25% annually. Company firmographics decay slowest because the company itself is relatively stable.
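Tracking accuracy per attribute rather than in aggregate can be done on a small hand-verified sample. The following is a minimal sketch under stated assumptions: records are plain dictionaries keyed by a record ID, the verified sample is treated as ground truth, and matching is a case-insensitive string comparison, which is cruder than a production title or phone matcher would be. All names are illustrative.

```python
from collections import defaultdict


def per_attribute_accuracy(vendor_records, verified_records, attributes):
    """Share of correct values per attribute, over records where the vendor returned a value.

    vendor_records and verified_records map record_id -> {attribute: value}.
    """
    correct = defaultdict(int)
    checked = defaultdict(int)
    for record_id, truth in verified_records.items():
        vendor = vendor_records.get(record_id, {})
        for attr in attributes:
            value = vendor.get(attr)
            if value is None:
                continue  # a missing value is a coverage gap, not an accuracy error
            checked[attr] += 1
            if str(value).strip().lower() == str(truth.get(attr, "")).strip().lower():
                correct[attr] += 1
    # Attributes the vendor never returned are omitted rather than reported as 0%.
    return {a: correct[a] / checked[a] for a in attributes if checked[a]}


if __name__ == "__main__":
    vendor = {"r1": {"email": "A@x.com", "title": "VP Sales"},
              "r2": {"email": "b@y.com", "title": "CTO"}}
    truth = {"r1": {"email": "a@x.com", "title": "VP Sales"},
             "r2": {"email": "b@z.com", "title": "cto"}}
    print(per_attribute_accuracy(vendor, truth, ["email", "title"]))
```

Keeping coverage (did the vendor return anything) separate from accuracy (was the returned value right) matters here, because the two fail for different reasons and are priced differently.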

Per-category coverage differential

The aggregate accuracy ranges above obscure a coverage differential along three dimensions: geography, B2B vs B2C, and company size.

Geographically, major providers are US-heavy. Their per-attribute accuracy inside the US is typically 10 to 20 points higher than EMEA and 20 to 40 points higher than APAC. Operators targeting non-US ICPs and relying on a single US-headquartered vendor are running campaigns on a list with materially lower data quality than they are billed for.

On the B2B vs B2C axis, this vendor category is built for B2B. Consumer-oriented enrichment — personal email, household firmographics — is a different vendor category entirely. On the company-size axis, mid-market and enterprise coverage is typically strong; SMB and sub-50-employee coverage is meaningfully weaker, with sparse contact-level enrichment below the founder tier. Operators targeting SMB should expect 50 to 70% coverage rather than the 85 to 95% they would see on a mid-market list.

Pricing patterns

Pricing in the enrichment category follows a recognizable three-tier structure. The empirical ranges:

Tier	Price range	Typical structure
Entry / self-serve	$30 – $200 / mo	Per-seat with a capped monthly export quota; suited to one or two operators
Mid-market	$500 – $3,000 / mo	Annual contract with a meaningfully higher export quota and the start of API access
Enterprise	$10K – $50K+ / yr	Annual contract, full API access, custom datasets, dedicated account management

The structural observation: at the entry tier, the operator is constrained on volume and cannot run the multi-vendor pattern described below. At the mid-market tier, the operator has the API access required to integrate enrichment into the prospect-graph construction pipeline (Chapter 04). At the enterprise tier, the operator is buying not just data but contractual posture — indemnification, data-residency guarantees, and compliance documentation required for EU operations.

The multi-source enrichment pattern

The dominant operational pattern at production scale: run two or three vendors in parallel against the same prospect list, and take the highest-confidence value per attribute. Sometimes called waterfall enrichment, sometimes multi-source enrichment, it is the single highest-leverage data-quality intervention in the stack.

Mechanically: for each prospect, query vendor A, vendor B, and (optionally) vendor C. For each attribute — email, direct dial, title, company — apply a confidence ranking. Email-confidence is driven by a separate verification pass (see below); title and firmographic confidence by the vendor's reported confidence score; phone confidence by vendor reputation in the relevant region. The output is a merged record where each attribute is drawn from the highest-confidence source.
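The merge step can be sketched as follows. This is an illustration, not any vendor's API: it assumes each vendor response has already been reduced to a dictionary of `(value, confidence)` pairs, and that the confidences (verification result for email, vendor score for title and firmographics, regional reputation for phone) have been normalized onto a common 0.0 to 1.0 scale, which is itself a nontrivial assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeValue:
    value: str
    confidence: float  # assumed normalized to 0.0-1.0 across vendors
    source: str        # vendor the value was drawn from


def merge_records(vendor_results, attributes=("email", "direct_dial", "title", "company")):
    """Pick, per attribute, the highest-confidence value across vendor responses.

    vendor_results maps vendor name -> {attribute: (value, confidence)}.
    Attributes that no vendor returned are set to None in the merged record.
    """
    merged = {}
    for attr in attributes:
        best = None
        for vendor, record in vendor_results.items():
            if attr not in record:
                continue
            value, confidence = record[attr]
            if not value:
                continue  # skip empty strings and nulls
            if best is None or confidence > best.confidence:
                best = AttributeValue(value, confidence, vendor)
        merged[attr] = best
    return merged


if __name__ == "__main__":
    results = {
        "vendor_a": {"email": ("a@example.com", 0.70), "title": ("VP Sales", 0.90)},
        "vendor_b": {"email": ("a.b@example.com", 0.95), "title": ("Head of Sales", 0.60)},
    }
    for attr, chosen in merge_records(results).items():
        print(attr, chosen)
```

Recording the `source` alongside each merged value keeps the record auditable, which helps when comparing vendors on the operator's own ICP.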

The empirical lift, in our observation, is a 10 to 25 percentage-point increase in per-attribute accuracy versus the best single-vendor result, with the largest gains in international coverage and on the email attribute. The operational cost is a 2x to 3x increase in enrichment spend, partially offset by reduction in wasted downstream spend on bad records. The operator's question is not whether the pattern produces a lift; it does. The question is whether the lift on the operator's specific ICP is worth the integration overhead — a one-to-three-week engineering investment at mid-market vendor tiers.

The email-verification layer

Email verification is a separate vendor category from enrichment, performed as a post-enrichment pass. The verification service performs an SMTP-level handshake with the receiving mail server — connecting to the MX, issuing RCPT TO, and reading the response code — to determine whether the address resolves to a deliverable mailbox without actually sending mail.

Per-vendor verification accuracy ranges from 90% to 98% in our observation, with residual error driven by catch-all domains that accept all addresses at SMTP and only bounce post-acceptance, and by greylisting that defers the handshake. Operators running cold sending against an unverified list typically observe bounce rates 8 to 15 percentage points higher than the same list verified pre-send.
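The handshake described above can be sketched with Python's standard `smtplib`. This is a simplified illustration rather than how any verification vendor actually works: the MX host is assumed to have been resolved already (the standard library has no MX lookup), the function and verdict names are hypothetical, and many receiving servers throttle or block this kind of probing.

```python
import smtplib


def classify_rcpt_code(code: int) -> str:
    """Map an SMTP RCPT TO reply code to a coarse verdict."""
    if 200 <= code < 300:
        return "accepted"   # accepted at SMTP; may still be a catch-all domain
    if 400 <= code < 500:
        return "deferred"   # temporary failure such as greylisting; retry later
    if 500 <= code < 600:
        return "rejected"   # permanent failure; treat as undeliverable
    return "unknown"


def probe_mailbox(mx_host: str, address: str, helo_name: str,
                  mail_from: str, timeout: float = 10.0) -> str:
    """Run HELO, MAIL FROM, and RCPT TO against mx_host, then quit.

    The session ends before DATA, so no message is sent.
    """
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo(helo_name)
            smtp.mail(mail_from)
            code, _message = smtp.rcpt(address)
            return classify_rcpt_code(code)
    except (smtplib.SMTPException, OSError):
        # Connection refused, timeout, or protocol error: no verdict available.
        return "unknown"
```

An "accepted" verdict is not proof of deliverability on a catch-all domain. A common heuristic is to probe an obviously nonexistent address at the same domain and treat the domain as catch-all if that is also accepted.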

The implication for the bounce-rate ceiling (see the bounce taxonomy in the email cluster) is direct. Gmail and Microsoft begin penalizing sender reputation at bounce rates above 3%. A list enriched at 78% email accuracy and sent without verification will produce a bounce rate of 18 to 22%, which is reputation-destroying at any meaningful volume; the same list with a verification pass typically lands at 4 to 7%. The verification step is the most cost-effective single intervention in the stack: $0.005 to $0.02 per record, against a downstream cost-per-meeting two to four orders of magnitude higher.

Data freshness and per-attribute decay

Every enrichment vendor refreshes on a cadence. The cadence and the methodology — full re-scrape, selective re-verification, contributory updates — drive the empirical decay of per-attribute accuracy between purchase and use.

A list pulled six months ago and re-used today has, in our observation, a 15 to 30 percentage-point decay in email accuracy and a similar decay in role accuracy. The decay is dominated by job changes — the recipient left, the email no longer resolves, the role no longer applies. Treating a one-time enrichment pull as a permanent asset is the most common operational failure in the category. The correct posture is quarterly or semi-annual re-enrichment of any active prospect list, with a hard rule against running cold sequences against records enriched more than 90 days ago without a fresh verification pass.
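The 90-day rule can be enforced as a gate in front of the sequencer. The sketch below is one possible reading of the rule: the text does not define what counts as a "fresh" verification pass, so the 14-day window is a placeholder assumption, and the function and parameter names are illustrative.

```python
from datetime import date, timedelta
from typing import Optional

MAX_ENRICHMENT_AGE = timedelta(days=90)


def cleared_for_cold_send(enriched_on: date,
                          verified_on: Optional[date],
                          today: date,
                          fresh_verification_days: int = 14) -> bool:
    """True if the record may enter a cold sequence today.

    A record passes if it was enriched within the last 90 days, or if it is
    older but has a verification pass within `fresh_verification_days`
    (an assumed window; tune to the operator's own policy).
    """
    if today - enriched_on <= MAX_ENRICHMENT_AGE:
        return True
    if verified_on is None:
        return False
    return today - verified_on <= timedelta(days=fresh_verification_days)


if __name__ == "__main__":
    today = date(2025, 6, 1)
    print(cleared_for_cold_send(date(2025, 1, 1), None, today))
    print(cleared_for_cold_send(date(2025, 1, 1), date(2025, 5, 25), today))
```

Passing `today` in explicitly keeps the check deterministic and easy to test, rather than reading the system clock inside the function.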

Privacy and compliance

The enrichment vendor's compliance posture is a non-negotiable input for any operator targeting EU prospects. GDPR Article 6 requires a lawful basis for processing personal data, and the legitimate-interest basis applied to B2B targeting is contested in practice — defensible for narrow, role-specific outreach to senior decision-makers with a clear commercial purpose; not defensible for blanket prospecting at low seniority.

The vendor-side question is whether the enrichment provider has its own GDPR-compliant lawful basis for holding and selling the data. A vendor that cannot articulate this transfers the compliance risk to the operator at the point of use. CCPA, the California analog, is more permissive in B2B but carries its own posture requirements: the operator must honor right-to-delete requests within 45 days, and the vendor must propagate deletions to the operator's downstream systems.

The practical implication for B2B targeting in the EU: operate with a higher per-record vendor cost, accept the narrower targeting envelope, and document the lawful basis at the campaign level. The operator who treats the EU as a marginal expansion of a US-built motion is the operator who discovers the regulatory exposure at the point of complaint.

The do-not-reach list

Suppression at the enrichment layer prevents bad data from entering the campaign in the first place, rather than catching it at send. The do-not-reach list (competitors, existing customers, former customers, hand-flagged prospects, opted-out recipients, regulator-mandated exclusions) is applied as a filter against the enriched list before any send-stage tool sees the record. The pattern at scale: maintain the list as a first-class asset in the operator's own systems, sync it into every campaign tool, and re-apply it on every enrichment pull. See Chapter 08 for the full hygiene discipline; the enrichment-layer point is that the suppression filter belongs at acquisition, not at send.
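Applying the filter at the enrichment-pull boundary can be sketched as below. It assumes the do-not-reach list is held as two sets, individual email addresses and whole domains (for competitors and customers), and that enriched records are dictionaries with an `email` key; these shapes and names are illustrative, not a prescribed schema.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim so that casing and whitespace do not defeat suppression."""
    return email.strip().lower()


def domain_of(email: str) -> str:
    """Return the part after the last '@', or an empty string if there is none."""
    _local, sep, domain = normalize_email(email).rpartition("@")
    return domain if sep else ""


def apply_do_not_reach(enriched, suppressed_emails, suppressed_domains):
    """Split enriched records into (kept, suppressed) before any send-stage tool sees them."""
    emails = {normalize_email(e) for e in suppressed_emails}
    domains = {d.strip().lower() for d in suppressed_domains}
    kept, suppressed = [], []
    for record in enriched:
        email = normalize_email(record.get("email") or "")
        if email and (email in emails or domain_of(email) in domains):
            suppressed.append(record)
        else:
            kept.append(record)
    return kept, suppressed


if __name__ == "__main__":
    pulled = [{"email": "Ann@Customer.com"}, {"email": "bob@prospect.io"},
              {"email": "optout@prospect.io"}]
    kept, dropped = apply_do_not_reach(pulled, {"optout@prospect.io"}, {"customer.com"})
    print(len(kept), "kept;", len(dropped), "suppressed")
```

Returning the suppressed records as well as the kept ones gives the operator a count of how much of each pull was already excluded, which is the number that shows up as wasted spend on the vendor invoice.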

The build-vs-buy question for high-volume operators

At the high end of operator scale — single-operator volumes above 50,000 enriched contacts per month, or multi-operator volumes above 200,000 per month — the build-vs-buy question becomes economically meaningful. The build path: operate scrapers, maintain a contact database, run verification at scale, and treat the enrichment layer as in-house infrastructure rather than purchased data.

The path carries meaningful overhead — engineering, compliance, infrastructure — and is not justified below the high end. It is, however, the operational pattern of every major B2B data provider in the category. Operators at sufficient scale to justify the build typically arrive at a hybrid posture: in-house scraping for the highest-value attributes (current employment, title, recent role transitions), purchased data for broader firmographic and historical context. For the operator below this volume threshold, the build path is a distraction; the operational hour is better spent on multi-source vendor integration than on scraper maintenance.

Common operator failures observed in production

Single-vendor over-reliance. The operator buys one vendor's contract, builds the entire pipeline against its API, and never runs a multi-source comparison. The per-attribute accuracy ceiling is set by that vendor's coverage in the specific ICP, and the operator never discovers what the ceiling could be.
No email verification. The operator runs cold sequences against raw enrichment output, observes a 15 to 22% bounce rate, and attributes the deliverability problem to sending infrastructure rather than the list. The infrastructure investment that follows is wasted.
No compliance posture. The operator targets EU prospects under a US-built motion with no documented lawful basis, no opt-out flow, and no consideration of the vendor's GDPR posture. The exposure is dormant until the first complaint.
Treating enrichment as one-time. The operator pulls a list, enriches it once, and runs it for nine months. By month three, the data has decayed; by month six, the campaign sends into a graveyard of stale roles. The operator typically attributes declining performance to creative fatigue and rotates copy, which does not fix the underlying data problem.
Buying AI-research enrichment at the wrong tier. The operator deploys per-prospect LLM research across the full target list, spends $0.20 per record on 50,000 records, and produces $10,000 of personalization where the empirical lift is concentrated in the top 5,000. The same outcome at a fraction of the cost is available with tier-gated AI research.
Ignoring the do-not-reach list at the enrichment layer. Suppression applied only at send results in repeated re-purchase of records the operator has already paid to exclude. The vendor invoice grows at the rate of the do-not-reach list with no corresponding lift in addressable population.

Pre-purchase checklist

Per-attribute accuracy benchmarked against the operator's actual ICP — not the vendor's marketing claim — on a 500 to 1,000 record sample
Coverage tested against the operator's specific geography, company-size band, and seniority band
Refresh cadence documented, with the implication for re-enrichment frequency budgeted in advance
Verification vendor selected and integrated as a post-enrichment pass before any send-stage tool consumes the data
Multi-source pattern designed at the architecture stage, even if only one vendor is purchased at the entry tier
Compliance posture documented for any EU or California targeting — lawful basis, vendor warranties, suppression propagation
Do-not-reach list applied at the enrichment-pull boundary, not only at the send boundary
Decay budget — the expected re-enrichment frequency — built into the per-campaign cost model from day one

Where enrichment fits

Enrichment is the data-quality substrate beneath the prospect graph (Chapter 04), the segmentation architecture (Chapter 05), and the intent integration (Chapter 06). It is also the input to the operational list management discipline of Chapter 08 — suppression, deduplication, refresh cadence, the bounce-rate implications that propagate into the email cluster. The operator whose enrichment posture is correct has a list whose ceiling is set by the ICP definition and the market. The operator whose enrichment posture is wrong has a list whose ceiling is set by the worst per-attribute accuracy rate in the data layer, and no amount of downstream sending discipline reaches above it.

Related chapters

Operational List Management — Hygiene at Production Scale — the maintenance discipline enrichment feeds.
Prospect Graph Construction — Beyond the Flat List — the data structure enrichment writes into.
Buyer Intent Data — Where the Signal Actually Is — the third-party signal layer paired with enrichment.
Email Bounce Codes Explained — the downstream impact of bad enrichment.
