Chapter 11 · Operations

Provider tooling

Postmaster Tools and SNDS — the reputation dashboards most senders never check.

Every major mailbox provider runs an independent reputation system. Two of them — Gmail and Microsoft — expose a diagnostic interface back to senders. A sender not enrolled in both is operating without visibility into the reputation receivers have already assigned them.

TL;DR

Check Gmail Postmaster Tools and Microsoft SNDS daily, not weekly — especially during the ramp.
Gmail reputation buckets: High, Medium, Low, Bad. Pause sending immediately on Low or Bad.
Microsoft SNDS bands: Green, Yellow, Red, color-coded per IP. Same rule — pause on Red.
The recovery sequence: pause sending → MXToolbox blacklist check → fix DNS → restart warmup → drop volume to 5/day per inbox → ramp by 5 every 2 days.
Postmaster surfaces a reputation drop 4 to 7 days before it shows up as open-rate degradation. If you're only watching opens, you've already lost most of a week.

The premise

A receiving mail server does not classify mail primarily by reading individual messages. It classifies by referencing a continuously updated reputation score attached to the sending domain and IP, then adjusting that score based on subsequent engagement. The reputation score is the actual filter; the content is, at the margin, a tiebreaker.

This reputation lives inside the receiver's infrastructure. What the sender has, at Gmail and Microsoft specifically, is a published diagnostic interface — a sanitized view of the same reputation data the filter uses internally. Gmail calls theirs Postmaster Tools. Microsoft calls theirs Smart Network Data Services (SNDS), supplemented by the Junk Mail Reporting Program (JMRP) feedback loop. Senders who do not enroll in both are operating blind, and the operator who first discovers a reputation drop through a falling open rate has discovered it four to seven days late.

Gmail Postmaster Tools — what it requires

Enrollment requires proof of control over the sending domain. Google accepts a DNS TXT record at the apex, or an HTML file at the domain's web root. The TXT record is production-stable — the HTML file expires the moment the marketing team replaces the website without remembering the verification file existed.

Verification is per-domain. An organization with a corporate domain, a cold-sending domain, and three subdomains verifies five separate properties, each with its own reputation bucket. Verifying only the corporate domain and assuming the sending subdomains inherit produces silent absence of data on the properties that actually matter.
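A minimal sketch of the per-property check, assuming the TXT records for each domain have already been fetched by some other means (a resolver library or `dig TXT`). The domain names and token value are hypothetical placeholders. The only detail taken from Google is the `google-site-verification=` prefix its TXT tokens carry.

```python
GOOGLE_PREFIX = "google-site-verification="


def has_google_verification(txt_records):
    """True if any TXT string is a non-empty Google site-verification token."""
    for record in txt_records:
        value = record.strip().strip('"')
        if value.startswith(GOOGLE_PREFIX) and len(value) > len(GOOGLE_PREFIX):
            return True
    return False


def unverified_properties(txt_by_domain):
    """Return the domains whose TXT set lacks a verification token, sorted."""
    return sorted(
        domain
        for domain, records in txt_by_domain.items()
        if not has_google_verification(records)
    )


# Hypothetical estate: a corporate domain, a cold-sending domain, one subdomain.
estate = {
    "example.com": [
        "v=spf1 include:_spf.example.net ~all",
        "google-site-verification=abc123",
    ],
    "example-outreach.com": ["v=spf1 include:_spf.example.net ~all"],
    "mail.example-outreach.com": [],
}

print(unverified_properties(estate))
```

The check only confirms that a token is published. It does not confirm that the property was actually added and accepted inside Postmaster Tools, so it complements a manual review rather than replacing one.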

The dashboard then exposes seven data streams: domain reputation, IP reputation, spam complaint rate, SPF/DKIM/DMARC pass rates, delivery errors, encryption rates, and feedback-loop volume.

The four reputation buckets

Gmail publishes reputation as one of four discrete buckets: Bad, Low, Medium, High. The buckets are not linearly spaced and the cost of moving between them is not symmetric. This is the single most misread element of the dashboard.

High

A High-reputation sender lands in primary with high probability, has wide tolerance for content the spam classifier would otherwise score against, and recovers quickly if a campaign nudges metrics in the wrong direction. Most senders never reach this bucket; the ones that do typically have a multi-year engagement history.

Medium

A Medium-reputation sender lands in primary on engaged recipients, in promotions on cold recipients, and in spam on the long tail. This is the realistic ceiling for a competent cold-sending operation in the first six to twelve months of life. Moving from High to Medium is, in practice, a small change — perhaps a five-to-ten-point reduction in primary-tab placement on the marginal recipient.

Low

A Low-reputation sender lands in promotions on engaged recipients, in spam on cold recipients, and is rate-limited on volume excursions. Moving from Medium to Low is, by contrast, a meaningful degradation — a thirty-to-fifty-point drop in primary-tab placement is typical, and the recovery path is measured in weeks of reduced-volume sending.

Bad

A Bad-reputation sender is effectively undeliverable. Mail is routed to spam by default, much of it is silently dropped before reaching the spam folder, and no content-level mitigation recovers placement. The path back to Medium runs through 30 to 60 days of cessation, a documented incident response, and a fresh warmup runway. In several observed cases the domain is abandoned entirely.

The asymmetry — small cost from High to Medium, large from Medium to Low, terminal from Low to Bad — is the reason the operational posture has to be continuous monitoring rather than periodic review.

The Postmaster dashboards in detail

The domain reputation chart is a 7-day rolling view of the bucket assignment. Movement within a bucket is not visible: the dashboard surfaces the categorical bucket, not the underlying continuous score. The operator can infer trajectory only indirectly, from how long the line has held its current bucket and how recently it crossed a boundary.

The spam rate chart is the leading indicator. It plots the fraction of delivered mail that recipients mark as spam through the Gmail UI. The February 2024 bulk-sender enforcement criterion is a spam rate under 0.3% (Chapter 10). A single-day excursion above 0.3% is recoverable; a 7-day average sustained above it triggers enforcement that propagates into the reputation bucket within 48 to 72 hours.
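The distinction between a single-day excursion and a sustained 7-day average can be sketched as follows. This assumes the daily spam rates have been copied out of the dashboard as fractions, and that an unweighted mean of the daily values is an acceptable stand-in for the 7-day average; Google does not document how it weights days, so treat that as an approximation. The function names are illustrative.

```python
ENFORCEMENT_FLOOR = 0.003  # 0.3%, the bulk-sender spam-rate criterion


def rolling_mean(values, window=7):
    """Unweighted mean over each full trailing window of `window` days."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def classify_spam_rates(daily_rates, floor=ENFORCEMENT_FLOOR, window=7):
    """Return 'sustained', 'excursion', or 'clear' for the latest window."""
    averages = rolling_mean(daily_rates, window)
    if averages and averages[-1] > floor:
        return "sustained"  # latest 7-day mean is above the floor
    if any(rate > floor for rate in daily_rates[-window:]):
        return "excursion"  # at least one bad day, mean still under the floor
    return "clear"


one_bad_day = [0.001] * 6 + [0.004]
bad_week = [0.004] * 7
print(classify_spam_rates(one_bad_day))
print(classify_spam_rates(bad_week))
```

The first series represents the recoverable case: one day above 0.3% with the weekly mean still well under it. The second represents the case the text describes as triggering enforcement.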

The authentication breakdown reports the fraction passing SPF, DKIM, and DMARC alignment. A correctly configured estate runs above 99% on all three. Below 95% on any one indicates a misconfigured source, a forwarder-induced alignment failure (Chapter 6), or a third-party platform signing under an unexpected domain.
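As a small sketch, the two cutoffs above can be applied to pass rates read off the dashboard. The text defines only "above 99%" and "below 95%"; the label for the band in between ("watch") is an assumption added here, as are the function names.

```python
def auth_status(pass_rate):
    """Classify one authentication pass rate, given as a fraction (0.0 to 1.0)."""
    if pass_rate > 0.99:
        return "healthy"
    if pass_rate < 0.95:
        return "investigate"  # misconfigured source, forwarder, or third-party signer
    return "watch"  # assumption: the text does not name this middle band


def review_authentication(rates):
    """Map each mechanism name to its status."""
    return {mechanism: auth_status(rate) for mechanism, rate in rates.items()}


print(review_authentication({"spf": 0.997, "dkim": 0.93, "dmarc": 0.98}))
```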

The feedback-loop volume reports the absolute count of complaints — the numerator of the spam-rate calculation. Cross-referencing against the campaign schedule identifies which campaign produced the spike. The mechanism conforms broadly to the Abuse Reporting Format defined in RFC 5965, though Gmail aggregates rather than delivering per-message reports as some other receivers do.

The 100-message threshold

Postmaster requires a minimum daily volume — Google's documentation describes this as "hundreds" of messages, and the empirically observed floor is approximately 100 messages per day from the verified domain to Gmail recipients specifically. Below that floor, the dashboard displays an empty state and the operator has zero visibility into reputation, spam rate, or authentication results.

This is the binding constraint for very-low-volume senders — the founder-led motion sending 30 to 50 emails per day, the agency running 10 sending domains each at sub-threshold volume to spread reputation risk. The fix is volume consolidation: fewer domains, each above the visibility threshold, with inbox-level separation handled at the from-address tier. A sub-threshold sending domain is not a low-risk domain — it is a domain whose reputation degradation will be invisible until it has become catastrophic.
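The consolidation arithmetic can be sketched as below. It assumes the operator already knows each domain's daily volume to Gmail recipients specifically, and it uses the empirically observed floor of roughly 100 messages per day as a hard cutoff, which is a simplification. Domain names are hypothetical.

```python
VISIBILITY_FLOOR = 100  # approximate observed floor, Gmail recipients per day


def invisible_domains(gmail_volume_by_domain, floor=VISIBILITY_FLOOR):
    """Domains whose daily Gmail volume is under the visibility floor."""
    return sorted(
        domain
        for domain, volume in gmail_volume_by_domain.items()
        if volume < floor
    )


def max_visible_domains(total_daily_gmail_volume, floor=VISIBILITY_FLOOR):
    """How many domains the same total volume could keep at or above the floor."""
    return total_daily_gmail_volume // floor


# Ten domains at 50 messages per day each, assumed here to be all Gmail traffic.
fragmented = {f"outreach{i}.example": 50 for i in range(10)}

print(len(invisible_domains(fragmented)))
print(max_visible_domains(sum(fragmented.values())))
```

In this illustration all ten domains are invisible, while the same 500 messages per day would support at most five domains with dashboard data. Real estates send only part of their volume to Gmail, so the usable number is lower.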

Microsoft SNDS — what it requires

SNDS enrollment is IP-based, not domain-based. The operator requests access for a specific IP or CIDR range, and Microsoft validates control by sending an authorization request to a contact address associated with that range, typically one derived from its reverse DNS (PTR) hostname. The operator who does not control reverse DNS, which describes most senders on shared infrastructure, cannot enroll directly and must rely on the sending platform to expose SNDS data on their behalf.

This is the reason SNDS adoption among cold senders is lower than Postmaster adoption: the verification model is harder to satisfy. It is not a reason to skip enrollment. A sender on dedicated IPs should enroll within the first week of provisioning; a sender on shared IPs should confirm the platform exposes SNDS reports as a contractual deliverable.

SNDS then exposes five data streams per IP: a reputation classification (the color band), a complaint rate against Microsoft's user base, a count of spam trap hits, the daily sample size, and the filter result distribution — the fraction of the sample landing in inbox, junk, or blocked at the SMTP transaction.

The SNDS color bands

SNDS publishes IP reputation as one of three color bands: Green, Yellow, Red. The bands map approximately, not exactly, to Gmail's High, Medium, and Low. Microsoft does not publish a fourth band corresponding to Bad — an IP degraded past Red typically disappears from SNDS reporting entirely, which is itself a diagnostic signal.

Green

A Green IP delivers reliably to inbox across Outlook.com, Hotmail.com, and Live.com, and is treated favorably when mail reaches corporate Exchange Online tenants. Most senders new to SNDS land in Yellow on day one, not Green; Green is earned.

Yellow

A Yellow IP delivers to inbox on engaged recipients, to junk on cold recipients, and shows variability across corporate tenants. This is the operational ceiling for the first 30 to 60 days on a new IP.

Red

A Red IP is rate-limited, routed to junk by default, and produces SMTP rejections on the most aggressive corporate filters. Recovery requires reduced volume, complaint resolution, and frequently a documented submission to Microsoft's sender support process. The path back to Yellow takes two to six weeks.

Tied to SNDS is the Junk Mail Reporting Program — Microsoft's complaint feedback loop. JMRP delivers, per registered IP, a per-message report each time a recipient marks the sender's mail as junk through any Microsoft surface. Reports conform to ARF (RFC 5965). Enrollment is separate from SNDS, has its own approval workflow, and is the source of truth for the complaint numerator SNDS aggregates.
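A minimal sketch of pulling the suppression-relevant fields out of a complaint report, assuming the report arrives as an RFC 5965 ARF message as described above. The sample report is fabricated for illustration, and some feedback loops redact or omit `Original-Rcpt-To`, in which case the recipient has to be recovered from the attached original message instead. Python's standard parser exposes the `message/feedback-report` fields as the headers of a nested message, which is what the code relies on.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_complaint(raw_report):
    """Return selected ARF feedback fields as a dict, or None if not an ARF report."""
    msg = message_from_string(raw_report)
    for part in msg.walk():
        if part.get_content_type() != "message/feedback-report":
            continue
        payload = part.get_payload()
        if isinstance(payload, list):
            if not payload:
                continue
            fields = payload[0]  # nested message whose headers are the ARF fields
        else:
            fields = message_from_string(payload)
        return {
            "feedback_type": fields.get("Feedback-Type"),
            "recipient": parseaddr(fields.get("Original-Rcpt-To") or "")[1] or None,
            "source_ip": fields.get("Source-IP"),
        }
    return None


# Fabricated sample report, for illustration only.
SAMPLE_REPORT = "\n".join([
    "From: feedback@receiver.example",
    "To: abuse@sender.example",
    "Subject: Complaint",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=feedback-report; boundary="b1"',
    "",
    "--b1",
    "Content-Type: text/plain",
    "",
    "This is an abuse report.",
    "--b1",
    "Content-Type: message/feedback-report",
    "",
    "Feedback-Type: abuse",
    "User-Agent: ExampleFBL/1.0",
    "Version: 1",
    "Original-Rcpt-To: <recipient@mailbox.example>",
    "Source-IP: 192.0.2.10",
    "",
    "--b1",
    "Content-Type: message/rfc822",
    "",
    "From: sender@sender.example",
    "To: recipient@mailbox.example",
    "Subject: Hello",
    "",
    "Original body.",
    "--b1--",
    "",
])

print(extract_complaint(SAMPLE_REPORT))
```

The extracted recipient is what a suppression pipeline would add to its do-not-send list; the source IP ties the complaint back to the registered IP.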

The Microsoft Smart Network ecosystem

SNDS provides the diagnostic dashboard. JMRP provides the complaint stream. Both feed the same reputation system, which controls placement across Outlook.com, Hotmail.com, Live.com, and — critically — every corporate Exchange Online tenant whose administrator has not overridden the default filtering. The last category is the one most cold senders fail to model.

A sender who does not send to consumer Outlook addresses still sends, in practice, to thousands of corporate recipients whose mail is filtered by Microsoft before reaching the tenant's mailbox. Administrators routinely whitelist transactional vendors; they do not whitelist cold outbound mail. A Red SNDS reputation therefore produces placement failure across effectively the entire Microsoft-routed corporate population, because almost no tenant will have added an override for a cold sender.

Yahoo and Apple

Yahoo and Apple both operate independent reputation systems and do not expose diagnostic interfaces. Yahoo's Complaint Feedback Loop produces ARF-format complaint reports for enrolled IPs but offers no Postmaster equivalent. Apple iCloud exposes neither dashboard nor complaint feedback and treats deliverability as a closed system from the sender's perspective.

The operator infers Yahoo behavior through seed-list testing (Chapter 13) and DMARC RUA reports — Yahoo's reports include the disposition applied per source IP, the closest available signal to a reputation classification. Apple is inferred entirely through seed-list testing. The absence of a dashboard is not an absence of reputation enforcement.
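One way to read that signal out of a DMARC aggregate report is sketched below. It assumes the report XML has already been extracted from its compressed attachment and carries no XML namespace (some reporters add one, which would require namespace-aware paths). The sample document is fabricated and trimmed to the elements the code reads.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def dispositions_by_ip(xml_text):
    """Sum message counts per source IP and applied disposition."""
    root = ET.fromstring(xml_text.strip())
    totals = defaultdict(lambda: defaultdict(int))
    for row in root.iter("row"):
        ip = row.findtext("source_ip", default="unknown")
        count = int(row.findtext("count", default="0"))
        disposition = row.findtext("policy_evaluated/disposition", default="unknown")
        totals[ip][disposition] += count
    return {ip: dict(counts) for ip, counts in totals.items()}


# Fabricated sample, trimmed to the fields used above.
SAMPLE_RUA = """
<feedback>
  <report_metadata><org_name>Example Receiver</org_name></report_metadata>
  <record><row>
    <source_ip>192.0.2.10</source_ip><count>40</count>
    <policy_evaluated><disposition>none</disposition></policy_evaluated>
  </row></record>
  <record><row>
    <source_ip>192.0.2.10</source_ip><count>5</count>
    <policy_evaluated><disposition>quarantine</disposition></policy_evaluated>
  </row></record>
  <record><row>
    <source_ip>198.51.100.7</source_ip><count>3</count>
    <policy_evaluated><disposition>reject</disposition></policy_evaluated>
  </row></record>
</feedback>
"""

print(dispositions_by_ip(SAMPLE_RUA))
```

A rising share of `quarantine` or `reject` on an IP that previously showed only `none` is the kind of shift worth correlating against seed-list results.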

Operational monitoring cadence

The posture that detects reputation drops in hours instead of weeks runs on three tiers:

Daily. Postmaster domain reputation, Postmaster spam rate (24-hour value), SNDS color band for every active IP. Five-minute review, automated alert on bucket transition or color-band change.
Weekly. Postmaster authentication breakdown, spam-rate 7-day average trend, SNDS complaint rate trend, SNDS filter result distribution. Thirty-minute review, correlated against the week's campaign volume.
Per-campaign. Monitor Postmaster spam rate at 24, 48, and 72 hours. The complaint signal surfaces approximately 24 hours after the send and continues developing for another 48. A campaign that looks clean at 12 hours is not yet cleared.
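The per-campaign tier reduces to a small scheduling helper. This is a sketch only: the function names are illustrative, and it assumes send times are tracked as timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

CHECKPOINT_HOURS = (24, 48, 72)


def review_checkpoints(sent_at):
    """Times at which the campaign's Postmaster spam rate should be re-read."""
    return [sent_at + timedelta(hours=h) for h in CHECKPOINT_HOURS]


def review_window_open(sent_at, now):
    """True until the last checkpoint has passed; the campaign is not cleared before then."""
    return now < sent_at + timedelta(hours=CHECKPOINT_HOURS[-1])


sent = datetime(2024, 3, 4, 9, 0, tzinfo=timezone.utc)
for checkpoint in review_checkpoints(sent):
    print(checkpoint.isoformat())
print(review_window_open(sent, sent + timedelta(hours=12)))
```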

Alert thresholds the operator should pre-commit to: any downward movement out of Medium, any single-day spam rate above 0.2%, any 7-day average above 0.15%, any SNDS color-band transition, any SNDS complaint rate above 0.1%. These are tighter than the published enforcement floors — 0.3% at Gmail — for the same reason fire alarms are calibrated below ignition temperature.
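These pre-committed thresholds can be written down as a rule set, sketched here under stated assumptions. The `Snapshot` structure and alert names are hypothetical, the inputs are assumed to be read from the two dashboards by hand or by whatever export the operator has, and "downward movement out of Medium" is interpreted as any transition from Medium or above to below Medium.

```python
from dataclasses import dataclass

BUCKET_RANK = {"Bad": 0, "Low": 1, "Medium": 2, "High": 3}


@dataclass
class Snapshot:
    gmail_bucket_prev: str
    gmail_bucket_now: str
    spam_rate_1d: float        # fraction, e.g. 0.002 for 0.2%
    spam_rate_7d_avg: float
    snds_band_prev: str
    snds_band_now: str
    snds_complaint_rate: float


def alerts(s):
    """Return the names of every threshold the snapshot breaches."""
    fired = []
    medium = BUCKET_RANK["Medium"]
    if BUCKET_RANK[s.gmail_bucket_prev] >= medium > BUCKET_RANK[s.gmail_bucket_now]:
        fired.append("gmail_bucket_drop")
    if s.spam_rate_1d > 0.002:
        fired.append("spam_rate_1d")
    if s.spam_rate_7d_avg > 0.0015:
        fired.append("spam_rate_7d_avg")
    if s.snds_band_now != s.snds_band_prev:
        fired.append("snds_band_change")
    if s.snds_complaint_rate > 0.001:
        fired.append("snds_complaint_rate")
    return fired


today = Snapshot("Medium", "Low", 0.0025, 0.0010, "Green", "Green", 0.0005)
print(alerts(today))
```

Keeping the thresholds in one place makes it harder to quietly loosen them mid-incident, which is the point of pre-committing.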

The recovery sequence — when reputation drops

If your Gmail Postmaster bucket transitions from Medium to Low, or your SNDS color band turns Yellow → Red, you are past the warning phase and into recovery. The sequence that works, in order:

Pause all sending from affected domains immediately. Continued volume during a reputation drop compounds the damage. The cost of pausing for 48 hours is small; the cost of pushing through is multi-week.
Run a blacklist check at MXToolbox. Confirm whether the domain or sending IP has been added to Spamhaus, SORBS, SURBL, or other operational blocklists. If yes, follow the per-listing remediation path; that's a separate workstream from reputation recovery.
Fix all DNS records. Re-verify SPF (under the 10-lookup limit), DKIM signing on the right selector, DMARC at p=none minimum with a working rua destination. A surprising fraction of “sudden” reputation drops trace to a DKIM selector change or an SPF mutation introduced by adding a new sending platform.
Restart warmup on the affected domains. Treat them as if they're new — back to the cold-start runway, with warmup mail running at the engagement floor.
Drop production cold volume to 5/day per inbox. Ramp by 5 every two days. Same curve as a new domain.
Re-verify your list before sending. High bounce rate on the early ramp will reset progress immediately. Run the list through MillionVerifier, Kickbox, or NeverBounce and remove anything that doesn't come back clean.
If recovery doesn't happen within 30 days, consider abandoning the domain. A domain stuck at Bad is sometimes cheaper to retire than to nurse back. Rotate to a fresh domain from your warming pool.
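The volume curve in the sequence above (5 per day per inbox, plus 5 every two days) works out as follows. The target volume of 30 per day is a hypothetical figure chosen for the example, not a recommendation from this chapter.

```python
import math


def ramp_schedule(days, start=5, step=5, every=2):
    """Daily per-inbox volume for the first `days` days of the recovery ramp."""
    return [start + step * (day // every) for day in range(days)]


def days_to_reach(target, start=5, step=5, every=2):
    """Zero-based day index on which the ramp first reaches `target` per inbox."""
    if target <= start:
        return 0
    return math.ceil((target - start) / step) * every


print(ramp_schedule(6))
print(days_to_reach(30))  # hypothetical target of 30/day per inbox
```

On this curve an inbox needs ten days of clean ramp after the restart to get back to 30 per day, which is why a bounce-driven reset early in the ramp is so expensive.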

The single hardest discipline: not sending during the recovery window. Operators who refuse to pause produce a slow grind-down rather than a recovery. The math doesn't favor pushing through.
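For the DNS step of the sequence, the SPF lookup budget is the check most often broken by adding a new sending platform. The sketch below counts the terms that RFC 7208 charges against the 10-lookup limit (`include`, `a`, `mx`, `ptr`, `exists`, and the `redirect` modifier), following includes recursively. It assumes the SPF strings have already been fetched into a dictionary; the records shown are hypothetical, and the sketch ignores macros and the separate per-`mx` name limit.

```python
import re

LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}
SPF_LOOKUP_LIMIT = 10


def count_spf_lookups(domain, records, _path=()):
    """Count DNS-querying terms in `domain`'s SPF record, including nested ones."""
    if domain in _path:
        return 0  # include loop; a real evaluator would treat this as an error
    total = 0
    for term in records.get(domain, "").split()[1:]:  # skip the v=spf1 tag
        term = term.lstrip("+-~?").lower()
        if term.startswith("redirect="):
            target = term.partition("=")[2]
            total += 1 + count_spf_lookups(target, records, _path + (domain,))
            continue
        name = re.split(r"[:/]", term, maxsplit=1)[0]
        if name in LOOKUP_MECHANISMS:
            total += 1
        if name == "include":
            target = term.partition(":")[2]
            total += count_spf_lookups(target, records, _path + (domain,))
    return total


# Hypothetical records for a domain using two sending platforms.
records = {
    "example.com": "v=spf1 include:_spf.mailer-a.example include:_spf.mailer-b.example mx ip4:192.0.2.0/24 ~all",
    "_spf.mailer-a.example": "v=spf1 include:_pool1.mailer-a.example include:_pool2.mailer-a.example ~all",
    "_pool1.mailer-a.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "_pool2.mailer-a.example": "v=spf1 ip4:203.0.113.0/24 ~all",
    "_spf.mailer-b.example": "v=spf1 a mx -all",
}

used = count_spf_lookups("example.com", records)
print(used, "of", SPF_LOOKUP_LIMIT, "lookups used")
```

Running the count before and after adding a platform's `include` shows whether the change would push the record past the limit, at which point receivers return a permanent error rather than a pass.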

The opens-are-flat trap

The most common failure at this layer is the operator who uses open rates as a reputation proxy and considers reputation healthy as long as opens hold steady. Opens are a lagging indicator. A reputation drop begins as a spam-rate excursion visible in Postmaster within 24 to 48 hours, propagates into placement degradation visible in seed-list testing within 48 to 96 hours, and only then surfaces as an open-rate change.

The lag between the leading indicator (Postmaster spam rate) and the lagging indicator (open rate) is, in observation, 4 to 7 days. An operator monitoring only opens has a week of degraded sending under their belt before the signal reaches them. By that point the next campaign is in flight, the slide is compounding, and the recovery window has narrowed from days to weeks.

Integration with the broader monitoring stack

Postmaster and SNDS produce two of the four sources required for a complete monitoring picture. The other two: DMARC aggregate reports (Chapter 3), which surface authentication failures across every receiver that honors the protocol, and seed-list inbox-placement testing (Chapter 13), which produces direct placement visibility on the receivers — Yahoo and Apple specifically — that publish no dashboard.

Any single source is incomplete. Postmaster covers Gmail and is blind below 100 messages. SNDS covers Microsoft but requires reverse-DNS control. DMARC covers authentication but is silent on placement. Seed-list testing covers placement but is sampled. Correlated, the four produce a complete picture. Three of them, with one omitted, produce a known blind spot — survivable if documented, catastrophic if not.

Common operator failures observed in production

Never enrolling in Postmaster Tools. The most common failure and the easiest to fix. The operator stands up infrastructure, runs warmup, launches campaigns, and discovers Postmaster exists only after the first incident. By then the historical data — which would have surfaced the slide weeks earlier — does not exist.
Enrolling but not reviewing. The dashboard is configured, no recurring review is scheduled, and the data accumulates in a Google account nobody opens. The dashboard equivalent of p=none with no rua parser — the configuration exists, the operational benefit does not.
Treating SNDS as optional because "we don't send to Hotmail". The operator models Microsoft as the consumer surface and ignores the corporate Exchange Online tenants routing through the same filter. This produces consistent placement failure on the B2B population the operation actually targets.
Sub-threshold volume across fragmented domains. Ten domains at 50 emails per day each: total volume 500, Postmaster visibility zero. The architecture optimizes for reputation isolation and accidentally for monitoring blindness.
Reviewing one and assuming the other tracks it. The two ecosystems have independent reputation systems, independent enforcement floors, and independent recovery curves. A clean Postmaster reading is not evidence of a clean SNDS reading, and the operator who conflates them is wrong roughly half the time.
Failing to register JMRP. SNDS is enrolled, color bands are reviewed, but the complaint stream feeding the calculation is not flowing back to the suppression pipeline. Recipients who marked the mail as junk continue receiving campaigns, the complaint rate compounds, and the color band transitions before the operator has any mechanism to respond.

Pre-launch checklist

Postmaster Tools verified via DNS TXT for the cold-sending domain, the corporate domain, and every sending subdomain
SNDS enrolled for every dedicated IP, with reverse-DNS authorization completed
JMRP enrolled per IP, with the resulting ARF reports routed into the suppression pipeline
A documented daily review cadence with alert thresholds defined below the enforcement floors
For shared-IP senders: a contractual deliverable from the platform exposing SNDS color bands and JMRP counts per IP
A defined response protocol for a bucket transition — pause criteria, volume reduction tier, escalation to seed-list testing
A correlated dashboard surfacing Postmaster, SNDS, DMARC RUA, and seed-list outputs in a single view

Where this fits in the broader infrastructure

The authentication layer (Chapters 1 through 6) gets mail accepted. The architecture layer (Chapters 7 through 10) gets mail accepted at scale without cross-contaminating reputation. The operations layer — this chapter plus Chapters 12 through 14 — makes the prior two layers continuously verifiable. A sender who provisioned the first two correctly and skipped the third is, in practice, indistinguishable from a sender who provisioned nothing, on the precise day a campaign goes wrong.

Postmaster and SNDS are the two highest-signal, lowest-cost monitoring streams available to a cold sender. Both are free. Both are operated by the receivers whose placement decisions determine revenue. Both surface leading indicators 4 to 7 days before the movement is visible in open rates. The senders who skip them are, with very few exceptions, the senders who later submit incident reports describing a collapse that came "out of nowhere."

Related chapters

Gmail and Yahoo Bulk Sender Requirements — the policy that turns Postmaster signals into compliance gates.
Bounce Taxonomy — SMTP Codes and the 3% Threshold — the metric that pairs with reputation buckets.
Seed List Inbox-Placement Testing — the synthetic monitoring stream that complements Postmaster.
How to Warm Up an Email Inbox for Cold Outreach — Postmaster is how you verify warmup actually moved the buckets.
