Chapter 02 · Identity
Behavioral classifier

The bot-detection model — behavioral classifier, not rules engine.

LinkedIn's account-protection system is not a list of forbidden actions. It is an adversarial behavioral classifier that scores every authenticated session against a per-account baseline and against the population distribution of legitimate users. The operators who treat it as a rules engine — "stay under 100 connection requests per week and you are fine" — are the operators whose accounts get restricted on volume well below the documented caps.

What the classifier actually does

The classifier scores the joint distribution of session-level signals against two reference distributions and decides, probabilistically, whether the session is being driven by a human or by software impersonating one. The first reference is the per-account baseline: the historical signature of how this specific account has behaved across its login history. The second is the population distribution of legitimate users across the platform. A session is flagged when its joint signature diverges from these references, with the per-account baseline carrying more weight as the account accumulates history (see "The behavioral baseline" below).

LinkedIn's Trust & Safety engineering team has, in public engineering talks over the past several years, characterized the system in roughly these terms: a stream of session-level features feeds an ensemble of classifiers that produce a real-valued risk score, and the score determines whether the session continues, continues with friction (captcha challenge), or gets interrupted. The exact feature set and thresholds are not published — the entire premise of an adversarial classifier is that publishing the boundary lets adversaries sit just inside it. The practical implication is that no operator can build a deterministic compliance check. The most an operator can do is reduce the probability of crossing the threshold by minimizing the dimensions on which a session deviates from a legitimate-user baseline. Most of this chapter is the catalogue of those dimensions.

The signal layers

The classifier evaluates approximately six categories of signal per session, each of which has been described in either public engineering content from the platform's Trust & Safety team or reverse-engineered consistently across multiple independent security research teams:

  • Mouse-movement entropy — the path geometry, velocity profile, and per-action variance of cursor motion between interaction targets.
  • Keystroke timing distributions — inter-keystroke intervals, dwell times, and the distribution of correction events (backspaces, edits).
  • Scroll velocity and scroll-pause patterns — the human read-pause-read-pause signature versus the synthetic linear-scroll signature.
  • Click-target heuristics — which elements on a page get clicked, with what frequency, and at what pixel offset within the element bounding box.
  • Viewport and fingerprint persistence — TLS fingerprint, browser fingerprint, viewport dimensions, and their consistency across the session lifetime and across sessions.
  • Time-distribution of action across business hours — when the account is active, in what timezone, and with what diurnal pattern.

The classifier is most confident when multiple categories produce divergent signal simultaneously. A single category at the edge of the population distribution is noise; three categories simultaneously at the edge of the distribution is signal.
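The "one category is noise, three is signal" rule can be illustrated with a minimal sketch. It assumes each signal category has already been reduced to a percentile against the population distribution; the category names, the 0.99 edge cutoff, and both function names are hypothetical and chosen only for illustration. Only the three-category count comes from the text above.

```python
def edge_categories(percentiles, edge=0.99):
    """Return the categories whose population percentile is at or past `edge`.

    `percentiles` maps a category name to a value in [0, 1].
    The 0.99 cutoff is an illustrative assumption, not a published threshold.
    """
    return sorted(name for name, p in percentiles.items() if p >= edge)


def is_multi_category_signal(percentiles, edge=0.99, min_categories=3):
    """True when at least `min_categories` categories sit at the edge at once."""
    return len(edge_categories(percentiles, edge)) >= min_categories


if __name__ == "__main__":
    one_outlier = {"mouse": 0.995, "keystroke": 0.50, "scroll": 0.40}
    three_outliers = {"mouse": 0.995, "keystroke": 0.992, "scroll": 0.999}
    print(is_multi_category_signal(one_outlier))     # single outlier: treated as noise
    print(is_multi_category_signal(three_outliers))  # three at once: treated as signal
```

A real ensemble would weight categories rather than count them; the count is used here only because it mirrors the wording of the paragraph.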

Mouse-movement entropy

The entropy of a legitimate user's cursor path between two interaction targets is, in empirical measurement, distributed in a fairly narrow range. The cursor moves along a curved trajectory that approximates a minimum-jerk path, decelerates as it approaches the target, and exhibits small corrective movements in the final 100-200 pixels before clicking. The per-segment variance is on the order of 8-15% across legitimate users.

Synthetic mouse motion — generated by browser-automation frameworks driving the DOM — historically moved the cursor along straight-line paths or did not move it at all (dispatching click events directly to elements without any intervening motion). Modern automation tools attempt to mimic human paths, but most produce a recognizable signature: paths that are either too smooth (no corrective micro-movements) or that exhibit periodic regularity inconsistent with the noise floor of a human hand on a physical pointing device.

The classifier's expected per-action entropy window is a moving target — it adapts to the population — but operators running automation with deterministic or pseudo-random path generation that does not match the human noise floor are producing one of the strongest single-category signals available to the system. The operators who survive this layer are running automation that drives a real browser with a real OS-level pointing event, not synthetic motion.

Inter-action timing

The legitimate user's inter-click time is heavy-tailed. The mean is somewhere in the 2-8 second range for active browsing, with a long tail extending into minutes when the user pauses to read, switches tabs, or attends to something off-screen. The distribution is not Gaussian and it is not uniform — it has the characteristic shape of a log-normal or power-law distribution truncated at the low end by the physical minimum of human reaction time.

Automation produces two recognizable failure modes. The first is uniformity — actions firing at near-identical intervals because the operator configured a fixed delay between requests. A distribution with a standard deviation below approximately 15% of the mean is, in empirical observation, near-pathognomonic for synthetic timing. The second is the absence of the long tail. A legitimate user will, with non-trivial probability, leave a session idle for 30 seconds or 5 minutes while doing something else; an automation script that never produces a pause longer than its configured maximum is missing the right-hand side of the distribution that the classifier expects.
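Both failure modes are measurable from a log of inter-action intervals. The sketch below is an audit-style illustration, not the platform's implementation: it computes the coefficient of variation (standard deviation divided by mean) and the fraction of intervals in the long tail. The 15% floor and the 30-second pause come from the text; the function names and the rule for combining the two checks are assumptions.

```python
import statistics


def timing_profile(intervals_s, tail_threshold_s=30.0):
    """Summarize inter-action intervals given in seconds."""
    if not intervals_s:
        raise ValueError("need at least one interval")
    mean = statistics.fmean(intervals_s)
    stdev = statistics.pstdev(intervals_s)
    # Coefficient of variation: spread relative to the mean.
    cv = stdev / mean if mean > 0 else 0.0
    # Share of intervals long enough to count as a "went away and came back" pause.
    tail_fraction = sum(1 for x in intervals_s if x >= tail_threshold_s) / len(intervals_s)
    return {"mean": mean, "cv": cv, "tail_fraction": tail_fraction}


def looks_synthetic(intervals_s, cv_floor=0.15, tail_threshold_s=30.0):
    """Flag a log that is too uniform or that has no long tail at all.

    Combining the two checks with `or` is an illustrative assumption.
    """
    profile = timing_profile(intervals_s, tail_threshold_s)
    return profile["cv"] < cv_floor or profile["tail_fraction"] == 0.0


if __name__ == "__main__":
    fixed_delay = [5.0, 5.1, 4.9, 5.0, 5.05]
    with_pauses = [1.2, 3.5, 2.0, 45.0, 6.1, 2.8, 310.0, 4.4]
    print(looks_synthetic(fixed_delay))  # uniform cadence
    print(looks_synthetic(with_pauses))  # heavy-tailed cadence
```

Note that a log can pass the uniformity check and still fail the tail check: randomizing delays inside a fixed window raises the coefficient of variation but never produces a pause beyond the configured maximum.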

The empirical lower bound is an inter-action time of roughly 200ms: a cadence at or below it triggers high-confidence detection. No human reliably produces clicks that fast, because visual processing and motor planning latencies alone exceed it. Operators whose automation fires any action less than 200ms after the prior one are producing a signal the classifier interprets unambiguously.

Session fingerprinting

The classifier evaluates a composite session fingerprint built from approximately the following components:

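  • TLS fingerprint (JA3/JA4), a hash computed over selected fields of the TLS ClientHello (protocol version, cipher suites, extensions), which characterizes the underlying TLS library and its configuration. Many clients built on the same library share a value, so it identifies a client family rather than a single device.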
  • HTTP header order and casing — the order in which headers are sent and their precise capitalization, which is a stable per-client signature.
  • Canvas fingerprint — the hash of a rendered canvas element, which encodes GPU, driver, font, and rendering-pipeline characteristics.
  • WebGL renderer string — the exposed GPU model and driver version.
  • AudioContext fingerprint — the hash of a rendered audio buffer, which encodes the audio subsystem characteristics.
  • Font enumeration — the set of installed system fonts, which is moderately distinctive per machine.
  • Screen and viewport metrics — physical pixel ratio, screen dimensions, available viewport dimensions, color depth.

The persistence requirement is the operational constraint. The fingerprint that establishes a session must persist for the operational lifetime of that session — and across sessions, the fingerprint should evolve only in ways consistent with how a real device's fingerprint evolves (browser version updates, font installations, occasional GPU driver updates). A fingerprint that mutates between requests within a single session is a detection signal. A fingerprint that resets entirely between sessions on the same account is a detection signal. The implicit expected fingerprint stability window is on the order of weeks to months for the underlying canvas/WebGL/audio components; longer for the TLS fingerprint, which should only change with browser version updates.

The IP-fingerprint joint distribution

The single most operationally consequential signal in the classifier is the joint distribution of (IP address, session fingerprint, behavioral pattern) evaluated against the account's historical session graph. Each individual dimension can vary within normal bounds; the joint distribution is what gets evaluated, and sudden combinations are flagged.

An account that has historically logged in from a residential IP in Boston, on a MacBook fingerprint, with a particular keystroke timing distribution, and that suddenly produces a session from a residential IP in Frankfurt, on a Windows fingerprint, with a different keystroke timing distribution, is producing a joint signal that the classifier treats with high suspicion regardless of whether any individual component is anomalous in isolation. The Frankfurt IP is fine. The Windows fingerprint is fine. The keystroke pattern is fine. The combination of all three at once, on an account whose prior 200 sessions were all Boston/Mac/consistent-keystroke, is the signal.
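The Boston/Frankfurt example can be made concrete with a small sketch that counts how many dimensions of a session are new relative to the account's history. This is a deliberate simplification: the dimension names, the dictionary layout, and the scoring rule are hypothetical, and a real classifier would use learned distributions rather than exact-match sets.

```python
# Hypothetical dimension names, chosen to mirror the example in the text.
DIMENSIONS = ("ip_region", "device", "typing_profile")


def novel_dimensions(history, session):
    """Return the dimensions whose value in `session` never appears in `history`."""
    novel = []
    for dim in DIMENSIONS:
        seen = {past[dim] for past in history}
        if session[dim] not in seen:
            novel.append(dim)
    return novel


def joint_novelty_score(history, session):
    """Fraction of dimensions that are new at the same time (0.0 to 1.0)."""
    return len(novel_dimensions(history, session)) / len(DIMENSIONS)


if __name__ == "__main__":
    history = [{"ip_region": "Boston", "device": "macOS", "typing_profile": "A"}] * 200
    traveling = {"ip_region": "Frankfurt", "device": "macOS", "typing_profile": "A"}
    everything_new = {"ip_region": "Frankfurt", "device": "Windows", "typing_profile": "B"}
    print(joint_novelty_score(history, traveling))       # one dimension changed
    print(joint_novelty_score(history, everything_new))  # all dimensions changed at once
```

The point of the sketch is that the second session scores at the maximum even though none of its individual values would be unusual for the population as a whole.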

This is the structural reason why this chapter depends on account architecture and proxy infrastructure (Chapters 1 and 3). A correctly architected operational session reuses the same IP, fingerprint, and behavioral pattern for the operational lifetime of the account; an incorrectly architected one rotates them and produces a detection signal on every rotation.

Click-target heuristics

The legitimate user's distribution of clicks within a page is heavy on visible content and light on hidden elements. Users click buttons, links, and interactive elements that are rendered in the viewport; they do not click hidden DOM nodes, off-screen elements, or elements whose display:none property would prevent a human from seeing them.

Browser-automation tools that target elements by CSS selector and dispatch click events at the element's reported coordinates are detectable on at least two dimensions. The first is the precision of the click coordinate — automation tools click at the exact center of an element bounding box, or at exact integer pixel offsets, whereas a real user's clicks distribute around the centroid with a few-pixel standard deviation. The second is the willingness to click elements outside the rendered viewport — automation that scrolls programmatically and then clicks an element that was offscreen before the scroll is producing a signal that the classifier reads as inhuman, particularly when the scroll-to-click interval is below the physical minimum of human visual processing.

The behavioral baseline

A new account has no behavioral baseline. The classifier evaluates new-account sessions against the population distribution only, which is a weaker signal — the population is broad and accommodates a wide range of legitimate patterns. As the account accumulates session history, the per-account baseline becomes the dominant reference, and deviation from it becomes the dominant signal.

The operational consequence is sharp: running automation against a new account before establishing a baseline is the strongest possible detection signal. The first 20-50 sessions of an account's life are how the classifier learns what the account's owner looks like behaviorally. If those sessions are automated, the baseline that gets established is the automation's signature, and every subsequent session is evaluated against an inhuman reference — a configuration the classifier is specifically designed to flag.

The correct sequencing is the 2-4 week manual engagement runway described in Chapter 5. The baseline gets established with human behavior, then operational automation can run against that baseline with deviation that is small in the per-account dimension. Operators who skip the runway produce the strongest single signal available to the classifier — and they typically do not understand why their accounts are flagged within the first week.

Time-distribution across business hours

A legitimate user's daily activity clusters in two to three peaks: morning catch-up, mid-afternoon engagement, occasional evening sessions. The peaks align with the user's stated timezone (as inferred from profile metadata and historical login IPs). The diurnal distribution has near-zero activity in the 1am-6am window of the user's local timezone.

Automation that distributes its actions uniformly across 24 hours — to maximize the per-day action ceiling — is producing the simplest possible signal to detect. A flat 24-hour distribution is incompatible with any legitimate user profile on the platform. The same is true of automation that produces activity at 3am local time on an account whose profile says it is based in New York.

The classifier's tolerance for distribution shape is broad — it accommodates shift workers, international travelers, night-owl operators — but it is not infinite. Operators who run their automation on a global UTC schedule rather than the account holder's local timezone are producing one of the more easily detected patterns.
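The shape of a diurnal distribution can be summarized with two numbers, sketched below as an audit-style illustration. The input is assumed to be the local hour (0-23) of each action, already converted to the account holder's timezone. The 1am-6am window comes from the text; the entropy-based flatness measure and the function name are assumptions introduced here.

```python
import math
from collections import Counter


def diurnal_profile(local_hours, night_hours=range(1, 6)):
    """Summarize when actions occur across the local day.

    `night_hours` covers 01:00 through 05:59, matching the 1am-6am window.
    """
    if not local_hours:
        raise ValueError("need at least one action")
    counts = Counter(local_hours)
    total = len(local_hours)
    night_share = sum(counts[h] for h in night_hours) / total
    # Shannon entropy of the hourly histogram, normalized by log(24):
    # 1.0 means perfectly flat across the day, values near 0 mean tightly clustered.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    flatness = entropy / math.log(24)
    return {"night_share": night_share, "flatness": flatness}


if __name__ == "__main__":
    around_the_clock = list(range(24)) * 10
    clustered = [9] * 30 + [14] * 30 + [20] * 10
    print(diurnal_profile(around_the_clock))
    print(diurnal_profile(clustered))
```

A schedule run on UTC for an account in another timezone shows up in the first number: the peaks stay clustered, but they land inside the local night window.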

The restriction escalation ladder

The platform's enforcement is staged. An account does not get permanently restricted from a single anomalous session. The escalation ladder, as observed empirically across multi-account agency operators:

Stage  State                         Duration    Trigger to next stage
1      Monitored                     Indefinite  Risk score elevated; no user-visible effect; additional friction on next action
2      Soft-block (action-specific)  24-72h      Specific action blocked (search, connect requests, or messaging); others continue
3      Soft-block (broader)          72h-7d      Multiple action categories blocked simultaneously
4      Temporary restriction         7-14d       Account login itself produces a verification challenge; outreach disabled
5      Permanent restriction         Indefinite  Account removed from the graph; appeal is the only path back

The empirical observation is that approximately 60% of accounts that reach stage 2 with an automation flag (as opposed to a benign volume flag) progress to stage 5 within 90 days. The classifier appears to weight prior soft-block history heavily; accounts with a soft-block on record are evaluated against a tighter risk threshold on subsequent sessions.

The appeal process

The platform provides three appeal mechanisms, roughly in order of friction. The captcha challenge is the lowest-friction — a session that triggers it can typically continue after completion. The ID verification flow requires uploading a government-issued photo ID; success rates here are high for legitimate users (above 90%) and moderate for accounts with a borderline behavioral profile.

The support ticket queue is the path for accounts that have reached temporary or permanent restriction. The empirical success rate on automation-flag appeals through this channel is under 20%, with most rejected appeals returning a non-specific template response. Successful appeals typically share a profile: a longstanding account, substantial graph and content history, identifiable human use, and a single isolated flagging incident rather than a pattern. Appeals from accounts that fit none of those criteria almost uniformly fail.

What does not appear to be in the classifier

Several signals that operators commonly worry about appear, in empirical observation, to be either absent from the classifier or weighted lightly:

  • Message content keyword analysis — the platform's response to messaging volume appears to be driven by complaint rate and reply rate, not by keyword detection. Operators sending the same template at scale are flagged on the volume signal long before any content signal.
  • Specific user-agent strings — the platform supports many legitimate clients (browsers, mobile apps, the official API where applicable, third-party integrations). A non-default user-agent on its own is not a signal; a user-agent inconsistent with the rest of the fingerprint is.
  • Connection request acceptance rate, in isolation — the platform tolerates low-acceptance senders to some degree. The signal that matters is acceptance-versus-volume in the joint distribution, and the absence of behavioral consistency on top of it.

Common operator failures producing detection

  • Sub-200ms inter-action times. The single most reliable detection trigger. Any automation firing at this cadence is producing a signal the classifier reads unambiguously.
  • Near-perfect timing uniformity. A standard deviation below 15% of mean inter-action time is, in empirical observation, a strong signal regardless of how slow the cadence is.
  • Datacenter IP residency. The ASN of the source IP is a low-cost signal that the classifier evaluates routinely. Sessions from known datacenter ranges receive an immediate elevated risk score.
  • Headless browser signatures. The default fingerprints of headless Chrome, headless Firefox, and the standard browser-automation frameworks are well-catalogued. Operators running these without active fingerprint randomization and verification are producing population-distribution outliers.
  • Simultaneous sessions from impossibly distant IPs. Two active sessions on the same account from IPs that imply a physical travel time the user could not have made is a near-deterministic flag.
  • Running automation before establishing baseline. The new-account-plus-immediate-automation pattern. The strongest single detection signal available.
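The "impossibly distant IPs" check in the list above is a standard impossible-travel test, and it can be sketched with the haversine great-circle formula. The sketch assumes each session has already been geolocated from its IP to a latitude and longitude. The 900 km/h ceiling (roughly airliner cruising speed) and the function names are illustrative assumptions, not values from the platform.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(a, b):
    """Speed needed to get from session `a` to session `b`.

    Each session is a (lat, lon, unix_seconds) tuple.
    """
    dist = haversine_km(a[0], a[1], b[0], b[1])
    if dist == 0:
        return 0.0
    hours = abs(b[2] - a[2]) / 3600
    # Simultaneous sessions in different places imply infinite speed.
    return math.inf if hours == 0 else dist / hours


def is_impossible_travel(a, b, max_kmh=900.0):
    """True when the implied speed exceeds the assumed ceiling."""
    return implied_speed_kmh(a, b) > max_kmh


if __name__ == "__main__":
    boston = (42.36, -71.06, 0)
    frankfurt_one_hour_later = (50.11, 8.68, 3600)
    frankfurt_next_day = (50.11, 8.68, 24 * 3600)
    print(is_impossible_travel(boston, frankfurt_one_hour_later))
    print(is_impossible_travel(boston, frankfurt_next_day))
```

IP geolocation is coarse, so a production check would allow a distance margin before flagging; that margin is omitted here to keep the sketch short.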

Pre-deployment checklist for minimizing detection signal

  • Account has been operated manually for at least 2-4 weeks with consistent fingerprint and IP residency (Chapter 5)
  • Operational IP infrastructure is residential, geographically consistent with the account's historical login pattern, and persistent across the operational session (Chapter 3)
  • Session fingerprint is generated once and persists across the operational session lifetime, with evolution only consistent with normal device aging
  • Inter-action timing distribution is heavy-tailed, with a sustained mean above 2 seconds and a non-trivial fraction of intervals above 30 seconds
  • No action fires within 200ms of any prior action under any condition
  • Daily activity is concentrated in 2-3 peaks aligned with the account's stated timezone, with near-zero activity in the local 1am-6am window
  • Mouse motion, if synthesized, produces curved paths with corrective micro-movements; if no synthesis is possible, dispatch only on the platform's published API surfaces rather than the web interface
  • No simultaneous sessions on the account from multiple IPs, ever

Where this fits in the broader infrastructure

The behavioral classifier is the substrate every operational decision in this reference sits on top of. The account architecture (Chapter 1) determines what behavioral baseline gets established. The IP infrastructure (Chapter 3) determines whether the joint (IP, fingerprint, behavior) distribution stays consistent across sessions. The connection limits (Chapter 4) determine whether the operator stays inside the volume tolerance the classifier extends to baselined accounts. The warmup runway (Chapter 5) determines whether the baseline is established with human behavior before automation runs against it. The messaging mechanics (Chapter 6) operate within the action-rate envelope the classifier permits. The automation category selection (Chapter 7) determines which signal categories the operator is exposed on.

An operator who reads only this chapter and skips the rest will produce a detection-resistant single session and a restricted account within the month. An operator who reads the full reference will produce an operational session that the classifier evaluates as legitimate not because every individual signal is perfect, but because the joint distribution stays within the bounds the per-account baseline establishes. The classifier rewards consistency over perfection.
