Richard Makara’s GTM Context OS — the operational scaffold for PULL.

Richard Makara’s GTM Context OS is an open-source Claude Code repo template (cross-editor via AGENTS.md, so it runs against Cursor, Copilot, Windsurf, and Aider) that scaffolds a two-mode system: a context mode that ingests sales call transcripts and produces PULL analyses, and an operational mode that connects CRM, dialer, and enrichment via MCP and runs the resulting campaigns. The asymmetric framing: among the several “Context OS for GTM” projects now in circulation, Makara’s is the only one that genuinely operationalizes Rob Snyder’s PULL framework. Most of the others share the name and none of the methodology.

TL;DR

GTM Context OS is an open-source repo template — Claude Code-native, cross-editor via AGENTS.md — that scaffolds a context-and-operations system for a single GTM team grounded in PULL.
The directory schema is principled: context.md, demand/, segments/, messaging/, campaigns/, engine/, scripts/, workflows/, .mcp.json, .claude/. Each layer is grounded in the one below it.
Two-mode split: context mode produces PULL analyses and personas from transcripts; operational mode runs campaigns through CRM and dialer integrations.
Five slash commands ship in the box: /pull-query, /intake, /gtm-os-status, /handover, and the under-discussed /gtm-os-upgrade that handles opt-in pattern upgrades without breaking customer customizations.
The repo’s own framing: “every layer is grounded in the one below it. No messaging without demand evidence. No campaigns without tested messaging.” The layering is the entire engineering thesis.
Where it stops short: the engine/ directory is light, there is no multi-tenant architecture, there is no closed reply backflow into PULL scores, there is no agency model for running it across customers, and the campaign-execution layer is thin relative to the context layer.
The repeated operator conflation: Makara’s repo is the one with the PULL skill and the call-transcript schema. Jacob Dietle’s same-named gtm-context-os-quickstart is a generic knowledge-graph scaffold with no PULL and no sales-call DNA. Operators adopt the wrong one.
How we extend it: multi-tenant version that runs across customer engagements, the engine/ integrations actually built out, Skylarq as the closed-loop layer that re-scores PULL from inbound replies, the agency-shaped operating layer Makara intentionally left to the operator.

Why read GTM Context OS

A specific class of repo has emerged over the past nine months: open-source “Context OS for X” templates that ship as Claude Code or cross-editor scaffolds for a specific operating domain. Most of them are knowledge-graph scaffolds in disguise: directory structures with vague context.md intent files, generic ingest scripts, and a slash command surface that is mostly cosmetic. Makara’s GTM Context OS is the exception. It is the only repo in the category that names PULL by author, encodes a transcript-to-PULL scoring schema, and treats the operational layer as a separate mode with its own MCP surface. The asymmetry between Makara’s repo and the others in the category is structurally similar to the asymmetry between PULL and the dozens of discovery rubrics that surround it: small, principled, load-bearing in operation.

The repeated operator confusion is worth naming explicitly. There is a separate, same-genre, separately-authored repo by Jacob Dietle called gtm-context-os-quickstart. The name is nearly identical. The DNA is not. Dietle’s repo is a generic knowledge-graph scaffold for GTM: it has no PULL skill, no transcript-scoring schema, no buyer-language persona-generation step, and the slash-command surface is templated rather than methodological. Operators who hear “GTM Context OS” in a podcast or a tweet and search for the name end up on Dietle’s repo about half the time, install it, and conclude that the category is light. The category is light in most implementations. Makara’s is not.

One upstream point that matters for the rest of this chapter. GTM Context OS is a scaffold, not a service. Makara ships a repo that gets a thoughtful operator 60% of the way to a PULL-grounded operating system; the remaining 40% — multi-tenant operation, real CRM hook plumbing, the closed reply-backflow that updates PULL scores from outbound performance, the agency-shaped operating layer — is left to the operator. The repo is honest about the boundary. Operators who treat it as a system rather than as a scaffold conclude they have a production system when they have an excellent set of empty directories.

The two-mode architecture

The central engineering decision in Makara’s repo is the clean separation between context mode and operational mode. The two modes share a directory and a config, and the boundary between them is whether the slash command reads from local files or writes to an external system. The boundary is enforced by the MCP surface in .mcp.json — operational connections are declared, opt-in, and absent by default. A fresh-cloned repo runs context mode out of the box. Operational mode lights up only when the operator wires the integrations.

Context mode is the upstream half. Sales call transcripts enter through /intake, get parsed against the PULL schema, and produce PULL-scored call analyses with evidence quotes and rationale. The scored analyses synthesize into segment-level persona files in segments/, and the persona files anchor the messaging in messaging/. The repo’s own framing, lifted from the README, is the engineering thesis: “every layer is grounded in the one below it. No messaging without demand evidence. No campaigns without tested messaging.” The layering is not aesthetic. It is the constraint that prevents the operator from writing copy from intuition and prevents the campaign layer from running off unvalidated personas.

Operational mode is the downstream half. CRM connects through MCP, dialer connects through MCP, enrichment connects through MCP, and the campaigns directory holds the actual sequences that get sent. The mode boundary is what keeps the context layer auditable — operational mode reads from messaging/ but cannot write back to demand/ or segments/ without an explicit synthesis pass. The disciplined operator runs the upstream pass on a fixed cadence and the operational pass continuously, and the two modes stay in sync without polluting each other.
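
The mode gate can be sketched in a few lines. This is a minimal illustration, assuming .mcp.json uses the Claude Code project-config shape with a top-level mcpServers object; the function name and the example crm entry are hypothetical and not taken from Makara’s repo.

```python
import json


def operational_mode_enabled(mcp_config_text: str) -> bool:
    """Return True only if at least one MCP server is declared.

    Assumption: the config is JSON with a top-level "mcpServers" object,
    and an empty or missing object means context mode only.
    """
    config = json.loads(mcp_config_text or "{}")
    servers = config.get("mcpServers", {})
    return isinstance(servers, dict) and len(servers) > 0


# A fresh clone: nothing declared, so only context mode is available.
fresh = '{"mcpServers": {}}'

# Hypothetical operator config with one CRM connection declared.
wired = json.dumps(
    {"mcpServers": {"crm": {"command": "example-crm-mcp", "args": []}}}
)

print(operational_mode_enabled(fresh))  # False
print(operational_mode_enabled(wired))  # True
```

A check like this could run before any workflow that writes to an external system, so that an undeclared integration fails closed instead of falling through.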

The directory schema

The directory layout is the most legible artifact of Makara’s thinking. Each folder maps to a layer of the system, and the layering is enforced by which slash command writes to which folder.

context.md — the root context file that every Claude Code session reads on session start. Holds the company’s offer, ICP working hypothesis, PULL scoring conventions, and pointers into the rest of the tree. The session-level memory of the system. If context.md is empty, every slash command runs context-blind and produces low-signal output. The first hour of adoption is filling this file.
demand/ — the PULL analyses. Each scored transcript lives here as a markdown file with P, U, L, L scores, evidence quotes, and a rationale. This is the upstream-most data layer in the system — the layer that segments/ and messaging/ are derived from. Treat it as immutable per-call output; corrections happen by re-scoring, not by hand-editing.
segments/ — the persona files. Each persona is a synthesis across the demand/ files that scored above a PULL threshold — typically 2/3 or higher on at least three of the four dimensions. The persona file holds the segment definition, the firmographic cut, the language the segment uses for its own Project and its own Lacking, and the comparative references that anchored. Personas are derived from demand, not asserted.
messaging/ — the copy library. Subject lines, opening lines, value-prop framings, and CTA architecture, each one annotated with the persona it serves and the demand/ files whose language it draws from. The audit trail from a piece of copy back to the transcript line that justified it is the property that makes this layer load-bearing rather than decorative.
campaigns/ — the campaign objects. Sequences, A/B variants, audience filters, and results. Each campaign references the messaging files it draws from and the persona it targets. This is the operational-mode read surface — operational mode reads campaign objects, writes results back, and never modifies the messaging files in flight.
engine/ — the integrations layer. CRM connectors, dialer connectors, enrichment connectors. In Makara’s shipped repo this layer is intentionally light: the repo describes the contract but expects the operator to wire the actual integrations. The honest framing in the repo is that the engine is a placeholder the operator must build. Operators who skip past this discover the gap on day three.
scripts/ — the orchestration shell scripts that the slash commands call. Pure plumbing — the scripts translate slash-command invocations into the file-tree operations and MCP calls that produce the actual artifacts. Worth reading before adopting because the conventions encoded here shape what is easy to extend.
workflows/ — multi-step procedures that compose slash commands. The intake → score → synthesize → revise flow lives here, as does the upgrade workflow that the /gtm-os-upgrade command runs. The workflows are where the framework’s opinionated layering becomes executable.
.mcp.json — the MCP configuration. Declares which external systems the operational mode can talk to. A fresh repo ships this empty by design; the operator declares CRM, dialer, and enrichment connections explicitly, which is the gate that keeps context mode from accidentally triggering writes to production CRMs.
.claude/ — the Claude Code surface. Holds the slash command definitions, the agent definitions, and the skill files. This is the customization surface that the /gtm-os-upgrade command knows how to merge into when a new version of the framework ships.
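
The persona threshold described for segments/ (2 out of 3 or higher on at least three of the four dimensions) can be written as a small predicate. This is a sketch under stated assumptions: the dictionary keys are placeholder labels for the four PULL dimensions, and the function name and defaults are illustrative rather than the repo’s own.

```python
def qualifies_for_persona(
    scores: dict[str, int],
    min_score: int = 2,
    min_dimensions: int = 3,
) -> bool:
    """Check whether a scored transcript clears the persona threshold.

    Assumption: `scores` holds exactly four dimension scores on a 0-3 scale.
    """
    if len(scores) != 4:
        raise ValueError("expected exactly four PULL dimension scores")
    strong = sum(1 for value in scores.values() if value >= min_score)
    return strong >= min_dimensions


# Placeholder keys for the four dimensions (hypothetical labels).
strong_call = {"P": 3, "U": 2, "L1": 2, "L2": 1}
weak_call = {"P": 3, "U": 1, "L1": 1, "L2": 2}

print(qualifies_for_persona(strong_call))  # True: three dimensions at 2 or higher
print(qualifies_for_persona(weak_call))    # False: only two dimensions at 2 or higher
```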

The principled point of the layering is the property that messaging cannot reference a claim that does not appear in demand/, and campaigns cannot reference messaging that has not been derived from a persona. The constraint is enforced by the workflows, not by the tooling — an operator can violate it by editing files directly. The discipline of running the workflows is what makes the layering load-bearing.
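
Because the constraint lives in the workflows rather than the tooling, an operator may want a separate audit that catches direct edits. The following is a minimal sketch, assuming each messaging file lists the demand/ files it draws from and each campaign lists the messaging files it uses; the data shapes, file names, and function name are all hypothetical.

```python
def find_layering_violations(
    demand_files: set[str],
    messaging_sources: dict[str, list[str]],
    campaign_messaging: dict[str, list[str]],
) -> list[str]:
    """Report references that skip a layer or point at a missing file."""
    violations = []
    # No messaging without demand evidence.
    for message, sources in messaging_sources.items():
        if not sources:
            violations.append(f"{message}: no demand evidence cited")
        for source in sources:
            if source not in demand_files:
                violations.append(f"{message}: cites missing {source}")
    # No campaigns without known messaging.
    for campaign, messages in campaign_messaging.items():
        for message in messages:
            if message not in messaging_sources:
                violations.append(f"{campaign}: uses unknown messaging {message}")
    return violations


demand = {"demand/call-001.md"}
messaging = {
    "messaging/subject-a.md": ["demand/call-001.md"],
    "messaging/subject-b.md": [],
}
campaigns = {
    "campaigns/q3.md": ["messaging/subject-a.md", "messaging/subject-c.md"],
}

for line in find_layering_violations(demand, messaging, campaigns):
    print(line)
```

In this example the audit flags subject-b.md for citing no evidence and the q3 campaign for referencing a messaging file that does not exist.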

The slash commands

Five commands ship in the box. Each one maps to a workflow that operates on the directory tree, and the surface is intentionally small.

/pull-query — the analytical surface. Run a question against the scored demand/ files: which segments scored highest on Urgency in the last 60 days, which Lacking patterns recurred across more than three calls, which persona has the strongest Project signal. The command operates as a structured query over the markdown corpus, with the scoring schema providing the dimensions. This is the command operators run weekly to identify ICP drift.
/intake — the ingest surface. Hand it a transcript in any format — Granola, Gong, Fathom, Zoom, plain text — and it parses, scores against PULL, writes the scored markdown into demand/, and proposes the persona-file updates that the new score implies. The proposal is opt-in; the operator confirms before segments/ is mutated, which is the gate that keeps the upstream layer auditable.
/gtm-os-status — the health surface. Reports on the state of the system: how many transcripts in demand/, the modal PULL score, the freshness of each persona file, the date of the last campaign run, the integrations connected in .mcp.json. The command is the dashboard that tells the operator whether the system is being fed or has gone stale.
/handover — the continuity surface. Generates a portable session-handover artifact that captures the current state of context.md, the open questions in the working ICP, and the next actions queued in workflows/. The command exists because the framework anticipates the operator switching between sessions and editors, and the cross-editor handover is what keeps the framework usable in a team that runs across Cursor and Claude Code and Aider in the same week.
/gtm-os-upgrade — the underrated piece. When Makara ships a new version of the framework, the upgrade command merges the new patterns into the operator’s working repo as opt-in patches. The customer keeps their customizations; the framework upgrades the parts that don’t conflict; conflicts surface as opt-in prompts. The mechanism solves the central problem of any opinionated open-source framework, namely how to ship improvements without breaking the operators who customized, and it is the engineering decision that distinguishes a serious open-source project from a one-shot template. Operators who skip the upgrade cadence diverge from improvements within a quarter.
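
The upgrade behavior described above (keep customizations, apply what doesn’t conflict, prompt on conflicts) matches a generic three-way comparison. The sketch below illustrates that general technique only; it is not a description of how the repo’s command is implemented, and the Action names, the plan_upgrade function, and the example paths are hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP_LOCAL = "keep_local"          # nothing new upstream for this file
    APPLY_UPSTREAM = "apply_upstream"  # operator never customized it
    PROMPT = "prompt"                  # both sides changed: ask the operator


def plan_upgrade(
    base: dict[str, str],
    local: dict[str, str],
    upstream: dict[str, str],
) -> dict[str, Action]:
    """Decide per file what an upgrade pass should do.

    Each mapping is path -> file content (or a content hash).
    `base` is the framework version the operator originally cloned.
    """
    plan = {}
    for path in sorted(set(base) | set(local) | set(upstream)):
        b, l, u = base.get(path), local.get(path), upstream.get(path)
        if u == b or u == l:
            plan[path] = Action.KEEP_LOCAL
        elif l == b:
            plan[path] = Action.APPLY_UPSTREAM
        else:
            plan[path] = Action.PROMPT
    return plan


base = {"skills/a.md": "v1", "skills/b.md": "v1", "skills/c.md": "v1"}
local = {"skills/a.md": "v1", "skills/b.md": "custom", "skills/c.md": "custom"}
upstream = {"skills/a.md": "v2", "skills/b.md": "v1", "skills/c.md": "v2"}

for path, action in plan_upgrade(base, local, upstream).items():
    print(path, action.value)
```

Only the file that changed on both sides lands in the prompt bucket, which is the property that lets an operator run the upgrade on a cadence without re-reviewing every customization.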

Where GTM Context OS stops short

The repo is rigorous on the scaffold and intentionally thin on the production layer. An operator hits five gaps the moment they try to run it as a system rather than as a context organizer.

The engine/ directory is described, not built. The CRM connectors, the dialer connectors, the enrichment connectors — the layer that lets operational mode actually send outbound — are sketched as contracts in the repo and left to the operator to implement. The honest framing in the README is that the engine is a placeholder. Operators who clone the repo and run /intake get value on day one. Operators who try to run a campaign on day three discover that the engine is empty and that wiring it is a two-to-four-week build for a single tenant.
No multi-tenant architecture. The repo is designed for one team in one repo. The directory schema, the slash commands, the MCP config — all of it assumes a single GTM context. The moment the operator wants to run the same framework across two customer engagements, the conventions break: two context.md files, two demand/ trees, two MCP configs, two upgrade cadences. The agency model is not in scope. For a single-team operator this is fine; for an operator running it as a service across customers, the missing tenancy is the entire build.
No closed feedback loop from reply data into PULL scores. The framework scores transcripts beautifully and synthesizes personas cleanly. What it does not do is read replies from the campaign layer and use them to update the upstream PULL scores. A “wrong timing” reply on an outbound campaign should reduce the Urgency score on that prospect and re-rank the persona. The repo provides no path for the reply event to flow back into demand/, which means the upstream layer drifts away from observed reality the longer the system runs.
No agency operating model. The framework cannot be cleanly run from one operator seat across multiple customer engagements. There is no convention for cross-tenant access control, no shared upgrade cadence across tenants, no portfolio-level view of which customer is scoring highest on which pattern. An agency that wants to standardize PULL across a dozen customers has to either fork the repo per customer or build the multi-tenancy layer themselves.
The campaign-execution layer is thin relative to the context layer. The campaigns/ directory holds objects, but the execution machinery — multi-touch cadence, A/B variant rotation, reply classification, deliverability monitoring — is left for the engine to implement. The reply rate is the metric that PULL ultimately drives, and the layer at which reply rate is measured and tuned is exactly the layer where the shipped scaffold is lightest. Operators who adopt the framework purely for the context layer get the most value; operators who adopt it expecting an end-to-end campaign system have to build the back half.

These are not criticisms of the framework. The scope Makara chose — a principled context organizer with operational-mode hooks — is the scope where the engineering thesis holds together. The gaps are intentional, and the operator who closes them is the operator who turns the scaffold into a production system.
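
To make the missing feedback loop concrete, here is a minimal sketch of a reply-driven re-score, using the “wrong timing” example from the list above. The reply label, the size of the adjustment, and the dictionary shape are assumptions for illustration; the 0 to 3 clamp follows the scoring scale used elsewhere in this chapter. The function returns a new score set rather than editing the original, in line with treating demand/ output as immutable and correcting by re-scoring.

```python
# Hypothetical mapping from a classified reply to a score adjustment.
REPLY_ADJUSTMENTS = {
    "wrong_timing": ("urgency", -1),
}


def rescore(scores: dict[str, int], reply_label: str) -> dict[str, int]:
    """Return a re-scored copy of `scores` for one classified reply."""
    updated = dict(scores)  # never mutate the original analysis
    adjustment = REPLY_ADJUSTMENTS.get(reply_label)
    if adjustment is None:
        return updated  # unknown labels leave the scores unchanged
    dimension, delta = adjustment
    # Clamp to the 0-3 scale.
    updated[dimension] = max(0, min(3, updated[dimension] + delta))
    return updated


original = {"project": 3, "urgency": 2}
print(rescore(original, "wrong_timing"))  # {'project': 3, 'urgency': 1}
print(original)                           # {'project': 3, 'urgency': 2}
```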

How Allston extends it

This is the architecture we have built on top of Makara’s scaffold across customer engagements. The directory schema and the slash-command surface are preserved; the production layer is what we add.

Multi-tenant operation. One operator seat runs the framework across the customer portfolio. Each customer engagement gets a tenant-scoped context.md, demand/, segments/, and messaging/, with shared scoring conventions across tenants and per-tenant override surfaces. The MCP configs are tenant-scoped so a single operator session cannot accidentally write to the wrong customer’s CRM.
Real engine integrations. The CRM, dialer, and enrichment connectors that Makara left as contracts are built out as production-grade integrations. Salesforce, HubSpot, Attio on the CRM side; Aircall and Orum on the dialer side; Clay and Apollo on the enrichment side. The engine handles auth rotation, rate limits, idempotency, and dead-letter queues — the operational concerns that the scaffold defers.
Skylarq as the closed-loop layer. Every reply on every outbound campaign flows into Skylarq, classifies against the PULL dimensions, and writes a re-score event back into the demand/ tree on the relevant tenant. The PULL scores stay calibrated against observed reply reality, not against a one-shot scoring pass that gets stale by quarter-end. The loop closes in days, not quarters.
The agency-shaped operating layer. Portfolio-level views — which customers are running the framework, which are due for an /gtm-os-upgrade pass, which have ICP drift signals firing, which are running campaigns off stale messaging — surface at the operator console. The agency model lets one operator hold a dozen customer engagements at the same level of operational discipline that a single-team adopter holds one.
The campaign-execution back half. Multi-touch sequencing, A/B variant rotation across persona, reply classification, deliverability monitoring, and reply-routing into the customer’s Slack within minutes. The back half of the funnel, where reply rate is measured, is the layer Makara’s repo is lightest on and the layer where we ship the most production code.

The aggregate property of these extensions: the framework runs as a live system across a portfolio of customer engagements, not as a single-team scaffold that goes stale between scoring passes. The PULL scores refresh from reply data weekly. The personas refresh from PULL data weekly. The campaigns refresh from persona data weekly. The upgrade cadence runs monthly across the portfolio. Makara’s scaffold is the spine; the multi-tenant production layer is what we ship on top.
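
Tenant scoping is easiest to reason about as a path rule. The sketch below assumes a hypothetical tenants/<tenant>/<layer>/ layout and a hypothetical tenant_path helper; it shows only the general idea of resolving every write through one function that refuses to leave the tenant’s own tree.

```python
from pathlib import PurePosixPath

# Layers that get a tenant-scoped copy (taken from the directory schema).
LAYERS = {"demand", "segments", "messaging", "campaigns"}


def tenant_path(root: str, tenant: str, layer: str, filename: str) -> PurePosixPath:
    """Build a path inside one tenant's tree, rejecting escapes."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    for part in (tenant, filename):
        # A single plain name only: no separators, no parent references.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(root) / tenant / layer / filename


print(tenant_path("tenants", "acme", "demand", "call-001.md"))
# tenants/acme/demand/call-001.md
```

A request such as tenant_path("tenants", "../other", "demand", "x.md") raises ValueError instead of resolving into another customer’s directory.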

Operator failures observed when adopting GTM Context OS

Adopting the directory schema without ingesting any transcripts. The most common adoption pattern. The operator clones the repo, admires the structure, fills in context.md with the company’s offer, and stops. demand/ stays empty, no PULL scoring runs, no persona derivation happens, and the slash commands operate over an empty corpus. The framework needs roughly 20 scored transcripts before the upstream layer produces signal; operators who skip that ingest step have a beautiful empty scaffold within a week and conclude the framework is theoretical.
Treating the engine/ directory as built-out. The README is honest that the engine is a placeholder. Operators who skim and assume the CRM and dialer connectors are wired discover the gap on the day they try to run a campaign. The wiring is a real build — auth, rate limits, idempotency, dead-letter handling — and underestimating it produces a launch slip of two to four weeks. The framework cannot ship that layer because the integrations are customer-specific; the operator who skips reading the relevant section discovers this the hard way.
Running it on one rep’s calls but not the team’s. The PULL rollup math is calibrated against team-level scoring volume. One rep producing five scored calls a week produces noise; a four-rep team producing twenty produces signal. Operators who pilot the framework on a single founder’s calls and judge the output get an under-powered view of what the rollups can do. The threshold for useful persona derivation is roughly 20 to 30 scored transcripts per persona; below that, the synthesis pass produces personas that pattern-match to a single conversation rather than to a segment.
Skipping the /gtm-os-upgrade cadence. The underrated piece. The framework versions monthly, and the upgrade command merges new patterns into the operator’s working repo as opt-in patches. Operators who clone once and never run the upgrade command diverge from improvements within a quarter. The diverged repo still works, but the operator stops getting the value of Makara’s continued investment, and the framework becomes a frozen snapshot rather than a living scaffold. The cadence is calendar-discipline: run the upgrade command monthly, accept the patches that don’t conflict, opt out of the ones that do, move on.
Confusing Makara’s repo with Dietle’s gtm-context-os-quickstart. The same-named repos belong to different authors and contain different DNA. Makara’s repo encodes PULL with a transcript-scoring schema, a persona-derivation workflow, and a five-command surface anchored in Snyder’s methodology. Dietle’s repo is a generic knowledge-graph scaffold with no PULL skill and no sales-call format. Operators who hear “GTM Context OS” in a podcast and search for the name end up on the wrong repo about half the time, install it, and conclude that the category is shallow. The category is shallow in most implementations. Makara’s is the exception. Name the distinction before adopting.
Using the operational-mode slash commands before the context-mode files have been populated. The temptation, once the integrations are wired, is to skip past the upstream work and run campaigns immediately. The result is operational mode reading from an empty messaging/ tree, falling back to generic copy, and producing low-PULL outbound at scale. The two modes are sequential by design: context mode populates the upstream layers, operational mode consumes them. Inverting the order produces a faster path to a worse outcome.
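
Several of these failures can be caught by a preflight check before operational mode runs. The following is a minimal sketch, assuming the caller passes in the file listings for demand/ and messaging/; the function name is hypothetical, and the default of 20 transcripts comes from the rough threshold quoted above.

```python
def preflight(
    demand_files: list[str],
    messaging_files: list[str],
    min_transcripts: int = 20,
) -> list[str]:
    """Return the reasons operational mode should not run yet."""
    blockers = []
    if len(demand_files) < min_transcripts:
        blockers.append(
            f"demand/ has {len(demand_files)} scored transcripts, "
            f"need at least {min_transcripts}"
        )
    if not messaging_files:
        blockers.append("messaging/ is empty")
    return blockers


# A freshly cloned scaffold: both upstream layers are empty.
print(preflight([], []))

# A populated repo: 20 scored calls and one messaging file.
calls = [f"call-{i:03d}.md" for i in range(20)]
print(preflight(calls, ["subject-a.md"]))  # []
```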

GTM-Context-OS-adoption checklist

The right repo has been cloned — Makara’s, not Dietle’s same-named quickstart — and the distinction has been verified by checking for the PULL skill file in .claude/
context.md has been populated with the company’s offer, ICP working hypothesis, PULL scoring conventions, and the pointers into the rest of the tree
The first 20 sales-call transcripts have been ingested through /intake into demand/ and the modal PULL score is known across the corpus
The PULL scoring schema is fixed — a 0-3 scale per dimension, two to four evidence quotes per score, a one-sentence rationale — and the conventions live in context.md
The persona files in segments/ have been generated from the high-PULL transcripts and at least one persona has been confirmed against a recent closed-won customer
The copy in messaging/ has been revised to use the language that appears verbatim in the demand/ files of the highest-PULL buyers, not the operator’s pre-drill phrasing
The engine/ integrations have been wired to a real CRM and a real dialer — not stubbed, not mocked, not deferred — and the integrations have been tested against a sandbox before pointing at production
The /gtm-os-upgrade cadence is on the calendar — monthly at minimum — and the operator has run at least one upgrade pass to confirm the merge behavior on their customizations
The reply backflow is running — every campaign reply flows back into demand/ as a re-score event, and the PULL scores stay calibrated against observed reply reality
The /gtm-os-status command is run weekly and the metrics — transcript volume, modal PULL score, persona freshness, last campaign run, integrations connected — are tracked over time
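
The scoring conventions in the checklist (a 0-3 scale, two to four evidence quotes, a one-sentence rationale) are simple enough to validate mechanically. This is a sketch with a hypothetical function name; the single-sentence test is a rough punctuation heuristic, not a real sentence parser.

```python
import re


def validate_dimension_score(
    score: int, quotes: list[str], rationale: str
) -> list[str]:
    """Return a list of problems with one scored PULL dimension."""
    problems = []
    if not isinstance(score, int) or not 0 <= score <= 3:
        problems.append("score must be an integer from 0 to 3")
    if not 2 <= len(quotes) <= 4:
        problems.append("expected two to four evidence quotes")
    text = rationale.strip()
    # Heuristic: count sentence-ending punctuation followed by space or end.
    sentence_ends = len(re.findall(r"[.!?](?:\s|$)", text))
    if not text or sentence_ends > 1:
        problems.append("rationale must be a single sentence")
    return problems


ok = validate_dimension_score(
    2, ["quote one", "quote two"], "Buyer named a Q3 deadline."
)
bad = validate_dimension_score(4, ["only one quote"], "First. Second.")

print(ok)        # []
print(len(bad))  # 3
```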

Where this fits

GTM Context OS is the operational scaffold that builds on the methodological foundation laid out in Rob Snyder’s PULL framework. The relationship is clean: PULL is the rubric; GTM Context OS is the repo schema and slash-command surface that runs the rubric across a real corpus of transcripts. Operators who read PULL first and adopt GTM Context OS second understand exactly what each layer of the directory tree is for and why the workflows enforce the layering they do. The reverse path — adopting the repo without reading PULL — produces operators who copy the directory schema without internalizing the constraint that drives it.

On the upstream side, our Salesgraph research skills chapter covers the prospect-research scaffold that feeds the audience targeting layer in campaigns/. On the retrospective side, our closed-won deconstruction guide is the drill that runs against the highest-PULL files in demand/ to extract the linguistic patterns that anchor messaging/. The three chapters together — PULL methodologically, GTM Context OS operationally, closed-won deconstruction retrospectively — are the input set that produces an ICP and a messaging library that don’t drift.

Related chapters

The PULL framework — the methodological foundation that GTM Context OS operationalizes.
Salesgraph research skills — the prospect-research scaffold that feeds the upstream targeting layer.
Closed-won deconstruction — the retrospective drill that runs against demand/ to anchor messaging/.
ICP hypothesis testing — falsifiable ICP refinement, the campaign-layer complement to the call-layer drill.


Ship the scaffold

Allston Labs runs Makara’s scaffold across customer engagements with the engine, multi-tenancy, and reply backflow built out.

We adopt the GTM Context OS directory schema and slash-command surface, build the engine integrations to real CRM and dialer systems, run it multi-tenant across the customer portfolio, and close the loop with Skylarq classifying every reply back into PULL. The scaffold is open source. The production layer is what we ship.
