Rival 2.0, From Brief to Survey
Authoring Pipeline
From brief to deployed survey
Every stage, every layer, every decision, from a researcher's words to a live, verified, multilingual survey on the edge.
✍️
The brief that drives this example: "We need a quick brand tracker for a coffee brand. Screen for weekly+ coffee drinkers, age 18–45. Measure brand awareness, NPS, and satisfaction with key attributes. 10 minutes max."
Claude (Bedrock)
GPT-4o (Vercel AI SDK)
Deterministic code
Node.js VM
Human
Deploy
Phase 1 · Generation
🔍
Layer 1 · Before any LLM call
Progressive Disclosure
Claude reads the brief, decides which card specs to fetch
Claude ~15ms
Claude's reasoning over the brief
Screeners detected → needs flow spec
Age + frequency screeners require branch logic and screen-out card schemas
get_card_spec("flow")
Brand awareness → multi_select, NPS → nps card
Standard brand tracker cards, core + scale categories
get_card_spec("core")
get_card_spec("scale")
No images, no conjoint, no qual depth
cards-media.md, cards-research.md, cards-qualitative.md, never loaded, never billed
skipped
Three parallel file reads · no LLM
get_card_spec("core")  → cards-core.md  ~7k chars ← disk read, ~5ms
get_card_spec("scale") → cards-scale.md ~8k chars ← disk read, ~5ms
get_card_spec("flow")  → cards-flow.md  ~7k chars ← disk read, ~5ms
// All three parallel via Promise.allSettled, ~5ms total
💡
This is progressive disclosure, not lazy loading. The point isn't performance, it's precision. Claude only reasons over context relevant to this brief. A system prompt containing all 70 card specs would dilute attention and risk Claude borrowing patterns from card types that have no business being in a brand tracker.
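The three reads above can be sketched as one parallel fan-out. The helper name, directory layout, and error handling here are illustrative assumptions, not the production module; only the cards-*.md naming comes from the pipeline itself.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Fetch only the card spec categories Claude asked for, all in parallel.
// `specDir` is a hypothetical default; the real location may differ.
export async function getCardSpecs(
  categories: string[],
  specDir = "./card-specs",
): Promise<Record<string, string>> {
  const results = await Promise.allSettled(
    categories.map((cat) => readFile(join(specDir, `cards-${cat}.md`), "utf8")),
  );
  const specs: Record<string, string> = {};
  results.forEach((res, i) => {
    // A missing spec file surfaces as an absent key, never a crash.
    if (res.status === "fulfilled") specs[categories[i]] = res.value;
  });
  return specs;
}
```

Categories that were never requested (media, research, qualitative) are simply never read, which is what keeps them off the token bill.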
Layer 2 · Turn 2, Iteration 1
Claude Generates FlowDefinition JSON
streamObject enforces Zod schema, invalid JSON corrected before resolve
Claude · Bedrock SSE tokens
Iteration 1 output, FlowDefinition JSON (abridged)
{
  "version": "1.0",
  "entry": "scr_age",
  "nodes": [
    {
      "type": "screener",
      "id": "scr_age",
      "card": {
        "cardType": "single_choice",
        "content": {
          "textKey": "scr_age.text",
          "options": [
            { "value": "18_24", "labelKey": "scr_age.opt.18_24" },
            { "value": "25_34", "labelKey": "scr_age.opt.25_34" },
            { "value": "45_plus", "labelKey": "scr_age.opt.45_plus" }
          ]
        }
      },
      "screenOutIf": "45_plus",
      "next": "scr_coffee_freq"
    },
    {
      "type": "screener",
      "id": "scr_coffee_freq",
      "screenOutIf": ["monthly", "rarely"],
      "next": "q_awareness"
    },
    // ⚠ q_nps and q_attributes route unconditionally after q_awareness,
    // no branch on brand_x awareness (iteration 1 flaw)
    { "type": "card", "id": "q_awareness", "next": "q_nps" },
    { "type": "card", "id": "q_nps", "next": "q_attributes" },
    { "type": "card", "id": "q_attributes", "next": "end" },
    { "type": "card", "id": "end", "card": { "cardType": "end_card" } }
  ],
  "translations": {
    "en": {
      "scr_age.text": "What is your age?",
      "q_nps.text": "How likely are you to recommend Brand X?"
      // ... all keys always use textKey/labelKey, never hardcoded text
    }
  }
}
🔑
Every string in the flow uses translation keys (textKey, labelKey, etc.), never hardcoded text. This is enforced in the system prompt. It's what makes multilingual translation at publish time possible without any re-authoring.
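The production pipeline hands a Zod schema to streamObject, which repairs invalid JSON before the promise resolves. As a dependency-free illustration of the structural floor that schema guarantees, the checks look roughly like this; field names follow the JSON above, the validation logic is a stand-in, not the real schema:

```typescript
// Simplified FlowDefinition shape, abridged from the example above.
type FlowNode = { type: string; id: string; next?: string };
type FlowDefinition = {
  version: string;
  entry: string;
  nodes: FlowNode[];
  translations: Record<string, Record<string, string>>;
};

// Throws with every structural issue found, mirroring schema enforcement.
export function assertFlowShape(raw: unknown): FlowDefinition {
  const flow = raw as FlowDefinition;
  const issues: string[] = [];
  if (typeof flow?.version !== "string") issues.push("version must be a string");
  if (typeof flow?.entry !== "string") issues.push("entry must be a node id");
  if (!Array.isArray(flow?.nodes) || flow.nodes.length === 0) {
    issues.push("nodes must be a non-empty array");
  } else {
    for (const node of flow.nodes) {
      if (typeof node.id !== "string" || node.id === "")
        issues.push("every node needs a string id");
    }
    if (!flow.nodes.some((n) => n.id === flow.entry))
      issues.push(`entry "${flow.entry}" is not a node id`);
  }
  if (issues.length > 0) throw new Error(issues.join("; "));
  return flow;
}
```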
🧠
Layers 3–5 · After generation, before forge
3× Semantic Passes, GPT-4o in Parallel
Coverage · Flow logic · AI probe quality, all three fire simultaneously
GPT-4o Promise.allSettled
Iteration 1 results, two passes fail
Pass 1: coverage
1. NPS + attributes unconditional, respondents unaware of Brand X cannot meaningfully answer. Filter after q_awareness missing.
2. No primary brand usage question, brief implies brand tracker.
FAIL
Pass 2: flow_logic
q_nps reached unconditionally after q_awareness. Respondent selecting only "none" will be asked to recommend Brand X, logically invalid. Branch on q_awareness required.
FAIL
Pass 3: ai_probes
No AI probe cards in this survey, passes vacuously.
PASS
Issues found → forge does not run. A fix prompt is built and sent back to Claude with all issues listed in JSON-term language. Iteration 2 begins.
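The three-pass fan-out can be sketched as below. `PassFn` is an assumed signature standing in for the real GPT-4o call; the key property is Promise.allSettled, so a crashed pass reports as a failure instead of sinking the whole validation round.

```typescript
type PassResult = { name: string; passed: boolean; issues: string[] };
// Hypothetical seam for the actual GPT-4o call per pass.
type PassFn = (flowJson: string, brief: string) => Promise<PassResult>;

// Fire all three semantic passes simultaneously.
export async function runSemanticPasses(
  flowJson: string,
  brief: string,
  passes: Record<string, PassFn>,
): Promise<PassResult[]> {
  const names = Object.keys(passes);
  const settled = await Promise.allSettled(
    names.map((name) => passes[name](flowJson, brief)),
  );
  return settled.map((res, i) =>
    res.status === "fulfilled"
      ? res.value
      : { name: names[i], passed: false, issues: [`pass crashed: ${String(res.reason)}`] },
  );
}
```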

Iteration 2 fix prompt to Claude
// Fix prompt structure
The survey flow has semantic issues. Fix ALL of them.

ORIGINAL BRIEF:
"We need a quick brand tracker..."

ISSUES TO FIX:
1. [coverage] NPS + attributes unconditional, filter after awareness needed
2. [coverage] No primary brand usage question
3. [flow_logic] Branch on q_awareness before NPS required

CURRENT FLOW (JSON): { ... }

Return ONLY corrected JSON. No markdown, no explanation.
Iteration 2 output, corrected nodes
+ q_primary_brand, single_choice
+ branch_awareness, routes on q_awareness includes "brand_x"
+ seq_brand_x, [q_primary_brand → q_nps → q_attributes]
Iteration 2 semantic passes, all green
Pass 1: coverage ✓
PASS
Pass 2: flow_logic ✓
PASS
Pass 3: ai_probes ✓
PASS
Max 3 iterations. Each fix prompt includes all accumulated issues plus the current flow JSON. If all three iterations are exhausted with verify still failing, the best available flow is saved with verify_ok = false, and the publish gate blocks until a human fixes it.
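The outer loop, generate → semantic passes → forge+verify with accumulated issues fed back into each fix prompt, can be sketched like this. Every dependency is an injected stub for the real stage; the shapes and names are assumptions, only the three-iteration cap and the verify_ok fallback come from the pipeline itself.

```typescript
type LoopDeps = {
  generate: (prompt: string) => Promise<object>;        // Claude
  semanticIssues: (flow: object) => Promise<string[]>;  // GPT-4o passes
  forgeAndVerify: (flow: object) => Promise<string[]>;  // compiler + VM
};

export async function authorFlow(brief: string, deps: LoopDeps) {
  let flow: object = {};
  const accumulated: string[] = [];
  for (let iteration = 1; iteration <= 3; iteration++) {
    const prompt = accumulated.length
      ? `${brief}\nFix ALL of these issues:\n${accumulated.join("\n")}`
      : brief;
    flow = await deps.generate(prompt);
    const semantic = await deps.semanticIssues(flow);
    if (semantic.length) { accumulated.push(...semantic); continue; } // forge never runs
    const verify = await deps.forgeAndVerify(flow);
    if (verify.length === 0) return { flow, verifyOk: true, iterations: iteration };
    accumulated.push(...verify);
  }
  // Best available flow is persisted with verify_ok = false;
  // the publish gate blocks it until a human intervenes.
  return { flow, verifyOk: false, iterations: 3 };
}
```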
Phase 2 · Compilation
⚙️
Layer 6 · After all semantic passes green
forge(), Deterministic Compiler
FlowDefinition JSON → survey.js · same JSON always same output · no LLM
Deterministic ~5ms
Generated survey.js (abridged)
// Generated by survey-forge, do not edit manually
// Flow version: 1.0
export async function run(interview) {
  const scr_age = await interview.ask({
    id: "scr_age",
    type: "single_choice",
    content: { textKey: "scr_age.text", ... }
  });
  if (scr_age === "45_plus") {
    await interview.ask({ type: "screen_out_card", ... });
    return;
  }
  const scr_coffee_freq = await interview.ask({ id: "scr_coffee_freq", ... });
  if (["monthly", "rarely"].includes(scr_coffee_freq)) {
    await interview.ask({ type: "screen_out_card", ... });
    return;
  }
  const q_awareness = await interview.ask({ id: "q_awareness", type: "multi_select", ... });
  if (q_awareness.includes("brand_x")) { // branch added in iteration 2
    const q_primary_brand = await interview.ask({ id: "q_primary_brand", ... });
    const q_nps = await interview.ask({ id: "q_nps", type: "nps", ... });
    const q_attributes = await interview.ask({ id: "q_attributes", type: "slider_grid", ... });
  }
  await interview.ask({ id: "end", type: "end_card", ... });
}
🔒
forge() throws on compiler assumption violations. Missing required fields, empty branch cases, sequence referencing unknown node IDs, caught at emit time, not at runtime. The throw is caught by the iteration loop and fed back to Claude as a fix prompt.
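A sketch of what those emit-time assertions might look like; the node shape is simplified and the function name is illustrative, not the real compiler internals:

```typescript
type FlowNode = { id: string; type: string; next?: string };

// Runs before a single line of survey.js is emitted; any violation throws,
// and the throw is fed back to Claude as a fix prompt by the loop.
export function assertForgeable(nodes: FlowNode[]): void {
  const ids = new Set(nodes.map((n) => n.id));
  for (const node of nodes) {
    if (!node.id) throw new Error("forge: node missing required id");
    if (node.next !== undefined && !ids.has(node.next))
      throw new Error(`forge: "${node.id}" routes to unknown node "${node.next}"`);
  }
  if (!nodes.some((n) => n.next === undefined))
    throw new Error("forge: no terminal node, every node routes onward");
}
```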
Phase 3 · Verification
🔬
Layers 7–10 · After forge
verifyProgram(), 3 VM Execution Paths
Node.js VM runs the compiled survey.js against mock answers on three tracks
Node.js VM 5s timeout
What verifyProgram() catches, and deliberately doesn't
✓ Catches
Syntax errors · runtime JS crashes
Missing terminal node (no end_card)
Dead branch targets (node ID doesn't exist)
Infinite loops / 5s VM timeout
Screen-out path not terminating
✗ Not designed to catch
Wrong routing logic (syntactically valid)
Missing cards on a valid path
Scale misconfiguration (min:1 max:1)
Paths not covered by 3 mock tracks
→ GPT-4o passes cover these
The three execution tracks
A
Track A, happy path, brand_x aware
✓ PASS
mock answers: scr_age="25_34" · scr_coffee_freq="daily" · q_awareness=["brand_x","nescafe"] · q_nps=9
single_choice · scr_age
single_choice · scr_coffee_freq
multi_select · q_awareness
single_choice · q_primary_brand
nps · q_nps
slider_grid · q_attributes
end_card · end ✓ terminal reached
B
Track B, happy path, brand_x not in awareness
✓ PASS
mock answers: scr_age="35_44" · scr_coffee_freq="weekly" · q_awareness=["nescafe","lavazza"]
single_choice · scr_age
single_choice · scr_coffee_freq
multi_select · q_awareness
← branch else: q_primary_brand, q_nps, q_attributes all skipped (brand_x absent)
end_card · end ✓ terminal reached
C
Track C, screen-out path
✓ PASS
mock answers: scr_age="45_plus"
single_choice · scr_age
screen_out_card · scr_age_screen_out ✓ screen-out reached
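A single track can be sketched as below: execute the compiled survey.js source inside a bare Node VM context, record every card the respondent would see, and bound the run with a wall clock. The `interview.ask` recorder mirrors the generated code above; note that vm's own `timeout` option only covers synchronous execution, so the async run is additionally raced against a timer here, an implementation assumption.

```typescript
import vm from "node:vm";

export async function runTrack(
  surveySource: string,
  mockAnswers: Record<string, unknown>,
): Promise<{ visited: string[]; terminal: boolean }> {
  const visited: string[] = [];
  let terminal = false;
  const interview = {
    async ask(card: { id?: string; type: string }) {
      visited.push(`${card.type}:${card.id ?? ""}`);
      if (card.type === "end_card" || card.type === "screen_out_card") {
        terminal = true;
        return null;
      }
      return mockAnswers[card.id ?? ""];
    },
  };
  // Rewrite the ESM export so the function is reachable from the sandbox.
  const source = surveySource.replace(
    /export\s+async\s+function\s+run/,
    "globalThis.run = async function run",
  );
  const sandbox: Record<string, any> = {};
  vm.createContext(sandbox);
  vm.runInContext(source, sandbox, { timeout: 5000 }); // sync portion only
  await Promise.race([
    sandbox.run(interview),
    new Promise((_, reject) => {
      const t = setTimeout(() => reject(new Error("track timed out")), 5000);
      (t as any).unref?.();
    }),
  ]);
  return { visited, terminal };
}
```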
When verify fails, the fix loop
// Example: typo in branch condition, q_awarness instead of q_awareness
Track A runtime error: Cannot read properties of undefined (reading 'includes')

// translateVerifyErrors() converts to JSON-term language for the fix prompt:
"Track A runtime error: Cannot read properties of undefined (reading 'includes')
 — check branch node 'on' values reference card IDs that exist in nodes[],
 and all branch 'next' and 'default' targets exist in nodes[]."

// Fix prompt sent back to Claude → iteration 3 → forge → verify again
🔧
In the Studio-native version, the fix prompt also includes compiler.ts source code, so Claude sees exactly which line in survey.js the typo produced and can fix it with certainty rather than inference. This is one of the six architectural improvements over the standalone server.
SSE event timeline, complete across both iterations
status · { message: "Generating flow (iteration 1/3)..." } · iter 1
token · { token: "...", iteration: 1 } × many · iter 1
status · { message: "Running semantic validation (iteration 1)..." } · iter 1
pass · { name: "coverage", passed: false, issues: [...], iteration: 1 } · iter 1
pass · { name: "flow_logic", passed: false, issues: [...], iteration: 1 } · iter 1
pass · { name: "ai_probes", passed: true, issues: [], iteration: 1 } · iter 1
forge · ← skipped, issues found · iter 1
status · { message: "Generating flow (iteration 2/3)..." } · iter 2
token · { token: "...", iteration: 2 } × many · iter 2
pass · { name: "coverage", passed: true, issues: [], iteration: 2 } · iter 2
pass · { name: "flow_logic", passed: true, issues: [], iteration: 2 } · iter 2
pass · { name: "ai_probes", passed: true, issues: [], iteration: 2 } · iter 2
status · { message: "Verifying compiled program..." } · iter 2
verify · { ok: true, issues: [], paths: [...], iteration: 2 } · iter 2
done · { studySlug, studyName, nodes: 9, warnings: [...], iterations: 2, elapsed: 18.4s, verifyOk: true, imageSlots: 0 } · iter 2
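Each timeline row above is one SSE frame. A minimal sketch of the serializer side; the event union is abridged (token and forge events omitted) and the exact payload fields are assumptions from the timeline:

```typescript
type AuthoringEvent =
  | { type: "status"; message: string; iteration: number }
  | { type: "pass"; name: string; passed: boolean; issues: string[]; iteration: number }
  | { type: "verify"; ok: boolean; issues: string[]; iteration: number }
  | { type: "done"; studySlug: string; iterations: number; verifyOk: boolean };

export function formatSse(event: AuthoringEvent): string {
  const { type, ...data } = event;
  // SSE wire format: "event:" line, "data:" line, blank-line terminator.
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}
```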
Phase 4 · Review UI
🖼️
Layers 11–12 · Image slots + human review
Review UI, Flow Inspection + Image Slot Resolution
DB row written · image slots flagged · researcher reviews and edits inline
Human forge on edit
DB row written after verify passes
authoring_studies {
  id: "coffee_brand_tracker_2026",
  flow_json: /* corrected FlowDefinition JSON */,
  program: /* verified survey.js */,
  verify_ok: true,
  image_slots_resolved: true,  // no image_grid cards in this survey
  status: "draft",
  iterations: 2,
  warnings: [
    "[coverage] NPS unconditional, filter after awareness added",
    "[flow_logic] Branch on q_awareness required, added in iteration 2"
  ]
}
Image slot detection, for surveys with image_grid cards
📸
Claude never generates image URLs. Every image slot has an imageIntent string instead, a structured description designed to be a high-quality stock photo search query. The system scans for unresolved slots and blocks publish until they're all filled.
q_brand_mood, image_grid ⚠ 4 slots unresolved
🔍
"young professional in a busy city street, morning commute, energetic urban atmosphere"
🔍
"person hiking in lush forest, peaceful, natural light, outdoors"
🔍
"group of friends laughing at a dinner table, warm social celebration"
🔍
"cosy home kitchen, morning coffee ritual, warm lighting"
💡
Each 🔍 Search button fires the imageIntent string as the query against Pixabay, Unsplash, or an internal asset library. The researcher picks a result, or pastes a URL, or uploads their own. Once all slots on all image_grid cards are resolved, image_slots_resolved flips to true.
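The unresolved-slot scan can be sketched as a pure function; the card and slot shapes here are assumptions based on the imageIntent description above:

```typescript
type ImageSlot = { imageIntent: string; url?: string };
type SurveyCard = { cardType: string; slots?: ImageSlot[] };

// Count slots on image_grid cards that still lack a resolved asset URL.
export function countUnresolvedSlots(cards: SurveyCard[]): number {
  return cards
    .filter((card) => card.cardType === "image_grid")
    .flatMap((card) => card.slots ?? [])
    .filter((slot) => !slot.url).length;
}
// image_slots_resolved flips to true exactly when this count reaches zero.
```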
Edit path, no LLM
// Researcher edits a card inline in the Review UI
POST /api/authoring/edit { slug, updatedFlowJson }
  → forge(updatedFlow)
  → verifyProgram(program, updatedFlow)
  → ok:   DB.updateProgram(slug, program, true)
          return { ok: true }
  → fail: return { ok: false, issues: translated[] }
          // returned directly to UI as structured errors

// NO LLM, NO fix prompt, NO generation loop
🎯
The distinction is deliberate. A human made an intentional change; the errors are theirs to fix. The UI displays them inline next to the card that caused them. No LLM involvement means instant feedback, no generation cost, and deterministic results.
Phase 5 · Publish
🚀
Layer 13 · Publish gate + deploy
Publish, Gate, Translate, Deploy to Edge
Hard gates · parallel translation · R2 upload · WfP worker deploy · D1 write
Cloudflare Claude · translate
Publish gate, hard checks, no override
🔴
verify_ok === false Returns 400 "Study has unresolved verify errors", deploy blocked entirely
🟡
image_slots_resolved === false Returns 400 "Study has 4 unresolved image slots in q_brand_mood, resolve all images before publishing"
🟢
Both gates green → proceed to deploy The gate is enforced at the API level, not the UI level, the UI cannot click around it
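As an API-level check, the gate is a few lines; this sketch uses the error messages above and the authoring_studies row shape shown earlier, with the function name as an illustrative assumption:

```typescript
type StudyRow = { verify_ok: boolean; image_slots_resolved: boolean };

// Hard checks, no override path; a 400 here blocks deploy entirely.
export function publishGate(row: StudyRow): { status: number; error?: string } {
  if (!row.verify_ok)
    return { status: 400, error: "Study has unresolved verify errors" };
  if (!row.image_slots_resolved)
    return { status: 400, error: "Study has unresolved image slots, resolve all images before publishing" };
  return { status: 200 }; // both gates green → proceed to deploy
}
```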
Translation, parallel per language · Claude
// generateTranslations(), parallel, not sequential
// 60 translation keys × 4 languages (fr, de, es, ar)
Sequential: 4 API calls × ~3s each = ~12s
Parallel:   4 API calls simultaneously = ~3s total

// Input: flat key→value JSON of clean survey copy
// Output: translations/fr.json, de.json, es.json, ar.json → R2
// Already multilingual-safe: no idioms, no culture-specific refs unless brief asked
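The fan-out shape can be sketched as below. `translateOne` is a stub seam for the real Claude call; the point is Promise.all over languages rather than an awaited loop, so four languages cost roughly one call's latency.

```typescript
export async function generateTranslations(
  keys: Record<string, string>,
  langs: string[],
  translateOne: (keys: Record<string, string>, lang: string) => Promise<Record<string, string>>,
): Promise<Record<string, Record<string, string>>> {
  // Promise.all, not for...of with await: all languages in flight at once.
  const perLang = await Promise.all(langs.map((lang) => translateOne(keys, lang)));
  return Object.fromEntries(langs.map((lang, i) => [lang, perLang[i]] as const));
}
```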
Deploy sequence, all parallel where possible
Parallel
📝
generateTranslations(flow, ["fr","de","es","ar"]) → Claude
flat key→value, parallel per lang
📦
Upload survey.js → R2
{slug}/survey.js, reads stored .program from DB, no re-forge
🌍
Upload translations/{lang}.json → R2
one file per language
🔨
Build WfP bundle from wfp-entry.ts via esbuild in-process
platform=neutral, @r2/ai-providers/worker not full index
☁️
Deploy UserWorker to WfP dispatch namespace as study-{slug}
every study its own Cloudflare Worker, per-study isolation
⚙️
PATCH compat settings separately
CF ignores compatibility_date in multipart PUT, known CF API quirk
🗃️
Write D1 row: INSERT INTO studies ON CONFLICT DO UPDATE
status → published
🏗️
Publish reads stored survey.js from the DB, no re-forge, no re-verify. The program was verified at generation time. If verify_ok is false, the gate blocks before any of this runs.
Bonus · Synthetic Respondents
🤖
Post-publish · pre-launch validation
Synthetic Respondents, 1,000 LLM Calls in Parallel
Each persona runs the full survey JSON in one shot, complete answer map per respondent
1,000× parallel Same artifacts
Input → output
Input per call
survey.js
flow_json
translations/en.json
persona: "35yr old urban coffee drinker, brand loyal, NPS detractor"
Output per call
{ scr_age: "25_34", q_awareness: ["brand_x"], q_nps: 3, q_attributes: { taste: 4, value: 2 } }
Not one question at a time. The LLM sees the full structure, understands the routing, knows which questions it would reach given its own prior answers, and produces a coherent complete response in one pass. The headless player replays the answer map through the actual renderer, proving the UI handles the data correctly.
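The batch itself is one more parallel fan-out. `answerSurvey` is a stub seam for the real one-shot model call described above; Promise.allSettled means a single refused or crashed persona is dropped from the batch rather than killing the other 999 calls.

```typescript
export async function runSynthetics(
  personas: string[],
  answerSurvey: (persona: string) => Promise<Record<string, unknown>>,
): Promise<{ persona: string; answers: Record<string, unknown> }[]> {
  const settled = await Promise.allSettled(personas.map((p) => answerSurvey(p)));
  // Failed personas are silently dropped; successes keep their persona label
  // so quota and distribution checks can slice by demographic.
  return settled.flatMap((res, i) =>
    res.status === "fulfilled" ? [{ persona: personas[i], answers: res.value }] : [],
  );
}
```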
What this unlocks
Pre-launch data validation
Does the data model hold up with real-shaped data before a single real respondent sees it?
Quota simulation
Run 1,000 synthetics, see if your quotas fill correctly across demographics
Edge case generation
Generate personas specifically designed to hit rare branches and edge paths
Analysis pipeline testing
Real-shaped data flowing through crosstabs, DataTalk, exports before launch
Regression testing
Re-run synthetics after any flow edit, compare answer distributions
All Layers · Complete Error Space
🛡️
Complete reference
14-Layer Defense Pipeline
Every layer, every error class, who catches it, the complete map
14 layers
# · Layer · What it catches · Who
1 · Progressive disclosure · Wrong card type for the job, Claude reasons only over relevant specs. Adding 50 new card types doesn't degrade generation quality. · Claude
2 · streamObject Zod · Structurally invalid JSON, missing required fields, wrong types, invalid discriminated union values. Corrected before forge ever sees it. · Zod + SDK
3 · Pass 1: coverage · Research objectives from the brief missing from the flow. "Brief says measure consideration, no consideration card found." · GPT-4o
4 · Pass 2: flow_logic · Routing that doesn't make logical sense. Screener on wrong value, NPS asked before awareness filter, dead branches. · GPT-4o
5 · Pass 3: ai_probes · Vague probe objectives that won't produce useful qual data. "objective: understand opinions" vs. a specific insight target. · GPT-4o
6 · forge() · Compiler assumption violations, empty branch cases, sequences referencing missing node IDs, fields the emitter expects but didn't find. · TypeScript
7 · VM Track A · Happy path runtime crashes, missing end_card. Typo in branch condition, undefined variable mid-execution. · Node VM
8 · VM Track B · Alternative happy path, routing divergence under different inputs. Dead-end path not exercised by Track A. · Node VM
9 · VM Track C · Screen-out path doesn't terminate at screen_out_card. Disqualified respondent routed to a question instead. · Node VM
10 · VM 5s timeout · Infinite loops, runaway dynamic_card iterations. Loop that never terminates → path times out → verify failure → fix prompt. · Node VM
11 · Image slot detection · image_grid cards with unresolved imageIntent, flagged at save time. imageSlots: N in done payload, image_slots_resolved = false in DB. · Code
12 · Edit path forge+verify · Human-introduced errors after generation. Broken branch target, invalid card config. Returned as structured errors, no LLM. · Code
13 · Publish gate · Attempting to deploy an unverified or image-incomplete survey. Hard 400, no bypass, no "publish with warnings" path. · Code
14 · Human review · Researcher judgment, wording, tone, brand-specific context the system can't know. Inline edits trigger forge+verify immediately. · Human
🎯
Each layer is doing what it's genuinely best at. Progressive disclosure = Claude judgment. Zod = structural enforcement. GPT-4o = reading comprehension and domain reasoning. VM = deterministic execution. Human = context the system can't know. No single layer can catch everything, together they leave no gap.
Survey is live, what was just proved
The brief is covered, GPT-4o checked it (coverage pass)
The logic is sound, GPT-4o checked it (flow_logic pass)
The structure is valid, Zod enforced it (streamObject)
The program compiles, forge proved it (deterministic compiler)
Every path terminates, the VM ran it (3 tracks, 5s timeout)
Every image is resolved, the gate checked it (image_slots_resolved)
Multilingual from day one, Claude translated all keys in parallel at publish
Per-study isolated, its own Cloudflare Worker, every session its own Durable Object
The deploy is blocked until all of the above, the publish gate enforces it
Each layer is a statement of proof, not a hope. A researcher with no programming knowledge submits a brief and gets a deployable, verified, production-ready survey, with every structural error caught automatically and every path to deployment blocked until the output is genuinely ready.