Rival 2.0, From Brief to Survey
Authoring Pipeline
From brief to deployed survey
Every stage, every layer, every decision, from a researcher's words to a live, verified, multilingual survey on the edge.
✍️
The brief that drives this example: "We need a quick brand tracker for a coffee brand. Screen for weekly+ coffee drinkers, age 18–45. Measure brand awareness, NPS, and satisfaction with key attributes. 10 minutes max."
Claude (Bedrock)
GPT-4o (Vercel AI SDK)
Deterministic code
Node.js VM
Human
Deploy
Phase 1 · Generation
🔍
Layer 1 · Before any LLM call
Progressive Disclosure
Claude reads the brief, decides which card specs to fetch
Claude ~15ms
Claude's reasoning over the brief
Screeners detected → needs flow spec
Age + frequency screeners require branch logic and screen-out card schemas
get_card_spec("flow")
Brand awareness → multi_select, NPS → nps card
Standard brand tracker cards, core + scale categories
get_card_spec("core")
get_card_spec("scale")
No images, no conjoint, no qual depth
cards-media.md, cards-research.md, cards-qualitative.md, never loaded, never billed
skipped
Three parallel file reads · no LLM
get_card_spec("core")  → cards-core.md  ~7k chars ← disk read, ~5ms
get_card_spec("scale") → cards-scale.md ~8k chars ← disk read, ~5ms
get_card_spec("flow")  → cards-flow.md  ~7k chars ← disk read, ~5ms
// All three parallel via Promise.allSettled, ~5ms total
💡
This is progressive disclosure, not lazy loading. The point isn't performance, it's precision. Claude only reasons over context relevant to this brief. A system prompt containing all 70 card specs would dilute attention and risk Claude borrowing patterns from card types that have no business being in a brand tracker.
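The three reads above can be sketched as one parallel fan-out. The helper name, directory layout, and error handling here are illustrative assumptions, not the production module; only the cards-*.md naming comes from the pipeline itself.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Fetch only the card spec categories Claude asked for, all in parallel.
// `specDir` is a hypothetical default; the real location may differ.
export async function getCardSpecs(
  categories: string[],
  specDir = "./card-specs",
): Promise<Record<string, string>> {
  const results = await Promise.allSettled(
    categories.map((cat) => readFile(join(specDir, `cards-${cat}.md`), "utf8")),
  );
  const specs: Record<string, string> = {};
  results.forEach((res, i) => {
    // A missing spec file surfaces as an absent key, never a crash.
    if (res.status === "fulfilled") specs[categories[i]] = res.value;
  });
  return specs;
}
```

Categories that were never requested (media, research, qualitative) are simply never read, which is what keeps them off the token bill.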
Layer 2 · Turn 2, Iteration 1
Claude Generates FlowDefinition JSON
streamObject enforces Zod schema, invalid JSON corrected before resolve
Claude · Bedrock SSE tokens
Iteration 1 output, FlowDefinition JSON (abridged)
{
  "version": "1.0",
  "entry": "scr_age",
  "nodes": [
    {
      "type": "screener",
      "id": "scr_age",
      "card": {
        "cardType": "single_choice",
        "content": {
          "textKey": "scr_age.text",
          "options": [
            { "value": "18_24", "labelKey": "scr_age.opt.18_24" },
            { "value": "25_34", "labelKey": "scr_age.opt.25_34" },
            { "value": "45_plus", "labelKey": "scr_age.opt.45_plus" }
          ]
        }
      },
      "screenOutIf": "45_plus",
      "next": "scr_coffee_freq"
    },
    {
      "type": "screener",
      "id": "scr_coffee_freq",
      "screenOutIf": ["monthly", "rarely"],
      "next": "q_awareness"
    },
    // ⚠ q_nps and q_attributes route unconditionally after q_awareness,
    // no branch on brand_x awareness (iteration 1 flaw)
    { "type": "card", "id": "q_awareness", "next": "q_nps" },
    { "type": "card", "id": "q_nps", "next": "q_attributes" },
    { "type": "card", "id": "q_attributes", "next": "end" },
    { "type": "card", "id": "end", "card": { "cardType": "end_card" } }
  ],
  "translations": {
    "en": {
      "scr_age.text": "What is your age?",
      "q_nps.text": "How likely are you to recommend Brand X?"
      // ... all keys always use textKey/labelKey, never hardcoded text
    }
  }
}
🔑
Every string in the flow uses translation keys (textKey, labelKey, etc.), never hardcoded text. This is enforced in the system prompt. It's what makes multilingual translation at publish time possible without any re-authoring.
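The production pipeline hands a Zod schema to streamObject, which repairs invalid JSON before the promise resolves. As a dependency-free illustration of the structural floor that schema guarantees, the checks look roughly like this; field names follow the JSON above, the validation logic is a stand-in, not the real schema:

```typescript
// Simplified FlowDefinition shape, abridged from the example above.
type FlowNode = { type: string; id: string; next?: string };
type FlowDefinition = {
  version: string;
  entry: string;
  nodes: FlowNode[];
  translations: Record<string, Record<string, string>>;
};

// Throws with every structural issue found, mirroring schema enforcement.
export function assertFlowShape(raw: unknown): FlowDefinition {
  const flow = raw as FlowDefinition;
  const issues: string[] = [];
  if (typeof flow?.version !== "string") issues.push("version must be a string");
  if (typeof flow?.entry !== "string") issues.push("entry must be a node id");
  if (!Array.isArray(flow?.nodes) || flow.nodes.length === 0) {
    issues.push("nodes must be a non-empty array");
  } else {
    for (const node of flow.nodes) {
      if (typeof node.id !== "string" || node.id === "")
        issues.push("every node needs a string id");
    }
    if (!flow.nodes.some((n) => n.id === flow.entry))
      issues.push(`entry "${flow.entry}" is not a node id`);
  }
  if (issues.length > 0) throw new Error(issues.join("; "));
  return flow;
}
```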
🧠
Layers 3–5 · After generation, before forge
3× Semantic Passes, GPT-4o in Parallel
Coverage · Flow logic · AI probe quality, all three fire simultaneously
GPT-4o Promise.allSettled
Iteration 1 results, two passes fail
Pass 1: coverage
1. NPS + attributes unconditional, respondents unaware of Brand X cannot meaningfully answer. Filter after q_awareness missing.
2. No primary brand usage question, brief implies brand tracker.
FAIL
Pass 2: flow_logic
q_nps reached unconditionally after q_awareness. Respondent selecting only "none" will be asked to recommend Brand X, logically invalid. Branch on q_awareness required.
FAIL
Pass 3: ai_probes
No AI probe cards in this survey, passes vacuously.
PASS
Issues found → forge does not run. A fix prompt is built and sent back to Claude with all issues listed in JSON-term language. Iteration 2 begins.
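The three-pass fan-out can be sketched as below. `PassFn` is an assumed signature standing in for the real GPT-4o call; the key property is Promise.allSettled, so a crashed pass reports as a failure instead of sinking the whole validation round.

```typescript
type PassResult = { name: string; passed: boolean; issues: string[] };
// Hypothetical seam for the actual GPT-4o call per pass.
type PassFn = (flowJson: string, brief: string) => Promise<PassResult>;

// Fire all three semantic passes simultaneously.
export async function runSemanticPasses(
  flowJson: string,
  brief: string,
  passes: Record<string, PassFn>,
): Promise<PassResult[]> {
  const names = Object.keys(passes);
  const settled = await Promise.allSettled(
    names.map((name) => passes[name](flowJson, brief)),
  );
  return settled.map((res, i) =>
    res.status === "fulfilled"
      ? res.value
      : { name: names[i], passed: false, issues: [`pass crashed: ${String(res.reason)}`] },
  );
}
```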

Iteration 2 fix prompt to Claude
// Fix prompt structure
The survey flow has semantic issues. Fix ALL of them.

ORIGINAL BRIEF:
"We need a quick brand tracker..."

ISSUES TO FIX:
1. [coverage] NPS + attributes unconditional, filter after awareness needed
2. [coverage] No primary brand usage question
3. [flow_logic] Branch on q_awareness before NPS required

CURRENT FLOW (JSON): { ... }

Return ONLY corrected JSON. No markdown, no explanation.
Iteration 2 output, corrected nodes
+ q_primary_brand, single_choice
+ branch_awareness, routes on q_awareness includes "brand_x"
+ seq_brand_x, [q_primary_brand → q_nps → q_attributes]
Iteration 2 semantic passes, all green
Pass 1: coverage ✓
PASS
Pass 2: flow_logic ✓
PASS
Pass 3: ai_probes ✓
PASS
Max 3 iterations. Each fix prompt includes all accumulated issues plus the current flow JSON. If all three iterations are exhausted with verify still failing, the best available flow is saved with verify_ok = false, and the publish gate blocks until a human fixes it.
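The outer loop, generate → semantic passes → forge+verify with accumulated issues fed back into each fix prompt, can be sketched like this. Every dependency is an injected stub for the real stage; the shapes and names are assumptions, only the three-iteration cap and the verify_ok fallback come from the pipeline itself.

```typescript
type LoopDeps = {
  generate: (prompt: string) => Promise<object>;        // Claude
  semanticIssues: (flow: object) => Promise<string[]>;  // GPT-4o passes
  forgeAndVerify: (flow: object) => Promise<string[]>;  // compiler + VM
};

export async function authorFlow(brief: string, deps: LoopDeps) {
  let flow: object = {};
  const accumulated: string[] = [];
  for (let iteration = 1; iteration <= 3; iteration++) {
    const prompt = accumulated.length
      ? `${brief}\nFix ALL of these issues:\n${accumulated.join("\n")}`
      : brief;
    flow = await deps.generate(prompt);
    const semantic = await deps.semanticIssues(flow);
    if (semantic.length) { accumulated.push(...semantic); continue; } // forge never runs
    const verify = await deps.forgeAndVerify(flow);
    if (verify.length === 0) return { flow, verifyOk: true, iterations: iteration };
    accumulated.push(...verify);
  }
  // Best available flow is persisted with verify_ok = false;
  // the publish gate blocks it until a human intervenes.
  return { flow, verifyOk: false, iterations: 3 };
}
```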
Phase 2 · Compilation
⚙️
Layer 6 · After all semantic passes green
forge(), Deterministic Compiler
FlowDefinition JSON → survey.js · same JSON always same output · no LLM
Deterministic ~5ms
Generated survey.js (abridged)
// Generated by survey-forge, do not edit manually
// Flow version: 1.0
export async function run(interview) {
  const scr_age = await interview.ask({
    id: "scr_age",
    type: "single_choice",
    content: { textKey: "scr_age.text", ... }
  });
  if (scr_age === "45_plus") {
    await interview.ask({ type: "screen_out_card", ... });
    return;
  }
  const scr_coffee_freq = await interview.ask({ id: "scr_coffee_freq", ... });
  if (["monthly", "rarely"].includes(scr_coffee_freq)) {
    await interview.ask({ type: "screen_out_card", ... });
    return;
  }
  const q_awareness = await interview.ask({ id: "q_awareness", type: "multi_select", ... });
  if (q_awareness.includes("brand_x")) { // branch added in iteration 2
    const q_primary_brand = await interview.ask({ id: "q_primary_brand", ... });
    const q_nps = await interview.ask({ id: "q_nps", type: "nps", ... });
    const q_attributes = await interview.ask({ id: "q_attributes", type: "slider_grid", ... });
  }
  await interview.ask({ id: "end", type: "end_card", ... });
}
🔒
forge() throws on compiler assumption violations. Missing required fields, empty branch cases, sequence referencing unknown node IDs, caught at emit time, not at runtime. The throw is caught by the iteration loop and fed back to Claude as a fix prompt.
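A sketch of what those emit-time assertions might look like; the node shape is simplified and the function name is illustrative, not the real compiler internals:

```typescript
type FlowNode = { id: string; type: string; next?: string };

// Runs before a single line of survey.js is emitted; any violation throws,
// and the throw is fed back to Claude as a fix prompt by the loop.
export function assertForgeable(nodes: FlowNode[]): void {
  const ids = new Set(nodes.map((n) => n.id));
  for (const node of nodes) {
    if (!node.id) throw new Error("forge: node missing required id");
    if (node.next !== undefined && !ids.has(node.next))
      throw new Error(`forge: "${node.id}" routes to unknown node "${node.next}"`);
  }
  if (!nodes.some((n) => n.next === undefined))
    throw new Error("forge: no terminal node, every node routes onward");
}
```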
Phase 3 · Verification
🔬
Layers 7–10 · After forge
verifyProgram(), 3 VM Execution Paths
Node.js VM runs the compiled survey.js against mock answers on three tracks
Node.js VM 5s timeout
What verifyProgram() catches, and deliberately doesn't
✓ Catches
Syntax errors · runtime JS crashes
Missing terminal node (no end_card)
Dead branch targets (node ID doesn't exist)
Infinite loops / 5s VM timeout
Screen-out path not terminating
✗ Not designed to catch
Wrong routing logic (syntactically valid)
Missing cards on a valid path
Scale misconfiguration (min:1 max:1)
Paths not covered by 3 mock tracks
→ GPT-4o passes cover these
The three execution tracks
A
Track A, happy path, brand_x aware
✓ PASS
mock answers: scr_age="25_34" · scr_coffee_freq="daily" · q_awareness=["brand_x","nescafe"] · q_nps=9
single_choice · scr_age
single_choice · scr_coffee_freq
multi_select · q_awareness
single_choice · q_primary_brand
nps · q_nps
slider_grid · q_attributes
end_card · end ✓ terminal reached
B
Track B, happy path, brand_x not in awareness
✓ PASS
mock answers: scr_age="35_44" · scr_coffee_freq="weekly" · q_awareness=["nescafe","lavazza"]
single_choice · scr_age
single_choice · scr_coffee_freq
multi_select · q_awareness
← branch else: q_primary_brand, q_nps, q_attributes all skipped (brand_x absent)
end_card · end ✓ terminal reached
C
Track C, screen-out path
✓ PASS
mock answers: scr_age="45_plus"
single_choice · scr_age
screen_out_card · scr_age_screen_out ✓ screen-out reached
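A single track can be sketched as below: execute the compiled survey.js source inside a bare Node VM context, record every card the respondent would see, and bound the run with a wall clock. The `interview.ask` recorder mirrors the generated code above; note that vm's own `timeout` option only covers synchronous execution, so the async run is additionally raced against a timer here, an implementation assumption.

```typescript
import vm from "node:vm";

export async function runTrack(
  surveySource: string,
  mockAnswers: Record<string, unknown>,
): Promise<{ visited: string[]; terminal: boolean }> {
  const visited: string[] = [];
  let terminal = false;
  const interview = {
    async ask(card: { id?: string; type: string }) {
      visited.push(`${card.type}:${card.id ?? ""}`);
      if (card.type === "end_card" || card.type === "screen_out_card") {
        terminal = true;
        return null;
      }
      return mockAnswers[card.id ?? ""];
    },
  };
  // Rewrite the ESM export so the function is reachable from the sandbox.
  const source = surveySource.replace(
    /export\s+async\s+function\s+run/,
    "globalThis.run = async function run",
  );
  const sandbox: Record<string, any> = {};
  vm.createContext(sandbox);
  vm.runInContext(source, sandbox, { timeout: 5000 }); // sync portion only
  await Promise.race([
    sandbox.run(interview),
    new Promise((_, reject) => {
      const t = setTimeout(() => reject(new Error("track timed out")), 5000);
      (t as any).unref?.();
    }),
  ]);
  return { visited, terminal };
}
```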
When verify fails, the fix loop
// Example: typo in branch condition, q_awarness instead of q_awareness
Track A runtime error: Cannot read properties of undefined (reading 'includes')

// translateVerifyErrors() converts to JSON-term language for the fix prompt:
"Track A runtime error: Cannot read properties of undefined (reading 'includes')
 — check branch node 'on' values reference card IDs that exist in nodes[],
 and all branch 'next' and 'default' targets exist in nodes[]."

// Fix prompt sent back to Claude → iteration 3 → forge → verify again
🔧
In the Studio-native version, the fix prompt also includes compiler.ts source code, so Claude sees exactly which line in survey.js the typo produced and can fix it with certainty rather than inference. This is one of the six architectural improvements over the standalone server.
SSE event timeline, complete across both iterations
status · { message: "Generating flow (iteration 1/3)..." } · iter 1
token · { token: "...", iteration: 1 } × many · iter 1
status · { message: "Running semantic validation (iteration 1)..." } · iter 1
pass · { name: "coverage", passed: false, issues: [...], iteration: 1 } · iter 1
pass · { name: "flow_logic", passed: false, issues: [...], iteration: 1 } · iter 1
pass · { name: "ai_probes", passed: true, issues: [], iteration: 1 } · iter 1
forge · ← skipped, issues found · iter 1
status · { message: "Generating flow (iteration 2/3)..." } · iter 2
token · { token: "...", iteration: 2 } × many · iter 2
pass · { name: "coverage", passed: true, issues: [], iteration: 2 } · iter 2
pass · { name: "flow_logic", passed: true, issues: [], iteration: 2 } · iter 2
pass · { name: "ai_probes", passed: true, issues: [], iteration: 2 } · iter 2
status · { message: "Verifying compiled program..." } · iter 2
verify · { ok: true, issues: [], paths: [...], iteration: 2 } · iter 2
done · { studySlug, studyName, nodes: 9, warnings: [...], iterations: 2, elapsed: 18.4s, verifyOk: true, imageSlots: 0 } · iter 2
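Each timeline row above is one SSE frame. A minimal sketch of the serializer side; the event union is abridged (token and forge events omitted) and the exact payload fields are assumptions from the timeline:

```typescript
type AuthoringEvent =
  | { type: "status"; message: string; iteration: number }
  | { type: "pass"; name: string; passed: boolean; issues: string[]; iteration: number }
  | { type: "verify"; ok: boolean; issues: string[]; iteration: number }
  | { type: "done"; studySlug: string; iterations: number; verifyOk: boolean };

export function formatSse(event: AuthoringEvent): string {
  const { type, ...data } = event;
  // SSE wire format: "event:" line, "data:" line, blank-line terminator.
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}
```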
Phase 4 · Review UI
🖼️
Layers 11–12 · Image slots + human review
Review UI, Flow Inspection + Image Slot Resolution
DB row written · image slots flagged · researcher reviews and edits inline
Human forge on edit
DB row written after verify passes
authoring_studies {
  id: "coffee_brand_tracker_2026",
  flow_json: /* corrected FlowDefinition JSON */,
  program: /* verified survey.js */,
  verify_ok: true,
  image_slots_resolved: true,  // no image_grid cards in this survey
  status: "draft",
  iterations: 2,
  warnings: [
    "[coverage] NPS unconditional, filter after awareness added",
    "[flow_logic] Branch on q_awareness required, added in iteration 2"
  ]
}
Image slot detection, for surveys with image_grid cards
📸
Claude never generates image URLs. Every image slot has an imageIntent string instead, a structured description designed to be a high-quality stock photo search query. The system scans for unresolved slots and blocks publish until they're all filled.
q_brand_mood, image_grid ⚠ 4 slots unresolved
🔍
"young professional in a busy city street, morning commute, energetic urban atmosphere"
🔍
"person hiking in lush forest, peaceful, natural light, outdoors"
🔍
"group of friends laughing at a dinner table, warm social celebration"
🔍
"cosy home kitchen, morning coffee ritual, warm lighting"
💡
Each 🔍 Search button fires the imageIntent string as the query against Pixabay, Unsplash, or an internal asset library. The researcher picks a result, or pastes a URL, or uploads their own. Once all slots on all image_grid cards are resolved, image_slots_resolved flips to true.
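The unresolved-slot scan can be sketched as a pure function; the card and slot shapes here are assumptions based on the imageIntent description above:

```typescript
type ImageSlot = { imageIntent: string; url?: string };
type SurveyCard = { cardType: string; slots?: ImageSlot[] };

// Count slots on image_grid cards that still lack a resolved asset URL.
export function countUnresolvedSlots(cards: SurveyCard[]): number {
  return cards
    .filter((card) => card.cardType === "image_grid")
    .flatMap((card) => card.slots ?? [])
    .filter((slot) => !slot.url).length;
}
// image_slots_resolved flips to true exactly when this count reaches zero.
```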
Edit path, no LLM
// Researcher edits a card inline in the Review UI
POST /api/authoring/edit { slug, updatedFlowJson }
  → forge(updatedFlow)
  → verifyProgram(program, updatedFlow)
  → ok:   DB.updateProgram(slug, program, true)
          return { ok: true }
  → fail: return { ok: false, issues: translated[] }
          // returned directly to UI as structured errors

// NO LLM, NO fix prompt, NO generation loop
🎯
The distinction is deliberate. A human made an intentional change; the errors are theirs to fix. The UI displays them inline next to the card that caused them. No LLM involvement means instant feedback, no generation cost, and deterministic results.
Phase 5 · Publish
🚀
Layer 13 · Publish gate + deploy
Publish, Gate, Translate, Deploy to Edge
Hard gates · parallel translation · R2 upload · WfP worker deploy · D1 write
Cloudflare Claude · translate
Publish gate, hard checks, no override
🔴
verify_ok === false Returns 400 "Study has unresolved verify errors", deploy blocked entirely
🟡
image_slots_resolved === false Returns 400 "Study has 4 unresolved image slots in q_brand_mood, resolve all images before publishing"
🟢
Both gates green → proceed to deploy The gate is enforced at the API level, not the UI level, the UI cannot click around it
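As an API-level check, the gate is a few lines; this sketch uses the error messages above and the authoring_studies row shape shown earlier, with the function name as an illustrative assumption:

```typescript
type StudyRow = { verify_ok: boolean; image_slots_resolved: boolean };

// Hard checks, no override path; a 400 here blocks deploy entirely.
export function publishGate(row: StudyRow): { status: number; error?: string } {
  if (!row.verify_ok)
    return { status: 400, error: "Study has unresolved verify errors" };
  if (!row.image_slots_resolved)
    return { status: 400, error: "Study has unresolved image slots, resolve all images before publishing" };
  return { status: 200 }; // both gates green → proceed to deploy
}
```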
Translation, parallel per language · Claude
// generateTranslations(), parallel, not sequential
// 60 translation keys × 4 languages (fr, de, es, ar)
Sequential: 4 API calls × ~3s each = ~12s
Parallel:   4 API calls simultaneously = ~3s total

// Input: flat key→value JSON of clean survey copy
// Output: translations/fr.json, de.json, es.json, ar.json → R2
// Already multilingual-safe: no idioms, no culture-specific refs unless brief asked
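The fan-out shape can be sketched as below. `translateOne` is a stub seam for the real Claude call; the point is Promise.all over languages rather than an awaited loop, so four languages cost roughly one call's latency.

```typescript
export async function generateTranslations(
  keys: Record<string, string>,
  langs: string[],
  translateOne: (keys: Record<string, string>, lang: string) => Promise<Record<string, string>>,
): Promise<Record<string, Record<string, string>>> {
  // Promise.all, not for...of with await: all languages in flight at once.
  const perLang = await Promise.all(langs.map((lang) => translateOne(keys, lang)));
  return Object.fromEntries(langs.map((lang, i) => [lang, perLang[i]] as const));
}
```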
Deploy sequence, all parallel where possible
Parallel
📝
generateTranslations(flow, ["fr","de","es","ar"]) → Claude
flat key→value, parallel per lang
📦
Upload survey.js → R2
{slug}/survey.js, reads stored .program from DB, no re-forge
🌍
Upload translations/{lang}.json → R2
one file per language
🔨
Build WfP bundle from wfp-entry.ts via esbuild in-process
platform=neutral, @r2/ai-providers/worker not full index
☁️
Deploy UserWorker to WfP dispatch namespace as study-{slug}
every study its own Cloudflare Worker, per-study isolation
⚙️
PATCH compat settings separately
CF ignores compatibility_date in multipart PUT, known CF API quirk
🗃️
Write D1 row: INSERT INTO studies ON CONFLICT DO UPDATE
status → published
🏗️
Publish reads stored survey.js from the DB, no re-forge, no re-verify. The program was verified at generation time. If verify_ok is false, the gate blocks before any of this runs.
Bonus · Synthetic Respondents
🤖
Post-publish · pre-launch validation
Synthetic Respondents, 1,000 LLM Calls in Parallel
Each persona runs the full survey JSON in one shot, complete answer map per respondent
1,000× parallel Same artifacts
Input → output
Input per call
survey.js
flow_json
translations/en.json
persona: "35yr old urban coffee drinker, brand loyal, NPS detractor"
Output per call
{ scr_age: "25_34", q_awareness: ["brand_x"], q_nps: 3, q_attributes: { taste: 4, value: 2 } }
Not one question at a time. The LLM sees the full structure, understands the routing, knows which questions it would reach given its own prior answers, and produces a coherent complete response in one pass. The headless player replays the answer map through the actual renderer, proving the UI handles the data correctly.
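The batch itself is one more parallel fan-out. `answerSurvey` is a stub seam for the real one-shot model call described above; Promise.allSettled means a single refused or crashed persona is dropped from the batch rather than killing the other 999 calls.

```typescript
export async function runSynthetics(
  personas: string[],
  answerSurvey: (persona: string) => Promise<Record<string, unknown>>,
): Promise<{ persona: string; answers: Record<string, unknown> }[]> {
  const settled = await Promise.allSettled(personas.map((p) => answerSurvey(p)));
  // Failed personas are silently dropped; successes keep their persona label
  // so quota and distribution checks can slice by demographic.
  return settled.flatMap((res, i) =>
    res.status === "fulfilled" ? [{ persona: personas[i], answers: res.value }] : [],
  );
}
```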
What this unlocks
Pre-launch data validation
Does the data model hold up with real-shaped data before a single real respondent sees it?
Quota simulation
Run 1,000 synthetics, see if your quotas fill correctly across demographics
Edge case generation
Generate personas specifically designed to hit rare branches and edge paths
Analysis pipeline testing
Real-shaped data flowing through crosstabs, DataTalk, exports before launch
Regression testing
Re-run synthetics after any flow edit, compare answer distributions
All Layers · Complete Error Space
🛡️
Complete reference
14-Layer Defense Pipeline
Every layer, every error class, who catches it, the complete map
14 layers
# · Layer · What it catches · Who
1 · Progressive disclosure · Wrong card type for the job, Claude reasons only over relevant specs. Adding 50 new card types doesn't degrade generation quality. · Claude
2 · streamObject Zod · Structurally invalid JSON, missing required fields, wrong types, invalid discriminated union values. Corrected before forge ever sees it. · Zod + SDK
3 · Pass 1: coverage · Research objectives from the brief missing from the flow. "Brief says measure consideration, no consideration card found." · GPT-4o
4 · Pass 2: flow_logic · Routing that doesn't make logical sense. Screener on wrong value, NPS asked before awareness filter, dead branches. · GPT-4o
5 · Pass 3: ai_probes · Vague probe objectives that won't produce useful qual data. "objective: understand opinions" vs. a specific insight target. · GPT-4o
6 · forge() · Compiler assumption violations, empty branch cases, sequences referencing missing node IDs, fields the emitter expects but didn't find. · TypeScript
7 · VM Track A · Happy path runtime crashes, missing end_card. Typo in branch condition, undefined variable mid-execution. · Node VM
8 · VM Track B · Alternative happy path, routing divergence under different inputs. Dead-end path not exercised by Track A. · Node VM
9 · VM Track C · Screen-out path doesn't terminate at screen_out_card. Disqualified respondent routed to a question instead. · Node VM
10 · VM 5s timeout · Infinite loops, runaway dynamic_card iterations. Loop that never terminates → path times out → verify failure → fix prompt. · Node VM
11 · Image slot detection · image_grid cards with unresolved imageIntent, flagged at save time. imageSlots: N in done payload, image_slots_resolved = false in DB. · Code
12 · Edit path forge+verify · Human-introduced errors after generation. Broken branch target, invalid card config. Returned as structured errors, no LLM. · Code
13 · Publish gate · Attempting to deploy an unverified or image-incomplete survey. Hard 400, no bypass, no "publish with warnings" path. · Code
14 · Human review · Researcher judgment, wording, tone, brand-specific context the system can't know. Inline edits trigger forge+verify immediately. · Human
🎯
Each layer is doing what it's genuinely best at. Progressive disclosure = Claude judgment. Zod = structural enforcement. GPT-4o = reading comprehension and domain reasoning. VM = deterministic execution. Human = context the system can't know. No single layer can catch everything, together they leave no gap.
Survey is live, what was just proved
The brief is covered, GPT-4o checked it (coverage pass)
The logic is sound, GPT-4o checked it (flow_logic pass)
The structure is valid, Zod enforced it (streamObject)
The program compiles, forge proved it (deterministic compiler)
Every path terminates, the VM ran it (3 tracks, 5s timeout)
Every image is resolved, the gate checked it (image_slots_resolved)
Multilingual from day one, Claude translated all keys in parallel at publish
Per-study isolated, its own Cloudflare Worker, every session its own Durable Object
The deploy is blocked until all of the above, the publish gate enforces it
Each layer is a statement of proof, not a hope. A researcher with no programming knowledge submits a brief and gets a deployable, verified, production-ready survey, with every structural error caught automatically and every path to deployment blocked until the output is genuinely ready.