Nodes
A Node represents a single step in a Workflow. Each node contains a natural language instruction that an AI model executes, with access to tools from declared Skills.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | REQUIRED | — | Display name for this node. MUST be non-empty. |
| `instruction` | Source | REQUIRED | — | Natural language instruction for the AI model. MUST be non-empty. |
| `skills` | string[] | OPTIONAL | [] | Skill IDs this node has access to. |
| `output` | JSON Schema object | OPTIONAL | — | Structured output schema for this node’s result. |
| `max_turns` | integer ≥ 1 | OPTIONAL | implementation-defined | Maximum AI model turns for this node’s execution. |
| `model` | string | OPTIONAL | inherited / impl-defined | Execution model for this node’s AI invocation. Free-text passthrough. See Model Selection. |
| `disallowed_tools` | string[] | OPTIONAL | [] | Built-in agent tools removed from the model context at this node. See Disallowed Tools. |
| `rules` | NodeSources | OPTIONAL | inherited | Rules for this node. Additive by default; see cascade. |
| `context` | NodeSources | OPTIONAL | inherited | Context for this node. Additive by default; see cascade. |
| `eval` | Eval | OPTIONAL | — | Named evaluators (value, function, judge) run after the AI finishes the node. |
| `eval_policy` | string | OPTIONAL | all_pass | How to aggregate evaluator results. v1 supports all_pass. |
| `requires` | Requires | OPTIONAL | — | Machine-checked pre-conditions evaluated before the AI starts the node. |
| `retry` | Retry | OPTIONAL | — | Node-local retry on eval failure with optional autonomous reflection. |
Max Turns Semantics
The `max_turns` field limits how many AI model turns (request-response cycles) a node may consume. Each turn is one model invocation: a turn that produces tool calls and a turn that produces a final response both count.
A conforming executor:
- MUST enforce `max_turns` when present, stopping the model after the specified number of turns.
- When `max_turns` is absent, the executor applies its own default. The default is implementation-defined.
- When the limit is reached, the executor SHOULD capture whatever partial result is available and produce a `NodeResult` with `status: "failed"` and a descriptive error in `data`. The executor MUST NOT silently discard the node’s work.
Different nodes have vastly different compute needs. A context-gathering node that queries multiple APIs may need many turns, while a summarization node with no tools may need one. Per-node limits give workflow authors fine-grained control over the compute budget.
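The turn limit can be pictured as a bounded loop. The following is a minimal Python sketch, not the executor's implementation: `next_turn` is a hypothetical callback standing in for one model invocation, and the `"success"` status and the `partial` key are assumptions made for illustration.

```python
def run_with_turn_limit(next_turn, max_turns):
    """Invoke next_turn until it reports a final response or the limit is hit.

    next_turn(turn) is a hypothetical callback returning (is_final, output).
    Every invocation counts as one turn, whether or not it is final.
    """
    partial = None
    for turn in range(1, max_turns + 1):
        is_final, partial = next_turn(turn)
        if is_final:
            return {"status": "success", "data": partial}
    # Limit reached: keep the partial work instead of discarding it.
    return {
        "status": "failed",
        "data": {"error": f"max_turns ({max_turns}) reached", "partial": partial},
    }
```

A node that needs three turns succeeds with `max_turns: 5` and fails, with its last partial output preserved, with `max_turns: 2`.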
Model Selection
The optional `model` field selects which AI model executes a node. It is the execution-time counterpart to the judge `model` / `judge_model` fields, which select the model for evaluation.
Resolution is a cascade, narrowest wins:
- Node-level `model`.
- Workflow-level `model`.
- The executor’s implementation-defined default model.
A conforming executor:
- MUST forward the resolved model to the AI invocation (step 6 of the Node Execution Sequence) when any layer specifies one.
- MUST fall back to its implementation-defined default model when no layer specifies one. The executor MUST NOT invent a hardcoded model name in the spec.
- MUST NOT validate `model` against a registry or allowlist. The value is free-text passed through verbatim, consistent with `judge_model`. Model availability and naming are the responsibility of the backend (or gateway) the executor targets.
model is independent of judge_model: one selects the execution model, the other the evaluation model. A node MAY set both. The common use is cost tiering: a cheap model on mechanical grunt nodes (lint, format, summarize) and a stronger model on the reasoning nodes.
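The resolution cascade can be sketched in a few lines of Python. This is an illustration only: the dict-shaped `node` and `workflow` arguments and the decision to treat an empty string as "not specified" are assumptions, not part of the spec.

```python
def resolve_model(node: dict, workflow: dict, executor_default: str) -> str:
    """Return the narrowest model that is set: node, then workflow, then default."""
    for layer in (node, workflow):
        model = layer.get("model")
        if model:  # assumption: a missing or empty value means "not specified"
            return model  # passed through verbatim, no registry lookup
    return executor_default
```

Note that the function never inspects the value itself, which mirrors the free-text passthrough rule.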
Disallowed Tools
The disallowed_tools field names built-in agent tools the AI model MUST NOT have access to at this node. Names refer to the agent runtime’s built-in tools (e.g. Bash, Read, Edit, Write, WebFetch, WebSearch), not to skill-provided tool names declared in Skills.
Typical use: keep an implement node focused on the repo by removing WebFetch and WebSearch, or harden a notification node by removing Bash so the agent cannot shell out.
A conforming executor:
- MUST prevent the model from invoking any tool listed in `disallowed_tools` for the duration of this node’s run.
- SHOULD remove the named tools from the model’s context entirely, rather than rejecting calls after the fact, so the model does not waste turns attempting blocked tools.
- MUST NOT apply `disallowed_tools` to skill-provided tools registered via `skills`. Skills are gated by their own declaration (only listed `skills` are available); `disallowed_tools` is exclusively for built-in agent tools.
- When the field is absent or empty, the executor applies its default tool set with no removals.

```yaml
implement:
  name: Implement Fix
  instruction: Read the issue, write the fix, run tests, commit.
  skills:
    - github
  disallowed_tools:
    - WebFetch
    - WebSearch
```

The exact set of names the runtime recognizes is implementation-defined and tracks the agent SDK in use. Workflow authors SHOULD verify a name is honored before relying on it; a typo silently no-ops because there is no built-in tool of that name to remove.
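A minimal Python sketch of the removal step, assuming the runtime exposes its built-in tools as a list of names. The function name and the returned `unknown` list are hypothetical; the spec only requires that unrecognized names have no effect, and reporting them is one way an author could catch a typo.

```python
def effective_builtin_tools(builtin_tools, disallowed):
    """Remove disallowed names from the built-in tool set.

    Returns (kept, unknown): the tools left in the model context, and any
    disallowed names that matched no built-in tool (typos remove nothing).
    Skill-provided tools are not passed in, so they are never affected.
    """
    blocked = set(disallowed or [])
    kept = [tool for tool in builtin_tools if tool not in blocked]
    unknown = sorted(blocked - set(builtin_tools))
    return kept, unknown
```

With a misspelled `WebSerch`, the real `WebSearch` tool stays available and the typo shows up in `unknown`.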
Rules & Context
Nodes can declare their own rules and context that interact with the workflow-level declarations via cascade.
NodeSources Type
Per-node rules and context accept two forms:
Array form (additive, the default):

```yaml
investigate:
  name: Root Cause Analysis
  instruction: Investigate the alert.
  context:
    - ./security-playbook.md
```

The node inherits all workflow-level sources AND adds its own.
Object form (with `only` flag):

```yaml
vendor-review:
  name: Vendor License Review
  instruction: Check dependency licenses.
  rules:
    only: true
    sources:
      - ./license-policy.md
```

When `only: true`, the node inherits neither runtime input sources nor workflow-level sources for that field. Only the node’s own sources are used. This blocks cascade for that field only: `context` still inherits normally unless it also sets `only: true`.
Cascade Semantics
For each of `rules` and `context`, a conforming executor MUST resolve the effective sources for a node using this algorithm:
1. Start with runtime input sources (if any).
2. Append workflow-level sources (if any).
3. If the node declares the field:
   - `only: true`: discard steps 1–2, use only the node’s `sources`.
   - Otherwise (array form, or `only` absent/false): append the node’s sources.
The resolved sources are concatenated in this order and prepended to the node’s instruction. See Input Augmentation for the full assembly.
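The algorithm above can be sketched in Python as follows. The function name and the plain list/dict representation of a NodeSources value are assumptions for illustration.

```python
def resolve_sources(runtime, workflow, node_field):
    """Resolve effective sources for one field (rules or context).

    node_field is None (field not declared), a list (array form), or a
    dict with optional "only" and "sources" keys (object form).
    """
    if isinstance(node_field, dict):
        node_sources = list(node_field.get("sources", []))
        if node_field.get("only") is True:
            return node_sources  # discard runtime and workflow layers
    else:
        node_sources = list(node_field or [])
    # Additive: runtime first, then workflow, then the node's own sources.
    return list(runtime or []) + list(workflow or []) + node_sources
```

The function is called once per field, so `only: true` on `rules` has no effect on how `context` resolves.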
```
Effective rules   = (only ? node-only : runtime + workflow + node)
Effective context = (only ? node-only : runtime + workflow + node)
```

Example: Mixed Cascade

```yaml
# Workflow level
rules:
  - ./coding-standards.md
context:
  - ./ARCHITECTURE.md

nodes:
  gather:
    name: Gather Context
    instruction: Investigate the alert.
    # No per-node rules/context: inherits everything from workflow

  security-audit:
    name: Security Audit
    instruction: Audit for OWASP top 10.
    # Additive: gets workflow context + this
    context:
      - ./security-playbook.md

  license-check:
    name: License Review
    instruction: Check dependency licenses.
    # Override: only these rules, no inheritance
    rules:
      only: true
      sources:
        - ./license-policy.md
    # Context still inherits from workflow (ARCHITECTURE.md)
```

Instruction Semantics
The `instruction` field is the primary directive for the AI model at this step.
A conforming executor:
- MUST pass the instruction to the AI model as the primary directive.
- MUST NOT alter, summarize, or truncate the instruction.
- MAY augment the instruction with context from prior nodes (see Execution Model).
Input Augmentation
A conforming executor MUST resolve rules and context from all three layers (runtime input, workflow-level, node-level) per the cascade semantics above. The effective rules are prepended with the heading:

```markdown
## Rules — You MUST Follow These

{effective rules, concatenated}
```

The effective context is prepended with the heading:

```markdown
## Background Context

{effective context, concatenated}
```

Skill Instruction Injection
If any skill referenced by the node has an `instruction` field, a conforming executor MUST inject each skill’s instruction into the prompt, in the order the skills appear in the node’s `skills` array:

```markdown
## Skill: {skill.name}

{skill.instruction}
```

When rules, context, and skill instructions are all present, the assembly order is: rules first, then context, then skill instructions, then the node’s base instruction, separated by `---`.
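The assembly order can be sketched in Python. This is an illustration under stated assumptions: rules and context arrive as already-resolved text, sources within a section are joined by a blank line, all skill blocks share one section, and empty sections are omitted. None of those details are fixed by the text above; only the headings and the ordering are.

```python
def assemble_prompt(rules, context, skills, instruction):
    """Build the node prompt: rules, context, skill instructions, instruction."""
    sections = []
    if rules:
        sections.append("## Rules — You MUST Follow These\n\n" + "\n\n".join(rules))
    if context:
        sections.append("## Background Context\n\n" + "\n\n".join(context))
    # One block per skill that has an instruction, in the node's skills order.
    skill_blocks = [
        f"## Skill: {skill['name']}\n\n{skill['instruction']}"
        for skill in skills
        if skill.get("instruction")
    ]
    if skill_blocks:
        sections.append("\n\n".join(skill_blocks))
    sections.append(instruction)  # the node's own instruction, unaltered
    return "\n\n---\n\n".join(sections)
```

The node's instruction is always the last section and is passed through unchanged, consistent with the rule that executors must not alter, summarize, or truncate it.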
```
┌─────────────────────────────────┐
│ ## Rules — You MUST Follow These│  ← effective rules (cascaded)
│ {rules content}                 │
├─────────────────────────────────┤
│ ---                             │
├─────────────────────────────────┤
│ ## Background Context           │  ← effective context (cascaded)
│ {context content}               │
├─────────────────────────────────┤
│ ---                             │
├─────────────────────────────────┤
│ ## Skill: {skill.name}          │  ← one per skill with instruction
│ {skill.instruction}             │
├─────────────────────────────────┤
│ ---                             │
├─────────────────────────────────┤
│ {node.instruction}              │  ← the node's own instruction
└─────────────────────────────────┘
```

Externalizing Instructions
Because `instruction` is a Source, you can keep long or shared prompts in a file or URL and reference them by path:

```yaml
investigate:
  name: Root Cause Analysis
  instruction: ./prompts/investigate.md
  skills:
    - github
    - linear
```

Or pin to a shared playbook repo:

```yaml
gather:
  name: Gather Context
  instruction: https://raw.githubusercontent.com/acme/playbook/main/gather.md
```

Files and URLs resolve once, eagerly, before any node runs. Resolved content is recorded in `trace.sources` with a content hash for audit.
Skills Semantics
The `skills` array declares which Skills the node can access during execution.
A conforming executor:
- MUST resolve tools from the listed skill IDs and make them available to the AI model during node execution.
- MUST NOT make tools available from skills not listed in the node’s `skills` array.
- SHOULD silently skip skills that are not configured (missing required config values) rather than failing the workflow. The node executes with whatever tools are available from the remaining skills.
Output Schema Semantics
The `output` field declares a JSON Schema that the node’s result data MUST conform to.
A conforming executor:
- MUST request structured output from the AI model conforming to this schema when `output` is present.
- The structured output becomes the node’s result data, available to downstream nodes via context accumulation.
- When `output` is absent, the node’s result data is implementation-defined.
Requires
Machine-checked pre-conditions. Evaluated before the LLM runs. If any declared check fails, the node is marked failed (or skipped) and the LLM is never invoked.
Catch missing upstream context — bad runtime input, an upstream node that returned without producing a required field — before burning tokens. The checks are deterministic and run synchronously.
Schema
```yaml
requires:
  output_required: [string]      # paths must resolve, non-null/undefined
  output_matches: [OutputMatch]  # equals / in / matches
  on_fail: fail | skip           # default: fail
```

Path roots
Paths resolve against the cross-node context map:

```
{ input: <runtime input>, [priorNodeId]: <data of prior node>, ... }
```

The grammar is identical to eval paths: dotted segments, `[*]` wildcards, optional `all:` / `any:` prefix.

| Path | Resolves to |
|---|---|
| `input.repoUrl` | The repoUrl field on the runtime input. |
| `triage.recommendation` | data.recommendation of the prior triage node. |
| `any:scan.findings[*].severity` | At least one finding has a non-null severity. |
Failure modes
| `on_fail` | Result status | Result data |
|---|---|---|
| `fail` (default) | failed | `{ error: "requires failed: ..." }` |
| `skip` | skipped | `{ skipped_reason: "requires not met: ..." }` |
In both cases the LLM is not invoked. Routing continues normally — edges
with when conditions can read the failure status and route around it.
Example
```yaml
nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix
    skills: [github]
    requires:
      output_required:
        - input.repoUrl
        - implement_fix.branch
      output_matches:
        - { path: implement_fix.filesChanged, matches: "^[1-9]" }
      on_fail: fail
```

Eval
The `eval` field declares a list of named evaluators the executor runs after the AI model finishes a node. Each evaluator produces an EvalResult with a pass verdict and optional reasoning. Under the default `eval_policy: all_pass`, every evaluator must pass for the node to pass; any failure marks the node failed and the workflow halts (or routes to the next failure edge).
Eval catches the common “the model claims success without doing the work” failure mode: a tool was supposed to be called and wasn’t, a required field is missing from the structured output, a value is outside the allowed set, or the result claims a contract it doesn’t actually meet. It complements, and does not replace, conditional edges, which decide where to go next based on the result.
The shape mirrors how other agent eval frameworks (LangSmith, Promptfoo, OpenAI Evals, DeepEval, Ragas) name this primitive. SWEny’s three evaluator kinds map to the same three categories the field has converged on.
Schema
```yaml
nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix.
    skills: [github]
    eval:
      - name: pr_was_created
        kind: function
        rule:
          all_tools_called: [github_create_pr]
      - name: pr_url_present
        kind: value
        rule:
          output_required: [prUrl]
    eval_policy: all_pass # default; can be omitted
```

Each entry in `eval` is an Evaluator with these fields:

| Field | Required | Applies to | Description |
|---|---|---|---|
| `name` | REQUIRED | all kinds | Stable identifier for the evaluator. Used in result objects and retry preambles. |
| `kind` | REQUIRED | all kinds | One of value, function, judge. |
| `rule` | REQUIRED for value / function | value, function | The deterministic rule. See per-kind shapes below. |
| `rubric` | REQUIRED for judge | judge | Natural-language rubric the judge model evaluates against the node’s data and trace. |
| `pass_when` | OPTIONAL for judge | judge | Expected verdict word. Default yes. MUST be a single whitespace-free token; the judge response is parsed for it. |
| `model` | OPTIONAL for judge | judge | Override the judge model for this evaluator. See Judge mechanics. |
A node with no eval field has no post-conditions; the node passes when the AI finishes without error.
The three kinds
value: data-shape match
Pure, deterministic, fast. Operates on the node’s structured output (`result.data`). Use when you can express the contract as a path-and-operator check.
The rule object accepts output_required and output_matches:
```yaml
eval:
  - name: pr_url_well_formed
    kind: value
    rule:
      output_required: [prUrl, branchName]
      output_matches:
        - { path: prUrl, matches: "^https://github.com/" }
        - { path: branchName, matches: "^sweny/" }
```

`output_required` is a list of paths into `result.data` that must each resolve to a present, non-null value. `output_matches` is a list of OutputMatch entries, each asserting `equals`, `in`, or `matches` against a path.
A single value evaluator MAY combine both fields. They are AND-ed within the rule.
function: trace-shape match
Pure, deterministic, fast. Operates on the node’s tool-call trace, not its data. Use when the contract is “the model did (or did not) call this tool.”
The rule object accepts any_tool_called, all_tools_called, and no_tool_called:
```yaml
eval:
  - name: pr_was_created
    kind: function
    rule:
      all_tools_called: [github_create_pr]
      no_tool_called: [github_force_push]
```

A tool “was called and succeeded” when it appears in the node’s tool-call trace with no error. For `no_tool_called`, any appearance, successful or not, counts as a violation.
The function kind is also the natural home for any future code-based check (for example, “the diff touched fewer than 10 files”). v1 covers tool-call assertions; the kind is intentionally named function rather than tool_call to leave that door open.
judge: LLM-as-judge
Calls a small Claude model with the node’s data, the node’s tool-call trace, and the author’s rubric. The judge returns a single verdict word (default yes / no) plus a short reasoning string. Use when the contract is conditional, semantic, or otherwise outside the reach of a deterministic rule.

```yaml
eval:
  - name: tests_present_when_pass_claimed
    kind: judge
    rubric: |
      If result.data.test_status is "pass", does
      result.data.test_files_changed contain at least one real test
      file path? An empty array with status "pass" is a contract
      violation. If status is anything else, this rule passes
      vacuously.
    pass_when: yes
```

The judge sees the node’s `result.data`, the tool-call trace, and the rubric. It does not see runtime input or upstream node results unless you put them in the rubric explicitly.
See Judge mechanics for model selection, cost gating, and parse failures.
eval_policy
How the executor aggregates evaluator results into a single node verdict.

| Policy | v1 status | Behavior |
|---|---|---|
| `all_pass` | shipped | Every evaluator must pass. Default. |
| `any_pass` | reserved | At least one evaluator must pass. Not implemented in v1. |
| `weighted` | reserved | Sum of scores above a threshold. Not implemented in v1. |
A conforming executor MUST accept eval_policy: all_pass and MAY reject other values until the corresponding semantics ship.
EvalResult type
Each evaluator produces a structured result. The full list lands on `NodeResult.evals`:

| Field | Type | Description |
|---|---|---|
| `name` | string | The evaluator’s name. |
| `kind` | `value` \| `function` \| `judge` | The evaluator’s kind. |
| `pass` | boolean | Whether this evaluator passed. |
| `reasoning` | string (optional) | Failure detail. Populated by the judge model on judge evaluators, by the executor’s failure formatter on value/function. Capped at ~500 characters. |
| `score` | number (optional) | Reserved for weighted aggregation. Not populated in v1. |
Downstream nodes can read individual evaluator outcomes via context paths, e.g. priorNode.evals.pr_was_created.pass.
OutputMatch type
| Field | Type | Required | Description |
|---|---|---|---|
| `path` | string | REQUIRED | A path into result.data (see Path grammar). |
| `equals` | any | one-of | Strict deep equality against the resolved value. |
| `in` | any[] | one-of | The resolved value is in the array (deep equality per element). |
| `matches` | string | one-of | A JavaScript regex source (no flags); the resolved value is coerced to a string and tested against it. |
Exactly one of equals, in, or matches MUST be set per entry.
Path grammar
A path is a `.`-separated sequence of segments. A segment is either:
- An identifier matching `[a-zA-Z_][a-zA-Z0-9_]*` (object property access), OR
- An identifier followed by `[*]` (wildcard expansion over an array).
The path MAY be prefixed with all: or any: (see Wildcard semantics). When no prefix is present, all: is implied.
Examples:
| Path | Resolves to |
|---|---|
| `prUrl` | The prUrl property of result.data. |
| `findings[*].severity` | The severity property of every element of the findings array. |
| `any:checks[*].conclusion` | The conclusion property of any element of the checks array. |
| `issue.metadata.url` | A nested object property. |
The grammar is intentionally minimal so workflow authors can read it in a sentence and tooling (Studio, linters) can parse it in a few lines. Richer expressions (filter predicates, JSONPath, CEL) are out of scope.
Path resolution
A path is resolved by walking segments left-to-right against `result.data`:
- A non-wildcard segment that doesn’t exist on its parent object → resolution fails.
- A `[*]` segment requires its parent to be an array. If the parent is not an array, resolution fails. If the parent is an empty array, expansion succeeds and the wildcard rule below applies.
- Encountering `null` mid-path → resolution fails.
A failed resolution is treated as a failed check inside value evaluators. The reasoning string names the missing segment.
Wildcard semantics
When a path contains `[*]` and resolves successfully:
- `all:` (default). Every resolved value MUST satisfy the operator. An empty array is vacuously true.
- `any:`. At least one resolved value MUST satisfy the operator. An empty array is false.
`output_required` follows the same rule. `output_required: ["findings[*].severity"]` means the findings array is present and every finding has a non-null severity. `output_required: ["any:findings[*].severity"]` means at least one finding does.
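The path grammar, resolution rules, and wildcard semantics fit together as in the Python sketch below. It is an illustration, not the executor's implementation: `PathError`, `parse_path`, `resolve`, and `check` are hypothetical names, and the predicate stands in for whichever operator (`equals`, `in`, `matches`, or the non-null test of `output_required`) is being applied.

```python
import re

SEGMENT = re.compile(r"([a-zA-Z_][a-zA-Z0-9_]*)(\[\*\])?")


class PathError(Exception):
    """Raised when a path is malformed or fails to resolve."""


def parse_path(path):
    """Split a path into (mode, [(name, is_wildcard), ...])."""
    mode = "all"  # implied when no prefix is present
    for prefix in ("all:", "any:"):
        if path.startswith(prefix):
            mode, path = prefix[:-1], path[len(prefix):]
            break
    segments = []
    for raw in path.split("."):
        match = SEGMENT.fullmatch(raw)
        if match is None:
            raise PathError(f"invalid segment {raw!r}")
        segments.append((match.group(1), match.group(2) is not None))
    return mode, segments


def resolve(data, path):
    """Walk segments left to right; return (mode, resolved values)."""
    mode, segments = parse_path(path)
    current = [data]
    for name, wildcard in segments:
        expanded = []
        for parent in current:
            # A null or non-object parent, or a missing key, fails resolution.
            if not isinstance(parent, dict) or name not in parent:
                raise PathError(f"missing segment {name!r}")
            value = parent[name]
            if wildcard:
                if not isinstance(value, list):
                    raise PathError(f"segment {name!r} is not an array")
                expanded.extend(value)
            else:
                expanded.append(value)
        current = expanded
    return mode, current


def check(data, path, predicate):
    """Apply predicate under all:/any: semantics; failed resolution fails."""
    try:
        mode, values = resolve(data, path)
    except PathError:
        return False
    results = [predicate(value) for value in values]
    # all([]) is True (vacuous) and any([]) is False, matching the spec.
    return any(results) if mode == "any" else all(results)
```

For `output_required`, the predicate is simply `lambda v: v is not None`. Python's built-in `all` and `any` happen to give the empty-array behavior described above without special-casing.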
Judge mechanics
Model selection
Three layers of override, narrowest wins:
- Evaluator-level `model` field on a `judge` evaluator.
- Node-level `judge_model` field.
- Workflow-level `judge_model` field. Default `claude-haiku-4-5`.
Judges return a single token verdict, so a small fast model is the right default.
Cost gating
Workflow-level judge_budget (integer, default 50) caps the expected number of judge calls per workflow run. The executor SHOULD warn at load time when count(judges) * estimated_runs exceeds the budget. The budget is a soft signal in v1, not a hard runtime cap.
Parse failures
When the judge response cannot be parsed for the `pass_when` token (model returned garbage, timed out, errored), the executor:
- Retries the judge call once.
- If the second call also fails, the evaluator is recorded as `pass: false` with `reasoning: "judge parse failure"`.

A workflow author who wants a parse-failure to be a halt can wrap the judge in a stricter retry policy at the node level.
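The retry-once behavior can be sketched in Python as below. Several details are assumptions made only to keep the example runnable: `call_judge` is a hypothetical zero-argument callable returning the raw judge text, the verdict is taken to be the first token of the response, and `no` is taken to be the negative verdict word. The spec itself only says the response is parsed for the `pass_when` token.

```python
def parse_verdict(response, pass_when="yes", fail_word="no"):
    """Return (passed, reasoning), or None if no verdict word is recognized."""
    tokens = response.strip().split(maxsplit=1)
    if not tokens:
        return None
    verdict = tokens[0].strip(".,:;").lower()
    if verdict not in (pass_when.lower(), fail_word.lower()):
        return None
    reasoning = tokens[1] if len(tokens) > 1 else ""
    return verdict == pass_when.lower(), reasoning


def run_judge(call_judge, pass_when="yes"):
    """Call the judge, retrying once when the response cannot be parsed."""
    for _attempt in range(2):  # the initial call plus one retry
        try:
            parsed = parse_verdict(call_judge(), pass_when)
        except Exception:
            parsed = None  # timeouts and errors count as parse failures
        if parsed is not None:
            passed, reasoning = parsed
            return {"pass": passed, "reasoning": reasoning}
    return {"pass": False, "reasoning": "judge parse failure"}
```

A clean `no` verdict is a normal evaluator failure with the judge's reasoning attached; only two unparseable responses in a row produce the `judge parse failure` reasoning.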
Aggregation and failure reporting
A conforming executor:
- MUST evaluate every declared evaluator (no fast-fail), so the workflow author sees every problem in one pass.
- MUST populate `NodeResult.evals` with one EvalResult per evaluator, in the order they were declared.
- MUST mark the node failed under `eval_policy: all_pass` if any evaluator fails. The node’s `error` message is a structured list, one line per failing evaluator: `name (kind): reasoning`.
- MUST NOT run `eval` when the AI model already failed the node. Eval only runs against successful node executions.
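The aggregation rules above can be sketched in Python. The evaluator representation (a dict with a hypothetical `check` callable returning `(passed, reasoning)`), the `"success"` status, and the exact indentation of the error lines are assumptions for illustration.

```python
def run_evals(evaluators, node_result):
    """Run every evaluator in declared order; never stop at the first failure."""
    results = []
    for evaluator in evaluators:
        passed, reasoning = evaluator["check"](node_result)
        entry = {"name": evaluator["name"], "kind": evaluator["kind"], "pass": passed}
        if reasoning:
            entry["reasoning"] = reasoning[:500]  # cap at roughly 500 characters
        results.append(entry)
    return results


def aggregate_all_pass(results):
    """Fail the node if any evaluator failed, listing one line per failure."""
    failing = [r for r in results if not r["pass"]]
    if not failing:
        return {"status": "success", "evals": results}
    lines = [f"  - {r['name']} ({r['kind']}): {r.get('reasoning', '')}" for r in failing]
    error = "eval failed (policy: all_pass):\n" + "\n".join(lines)
    return {"status": "failed", "error": error, "evals": results}
```

Because `run_evals` always walks the whole list, the resulting error names every failing evaluator in a single pass.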
A representative failure message:
```
eval failed (policy: all_pass):
  - pr_was_created (function): required all of [github_create_pr] to succeed, called: [github_search_issues]
  - pr_url_well_formed (value): output_required 'prUrl' missing segment 'prUrl'
  - tests_present_when_pass_claimed (judge): test_status was 'pass' but test_files_changed was empty
```

Worked example

```yaml
implement-fix:
  name: Implement Fix
  instruction: Open a PR that fixes the issue and verify CI is green.
  skills:
    - github
  output:
    type: object
    properties:
      prUrl: { type: string }
      branchName: { type: string }
      test_status: { type: string, enum: [pass, fail, no-framework] }
      test_files_changed: { type: array, items: { type: string } }
      checks:
        type: array
        items:
          type: object
          properties:
            name: { type: string }
            conclusion: { type: string }
    required: [prUrl, branchName, test_status]
  eval:
    - name: pr_was_created
      kind: function
      rule:
        all_tools_called: [github_create_pr]
        no_tool_called: [github_force_push]
    - name: pr_url_well_formed
      kind: value
      rule:
        output_required: [prUrl, branchName]
        output_matches:
          - { path: prUrl, matches: "^https://github.com/" }
          - { path: branchName, matches: "^sweny/" }
          - { path: "any:checks[*].conclusion", equals: "success" }
    - name: status_is_recognized
      kind: value
      rule:
        output_matches:
          - { path: test_status, in: [pass, fail, no-framework] }
    - name: tests_present_when_pass_claimed
      kind: judge
      rubric: |
        If result.data.test_status is "pass", does
        result.data.test_files_changed contain at least one real test
        file path? An empty array with status "pass" is a contract
        violation. If status is anything else, this rule passes
        vacuously.
      pass_when: yes
```

When to use eval
- Use a `value` evaluator for data-shape post-conditions: facts about the structured output that you can express as a path and an operator.
- Use a `function` evaluator for trace-shape post-conditions: a specific tool was, or was not, called.
- Use a `judge` evaluator for semantic or conditional post-conditions: claims that depend on context, comparisons across fields, or anything outside the reach of a deterministic rule.
- Use conditional edges for routing: which node runs next based on the result.
- Use the node’s `output` JSON Schema for shape: the structural contract on the data. Eval is for the looser “did the model actually do the right thing” checks that JSON Schema can’t express.
Examples
Minimal Node

```yaml
gather:
  name: Gather Context
  instruction: Pull error details, logs, and recent commits related to the alert.
  skills:
    - github
    - sentry
  max_turns: 80
```

Node with Structured Output

```yaml
investigate:
  name: Root Cause Analysis
  instruction: >-
    Classify each issue as novel or duplicate.
    Assess severity and fix complexity for each finding.
  max_turns: 50
  skills:
    - github
    - linear
  output:
    type: object
    properties:
      findings:
        type: array
        items:
          type: object
          properties:
            title: { type: string }
            severity: { type: string, enum: [critical, high, medium, low] }
            is_duplicate: { type: boolean }
            fix_complexity: { type: string, enum: [simple, moderate, complex] }
          required: [title, severity, is_duplicate]
      novel_count: { type: number }
      highest_severity: { type: string }
    required: [findings, novel_count, highest_severity]
```

Node with No Skills
A node with no skills has no tools. The AI model executes the instruction using only its training and the accumulated context.
```yaml
summarize:
  name: Summarize Findings
  instruction: >-
    Produce a concise summary of all findings for the notification.
    Include severity, root cause, and links to created issues.
```

Retry
Node-local self-healing on eval failure. When eval fails, the executor re-invokes the LLM up to `max` additional times, prepending feedback derived from the failing evaluators.
Triggered ONLY by eval failure, not by tool/API errors and not by `requires` failure. Re-running cannot fix upstream data problems.
Schema
```yaml
retry:
  max: integer             # ≥ 1
  instruction:             # optional
    | string               # static preamble
    | { auto: true }       # LLM-generated diagnosis (default prompt)
    | { reflect: string }  # LLM-generated diagnosis (author prompt)
```

| `instruction` value | Behavior |
|---|---|
| (omitted) | Default preamble: a structured list of failing evaluators (`name (kind): reasoning`, one per line) followed by “Fix and try again.” |
| `"static text"` | Author’s text + the structured failing-evaluator list appended. |
| `{ auto: true }` | Executor calls `claude.ask` with a default reflection prompt; the response becomes preamble. |
| `{ reflect: "<prompt>" }` | Same as auto, but the author’s reflect prompt is used as the diagnosis question. |
The preamble is prepended to the node’s normal instruction so the LLM sees it before the original task. Each retry uses only the latest eval failure as feedback. Older errors are noise.
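The two non-reflective preamble forms can be sketched in Python. The function name and the newline separators are assumptions; the table above fixes only the content and its order.

```python
def build_preamble(failing, static_text=None):
    """Build the retry preamble from the latest failing evaluators only.

    failing is a list of dicts with name, kind, and reasoning keys.
    static_text is the author's string form of retry.instruction, if any.
    """
    listing = "\n".join(
        f"{f['name']} ({f['kind']}): {f['reasoning']}" for f in failing
    )
    if static_text is None:
        return f"{listing}\nFix and try again."  # default preamble
    return f"{static_text}\n{listing}"  # author's text, then the failing list
```

The caller passes only the failures from the most recent attempt, so feedback from earlier attempts never accumulates in the prompt.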
Reflection failure
If claude.ask throws or returns empty during autonomous mode, the
executor falls back to the default static preamble for that attempt and
logs a warning. Reflection failure never escalates to a workflow failure.
With autonomous reflection, a node can cost up to 2 × max + 1 LLM calls: the initial attempt, plus max retries at 2 calls each (one reflection call and one re-run). Workflow authors set the ceiling via `max`.
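As a quick arithmetic check, the ceiling can be written as a small Python function. The non-reflective case (one call per retry) is inferred from the description rather than stated, and judge evaluator calls are not counted here.

```python
def max_llm_calls(max_retries: int, reflective: bool) -> int:
    """Upper bound on node LLM calls: the initial attempt plus all retries."""
    calls_per_retry = 2 if reflective else 1  # reflection call + re-run
    return 1 + max_retries * calls_per_retry
```

With `max: 2` and `instruction: { auto: true }`, the bound is 1 + 2 × 2 = 5 calls.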
Trace and observer events
Each attempt is recorded as its own TraceStep with a retryAttempt
field (0-indexed). The executor emits a node:retry observer event
before each retry attempt with { node, attempt, reason, preamble }.
Example
```yaml
nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix
    skills: [github]
    eval:
      - name: pr_was_created
        kind: function
        rule:
          any_tool_called: [github_create_pr]
      - name: pr_url_present
        kind: value
        rule:
          output_required: [prUrl]
    retry:
      max: 2
      instruction: { auto: true }
```