Nodes

A Node represents a single step in a Workflow. Each node contains a natural language instruction that an AI model executes, with access to tools from declared Skills.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | REQUIRED | | Display name for this node. MUST be non-empty. |
| instruction | Source | REQUIRED | | Natural language instruction for the AI model. MUST be non-empty. |
| skills | string[] | OPTIONAL | [] | Skill IDs this node has access to. |
| output | JSON Schema object | OPTIONAL | | Structured output schema for this node’s result. |
| max_turns | integer ≥ 1 | OPTIONAL | implementation-defined | Maximum AI model turns for this node’s execution. |
| model | string | OPTIONAL | inherited / impl-defined | Execution model for this node’s AI invocation. Free-text passthrough. See Model Selection. |
| disallowed_tools | string[] | OPTIONAL | [] | Built-in agent tools removed from the model context at this node. See Disallowed Tools. |
| rules | NodeSources | OPTIONAL | inherited | Rules for this node. Additive by default; see cascade. |
| context | NodeSources | OPTIONAL | inherited | Context for this node. Additive by default; see cascade. |
| eval | Eval | OPTIONAL | | Named evaluators (value, function, judge) run after the AI finishes the node. |
| eval_policy | string | OPTIONAL | all_pass | How to aggregate evaluator results. v1 supports all_pass. |
| requires | Requires | OPTIONAL | | Machine-checked pre-conditions evaluated before the AI starts the node. |
| retry | Retry | OPTIONAL | | Node-local retry on eval failure with optional autonomous reflection. |

The max_turns field limits how many AI model turns (request-response cycles) a node may consume. Each turn is one model invocation — a turn that produces tool calls and a turn that produces a final response both count.

A conforming executor:

  • MUST enforce max_turns when present, stopping the model after the specified number of turns.
  • When max_turns is absent, the executor applies its own default. The default is implementation-defined.
  • When the limit is reached, the executor SHOULD capture whatever partial result is available and produce a NodeResult with status: "failed" and a descriptive error in data. The executor MUST NOT silently discard the node’s work.
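The turn-limit rules above can be sketched as a small loop. This is a minimal illustration, not the reference executor: `run_node`, the `call_model` callable, the reply shape (`final` / `tool_calls`), the `"success"` status, and the `partial` key are all assumptions made for the example.

```python
from typing import Callable


def run_node(call_model: Callable[[list], dict], max_turns: int) -> dict:
    """Drive a hypothetical model loop, counting every invocation as one turn."""
    if max_turns < 1:
        raise ValueError("max_turns must be >= 1")
    history: list = []
    for _ in range(max_turns):
        reply = call_model(history)  # one model invocation == one turn
        history.append(reply)
        if reply.get("final") is not None:
            # "success" is an assumed status name for this sketch.
            return {"status": "success", "data": reply["final"]}
        # Otherwise the reply carried tool calls; loop for another turn.
    # Limit reached: keep the partial work instead of silently discarding it.
    return {
        "status": "failed",
        "data": {
            "error": f"max_turns ({max_turns}) reached before a final response",
            "partial": history,
        },
    }
```

A turn that only produces tool calls still consumes budget, which is why the counter advances on every call rather than only on final responses.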

Different nodes have vastly different compute needs. A context-gathering node that queries multiple APIs may need many turns, while a summarization node with no tools may need one. Per-node limits give workflow authors fine-grained control over the compute budget.

The optional model field selects which AI model executes a node. It is the execution-time counterpart to the judge evaluator’s model field and the judge_model fields, which select the model used for evaluation.

Resolution is a cascade, narrowest wins:

  1. Node-level model.
  2. Workflow-level model.
  3. The executor’s implementation-defined default model.

A conforming executor:

  • MUST forward the resolved model to the AI invocation (step 6 of the Node Execution Sequence) when any layer specifies one.
  • MUST fall back to its implementation-defined default model when no layer specifies one. This specification deliberately does not hardcode a model name.
  • MUST NOT validate model against a registry or allowlist. The value is free-text passed through verbatim, consistent with judge_model. Model availability and naming are the responsibility of the backend (or gateway) the executor targets.
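The cascade above can be sketched in a few lines. The function name `resolve_model` and the choice to treat an empty string like an absent value are assumptions for this example; the spec only fixes the order of the three layers.

```python
from typing import Optional


def resolve_model(
    node_model: Optional[str],
    workflow_model: Optional[str],
    executor_default: str,
) -> str:
    """Return the narrowest model that is set: node, then workflow, then default."""
    for candidate in (node_model, workflow_model):
        if candidate:  # assumption: None and "" both mean "not specified"
            return candidate  # passed through verbatim, no registry check
    return executor_default


# Cost tiering: the node overrides the workflow-level choice.
print(resolve_model("small-model", "large-model", "executor-default"))
```

The model names here are placeholders, not real identifiers; the executor forwards whatever string the author wrote.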

model is independent of judge_model: one selects the execution model, the other the evaluation model. A node MAY set both. The common use is cost tiering: a cheap model on mechanical grunt nodes (lint, format, summarize) and a stronger model on the reasoning nodes.

The disallowed_tools field names built-in agent tools the AI model MUST NOT have access to at this node. Names refer to the agent runtime’s built-in tools (e.g. Bash, Read, Edit, Write, WebFetch, WebSearch), not to skill-provided tool names declared in Skills.

Typical use: keep an implement node focused on the repo by removing WebFetch and WebSearch, or harden a notification node by removing Bash so the agent cannot shell out.

A conforming executor:

  • MUST prevent the model from invoking any tool listed in disallowed_tools for the duration of this node’s run.
  • SHOULD remove the named tools from the model’s context entirely, rather than rejecting calls after the fact, so the model does not waste turns attempting blocked tools.
  • MUST NOT apply disallowed_tools to skill-provided tools registered via skills. Skills are gated by their own declaration (only listed skills are available); disallowed_tools is exclusively for built-in agent tools.
  • When the field is absent or empty, the executor applies its default tool set with no removals.

implement:
  name: Implement Fix
  instruction: Read the issue, write the fix, run tests, commit.
  skills:
    - github
  disallowed_tools:
    - WebFetch
    - WebSearch

The exact set of names the runtime recognizes is implementation-defined and tracks the agent SDK in use. Workflow authors SHOULD verify a name is honored before relying on it; a typo silently no-ops because there is no built-in tool of that name to remove.
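One way to picture the filtering rules is the sketch below. The function `effective_tools` and its return shape are hypothetical; returning the unrecognized names is an extra convenience for spotting typos and is not something the spec requires.

```python
def effective_tools(builtin_tools, skill_tools, disallowed_tools=None):
    """Remove disallowed built-in tools; never touch skill-provided tools.

    Returns (tools, unknown) where `unknown` lists disallowed names that
    matched no built-in tool (for example, a typo that silently no-ops).
    """
    blocked = set(disallowed_tools or [])
    kept_builtin = [t for t in builtin_tools if t not in blocked]
    unknown = sorted(blocked - set(builtin_tools))
    # Skill tools are gated by the node's skills list, not by disallowed_tools.
    return kept_builtin + list(skill_tools), unknown


tools, unknown = effective_tools(
    ["Bash", "Read", "WebFetch", "WebSearch"],
    ["github_create_pr"],
    ["WebFetch", "WebSerch"],  # second name is a deliberate typo
)
print(tools)    # WebSearch survives because the typo matched nothing
print(unknown)
```

Removing tools before the model sees them, rather than rejecting calls afterwards, is what keeps the model from spending turns on blocked tools.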

Nodes can declare their own rules and context that interact with the workflow-level declarations via cascade.

Per-node rules and context accept two forms:

Array form (additive — default):

investigate:
  name: Root Cause Analysis
  instruction: Investigate the alert.
  context:
    - ./security-playbook.md

The node inherits all workflow-level sources AND adds its own.

Object form (with only flag):

vendor-review:
  name: Vendor License Review
  instruction: Check dependency licenses.
  rules:
    only: true
    sources:
      - ./license-policy.md

When only: true, the node does not inherit workflow-level sources for that field. Only the node’s own sources are used. This blocks cascade for that field only — context still inherits normally unless it also sets only: true.

For each of rules and context, a conforming executor MUST resolve the effective sources for a node using this algorithm:

  1. Start with runtime input sources (if any).
  2. Append workflow-level sources (if any).
  3. If the node declares the field:
    • only: true — discard steps 1–2, use only the node’s sources.
    • Otherwise (array form, or only absent/false) — append the node’s sources.

The resolved sources are concatenated in this order and prepended to the node’s instruction. See Input Augmentation for the full assembly.

Effective rules   = (only ? node-only : runtime + workflow + node)
Effective context = (only ? node-only : runtime + workflow + node)

# Workflow level
rules:
  - ./coding-standards.md
context:
  - ./ARCHITECTURE.md
nodes:
  gather:
    name: Gather Context
    instruction: Investigate the alert.
    # No per-node rules/context: inherits everything from workflow
  security-audit:
    name: Security Audit
    instruction: Audit for OWASP top 10.
    # Additive: gets workflow context + this
    context:
      - ./security-playbook.md
  license-check:
    name: License Review
    instruction: Check dependency licenses.
    # Override: only these rules, no inheritance
    rules:
      only: true
      sources:
        - ./license-policy.md
    # Context still inherits from workflow (ARCHITECTURE.md)
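Applied to the example above, the resolution algorithm might look like the following sketch. `resolve_sources` is a hypothetical helper; it takes the already-parsed YAML values (a list for array form, a dict for object form) and is run once for rules and once for context.

```python
def resolve_sources(runtime, workflow, node_decl=None):
    """Resolve effective sources for one field (rules or context)."""
    inherited = list(runtime or []) + list(workflow or [])
    if node_decl is None:
        return inherited  # node declares nothing: inherit everything
    if isinstance(node_decl, dict):  # object form
        own = list(node_decl.get("sources", []))
        if node_decl.get("only") is True:
            return own  # cascade blocked for this field only
        return inherited + own
    return inherited + list(node_decl)  # array form is additive


workflow_rules = ["./coding-standards.md"]
workflow_context = ["./ARCHITECTURE.md"]

# security-audit: additive context
print(resolve_sources([], workflow_context, ["./security-playbook.md"]))
# license-check: rules use only: true, context still inherits
print(resolve_sources([], workflow_rules, {"only": True, "sources": ["./license-policy.md"]}))
print(resolve_sources([], workflow_context))
```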

The instruction field is the primary directive for the AI model at this step.

A conforming executor:

  • MUST pass the instruction to the AI model as the primary directive.
  • MUST NOT alter, summarize, or truncate the instruction.
  • MAY augment the instruction with context from prior nodes (see Execution Model).

A conforming executor MUST resolve rules and context from all three layers (runtime input, workflow-level, node-level) per the cascade semantics above. The effective rules are prepended with the heading:

## Rules — You MUST Follow These
{effective rules, concatenated}

The effective context is prepended with the heading:

## Background Context
{effective context, concatenated}

If any skill referenced by the node has an instruction field, a conforming executor MUST inject each skill’s instruction into the prompt, in the order the skills appear in the node’s skills array:

## Skill: {skill.name}
{skill.instruction}

When rules, context, and skill instructions are all present, the assembly order is: rules first, then context, then skill instructions, then the node’s base instruction, separated by ---.

┌──────────────────────────────────┐
│ ## Rules — You MUST Follow These │ ← effective rules (cascaded)
│ {rules content}                  │
├──────────────────────────────────┤
│ ---                              │
├──────────────────────────────────┤
│ ## Background Context            │ ← effective context (cascaded)
│ {context content}                │
├──────────────────────────────────┤
│ ---                              │
├──────────────────────────────────┤
│ ## Skill: {skill.name}           │ ← one per skill with instruction
│ {skill.instruction}              │
├──────────────────────────────────┤
│ ---                              │
├──────────────────────────────────┤
│ {node.instruction}               │ ← the node's own instruction
└──────────────────────────────────┘
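The assembly order in the diagram can be sketched as a single function. The headings and the `---` separator come from this section; the exact whitespace, the blank line between concatenated sources, and the grouping of all skill blocks into one section are assumptions of this example.

```python
def assemble_prompt(rules, context, skills, instruction):
    """Build the node prompt: rules, context, skill instructions, instruction."""
    sections = []
    if rules:
        sections.append("## Rules — You MUST Follow These\n" + "\n\n".join(rules))
    if context:
        sections.append("## Background Context\n" + "\n\n".join(context))
    # Skills keep the order of the node's skills array; skills without an
    # instruction contribute nothing to the prompt.
    skill_blocks = [
        f"## Skill: {s['name']}\n{s['instruction']}"
        for s in skills
        if s.get("instruction")
    ]
    if skill_blocks:
        sections.append("\n\n".join(skill_blocks))
    sections.append(instruction)  # the node's own instruction, unaltered
    return "\n\n---\n\n".join(sections)


print(assemble_prompt(
    ["Be safe."],
    [],
    [{"name": "github", "instruction": "Use the API."}, {"name": "linear"}],
    "Open a PR.",
))
```

`rules` and `context` here are lists of already-resolved source contents, not file paths.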

Because instruction is a Source, you can keep long or shared prompts in a file or URL and reference them by path:

investigate:
  name: Root Cause Analysis
  instruction: ./prompts/investigate.md
  skills:
    - github
    - linear

Or pin to a shared playbook repo:

gather:
  name: Gather Context
  instruction: https://raw.githubusercontent.com/acme/playbook/main/gather.md

Files and URLs resolve once, eagerly, before any node runs. Resolved content is recorded in trace.sources with a content hash for audit.

The skills array declares which Skills the node can access during execution.

A conforming executor:

  • MUST resolve tools from the listed skill IDs and make them available to the AI model during node execution.
  • MUST NOT make tools available from skills not listed in the node’s skills array.
  • SHOULD silently skip skills that are not configured (missing required config values) rather than failing the workflow. The node executes with whatever tools are available from the remaining skills.

The output field declares a JSON Schema that the node’s result data MUST conform to.

A conforming executor:

  • MUST request structured output from the AI model conforming to this schema when output is present.
  • The structured output becomes the node’s result data, available to downstream nodes via context accumulation.
  • When output is absent, the node’s result data is implementation-defined.

The requires field declares machine-checked pre-conditions that are evaluated before the LLM runs. If any declared check fails, the node is marked failed (or skipped, depending on on_fail) and the LLM is never invoked.

Use requires to catch missing upstream context (bad runtime input, or an upstream node that returned without producing a required field) before burning tokens. The checks are deterministic and run synchronously.

requires:
  output_required: [string]      # paths must resolve, non-null/undefined
  output_matches: [OutputMatch]  # equals / in / matches
  on_fail: fail | skip           # default: fail

Paths resolve against the cross-node context map:

{ input: <runtime input>, [priorNodeId]: <data of prior node>, ... }

The grammar is identical to eval paths: dotted segments, [*] wildcards, optional all: / any: prefix.

| Path | Resolves to |
| --- | --- |
| input.repoUrl | The repoUrl field on the runtime input. |
| triage.recommendation | data.recommendation of the prior triage node. |
| any:scan.findings[*].severity | At least one finding has a non-null severity. |

| on_fail | Result status | Result data |
| --- | --- | --- |
| fail (default) | failed | { error: "requires failed: ..." } |
| skip | skipped | { skipped_reason: "requires not met: ..." } |

In both cases the LLM is not invoked. Routing continues normally — edges with when conditions can read the failure status and route around it.
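The on_fail behavior can be illustrated with a deliberately simplified sketch. It handles only plain dotted paths in output_required (no wildcards, no all: / any: prefixes, no output_matches), and the text that follows "requires failed:" and "requires not met:" is a hypothetical choice, since the tables above elide it. `lookup` and `check_requires` are illustrative names.

```python
def lookup(context_map, path):
    """Resolve a plain dotted path. Returns (present, value)."""
    current = context_map
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return False, None
        current = current[segment]
    return current is not None, current


def check_requires(requires, context_map):
    """Return None when pre-conditions hold, else the node result to record."""
    missing = [
        p for p in requires.get("output_required", [])
        if not lookup(context_map, p)[0]
    ]
    if not missing:
        return None  # the LLM may run
    detail = ", ".join(missing)  # hypothetical message detail
    if requires.get("on_fail", "fail") == "skip":
        return {"status": "skipped", "data": {"skipped_reason": f"requires not met: {detail}"}}
    return {"status": "failed", "data": {"error": f"requires failed: {detail}"}}


context_map = {
    "input": {"repoUrl": "https://example.com/repo.git"},
    "implement_fix": {"branch": None},  # upstream node produced no branch
}
print(check_requires(
    {"output_required": ["input.repoUrl", "implement_fix.branch"]},
    context_map,
))
```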

nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix
    skills: [github]
    requires:
      output_required:
        - input.repoUrl
        - implement_fix.branch
      output_matches:
        - { path: implement_fix.filesChanged, matches: "^[1-9]" }
      on_fail: fail

The eval field declares a list of named evaluators the executor runs after the AI model finishes a node. Each evaluator produces an EvalResult with a pass verdict and optional reasoning. Under the default eval_policy: all_pass, every evaluator must pass for the node to pass; any failure marks the node failed and the workflow halts (or routes to the next failure edge).

Eval catches the common “the model claims success without doing the work” failure mode: a tool was supposed to be called and wasn’t, a required field is missing from the structured output, a value is outside the allowed set, or the result claims a contract it doesn’t actually meet. It complements, and does not replace, conditional edges, which decide where to go next based on the result.

The shape mirrors how other agent eval frameworks (LangSmith, Promptfoo, OpenAI Evals, DeepEval, Ragas) name this primitive. SWEny’s three evaluator kinds map to the same three categories those frameworks have converged on.

nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix.
    skills: [github]
    eval:
      - name: pr_was_created
        kind: function
        rule:
          all_tools_called: [github_create_pr]
      - name: pr_url_present
        kind: value
        rule:
          output_required: [prUrl]
    eval_policy: all_pass # default; can be omitted

Each entry in eval is an Evaluator with these fields:

| Field | Required | Applies to | Description |
| --- | --- | --- | --- |
| name | REQUIRED | all kinds | Stable identifier for the evaluator. Used in result objects and retry preambles. |
| kind | REQUIRED | all kinds | One of value, function, judge. |
| rule | REQUIRED for value / function | value, function | The deterministic rule. See per-kind shapes below. |
| rubric | REQUIRED for judge | judge | Natural-language rubric the judge model evaluates against the node’s data and trace. |
| pass_when | OPTIONAL for judge | judge | Expected verdict word. Default yes. MUST be a single whitespace-free token; the judge response is parsed for it. |
| model | OPTIONAL for judge | judge | Override the judge model for this evaluator. See Judge mechanics. |

A node with no eval field has no post-conditions; the node passes when the AI finishes without error.

A value evaluator is pure, deterministic, and fast. It operates on the node’s structured output (result.data). Use it when you can express the contract as a path-and-operator check.

The rule object accepts output_required and output_matches:

eval:
  - name: pr_url_well_formed
    kind: value
    rule:
      output_required: [prUrl, branchName]
      output_matches:
        - { path: prUrl, matches: "^https://github.com/" }
        - { path: branchName, matches: "^sweny/" }

output_required is a list of paths into result.data that must each resolve to a present, non-null value. output_matches is a list of OutputMatch entries, each asserting equals, in, or matches against a path.

A single value evaluator MAY combine both fields. They are AND-ed within the rule.

A function evaluator is pure, deterministic, and fast. It operates on the node’s tool-call trace, not its data. Use it when the contract is “the model did (or did not) call this tool.”

The rule object accepts any_tool_called, all_tools_called, and no_tool_called:

eval:
  - name: pr_was_created
    kind: function
    rule:
      all_tools_called: [github_create_pr]
      no_tool_called: [github_force_push]

A tool “was called and succeeded” when it appears in the node’s tool-call trace with no error. For no_tool_called, any appearance, successful or not, counts as a violation.
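The distinction between "called and succeeded" and "any appearance" can be sketched as follows. The trace shape (a list of dicts with `tool` and `error` keys), the function name `check_function_rule`, and the reasoning strings are assumptions for this example.

```python
def check_function_rule(rule, trace):
    """Evaluate a function-kind rule against a tool-call trace.

    `trace` is assumed to be a list of {"tool": str, "error": str | None}.
    Returns (passed, reasoning).
    """
    attempted = {call["tool"] for call in trace}
    succeeded = {call["tool"] for call in trace if not call.get("error")}
    problems = []

    wanted_all = rule.get("all_tools_called", [])
    missing = [t for t in wanted_all if t not in succeeded]
    if missing:
        problems.append(f"required all of {wanted_all} to succeed, missing: {missing}")

    wanted_any = rule.get("any_tool_called", [])
    if wanted_any and not any(t in succeeded for t in wanted_any):
        problems.append(f"required any of {wanted_any} to succeed")

    # For no_tool_called, a failed attempt still counts as a violation.
    forbidden = [t for t in rule.get("no_tool_called", []) if t in attempted]
    if forbidden:
        problems.append(f"forbidden tools were called: {forbidden}")

    return (not problems, "; ".join(problems))


trace = [
    {"tool": "github_create_pr", "error": None},
    {"tool": "github_force_push", "error": "denied"},
]
rule = {"all_tools_called": ["github_create_pr"], "no_tool_called": ["github_force_push"]}
print(check_function_rule(rule, trace))
```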

The function kind is also the natural home for any future code-based check (for example, “the diff touched fewer than 10 files”). v1 covers tool-call assertions; the kind is intentionally named function rather than tool_call to leave that door open.

A judge evaluator calls a small Claude model with the node’s data, the node’s tool-call trace, and the author’s rubric. The judge returns a single verdict word (default yes / no) plus a short reasoning string. Use it when the contract is conditional, semantic, or otherwise outside the reach of a deterministic rule.

eval:
  - name: tests_present_when_pass_claimed
    kind: judge
    rubric: |
      If result.data.test_status is "pass", does
      result.data.test_files_changed contain at least one real test
      file path? An empty array with status "pass" is a contract
      violation. If status is anything else, this rule passes
      vacuously.
    pass_when: "yes" # quoted so YAML 1.1 parsers do not read it as a boolean

The judge sees the node’s result.data, the tool-call trace, and the rubric. It does not see runtime input or upstream node results unless you put them in the rubric explicitly.

See Judge mechanics for model selection, cost gating, and parse failures.

The eval_policy field controls how the executor aggregates evaluator results into a single node verdict.

| Policy | v1 status | Behavior |
| --- | --- | --- |
| all_pass | shipped | Every evaluator must pass. Default. |
| any_pass | reserved | At least one evaluator must pass. Not implemented in v1. |
| weighted | reserved | Sum of scores above a threshold. Not implemented in v1. |

A conforming executor MUST accept eval_policy: all_pass and MAY reject other values until the corresponding semantics ship.

Each evaluator produces a structured result. The full list lands on NodeResult.evals:

| Field | Type | Description |
| --- | --- | --- |
| name | string | The evaluator’s name. |
| kind | one of value, function, judge | The evaluator’s kind. |
| pass | boolean | Whether this evaluator passed. |
| reasoning | string (optional) | Failure detail. Populated by the judge model on judge evaluators, by the executor’s failure formatter on value/function. Capped at ~500 characters. |
| score | number (optional) | Reserved for weighted aggregation. Not populated in v1. |

Downstream nodes can read individual evaluator outcomes via context paths, e.g. priorNode.evals.pr_was_created.pass.

An OutputMatch entry has these fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | REQUIRED | A path into result.data (see Path grammar). |
| equals | any | one-of | Strict deep equality against the resolved value. |
| in | any[] | one-of | The resolved value is in the array (deep equality per element). |
| matches | string | one-of | A JavaScript regex source (no flags); the resolved value is coerced to a string and tested against it. |

Exactly one of equals, in, or matches MUST be set per entry.
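A rough sketch of the three operators, applied to an already-resolved value, is shown below. It is an approximation in two respects, both assumptions of this example: Python's `==` is looser than strict deep equality (for instance `1 == True`), and Python's `re` module stands in for a JavaScript regex, so syntax and string coercion can differ in edge cases. `check_match` is an illustrative name.

```python
import re


def check_match(entry, value):
    """Apply the single operator of an OutputMatch entry to a resolved value."""
    operators = [k for k in ("equals", "in", "matches") if k in entry]
    if len(operators) != 1:
        raise ValueError("exactly one of equals, in, matches must be set")
    op = operators[0]
    if op == "equals":
        return value == entry["equals"]
    if op == "in":
        return any(value == candidate for candidate in entry["in"])
    # matches: coerce to string, then search (unanchored unless the pattern anchors)
    return re.search(entry["matches"], str(value)) is not None


print(check_match({"path": "prUrl", "matches": "^https://github.com/"},
                  "https://github.com/acme/repo/pull/1"))
print(check_match({"path": "test_status", "in": ["pass", "fail", "no-framework"]}, "skipped"))
```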

A path is a .-separated sequence of segments. A segment is either:

  • An identifier matching [a-zA-Z_][a-zA-Z0-9_]* — object property access, OR
  • An identifier followed by [*] — wildcard expansion over an array.

The path MAY be prefixed with all: or any: (see Wildcard semantics). When no prefix is present, all: is implied.

Examples:

| Path | Resolves to |
| --- | --- |
| prUrl | The prUrl property of result.data. |
| findings[*].severity | The severity property of every element of the findings array. |
| any:checks[*].conclusion | The conclusion property of any element of the checks array. |
| issue.metadata.url | A nested object property. |

The grammar is intentionally minimal so workflow authors can read it in a sentence and tooling (Studio, linters) can parse it in a few lines. Richer expressions (filter predicates, JSONPath, CEL) are out of scope.

A path is resolved by walking segments left-to-right against result.data:

  • A non-wildcard segment that doesn’t exist on its parent object → resolution fails.
  • A [*] segment requires its parent to be an array. If the parent is not an array, resolution fails. If the parent is an empty array, expansion succeeds and the wildcard rule below applies.
  • Encountering null mid-path → resolution fails.

A failed resolution is treated as a failed check inside value evaluators. The reasoning string names the missing segment.

When a path contains [*] and resolves successfully:

  • all: (default) — every resolved value MUST satisfy the operator. An empty array is vacuously true.
  • any: — at least one resolved value MUST satisfy the operator. An empty array is false.

output_required follows the same rule. output_required: ["findings[*].severity"] means the findings array is present and every finding has a non-null severity. output_required: ["any:findings[*].severity"] means at least one finding does. Quote wildcard paths inside YAML flow sequences, because [ and ] are flow indicators there.
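The grammar, resolution rules, and wildcard semantics fit in a short sketch. The function names (`parse_path`, `resolve`, `required_ok`) are hypothetical, and the sketch applies the resolution rules literally: a missing property on any expanded element fails the whole path, even under any:.

```python
import re

_SEGMENT = re.compile(r"([a-zA-Z_][a-zA-Z0-9_]*)(\[\*\])?")


def parse_path(path):
    """Split a path into (mode, [(name, is_wildcard), ...])."""
    mode = "all"  # implied when no prefix is present
    for prefix in ("all:", "any:"):
        if path.startswith(prefix):
            mode, path = prefix[:-1], path[len(prefix):]
            break
    segments = []
    for raw in path.split("."):
        m = _SEGMENT.fullmatch(raw)
        if m is None:
            raise ValueError(f"invalid path segment: {raw!r}")
        segments.append((m.group(1), m.group(2) is not None))
    return mode, segments


def resolve(data, path):
    """Return (mode, values); values is None when resolution fails."""
    mode, segments = parse_path(path)
    current = [data]
    for name, wildcard in segments:
        expanded = []
        for parent in current:
            # Covers missing properties and null encountered mid-path.
            if not isinstance(parent, dict) or name not in parent:
                return mode, None
            child = parent[name]
            if wildcard:
                if not isinstance(child, list):
                    return mode, None
                expanded.extend(child)
            else:
                expanded.append(child)
        current = expanded
    return mode, current


def required_ok(data, path):
    """output_required check: values must be present and non-null."""
    mode, values = resolve(data, path)
    if values is None:
        return False
    present = [v is not None for v in values]
    return any(present) if mode == "any" else all(present)


data = {"findings": [{"severity": "high"}, {"severity": None}], "checks": []}
print(required_ok(data, "findings[*].severity"))
print(required_ok(data, "any:findings[*].severity"))
print(required_ok(data, "checks[*].conclusion"))
```

The empty `checks` array passes under the implied all: and would fail under any:, matching the vacuous-truth rule above.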

The judge model is resolved through three layers of override, narrowest wins:

  1. Evaluator-level model field on a judge evaluator.
  2. Node-level judge_model field.
  3. Workflow-level judge_model field. Default claude-haiku-4-5.

Judges return a single token verdict, so a small fast model is the right default.

Workflow-level judge_budget (integer, default 50) caps the expected number of judge calls per workflow run. The executor SHOULD warn at load time when count(judges) * estimated_runs exceeds the budget. The budget is a soft signal in v1, not a hard runtime cap.

When the judge response cannot be parsed for the pass_when token (model returned garbage, timed out, errored), the executor:

  • Retries the judge call once.
  • If the second call also fails, the evaluator is recorded as pass: false with reasoning: "judge parse failure".
  • A workflow author who wants a parse-failure to be a halt can wrap the judge in a stricter retry policy at the node level.
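The retry-once behavior might be sketched like this. The response format is an assumption: the sketch treats the first whitespace-delimited token as the verdict and the remainder as reasoning, and compares verdicts case-insensitively. `call_judge` stands in for whatever performs the actual model call.

```python
def parse_verdict(response):
    """Return (verdict, reasoning), or None when nothing can be parsed."""
    if not isinstance(response, str):
        return None
    tokens = response.strip().split(maxsplit=1)
    if not tokens:
        return None
    verdict = tokens[0].strip(".,:;").lower()
    if not verdict:
        return None
    reasoning = tokens[1] if len(tokens) > 1 else ""
    return verdict, reasoning


def run_judge(call_judge, pass_when="yes"):
    """Call the judge, retrying once on a timeout, error, or unparsable reply."""
    for _ in range(2):  # initial call plus one retry
        try:
            response = call_judge()
        except Exception:
            continue
        parsed = parse_verdict(response)
        if parsed is not None:
            verdict, reasoning = parsed
            return {"pass": verdict == pass_when.lower(), "reasoning": reasoning}
    return {"pass": False, "reasoning": "judge parse failure"}


replies = iter(["", "yes tests were added"])
print(run_judge(lambda: next(replies)))
```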

A conforming executor:

  • MUST evaluate every declared evaluator (no fast-fail), so the workflow author sees every problem in one pass.
  • MUST populate NodeResult.evals with one EvalResult per evaluator, in the order they were declared.
  • MUST mark the node failed under eval_policy: all_pass if any evaluator fails. The node’s error message is a structured list, one line per failing evaluator: name (kind): reasoning.
  • MUST NOT run eval when the AI model already failed the node. Eval only runs against successful node executions.
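A minimal sketch of all_pass aggregation, assuming a hypothetical `run_evaluator` callable that returns one EvalResult-shaped dict per evaluator. The `"success"` status name is an assumption; the error layout follows the representative failure message in this section.

```python
def aggregate_all_pass(evaluators, run_evaluator):
    """Run every evaluator (no fast-fail) and fold results into a node verdict."""
    evals = [run_evaluator(e) for e in evaluators]  # declaration order preserved
    failing = [r for r in evals if not r["pass"]]
    if not failing:
        return {"status": "success", "evals": evals}
    lines = ["eval failed (policy: all_pass):"]
    lines += [f"- {r['name']} ({r['kind']}): {r.get('reasoning', '')}" for r in failing]
    return {"status": "failed", "evals": evals, "error": "\n".join(lines)}


def fake_run(evaluator):
    # Stand-in evaluator: value checks pass, function checks fail.
    ok = evaluator["kind"] == "value"
    return {
        "name": evaluator["name"],
        "kind": evaluator["kind"],
        "pass": ok,
        "reasoning": "" if ok else "github_create_pr not called",
    }


result = aggregate_all_pass(
    [{"name": "pr_was_created", "kind": "function"},
     {"name": "pr_url_present", "kind": "value"}],
    fake_run,
)
print(result["error"])
```

Running every evaluator before deciding is what lets the author see all failures in one pass.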

A representative failure message:

eval failed (policy: all_pass):
- pr_was_created (function): required all of [github_create_pr] to succeed, called: [github_search_issues]
- pr_url_well_formed (value): output_required 'prUrl' missing segment 'prUrl'
- tests_present_when_pass_claimed (judge): test_status was 'pass' but test_files_changed was empty

implement-fix:
  name: Implement Fix
  instruction: Open a PR that fixes the issue and verify CI is green.
  skills:
    - github
  output:
    type: object
    properties:
      prUrl: { type: string }
      branchName: { type: string }
      test_status: { type: string, enum: [pass, fail, no-framework] }
      test_files_changed: { type: array, items: { type: string } }
      checks:
        type: array
        items:
          type: object
          properties:
            name: { type: string }
            conclusion: { type: string }
    required: [prUrl, branchName, test_status]
  eval:
    - name: pr_was_created
      kind: function
      rule:
        all_tools_called: [github_create_pr]
        no_tool_called: [github_force_push]
    - name: pr_url_well_formed
      kind: value
      rule:
        output_required: [prUrl, branchName]
        output_matches:
          - { path: prUrl, matches: "^https://github.com/" }
          - { path: branchName, matches: "^sweny/" }
          - { path: "any:checks[*].conclusion", equals: "success" }
    - name: status_is_recognized
      kind: value
      rule:
        output_matches:
          - { path: test_status, in: [pass, fail, no-framework] }
    - name: tests_present_when_pass_claimed
      kind: judge
      rubric: |
        If result.data.test_status is "pass", does
        result.data.test_files_changed contain at least one real test
        file path? An empty array with status "pass" is a contract
        violation. If status is anything else, this rule passes
        vacuously.
      pass_when: "yes" # quoted so YAML 1.1 parsers do not read it as a boolean

  • Use a value evaluator for data-shape post-conditions: facts about the structured output that you can express as a path and an operator.
  • Use a function evaluator for trace-shape post-conditions: a specific tool was, or was not, called.
  • Use a judge evaluator for semantic or conditional post-conditions: claims that depend on context, comparisons across fields, or anything outside the reach of a deterministic rule.
  • Use conditional edges for routing: which node runs next based on the result.
  • Use the node’s output JSON Schema for shape: the structural contract on the data. Eval is for the looser “did the model actually do the right thing” checks that JSON Schema can’t express.

gather:
  name: Gather Context
  instruction: Pull error details, logs, and recent commits related to the alert.
  skills:
    - github
    - sentry
  max_turns: 80

investigate:
  name: Root Cause Analysis
  instruction: >-
    Classify each issue as novel or duplicate. Assess severity and
    fix complexity for each finding.
  max_turns: 50
  skills:
    - github
    - linear
  output:
    type: object
    properties:
      findings:
        type: array
        items:
          type: object
          properties:
            title: { type: string }
            severity: { type: string, enum: [critical, high, medium, low] }
            is_duplicate: { type: boolean }
            fix_complexity: { type: string, enum: [simple, moderate, complex] }
          required: [title, severity, is_duplicate]
      novel_count: { type: number }
      highest_severity: { type: string }
    required: [findings, novel_count, highest_severity]

A node with no skills has no tools. The AI model executes the instruction using only its training and the accumulated context.

summarize:
  name: Summarize Findings
  instruction: >-
    Produce a concise summary of all findings for the notification.
    Include severity, root cause, and links to created issues.

The retry field provides node-local self-healing on eval failure. When eval fails, the executor re-invokes the LLM up to max additional times, prepending feedback derived from the failing evaluators.

Retry is triggered ONLY by eval failure, not by tool/API errors and not by requires failure, because re-running cannot fix upstream data problems.

retry:
  max: integer             # ≥ 1
  instruction:             # optional
    | string               # static preamble
    | { auto: true }       # LLM-generated diagnosis (default prompt)
    | { reflect: string }  # LLM-generated diagnosis (author prompt)

| instruction value | Behavior |
| --- | --- |
| (omitted) | Default preamble: a structured list of failing evaluators (name (kind): reasoning, one per line) followed by “Fix and try again.” |
| "static text" | Author’s text + the structured failing-evaluator list appended. |
| { auto: true } | Executor calls claude.ask with a default reflection prompt; the response becomes preamble. |
| { reflect: "<prompt>" } | Same as auto, but the author’s reflect prompt is used as the diagnosis question. |

The preamble is prepended to the node’s normal instruction so the LLM sees it before the original task. Each retry uses only the latest eval failure as feedback; older errors are treated as noise and dropped.

If claude.ask throws or returns empty during autonomous mode, the executor falls back to the default static preamble for that attempt and logs a warning. Reflection failure never escalates to a workflow failure.

With autonomous reflection, a node can make up to 2 × max + 1 LLM calls: one initial attempt, plus a reflection call and a re-run for each of the max retries. Workflow authors set the ceiling via max.

Each attempt is recorded as its own TraceStep with a retryAttempt field (0-indexed). The executor emits a node:retry observer event before each retry attempt with { node, attempt, reason, preamble }.
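The preamble choices and the call ceiling described in this section can be sketched together. Everything named here is illustrative: `ask` stands in for claude.ask (whose real signature is not given in this document), `DEFAULT_REFLECT_PROMPT` is placeholder text, and the newline used to join the author's text with the failure list is an assumption.

```python
import logging

DEFAULT_REFLECT_PROMPT = "Why did these evaluators fail?"  # placeholder text


def failure_lines(failures):
    return [f"{f['name']} ({f['kind']}): {f['reasoning']}" for f in failures]


def default_preamble(failures):
    return "\n".join(failure_lines(failures) + ["Fix and try again."])


def build_preamble(instruction, failures, ask=None):
    """Pick the retry preamble for one attempt from the latest failures only."""
    if instruction is None:
        return default_preamble(failures)
    if isinstance(instruction, str):  # static text + failing-evaluator list
        return "\n".join([instruction] + failure_lines(failures))
    # Autonomous mode: { auto: true } or { reflect: "<prompt>" }
    question = instruction.get("reflect") or DEFAULT_REFLECT_PROMPT
    try:
        diagnosis = ask(question, failures)
    except Exception:
        diagnosis = ""
    if diagnosis and diagnosis.strip():
        return diagnosis
    # Reflection failed: fall back and warn, never fail the workflow.
    logging.warning("reflection failed; using default preamble")
    return default_preamble(failures)


def max_llm_calls(max_retries, autonomous):
    """Initial attempt plus one (static) or two (autonomous) calls per retry."""
    return 1 + (2 if autonomous else 1) * max_retries


print(max_llm_calls(2, autonomous=True))
```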

nodes:
  open_pr:
    name: Open PR
    instruction: Open a PR with the fix
    skills: [github]
    eval:
      - name: pr_was_created
        kind: function
        rule:
          any_tool_called: [github_create_pr]
      - name: pr_url_present
        kind: value
        rule:
          output_required: [prUrl]
    retry:
      max: 2
      instruction: { auto: true }