ControlKit

A compiler toolchain for turning control-system policies into deterministic C and Rust artifacts for embedded deployment.

controlkit demo verify → compile → benchmark

# verify → compile → benchmark

$ controlkit verify benchmarks/double_integrator_lqr/controller.yaml

outputs/verification/double_integrator_lqr_verification.json

passed: true

$ controlkit benchmark --all

outputs/benchmarks/double_integrator_lqr/report.md

outputs/benchmarks/rocket_hover_lqr/report.md

benchmark_cases: 4

reports: json + markdown

status: pass/fail criteria checked

Language Python

Targets C, Rust

Policies LQR, MPC-lite, RL MLP

Assurance Verification, benchmarks

ControlKit is built as a compact compiler workflow: typed IR, policy frontends, code generators, optimization passes, verification reports, closed-loop benchmarks, and a CLI. The deployment step becomes explicit, inspectable, and testable instead of living as an ad hoc translation from notebook math to firmware source.

Control Philosophy

A controller is a small program with physical consequences. ControlKit treats it that way: typed at the boundary, explicit about dimensions, deterministic in generated code, and measured before deployment.

The goal is not to replace modeling tools or hide the math. The goal is to make the deployment boundary boring: preserve the control law, reject impossible shapes early, emit source a firmware engineer can read, and benchmark the artifact that will actually run on target hardware.

Background

Control algorithms are frequently designed in simulation-first environments, then manually translated into embedded code. That translation step is fragile: matrix shapes, actuator bounds, solver loops, and floating-point assumptions all need to survive the move from design artifact to target source.

ControlKit treats that move as a compiler problem. The user gives the tool a controller specification. A frontend validates the policy, lowers it into a typed internal representation, and then target backends generate deployable source. The benchmark runner compares the generated result against a Python reference path and records static estimates.

Architecture

The core pipeline is deliberately small: specs enter through YAML or Python APIs, frontends enforce controller-specific contracts, IR nodes preserve shape and semantic information, verification checks reject unsafe shapes or unstable feedback, optimization passes rewrite safe expression trees, and backends emit deterministic C or Rust.

Spec Frontend IR Verify Optimization Backend Benchmark

The important design choice is that policy semantics do not leak directly into backend string generation. LQR, MPC-lite, and RL policies each get validated structures before code generation begins.

Compiler Contracts

ControlKit's main technical bet is that embedded control code should be generated from explicit contracts rather than from loose Python objects. Each compiler layer adds a narrow boundary: frontend specs validate user intent, IR nodes validate dimensions, verification checks stability and constraints, backends validate target support, and benchmarks validate runtime behavior against a reference.

Shape contract

Matrices, vectors, gain matrices, bounds, costs, and neural-network layers carry dimensions that are checked before code generation.

Policy contract

LQR lowers to feedback control, MPC lowers to a finite-horizon projected-gradient solver, and RL lowers to fixed-shape MLP inference.

Target contract

C and Rust backends expose the same control_step(input, output) surface while preserving target-specific implementation details.

Validation contract

Controllers can be verified for dimensions, closed-loop stability, constraints, numerical robustness, and generated-code consistency hooks.

Intermediate Representation

The first serious layer is the IR. It knows about scalar constants, vectors, matrices, matrix-vector multiplication, addition, subtraction, clipping, linear systems, control laws, finite-horizon MPC controllers, and fixed-shape RL policies.

IRModule(
    name="rl_balance",
    policy="rl",
    rl_policies=(RlPolicyIR(...),),
)

IR construction validates dimensions immediately. If a gain matrix cannot multiply the state vector, if MPC bounds do not match the control dimension, or if an MLP layer has a ragged weight matrix, the compiler fails before any target code is emitted.

Expression IR Represents scalar constants, vectors, matrices, MatVecMul, addition, subtraction, negation, and clipping as typed expression nodes.

System IR Represents linear dynamics such as x_dot = Ax + Bu for continuous systems and x_next = Ax + Bu for discrete-time systems.

Controller IR Stores first-class LQR, MPC-lite, and RL policy nodes on the module so backends can generate specialized source instead of reverse-engineering expressions.

# LQR feedback lowered into IR semantics
u = clip(-(K @ x), u_min, u_max)

# MPC-lite generated solver semantics
for iteration in range(solver_iterations):
    X[0] = x
    X[k + 1] = A @ X[k] + B @ U[k]
    grad_u[k] = r * U[k] + B.T @ lambda[k + 1]
    U[k] = clip(U[k] - step_size * grad_u[k], u_min, u_max)

# RL policy semantics
for layer in layers:
    y = W @ x + b
    x = activation(y)

Validation

Most useful compiler errors happen before generation. ControlKit tries to reject invalid controllers where the mistake is still named in control-system terms: a bad B matrix, a mismatched gain, a wrong saturation vector, or an incompatible neural layer.

LQR Rejects gain matrices whose columns do not match state_dim.

MPC-lite Rejects non-discrete dynamics, invalid horizons, bad cost vectors, and control bounds whose lengths do not equal control_dim.

RL MLP Rejects empty networks, ragged weight matrices, invalid activations, missing JSON weights, and incompatible adjacent layer dimensions.

Backends Reject unsupported module shapes rather than emitting partial target code.

Frontends

Frontends keep user-facing specs small and policy-specific. LQR specs describe a gain matrix, MPC-lite specs describe discrete dynamics and box constraints, and RL specs point to fixed-shape JSON weights.

LQR Feedback control lowered as u = -Kx, with optional saturation.

MPC-lite Finite-horizon projected-gradient control over x_next = Ax + Bu.

RL MLP Dependency-free inference for Linear, ReLU, and Tanh layers.

Custom Example

Here is a complete one-input controller workflow for a tiny two-state room model. The controller reads the current temperature error and error rate, computes u = -Kx, clips the actuator command, emits C, and benchmarks the generated artifact.

1. Write the spec

# custom_room_lqr.yaml
# x[0] = temperature error, x[1] = error rate
# u[0] = heater/cooler command clipped to [-1, 1]
policy: lqr
name: room_temperature
state_dim: 2
control_dim: 1
state_name: x
control_name: u
K:
  - [0.75, 0.18]
u_min: [-1.0]
u_max: [1.0]

2. Inspect and validate

# Fail early if dimensions or bounds are wrong.
controlkit validate custom_room_lqr.yaml

# Inspect the lowered module before generation.
controlkit inspect custom_room_lqr.yaml

# Expected shape summary:
# policy: lqr
# state_dim: 2
# control_dim: 1
# controllers: 1

3. Compile to C

# Emit deterministic C source and header files.
controlkit compile custom_room_lqr.yaml \
  --target c \
  --output build/room_c

# Generated artifact surface:
# void control_step(
#   const float x[CONTROLKIT_STATE_DIM],
#   float u[CONTROLKIT_CONTROL_DIM]
# );

4. Benchmark the artifact

# Compare Python reference latency with generated C.
controlkit benchmark custom_room_lqr.yaml \
  --output build/room_bench \
  --iterations 1000 \
  --warmup-iterations 10

# Report includes:
# python_ns_per_call
# c_ns_per_call
# operation_count
# memory_footprint_bytes

Aerospace Case Study

Consider a small satellite attitude loop where a reaction wheel applies torque around a single body axis. The controller state is angle error and angular-rate error; the output is a bounded wheel torque command. This is an illustrative deployment example, not a flight-certified controller.

Plant abstraction

# State vector:
# x[0] = attitude error, radians
# x[1] = angular-rate error, rad/s
#
# Control:
# u[0] = reaction-wheel torque command
#
# Control law:
# u = clip(-Kx, -0.025, 0.025)

Controller spec

policy: lqr
name: reaction_wheel_axis
state_dim: 2
control_dim: 1
state_name: attitude_error
control_name: wheel_torque
K:
  - [0.042, 0.318]
u_min: [-0.025]
u_max: [0.025]

Generated boundary

# Firmware calls one deterministic function.
void control_step(
    const float x[2],
    float u[1]
);

# No heap allocation.
# Fixed loop bounds.
# Saturation emitted inline.

What gets checked

# K must be 1 x 2.
# u_min/u_max must have length 1.
# generated C is benchmarked against Python.
# report includes latency and operation count.

Backends

The C backend emits standalone .h and .c files. The Rust backend emits fixed-array source that keeps a #![no_std]-compatible style where possible. Both expose the same basic control surface:

void control_step(
    const float x[CONTROLKIT_STATE_DIM],
    float u[CONTROLKIT_CONTROL_DIM]
);

Generated C uses straightforward loops and float arrays. Generated Rust uses array references and mutable output buffers. RL Tanh is target-specific: C calls tanhf, while Rust emits a small deterministic approximation to avoid a standard-library math dependency.

for (int row = 0; row < CONTROLKIT_CONTROL_DIM; row++) {
    float acc = 0.0f;
    for (int col = 0; col < CONTROLKIT_STATE_DIM; col++) {
        acc += K[row][col] * x[col];
    }
    u[row] = -acc;
}

#![no_std]

pub fn control_step(
    x: &[f32; CONTROLKIT_STATE_DIM],
    u: &mut [f32; CONTROLKIT_CONTROL_DIM],
) {
    // generated fixed-shape controller body
}

The generated code favors boring loops over cleverness. That is intentional: embedded controller code should be inspectable, deterministic, and easy to compare against the mathematical controller that produced it.

Optimization

The optimizer performs symbolic simplification over the expression IR. The pass is small, but it establishes the compiler shape: rewrite only when the transformation preserves controller semantics and improves generated work.

Constant folding

Pure constant expressions collapse before backend emission.

Algebraic identities

Rules like x + 0 -> x, x * 1 -> x, and x * 0 -> 0 reduce noise.

Dead computation

Unused expression nodes can be pruned from generated work.

Loop unrolling

Small controllers can opt into expanded loops when the target benefits.

before: add(matvec(K, x), vector([0.0, 0.0]))
after:  matvec(K, x)

Benchmarks

ControlKit now includes a closed-loop benchmark suite, not just isolated function timing. Each case has a problem statement, deterministic model, controller spec, local runner, expected results, and JSON/Markdown reports.

Benchmark cases 4

Metrics runtime, error, effort

Reports JSON, Markdown

double_integrator_lqr Two-state position/velocity stabilization with acceleration saturation.

cartpole_lqr_linearized Small-angle cartpole stabilization around upright equilibrium.

rocket_hover_lqr Toy hover linearization with thrust and torque commands.

pid_mass_spring_damper Reference PID regulation of a damped mass-spring system.

{
  "benchmark_name": "double_integrator_lqr",
  "controller_type": "lqr",
  "dt": 0.05,
  "horizon_steps": 120,
  "mean_runtime_us": 1.26,
  "max_runtime_us": 5.50,
  "p95_runtime_us": 1.66,
  "final_state_norm": 0.002575,
  "max_state_norm": 1.043344,
  "total_control_effort": 1.410541,
  "passed": true,
  "failure_reason": ""
}

The suite measures controller runtime, final stabilization error, maximum state norm, total control effort, and pass/fail criteria. When generated C is available, the runner also records generated-code runtime beside the Python reference path.

Run one case

# Closed-loop simulation + report output.
controlkit benchmark \
  benchmarks/double_integrator_lqr/controller.yaml

Run the suite

# Runs every folder under benchmarks/.
controlkit benchmark --all

Report files

outputs/benchmarks//results.json
outputs/benchmarks//report.md

Verification

Verification adds a static assurance layer before deployment. It checks matrix shapes, closed-loop stability, actuator and state constraints, numerical robustness, and keeps a generated-code consistency hook ready for target execution adapters.

Dimensions Ensures A is square, B matches A, and K is compatible with u = -Kx.

Stability Checks A_cl = A - BK; discrete systems require spectral radius below one, continuous systems require negative real eigenvalues.

Constraints Validates lower/upper bounds and warns when feedback can exceed actuator limits before saturation.

Numerics Rejects NaN/Inf values, reports condition numbers, and warns near stability boundaries.

# Emit machine-readable and human-readable verification reports.
controlkit verify benchmarks/double_integrator_lqr/controller.yaml

{
  "controller_name": "double_integrator_lqr",
  "system_type": "discrete",
  "passed": true,
  "stability_margin": 0.048685,
  "warnings": [
    "closed-loop eigenvalues are close to the stability boundary"
  ],
  "recommendations": [
    "Review warnings before deploying to constrained hardware."
  ]
}

Takeaways

ControlKit is intentionally not a giant modeling environment. It is a transparent compiler layer for the deployment boundary: validate the controller, preserve its semantics in IR, emit readable source, and benchmark the result.

The project is still pre-alpha, but the shape is now visible: control policies can be treated like programs, and embedded controller deployment can look much more like a compiler workflow.

Open GitHub