🔥 EMBER
Ephemeral Model-Based Burst Execution Runtime

Predictable compute for an energy-constrained world.

#distributed-systems #inference #python-3.14 #grpc #fault-tolerance

EMBER is a hierarchical control plane that governs high-variance Python inference tasks across a distributed regional fabric. It replaces ad-hoc offloading with deterministic routing, balancing local thermal budgets against regional energy costs so the system behaves predictably even when demand doesn't.

Performance is a side effect. Predictability is the goal.


What it enforces


Phase 3: Steel Thread

Baseline validation proving each architectural pillar under test.

gRPC p99 RTT (20ms budget): 1.96ms
43-worker boot time: 0.09s
System delta (400MB shared): 2MB
RTT in OPEN state (zero overhead): 0.00ms

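
For readers unfamiliar with how a p99 figure like the one above is derived from a batch of pings, here is a minimal sketch using the nearest-rank method. The source does not say which percentile method the test used, so treat the method and the function names (`percentile_nearest_rank`, `within_budget`) as assumptions made for illustration.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the nearest-rank percentile of a list of latencies (ms).

    pct is an integer percentile such as 99. The nearest-rank method is an
    assumption here; the source does not state which method was used.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank of the smallest sample that covers pct% of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(samples_ms, budget_ms=20.0, pct=99):
    """True when the chosen percentile does not exceed the latency budget."""
    return percentile_nearest_rank(samples_ms, pct) <= budget_ms
```

With 1,000 samples, the p99 under this method is the 990th value in sorted order, so up to ten outliers can exceed the budget without failing the check.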
Ticket                                                          Status
Skeleton gRPC & Baseline Network Latency                        PASS
  1,000-ping test, p99 = 1.96ms within 20ms budget
Admission Controller & Hard Concurrency Limit                   PASS · 7/7
  Concurrency wall, 100ms dead letter TTL, 90%/75% hysteresis
Circuit Breaker & Failure Injection                             PASS · 8/8
  Abort-early at 22ms, 3-failure trip, half-open recovery
Local Execution Substrate                                       PASS · 6/6
  43-worker boot, 400MB shared memory, 450ms task leasing
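
The 90%/75% hysteresis listed for the admission controller can be illustrated with a small sketch. It assumes the percentages are utilization thresholds on in-flight work: the gate starts shedding at 90% and admits again only once utilization falls back to 75%. The class `AdmissionGate`, its fields, and the choice of what counts as "capacity" are hypothetical and not taken from the EMBER code.

```python
class AdmissionGate:
    """Hysteresis gate: start shedding at the high mark, resume at the low mark.

    Thresholds are integer percentages so comparisons stay exact.
    What "capacity" measures is an assumption of this sketch.
    """

    def __init__(self, capacity, high_pct=90, low_pct=75):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not 0 < low_pct < high_pct <= 100:
            raise ValueError("require 0 < low_pct < high_pct <= 100")
        self.capacity = capacity
        self.high_pct = high_pct
        self.low_pct = low_pct
        self.in_flight = 0
        self.shedding = False

    def _update(self):
        used = self.in_flight * 100  # compare against pct * capacity
        if not self.shedding and used >= self.high_pct * self.capacity:
            self.shedding = True
        elif self.shedding and used <= self.low_pct * self.capacity:
            self.shedding = False

    def try_admit(self):
        """Admit one task if the gate is open; return whether it was admitted."""
        self._update()
        if self.shedding or self.in_flight >= self.capacity:
            return False
        self.in_flight += 1
        self._update()
        return True

    def release(self):
        """Mark one task as finished and re-evaluate the gate."""
        if self.in_flight > 0:
            self.in_flight -= 1
        self._update()
```

The gap between the two thresholds is what prevents flapping: a gate that reopened at 89% would toggle on nearly every request when load hovers near the limit.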

Architecture

Edge + Regional topology. Chicago is an optimization layer, not a dependency.

Client
  │
  ▼
Edge Controller (Columbus)
  │
  ├── Admission Gate         90%/75% hysteresis, 22-slot queue
  │
  ├── Routing Client ─────── RouteRequest ──→ Chicago Regional
  │                           20ms timeout / circuit breaker
  │
  ├── Bounded Queue          Dead letter TTL: 100ms
  │
  ├── Worker Scheduler       Task leasing: 450ms reclaim
  │
  └── Worker Pool (43)       Shared model weights via mmap
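
The Routing Client branch is where "optimization layer, not a dependency" gets enforced. Below is a minimal sketch of a breaker with the 3-failure trip and half-open recovery named in the Phase 3 results. While the breaker is open, the regional call is skipped entirely and the local fallback answers. The `CircuitBreaker` class, the `cooldown_s` value, and the `remote`/`fallback` callables are hypothetical; the source does not specify a cooldown, and a real client would raise on the 20ms gRPC deadline rather than on an arbitrary exception.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # regional calls flow normally
    OPEN = "open"            # regional calls are skipped
    HALF_OPEN = "half_open"  # one probe call is allowed through


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # 3-failure trip (from the source)
        self.cooldown_s = cooldown_s                # hypothetical; not in the source
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, remote, fallback):
        """Try the regional path; fall back locally on failure or while open."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_s:
                return fallback()          # no remote call, so no added RTT
            self.state = State.HALF_OPEN   # cooldown elapsed: allow one probe
        try:
            result = remote()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_success(self):
        self.state = State.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Passing the clock in as a parameter keeps the breaker testable without sleeping, which suits a failure-injection suite.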

What's next

The steel thread is complete. Every architectural pillar is validated. Upcoming work focuses on the pieces the design report flagged as still missing.