SYSTEM_STATE: READY

DISK-LLM RUNTIME Qwen3.5-9B Adapter

This page is an illustrative telemetry surface, not the authoritative benchmark record. The corrected benchmark framing and archived Modal artifact live on the homepage and documentation pages.

METRIC_STREAM_01

LAYER LATENCY MAP

AVG: 198.5ms

TRANSFORMER_L1 201.2ms

TRANSFORMER_L2 185.1ms

TRANSFORMER_L3 350.8ms

TRANSFORMER_L4 190.4ms

TRANSFORMER_L5 205.2ms
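
As a rough illustration of how the layer rows above could be summarized, the sketch below averages the five listed latencies and flags any layer well above the median. The 1.5x-median threshold and the function name `flag_spikes` are assumptions made for this example, not part of the runtime. The five listed rows average to about 226.5 ms, so the AVG figure shown on the panel presumably covers more layers than the ones listed here.

```python
from statistics import mean, median

# Values copied from the LAYER LATENCY MAP rows (milliseconds).
layer_latency_ms = {
    "TRANSFORMER_L1": 201.2,
    "TRANSFORMER_L2": 185.1,
    "TRANSFORMER_L3": 350.8,
    "TRANSFORMER_L4": 190.4,
    "TRANSFORMER_L5": 205.2,
}


def flag_spikes(latencies, factor=1.5):
    """Return layers slower than factor x the median (hypothetical threshold)."""
    baseline = median(latencies.values())
    return [name for name, ms in latencies.items() if ms > factor * baseline]


shown_mean = mean(layer_latency_ms.values())
spikes = flag_spikes(layer_latency_ms)

print(f"mean of listed layers: {shown_mean:.2f} ms")
print("layers flagged as spikes:", spikes)
```

Using the median as the baseline keeps a single slow layer from inflating its own threshold, which is why only TRANSFORMER_L3 is flagged here.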

CAPACITY_GAUGE

MEMORY PRESSURE

70.4%

RAM_OCCUPANCY

ALLOCATED

504.4 GB

RESERVED

11.3 GB

SPATIAL_ALLOCATION

LIVE TENSOR MAP

MAPPED

UNMAPPED

FRAGMENTED

RUNTIME_TERMINAL_OUTPUT // TTY1

[INFO] Initializing Disk-LLM runtime pipeline...

[INFO] Loading manifest: ./packed-qwen35/manifest.json

[INFO] Mapping tensors via numpy.memmap...

[INFO] Disk-swapping layer_012 weights (~430MB per layer block)

[WARN] High memory pressure detected — RSS pinned at 11264 MB

Processing batch_size=1, prompt_tokens=128

First token latency: 126.61s

[INFO] Latency spike in L3 attributed to sequential I/O page faults.

[INFO] Throughput: 0.016 tokens/sec

Generation complete: 2 tokens produced
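
The log above mentions loading a manifest and mapping tensors via `numpy.memmap`. A minimal sketch of that pattern follows. The manifest schema used here (`tensors`, `name`, `file`, `dtype`, `shape`, `offset`) and the helper `map_tensors` are hypothetical, chosen only for illustration; the actual layout of `./packed-qwen35/manifest.json` is not shown on this page.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def map_tensors(manifest_path):
    """Open every tensor listed in a manifest as a read-only numpy.memmap.

    The manifest schema is an assumption made for this sketch.
    """
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text())
    tensors = {}
    for entry in manifest["tensors"]:
        tensors[entry["name"]] = np.memmap(
            manifest_path.parent / entry["file"],
            dtype=entry["dtype"],
            mode="r",                      # read-only: weights are never modified
            offset=entry["offset"],        # byte offset of the tensor in its file
            shape=tuple(entry["shape"]),
        )
    return tensors


def demo():
    """Write a tiny fake layer to disk, map it, and read back one row."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        weights = np.arange(12, dtype=np.float32).reshape(3, 4)
        weights.tofile(root / "layer_000.bin")
        manifest = {
            "tensors": [
                {
                    "name": "layer_000.weight",
                    "file": "layer_000.bin",
                    "dtype": "float32",
                    "shape": [3, 4],
                    "offset": 0,
                }
            ]
        }
        (root / "manifest.json").write_text(json.dumps(manifest))

        tensors = map_tensors(root / "manifest.json")
        # Copy a single row into RAM; the rest of the file stays on disk.
        row = np.array(tensors["layer_000.weight"][1])
        del tensors  # release the mapping before the temp directory is removed
        return row


row = demo()
print(row)
```

Because the mapping is lazy, only the pages that are actually indexed get read from disk, which is consistent with the page-fault cost the log attributes to the L3 latency spike.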