SYSTEM_STATE: READY
DISK-LLM RUNTIME
Qwen3.5-9B Adapter
This page is an illustrative telemetry surface, not the authoritative benchmark record. The corrected benchmark framing and archived Modal artifact live on the homepage and documentation pages.
METRIC_STREAM_01
LAYER LATENCY MAP
AVG: 198.5ms
TRANSFORMER_L1201.2ms
TRANSFORMER_L2185.1ms
TRANSFORMER_L3350.8ms
TRANSFORMER_L4190.4ms
TRANSFORMER_L5205.2ms
MEMORY PRESSURE
70.4%
RAM_OCCUPANCY
ALLOCATED
504.4 GB
RESERVED
11.3 GB
SPATIAL_ALLOCATION
LIVE TENSOR MAP
MAPPED
UNMAPPED
FRAGMENTED
terminal
RUNTIME_TERMINAL_OUTPUT // TTY1
[INFO] Initializing Disk-LLM runtime pipeline...
[INFO] Loading manifest: ./packed-qwen35/manifest.json
[INFO] Mapping tensors via numpy.memmap...
[INFO] Disk-swapping layer_012 weights (~430MB per layer block)
[WARN] High memory pressure detected — RSS pinned at 11264 MB
Processing batch_size=1, prompt_tokens=128
First token latency: 126.61s
[INFO] Latency spike in L3 attributed to sequential I/O page faults.
[INFO] Throughput: 0.016 tokens/sec
Generation complete: 2 tokens produced
_