FFN vs KV-cache Bottleneck Analysis | Altifigence™ Research

Notebook: Gemma 3N E4B

Experiment data

1,218.0 ms: Average decode mean across experiment runs

26.2 pp: FFN share reduction from short to long context

28.0 pp: KV-cache attention share increase

Per-run decode component timing from experiment_results.csv.

Series: FFN, QKV, Output projection, Attention, Runtime overhead

Component share from context_sweep_results.csv as context length grows from short to long prompts.

Series: FFN share, KV-cache attention share
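As a rough illustration of how a component share and its percentage-point (pp) change could be derived from per-component decode timings, here is a minimal Python sketch. The function names, component keys, and timing values are hypothetical and are not taken from context_sweep_results.csv; the sketch also assumes a share is the component's time divided by the sum of all component times.

```python
def component_shares(timings_ms):
    """Return each component's share of total decode time, in percent."""
    total = sum(timings_ms.values())
    return {name: 100.0 * ms / total for name, ms in timings_ms.items()}


def pp_change(short_share, long_share):
    """Percentage-point change between two shares (long minus short)."""
    return long_share - short_share


# Hypothetical per-component timings in ms, for illustration only.
short_ctx = {"ffn": 80.0, "attention": 10.0, "other": 10.0}
long_ctx = {"ffn": 60.0, "attention": 30.0, "other": 10.0}

short_shares = component_shares(short_ctx)
long_shares = component_shares(long_ctx)

ffn_delta = pp_change(short_shares["ffn"], long_shares["ffn"])
attn_delta = pp_change(short_shares["attention"], long_shares["attention"])
print(f"FFN share change: {ffn_delta:+.1f} pp")
print(f"Attention share change: {attn_delta:+.1f} pp")
```

A percentage-point change is a plain difference of two shares, so a drop from 80% to 60% is 20 pp, not 25%.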

Decode throughput and FFN timing from thread_scaling_results.csv.

Series: Decode tok/s, FFN ms trend

Normalized curves from bottleneck_analysis.csv compare FFN weight-read pressure, KV-cache attention reads, and decode latency.

Series: FFN GEMV read/tok (max 1,680.0 MB); KV-cache read/tok (max 44.98 MB); Decode latency (max 1,372.1 ms)
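Because each series is listed with its maximum, the curves may have been normalized by dividing every point by that series' peak so that quantities in MB and ms can share one axis. The page does not confirm this, so the Python sketch below is one plausible reading. The helper name and sample values are hypothetical and are not taken from bottleneck_analysis.csv.

```python
def normalize_by_max(values):
    """Scale a series so its largest value maps to 1.0."""
    peak = max(values)
    if peak == 0:
        # Avoid division by zero for an all-zero series.
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Hypothetical sample points, for illustration only.
kv_read_mb = [5.0, 10.0, 20.0, 40.0]
latency_ms = [900.0, 1000.0, 1100.0, 1200.0]

kv_norm = normalize_by_max(kv_read_mb)
latency_norm = normalize_by_max(latency_ms)
print(kv_norm)
print(latency_norm)
```

Under this scheme each curve peaks at 1.0, so the plot compares the shape of each trend rather than its absolute magnitude.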