FFN vs KV-cache Bottleneck Analysis
Notebook: Gemma 3N E4B
Context-length sweep for FFN weight reads, KV-cache reads, measured component shares, and bottleneck movement.
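The two read volumes swept here scale differently with context length, which a back-of-the-envelope model makes concrete. The sketch below is a generic dense-transformer estimate, not the notebook's actual accounting: it assumes a gated FFN with three weight matrices per layer and a plain full-attention KV cache. All dimensions are hypothetical placeholders, not Gemma 3N E4B's real configuration.

```python
def ffn_bytes_per_token(n_layers, d_model, d_ff, bytes_per_weight, n_matrices=3):
    """Bytes of FFN weights streamed per decoded token.

    Assumes a gated FFN (gate, up, down projections), hence n_matrices=3.
    This does not depend on context length.
    """
    return n_layers * n_matrices * d_model * d_ff * bytes_per_weight


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem, context_len):
    """Bytes of cached keys and values read per decoded token.

    The factor 2 covers keys plus values. Assumes every layer attends over
    the full context, so the result grows linearly with context_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len


# Hypothetical toy dimensions, chosen only to show the trend.
toy = dict(n_layers=4, d_model=8, d_ff=32)
ffn = ffn_bytes_per_token(bytes_per_weight=1, **toy)
for context_len in (16, 64, 256):
    kv = kv_bytes_per_token(n_layers=4, n_kv_heads=2, head_dim=4,
                            bytes_per_elem=2, context_len=context_len)
    print(f"context={context_len:4d}  FFN bytes={ffn}  KV bytes={kv}")
```

Under these assumptions the FFN term is constant while the KV term rises with context, which is the mechanism behind a bottleneck that moves as prompts get longer.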
Experiment data
1,218.0 ms: average decode mean across experiment runs
26.2 pp: FFN share reduction from short to long context
28.0 pp: KV-cache attention share increase
Decode latency breakdown
Per-run decode component timing from experiment_results.csv.
Components: FFN, QKV, output projection, attention, runtime overhead.
FFN versus KV-cache bottleneck shift
Component share from context_sweep_results.csv as context length grows from short to long prompts.
Series: FFN share, KV-cache attention share.
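A component share and its shift in percentage points (pp) can be computed from per-component decode timings roughly as follows. This is a minimal sketch: the component names and timing values are made up for illustration, and the column layout of context_sweep_results.csv is not assumed.

```python
def component_shares(timings_ms):
    """Return each component's share of total decode time, in percent."""
    total = sum(timings_ms.values())
    if total <= 0:
        raise ValueError("total decode time must be positive")
    return {name: 100.0 * ms / total for name, ms in timings_ms.items()}


def share_shift_pp(short_ctx, long_ctx, component):
    """Change in a component's share from short to long context, in pp."""
    return component_shares(long_ctx)[component] - component_shares(short_ctx)[component]


# Hypothetical timings (ms per token); not taken from the experiment data.
short_ctx = {"ffn": 60.0, "attention": 10.0, "other": 30.0}
long_ctx = {"ffn": 60.0, "attention": 60.0, "other": 30.0}

for name in ("ffn", "attention"):
    print(f"{name}: {share_shift_pp(short_ctx, long_ctx, name):+.1f} pp")
```

Note that the FFN share can fall even when FFN time is unchanged, because the share is relative to a total that grows with attention time.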
CPU-thread scaling
Decode throughput and FFN timing from thread_scaling_results.csv.
Series: decode tok/s, FFN ms trend.
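One common way to read a thread-scaling curve is to convert throughput into speedup and parallel efficiency relative to the smallest thread count. The sketch below assumes a simple mapping from thread count to decode tok/s. The numbers are hypothetical and the layout of thread_scaling_results.csv is not assumed.

```python
def scaling_table(tok_per_s_by_threads):
    """Return (threads, speedup, efficiency) rows relative to the fewest threads.

    speedup    = throughput / baseline throughput
    efficiency = speedup / (threads / baseline threads)
    """
    base_threads = min(tok_per_s_by_threads)
    base_tps = tok_per_s_by_threads[base_threads]
    rows = []
    for threads in sorted(tok_per_s_by_threads):
        speedup = tok_per_s_by_threads[threads] / base_tps
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, speedup, efficiency))
    return rows


# Hypothetical throughput values (tok/s); not taken from the experiment data.
measurements = {1: 2.0, 2: 3.6, 4: 4.8}
for threads, speedup, efficiency in scaling_table(measurements):
    print(f"threads={threads}  speedup={speedup:.2f}x  efficiency={efficiency:.0%}")
```

Efficiency that drops as threads are added is consistent with a memory-bandwidth-bound decode, though it does not prove it on its own.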
GEMV, GEMM, and memory-bandwidth relationship
Normalized curves from bottleneck_analysis.csv compare FFN weight-read pressure, KV-cache attention reads, and decode latency.
Series: FFN GEMV read/tok (max 1,680.0 MB), KV-cache read/tok (max 44.98 MB), decode latency (max 1,372.1 ms).
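Quantities with different units (MB per token, ms) can share one axis if each series is divided by its own maximum. The sketch below shows that normalization under the assumption that this is how the curves were scaled. The series values are hypothetical and do not come from bottleneck_analysis.csv.

```python
def normalize_by_max(values):
    """Scale a series so its largest value becomes 1.0."""
    peak = max(values)
    if peak == 0:
        # An all-zero series has no scale; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Hypothetical series over increasing context length.
series = {
    "ffn_read_mb": [100.0, 100.0, 100.0],   # flat: weights are re-read every token
    "kv_read_mb": [1.0, 2.0, 4.0],          # grows with context
    "decode_latency_ms": [50.0, 60.0, 80.0],
}

for name, values in series.items():
    scaled = ", ".join(f"{v:.2f}" for v in normalize_by_max(values))
    print(f"{name}: {scaled}")
```

Because each curve is scaled independently, the normalized plot compares shapes only. Absolute magnitudes still have to be read from the per-series maxima.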