  ┌────┐ ┌────┐
  │▓▓▓▓│~│▓▓▓▓│
  └────┘ └────┘

Local LLM Memory Calculator

Estimate memory requirements and decode speed for running LLMs locally.

Smallest Qwen3.5 model. Text + vision. Good for edge devices.
Weight Quant: Q8
KV Cache Quant: Q8
Context: 256K
Batch: 1
Framework: llama.cpp

RTX 3060: 360 GB/s

BF16: 51 TFLOPS

8.7 GB
Qwen3.5-0.8B @ Q8, KV Q8, 256K ctx, batch 1
Memory usage: 12 GB available
0.7 GB + 7.0 GB + 1.0 GB = 8.7 GB
Fits — 8.7 GB / 12 GB
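
The fit check amounts to summing the memory components and comparing the total against the available memory. A minimal sketch in Python, assuming the three figures in the breakdown are additive parts of the 8.7 GB total; the function name `fits_in_memory` is hypothetical and not taken from the calculator:

```python
def fits_in_memory(components_gb, available_gb):
    """Sum memory components (in GB) and compare against available memory.

    Assumption: the components are simply additive. The calculator's own
    accounting may differ.
    """
    total_gb = round(sum(components_gb), 1)  # round to match the 0.1 GB display precision
    return total_gb, total_gb <= available_gb


# Figures from the readout: three components on a 12 GB card.
total, fits = fits_in_memory([0.7, 7.0, 1.0], available_gb=12)
print(f"{'Fits' if fits else 'Does not fit'}: {total} GB / 12 GB")
```

Rounding to one decimal place keeps the comparison aligned with what the page displays; a stricter check would compare unrounded byte counts.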
Est. Token Generation Speed
Token generation only · Prompt processing depends on compute (TFLOPS)
300.0 tok/s
via llama.cpp · ±30%
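
Token generation is typically limited by memory bandwidth rather than compute, because each new token requires reading the model weights (and the relevant KV cache) from memory. A rough sketch of that reasoning in Python follows. It is not the calculator's actual formula, which is not shown on the page; the function names and the weights-only assumption in the example are hypothetical:

```python
def decode_tok_per_s(bandwidth_gb_s, gb_read_per_token):
    """Bandwidth-bound ceiling: tokens/s = bandwidth / data read per token.

    Assumption: decode is purely memory-bound and every token reads
    `gb_read_per_token` gigabytes. Real throughput is lower.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


def with_margin(estimate, margin=0.30):
    """Return the (low, high) band for a +/- margin, e.g. the page's 30%."""
    return estimate * (1 - margin), estimate * (1 + margin)


# Idealized ceiling for 360 GB/s if only 0.7 GB were read per token
# (an illustrative assumption, not a figure the calculator states).
ceiling = decode_tok_per_s(360, 0.7)

# The +/-30% band around the 300 tok/s estimate shown on the page.
low, high = with_margin(300.0)
print(f"ceiling: {ceiling:.0f} tok/s, band: {low:.0f} to {high:.0f} tok/s")
```

The ceiling is an upper bound only; kernel overheads, KV cache reads at long context, and framework behaviour all pull the real figure down, which is presumably why the page reports a lower number with an error band.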