Quantization

📖 One-line summary

A technique that reduces model size and computation by lowering the numerical precision of weights.

It works like compressing a high-res photo: the file gets much smaller and loads faster, usually with only a small loss in quality. The loss tends to be minor at 8-bit precision and grows as precision drops further, for example to 4 bits.
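To make the idea concrete, here is a minimal sketch of one common scheme: symmetric, per-tensor INT8 quantization. The function names `quantize_int8` and `dequantize_int8` and the sample weights are illustrative assumptions, not taken from any particular library. Real toolkits typically use per-channel or per-group scales and more careful calibration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Hypothetical helper for illustration (symmetric, per-tensor scheme).
    """
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; guard against all zeros.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 0.9]  # made-up sample values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding error per weight is at most half a quantization step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is now stored as one 8-bit integer plus a single shared scale, which is where the size reduction comes from. The small value 0.003 rounds to zero, a simple case of the quality loss that lower precision introduces.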

Lower the precision to make the model lighter

FP32: 140 GB
FP16: 70 GB
INT8: 35 GB
INT4: 17 GB
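The sizes listed above follow from simple arithmetic: number of parameters times bits per weight, divided by 8 to get bytes. The sketch below assumes a model of about 35 billion parameters, which is inferred from the 140 GB FP32 figure (140 GB / 4 bytes per weight) and is not stated in the source. The name `model_size_gb` is hypothetical, and 1 GB is taken as 10^9 bytes.

```python
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}

# Assumption: parameter count inferred from 140 GB at 4 bytes per weight.
NUM_PARAMS = 35e9


def model_size_gb(num_params, bits_per_weight):
    """Weights-only size in GB, with 1 GB = 10**9 bytes."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {model_size_gb(NUM_PARAMS, bits):.1f} GB")
```

Under this assumption INT4 works out to 17.5 GB, which the list rounds down to 17 GB. This counts weights only; real quantized files are somewhat larger because they also store scales and other metadata.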