Quantization

📖 One-line summary

A technique that reduces model size and computation by lowering the numerical precision of weights.

💡 Easy explanation

Quantization stores model weights at lower numerical precision so the model takes up less memory. It is like compressing a high-resolution photo: quality usually drops only slightly, but the file is much smaller and loads much faster.
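To make "lowering precision" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is one common scheme and not necessarily the one any particular library uses. The function names `quantize_int8` and `dequantize` and the sample weights are made up for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer step bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now needs one byte instead of four, and the restored values differ from the originals by at most half a quantization step (`scale / 2`).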

✨ Example

์ •๋ฐ€๋„๋ฅผ ๋‚ฎ์ถฐ ๋ชจ๋ธ ๊ฒฝ๋Ÿ‰ํ™”

FP32: 140 GB
FP16: 70 GB
INT8: 35 GB
INT4: 17.5 GB
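The sizes above follow from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below assumes a model of about 35 billion parameters, a figure inferred from the 140 GB FP32 entry and not stated in the source. It also counts weights only and takes 1 GB as 10^9 bytes. `model_size_gb` is a hypothetical helper name.

```python
# Bits used to store one weight in each format.
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def model_size_gb(num_params, fmt):
    """Weights-only size in GB, with 1 GB taken as 10**9 bytes."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


# Assumption: 35 billion parameters, inferred from 140 GB at FP32.
NUM_PARAMS = 35_000_000_000

sizes = {fmt: model_size_gb(NUM_PARAMS, fmt) for fmt in BITS_PER_WEIGHT}
for fmt, gb in sizes.items():
    print(f"{fmt}: {gb:g} GB")
```

Halving the bit width halves the size. Real quantized checkpoints are typically a little larger than this estimate because they also store scales and other metadata.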