Quantization
One-line summary
A technique that reduces model size and computation by lowering the numerical precision of weights.
Easy explanation
Quantization stores each weight with fewer bits, for example as an 8-bit integer instead of a 32-bit float, so the model takes up less memory. It is like compressing a high-res photo: quality barely drops, but it loads much faster.
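To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor INT8 quantization, in plain Python. The function names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions, not something this card specifies. Real toolkits use more refined variants (per-channel scales, calibration data, and so on).

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    # The scale is the float value represented by one integer step.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


# Illustrative weights, chosen only for this example.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Each restored weight differs from the original by at most half a step (`scale / 2`), which is the "quality barely drops" part of the photo analogy. Very small weights such as 0.003 round to zero here, which is one reason practical schemes use finer-grained scales.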
Example
Making a model lighter by lowering precision
FP32: 140GB
FP16: 70GB
INT8: 35GB
INT4: 17GB
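The sizes above follow from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below reproduces them under the assumption of a 35-billion-parameter model. That count is inferred from the 140GB FP32 figure, not stated on the card, and the helper name `weight_size_gb` is hypothetical.

```python
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}

# Assumption: 35 billion parameters, since 35e9 weights * 4 bytes = 140 GB.
NUM_PARAMS = 35_000_000_000


def weight_size_gb(num_params, bits):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_size_gb(NUM_PARAMS, bits):.1f} GB")
```

The INT4 result is 17.5 GB, which the list shows rounded down to 17GB. Real quantized files are usually somewhat larger than this estimate because they also store scale factors and other metadata.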