Reinforcement Learning from Human Feedback (RLHF)

📖One-line summary

A reinforcement learning technique that aligns model behavior using human preference feedback.

💡Easy explanation

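A training method where humans compare model answers and pick the better one. Those choices train a reward model that scores answers, and the language model is then tuned to produce answers that score well. It's a major reason ChatGPT answers politely and helpfully rather than erratically.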

✨Example

Tuning a model with human feedback

👍 A

A polite answer

👎 B

A rude answer

↓ A human picks A ↓

🧠 The model learns toward A
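
The A-versus-B choice above can be sketched in code. This is a minimal illustration of the reward-model step only, assuming a pairwise (Bradley-Terry style) preference loss, which is a common choice for this step. The two-number features, the function names, and the learning rate are all hypothetical, and the later reinforcement learning step that tunes the language model against the reward is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    # Toy linear reward model: a weighted sum of answer features.
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    # Pairwise loss: small when the chosen answer outscores the rejected one.
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train_reward_model(pairs, n_features, lr=0.5, epochs=200):
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Derivative of the loss with respect to the margin.
            grad_scale = sigmoid(margin) - 1.0
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical features for illustration: [politeness, rudeness]
answer_a = [1.0, 0.0]  # polite answer, picked by the human
answer_b = [0.0, 1.0]  # rude answer, rejected by the human

pairs = [(answer_a, answer_b)]
weights = train_reward_model(pairs, n_features=2)

print("reward for A:", reward(weights, answer_a))
print("reward for B:", reward(weights, answer_b))
```

After training, the reward model scores A above B, which is the signal the language model would then be tuned to follow.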

Pre-training

Direct Preference Optimization

Other terms in this category

Chain of Thought (CoT)