📖 Step 9: AI/LLM#274 / 350

Multimodal

📖One-line summary

A model that handles images, audio, or video alongside text.

An AI that understands images, audio, and video alongside text. The reason you can show it a photo and ask "what's this?"

📝

Text

🖼️

Image

🎵

Audio

🎬

Video

↓

🤖 One single model

Write a multimodal prompt that turns a user-uploaded receipt image into JSON of items and amounts.

Design a pipeline that uses a multimodal model to extract tables from PDFs, and clarify how it differs from OCR.

Design a system that auto-classifies and routes mixed image + text user inquiries.

Try these prompts in your AI coding assistant!