Build Multimodal Generative AI Applications

IBM

Explore multimodal AI combining text, images, and speech to build advanced and interactive generative AI applications.

Unknown · 3 weeks · English

Free

About this Course

Ready to level up your GenAI skills? Step into the exciting world of multimodal AI, where language, images, and speech come together to build smarter, more interactive applications. In this hands-on course, you'll learn how to build systems that work across multiple modalities, from creating AI-powered storytellers and meeting assistants to developing image captioning tools and video generation apps. You'll gain experience with real-world tools like IBM's Granite, OpenAI's Whisper, Sora and D

What You'll Learn

Build skills to develop multimodal generative AI applications
Understand fundamental concepts and challenges of multimodal AI
Build AI applications using models like Granite, Whisper, and DALL·E
Develop chatbots and image/video generation models using advanced frameworks

Instructors

Hailey Quach

Ricky Shi

Topics

Web Development

OpenAI API

Application Development

Multimodal Prompts

Flask (Web Framework)

Generative Model Architectures

Software Development

Prompt Engineering

Web Applications

LLM Application

Course Info

Platform: Coursera

Level: Unknown

Pacing: Unknown

Price: Free

Skills

Web Development

OpenAI API

Application Development

Multimodal Prompts

Flask (Web Framework)

Generative Model Architectures

Software Development

Prompt Engineering

Web Applications

LLM Application
