Whisk: Google's Image-Based AI Generator
Reimagining AI Input: Reducing the Stress of Writing Text Prompts for AI Image Generation
What is happening?
Google Labs launched Whisk, an image-to-image generator that lets users upload photos and get back a single AI-generated image that combines them.
What problems does Whisk solve?
Traditional AI image generators require carefully crafted text prompts, creating a high barrier to entry
Many users struggle to translate their visual ideas into effective text descriptions
Artists and creatives need to explore visual ideas quickly
How did they solve the problems?
Whisk breaks image generation down into three simple parts:
what you want to create (subject)
where you want it to be (scene)
how you want it to look (style)
Each part can be defined by dropping in an image or picking from Google's suggestions
Text prompts are still there if users want them, but they're optional – not required
Users can edit the prompts that Whisk auto-generates from their images
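To make the subject / scene / style breakdown concrete, here is a minimal sketch of how such an input model could be represented. It is not Whisk's actual implementation or API: the `Slot` class, the `compose_prompt` function, the file names, and the captions are all hypothetical, and the auto-generated captions are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One of the three inputs: subject, scene, or style (hypothetical model)."""
    image_path: Optional[str] = None  # image the user dropped in, if any
    caption: Optional[str] = None     # auto-generated description of that image
    user_text: Optional[str] = None   # optional edit typed by the user

    def description(self) -> str:
        # A user edit takes priority over the auto-generated caption.
        return self.user_text or self.caption or ""


def compose_prompt(subject: Slot, scene: Slot, style: Slot) -> str:
    """Join the what / where / how descriptions into one text prompt."""
    labeled = [
        ("subject", subject.description()),
        ("scene", scene.description()),
        ("style", style.description()),
    ]
    # Skip any slot the user left empty.
    return "; ".join(f"{name}: {text}" for name, text in labeled if text)


# Captions are hard-coded here; a real system would generate them from the images.
subject = Slot(image_path="dog.jpg", caption="a corgi wearing a scarf")
scene = Slot(image_path="beach.jpg", caption="a sunny beach")
style = Slot(image_path="sticker.png", caption="a glossy vinyl sticker")

prompt = compose_prompt(subject, scene, style)
print(prompt)
```

The design choice this illustrates is that text stays optional: each slot works from an image alone, and setting `user_text` on a slot overrides only that slot's description while leaving the other two untouched.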
Why these product decisions make sense
1. Match natural thinking patterns
People think in visuals before words
Breaking down creation into "what, where, and how" matches our mental model
Visual input reduces cognitive load
2. Balance automation with control
Automates the hard parts (caption generation)
Keeps users in control with editable prompts
3. Build for exploration
Speed over perfection
Easy refinement options
Multiple paths to the same goal
Worth thinking about
How can you reduce users' cognitive load by aligning with their natural mental model? Review the complexity of your product.