TWO PROMPTS. ONE CHARACTER. INFINITE VIDEOS.
The hardest part of building visual content with AI isn't the prompt. It's character consistency.
You generate an image you love.
You try to generate it again: slight angle change, different scene, same character
→ and the AI gives you someone who looks vaguely related but isn't actually the same character.
I found a workflow that solves it with just two prompts, using GPT Image 2.0 and Grok Imagine 💫

The Workflow
Step 1 → GPT Image 2.0 generates a master character reference sheet. One comprehensive board with everything: identity, expressions, head angles, postures, wardrobe details, hand gestures. This is your visual bible.
Step 2 → Grok Imagine uses that sheet as input to generate cinematic videos. The reference sheet locks the character.
The result: every video you generate using that reference sheet gives you the same character.
Step 1 → Build the Reference Sheet
Paste this into GPT Image 2.0.
Replace [STYLE] and [SUBJECT_DESCRIPTION] with whatever you're building.
Create a single unified MASTER CHARACTER REFERENCE SHEET
from these inputs:
[STYLE]: [anime / stylized 3d / cinematic realism /
noir / live-action / etc.]
[SUBJECT_DESCRIPTION]: [character description — name,
age, personality, wardrobe, accessories, theme]
Create the board in a 4:3 horizontal layout. Clean,
neutral, minimal, technical layout on pure white or
off-white background. Apply [STYLE] only to the
character, not the board layout.
Use this layout:
- top row = title + horizontal info block, COLOR PALETTE
- center = MAIN IDENTITY + SCALE SHEET (largest section)
- right = EXPRESSION PROGRESSION + HEAD DETAIL SHEET +
NEUTRAL BASELINE + POSTURE VARIATION + CLOSE-UP POSE
- bottom = WARDROBE / ACCESSORIES + HAND GESTURES
Include:
1. TOP INFO BLOCK
Name, Alias, Role, Age, Personality, Core Theme,
Speech Accent
2. COLOR PALETTE
6-8 minimal swatches matching the character's world.
No labels.
3. MAIN IDENTITY + SCALE SHEET
Front, 3/4, Side, Back views over measurement guide
lines. Include 2 silhouette thumbnails (Neutral Stance
+ Profile) in a corner.
4. EXPRESSION PROGRESSION
8 panels: Neutral, Curious, Worried, Surprised, Afraid,
Sad, Determined, Relieved
MICRO EXPRESSIONS: 5 panels — subtle eye tension,
slight smirk, lip tension, micro fear, controlled breath
5. HEAD DETAIL SHEET
3/4, Side, Top, Low, Diagonal angles. Keep facial
structure fully consistent.
6. NEUTRAL BASELINE
1 panel: fully relaxed, no emotion
7. POSTURE VARIATION
3 panels: relaxed, tense, confident
8. CLOSE-UP POSE
1 cinematic close-up from chest-up. Natural expressive
pose matching personality.
9. WARDROBE / ACCESSORIES DETAILS
5 close-up callouts: hairstyle, outerwear, footwear,
accessories, fabric detail
10. PROP (only if relevant)
1 isolated image. Object name, type, traits.
11. HAND GESTURES
5 panels: relaxed hand, tense fingers, pointing,
gripping, subtle gesture near face
Keep the subject fully consistent across all panels.
The MAIN IDENTITY + SCALE SHEET must visually dominate
the board. Final image should look like a premium
production visual bible.

My character reference sheet, generated with a personalized version of this prompt:

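If you build more than one character, filling in the two inputs by hand gets repetitive. Here is a minimal Python sketch of one way to do it, assuming you treat the `[STYLE]` and `[SUBJECT_DESCRIPTION]` lines as labeled inputs and keep the rest of the prompt unchanged. The names `build_sheet_prompt` and `PROMPT_BODY` are made up for illustration, and the body below is only an excerpt of the full prompt.

```python
# Hypothetical helper: fills the two input lines of the Step 1 prompt.
# Nothing here calls an image model; it only assembles the text to paste.

# Excerpt only. Paste the rest of the Step 1 prompt here in practice.
PROMPT_BODY = """Create the board in a 4:3 horizontal layout. Clean,
neutral, minimal, technical layout on pure white or
off-white background. Apply [STYLE] only to the
character, not the board layout."""


def build_sheet_prompt(style: str, subject: str, body: str = PROMPT_BODY) -> str:
    """Return the Step 1 prompt with both inputs filled in."""
    style, subject = style.strip(), subject.strip()
    if not style or not subject:
        # Fail early instead of sending a prompt with an empty input.
        raise ValueError("style and subject must both be non-empty")
    header = (
        "Create a single unified MASTER CHARACTER REFERENCE SHEET\n"
        "from these inputs:\n"
        f"[STYLE]: {style}\n"
        f"[SUBJECT_DESCRIPTION]: {subject}\n"
    )
    return header + body


if __name__ == "__main__":
    print(build_sheet_prompt(
        "cinematic realism",
        "Auny, early 30s, introspective, fitted black satin dress, gold nose stud",
    ))
```

The `[STYLE]` label stays in the header on purpose, so the later line "Apply [STYLE] only to the character" still has something to refer to.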
Step 2 → Generate the Video
Upload the reference sheet to Grok Imagine.
With the sheet attached and the mode set to video, paste this prompt:
Using @[image1], create a cinematic character introduction video.
Open with [character] looking into camera and speaking
naturally, introducing themselves in their own words.
Do not treat the sheet as a single image. Use its
elements as separate shots.
Structure: detail → identity → presence → full reveal
Sequence:
- Open on a hand gesture or subtle detail
- Cut to face close-up — eyes first, then full face
- They begin speaking — natural, slightly looking off-camera
- Mid-shot — they shift position, glance back, continue
- Full reveal — fully framed, owning the space
- Close on a final controlled expression
Make them active throughout:
- subtle weight shifts
- hand brushing through hair
- glancing away then returning eye contact
- a small breath before a key line
- purposeful gestures, never busy
Show acting range:
- Confidence as the baseline, not performance
- Brief hesitation on a vulnerable line
- Curiosity in how they watch the camera
- Intensity in stillness
- Express through micro-expressions, eye work, tone,
body language
Include:
- Face close-ups (especially the eyes)
- Outfit/material details
- Expressive performance moments
Keep everything grounded and realistic. Cinematic
realism only.
Camera:
- Controlled, minimal movement
- Soft push-ins on key lines
- Light tracking when they shift position
- Subtle handheld feel without shake
Lighting:
- Cinematic and consistent
- One warm light source as emotional anchor
- Catching face, eyes, edge of jawline
Audio direction:
- Natural speaking voice
- No music underneath dialogue (let the voice carry it, or add your own quote and preferences here)
End on a confident shot, character fully established.
Final frame holds 1-2 seconds before fade.

My video, made with the reference sheet and this prompt:

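The Step 2 prompt is a set of labeled sections (Camera, Lighting, Audio direction) with bullet points under each. If you want to reuse it across several videos, one option is to keep those sections as data and swap only the ones that change. The Python sketch below shows that idea under my own assumptions: `render_video_prompt` and the section dictionaries are hypothetical names, only a few bullets are copied in, and nothing here talks to Grok Imagine. It just produces text to paste.

```python
# Hypothetical sketch: store prompt sections as data, render them as text.

def render_video_prompt(opening: str,
                        sections: dict[str, list[str]],
                        closing: str = "") -> str:
    """Render 'Header:' lines followed by '- bullet' lines."""
    lines = [opening.strip()]
    for header, bullets in sections.items():
        if not bullets:
            continue  # skip empty sections rather than emit a bare header
        lines.append(f"{header}:")
        lines.extend(f"- {b}" for b in bullets)
    if closing.strip():
        lines.append(closing.strip())
    return "\n".join(lines)


# A few bullets taken from the Step 2 prompt; add the rest as needed.
base_sections = {
    "Camera": [
        "Controlled, minimal movement",
        "Soft push-ins on key lines",
    ],
    "Lighting": [
        "Cinematic and consistent",
        "One warm light source as emotional anchor",
    ],
}

# Variation: same camera direction, different lighting.
night_sections = {
    **base_sections,
    "Lighting": [
        "Dark mode atmosphere",
        "One warm amber light source as emotional anchor",
    ],
}

prompt = render_video_prompt(
    "Using @[image1], create a cinematic character introduction video.",
    night_sections,
    "End on a confident shot, character fully established.",
)
print(prompt)
```

Because `night_sections` is a new dictionary, the base version stays untouched and you can build as many variations from it as you like.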
The reference sheet is the trick, and with GPT Image 2.0 these sheets have become easier than ever to generate. Most people prompt AI tools with text descriptions alone and hope for consistency.
When you give Grok Imagine (or Seedance) an actual reference sheet showing your character from every angle, in every expression, with wardrobe detail, there's nothing left to interpret. The character is locked in.
It's the difference between describing your friend to a sketch artist vs handing them a photo.
BTW… THANK YOU FOR BEING AN INTEGRAL PART OF MY COMMUNITY
🧡
Here’s a quick gift for you.
Grab this guide for free to make your profile better
Use code FREE to get it for FREE!

Here’s another variation I tried. Notice how the character stays the same:

Using @[image1], create a cinematic outro video clip of Auny.
THE CHARACTER:
A confident woman in her early 30s. Caramel skin (warm but not too dark). Dark expressive eyes that hold weight.
Small gold nose stud. Small stud earrings. Introspective, magnetic, controlled intensity. Confident but never loud.
WARDROBE:
She's wearing a fitted black long sleeve satin sundress.
Dark color palette throughout.
HAIR:
Low, slightly messy bun at the nape of her neck. A few face-framing pieces left loose near her jawline.
Effortless and intentional.
THE SCENE:
She's looking directly into the camera, chest-up framing.
Warm in tone but composed. The moment after a conversation ends.
DIALOGUE:
She says, naturally and warmly:
"Hope you enjoyed this. Don't forget to drop a follow if you found it helpful!"
DELIVERY DIRECTION:
- Confident but warm — not selling, just inviting
- Slight smile on "enjoyed"
- Small head tilt or look down briefly between sentences — natural, human pause
- Direct eye contact on the CTA "drop a follow"
- A tiny lift in her voice on "helpful" — genuine, not performative
ACTING NOTES:
- One small purposeful gesture during delivery — a soft wave at the audience
- Subtle weight shift between the two sentences
- A small breath or smile after the line lands
CAMERA:
- Static or very gentle slow push-in
- Locked on her face for the line
- Slight handheld feel without shake
- Frame holds for 1 second after she finishes speaking
LIGHTING:
- Dark mode atmosphere
- One warm amber light source as emotional anchor (left or right side)
- Deep cyan/midnight blue ambient fill
- Light catches her face, the satin dress, the loose hair pieces near her jawline
- Soft and warm overall — the energy of an ending
STYLE:
Cinematic realism only. She should look like a real person filmed cinematically.
End on a soft fade to black after the final beat.

IF YOU WANT HELP BUILDING A CONTENT SYSTEM THAT ACTUALLY SOUNDS LIKE YOU
Book a 1-hour Content & Branding Strategy Call with me.
We'll look at what you're currently doing, figure out where the friction is, and map out a system that works with how you actually think and create.

Don’t forget to reply and let me know what you thought of this newsletter.
And send over the videos you end up creating.
See you in the next one.
- AUNY 🧡
