rip veo?

Worst part of AI video right now?

Google just released Gemini Omni and called it "Nano Banana, but for video."

Nano Banana's whole thing was keeping a subject identical while everything around them changed.

Doing that in video is the same problem at a much higher difficulty.

You're holding consistency across motion, edits, and however many turns of conversation it takes to land the shot.

So I ran three tests to see if Omni lives up to its own framing.

Quick caveat before we get into it: the Gemini app refused to generate for me because of my location (India, which is supposedly supported, but here we are).

I ended up running everything through Google Flow, which also has Omni access. If you're outside the US and the app gives you trouble, that's the workaround.

But first, catchup:

Tools That Caught My Attention

1. Willow Voice: AI dictation that works system-wide on Mac, Windows, and iPhone. Hit a hotkey, talk, and it drops formatted text into Slack, Gmail, Cursor, or any app you're typing in. Free to start, $15/mo paid.

2. Brew: Email marketing platform with an AI agency built in. Describe what you want and Brew handles the strategy, copy, design, and sending. Aimed at startups that don't have a lifecycle marketer on staff.

3. Kept: A local archive for your AI chats across ChatGPT, Claude, Gemini, Grok, and Kimi. Conversations save to your filesystem as plain markdown with full-text search and an MCP server so other tools can plug into your vault. Free and MIT-licensed.

Test 1: The Likeness Test

Input video:

Source video generated with Veo 3.1, then handed to Omni.

Test Prompt: “Keep the person exactly the same. Same face, same clothes, same walking gait. Place them walking through a dense bioluminescent forest at night, with the glow reflecting on their skin and clothes.”

Output:

The subject held up: the face stayed the same and the clothes were unchanged.

So it’s as consistent as Nano Banana.

But the environment let it down.

I got the bioluminescent forest, but it looked animated rather than matching the photorealistic style of the input video.

The lighting was a miss too.

I asked for the glow to reflect on the subject. Omni read "glow" and made him look translucent.

My take: Solid 6/10. It got the hardest part right (subject consistency) and missed the easier part (matching the source style and basic light behaviour).

Test 2: The Multi-turn Ceiling

This was the test I enjoyed the most.

Input video:

Source video generated with Veo 3.1.

Test Prompts (in order):

1. Change the city sidewalk to a winding cobblestone street in an old European town. Keep the person identical.
2. Add a light rain falling in the scene. The person should now look wet.
3. Change the time of day to dusk, with warm orange streetlights coming on.
4. Change the camera angle to a slow tracking shot from in front of the person, walking backwards.

Turn 1: spot on. Cobblestone replaced the sidewalk, the person came through identical.

Turn 2: Omni took "light rain" literally, and accurately: there are wet patches on the clothes, the kind you'd get from a drizzle.

But the rain wasn't really visible in the background. Maybe it was there but too subtle to register.

Either way, slightly underwhelming after turn 1.

Turn 3 was clean. Day shifted to dusk, streetlights came on, and crucially it remembered the wetness from turn 2.

The cobbles are still glistening, and the reflections from the streetlights land right.

Turn 4 is where the four-turn ceiling finally showed up.

Camera angle swapped to a front-facing tracking shot, which it nailed initially.

Then in the last second the person abruptly stops, starts walking backwards, and there's a visible camera glitch.

My take: 7/10, generous.

The middle two turns held the scene together more impressively than I expected.

Turn 4 coming apart at the seams exposes the ceiling. But the editing loop, while it works, is by far the best thing about Omni.

Test 3: The Physics Test

Test Prompt: “Slow motion shot. A heavy iron bowling ball drops from about ten feet onto a stack of five empty cardboard boxes. The boxes crumple and crush asymmetrically as the ball impacts and bounces. Photorealistic, dramatic lighting.”

Text-to-video, no source clip.

First problem: it ignored "iron" and gave me a regular sport bowling ball.

The visuals and sound were fine, and it nailed the dramatic lighting and photorealism I asked for.

The physics is where it broke.

The ball started fine. It tore through the first half of the stack like a heavy object should.

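Then near the end, it bounced. You see the issue with this, right? A ball heavy enough to crush its way through the stack shouldn't spring back up off flattened cardboard.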

My take: 5/10. On paper, this should have been the easiest test for Google to pass, because they made physics the centerpiece of the launch.

Pichai's marble video is the loudest physics claim they made.

My test was their claim with different objects, and the model failed in the most pattern-matching-y way possible.

So Is It Nano Banana For Video?

Half.

The subject consistency claim (the Nano Banana parallel) held up. I was skeptical about it going in, but it's what Omni did best.

Across Test 1 and the first three turns of Test 2, the model kept the person identical. The "world model" framing, though, is doing a lot of marketing heavy lifting.

Physics works when the model has seen a pattern enough times to copy it convincingly. Push it sideways and it just gives two correct-looking moments that don't go together.

That's just pattern matching dressed up as reasoning.

My read: Omni isn't really a Veo replacement on raw output.

Most of the side-by-side comparisons I've seen say Veo 3.1 still wins on pure fidelity.

Omni is a workflow upgrade, and the chat-based editing loop is the best thing about the product.

So is it Nano Banana for video? The consistency half, yes.

The rest, not so much.

Try it

See what breaks for you. If you run something interesting through it, send the output my way.

Until next time,
Vaibhav 🤝
