
If you missed it, the TL;DR: V3.2 was competent, open, and cheap. But when we ran three tests, Gemini 3 beat it across the board.

The final verdict was that it's a fine alternative, but it does not stand out.

Four months later, DeepSeek dropped V4.

So I ran (almost) the same tests on the new model. Before getting into what I found, let’s catch up on some AI this week:

Spotlight: 2-Day AI Mastermind

Before we get into today's edition, if you're a professional who wants to go from knowing about AI to building with it, this weekend is a good place to start.

9th May | 10AM to 7PM IST / EST
Session 1 | Getting Started with Generative AI
Session 2 | Building Personalised AI Agents

10th May | 10AM to 7PM IST / EST
Session 3 | Building Products Using AI
Session 4 | Visual Storytelling & Content Creation Using AI
Session 5 | Mastermind Graduation

You can choose between an Engineering Track (for those who know coding) and a Non-Engineering Track (for those who want to use AI without writing code manually).

It's free. 25 spots left!


Tools That Caught My Attention

1. Picsart AI Playground: One interface for 146 AI models across video, image, and audio for Sora, VEO, Kling, Runway, Flux, ElevenLabs, and more from 28 providers. Pay-per-generation credits instead of stacked subscriptions, with every output auto-saving to a single project board ready for further editing.

2. Flowly: Deploy a personal AI assistant on WhatsApp, Telegram, or Discord in one click, powered by your choice of GPT, Claude, or Gemini. Handles automation, conversation management, and workflow tasks without needing to set up your own server or VPS.

3. Postiz: An open-source social media scheduling tool that lets you plan, generate, and publish posts across 30+ networks from a single visual calendar. You can prompt it directly from Claude, ChatGPT, or n8n to draft and schedule posts automatically, with a built-in AI agent that handles captions, images, and short video in the same workflow.

DeepSeek V4

V4 is DeepSeek's most significant release since R1, the reasoning model that caught us off-guard in Jan 2025.

The lineup now has just two models: V4-Pro and V4-Flash.

The headline upgrade is context length.

V3.2 handled 128K tokens. V4 handles 1 million.

That's roughly 750,000 words in a single request.

And they redesigned the attention architecture from scratch to make processing that much context efficient.

In a 1M-token scenario, V4-Pro needs only 27% of the compute and 10% of the memory that V3.2 required.

In plain terms: reading a 500-page document costs a fraction of what it used to.

Both models support Thinking and Non-Thinking modes and default to 1M context.

On pricing, DeepSeek keeps winning.

Flash at $0.14 per million input tokens, $0.28 output. Pro at $1.74 per million input, $3.48 output.

Compare that to Gemini 3 or GPT-5, and the Pro tier comes out 5-10x cheaper.
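To see what those rates mean in practice, here's a quick back-of-envelope cost helper. The prices are the ones listed above; the 750K-token document size is this issue's own estimate of a near-full-context request, and the 2K-token answer length is my assumption:

```python
# Per-million-token prices from this issue, in USD: (input, output).
PRICES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro": (1.74, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Summarizing a ~750K-token document with a ~2K-token answer:
print(round(request_cost("V4-Pro", 750_000, 2_000), 2))    # ~$1.31
print(round(request_cost("V4-Flash", 750_000, 2_000), 2))  # ~$0.11
```

At Flash prices, that same 500-page summarization job costs about a dime.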

The Tests

We reran two of the three prompts from the December edition and added one new test that V3.2 couldn't handle due to its context limit.

For each test: Expert Mode, Smart Search off. Deep Thinking on for Tests 1 and 2, off for Test 3.

Test 1: Coding (Reddit scraper)

Prompt: "Create a Python script that scrapes the top 10 posts from a subreddit and saves them to a CSV file. The CSV should include: post title, author, upvotes, and URL. Include error handling and comments explaining each step."

Model: V4-Pro, Expert Mode, Deep Thinking on

Output:

Thought process

Output (excluding the code)

The December version of this test was where V3.2 got it most wrong.

It over-engineered the script into an interactive CLI, added extra columns (timestamps, comment counts, score), and used a hacky JSON endpoint instead of Reddit's official API.

Gemini 3 nailed it that time.

V4 fixed the over-engineering.

It still uses Reddit's JSON endpoint rather than PRAW, even though PRAW remains the more correct engineering call.

The difference from December is that V4 stuck to exactly what was asked.

Takeaway: V4 understood the assignment.
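For reference, here's a minimal sketch of the JSON-endpoint approach both versions took. This is my own illustration, not DeepSeek's actual output; the subreddit name and User-Agent string are placeholders:

```python
import csv
import json
import urllib.error
import urllib.request

def parse_posts(listing: dict, limit: int = 10) -> list[dict]:
    """Pull title, author, upvotes, and URL out of a Reddit listing payload."""
    rows = []
    for child in listing.get("data", {}).get("children", [])[:limit]:
        post = child.get("data", {})
        rows.append({
            "title": post.get("title", ""),
            "author": post.get("author", ""),
            "upvotes": post.get("ups", 0),
            "url": "https://www.reddit.com" + post.get("permalink", ""),
        })
    return rows

def scrape_subreddit(name: str, out_path: str = "posts.csv") -> None:
    """Fetch a subreddit's top 10 posts via the JSON endpoint, write to CSV."""
    url = f"https://www.reddit.com/r/{name}/top.json?limit=10"
    req = urllib.request.Request(url, headers={"User-Agent": "scraper-demo/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            listing = json.load(resp)
    except (urllib.error.URLError, json.JSONDecodeError) as exc:
        print(f"Fetch failed: {exc}")
        return
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "author", "upvotes", "url"])
        writer.writeheader()
        writer.writerows(parse_posts(listing))

# Usage (hits the network):
# scrape_subreddit("python")
```

The four CSV columns and the error handling are the whole brief; the December failure mode was adding columns and interactivity nobody asked for.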

Test 2: Problem solving (equity split)

Prompt: "A startup has 3 co-founders who need to split equity. Founder A contributed the initial idea and worked full-time for 6 months. Founder B joined 3 months ago as CTO with critical technical expertise. Founder C is a part-time advisor who will become full-time after funding. Design an equitable split accounting for time commitment, role importance, future contributions, and vesting schedules. Show your reasoning step-by-step."

Model: V4-Pro, Expert Mode, Deep Thinking on

Output:

Part of the long thought process

Output (summary only)

Before explaining the output: try this prompt yourself and watch the thought process play out in real time. It's far more impressive than the final answer.

In December, V3.2 gave a flat A=45%, B=35%, C=20% with loose vesting language.

The reasoning read like the model working something out in real time, while Gemini 3 came back with specific legal mechanisms.

V4 built a proper points-based contribution model.

Sweat equity points assigned per month of full-time work, a 1.2x scarcity multiplier applied to B's future contribution as CTO, and past service credited directly into vesting percentages.

A gets 12.5% immediate vest for 6 months served, B gets 6.25% for 3 months. C gets a 9-month forfeiture trigger if they don't transition to full-time after funding.

And a 3% ESOP reserve is held separately and released only on that transition.

Final split: A=45%, B=33%, C=22% (moving to 25% on full-time conversion)
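The immediate-vest numbers are internally consistent: 12.5% over 6 months and 6.25% over 3 months both imply the same per-month credit rate. Here's that arithmetic as a sketch; the constant is reverse-engineered from the output, not a formula V4 stated:

```python
# Per-month immediate-vest credit implied by V4's numbers:
# 12.5% / 6 months = 6.25% / 3 months ~= 2.08% per full-time month.
# Reverse-engineered from the model's output, not its stated formula.
VEST_PER_MONTH = 12.5 / 6  # percent vested per month of full-time service

def immediate_vest(full_time_months: int) -> float:
    """Past service credited directly into vested equity, in percent."""
    return round(full_time_months * VEST_PER_MONTH, 2)

print(immediate_vest(6))  # Founder A: 12.5
print(immediate_vest(3))  # Founder B: 6.25
```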

You can follow the logic, push back on assumptions, and hand the output to a lawyer as a starting document.

V3.2's output wasn't usable in that way.

Takeaway: V4's reasoning improvement has closed the gap with Gemini 3 significantly.

Test 3: Long context (V4 tech report)

Prompt: "Here is the DeepSeek V4 technical report. Summarize the 5 most important things a non-technical knowledge worker needs to understand about DeepSeek V4 — what changed, what it means practically, and whether they should switch to it. Keep each point to 2-3 sentences. No jargon."

Model: V4-Pro, Expert Mode, Deep Thinking off, Smart Search off

Output:

We didn’t do this test in December. V3.2's 128K context limit made it impractical for documents this size.

1M context is V4's headline upgrade, so I tested it directly.

The output covered five things, with an honest caveat about ecosystem maturity, and throughout, the framing stayed practical.

Takeaway: V4 passed.

My take

December's conclusion was: DeepSeek isn't bad, but it's nowhere near the top models.

That conclusion no longer holds.

V4 is a step up across all three tests.

It's still trailing Gemini 3.1 and GPT-5.5 at the frontier.

The benchmark gaps are smaller, but the price-to-performance ratio has shifted enough that it's worth taking seriously, especially if you're running long-form workflows or building something on the API.

The naming situation is also finally clean. No more V3.2 / V3.2-Exp / V3.2-Thinking / V3.2-Speciale confusion.

If you tried V3.2 in December and walked away unimpressed, V4 is worth another look.

Reply below: have you tried V4 yet? And has anything changed for you since December?

Until next time,
Vaibhav 🤝🏻

If you read till here, you might find this interesting

#AD 1

Your docs are being read by AI. Are they ready?

Over 50% of traffic across Mintlify's customer base is now AI agents, not humans. If your docs aren't structured for agents, your product is invisible to AI. Mintlify just raised a $45M Series B to build the knowledge layer for the agent era.

#AD 2

The $60B Anime & Manga Boom Has Escaped Japan

Most people assume anime & manga are Japanese industries. The numbers tell a different story.

For the first time in history, international revenue has surpassed Japan’s. Netflix says viewership tripled in five years, with over half of its 300M subscribers watching.

TOKYOPOP’s been preparing for this moment for nearly 30 years. They helped bring anime and manga to the West in the 90s, becoming one of the industry’s most-respected names.

In the process, they earned licensing contracts with giants like Nintendo and Disney and saw their stories told in 50 countries and 30+ languages. That’s translated to $15M in annual revenue.

And it’s just beginning. The anime & manga market’s projected to grow from $37B today to $60B by 2030. Get 5% in guaranteed TOKYOPOP investor bonus stock by May 6 as they scale toward $50M in targeted 2030 revenue.

This is a paid advertisement for TokyoPop Regulation CF offering. Please read the offering circular at https://invest.tokyopop.com/
