a16z
June 11, 2025

The Ultimate AI Video Stack: Up-to-Date Best Tools to Make Content With AI

Justine, a partner at a16z and an avid AI creator (Venture Twins on X), unveils her curated AI video toolkit. This is your guide to the best-in-class models for turning ideas into compelling video content, from initial generation to final polish.

Crafting Narratives: Text-to-Video with Veo 3

  • "Starting with V3, which I think is currently the best text-to-video model."
  • "I've noticed that the model will often do something weird if you don't put in enough text to fill 8 seconds of audio... I find it's actually better to have more text that gets cut off than too little."
  • V3, accessed via Google Labs Flow (requiring a Google Ultra AI subscription), is touted as the leading text-to-video model, uniquely capable of natively generating audio simultaneously with video.
  • For cohesive multi-scene videos in V3, structure prompts sequentially to avoid jarring jump cuts; describe transitions clearly.
  • To prevent V3 from inserting awkward filler dialogue, ensure your script provides more text than strictly needed for the intended video duration (e.g., for an 8-second clip).
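
To make both prompting tips concrete, here is a minimal sketch in plain Python of how a sequential, transition-explicit Veo 3 prompt with deliberately over-long dialogue might be assembled. All scene and dialogue text is invented for illustration:

```python
# Build a single text-to-video prompt that (1) describes scenes in order with
# an explicit transition, and (2) pads dialogue past the ~8-second mark so the
# model trims speech rather than inventing filler. All text here is invented.
scenes = [
    "Open on a wide shot of a sunlit workshop.",
    "The camera pushes through a doorway into a dim gallery.",  # explicit transition
    "Close on the narrator as she turns to face the lens.",
]

dialogue = (
    "Every tool in this room has a story, and today I'll walk you through "
    "three of them, starting with the one that changed how I work entirely."
)  # deliberately more than ~8 seconds of speech; the overflow just gets cut

prompt = " ".join(scenes) + f' She says: "{dialogue}"'
print(prompt)
```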

Bringing Stills to Life: Image-to-Video with Kling 2.1

  • "Up next is my favorite model for generating a video from an image... And that is Cling 2.1."
  • "One of the things I really like about Cling is that it's decently hard to mess up, I would say."
  • Cling 2.1 (app.clingai.com) is the preferred tool for animating static images, lauded for its user-friendliness and reliable, high-quality outputs.
  • The "Master" version of Cling 2.1 currently supports animating from a single start frame, with potential for multi-frame support in the future.
  • Kling offers intuitive controls for camera movement (e.g., "camera follows the subject") and allows for the integration of sound into animated visuals.

Giving Characters a Voice: Lip-Syncing with Hedra

  • "Next we're going to be talking about how you make a character speak. And my favorite tool for this is by far Hedra."
  • "Once you clone your own voice with a really short audio script, you can then use it to generate yourself saying anything in the future."
  • Hedra (hedra.com) excels at creating videos of characters speaking, accurately syncing uploaded or generated audio to a provided image.
  • The process involves a start image, an audio script (which can be uploaded, recorded, or AI-generated within Hedra), and a descriptive text prompt.
  • A standout feature is Hedra's voice cloning, enabling users to generate speech in their own voice for avatars. For best results, begin with a character image exhibiting a neutral facial expression.

Cinematic Flair & Multi-Model Play: Higgsfield and Krea

  • "Up next is Higsfield, which is a very cool VFX platform." (referring to Higsfield)
  • "Next is my favorite place to use open source models... or to test a bunch of different models in one place. All right, here we are at Krea." (referring to Krea)
  • Higsfield offers tools for "Hollywood grade VFX," enabling users to apply sophisticated effects like "flood" or "action run and set on fire" to images, either uploaded or generated within the platform.
  • Krea functions as a versatile multi-modality hub, perfect for experimentation. It allows running the same prompt and image across diverse models such as Wan or Pika 2.2, facilitating direct comparison.
  • Krea also boasts powerful video enhancers (its own and Topaz), which can upscale resolution, boost frame rates (e.g., to 60fps), and correct imperfections like duplicate frames, polishing outputs to a professional sheen.

Key Takeaways:

  • The AI video landscape is a mosaic of specialized tools, demanding a "stacked" approach rather than a one-size-fits-all solution. This rapid evolution presents opportunities for creators to pioneer new workflows and for investors to back differentiated tools in a booming market. Community sharing, as highlighted by Justine, accelerates discovery in this nascent field.

Actionable Insights:

  • Assemble Your AI Arsenal: Master video creation by strategically combining specialized tools: Veo 3 for text-to-video, Kling 2.1 for image animation, Hedra for lip-sync, Higgsfield for VFX, and Krea for multi-model experimentation and enhancement.
  • Master the Art of the Prompt: Precision in prompting is paramount. Sequential descriptions in Veo 3 ensure narrative coherence, while ample text for audio prevents awkward AI-generated filler.
  • Iterate, Enhance, Conquer: Beyond initial generation, platforms like Krea are crucial for refining AI video, offering upscaling, frame rate boosts, and cross-model comparisons to achieve professional-grade outputs.

For further insights, watch the full podcast: Link

This episode unveils a leading venture capitalist's curated AI video toolkit, offering a practical guide to the best models for specific creative tasks and highlighting the rapidly evolving landscape of AI-driven content creation.

Meet Justine: VC, Creator, and AI Video Enthusiast

  • Justine, a partner at venture capital firm a16z and an active AI creator known as Venture Twins on X, kicks off the discussion by sharing her passion for AI video tools. She acknowledges the challenge many face: “it can be pretty overwhelming to figure out which model to use for a specific task.” This sets the stage for her to reveal her personal AI video stack, designed for consumer creators seeking optimal results.
  • Speaker Insight: Justine's dual role as an investor and a hands-on creator provides a unique blend of market awareness and practical, user-centric experience.
  • Strategic Implication: The proliferation of specialized AI models underscores the need for curated stacks and platforms that simplify tool discovery and integration, a potential area for investment or research in the Crypto AI space, particularly concerning decentralized marketplaces for AI services.

Veo 3 by Google: Mastering Text-to-Video Generation

  • Justine identifies Veo 3, accessed via Google Labs' "Flow" tool (labs.google/fx/tools/flow), as the current leading text-to-video model—an AI system that generates video sequences from textual descriptions. Access requires a Google AI Ultra subscription.
  • She emphasizes that for native audio generation simultaneous with video, only the text-to-video function within Veo 3 is effective; frames-to-video (animating sequences of images) and "ingredients-to-video" (combining elements like characters and scenes) do not support this integrated audio feature.
  • Settings Tip: Justine recommends setting two outputs per prompt to manage credit consumption and ensuring the model is set to Veo 3, as it can default to Veo 2 (a programmatic sketch follows this list).
  • Prompting Strategy: While some users employ highly detailed prompts, Justine prefers simpler prompts, iterating based on results. She advises describing scenes sequentially to avoid disjointed jump cuts. For instance: “drone shot flying through: starts in a large room full of shoes and flies through halfway into a new room full of paintings.”
  • For dialogue, she notes, “it's actually better to have more text that gets cut off than too little” to prevent the model from inserting awkward filler words in videos with short audio segments.
  • Actionable Insight: The credit-based system for V3 highlights the compute-intensive nature of advanced AI video generation. Crypto AI researchers might explore how decentralized compute networks could offer more cost-effective or accessible alternatives for such models.
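
Flow itself is a web UI, but Google also exposes Veo programmatically through the Gemini API. Here is a minimal sketch assuming the google-genai Python SDK; the model ID below is a placeholder, and the config fields track the public docs at the time of writing, so verify them before relying on this:

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

# Kick off an asynchronous video-generation job. "veo-3.0-generate-preview"
# is a placeholder model ID; check the current Gemini API docs for the name.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt=(
        "Drone shot flying through: starts in a large room full of shoes and "
        "flies through halfway into a new room full of paintings."
    ),
    config=types.GenerateVideosConfig(number_of_videos=2),  # two outputs per prompt
)

# Video generation is long-running, so poll until the operation completes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save each generated clip.
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"veo_output_{i}.mp4")
```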

Kling 2.1: Animating Still Images with Precision

  • For transforming static images into dynamic videos (image-to-video), Justine favors Kling 2.1, accessible at app.klingai.com. She advises selecting the "Master" version for higher quality outputs; the handful of inputs involved is sketched after this list.
  • Currently, Kling 2.1 supports only a start frame, though Justine anticipates the addition of end-frame capabilities soon.
  • She demonstrates by animating an image of a lightsaber battle and a cat with a boombox, showcasing the model's ability to generate action and maintain character consistency.
  • Justine appreciates Kling's user-friendliness: “One of the things I really like about Kling is that it's decently hard to mess up, I would say.”
  • The platform also allows for adding sound effects, which Justine experiments with by prompting “lightsaber battle” for the corresponding video.
  • Strategic Implication: The focus on user experience and robust default outputs in tools like Kling 2.1 suggests a trend towards democratizing AI video creation. Investors should watch for platforms that successfully abstract complexity while delivering powerful results.
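
Kling is driven entirely through its web UI in Justine's workflow, but the inputs reduce to a small, stable set. Purely as an illustration of that shape—the field names below are invented for illustration and do not reflect Kling's actual API—an image-to-video job looks roughly like this:

```python
# Hypothetical request shape for an image-to-video job; every field name here
# is invented for illustration and does not reflect Kling's real API.
image_to_video_job = {
    "model": "kling-2.1-master",           # the higher-quality "Master" tier
    "start_frame": "lightsaber_duel.png",  # single start frame; no end frame yet
    "prompt": "two duelists clash; camera follows the subject",  # motion + camera
    "sound_prompt": "lightsaber battle",   # optional generated sound effects
    "duration_seconds": 5,
}
print(image_to_video_job)
```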

Hedra: Bringing Characters to Life with Synchronized Speech

  • Hedra (hedra.com) is Justine's top choice for making characters speak realistically. The process requires a start frame (character image), an audio script, and a text prompt; these three inputs are sketched after this list.
  • Key inputs include selecting the model (Hedra's Character 3), video dimensions (aspect ratio – the proportional relationship between width and height), and resolution (the detail an image holds).
  • Hedra offers flexibility in audio input: generating speech within the platform, recording audio, or uploading existing audio files. Justine highlights its voice cloning feature, which she used to create an AI version of her own voice.
  • She demonstrates with two examples: her Ghibli-style avatar and a “baby podcaster” image. For multi-character scenes, Hedra allows users to drag and select the specific face to be animated.
  • A tip for better results: “I've noticed that Hedra is better when you start with a neutral face from the character,” especially if the accompanying audio isn't emotionally congruent with an initial expressive face.
  • Crypto AI Relevance: The ability to clone voices and animate avatars has significant implications for digital identity and presence in metaverses or decentralized social platforms. Secure and verifiable AI-generated personas could become a key research area.
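
Hedra's workflow likewise reduces to a fixed trio of inputs plus output settings. As a hypothetical sketch only—field names invented for illustration, not Hedra's actual API:

```python
# Hypothetical shape of a Hedra-style lip-sync job; field names are invented.
# The three core inputs mirror the workflow described above.
lip_sync_job = {
    "model": "character-3",
    "start_frame": "ghibli_avatar.png",  # works best with a neutral expression
    "audio": "cloned_voice_take1.mp3",   # uploaded, recorded, or generated in-app
    "prompt": "a woman speaks warmly to camera with subtle head movement",
    "aspect_ratio": "9:16",
    "resolution": "720p",
}
print(lip_sync_job)
```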

Higgsfield: Crafting Hollywood-Grade Visual Effects

  • Justine introduces Higgsfield as a platform for creating sophisticated VFX (Visual Effects). A key feature is the ability to browse and run effects created by other users.
  • She showcases this by applying a “flood” effect to a pixel-style cat image, noting her pleasant surprise: “I wasn't sure actually if it would animate it in pixel style, but I think it did a pretty good job.”
  • Another example involves generating a “shot of woman running through library” and applying an “action run set on fire” effect using Higgsfield's built-in model.
  • Actionable Insight: Platforms like Higgsfield, which combine AI generation with community-driven effects libraries, point towards collaborative ecosystems in content creation. Crypto AI could enhance such platforms through tokenized incentives for effect creators or decentralized governance of asset libraries.

Krea: A Hub for Open-Source Models and Video Enhancement

  • Justine highlights Krea as her preferred platform for utilizing open-source models—AI models whose code and weights are publicly available—such as Wan and Hunyuan, and for testing multiple models simultaneously.
  • Krea is a multimodality generation and editing platform, handling both image and video. Justine demonstrates by generating "anime dogs" in Krea's image tab, then transitioning one image to the video tool.
  • The platform allows running the same prompt and starting image across various models (e.g., Wan, Pika 2.2, and Hailuo).
  • A standout feature is Krea's suite of enhancers—tools that improve video quality. Justine uses Topaz (a popular third-party enhancer) integrated within Krea to boost a video to 60 frames per second (fps)—the rate at which consecutive frames appear on a display—and fix duplicate frames; a rough open-source analogue is sketched after this list.
  • “This is one of my favorite things about Krea, which is you can run sort of all of these tools on top of your AI outputs in one place.”
  • Strategic Implication: The integration of diverse models, including open-source options, and post-processing tools within a single platform like Krea is powerful. For Crypto AI investors, this signals opportunities in aggregator platforms or decentralized AI marketplaces that offer similar composability and access to a wide range of specialized AI services.
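
Krea wraps these enhancers in a one-click UI. To make the underlying operations concrete, here is a rough open-source analogue using ffmpeg (assumed to be installed separately); it approximates, but does not reproduce, a dedicated enhancer like Topaz:

```python
import subprocess

SRC, DST = "ai_clip.mp4", "enhanced_clip.mp4"

# Rough open-source analogue of the enhancement steps described above:
#   mpdecimate    drops near-duplicate frames (a common AI-video artifact),
#   minterpolate  motion-interpolates the result up to 60 fps,
#   scale         upscales to 1920px wide with Lanczos resampling.
subprocess.run(
    [
        "ffmpeg", "-i", SRC,
        "-vf", "mpdecimate,minterpolate=fps=60,scale=1920:-2:flags=lanczos",
        DST,
    ],
    check=True,
)
```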

Justine's Closing Thoughts: The Evolving AI Creative Landscape

  • Justine concludes by emphasizing the rapid evolution of AI creative tools and the collaborative nature of the community. She invites viewers to share their own “creator stack,” highlighting that “we're all so early and there are so many new tools and workflows to try out.”
  • Speaker's Perspective: Justine's closing remarks underscore her genuine enthusiasm and belief in the community's role in discovering and sharing new AI capabilities.

Conclusion: Navigating the AI Video Frontier

  • This overview of Justine's AI video stack reveals a dynamic landscape where specialized tools excel at distinct tasks. For Crypto AI investors and researchers, the key takeaway is the accelerating development and accessibility of sophisticated AI content creation, signaling opportunities in decentralized compute, open-source model ecosystems, and novel media platforms.
