When it comes to cutting-edge AI experiments, the latest frontier isn’t just art generators or chatbots; it’s living, breathing virtual worlds you can step into and explore in real time. From Microsoft’s AI-generated rendition of Quake to “Oasis,” a generative Minecraft-style environment, researchers have been busy asking: what if AI could create entire universes on the fly? Now another player has entered the scene: Odyssey, a startup backed by Pixar cofounder Edwin Catmull, is offering a public sneak peek at what it calls “interactive video.” Instead of watching a pre-recorded clip, you’re immersed in a world that adapts to your every keystroke, albeit in a somewhat glitchy, dreamlike state for now.
You might have already read about Microsoft’s AI-powered experiment that recreates Quake II entirely within a neural network, built on models like Muse and WHAMM. In that browser-based demo, you can shamble through a familiar corridor, shoot distorted enemies, and watch ’90s-era graphics get reimagined by machine learning, albeit with stuttering frame rates and fuzzy visuals that betray its early-stage nature. Likewise, “Oasis,” a generative Minecraft clone released in late 2024, lets you walk through blocky terrain built from next-frame prediction rather than a voxel engine. Players found it quirky, unpredictable, and often more of a “hallucination” than a playable game, but it was still a proof of concept for what AI can do in sandbox worlds.
The ambition goes further still. Google DeepMind has hinted at assembling a “world model” that simulates physical environments, with whispers of teams dedicated to letting AI imagine entire cities or ecosystems. While those projects remain under wraps, Odyssey’s public research preview is the first time most of us can wander through a procedurally generated environment that feels like a cross between Google Street View and a first-person game—all powered by GPUs rather than game engines.
Odyssey, headquartered in London but backed by investors on both sides of the Atlantic, was co-founded by Oliver Cameron and Jeff Hawke—veterans from the self-driving car world who pivoted from autonomous navigation to world modeling. The crux of their vision? Build a neural network that can generate and stream video frames fast enough to feel interactive—without needing a traditional game engine. With a $27 million war chest, sourced from firms like EQT Ventures, GV, and Air Street Capital, plus the imprimatur of Pixar legend Edwin Catmull (who joined the board in December 2024), Odyssey set out to prove that “interactive video” is more than a catchy buzzword.
On its website, Odyssey describes interactive video as “video you can both watch and interact with, imagined entirely by AI in real-time.” Essentially, imagine using WASD to roam through a forest glade or a mall corridor, except every frame is synthesized by a neural model predicting what should come next, not pulled from polygons or precomputed textures. The startup even likens it to “an early version of the Holodeck,” though it’s quick to admit things can feel “like exploring a glitchy dream—raw, unstable, but undeniably new.”
So, how does a digital illusion like Odyssey’s come together? At its core lies a “world model” that ingests your camera position, orientation, controller inputs, and a short history of past frames. From there, it predicts the next frame pixel by pixel and streams it back to your screen. According to Odyssey, clusters of NVIDIA H100 GPUs, operating out of data centers in the U.S. and Europe, power this pipeline, allowing the system to generate frames in “as little as 40 ms.” That works out to 25 frames per second (1,000 ms ÷ 40 ms per frame), or more, if network conditions cooperate.
Behind that 40 ms number: a combination of efficient model architectures and optimizations for parallel processing. By sidestepping full-blown rendering engines (like Unreal or Unity), the neural network is free to imagine entire scenes from scratch. Yet that freedom comes at a cost. Because it’s “hallucinating” content on the fly, you’ll notice blurry textures, floating objects, or flickering details—almost like watching a dream where shapes morph and warp in unexpected ways.
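To make that pipeline concrete, here’s a minimal Python sketch of the autoregressive loop the description implies: the model conditions on a short window of its own recent frames plus the player’s pose and inputs, emits the next frame within a fixed time budget, and repeats. Everything here is an illustrative assumption rather than Odyssey’s actual architecture or API: the class names, the eight-frame context window, and the noise-based stub standing in for a learned neural predictor.

```python
# Hypothetical sketch of an interactive-video frame loop; names and
# parameters are illustrative assumptions, not Odyssey's real system.
import time
from collections import deque

import numpy as np

FRAME_H, FRAME_W = 360, 640   # roughly the resolution the preview appears to target
CONTEXT_FRAMES = 8            # short history of past frames fed to the model (assumed)
TARGET_FRAME_MS = 40          # Odyssey's quoted per-frame budget (~25 fps)


class StubWorldModel:
    """Stand-in for the neural world model. The real system runs a learned
    next-frame predictor on GPU clusters; this toy just perturbs pixels."""

    def predict_next_frame(self, frames, camera_pose, controls):
        # Condition on the most recent frame plus pose and controls; a real
        # model would run a full autoregressive video network here.
        prev = frames[-1]
        drift = np.random.randn(*prev.shape).astype(np.float32)
        return np.clip(prev + drift + controls["forward"], 0.0, 255.0)


def run_session(model, seconds=150.0):  # the preview allows roughly 2.5 minutes
    frames = deque(
        [np.full((FRAME_H, FRAME_W, 3), 127.0, dtype=np.float32)],
        maxlen=CONTEXT_FRAMES,
    )
    camera_pose = {"position": [0.0, 1.7, 0.0], "yaw": 0.0, "pitch": 0.0}
    deadline = time.monotonic() + seconds

    while time.monotonic() < deadline:
        start = time.monotonic()
        controls = {"forward": 1.0, "strafe": 0.0}  # would come from WASD/mouse input
        frame = model.predict_next_frame(frames, camera_pose, controls)
        frames.append(frame)        # the model conditions on its own output
        # stream_to_client(frame)   # real pipeline: encode and send over the network
        elapsed_ms = (time.monotonic() - start) * 1000.0
        time.sleep(max(0.0, (TARGET_FRAME_MS - elapsed_ms) / 1000.0))


if __name__ == "__main__":
    run_session(StubWorldModel(), seconds=2.0)  # short run for demonstration
```

That feedback loop, in which the model keeps conditioning on frames it generated itself, is also a plausible source of the drift the preview exhibits: small prediction errors compound frame over frame, which would help explain why objects morph, vanish, and reappear.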
When you sign up for the public research preview (free for now, assuming you can snag an available GPU slot), you get about two and a half minutes to roam through a randomized environment before the demo resets. In any given session, you might spawn in a misty wooded area with a rustic cabin, find yourself teleported to a neon-lit shopping mall, or land in a nondescript parking lot outside a modern office building. The scenery regenerates every time you reload, so no two explorations are ever quite the same, even if you try to retrace your steps exactly.
Controls are minimal: WASD for movement, mouse or arrow keys to look around, and that’s about it. The experience isn’t polished—push against a wall and you might clip through it, watch a tree bend in half as you stride by, or see the same parking lot barrier appear, vanish, and reappear in a new spot. But that’s part of the charm; you’re peeking under the hood of an AI that’s still learning how the real world should behave.
It’s easy to point out the rough edges. Frame rates dip if your connection wavers, textures are fuzzy, and physics is rudimentary: objects don’t consistently collide, shadows flicker, and lighting can shift unpredictably. Yet for anyone who grew up marveling at Google Street View or flipping between game maps, there’s an undeniable thrill in stepping into a digital realm that is literally made up as you go along.
Odyssey itself cautions that this is an “early version,” emphasizing that interactive video feels more like an experiment than a product. Edwin Catmull, who helped build Pixar into an industry-changing studio, is pragmatic about the limitations: he sees the underlying neural networks improving over time, perhaps through added filtering or training on higher-resolution footage, eventually yielding crisper, more stable environments.
While Odyssey focuses on immersive, explorable video, Microsoft’s WHAMM demo aims to replicate existing game logic—rendering Quake II from scratch, complete with basic enemy AI and collision detection. However, WHAMM’s world is confined to a single level, lurching from corridor to corridor with choppy visuals, whereas Odyssey’s worlds are entirely fluid and generative (at the expense of polish). Meanwhile, projects like “Oasis” (the AI-powered Minecraft clone) use “next-frame prediction” to fill in terrain and mobs but lack coherent physics or world persistence, often culminating in surreal hallucinations. Odyssey’s advantage lies in delivering a more consistent—albeit still fuzzy—sense of space, rather than pure block-by-block reconstruction.
Another distinction: Oasis was crowd-funded and used open-source datasets to train its “block world.” Odyssey, by contrast, employs proprietary footage captured via high-resolution backpack cameras—giving its model a deeper well of real-world geometry to learn from. That means trees, buildings, and cars look (slightly) more grounded, even if they slip into abstraction when the model gets confused.
Even in its ragged form, Odyssey’s interactive video represents a fundamental shift in how narratives could be told. Instead of filmmakers or game designers painstakingly crafting every frame, AI systems could generate bespoke worlds on demand—tailored to each user’s whims. Imagine gods of cinema handing you the reins: wander into a medieval village, pull the lever on a dungeon gate, and watch as the AI spins cobblestone streets, torches, and lurking goblins in real time.
For storytellers, that means no more fixed scripts or render farms; instead, narrative arcs could branch infinitely, adapting to viewer choices. Whether it’s “choose your own adventure” films that reset each time you hit a choice point or open-ended VR experiences where the world itself learns from player behavior, interactive video could be the bedrock of Web 4.0, an era where the line between creator and consumer blurs.
Of course, this raises questions about artistry and control: will AI-driven worlds feel soulless without a human touch? Can a neural network deliver emotional beats the way a Pixar film does? Catmull and his board colleagues argue it’s not about replacing artists but about empowering them: giving them tools to iterate faster, explore more ideas, and hand-edit AI outputs in software like Unreal or Blender.
Odyssey is already eyeing its next milestones. The team aims to stretch sessions beyond the current 2.5-minute demo loops, push resolution past today’s roughly 640×360 output, and integrate rudimentary physics so objects collide and react more realistically. Down the road, they envision “world simulators” that understand gravity, fluid dynamics, and object materials, so you could push a crate, watch it tumble, and hear footsteps echo.
Meanwhile, studios may catch on: as of December 2024, Odyssey had announced plans to let creators export AI-generated scenes into traditional pipelines (such as After Effects or game engines), where human artists can refine the details. That hybrid model, with AI crafting a first draft and humans adding polish, could slash production costs for indie filmmakers and small studios.
And as generative models continue to evolve, there’s plenty of competition. Projects like GenEx and PhysGen3D are pushing the envelope on photo-realism from single images or structured environments. Researchers are already exploring how to imbue AI worlds with accurate lighting, material properties, and physics. In less than a year, what looks like a dream today could feel indistinguishable from a traditional video game—or even rival a low-budget film.
Odyssey’s interactive video is a peek behind the curtain at how AI might redefine immersive entertainment. It’s rough around the edges—textures blur, physics glitch, and timeouts are common—but it’s also undeniably thrilling to step into a world that is literally imagined in real time. With Edwin Catmull’s guidance and $27 million in funding, Odyssey is betting that the thrill of limitless, procedurally generated worlds will outweigh the early limitations. And if they succeed, the way we play, watch, and tell stories could change forever.