Meta is quietly building a new image-and-video generation model code-named Mango, a multimodal, foundation-class system meant to do for visuals what large language models have done for text: create, edit and understand images and video from prompts and existing media. Insiders say Mango is being tuned not just for photorealism but for temporal coherence (fewer frame-to-frame glitches), fine stylistic control, and the ability to “understand” a clip well enough to summarize, re-edit, or spin out variations.
That technical ambition matters because Mango isn’t being developed as a gimmick or an external API play. Meta wants the model to sit inside Facebook, Instagram and Messenger — powering native creator tools, automated ad creative, and on-the-fly editing that could let a small brand generate tailored product videos without a human shoot. The product pitch is straightforward: if the model can match or surpass rivals on quality while running in Meta’s production stack, the company can turn model outputs into features that reach billions overnight.
Under the hood, Mango is reportedly being built as a true multimodal foundation model: one system that both generates visual media and forms a shared understanding of it. That dual role would let Mango do things today’s filter suites can’t. It could, for example, watch a long clip, pull out the five most compelling moments, and then automatically make a localized cut or a stylized micro-ad optimized for different markets. It’s the difference between a tool that paints a picture and a tool that can read, rewrite and repurpose an entire visual narrative.
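To make that concrete, here is a minimal and purely hypothetical sketch of the “pick the best moments” step in such a pipeline. Nothing about Mango’s actual interface is public; the Segment type and the score_segments and top_moments functions below are invented for illustration, and the model call is stubbed so the example runs on its own.

```python
# Hypothetical sketch of the clip-repurposing step described above.
# None of these names are real Meta APIs; the "model" is faked so the
# selection logic itself is runnable anywhere.
import hashlib
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # window start, in seconds
    end_s: float    # window end, in seconds
    score: float    # stand-in for a model-assigned "compellingness" rating


def score_segments(clip_path: str, window_s: float = 5.0) -> list[Segment]:
    """Stand-in for a multimodal model rating fixed windows of a clip.

    A real system would embed and score each window; here scores are
    derived deterministically from a hash so the demo is self-contained.
    """
    segments = []
    for i in range(24):  # pretend the clip is 120 s long, in 5 s windows
        digest = hashlib.sha256(f"{clip_path}:{i}".encode()).digest()
        segments.append(Segment(i * window_s, (i + 1) * window_s, digest[0] / 255))
    return segments


def top_moments(segments: list[Segment], k: int = 5) -> list[Segment]:
    """Pick the k highest-scoring windows, returned in timeline order."""
    best = sorted(segments, key=lambda s: s.score, reverse=True)[:k]
    return sorted(best, key=lambda s: s.start_s)


if __name__ == "__main__":
    for m in top_moments(score_segments("demo_clip.mp4")):
        print(f"{m.start_s:6.1f}-{m.end_s:6.1f}s  score={m.score:.2f}")
```

One design note: selection happens by score, but the picked windows are returned in chronological order, because any downstream re-cut has to play back in timeline order.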
The Mango project sits at the center of a renewed AI offensive inside Meta, organized under a newly formed group sometimes referred to as Meta Superintelligence Labs and overseen by Alexandr Wang, the former Scale AI founder whom Meta recruited to accelerate its machine-learning push. The reorganization, plus a wave of high-profile hires, signals that the company is trying to convert raw compute and talent into practical product advantages rather than purely academic releases.
Mango is being developed in parallel with a next-generation text model code-named Avocado — a system that sources describe as especially focused on coding and software-agent capabilities. Put together, Mango and Avocado would give Meta a vertically integrated stack: Avocado could generate and orchestrate software and copy, while Mango supplies visuals and video, enabling workflows where language, logic and imagery are produced by the same family of models. That’s part product roadmap, part strategic hedge: owning the full pipeline reduces engineering friction and creates new monetizable hooks for ads, creator subscriptions and enterprise tools.
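Only as an illustration of that premise, a cross-model workflow might look like the stub below: a hypothetical avocado_copy call drafts the language and a hypothetical mango_render call supplies the visual. Neither name reflects a real Meta API; both are stubbed so the sketch runs.

```python
# Illustrative orchestration under the article's premise: a text model
# ("Avocado") drafts copy, a media model ("Mango") renders the visual,
# and one workflow stitches them together. Both calls are stubs.
def avocado_copy(brief: str) -> str:
    return f"Headline for: {brief}"  # stand-in for a text-model call


def mango_render(copy: str) -> str:
    return f"asset::{abs(hash(copy)) % 10_000:04d}"  # stand-in for a media-model call


def campaign(brief: str) -> dict[str, str]:
    copy = avocado_copy(brief)  # language and logic from one model family...
    return {"copy": copy, "asset": mango_render(copy)}  # ...imagery from the other


print(campaign("spring sneaker launch"))
```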
If Mango ships broadly, the near-term user impact is simple to imagine. Instagram and Facebook Reels could offer native text-to-video and text-to-image creation tools; ad systems could automatically assemble bespoke creative variations targeted to micro-audiences; and the editing workflow for creators could shift from “shoot, then edit” to “describe, generate, refine.” For small creators and brands, that lowers the cost of production; for Meta, it means more content, more engagement signals and more inventory to monetize.
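As a hedged sketch of what “describe, generate, refine” could mean for ad systems, the toy Python below loops one product brief over a few invented micro-audience styles. The generate function is a stand-in for whatever text-to-video endpoint Meta might eventually ship; the audience names and prompt format are made up for illustration.

```python
# Hypothetical per-audience ad-variant loop; no real API is implied.
AUDIENCES = {
    "runners":   "energetic outdoor morning-run aesthetic",
    "commuters": "calm, city-transit, headphones-in mood",
    "students":  "fast cuts, bold captions, dorm-room setting",
}


def generate(prompt: str) -> str:
    """Stub: a real call would return a video asset; we return an ID."""
    return f"video::{abs(hash(prompt)) % 10_000:04d}"


def ad_variants(product_brief: str) -> dict[str, str]:
    """One bespoke creative per micro-audience from a single brief."""
    return {
        name: generate(f"{product_brief}; style: {style}; 15s vertical video")
        for name, style in AUDIENCES.items()
    }


print(ad_variants("lightweight trail shoe, new colorway"))
```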
But power cuts both ways. Hyper-realistic, easily produced video raises acute risks — deepfakes, political misinformation, and copyright conflicts — inside platforms already under regulatory scrutiny in the U.S. and EU. How Meta chooses to handle provenance, labeling, watermarking and takedowns for Mango outputs will be as determinative as the model’s raw capability. The company’s choices here will shape not only product trust but also the legal and policy battles it faces around elections and platform safety.
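One plausible building block for labeling, sketched here under heavy assumptions, is to hash the generated bytes and attach a machine-readable provenance record. Production systems would more likely lean on an industry standard such as C2PA with real cryptographic signing; the provenance_record function and the “mango-preview” model name below are invented for illustration.

```python
# Minimal provenance-labeling sketch: bind a record to the exact output
# bytes via a content hash. Real deployments would sign this and follow
# a standard such as C2PA; nothing here is a real Meta interface.
import hashlib
import json
import time


def provenance_record(media_bytes: bytes, model: str = "mango-preview") -> dict:
    return {
        "sha256": hashlib.sha256(media_bytes).hexdigest(),  # ties record to the pixels
        "generator": model,                # which (hypothetical) model produced it
        "generated_at": int(time.time()),  # unix timestamp of generation
        "ai_generated": True,              # the flag that drives user-facing labels
    }


fake_frame = b"\x00" * 1024  # stand-in for encoded video bytes
print(json.dumps(provenance_record(fake_frame), indent=2))
```

The reason to bind the record to a content hash is that a label only survives re-uploads if the bytes do; robust watermarking that survives re-encoding is the harder, complementary problem.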
There are obvious marketplace realities, too. OpenAI, Google and several Chinese players have already demonstrated photorealistic image generation and increasingly competent video prototypes; Mango will not arrive in a vacuum. Meta’s principal competitive edge isn’t novelty in model design so much as distribution: a global social graph, a huge base of creators, and an ad business that can convert new features into immediate revenue. That makes Mango a high-stakes product bet — if it works, Meta can quickly entrench it; if it misfires, the company still faces the reputational and regulatory costs of deploying potent generative tools at scale.
On timing, people familiar with Meta’s plans say Mango and Avocado are being targeted for internal testing, partner pilots and limited creator betas ahead of a broader push in the first half of 2026. Expect staged rollouts: early previews for selected creators and advertisers, followed by deeper integration in the consumer apps if safety and quality metrics check out. That staged approach reflects both the technical challenge of video generation and the company’s need to get governance right before a wide-scale release.
Finally, this is an infrastructure story as much as an algorithm one. Meta has signaled — internally and to investors — plans for large investments in data centers and AI compute to support models of this scale. Mango will therefore be more than an app feature; it will be a justification for a multi-year, multi-billion-dollar build-out that reshapes how Meta allocates capital across research, chips and services. That fact helps explain the urgency behind the new lab structure and the executive attention being paid to these projects.
Mango’s prospects rest on two threads that are easy to underplay: technical execution and governance. Building a model that is both creatively useful and safe at scale is fiendishly difficult; policing its outputs at the speed of social media is even harder. For users and creators, the immediate question isn’t only what Mango can make, but what Meta will make permissible, and whether the company can keep ahead of misuse as the model goes from lab demos to billions of feeds.
In the end, Mango is Meta’s attempt to make AI-generated imagery and video a native feature of social media rather than a separate curiosity — to make the platform itself a canvas that sometimes never saw a camera. Whether the company pulls that off will depend on engineering, economics and choices about safety and transparency that will be argued over in boardrooms and courtrooms as much as in R&D labs.