Meta AI Releases V-JEPA 2: Open-Source Self-Supervised World Models for Understanding, Prediction, and Planning

By NextTech | June 12, 2025 | 4 Mins Read

Meta AI has released V-JEPA 2, a scalable open-source world model designed to learn from video at internet scale and enable robust visual understanding, future state prediction, and zero-shot planning. Building upon the joint-embedding predictive architecture (JEPA), V-JEPA 2 demonstrates how self-supervised learning from passive internet video, combined with minimal robot interaction data, can yield a modular foundation for intelligent physical agents.

Scalable Self-Supervised Pretraining from 1M Hours of Video

V-JEPA 2 is pretrained on over 1 million hours of internet-scale video combined with 1 million images. Using a visual mask denoising objective, the model learns to reconstruct masked spatiotemporal patches in a latent representation space. This approach avoids the inefficiencies of pixel-level prediction by focusing on predictable scene dynamics while disregarding irrelevant noise.
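The latent-space objective described above can be illustrated with a small plain-Python sketch. The L1 distance, the function name `masked_latent_loss`, and the toy embeddings are assumptions made for illustration; the article does not state the exact loss or how the target embeddings are produced.

```python
from typing import List, Sequence

Vector = List[float]


def masked_latent_loss(
    predicted: Sequence[Vector],
    target: Sequence[Vector],
    masked_indices: Sequence[int],
) -> float:
    """Mean absolute error between predicted and target patch embeddings,
    computed only at masked positions (L1 distance is an assumption)."""
    if not masked_indices:
        raise ValueError("at least one patch must be masked")
    total = 0.0
    count = 0
    for i in masked_indices:
        for p, t in zip(predicted[i], target[i]):
            total += abs(p - t)
            count += 1
    return total / count


# Toy example: 4 patch embeddings of dimension 2; patches 1 and 3 are masked.
# Visible patches (0 and 2) do not contribute, however wrong the prediction is.
target = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
predicted = [[9.0, 9.0], [1.0, 0.0], [9.0, 9.0], [0.5, 1.5]]
loss = masked_latent_loss(predicted, target, masked_indices=[1, 3])
print(loss)
```

Because the error is measured between embeddings and not pixels, the predictor is never penalized for failing to reproduce pixel-level detail that the target embedding does not encode.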

To scale JEPA pretraining to this level, Meta researchers introduced four key techniques:

  • Data scaling: Constructed a 22M-sample dataset (VideoMix22M) from public sources like SSv2, Kinetics, HowTo100M, YT-Temporal-1B, and ImageNet.
  • Model scaling: Expanded the encoder capacity to over 1B parameters using ViT-g.
  • Training schedule: Adopted a progressive resolution strategy and extended pretraining to 252K iterations.
  • Spatial-temporal augmentation: Trained on progressively longer and higher-resolution clips, reaching 64 frames at 384×384 resolution.
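To see why longer, higher-resolution clips are usually introduced progressively, it helps to count the patch tokens a vision transformer would process per clip. The sketch below assumes a tubelet of 2 frames by 16×16 pixels, which is a common ViT video patch size and is not stated in the article. Only the final stage (64 frames at 384×384) comes from the text; the earlier stages in `schedule` are hypothetical.

```python
def num_tokens(frames: int, height: int, width: int,
               tubelet: int = 2, patch: int = 16) -> int:
    """Number of spatiotemporal patch tokens for one clip.

    tubelet and patch are assumed values, not taken from the article.
    """
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the patch size")
    return (frames // tubelet) * (height // patch) * (width // patch)


# Hypothetical progressive schedule of (frames, resolution);
# only the last entry is reported in the article.
schedule = [(16, 256), (32, 256), (64, 384)]
for frames, res in schedule:
    print(f"{frames} frames at {res}x{res}: {num_tokens(frames, res, res)} tokens")
```

Under these assumptions the final stage processes 18,432 tokens per clip, nine times as many as the hypothetical first stage.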

These design choices led to an 88.2% average accuracy across six benchmark tasks, including SSv2, Diving-48, Jester, Kinetics, COIN, and ImageNet, surpassing previous baselines.

Understanding via Masked Representation Learning

V-JEPA 2 exhibits strong motion understanding capabilities. On the Something-Something v2 benchmark, it achieves 77.3% top-1 accuracy, outperforming models like InternVideo and VideoMAEv2. For appearance understanding, it remains competitive with state-of-the-art image encoders like DINOv2 and PEcoreG.

The encoder's representations were evaluated using attentive probes, verifying that self-supervised learning alone can yield transferable and domain-agnostic visual features applicable across diverse classification tasks.
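As a rough illustration of what an attentive probe does, the sketch below pools frozen encoder tokens with a single learnable query and applies a linear classifier. It is a simplified, single-head version with no key/value projections. All names and numbers are hypothetical, and the probe used in the actual evaluation may differ.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attentive_pool(tokens: List[List[float]], query: List[float]) -> List[float]:
    """Pool frozen encoder tokens with one learnable query vector."""
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim) for tok in tokens
    ]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def classify(pooled: List[float], class_weights: List[List[float]]) -> int:
    """Linear classifier on the pooled feature; returns the arg-max class."""
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy frozen features: 3 tokens of dimension 2 (the encoder is never updated).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]  # a zero query gives uniform attention, i.e. mean pooling
pooled = attentive_pool(tokens, query)
class_weights = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical probe weights
label = classify(pooled, class_weights)
print(pooled, label)
```

Only `query` and `class_weights` would be trained in such a setup, so classification accuracy reflects the quality of the frozen features and not the capacity of the probe.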

Temporal Reasoning via Video Question Answering

To assess temporal reasoning, the V-JEPA 2 encoder is aligned with a multimodal large language model and evaluated on multiple video question-answering tasks. Despite lacking language supervision during pretraining, the model achieves:

  • 84.0% on PerceptionTest
  • 76.9% on TempCompass
  • 44.5% on MVP
  • 36.7% on TemporalBench
  • 40.3% on TOMATO

These results challenge the assumption that visual-language alignment requires co-training from the start, demonstrating that a pretrained video encoder can be aligned post hoc with strong generalization.

V-JEPA 2-AC: Learning Latent World Models for Robotic Planning

A key innovation in this release is V-JEPA 2-AC, an action-conditioned variant of the pretrained encoder. Fine-tuned using only 62 hours of unlabeled robot video from the Droid dataset, V-JEPA 2-AC learns to predict future video embeddings conditioned on robot actions and poses. The architecture is a 300M parameter transformer with block-causal attention, trained using a teacher-forcing and rollout objective.
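Block-causal attention can be pictured as a mask in which every token may attend to all tokens in its own frame and in earlier frames, but never to later frames. The sketch below builds such a mask in plain Python. The function name and the tiny sizes are illustrative assumptions, and the way the actual model interleaves action and pose tokens is not shown.

```python
from typing import List


def block_causal_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Tokens are laid out frame by frame; a token sees its own frame
    and every earlier frame, but no later frame.
    """
    n = num_frames * tokens_per_frame
    return [
        [(j // tokens_per_frame) <= (i // tokens_per_frame) for j in range(n)]
        for i in range(n)
    ]


mask = block_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

Unlike a strictly token-level causal mask, the two tokens of a frame can see each other here, which is what makes the mask causal over blocks of tokens and not over single tokens.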

This enables zero-shot planning through model-predictive control. The model infers action sequences by minimizing the distance between imagined future states and visual goals using the Cross-Entropy Method (CEM). It achieves high success in tasks such as reaching, grasping, and pick-and-place on unseen robot arms in different labs, without any reward supervision or additional data collection.
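The planning loop described above can be sketched with a toy stand-in for the learned world model. In the sketch below, `toy_predictor` simply adds the action to a 2-D state and the energy is an L1 distance to the goal. The L1 choice and all of the hyperparameters are assumptions for illustration and are not taken from the release.

```python
import random
from typing import Callable, List

State = List[float]
Action = List[float]
Predictor = Callable[[State, Action], State]


def toy_predictor(state: State, action: Action) -> State:
    """Stand-in for a learned latent world model: next = state + action."""
    return [s + a for s, a in zip(state, action)]


def energy(state: State, actions: List[Action], goal: State,
           predict: Predictor) -> float:
    """Roll the model forward and measure L1 distance to the goal state."""
    for action in actions:
        state = predict(state, action)
    return sum(abs(s - g) for s, g in zip(state, goal))


def cem_plan(state: State, goal: State, predict: Predictor,
             horizon: int = 3, action_dim: int = 2, samples: int = 200,
             elites: int = 20, iterations: int = 10,
             seed: int = 0) -> List[Action]:
    """Cross-Entropy Method: sample action sequences, keep the lowest-energy
    ones, refit a per-step Gaussian to them, and repeat."""
    rng = random.Random(seed)
    mean = [[0.0] * action_dim for _ in range(horizon)]
    std = [[1.0] * action_dim for _ in range(horizon)]
    for _ in range(iterations):
        candidates = [
            [[rng.gauss(mean[t][d], std[t][d]) for d in range(action_dim)]
             for t in range(horizon)]
            for _ in range(samples)
        ]
        candidates.sort(key=lambda seq: energy(state, seq, goal, predict))
        top = candidates[:elites]
        for t in range(horizon):
            for d in range(action_dim):
                values = [seq[t][d] for seq in top]
                m = sum(values) / elites
                var = sum((v - m) ** 2 for v in values) / elites
                mean[t][d] = m
                std[t][d] = max(var ** 0.5, 1e-3)  # keep some exploration
    return mean


start, goal = [0.0, 0.0], [1.5, -0.5]
plan = cem_plan(start, goal, toy_predictor)
final_energy = energy(start, plan, goal, toy_predictor)
print(f"first action: {plan[0]}, remaining distance: {final_energy:.3f}")
```

In a receding-horizon controller, only the first action of `plan` would typically be executed before planning again from the new observation. No reward function appears anywhere in the loop; the goal state alone defines the objective.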


Benchmarks: Robust Performance and Planning Efficiency

Compared to baselines like Octo (behavior cloning) and Cosmos (latent diffusion world models), V-JEPA 2-AC:

  • Executes plans in ~16 seconds per step (versus 4 minutes for Cosmos).
  • Reaches a 100% success rate on reach tasks.
  • Outperforms others in grasp and manipulation tasks across object types.
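The planning-time figures in the comparison above translate into a concrete speedup. The short calculation below uses only the two per-step times quoted in the list; the 10-step episode length is a hypothetical value chosen to show the effect on a whole task.

```python
cosmos_seconds_per_step = 4 * 60  # "4 minutes" per step, from the list above
vjepa_seconds_per_step = 16       # "~16 seconds" per step, from the list above

speedup = cosmos_seconds_per_step / vjepa_seconds_per_step
print(f"approximate per-step speedup: {speedup:.0f}x")


def episode_minutes(steps: int, seconds_per_step: float) -> float:
    """Total planning time, in minutes, for an episode of `steps` actions."""
    return steps * seconds_per_step / 60


steps = 10  # hypothetical episode length, not from the article
print(f"V-JEPA 2-AC: {episode_minutes(steps, vjepa_seconds_per_step):.1f} min")
print(f"Cosmos:      {episode_minutes(steps, cosmos_seconds_per_step):.1f} min")
```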

Notably, it operates using a monocular RGB camera without calibration or environment-specific fine-tuning, reinforcing the generalization capability of the learned world model.

Conclusion

Meta's V-JEPA 2 represents a significant advancement in scalable self-supervised learning for physical intelligence. By decoupling observation learning from action conditioning and leveraging large-scale passive video, V-JEPA 2 demonstrates that general-purpose visual representations can be harnessed for both perception and control in the real world.


Check out the Paper, Models on Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
