Teaching robot policies without new demonstrations: interview with Jiahui Zhang and Jesse Zhang

By NextTech | December 4, 2025


The ReWiND method, which consists of three phases: learning a reward function, pre-training, and using the reward function and pre-trained policy to learn a new language-specified task online.

In their paper ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations, which was presented at CoRL 2025, Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh A. Sontakke, Joseph J. Lim, Jesse Thomason, Erdem Bıyık and Jesse Zhang introduce a framework for learning robot manipulation tasks solely from language instructions, without per-task demonstrations. We asked Jiahui Zhang and Jesse Zhang to tell us more.

What is the topic of the research in your paper, and what problem were you aiming to solve?

Our research addresses the problem of enabling robot manipulation policies to solve novel, language-conditioned tasks without collecting new demonstrations for each task. We begin with a small set of demonstrations in the deployment environment, train a language-conditioned reward model on them, and then use that learned reward function to fine-tune the policy on unseen tasks, with no additional demonstrations required.

Tell us about ReWiND: what are the main features and contributions of this framework?

ReWiND is a simple and effective three-stage framework designed to adapt robot policies to new, language-conditioned tasks without collecting new demonstrations. Its main features and contributions are:

  1. Reward function learning in the deployment environment
    We first learn a reward function using only five demonstrations per task from the deployment environment.
    • The reward model takes a sequence of images and a language instruction, and predicts per-frame progress from 0 to 1, giving us a dense reward signal instead of sparse success/failure.
    • To expose the model to both successful and failed behaviors without having to collect demonstrations of failures, we introduce a video rewind augmentation: for a video segment V(1:t), we choose an intermediate point t1. We reverse the segment V(t1:t) to create V(t:t1) and append it to the original sequence. This generates a synthetic sequence that resembles "making progress, then undoing progress," effectively simulating failed attempts.
    • This allows the reward model to learn a smoother and more accurate dense reward signal, improving generalization and stability during policy learning.
  2. Policy pre-training with offline RL
    Once we have the learned reward function, we use it to relabel the small demonstration dataset with dense progress rewards. We then train a policy offline on these relabeled trajectories.
  3. Policy fine-tuning in the deployment environment
    Finally, we adapt the pre-trained policy to new, unseen tasks in the deployment environment. We freeze the reward function and use it as the feedback signal for online reinforcement learning. After each episode, the newly collected trajectory is relabeled with dense rewards from the reward model and added to the replay buffer. This iterative loop allows the policy to continuously improve and adapt to new tasks without requiring any additional demonstrations.
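The video rewind augmentation from stage 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code: frames are arbitrary Python objects, the intermediate point t1 is 1-indexed, and the progress target of the i-th of t forward frames is assumed to be i/t, with each rewound frame reusing the target of the frame it repeats. The function names `rewind_augment` and `random_rewind` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def rewind_augment(frames: Sequence, t1: int) -> Tuple[list, List[float]]:
    """Turn a successful clip V(1:t) into "progress, then undo progress".

    Appends the reversed segment V(t:t1) to V(1:t), following the interview's
    description. Progress targets are an ASSUMPTION of this sketch: frame i of
    the forward clip gets i / t, and each rewound frame keeps the target of
    the frame it repeats, so the label falls back toward t1 / t.
    """
    t = len(frames)
    if not 1 <= t1 < t:
        raise ValueError("need 1 <= t1 < len(frames)")
    forward = list(frames)
    forward_progress = [i / t for i in range(1, t + 1)]
    # Slice [t1 - 1 : t] is V(t1:t) in 1-indexed terms; [::-1] reverses it.
    rewound = forward[t1 - 1 : t][::-1]
    rewound_progress = forward_progress[t1 - 1 : t][::-1]
    return forward + rewound, forward_progress + rewound_progress


def random_rewind(frames: Sequence, rng: random.Random) -> Tuple[list, List[float]]:
    """Sample t1 uniformly from 1..t-1 (randint is inclusive) and augment."""
    t1 = rng.randint(1, len(frames) - 1)
    return rewind_augment(frames, t1)


clip, progress = rewind_augment(["f1", "f2", "f3", "f4"], t1=2)
print(clip)      # ['f1', 'f2', 'f3', 'f4', 'f4', 'f3', 'f2']
print(progress)  # [0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5]
```

Because the text appends V(t:t1) literally, frame t appears twice at the join; dropping that duplicate would be an equally reasonable choice.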

May you speak in regards to the experiments you carried out to check the framework?

We evaluate ReWiND in both the MetaWorld simulation environment and the Koch real-world setup. Our analysis focuses on two aspects: the generalization ability of the reward model and the effectiveness of policy learning. We also evaluate how well different policies adapt to new tasks under our framework, demonstrating significant improvements over state-of-the-art methods.

(Q1) Reward generalization: MetaWorld analysis
We collect a MetaWorld dataset of 20 training tasks, each with 5 demos, plus 17 related but unseen tasks for evaluation. We train the reward function on the MetaWorld dataset and a subset of the OpenX dataset.

We compare ReWiND to LIV [1], LIV-FT, RoboCLIP [2], VLC [3], and GVL [4]. To assess generalization to unseen tasks, we use video-language confusion matrices. We feed the reward model video sequences paired with different language instructions and expect the correctly matched video-instruction pairs to receive the highest predicted rewards. In the confusion matrix, this corresponds to the diagonal entries having the strongest (darkest) values, indicating that the reward function reliably identifies the correct task description even for unseen tasks.
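One way to turn the confusion-matrix check into a number is to ask, for each video, whether its highest-reward instruction is the matched one. The sketch below assumes a hypothetical `reward_fn(video, instruction)` that returns one scalar per pair (for example, the final predicted progress); how the paper reduces per-frame progress to a single matrix entry is not stated here, and "diagonal accuracy" is an illustrative summary, not a metric from the paper.

```python
from typing import Callable, List, Sequence


def reward_confusion_matrix(
    videos: Sequence,
    instructions: Sequence[str],
    reward_fn: Callable[[object, str], float],
) -> List[List[float]]:
    """Row i, column j holds the reward for video i paired with instruction j.

    Video i is assumed to show the task described by instructions[i].
    """
    return [[reward_fn(v, instr) for instr in instructions] for v in videos]


def diagonal_accuracy(matrix: List[List[float]]) -> float:
    """Fraction of rows whose largest entry sits on the diagonal.

    Ties go to the lowest column index, because max() keeps the first maximum.
    """
    hits = 0
    for i, row in enumerate(matrix):
        best_j = max(range(len(row)), key=lambda j: row[j])
        hits += int(best_j == i)
    return hits / len(matrix)


# Made-up 3x3 example: the third video is confused with the second instruction.
m = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.5, 0.6, 0.4],
]
print(diagonal_accuracy(m))  # 2 of 3 rows peak on the diagonal -> 0.666...
```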

Video-language reward confusion matrix. See the paper for more information.

For demo alignment, we measure the correlation between the reward model's predicted progress and the actual time steps in successful trajectories using Pearson r and Spearman ρ. For policy rollout ranking, we evaluate whether the reward function correctly ranks failed, near-success, and successful rollouts. Across these metrics, ReWiND significantly outperforms all baselines: for example, it achieves 30% higher Pearson correlation and 27% higher Spearman correlation than VLC on demo alignment, and delivers about a 74% relative improvement in reward separation between success categories compared with the strongest baseline, LIV-FT.

(Q2) Policy learning in simulation (MetaWorld)
We pre-train on the same 20 tasks and then evaluate RL on 8 unseen MetaWorld tasks for 100k environment steps.

Using ReWiND rewards, the policy achieves an interquartile mean (IQM) success rate of approximately 79%, a ~97.5% improvement over the best baseline. It also shows significantly better sample efficiency, reaching higher success rates much earlier in training.
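The interquartile mean discards the lowest and highest quarter of scores and averages the rest, which makes it less sensitive to a few outlier runs than a plain mean. A minimal sketch, using the same floor-based trimming as `scipy.stats.trim_mean(values, 0.25)`; which runs and tasks the paper pools before taking the IQM is not stated in the interview, so the success rates below are made up.

```python
from typing import Sequence


def interquartile_mean(values: Sequence[float]) -> float:
    """Mean of the middle 50% of values, trimming floor(0.25 * n) from each end."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    cut = int(0.25 * n)
    middle = xs[cut : n - cut]  # never empty: at least half the values remain
    return sum(middle) / len(middle)


# Hypothetical per-run success rates; one run (0.1) is an outlier.
rates = [0.1, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 1.0]
print(interquartile_mean(rates))  # averages 0.7, 0.75, 0.8, 0.85 -> about 0.775
print(sum(rates) / len(rates))    # plain mean, pulled down by the outlier -> about 0.7125
```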

(Q3) Policy learning on a real robot (Koch bimanual arms)
Setup: a real-world tabletop bimanual Koch v1.1 system with 5 tasks, including in-distribution, visually cluttered, and spatial-language generalization tasks.
We use 5 demos for the reward model and 10 demos for the policy in this more challenging setting. With about 1 hour of real-world RL (~50k env steps), ReWiND improves average success from 12% → 68% (≈5× improvement), while VLC only goes from 8% → 10%.
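The real-robot numbers can be checked with simple arithmetic. The "≈5×" figure fits either of two readings, the ratio of final to initial success rate (about 5.7) or the relative increase over the initial rate (about 4.7). The sketch also converts ~50k environment steps in about one hour into a control rate, assuming that hour is wall-clock interaction time; that interpretation is ours, not stated in the interview. `improvement_summary` is a hypothetical helper.

```python
def improvement_summary(before: float, after: float) -> dict:
    """Compare two success rates given as fractions in [0, 1]."""
    return {
        "ratio": after / before,                          # how many times higher
        "relative_increase": (after - before) / before,   # fractional gain over the start
        "gain_points": (after - before) * 100,            # absolute gain in percentage points
    }


rewind = improvement_summary(0.12, 0.68)  # ratio ~5.67, relative increase ~4.67, +56 points
vlc = improvement_summary(0.08, 0.10)     # ratio ~1.25, +2 points
steps_per_second = 50_000 / 3600          # ~13.9 env steps/s, assuming 1 h of wall-clock time
print(rewind, vlc, round(steps_per_second, 1))
```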

Are you planning future work to further improve the ReWiND framework?

Yes, we plan to extend ReWiND to larger models and further improve the accuracy and generalization of the reward function across a broader range of tasks. In fact, we already have a workshop paper extending ReWiND to larger-scale models.

In addition, we aim to make the reward model capable of directly predicting success or failure, without relying on the environment's success signal during policy fine-tuning. Currently, although ReWiND provides dense rewards, we still rely on the environment to indicate whether an episode has succeeded. Our goal is to develop a fully generalizable reward model that can provide both accurate dense rewards and reliable success detection on its own.
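The current division of labor during fine-tuning (dense rewards from the frozen reward model, episode success from the environment) can be made concrete with a sketch of the stage-3 loop. Everything here is illustrative: `env_reset`, `env_step`, `policy`, `reward_model`, and `update_policy` are hypothetical interfaces, using the next frame's predicted progress as the per-step reward is an assumption, and the RL update itself is left abstract.

```python
import random
from collections import deque
from typing import Any, Callable, List, Sequence, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[Any, Any, float, Any, bool]


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions."""

    def __init__(self, capacity: int):
        self.storage: deque = deque(maxlen=capacity)

    def add_episode(self, transitions: Sequence[Transition]) -> None:
        self.storage.extend(transitions)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self.storage), min(batch_size, len(self.storage)))


def run_episode(
    env_reset: Callable[[], Any],
    env_step: Callable[[Any], Tuple[Any, bool]],
    policy: Callable[[Any, str], Any],
    reward_model: Callable[[List[Any], str], List[float]],
    instruction: str,
    max_steps: int,
) -> Tuple[List[Transition], bool]:
    """Roll out one episode, then relabel it with dense rewards.

    env_step returns (next_obs, success); that success flag is the
    environment's signal, which the interview says ReWiND still relies on.
    reward_model maps the frame sequence to per-frame progress in [0, 1];
    it is frozen, so it is only queried here, never updated.
    """
    obs = env_reset()
    frames, actions, success = [obs], [], False
    for _ in range(max_steps):
        action = policy(obs, instruction)
        obs, success = env_step(action)
        actions.append(action)
        frames.append(obs)
        if success:
            break
    progress = reward_model(frames, instruction)
    transitions = []
    for i, action in enumerate(actions):
        done = success and i == len(actions) - 1
        # Reward for step i = predicted progress of the frame it leads to (assumption).
        transitions.append((frames[i], action, progress[i + 1], frames[i + 1], done))
    return transitions, success


def fine_tune(num_episodes, buffer, update_policy, rng, batch_size=256, **episode_kwargs):
    """After each episode: relabel, add to the replay buffer, take one RL update."""
    for _ in range(num_episodes):
        transitions, _ = run_episode(**episode_kwargs)
        buffer.add_episode(transitions)
        update_policy(buffer.sample(batch_size, rng))  # hypothetical off-policy update
```

Whether ReWiND's per-step reward is the progress itself or the change in progress between frames is not stated in the interview; switching to `progress[i + 1] - progress[i]` would be a one-line change.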

References

[1] Yecheng Jason Ma et al. “LIV: Language-image representations and rewards for robotic control.” International Conference on Machine Learning. PMLR, 2023.
[2] Sumedh Sontakke et al. “RoboCLIP: One demonstration is enough to learn robot policies.” Advances in Neural Information Processing Systems 36 (2023): 55681-55693.
[3] Minttu Alakuijala et al. “Video-language critic: Transferable reward functions for language-conditioned robotics.” arXiv:2405.19988 (2024).
[4] Yecheng Jason Ma et al. “Vision language models are in-context value learners.” The Thirteenth International Conference on Learning Representations. 2024.

About the authors


Jiahui Zhang is a Ph.D. student in Computer Science at the University of Texas at Dallas, advised by Prof. Yu Xiang. He received his M.S. degree from the University of Southern California, where he worked with Prof. Joseph Lim and Prof. Erdem Bıyık.


Jesse Zhang is a postdoctoral researcher at the University of Washington, advised by Prof. Dieter Fox and Prof. Abhishek Gupta. He completed his Ph.D. at the University of Southern California, advised by Prof. Jesse Thomason and Prof. Erdem Bıyık at USC, and Prof. Joseph J. Lim at KAIST.



Lucy Smith is Senior Managing Editor for Robohub and AIhub.

