Interview with Yuki Mitsufuji: Improving AI image generation

By NextTech, June 17, 2025

Yuki Mitsufuji is a Lead Research Scientist at Sony AI. Yuki and his team presented two papers at the recent Conference on Neural Information Processing Systems (NeurIPS 2024). These works tackle different aspects of image generation and are entitled: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping and PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher. We caught up with Yuki to find out more about this research.

There are two pieces of research we’d like to ask you about today. Could we start with the GenWarp paper? Could you outline the problem that you were focused on in this work?

The problem we aimed to solve is called single-shot novel view synthesis, which is where you have one image and want to create another image of the same scene from a different camera angle. There has been a lot of work in this space, but a major challenge remains: when the image angle changes substantially, the image quality degrades significantly. We wanted to be able to generate a new image based on a single given image, as well as improve the quality, even in very challenging angle change settings.

How did you go about solving this problem – what was your methodology?

The existing works in this space tend to use monocular depth estimation, which means only a single image is used to estimate depth. This depth information enables us to change the angle and change the image according to that angle – we call it “warp.” Of course, there will be some occluded parts in the image, and there will be information missing from the original image on how to create the image from a different angle. Therefore, there is always a second phase where another module can interpolate the occluded region. Because of these two phases, in the existing work in this area, geometrical errors introduced in warping cannot be compensated for in the interpolation phase.
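To make the "warp" step of such a two-phase pipeline concrete, here is a minimal sketch of depth-based forward warping. It is not taken from GenWarp or from any specific prior method; it assumes a pinhole camera, and the function name `forward_warp`, the intrinsics matrix `K`, and the relative pose `(R, t)` are hypothetical inputs chosen for illustration. Pixels in the new view that receive no source pixel are returned as a mask, which is the region a second module would have to fill in.

```python
import numpy as np

def forward_warp(image, depth, K, R, t):
    """Warp `image` (H, W, 3) into a new camera view using per-pixel `depth` (H, W).

    Assumptions (illustrative only): pinhole intrinsics K (3x3), and a rigid
    transform (R, t) mapping source-camera coordinates to target-camera ones.
    Returns the warped image and a boolean mask of unfilled (missing) pixels.
    """
    H, W = depth.shape
    K_inv = np.linalg.inv(K)
    warped = np.zeros_like(image)
    zbuf = np.full((H, W), np.inf)  # nearest depth written so far per target pixel

    for v in range(H):
        for u in range(W):
            # Unproject the pixel to a 3D point in the source camera frame.
            p_src = (K_inv @ np.array([u, v, 1.0])) * depth[v, u]
            # Move the point into the target camera frame.
            p_tgt = R @ p_src + t
            if p_tgt[2] <= 1e-6:
                continue  # behind the target camera
            # Project into the target image plane.
            uvw = K @ p_tgt
            u2 = int(round(uvw[0] / uvw[2]))
            v2 = int(round(uvw[1] / uvw[2]))
            # Keep only the nearest surface when several points land on one pixel.
            if 0 <= u2 < W and 0 <= v2 < H and p_tgt[2] < zbuf[v2, u2]:
                zbuf[v2, u2] = p_tgt[2]
                warped[v2, u2] = image[v, u]

    holes = np.isinf(zbuf)  # pixels with no source information
    return warped, holes
```

Any error in the estimated depth shifts pixels to the wrong place in `warped`, and a separate inpainting stage that only sees `warped` and `holes` has no way to undo that shift.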

We solve this problem by fusing everything together. We don’t go for a two-phase approach, but do it all at once in a single diffusion model. To preserve the semantic meaning of the image, we created another neural network that can extract the semantic information from a given image as well as monocular depth information. We inject it, using a cross-attention mechanism, into the main base diffusion model. Since the warping and interpolation were done in one model, and the occluded part can be reconstructed very well together with the semantic information injected from outside, we saw the overall quality improved. We saw improvements in image quality both subjectively and objectively, using metrics such as FID and PSNR.
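As a rough illustration of what "injecting" features through cross-attention means, the sketch below implements generic scaled dot-product cross-attention in NumPy. It is not GenWarp's architecture: the function name, the projection matrices `Wq`, `Wk`, `Wv`, and all shapes are assumptions made for the example. The tokens of the main model form the queries, while the features coming from the second network supply the keys and values.

```python
import numpy as np

def cross_attention(x, context, Wq, Wk, Wv):
    """Generic cross-attention (illustrative, not the paper's exact layer).

    x:       (N, d)   tokens inside the main model (queries come from here)
    context: (M, d_c) features from an external network (keys and values)
    Returns the attended output (N, d_v) and the attention weights (N, M).
    """
    q = x @ Wq                      # (N, d_k)
    k = context @ Wk                # (M, d_k)
    v = context @ Wv                # (M, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over context
    return weights @ v, weights
```

Each row of `weights` sums to one, so every token of the main model receives a weighted mixture of the external features.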

Can people see some of the images created using GenWarp?

Yes, we actually have a demo, which consists of two parts. One shows the original image and the other shows the warped images from different angles.

Moving on to the PaGoDA paper, here you were addressing the high computational cost of diffusion models? How did you go about addressing that problem?

Diffusion models are very popular, but it’s well-known that they’re very costly for training and inference. We address this issue by proposing PaGoDA, our model which addresses both training efficiency and inference efficiency.

It’s easy to talk about inference efficiency, which directly connects to the speed of generation. Diffusion usually takes a lot of iterative steps towards the final generated output – our goal was to skip these steps so that we could quickly generate an image in just one step. People call it “one-step generation” or “one-step diffusion.” It doesn’t always have to be one step; it could be two or three steps, for example, “few-step diffusion”. Basically, the target is to solve the bottleneck of diffusion, which is a time-consuming, multi-step iterative generation method.

In diffusion models, generating an output is typically a slow process, requiring many iterative steps to produce the final result. A key trend in advancing these models is training a “student model” that distills knowledge from a pre-trained diffusion model. This allows for faster generation, sometimes producing an image in just one step. These are often referred to as distilled diffusion models. Distillation means that, given a teacher (a diffusion model), we use this information to train another one-step efficient model. We call it distillation because we can distill the information from the original model, which has vast knowledge about generating good images.
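The idea of a one-step student matching a multi-step teacher can be shown with a deliberately tiny toy, sketched below. This is not how diffusion distillation is trained in practice: the "teacher" here is a hypothetical scalar process that nudges a noise value toward a target over many small steps, and the "student" is an affine map fitted in closed form rather than a neural network trained by gradient descent. All names and constants are assumptions for illustration.

```python
import random

def teacher_sample(z, steps=100, mu=3.0):
    """Toy 'teacher': many small iterative updates pulling noise z toward mu."""
    x = z
    for _ in range(steps):
        x = x + (mu - x) / steps
    return x

def fit_one_step_student(num_pairs=50, seed=0):
    """Fit student(z) = a * z + b to (noise, teacher output) pairs by least squares."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_pairs)]
    ys = [teacher_sample(z) for z in zs]  # the expensive part, done once offline
    mz = sum(zs) / len(zs)
    my = sum(ys) / len(ys)
    a = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    b = my - a * mz
    return a, b

a, b = fit_one_step_student()

def student_sample(z):
    """One evaluation instead of 100 iterative steps."""
    return a * z + b
```

In this toy the teacher's mapping happens to be exactly affine, so the student reproduces it almost perfectly; real image models need far more expressive students and training objectives.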

However, both classic diffusion models and their distilled counterparts are usually tied to a fixed image resolution. This means that if we want a higher-resolution distilled diffusion model capable of one-step generation, we would need to retrain the diffusion model and then distill it again at the desired resolution.

This makes the entire pipeline of training and generation quite tedious. Each time a higher resolution is needed, we have to retrain the diffusion model from scratch and go through the distillation process again, adding significant complexity and time to the workflow.

The uniqueness of PaGoDA is that we train across different resolution models in one system, which allows it to achieve one-step generation, making the workflow much more efficient.

For example, if we want to distill a model for images of 128×128, we can do that. But if we want to do it for another scale, 256×256 let’s say, then we should have the teacher train on 256×256. If we want to extend it even more for higher resolutions, then we need to do this multiple times. This can be very costly, so to avoid this, we use the idea of progressive growing training, which has already been studied in the area of generative adversarial networks (GANs), but not so much in the diffusion space. The idea is, given the teacher diffusion model trained on 64×64, we can distill information and train a one-step model for any resolution. For many resolution cases we can get a state-of-the-art performance using PaGoDA.
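The bookkeeping behind this comparison can be sketched in a few lines. The code below is not PaGoDA's training procedure; it only counts how many teacher diffusion models each workflow described above would need. It assumes the resolution doubles at each growing stage, which is the usual convention in progressive growing of GANs and is an assumption here, and both function names are hypothetical.

```python
def resolution_schedule(base=64, target=512):
    """Resolutions visited when growing from `base` to `target` by doubling."""
    if base <= 0 or target < base:
        raise ValueError("need 0 < base <= target")
    schedule = [base]
    while schedule[-1] < target:
        schedule.append(schedule[-1] * 2)
    if schedule[-1] != target:
        raise ValueError("target must equal base times a power of two")
    return schedule

def teachers_to_train(schedule, progressive):
    """Teacher diffusion models needed to cover every resolution in `schedule`.

    Fixed-resolution workflow: one teacher per resolution.
    Progressive workflow as described in the interview: one low-resolution teacher.
    """
    return 1 if progressive else len(schedule)

stages = resolution_schedule(64, 512)
print(stages)                                        # [64, 128, 256, 512]
print(teachers_to_train(stages, progressive=False))  # 4
print(teachers_to_train(stages, progressive=True))   # 1
```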

Could you give a rough idea of the difference in computational cost between your method and standard diffusion models? What kind of saving do you make?

The idea is very simple – we just skip the iterative steps. It is highly dependent on the diffusion model you use, but a typical standard diffusion model historically used about 1000 steps. And now, modern, well-optimized diffusion models require 79 steps. With our model that goes down to one step, we are looking at about 80 times faster, in theory. Of course, it all depends on how you implement the system, and if there’s a parallelization mechanism on chips, people can exploit it.
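The "about 80 times" figure follows from a simple ratio of step counts. The short sketch below spells out that arithmetic under one loud assumption: every sampling step costs the same (one network evaluation) and nothing else contributes to runtime. The function name is hypothetical, and real speedups depend on the implementation and hardware, as noted above.

```python
def theoretical_speedup(baseline_steps, fast_steps=1):
    """Ratio of sampling steps, assuming every step has the same cost."""
    if baseline_steps <= 0 or fast_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / fast_steps

print(theoretical_speedup(79))    # 79.0, i.e. roughly 80x versus a 79-step sampler
print(theoretical_speedup(1000))  # 1000.0 versus a historical 1000-step sampler
```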

Is there anything else you would like to add about either of the projects?

Ultimately, we want to achieve real-time generation, and not just have this generation be limited to images. Real-time sound generation is an area that we are looking at.

Also, as you can see in the animation demo of GenWarp, the images change rapidly, making it look like an animation. However, the demo was created with many images generated with costly diffusion models offline. If we could achieve high-speed generation, let’s say with PaGoDA, then theoretically, we could create images from any angle on the fly.

Find out more:

  • GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping, Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, Yuki Mitsufuji.
  • GenWarp demo
  • PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher, Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon.

About Yuki Mitsufuji

Yuki Mitsufuji is a Lead Research Scientist at Sony AI. In addition to his role at Sony AI, he is a Distinguished Engineer for Sony Group Corporation and the Head of Creative AI Lab for Sony R&D. Yuki holds a PhD in Information Science & Technology from the University of Tokyo. His groundbreaking work has made him a pioneer in foundational music and sound work, such as sound separation and other generative models that can be applied to music, sound, and other modalities.



AIhub is a non-profit dedicated to connecting the AI community to the public by providing free, high-quality information in AI.
