Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

By NextTech, August 19, 2025

In the field of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Released in August 2025 by Alibaba’s Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation model to deliver advanced editing capabilities. The model excels at semantic editing (e.g., style transfer and novel view synthesis) and appearance editing (e.g., precise object modifications), while preserving Qwen-Image’s strength in complex text rendering for both English and Chinese. Integrated with Qwen Chat and available via Hugging Face, it lowers barriers to professional content creation, from IP design to error correction in generated artwork.

Architecture and Key Innovations

Qwen-Image-Edit extends the Multimodal Diffusion Transformer (MMDiT) architecture of Qwen-Image, which comprises a Qwen2.5-VL multimodal large language model (MLLM) for text conditioning, a Variational AutoEncoder (VAE) for image tokenization, and the MMDiT backbone for joint modeling. For editing, it introduces dual encoding: the input image is processed by Qwen2.5-VL for high-level semantic features and by the VAE for low-level reconstructive details, and both are concatenated in the MMDiT’s image stream. This balances semantic coherence (e.g., maintaining object identity across pose changes) with visual fidelity (e.g., preserving unmodified regions).

The Multimodal Scalable RoPE (MSRoPE) positional encoding is augmented with a frame dimension to differentiate pre- and post-edit images, supporting tasks like text-image-to-image (TI2I) editing. The VAE, fine-tuned on text-rich data, achieves a reconstruction PSNR of 33.42 on general images and 36.63 on text-heavy ones, outperforming FLUX-VAE and SD-3.5-VAE. These improvements allow Qwen-Image-Edit to handle bilingual text edits while retaining the original font, size, and style.
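To make the frame dimension concrete, here is a toy sketch of how positions could be assigned to the patches in the image stream. It is not Qwen's implementation: the function names, the row-major layout, and the choice of frame 0 for the image being generated and frame 1 for the input image are all assumptions made for illustration.

```python
from typing import List, Tuple

Position = Tuple[int, int, int]  # (frame, row, col)


def frame_positions(frame: int, height: int, width: int) -> List[Position]:
    """Return one (frame, row, col) index per latent patch, in row-major order."""
    return [(frame, r, c) for r in range(height) for c in range(width)]


def build_image_stream_positions(target_hw: Tuple[int, int],
                                 source_hw: Tuple[int, int]) -> List[Position]:
    """Concatenate positions for the two images that share the image stream.

    Assumption: frame 0 is the image being generated, frame 1 the input image.
    """
    target = frame_positions(0, *target_hw)
    source = frame_positions(1, *source_hw)
    return target + source


positions = build_image_stream_positions(target_hw=(2, 2), source_hw=(2, 2))
print(positions[0], positions[4])  # same (row, col), different frame
```

Because the frame index differs, two patches at the same spatial location in the pre- and post-edit images still receive distinct positions, which is the property the paragraph above describes.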

Key Features of Qwen-Image-Edit

  • Semantic and Appearance Editing: Supports low-level visual appearance editing (e.g., adding, removing, or modifying elements while keeping other regions unchanged) and high-level visual semantic editing (e.g., IP creation, object rotation, and style transfer, where pixels may change as long as the semantics stay consistent).
  • Precise Text Editing: Enables bilingual (Chinese and English) text editing, including direct addition, deletion, and modification of text in images, while preserving the original font, size, and style.
  • Strong Benchmark Performance: Achieves state-of-the-art results on multiple public benchmarks for image editing tasks, positioning it as a strong foundation model for generation and manipulation.

Training and Data Pipeline

Leveraging Qwen-Image’s curated dataset of billions of image-text pairs across the Nature (55%), Design (27%), People (13%), and Synthetic (5%) domains, Qwen-Image-Edit employs a multi-task training paradigm that unifies text-to-image (T2I), image-to-image (I2I), and TI2I objectives. A seven-stage filtering pipeline refines the data for quality and balance, incorporating synthetic text rendering strategies (Pure, Compositional, Complex) to address long-tail issues in Chinese characters.
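As a rough illustration of what the domain mix means in practice, the sketch below draws samples in proportion to the reported shares. The article does not say that these percentages are used as sampling weights during training, so treat `sample_domains` and its seed as hypothetical; only the four shares come from the text.

```python
import random
from collections import Counter

# Domain shares as reported in the article (percent of the dataset).
DOMAIN_SHARES = {"Nature": 55, "Design": 27, "People": 13, "Synthetic": 5}


def sample_domains(n: int, seed: int = 0) -> Counter:
    """Draw n domain labels with probability proportional to each share."""
    rng = random.Random(seed)
    domains = list(DOMAIN_SHARES)
    weights = list(DOMAIN_SHARES.values())
    return Counter(rng.choices(domains, weights=weights, k=n))


counts = sample_domains(10_000)
for domain, count in counts.most_common():
    print(f"{domain:<10} {count / 10_000:.1%}")
```

With enough draws the empirical fractions approach the stated 55/27/13/5 split.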

Training uses flow matching with a Producer-Consumer framework for scalability, followed by supervised fine-tuning and reinforcement learning (DPO and GRPO) for preference alignment. For editing-specific tasks, it integrates novel view synthesis and depth estimation, using DepthPro as a teacher model. The result is robust performance on tasks such as correcting calligraphy errors through chained edits.
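For readers unfamiliar with flow matching, the following minimal sketch shows how one training example is typically built on a straight-line path between a noise sample and a data sample. The time convention (noise at t=0, data at t=1) and the function names are assumptions for illustration; the Qwen-Image training code may use a different convention and operates on image latents, not short lists.

```python
from typing import List, Sequence, Tuple


def flow_matching_pair(data: Sequence[float], noise: Sequence[float],
                       t: float) -> Tuple[List[float], List[float]]:
    """Build one training example on the straight path from noise (t=0) to data (t=1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Point on the straight line between the noise sample and the data sample.
    x_t = [(1.0 - t) * n + t * d for d, n in zip(data, noise)]
    # Along a straight path the velocity is constant: data minus noise.
    velocity = [d - n for d, n in zip(data, noise)]
    return x_t, velocity


def flow_matching_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted velocity and the target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


x_t, v = flow_matching_pair(data=[1.0, 3.0], noise=[0.0, 1.0], t=0.5)
print(x_t, v)  # [0.5, 2.0] [1.0, 2.0]
```

During training, a network would receive `x_t` and `t` (plus the conditioning) and be optimized so that its output minimizes `flow_matching_loss` against `v`.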

Advanced Editing Capabilities

Qwen-Image-Edit shines in semantic editing, enabling IP creation such as generating MBTI-themed emojis from a mascot (e.g., Capybara) while preserving character consistency. It supports 180-degree novel view synthesis, rotating objects or scenes with high fidelity and achieving 15.11 PSNR on GSO, surpassing specialized models like CRM. Style transfer turns portraits into artistic styles, such as Studio Ghibli, while maintaining semantic integrity.
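The PSNR figures quoted in this article follow the standard peak signal-to-noise ratio, which compares a candidate image against a reference via the mean squared error. A minimal sketch over flat pixel lists is shown below; the article does not describe the exact evaluation protocol (resolution, color space, averaging), so those details are left out and the sample pixel values are made up.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: every pixel is off by 2 intensity levels.
reference = [0, 64, 128, 255]
candidate = [2, 62, 130, 253]
print(f"{psnr(reference, candidate):.2f} dB")
```

Higher values mean the candidate is closer to the reference, so a gain of a few dB reflects a noticeably smaller reconstruction error.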

For appearance editing, it adds elements like signboards with realistic reflections or removes fine details like hair strands without altering the surroundings. Bilingual text editing is precise: changing “Hope” to “Qwen” on posters or correcting Chinese characters in calligraphy via bounding boxes. Chained editing allows iterative corrections, e.g., fixing “稽” step by step until it is accurate.
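Chained editing amounts to feeding each result back in as the next input. The helper below is a hypothetical sketch of that loop: `chained_edit`, `fake_edit`, and the prompts are illustrative names, and `edit_fn` stands in for whatever performs a single edit (for example, a thin wrapper around a Diffusers pipeline call).

```python
from typing import Callable, List, Sequence, TypeVar

ImageT = TypeVar("ImageT")


def chained_edit(edit_fn: Callable[[ImageT, str], ImageT],
                 image: ImageT,
                 prompts: Sequence[str]) -> List[ImageT]:
    """Apply prompts in order; each edit starts from the previous result.

    Returns the original image followed by every intermediate result,
    so a bad step can be inspected or rolled back.
    """
    history = [image]
    for prompt in prompts:
        history.append(edit_fn(history[-1], prompt))
    return history


def fake_edit(image: str, prompt: str) -> str:
    """Stand-in for a real edit call; it only records what was requested."""
    return f"{image} -> [{prompt}]"


steps = chained_edit(fake_edit, "calligraphy.png",
                     ["fix the upper part of the character",
                      "fix the lower part of the character"])
print(steps[-1])
```

Keeping the full history is a design choice: iterative corrections often overshoot, and having each intermediate image makes it easy to restart from the last good step.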

Benchmark Results and Evaluations

Qwen-Image-Edit leads editing benchmarks, scoring 7.56 overall on GEdit-Bench-EN and 7.52 on GEdit-Bench-CN, outperforming GPT Image 1 (7.53 EN, 7.30 CN) and FLUX.1 Kontext [Pro] (6.56 EN, 1.23 CN). On ImgEdit, it achieves 4.27 overall, excelling in tasks like object replacement (4.66) and style changes (4.81). Depth estimation yields 0.078 AbsRel on KITTI, competitive with DepthAnything v2.
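AbsRel, the depth metric cited above, is the mean absolute relative error between predicted and ground-truth depth, where lower is better. The sketch below shows the standard formula on made-up values; the masking and depth caps used in the actual KITTI evaluation are not described in the article, so only pixels with positive ground truth are kept here as a simplifying assumption.

```python
from typing import Sequence


def abs_rel(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    valid = [(p, g) for p, g in zip(predicted, ground_truth) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


# Hypothetical depths in meters: two pixels are off by 10%, one is exact.
ground_truth = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
print(f"AbsRel = {abs_rel(predicted, ground_truth):.4f}")
```

An AbsRel of 0.078 therefore corresponds to predictions that deviate from the true depth by about 7.8% on average.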

Human evaluations on AI Arena place its base model third among APIs, with strong text rendering advantages. These metrics highlight its strength in instruction following and multilingual fidelity.

Deployment and Practical Usage

Qwen-Image-Edit can be deployed via Hugging Face Diffusers:

from diffusers import QwenImageEditPipeline
import torch
from PIL import Image

# Load the pipeline, cast it to bfloat16, and move it to the GPU.
pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
pipeline.to(torch.bfloat16).to("cuda")

# Load the image to edit and describe the change in natural language.
image = Image.open("input.png").convert("RGB")
prompt = "Change the rabbit's color to purple, with a flash light background."
# The pipeline returns a list of images; take the first one before saving.
output = pipeline(image=image, prompt=prompt, num_inference_steps=50, true_cfg_scale=4.0).images[0]
output.save("output.png")

Alibaba Cloud’s Model Studio provides API access for scalable inference. The model is licensed under Apache 2.0, and the GitHub repository provides training code.

Future Implications

Qwen-Image-Edit advances vision-language interfaces, enabling seamless content manipulation for creators. Its unified approach to understanding and generation suggests potential extensions to video and 3D, fostering innovative applications in AI-driven design.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
