Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

By NextTech | August 19, 2025 | 5 Mins Read

In the field of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Released in August 2025 by Alibaba's Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation to deliver advanced editing capabilities. The model excels in semantic editing (e.g., style transfer and novel view synthesis) and appearance editing (e.g., precise object modifications), while preserving Qwen-Image's strength in complex text rendering for both English and Chinese. Integrated with Qwen Chat and available via Hugging Face, it lowers barriers for professional content creation, from IP design to error correction in generated artwork.

Architecture and Key Innovations

Qwen-Image-Edit extends the Multimodal Diffusion Transformer (MMDiT) architecture of Qwen-Image, which comprises a Qwen2.5-VL multimodal large language model (MLLM) for text conditioning, a Variational AutoEncoder (VAE) for image tokenization, and the MMDiT backbone for joint modeling. For editing, it introduces dual encoding: the input image is processed by Qwen2.5-VL for high-level semantic features and by the VAE for low-level reconstructive details, concatenated in the MMDiT's image stream. This enables balanced semantic coherence (e.g., maintaining object identity across pose changes) and visual fidelity (e.g., preserving unmodified regions).

The Multimodal Scalable RoPE (MSRoPE) positional encoding is augmented with a frame dimension to differentiate pre- and post-edit images, supporting tasks like text-image-to-image (TI2I) editing. The VAE, fine-tuned on text-rich data, achieves superior reconstruction with 33.42 PSNR on general images and 36.63 on text-heavy ones, outperforming FLUX-VAE and SD-3.5-VAE. These enhancements allow Qwen-Image-Edit to handle bilingual text edits while retaining the original font, size, and style.
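For readers unfamiliar with the PSNR figures quoted above, here is a minimal sketch of how peak signal-to-noise ratio is commonly computed. It assumes 8-bit pixel values flattened into plain Python sequences; the function name `psnr` and the `max_value` parameter are illustrative choices, not part of any Qwen tooling, and the article does not describe the exact evaluation protocol behind the reported numbers.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the reconstruction is closer to the reference, which is why a VAE that scores better on text-heavy images is a plausible asset for text editing.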

Key Features of Qwen-Image-Edit

  • Semantic and Appearance Editing: Supports low-level visual appearance editing (e.g., adding, removing, or modifying elements while keeping other regions unchanged) and high-level visual semantic editing (e.g., IP creation, object rotation, and style transfer, allowing pixel changes with semantic consistency).
  • Precise Text Editing: Enables bilingual (Chinese and English) text editing, including direct addition, deletion, and modification of text in images, while preserving the original font, size, and style.
  • Strong Benchmark Performance: Achieves state-of-the-art results on multiple public benchmarks for image editing tasks, positioning it as a robust foundation model for generation and manipulation.

Training and Data Pipeline

Leveraging Qwen-Image's curated dataset of billions of image-text pairs across Nature (55%), Design (27%), People (13%), and Synthetic (5%) domains, Qwen-Image-Edit employs a multi-task training paradigm unifying T2I, I2I, and TI2I objectives. A seven-stage filtering pipeline refines data for quality and balance, incorporating synthetic text rendering strategies (Pure, Compositional, Complex) to address long-tail issues in Chinese characters.

Training uses flow matching with a Producer-Consumer framework for scalability, followed by supervised fine-tuning and reinforcement learning (DPO and GRPO) for preference alignment. For editing-specific tasks, it integrates novel view synthesis and depth estimation, using DepthPro as a teacher model. This results in robust performance, such as correcting calligraphy errors through chained edits.
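To make the domain mix concrete, the sketch below draws training examples in proportion to the reported shares. This is only an illustration: the article gives these percentages as the dataset's composition and does not say that batches are sampled this way, and `DOMAIN_WEIGHTS` and `sample_domains` are hypothetical names.

```python
import random

# Domain shares as reported in the article (they sum to 1.0).
DOMAIN_WEIGHTS = {"Nature": 0.55, "Design": 0.27, "People": 0.13, "Synthetic": 0.05}


def sample_domains(n, seed=0):
    """Draw n domain labels with probability proportional to DOMAIN_WEIGHTS."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    names = list(DOMAIN_WEIGHTS)
    weights = list(DOMAIN_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)
```

With a large enough `n`, the share of each label in the result approaches the corresponding weight.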
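As a rough illustration of the flow matching objective mentioned above, the following sketch uses one common rectified-flow convention: a straight-line path from data (t = 0) to noise (t = 1), with the model trained to predict the constant velocity along that path. The article does not state which convention the Qwen Team uses, so the direction of the path and the names `flow_matching_pair`, `flow_matching_loss`, and `predict_velocity` are assumptions made for this example.

```python
def flow_matching_pair(data, noise, t):
    """Return the interpolated point x_t and its target velocity.

    Assumed convention: x_t = (1 - t) * data + t * noise, so dx_t/dt = noise - data.
    """
    x_t = [(1.0 - t) * d + t * n for d, n in zip(data, noise)]
    velocity = [n - d for d, n in zip(data, noise)]
    return x_t, velocity


def flow_matching_loss(predict_velocity, data, noise, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t, target = flow_matching_pair(data, noise, t)
    predicted = predict_velocity(x_t, t)  # hypothetical model callable
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)
```

In a real trainer, `data` would be VAE latents, `noise` would be Gaussian samples, and `predict_velocity` would be the diffusion transformer conditioned on the text and source image.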

Advanced Editing Capabilities

Qwen-Image-Edit shines in semantic editing, enabling IP creation such as generating MBTI-themed emojis from a mascot (e.g., Capybara) while preserving character consistency. It supports 180-degree novel view synthesis, rotating objects or scenes with high fidelity and achieving 15.11 PSNR on GSO, surpassing specialized models like CRM. Style transfer transforms portraits into artistic styles, such as Studio Ghibli, while maintaining semantic integrity.

For appearance editing, it adds elements like signboards with realistic reflections or removes fine details like hair strands without altering the surroundings. Bilingual text editing is precise: changing “Hope” to “Qwen” on posters or correcting Chinese characters in calligraphy via bounding boxes. Chained editing allows iterative corrections, e.g., fixing “稽” step by step until accurate.
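Chained editing amounts to feeding each result back in as the next input. A minimal sketch of that loop follows; `edit_fn` is a hypothetical callable standing in for whatever performs a single edit (for example, a wrapper around an image editing pipeline), and nothing here is taken from the Qwen codebase.

```python
def chained_edit(image, instructions, edit_fn):
    """Apply instructions one after another, each on the previous result.

    Returns every intermediate image so a bad step can be inspected or redone.
    """
    history = [image]
    for instruction in instructions:
        # Each step edits the most recent result, not the original image.
        history.append(edit_fn(history[-1], instruction))
    return history
```

Keeping the full history rather than only the final image makes it easy to roll back to the last good step when a correction overshoots.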

Benchmark Results and Evaluations

Qwen-Image-Edit leads editing benchmarks, scoring 7.56 overall on GEdit-Bench-EN and 7.52 on CN, outperforming GPT Image 1 (7.53 EN, 7.30 CN) and FLUX.1 Kontext [Pro] (6.56 EN, 1.23 CN). On ImgEdit, it achieves 4.27 overall, excelling in tasks like object replacement (4.66) and style changes (4.81). Depth estimation yields 0.078 AbsRel on KITTI, competitive with DepthAnything v2.

Human evaluations on AI Arena place its base model third among APIs, with strong text rendering advantages. These metrics highlight its superiority in instruction-following and multilingual fidelity.

Deployment and Practical Usage

Qwen-Image-Edit is deployable via Hugging Face Diffusers:
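The AbsRel depth metric cited above is the mean absolute relative error between predicted and ground-truth depth. A minimal sketch is shown here, assuming depths are given as flat sequences and that pixels without valid ground truth (depth of zero or less) are skipped, which is a common practice but not something the article specifies. The name `abs_rel` is illustrative.

```python
def abs_rel(predicted, ground_truth):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(predicted, ground_truth) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)
```

Lower is better: a score of 0.078 means predictions are off by about 7.8% of the true depth on average.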

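import torch
from diffusers import QwenImageEditPipeline
from PIL import Image

# Load the editing pipeline, cast weights to bfloat16, and move it to the GPU.
pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
pipeline.to(torch.bfloat16).to("cuda")

# The pipeline takes an RGB PIL image plus a text instruction.
image = Image.open("input.png").convert("RGB")
prompt = "Change the rabbit's color to purple, with a flash light background."

# A negative prompt is passed alongside true_cfg_scale, as in the official model card example.
# The result exposes a list of PIL images in `.images`; save the first one.
output = pipeline(image=image, prompt=prompt, negative_prompt=" ", num_inference_steps=50, true_cfg_scale=4.0).images[0]
output.save("output.png")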

Alibaba Cloud's Model Studio offers API access for scalable inference. Licensed under Apache 2.0, the GitHub repository provides training code.

Future Implications

Qwen-Image-Edit advances vision-language interfaces, enabling seamless content manipulation for creators. Its unified approach to understanding and generation suggests potential extensions to video and 3D, fostering innovative applications in AI-driven design.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
