On January 14, Zhipu AI announced the open-source release of GLM-Image, a next-generation image generation model developed in collaboration with Huawei. The model was trained end to end, from data processing to final training, on Ascend Atlas 800T A2 hardware using Huawei’s MindSpore AI framework, making it the first state-of-the-art (SOTA) multimodal model fully trained on Chinese-made chips.
Less than 24 hours after its open-source release, GLM-Image climbed to No. 1 on the Trending leaderboard of Hugging Face, the world’s largest open-source AI community. This also marks the first time a domestically trained Chinese model, relying solely on domestic computing hardware, has reached the top position on a major international AI platform.

According to Zhipu AI, the ultimate goal of the GLM-Image project is full-stack innovation. The model represents the GLM team’s exploration of a new generation of “cognitive generative” AI paradigms, exemplified by technologies such as Nano Banana Pro.
From an architectural perspective, GLM-Image diverges from the commonly used Latent Diffusion Model (LDM) approach in open-source image generation. Instead, it adopts a hybrid architecture combining autoregressive modeling with a diffusion decoder. While remaining broadly aligned with mainstream solutions, this design has demonstrated superior performance in knowledge-intensive generation tasks.
From a training and infrastructure standpoint, GLM-Image achieves full training and inference compatibility on Ascend Atlas 800T A2 hardware and the MindSpore framework. Real-world training performance reaches the theoretical performance ceiling of the underlying compute hardware, validating the feasibility of training SOTA-level models entirely on domestic AI computing platforms.
In terms of benchmark performance, GLM-Image ranks first among open-source models on both CVTG-2K (complex visual text generation) and LongText-Bench (long-text rendering). The model demonstrates strong instruction-following capabilities, accurate text generation, and particular strength in Chinese character rendering, making it well suited to posters, presentations, educational illustrations, and other knowledge-intensive visual applications.

Open-source links:
- GitHub: https://github.com/zai-org/GLM-Image
- Hugging Face: https://huggingface.co/zai-org/GLM-Image
Source: IT Home

