Unified Thinker

A General Reasoning Modular Core for Image Generation

Sashuai Zhou1,2* Qiang Zhou2* Jijin Hu2* Hanqing Yang2* Yue Cao3 Junpeng Ma4
Yinchao Ma2 Jun Song2† Tiezheng Ge2 Cheng Yu2 Bo Zheng2 Zhou Zhao1†
1Zhejiang University  •  2Alibaba Group  •  3Nanjing University  •  4Fudan University
Main Results

Unified Thinker bridges the reasoning-execution gap with modular upgrades.

Abstract

Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning-execution gap. We propose Unified Thinker, a task-agnostic reasoning architecture designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire model. We further introduce a two-stage training paradigm: first building a structured planning interface, then applying reinforcement learning to ground its policy in pixel-level feedback, ensuring plans optimize visual correctness over textual plausibility.

Methodology

Pipeline

Figure 1: Think-then-Execute. Structured planning for precise synthesis.

We move reasoning out of the image model and into a dedicated Thinker module. The Thinker produces a structured plan first, and the Generator begins pixel-level synthesis only once that plan is in hand.
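To make the decoupling concrete, here is a minimal sketch of a think-then-execute interface. It is not the paper's implementation. Every name in it (`Plan`, `Thinker`, `Generator`, `RuleBasedThinker`, `EchoGenerator`, `think_then_execute`) is a hypothetical stand-in, the "plan" is just a prompt plus a list of steps, and the "image" is a string rather than pixels. The only point is that the two modules meet at a plan object, so either side can be swapped without touching the other.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Plan:
    """Hypothetical structured plan passed from the Thinker to the Generator."""
    prompt: str                                     # refined description to render
    steps: list[str] = field(default_factory=list)  # ordered sub-goals


class Thinker(Protocol):
    def plan(self, instruction: str) -> Plan: ...


class Generator(Protocol):
    def render(self, plan: Plan) -> str: ...


class RuleBasedThinker:
    """Toy stand-in for a reasoning model: splits the instruction on commas."""

    def plan(self, instruction: str) -> Plan:
        steps = [part.strip() for part in instruction.split(",") if part.strip()]
        return Plan(prompt="; ".join(steps), steps=steps)


class EchoGenerator:
    """Toy stand-in for an image model: returns a string instead of pixels."""

    def render(self, plan: Plan) -> str:
        return f"<image of: {plan.prompt}>"


def think_then_execute(thinker: Thinker, generator: Generator, instruction: str) -> str:
    # Planning finishes before any synthesis starts.
    plan = thinker.plan(instruction)
    return generator.render(plan)


if __name__ == "__main__":
    print(think_then_execute(RuleBasedThinker(), EchoGenerator(),
                             "a red cube, left of a blue sphere"))
```

Because `think_then_execute` depends only on the two protocols, a stronger Thinker can replace `RuleBasedThinker` while `EchoGenerator` stays unchanged, which is the kind of modular upgrade described above.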

BibTeX

@misc{zhou2026unifiedthinkergeneralreasoning,
  title={Unified Thinker: A General Reasoning Modular Core for Image Generation}, 
  author={Sashuai Zhou and Qiang Zhou and Jijin Hu and Hanqing Yang and Yue Cao and Junpeng Ma and Yinchao Ma and Jun Song and Tiezheng Ge and Cheng Yu and Bo Zheng and Zhou Zhao},
  year={2026},
  eprint={2601.03127},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.03127}, 
}