GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Tao Liu*, Chongyu Wang*, Rongjie Li, Yingchen Yu, Bai Song, Xuming He

October 2025

Overview of the GUI-Rise agent framework. The framework decomposes each navigation step into three subtasks: structured reasoning, action prediction, and history summarization. At each step, the agent performs structured reasoning (progress estimation and decision analysis), predicts the next GUI action, and updates a compact history summary that is passed to the next iteration.
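The per-step loop described in the overview can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `StepOutput` fields, the `run_episode` function, and the `toy_policy` stand-in for the model are all hypothetical names introduced here, and screenshots are represented as plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepOutput:
    reasoning: str  # progress estimation and decision analysis (free text)
    action: str     # predicted GUI action
    summary: str    # updated compact history summary


# Hypothetical model interface: (instruction, screenshot, summary) -> StepOutput
Policy = Callable[[str, str, str], StepOutput]


def run_episode(policy: Policy, instruction: str,
                screenshots: List[str]) -> List[StepOutput]:
    """Run one episode, carrying only the compact summary between steps."""
    summary = ""  # no history before the first step
    outputs: List[StepOutput] = []
    for screenshot in screenshots:
        out = policy(instruction, screenshot, summary)
        outputs.append(out)
        summary = out.summary  # the summary replaces the raw interaction history
    return outputs


def toy_policy(instruction: str, screenshot: str, summary: str) -> StepOutput:
    """Stand-in for the MLLM: clicks the current screen and appends to the summary."""
    action = f"click({screenshot})"
    new_summary = f"{summary}; {action}" if summary else action
    reasoning = f"Goal: {instruction}. Done so far: {summary or 'nothing'}."
    return StepOutput(reasoning=reasoning, action=action, summary=new_summary)


outputs = run_episode(toy_policy, "open settings", ["s1", "s2"])
```

The point of the sketch is the data flow: each step sees only the instruction, the current screen, and the previous summary, rather than the full trajectory.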

Abstract

Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, but current methods lack cross-domain generalization and make poor use of interaction history. We propose GUI-Rise, a reasoning-enhanced framework that integrates structured reasoning, action prediction, and history summarization. The agent is trained via supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO), using history-aware rewards that link summary quality to action performance. Evaluations show state-of-the-art results on standard benchmarks and strong out-of-domain generalization, indicating that the learned reasoning is robust across diverse GUI navigation tasks.
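As background on the GRPO step mentioned in the abstract, the sketch below shows the group-relative advantage normalization that GRPO is generally known for: rewards for a group of sampled responses to the same input are standardized against the group's own mean and standard deviation. The function name, the use of the population standard deviation, and the `eps` constant are assumptions made for illustration. The reward values are placeholders, and the paper's history-aware reward is not reproduced here.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against its own sampling group.

    Assumption: population standard deviation with a small eps for stability;
    actual implementations may differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder rewards for four sampled responses to the same step.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([0.5, 0.5])
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.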

Type

Conference paper

Publication

In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2025