GymVerse: How Far Are We from Fully Automated Environment Scaling for Self-Evolving Agents?

¹Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences
*Equal Contribution

“Welcome to the Era of Experience.”

— David Silver and Richard S. Sutton

Overview of the automated environment synthesis pipeline. (a) Environment synthesis workflow with complexity control; (b) Key functions defining the synthesized environment program; (c) Interaction loop between the agent and the synthesized environment.
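
To make panel (c) concrete, the sketch below shows one way an agent could interact with a synthesized environment. It is an illustration under stated assumptions, not the GymVerse implementation: the page does not name the environment's key functions, so the Gym-style `reset`/`step` interface, the toy `GuessNumberEnv`, its `complexity` parameter, and the `BinarySearchPolicy` stand-in for an agent are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class GuessNumberEnv:
    """Toy stand-in for a synthesized environment (hypothetical, not from GymVerse)."""
    complexity: int = 1   # assumed knob: a larger value widens the search range
    max_turns: int = 10
    seed: int = 0

    def reset(self) -> str:
        # Start a new episode and return the initial observation.
        self.high = 10 * self.complexity
        self.target = random.Random(self.seed).randint(1, self.high)
        self.turn = 0
        return f"Guess an integer between 1 and {self.high}."

    def step(self, action: int):
        # Return (observation, reward, done) for one agent action.
        self.turn += 1
        if action == self.target:
            return "Correct.", 1.0, True
        hint = "Too low." if action < self.target else "Too high."
        return hint, 0.0, self.turn >= self.max_turns


class BinarySearchPolicy:
    """Scripted stand-in for an agent; it only uses the textual feedback."""

    def __init__(self, low: int, high: int):
        self.low, self.high, self.last = low, high, None

    def act(self, observation: str) -> int:
        if observation == "Too low.":
            self.low = self.last + 1
        elif observation == "Too high.":
            self.high = self.last - 1
        self.last = (self.low + self.high) // 2
        return self.last


def run_episode(env, policy) -> float:
    """Multi-turn interaction loop: observe, act, receive feedback, repeat."""
    observation, total, done = env.reset(), 0.0, False
    while not done:
        action = policy.act(observation)
        observation, reward, done = env.step(action)
        total += reward
    return total


env = GuessNumberEnv(complexity=3)
print(run_episode(env, BinarySearchPolicy(1, 10 * env.complexity)))
```

In this toy setup the feedback string is the only signal the policy sees, which is why the wording of the environment's feedback directly shapes what the agent can learn from each turn.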

Abstract

As the learning paradigm shifts from static data to interaction-driven experience, environments become central to enabling agents to learn, adapt, and evolve continuously. Yet manually constructed environments are costly and hard to scale, which creates a fundamental bottleneck for experience-based agent learning. Automatically scaling environments is therefore a necessary step toward the era of experience.

In this paper, we investigate how far we are from fully automated environment scaling for self-evolving agents. Specifically, our contributions are threefold: (1) We propose an automated environment synthesis workflow with explicit control over environment complexity, enabling environments to adapt to the agent’s evolving capabilities; (2) We introduce a principled evaluation framework to assess synthesized environments along the dimensions of correctness, difficulty, and diversity; (3) We conduct a systematic study and find that environment properties such as scale, complexity, correctness, and feedback design play a critical role in agent learning.

To this end, we introduce GymVerse, a comprehensive framework for environment synthesis, evaluation, and agent training. We further propose a simple yet effective reinforcement learning algorithm (PERPO) to support stable training on synthesized and evolving environments.

Extensive experiments on GymVerse demonstrate that training on synthesized environments enables effective generalization to unseen environments.
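
As a rough illustration of what "explicit control over environment complexity" in contribution (1) could look like, the sketch below raises or lowers an integer complexity level according to the agent's recent success rate. This is an assumed scheme for exposition only: the function name `next_complexity`, the thresholds, and the level bounds are hypothetical and are not taken from the paper.

```python
def next_complexity(level: int, success_rate: float,
                    raise_at: float = 0.8, lower_at: float = 0.2,
                    min_level: int = 1, max_level: int = 10) -> int:
    """Pick the next complexity level from the agent's recent success rate.

    All thresholds and bounds are illustrative assumptions.
    """
    if success_rate >= raise_at:       # agent has mastered this level
        return min(level + 1, max_level)
    if success_rate <= lower_at:       # level is currently too hard
        return max(level - 1, min_level)
    return level                       # keep training at the same level


# Hypothetical success rates measured after successive training rounds.
level = 1
for rate in [0.9, 0.85, 0.5, 0.1, 0.95]:
    level = next_complexity(level, rate)
print(level)
```

The point of such a rule is to keep environments neither trivially easy nor out of reach as the agent's capabilities change.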

🔍 Research Questions

RQ I How can environments be automatically synthesized and continuously evolved?

RQ II What dimensions are essential for evaluating synthesized environments?

RQ III Which environment properties most strongly influence agent learning?

🧠 Method

📊 Experiments

Conclusion

📏 We introduce GymVerse, a comprehensive framework for environment synthesis, evaluation, and agent training.

💥 GymVerse supports the automatic generation of diverse environments across multiple domains, together with a multi-stage correctness filtering pipeline that ensures environment reliability for downstream training and evaluation.

🧠 Building on GymVerse, we systematically investigate how key environment properties, including environment scale, complexity evolution, correctness, and feedback design, affect agent learning dynamics and generalization.

🚀 In particular, training Qwen3-4B-Instruct on only 16 synthesized environments yields consistent and substantial generalization gains across both multi-turn and single-turn benchmarks, demonstrating robustness to unseen environments.

Supplementary
