XBai-o4

[ICLR2026] Test-Time Scaling with Reflective Generative Model.

Source: GitHub
Pricing: Open Source

About This Project

XBai-o4 is trained with our proposed reflective generative form, which unifies “Long-CoT Reinforcement Learning” and “Process Reward Learning” in a single training form. As a result, one model can both reason deeply and select high-quality reasoning trajectories.
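
To make the idea of reasoning trajectory selection concrete, here is a minimal sketch of best-of-N selection over sampled trajectories. It is not the project's implementation: the `Trajectory` class, the per-step scores, and the geometric-mean aggregation are all assumptions chosen for illustration, and the scores are hard-coded rather than produced by a model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one sampled reasoning trajectory."""
    text: str
    step_scores: List[float]  # assumed per-step process scores in (0, 1]


def trajectory_score(step_scores: List[float]) -> float:
    """Aggregate per-step scores with a geometric mean (an assumption,
    not necessarily the aggregation used by XBai-o4)."""
    if not step_scores:
        return 0.0
    # Clamp to avoid log(0) for degenerate scores.
    log_sum = sum(math.log(max(s, 1e-9)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best(candidates: List[Trajectory]) -> Trajectory:
    """Return the candidate with the highest aggregated score."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return max(candidates, key=lambda t: trajectory_score(t.step_scores))


# Illustrative, made-up scores for three sampled trajectories.
candidates = [
    Trajectory("answer A", [0.9, 0.4, 0.8]),
    Trajectory("answer B", [0.8, 0.8, 0.7]),
    Trajectory("answer C", [0.95, 0.2, 0.9]),
]
best = select_best(candidates)
print(best.text)
```

The geometric mean penalizes a trajectory that contains a single weak step more than an arithmetic mean would, which is one plausible reason to prefer it when scoring step-by-step reasoning.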

Tags

LLM, Machine Learning, reinforcement-learning
