About This Project
XBai o4 is trained with our proposed reflective generative form, which combines “Long-CoT Reinforcement Learning” and “Process Reward Learning” into a single unified training scheme. This lets one model both carry out deep reasoning and select high-quality reasoning trajectories.
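The trajectory-selection side of this idea can be illustrated with a generic best-of-N sketch. Several candidate reasoning trajectories are sampled, each step gets a process reward score, the step scores are aggregated per trajectory, and the highest-scoring trajectory is kept. This is not XBai o4's implementation. The function names, the min-over-steps aggregation rule, and the hard-coded scores are hypothetical placeholders standing in for real model outputs.

```python
from typing import List, Sequence


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step process rewards into one trajectory score.

    Assumption: use the minimum step score, so a single weak step
    penalizes the whole trajectory. Mean or product are common alternatives.
    """
    if not step_scores:
        raise ValueError("trajectory has no scored steps")
    return min(step_scores)


def select_trajectory(candidates: List[str], step_scores: List[Sequence[float]]) -> str:
    """Return the candidate whose aggregated process reward is highest (best-of-N)."""
    if not candidates or len(candidates) != len(step_scores):
        raise ValueError("need one list of step scores per non-empty candidate set")
    best_index = max(range(len(candidates)), key=lambda i: trajectory_score(step_scores[i]))
    return candidates[best_index]


# Hypothetical example: three sampled trajectories with made-up per-step scores.
candidates = ["trajectory A", "trajectory B", "trajectory C"]
scores = [[0.9, 0.4, 0.8], [0.7, 0.75, 0.7], [0.95, 0.2]]
print(select_trajectory(candidates, scores))  # trajectory B
```

With min aggregation, trajectory B wins (0.7) even though A and C each contain a higher individual step score, because their weakest steps score only 0.4 and 0.2.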