DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

arXiv:2605.25527v1 (Announce Type: new)

Abstract: This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques like tabular Q-learning, our approach deploys policy-based methods like vanilla PPO and DeepSeekMath-inspired variants like GRPO and GSPO, which use group-normalized updates and downside-aware shaping. On backtests with financial assets AMZN, AAPL, and GOOG under a simplified backtesting setup based on spread-scaled rewards, these new policies improve
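The abstract names several ingredients without giving formulas: spread-scaled rewards, downside-aware shaping, GRPO-style group-normalized updates, and GSPO. The Python sketch below shows one plausible reading of each and is not the paper's implementation. The reward formula (mid-price move times position, divided by the spread), the loss-penalty form of the shaping, the use of the population standard deviation, the default `penalty` and `clip_eps` values, and all function names are assumptions for illustration. The clipped surrogate is the standard PPO objective, which GRPO also uses; the GSPO ratio is the length-normalized (geometric-mean) sequence likelihood ratio.

```python
import math
from statistics import mean, pstdev


def spread_scaled_reward(mid_before, mid_after, position, spread):
    """Assumed reward: mid-price move in the direction of the position,
    measured in units of the quoted bid-ask spread.
    position is +1 (long), -1 (short) or 0 (flat)."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return position * (mid_after - mid_before) / spread


def downside_shaped(step_rewards, penalty=0.5):
    """Assumed downside-aware shaping: episode return plus an extra
    penalty on losing steps only (losses are <= 0, so this lowers the score)."""
    losses = sum(r for r in step_rewards if r < 0)
    return sum(step_rewards) + penalty * losses


def group_normalized_advantages(scores, eps=1e-8):
    """GRPO-style advantages: standardize each rollout's score against the
    other rollouts in its group (population std is an assumption).
    eps avoids division by zero when all scores are equal."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one action (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def gspo_sequence_ratio(logp_new, logp_old):
    """GSPO-style ratio: geometric mean of per-step likelihood ratios,
    i.e. exp of the mean log-ratio over the trajectory."""
    return math.exp(mean(n - o for n, o in zip(logp_new, logp_old)))


if __name__ == "__main__":
    # Hypothetical group of four rollouts (per-step spread-scaled rewards)
    # started from the same order-book state.
    group = [
        [1.0, -0.5, 2.0],
        [0.5, 0.5, 0.0],
        [-1.0, -1.0, 1.0],
        [2.0, 1.0, -0.5],
    ]
    scores = [downside_shaped(r) for r in group]
    advantages = group_normalized_advantages(scores)
    for s, a in zip(scores, advantages):
        print(f"score={s:+.2f} advantage={a:+.2f}")
```

In a GRPO-style update under these assumptions, one would roll out a group of episodes from the same order-book state, score each with `downside_shaped`, convert the scores with `group_normalized_advantages`, and feed each advantage into `clipped_surrogate`. The group baseline replaces PPO's learned value function (critic), which is the main design difference between the two methods.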
Rapid advances in Large Language Models (LLMs) and in the reinforcement learning techniques used to train them are enabling more sophisticated applications in complex domains such as high-frequency trading.
This development points to a growing capability for AI to manage high-stakes financial operations autonomously, potentially altering market microstructure and investment strategies, although the reported results so far come from simplified backtests.
AI models are moving beyond traditional analytical tools to become active, policy-driven agents in real-time financial markets, directly influencing trading decisions and execution.
Likely beneficiaries:
- Hedge funds with advanced AI capabilities
- Quantitative trading firms
- AI-driven trading platform providers
- Deep learning researchers
Likely disadvantaged:
- Traditional discretionary traders
- Retail investors without advanced tools
- Trading firms slow to adopt AI
Faster, more efficient execution of high-frequency trading strategies, potentially increasing market liquidity fragmentation.
Increased volatility and flash crashes due to complex, opaque AI interactions, requiring new regulatory frameworks.
The development of 'AI versus AI' trading wars, leading to an arms race in financial AI and potentially systemic risks if models diverge unexpectedly.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.LG