REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

The Hong Kong University of Science and Technology (Guangzhou)
*Corresponding author

Abstract

Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation, where even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone to actively filter visual noise and build short-term terrain memory. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded 13.1 ms inference time.

Motivation

Extreme legged parkour seeks to endow quadrupedal robots with the ability to execute highly agile maneuvers across discontinuous, cluttered, and dynamically challenging terrains. Such tasks require rapid terrain assessment, gait transitions, precise foot placement, and continuous balance regulation — all under strict timing and torque constraints.

Despite recent advances, existing methods remain highly vulnerable to perceptual degradation. Extreme parkour entails impacts, rapid rotations, flight phases, and motion blur, where even brief visual corruption can cause catastrophic failure. Most approaches treat perception as a direct feedforward input to control, without modeling observation uncertainty, exploiting temporal memory, or enforcing physics consistency.

REAL framework teaser: robot performing parkour with nominal vision and under severe visual degradation
Robust extreme parkour with the proposed REAL framework. The robot successfully chains highly dynamic maneuvers across complex terrains with nominal vision (green box), and maintains stable locomotion even under severe visual degradation (red box).

Method

REAL is an end-to-end policy learning framework that tightly integrates vision, proprioception, and temporal memory within a unified spatio-temporal architecture. The framework adopts a two-stage training paradigm:

System architecture of REAL
System architecture of REAL. Stage 1 trains a privileged teacher policy via Proprioception-Terrain Associated Reasoning. Stage 2 distills a deployable student policy using an onboard Mamba-FiLM spatial-temporal backbone and physics-guided filtering, stabilized by a consistency-aware loss gating strategy.

Spatio-Temporal Policy Learning

The privileged teacher leverages a cross-modal attention mechanism to establish structured proprioception–terrain reasoning. Terrain features are retrieved and aggregated conditioned on the robot’s proprioceptive state via:

\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
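As a minimal NumPy sketch (illustrative dimensions, not the paper's implementation), the proprioceptive state can form the attention query while terrain features supply the keys and values:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q, k, v):
    """Scaled dot-product attention: proprioceptive queries retrieve
    and aggregate terrain features (keys/values)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # proprioception-conditioned terrain summary

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 16))   # query from the robot's proprioceptive state
k = rng.standard_normal((32, 16))  # keys from terrain feature tokens
v = rng.standard_normal((32, 16))  # values from terrain feature tokens
out = cross_modal_attention(q, k, v)
print(out.shape)  # (1, 16)
```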

The student policy integrates two core modules:

\text{FiLM}(F_{\text{CNN}}) = \gamma(p_t) \odot F_{\text{CNN}} + \beta(p_t)

h_t = A_t h_{t-1} + B_t x_t, \quad y_t = C_t h_t
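The two equations above can be sketched directly (NumPy, toy dimensions; gamma and beta would in practice be predicted from the proprioceptive state p_t by small networks, which are omitted here):

```python
import numpy as np

def film(f_cnn, gamma, beta):
    """Feature-wise linear modulation (FiLM): element-wise scale and shift
    of visual features by parameters conditioned on proprioception."""
    return gamma * f_cnn + beta

def ssm_step(h_prev, x_t, A_t, B_t, C_t):
    """One recurrent step of a state-space model (Mamba-style):
    h_t = A_t h_{t-1} + B_t x_t,  y_t = C_t h_t."""
    h_t = A_t @ h_prev + B_t @ x_t
    return h_t, C_t @ h_t

# FiLM with gamma = 1, beta = 0 is the identity on the visual features
f = np.array([1.0, 2.0])
assert np.allclose(film(f, np.ones(2), np.zeros(2)), f)

# unrolling the recurrence accumulates short-term terrain memory in h
h = np.zeros(2)
for x in [np.ones(2), np.ones(2)]:
    h, y = ssm_step(h, x, 0.5 * np.eye(2), np.eye(2), np.eye(2))
print(y)  # 0.5 * (0.5 * 0 + 1) + 1 = 1.5 per component
```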

Physics-Guided Filtering

An uncertainty-aware neural velocity predictor is fused with rigid-body dynamics through an Extended Kalman Filter (EKF). The learned predictor provides adaptive uncertainty estimates, while the dynamics model enforces physical constraints.

The Kalman gain adapts automatically: when the neural predictor reports high uncertainty, correction magnitude decreases. Confident predictions exert stronger influence, ensuring stable velocity tracking under impacts, slippage, and partial sensor degradation.
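A minimal sketch of this adaptive behavior (NumPy, measurement model H = I for brevity; R is the predictor's reported uncertainty, and all values are illustrative):

```python
import numpy as np

def kf_update(x_pred, P_pred, z, R):
    """Kalman measurement update: the learned velocity estimate z corrects
    the rigid-body dynamics prediction x_pred. The gain K shrinks as the
    predictor's uncertainty R grows, so noisy estimates are down-weighted."""
    K = P_pred @ np.linalg.inv(P_pred + R)      # Kalman gain (H = I)
    x = x_pred + K @ (z - x_pred)               # corrected state
    P = (np.eye(len(x_pred)) - K) @ P_pred      # updated covariance
    return x, P, K

x_pred = np.zeros(3)            # velocity from the dynamics model
P_pred = np.eye(3) * 0.1        # prediction covariance
z = np.array([0.5, 0.0, 0.0])   # neural velocity estimate

_, _, K_conf = kf_update(x_pred, P_pred, z, np.eye(3) * 0.01)  # confident predictor
_, _, K_unc  = kf_update(x_pred, P_pred, z, np.eye(3) * 1.0)   # uncertain predictor
print(K_conf[0, 0] > K_unc[0, 0])  # True: confidence yields a larger correction
```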

Consistency-Aware Loss Gating

An adaptive gating coefficient λ\lambda dynamically balances behavioral cloning (BC) and reinforcement learning (RL):

\lambda = \sigma\!\left(k \cdot (\tau - \|a_S - a_T\|_2)\right)

\mathcal{L}_{\text{total}} = \lambda \mathcal{L}_{\text{RL}} + (1 - \lambda) \mathcal{L}_{\text{BC}}

When action discrepancy is large, imitation learning dominates for stability. As the student aligns with the teacher, the objective shifts toward RL for robustness.
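A minimal sketch of the gating (pure Python; the slope k and threshold tau are illustrative values, not the paper's):

```python
import math

def gated_loss(loss_rl, loss_bc, a_student, a_teacher, k=10.0, tau=0.1):
    """Consistency-aware loss gating: lambda -> 0 (BC dominates) when the
    student's action is far from the teacher's, lambda -> 1 (RL dominates)
    as the two align. k and tau are hypothetical hyperparameters."""
    disc = math.sqrt(sum((s - t) ** 2 for s, t in zip(a_student, a_teacher)))
    lam = 1.0 / (1.0 + math.exp(-k * (tau - disc)))  # sigma(k * (tau - ||a_S - a_T||_2))
    return lam * loss_rl + (1.0 - lam) * loss_bc, lam

_, lam_far  = gated_loss(1.0, 1.0, [1.0, 0.0], [0.0, 0.0])  # large discrepancy
_, lam_near = gated_loss(1.0, 1.0, [0.0, 0.0], [0.0, 0.0])  # student matches teacher
print(lam_near > lam_far)  # True: alignment shifts the objective toward RL
```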

Experiments

Extreme Terrain Traversability

Snapshots of REAL policy executing dynamic maneuvers across extreme terrains
Snapshots of the REAL policy executing dynamic maneuvers across extreme terrains.

REAL achieves a high overall success rate, effectively doubling the performance of prior vision-only baselines across hurdles, steps, and gaps.

| Method | Hurdles SR | Steps SR | Gaps SR | Overall SR | Overall MXD | MEV ↓ |
|---|---|---|---|---|---|---|
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |

Robustness Against Perceptual Degradation

Under severe visual degradation conditions — frame drops, Gaussian noise, and spatial FoV occlusion — REAL demonstrates exceptional resilience:

| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
|---|---|---|---|---|
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |

Blind-Zone Maneuvers

Simulation results on complex parkour terrains with a 1m vision-masked blind zone
Simulation results on complex parkour terrains featuring a 1 m vision-masked blind zone. REAL successfully traverses all terrains while maintaining kinematic stability; the Extreme Parkour baseline (top row) is shown for comparison.

With vision masked 1 m before obstacles, standard baselines fail immediately and catastrophically. REAL leverages its history of multi-modal observations to implicitly track the terrain:

| Method | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| Extreme Parkour | 0.11 | 0.20 | 44.03 | 0.15 | 0.29 |
| RPL | 0.00 | 0.03 | 0.35 | 0.00 | 0.04 |
| SoloParkour | 0.36 | 0.34 | 103.50 | 0.06 | 0.09 |
| REAL (Ours) | 0.55 | 0.39 | 24.84 | 0.03 | 0.08 |

Real-World Deployment

Zero-shot sim-to-real transfer of REAL on a Unitree Go2 quadruped
Zero-shot sim-to-real transfer of the REAL policy on a physical Unitree Go2 quadruped. Using only onboard perception and computing, the robot completes various real-world obstacle courses: (a) leaping onto a high platform, (b) moving through scattered boxes, and (c) climbing a steep staircase.

REAL achieves robust zero-shot sim-to-real transfer, completing diverse real-world obstacle courses with only onboard perception and computing. The policy is deployed via a custom C++ framework, with inference optimized using ONNX.

Real-world extreme blind test. Left: Baseline fails immediately upon losing visual input. Right: REAL utilizes proprioceptive history to maintain environmental memory, enabling robust blind traversal across unstructured obstacles.

Blind Zone Video Comparison

Baseline methods (visual degradation test):
- Flat terrain: fails immediately without visual input
- Hurdle terrain: severe performance degradation
- Step terrain: struggles without visual guidance

REAL (ours, robust blind navigation):
- Flat terrain: maintains stable locomotion
- Hurdle terrain: successfully navigates obstacles
- Step terrain: robust performance with physics guidance

Real-Time Performance

Onboard inference latency comparison between REAL (Mamba) and Transformer baseline
Onboard inference latency measured over 1,000 continuous control steps. The REAL policy with Mamba backbone maintains a near-constant O(1) per-step execution time (~13.1 ms), strictly satisfying the 20 ms real-time control budget. The Transformer baseline averages 23.07 ms and violates this constraint.
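A latency measurement of this kind can be sketched as follows (Python; the stand-in policy and step count are illustrative, not the deployed C++/ONNX pipeline):

```python
import time
import numpy as np

def measure_latency(policy_fn, obs, n_steps=100):
    """Wall-clock per-step latency over repeated control steps. A recurrent
    (Mamba-style) update has fixed per-step cost, whereas attention over a
    growing context does not; this harness only measures, it does not model."""
    times_ms = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        policy_fn(obs)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return float(np.mean(times_ms)), float(np.max(times_ms))

# hypothetical stand-in policy: a single linear layer
w = np.random.default_rng(0).standard_normal((64, 64))
mean_ms, worst_ms = measure_latency(lambda o: w @ o, np.ones(64))
print(mean_ms, worst_ms)
```

In a real-time control setting the worst-case figure, not the mean, is what must stay under the 20 ms budget.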

Ablation Study

Component Ablation

| Method | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| REAL (Ours) | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| REAL (w/ MLP Est.) | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| REAL (w/o FiLM) | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| REAL (w/o Mamba) | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |

Velocity Estimation

| Estimator Architecture | RMSE ↓ |
|---|---|
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (Single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (Ours) | 0.23 |

Training Convergence

Training convergence comparison
Training convergence of the depth actor. The consistency-aware loss gating accelerates imitation learning and achieves a lower final loss compared to a fixed-weight baseline.

BibTeX Citation

@article{real2026,
  title  = {REAL: Robust Extreme Agility via Spatio-Temporal
            Policy Learning and Physics-Guided Filtering},
  author = {Jialong Liu and Dehan Shen and Yanbo Wen and
            Zeyu Jiang and Changhao Chen},
  year   = {2026}
}