REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

The Hong Kong University of Science and Technology (Guangzhou)
*Corresponding author

Abstract

Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation, where even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone to actively filter visual noise and build short-term terrain memory. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded 13.1 ms inference time.

Motivation

Extreme legged parkour seeks to endow quadrupedal robots with the ability to execute highly agile maneuvers across discontinuous, cluttered, and dynamically challenging terrains. Such tasks require rapid terrain assessment, gait transitions, precise foot placement, and continuous balance regulation — all under strict timing and torque constraints.

Despite recent advances, existing methods remain highly vulnerable to perceptual degradation. Extreme parkour entails impacts, rapid rotations, flight phases, and motion blur, where even brief visual corruption can cause catastrophic failure. Most approaches treat perception as a direct feedforward input to control, without modeling observation uncertainty, exploiting temporal memory, or enforcing physics consistency.

REAL framework teaser: robot performing parkour with nominal vision and under severe visual degradation
Robust extreme parkour with the proposed REAL framework. The robot successfully chains highly dynamic maneuvers across complex terrains with nominal vision (green box), and maintains stable locomotion even under severe visual degradation (red box).

Method

REAL is an end-to-end policy learning framework that tightly integrates vision, proprioception, and temporal memory within a unified spatio-temporal architecture. The framework adopts a two-stage training paradigm:

System architecture of REAL
System architecture of REAL. Stage 1 trains a privileged teacher policy via Proprioception-Terrain Associated Reasoning. Stage 2 distills a deployable student policy using an onboard Mamba-FiLM spatial-temporal backbone and physics-guided filtering, stabilized by a consistency-aware loss gating strategy.

Spatio-Temporal Policy Learning

The privileged teacher leverages a cross-modal attention mechanism to establish structured proprioception–terrain reasoning. Terrain features are retrieved and aggregated conditioned on the robot’s proprioceptive state via:

\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
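As a minimal NumPy sketch (illustrative dimensions, not the paper's implementation), the proprioceptive state can form the attention query while terrain features supply the keys and values:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q, k, v):
    """Scaled dot-product attention: proprioceptive queries retrieve
    and aggregate terrain features (keys/values)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # proprioception-conditioned terrain summary

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 16))   # query from the robot's proprioceptive state
k = rng.standard_normal((32, 16))  # keys from terrain feature tokens
v = rng.standard_normal((32, 16))  # values from terrain feature tokens
out = cross_modal_attention(q, k, v)
print(out.shape)  # (1, 16)
```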

The student policy integrates two core modules:

\text{FiLM}(F_{\text{CNN}}) = \gamma(p_t) \odot F_{\text{CNN}} + \beta(p_t)

h_t = A_t h_{t-1} + B_t x_t, \quad y_t = C_t h_t
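The two equations above can be sketched directly (NumPy, toy dimensions; gamma and beta would in practice be predicted from the proprioceptive state p_t by small networks, which are omitted here):

```python
import numpy as np

def film(f_cnn, gamma, beta):
    """Feature-wise linear modulation (FiLM): element-wise scale and shift
    of visual features by parameters conditioned on proprioception."""
    return gamma * f_cnn + beta

def ssm_step(h_prev, x_t, A_t, B_t, C_t):
    """One recurrent step of a state-space model (Mamba-style):
    h_t = A_t h_{t-1} + B_t x_t,  y_t = C_t h_t."""
    h_t = A_t @ h_prev + B_t @ x_t
    return h_t, C_t @ h_t

# FiLM with gamma = 1, beta = 0 is the identity on the visual features
f = np.array([1.0, 2.0])
assert np.allclose(film(f, np.ones(2), np.zeros(2)), f)

# unrolling the recurrence accumulates short-term terrain memory in h
h = np.zeros(2)
for x in [np.ones(2), np.ones(2)]:
    h, y = ssm_step(h, x, 0.5 * np.eye(2), np.eye(2), np.eye(2))
print(y)  # 0.5 * (0.5 * 0 + 1) + 1 = 1.5 per component
```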

Physics-Guided Filtering

An uncertainty-aware neural velocity predictor is fused with rigid-body dynamics through an Extended Kalman Filter (EKF). The learned predictor provides adaptive uncertainty estimates, while the dynamics model enforces physical constraints.

The Kalman gain adapts automatically: when the neural predictor reports high uncertainty, correction magnitude decreases. Confident predictions exert stronger influence, ensuring stable velocity tracking under impacts, slippage, and partial sensor degradation.
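A minimal sketch of this adaptive behavior (NumPy, measurement model H = I for brevity; R is the predictor's reported uncertainty, and all values are illustrative):

```python
import numpy as np

def kf_update(x_pred, P_pred, z, R):
    """Kalman measurement update: the learned velocity estimate z corrects
    the rigid-body dynamics prediction x_pred. The gain K shrinks as the
    predictor's uncertainty R grows, so noisy estimates are down-weighted."""
    K = P_pred @ np.linalg.inv(P_pred + R)      # Kalman gain (H = I)
    x = x_pred + K @ (z - x_pred)               # corrected state
    P = (np.eye(len(x_pred)) - K) @ P_pred      # updated covariance
    return x, P, K

x_pred = np.zeros(3)            # velocity from the dynamics model
P_pred = np.eye(3) * 0.1        # prediction covariance
z = np.array([0.5, 0.0, 0.0])   # neural velocity estimate

_, _, K_conf = kf_update(x_pred, P_pred, z, np.eye(3) * 0.01)  # confident predictor
_, _, K_unc  = kf_update(x_pred, P_pred, z, np.eye(3) * 1.0)   # uncertain predictor
print(K_conf[0, 0] > K_unc[0, 0])  # True: confidence yields a larger correction
```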

Consistency-Aware Loss Gating

An adaptive gating coefficient λ\lambda dynamically balances behavioral cloning (BC) and reinforcement learning (RL):

\lambda = \sigma\!\left(k \cdot (\tau - \|a_S - a_T\|_2)\right)

\mathcal{L}_{\text{total}} = \lambda \mathcal{L}_{\text{RL}} + (1 - \lambda) \mathcal{L}_{\text{BC}}

When action discrepancy is large, imitation learning dominates for stability. As the student aligns with the teacher, the objective shifts toward RL for robustness.
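A minimal sketch of the gating (pure Python; the slope k and threshold tau are illustrative values, not the paper's):

```python
import math

def gated_loss(loss_rl, loss_bc, a_student, a_teacher, k=10.0, tau=0.1):
    """Consistency-aware loss gating: lambda -> 0 (BC dominates) when the
    student's action is far from the teacher's, lambda -> 1 (RL dominates)
    as the two align. k and tau are hypothetical hyperparameters."""
    disc = math.sqrt(sum((s - t) ** 2 for s, t in zip(a_student, a_teacher)))
    lam = 1.0 / (1.0 + math.exp(-k * (tau - disc)))  # sigma(k * (tau - ||a_S - a_T||_2))
    return lam * loss_rl + (1.0 - lam) * loss_bc, lam

_, lam_far  = gated_loss(1.0, 1.0, [1.0, 0.0], [0.0, 0.0])  # large discrepancy
_, lam_near = gated_loss(1.0, 1.0, [0.0, 0.0], [0.0, 0.0])  # student matches teacher
print(lam_near > lam_far)  # True: alignment shifts the objective toward RL
```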

Experiments

Extreme Terrain Traversability

Snapshots of REAL policy executing dynamic maneuvers across extreme terrains
Snapshots of the REAL policy executing dynamic maneuvers across extreme terrains.

REAL achieves a high overall success rate, effectively doubling the performance of prior vision-only baselines across hurdles, steps, and gaps.

| Method | Hurdles SR | Steps SR | Gaps SR | Overall SR | Overall MXD | MEV ↓ |
|---|---|---|---|---|---|---|
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |

Robustness Against Perceptual Degradation

Under severe visual degradation conditions — frame drops, Gaussian noise, and spatial FoV occlusion — REAL demonstrates exceptional resilience:

| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
|---|---|---|---|---|
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |

Blind-Zone Maneuvers

Simulation results on complex parkour terrains with a 1m vision-masked blind zone
Simulation results on complex parkour terrains featuring a 1 m vision-masked blind zone. REAL successfully traverses all terrains while maintaining kinematic stability; the Extreme Parkour baseline (top row) is shown for comparison.

With vision masked 1 m before obstacles, standard baselines fail immediately and catastrophically. REAL leverages its history of multi-modal observations to implicitly track the terrain:

| Method | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| Extreme Parkour | 0.11 | 0.20 | 44.03 | 0.15 | 0.29 |
| RPL | 0.00 | 0.03 | 0.35 | 0.00 | 0.04 |
| SoloParkour | 0.36 | 0.34 | 103.50 | 0.06 | 0.09 |
| REAL (Ours) | 0.55 | 0.39 | 24.84 | 0.03 | 0.08 |

Real-World Deployment

Zero-shot sim-to-real transfer of REAL on a Unitree Go2 quadruped
Zero-shot sim-to-real transfer of the REAL policy on a physical Unitree Go2 quadruped. Using only onboard perception and computing, the robot completes various real-world obstacle courses: (a) leaping onto a high platform, (b) moving through scattered boxes, and (c) climbing a steep staircase.

REAL achieves robust zero-shot sim-to-real transfer, completing diverse real-world obstacle courses with only onboard perception and computing. The policy is deployed via a custom C++ framework, with inference optimized using ONNX.

Real-world extreme blind test. Left: Baseline fails immediately upon losing visual input. Right: REAL utilizes proprioceptive history to maintain environmental memory, enabling robust blind traversal across unstructured obstacles.

Blind Zone Video Comparison

Baseline methods (visual degradation test):
- Flat terrain: fails immediately without visual input
- Hurdle terrain: severe performance degradation
- Step terrain: struggles without visual guidance

REAL (ours, robust blind navigation):
- Flat terrain: maintains stable locomotion
- Hurdle terrain: successfully navigates obstacles
- Step terrain: robust performance with physics guidance

Real-Time Performance

Onboard inference latency comparison between REAL (Mamba) and Transformer baseline
Onboard inference latency measured over 1,000 continuous control steps. The REAL policy with Mamba backbone maintains a near-constant O(1) per-step execution time (~13.1 ms), strictly satisfying the 20 ms real-time control budget. The Transformer baseline averages 23.07 ms and violates this constraint.
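A latency measurement of this kind can be sketched as follows (Python; the stand-in policy and step count are illustrative, not the deployed C++/ONNX pipeline):

```python
import time
import numpy as np

def measure_latency(policy_fn, obs, n_steps=100):
    """Wall-clock per-step latency over repeated control steps. A recurrent
    (Mamba-style) update has fixed per-step cost, whereas attention over a
    growing context does not; this harness only measures, it does not model."""
    times_ms = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        policy_fn(obs)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return float(np.mean(times_ms)), float(np.max(times_ms))

# hypothetical stand-in policy: a single linear layer
w = np.random.default_rng(0).standard_normal((64, 64))
mean_ms, worst_ms = measure_latency(lambda o: w @ o, np.ones(64))
print(mean_ms, worst_ms)
```

In a real-time control setting the worst-case figure, not the mean, is what must stay under the 20 ms budget.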

Ablation Study

Component Ablation

| Method | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| REAL (Ours) | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| REAL (w/ MLP Est.) | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| REAL (w/o FiLM) | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| REAL (w/o Mamba) | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |

Velocity Estimation

| Estimator Architecture | RMSE ↓ |
|---|---|
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (Single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (Ours) | 0.23 |

Training Convergence

Training convergence comparison
Training convergence of the depth actor. The consistency-aware loss gating accelerates imitation learning and achieves a lower final loss compared to a fixed-weight baseline.

BibTeX Citation

@article{real2026,
  title  = {REAL: Robust Extreme Agility via Spatio-Temporal
            Policy Learning and Physics-Guided Filtering},
  author = {Jialong Liu and Dehan Shen and Yanbo Wen and
            Zeyu Jiang and Changhao Chen},
  year   = {2026}
}