BioVLA: A Bio-Inspired Vision-Language-Action Model for Robotic Manipulation


Huazhong University of Science and Technology
Last updated: October 11, 2025

Abstract

In embodied intelligence, visual perception carries substantially higher information entropy than language, providing dense, continuous, and multi-level cues that are essential for understanding the physical world and guiding robotic action. While language offers symbolic abstraction, vision conveys rich spatial and functional priors that dominate perception-driven decision-making. To fully exploit this high-entropy modality, we propose BioVLA, a bio-inspired vision-language-action model that enhances robotic manipulation through neuro-inspired mechanisms. Drawing inspiration from the human visual cortex, BioVLA introduces a bio-inspired visual encoder that simulates region-specific neural responses to different visual cues, enabling multi-functional feature representations and adaptive re-weighting through function-aware re-response. This mechanism refines visual representations by emphasizing function-specific responses while maintaining balanced multi-cue integration. Furthermore, BioVLA incorporates a vision-guided action refinement module that dynamically modulates the hidden states of the action decoder based on visual feedback, thereby preserving rich visual information throughout the perception–action transformation. Experiments on the RoboTwin 2.0 platform show that BioVLA achieves the highest average success rate among the compared methods, including state-of-the-art VLA models, across diverse manipulation tasks in both clean and randomized settings.
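
The function-aware re-weighting described above can be illustrated with a minimal sketch. It assumes the encoder produces K branch responses of equal dimension (one per visual cue), scores each branch against a query vector, and mixes the softmax-weighted fusion with a uniform average so that no single cue fully dominates. The names `reweight_branches`, `query`, and `balance` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_branches(branches: List[Vector], query: Vector, balance: float = 0.5) -> Vector:
    """Fuse K branch responses (each of dimension D) into one D-dimensional feature.

    Hypothetical sketch: `query` stands in for a learned function-aware query,
    and `balance` mixes the attention-weighted fusion with a uniform average
    to keep multi-cue integration balanced.
    """
    dim = len(query)
    # Relevance of each branch to the query (scaled dot product).
    scores = [sum(b[i] * query[i] for i in range(dim)) / math.sqrt(dim) for b in branches]
    weights = softmax(scores)
    k = len(branches)
    fused = []
    for i in range(dim):
        # Function-specific emphasis: weight each branch by its relevance.
        emphasized = sum(w * b[i] for w, b in zip(weights, branches))
        # Balanced integration: plain average over all branches.
        uniform = sum(b[i] for b in branches) / k
        fused.append(balance * emphasized + (1.0 - balance) * uniform)
    return fused


# Usage: three toy cue branches; the query aligns with the first cue.
branches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
print([round(x, 3) for x in reweight_branches(branches, query)])
```

Setting `balance` to 0 reduces the fusion to a plain average of the branches, while values closer to 1 let the query-relevant cues dominate.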

Pipeline

[Figure: overview of the BioVLA pipeline]

The pipeline begins with a bio-inspired visual encoder that models region-specific responses analogous to the human visual cortex, producing multi-functional feature representations. These visual responses are further refined through a function-aware re-response and adaptive re-weighting mechanism. Finally, a vision-guided action refinement module dynamically modulates the hidden states of the action decoder, ensuring stable perception–action alignment and mitigating information loss in long-horizon reasoning.
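
One common way to let visual features modulate decoder hidden states is FiLM-style (feature-wise linear modulation) scaling and shifting. The sketch below uses that technique as a stand-in for the vision-guided action refinement step; it is an assumption, not the confirmed BioVLA module, and `film_refine` and its weight arguments are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, x: Vector, bias: Vector) -> Vector:
    """Compute weight @ x + bias for an (out_dim x in_dim) weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film_refine(hidden: List[Vector], visual: Vector,
                w_gamma: Matrix, b_gamma: Vector,
                w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Modulate each decoder hidden state with a scale and shift predicted from vision.

    FiLM-style conditioning (a swapped-in technique, not confirmed as BioVLA's):
        h' = h * (1 + gamma(v)) + beta(v)
    """
    gamma = linear(w_gamma, visual, b_gamma)  # per-channel scale from the visual feature
    beta = linear(w_beta, visual, b_beta)     # per-channel shift from the visual feature
    return [[h_i * (1.0 + g) + b for h_i, g, b in zip(h, gamma, beta)] for h in hidden]


# Usage: two decoder hidden states of dimension 2 and a 2-D pooled visual feature.
hidden = [[1.0, 2.0], [0.5, -1.0]]
visual = [1.0, 0.0]
refined = film_refine(hidden, visual,
                      w_gamma=[[0.5, 0.0], [0.0, 0.0]], b_gamma=[0.0, 0.0],
                      w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.0, 0.0])
print(refined)  # [[1.5, 3.0], [0.75, 0.0]]
```

Parameterizing the scale as 1 + γ is a common design choice: zero-initialized projections leave the hidden states unchanged, so the visual pathway can be learned gradually without disrupting the decoder at the start of training.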

Experiments

Task Success Comparison (Clean & Randomized)

All models are trained and evaluated on the RoboTwin 2.0 platform with the Aloha-AgileX embodiment, using 50 demonstrations per task. Values are task success rates. The best result in each row is highlighted in red and the second-best in green.

Task                    Setting     BioVLA  RDT     Pi0     ACT     DP      DP3     OpenVLA-OFT
Click Alarmclock        Clean       79%     61%     63%     32%     61%     77%     75%
                        Randomized  18%     12%     11%     4%      5%      14%     19%
Dump Bin Bigbin         Clean       93%     64%     83%     68%     49%     85%     22%
                        Randomized  25%     32%     24%     1%      0%      53%     15%
Hanging Mug             Clean       31%     23%     11%     7%      8%      17%     10%
                        Randomized  10%     16%     3%      0%      0%      1%      4%
Open Laptop             Clean       86%     59%     85%     56%     49%     82%     40%
                        Randomized  13%     32%     46%     0%      0%      7%      26%
Place Cans Plasticbox   Clean       69%     6%      34%     16%     40%     48%     4%
                        Randomized  7%      5%      2%      0%      0%      3%      2%
Press Stapler           Clean       88%     41%     62%     17%     6%      69%     14%
                        Randomized  40%     24%     29%     6%      0%      3%      12%
Put Bottles Dustbin     Clean       61%     21%     54%     27%     22%     60%     33%
                        Randomized  25%     4%      13%     1%      0%      21%     13%
Shake Bottle            Clean       98%     74%     97%     74%     65%     98%     22%
                        Randomized  47%     45%     60%     10%     8%      19%     21%
Turn Switch             Clean       49%     35%     27%     5%      36%     46%     39%
                        Randomized  32%     15%     23%     2%      1%      8%      26%
Average                 Clean       70.7%   43.2%   57.0%   33.1%   36.0%   63.8%   25.8%
                        Randomized  23.2%   19.0%   22.9%   6.2%    3.3%    13.8%   13.9%
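
To identify the best and second-best entries in a row programmatically, the sketch below ranks a few rows transcribed from the table above. Treating tied scores as sharing a rank (dense ranking) is an assumption about how the highlighting handles ties, and `best_and_second` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

MODELS = ["BioVLA", "RDT", "Pi0", "ACT", "DP", "DP3", "OpenVLA-OFT"]

# A few rows transcribed from the table above (success rates in %).
ROWS: Dict[Tuple[str, str], List[float]] = {
    ("Shake Bottle", "Clean"): [98, 74, 97, 74, 65, 98, 22],
    ("Average", "Clean"): [70.7, 43.2, 57.0, 33.1, 36.0, 63.8, 25.8],
    ("Average", "Randomized"): [23.2, 19.0, 22.9, 6.2, 3.3, 13.8, 13.9],
}


def best_and_second(values: List[float]) -> Tuple[List[str], List[str]]:
    """Return (models tied for best, models tied for second-best) in one row.

    Equal scores share a rank (dense ranking); this tie rule is an assumption.
    """
    distinct = sorted(set(values), reverse=True)
    top = distinct[0]
    second = distinct[1] if len(distinct) > 1 else None
    best = [m for m, v in zip(MODELS, values) if v == top]
    runner_up = [m for m, v in zip(MODELS, values) if v == second]
    return best, runner_up


for (task, setting), values in ROWS.items():
    best, runner_up = best_and_second(values)
    print(f"{task} ({setting}): best={best}, second={runner_up}")
# Shake Bottle (Clean): best=['BioVLA', 'DP3'], second=['Pi0']
# Average (Clean): best=['BioVLA'], second=['DP3']
# Average (Randomized): best=['BioVLA'], second=['Pi0']
```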

Results for additional tasks will be added in future updates.

Visualization

Qualitative visualization of BioVLA performing manipulation tasks.

⏰ Click Alarmclock

[Six qualitative visualizations of the Click Alarmclock task]

🧺 Dump Bin Bigbin

[Six qualitative visualizations of the Dump Bin Bigbin task]

☕ Hanging Mug

[Six qualitative visualizations of the Hanging Mug task]

💻 Open Laptop

[Six qualitative visualizations of the Open Laptop task]

🧃 Place Cans Plasticbox

[Six qualitative visualizations of the Place Cans Plasticbox task]

📎 Press Stapler

[Six qualitative visualizations of the Press Stapler task]

🗑️ Put Bottles Dustbin

[Six qualitative visualizations of the Put Bottles Dustbin task]

🧴 Shake Bottle

[Six qualitative visualizations of the Shake Bottle task]

🔘 Turn Switch

[Six qualitative visualizations of the Turn Switch task]