Journey

A timeline outlining my academic and research trajectory, key milestones, and the progression of my research interests in robotics and embodied AI.

Academic Timeline

Key milestones in my academic and research journey

🎓
Research Assistant
2025

Embodied AI Research

After declining my guaranteed admission to a graduate program in Astronautics, I am now working as a Research Assistant at the SJTU MINT Lab, focusing on Embodied Intelligence (VLA).

PhD · Embodied AI · Robotics
🔬
First Bold Attempt
2024

Guangzhou Xiangrui Technology Innovation Co., Ltd.

Building on my prior hands-on experience with UAVs and robotics at university, I attempted to co-found a startup with classmates; it ultimately did not succeed.

Master's · Computer Vision · Deep Learning
💡
First Research Internship
2023

Intelligent Robotics Research Center, NWPU

Special thanks to Prof. Ma for introducing me to robotics and 3D vision research, and for his strong support at the start of my research journey.

Research · AI · Publications
🎯
First National-Level Award
2021

YOLO, STM32, Qt

Competed in a national-level disciplinary competition for the first time, placing 4th among 88 finalist teams in the national finals and receiving the National First Prize.

Bachelor's · Engineering · Honors
🌱
First Research Project
2020

Aviation Science and Technology Innovation Base

After a half-year evaluation, I was selected to join NWPU AMT, where I worked with the team to bring our self-developed unmanned aerial vehicles to flight.

Research · Machine Learning · Discovery

Research Evolution

How my research interests have evolved over time

1

Early Stage

Hardware & Control

Started with hardware and electronics fundamentals, studying control theory, robotics, and microcontroller systems to build a strong theoretical foundation.

2

Growth Phase

3DV & Robotics

Transitioned to deep learning and computer vision, working on 6D pose estimation, digital twins, and real-world robot control.

3

Current Focus

Embodied AI

Now focused on Embodied AI, integrating 3D vision, control theory, and physical modalities so that robotic systems can interact with the physical world more naturally.

Reflections

Lessons learned and insights gained along the way

On the “V” Modality in VLA

"VLA inherits from VLM, whose vision capabilities are largely derived from understanding static images."

We might replace the vision encoder with a video encoder, or enhance contextual understanding within the VLA to strengthen its visual capabilities.

On the “L” Modality in VLA

"The language modality is a minor component of the VLA training data and provides negligible conditioning for downstream tasks."

We might strengthen the "L" modality via pseudo-labeling, or rethink the roles of "V" and "L" in action control beyond the VLM framework.

On world-model based methods

"We can enhance robotic action capabilities through future prediction, although such predictions are costly and not always accurate."

A world model can be used solely as a reward model for RL during post-training, or as a high-level model providing low-frequency guidance. Occasional prediction errors are acceptable; the key is to efficiently extract useful supervisory signals from its outputs.

Interested in Collaboration?

I'm always open to discussing research ideas, potential collaborations, or just having a chat about AI and robotics.