Arxiv

LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion
Jiangran Lyu*
Kai Liu*
Xuheng Zhang*
Haoran Liao
Yusen Feng
Wenxuan Zhu
Tingrui Shen
Jiayi Chen
Jiazhao Zhang
Yifei Dong
Wenbo Cui
Senmao Qi
Shuo Wang
Yixin Zheng
Mi Yan
Xuesong Shi
Haoran Li
Dongbin Zhao
Ming-Yu Liu
Zhizheng Zhang
Li Yi
Yizhou Wang
He Wang
Arxiv Github RSS (under review)
Recent robot foundation models largely rely on large-scale behavior cloning, which imitates expert …
NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
Haolin Yang*
Yuxing Long*
Zhuoyuan Yu
Zihan Yang
Minghan Wang
Jiapeng Xu
Yihan Wang
Ziyan Yu
Wenzhe Cai
Lei Kang
Hao Dong‡
Arxiv ICRA
Instruction-following navigation is a key step toward embodied intelligence.
Neural Force Field: Few shot learning of generalized physical reasoning
Shiqian Li
Ruihong Shen
Yaoyu Tao
Chi Zhang
Yixin Zhu
Arxiv Github ICLR
We present NFF, a modeling framework built on NODE that learns interpretable force field …
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Zijian Wu
Xiangyan Liu
Xinyuan Zhang
Lingjun Chen
Fanqing Meng
Lingxiao Du
Yiran Zhao
Fanshi Zhang
Yaoqi Ye
Jiawei Wang
Zirui Wang
Jinjie Ni
Yufan Yang
Arvin Xu
Michael Qizhe Shieh
Arxiv Github ICLR 2026
The MCP standardizes how LLMs interact with external systems, forming the foundation for general …
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
Shiqian Li
Ruihong Shen
Junfeng Ni
Chang Pan
Chi Zhang
Yixin Zhu
Arxiv Github ICLR
Predicting physical dynamics from visual data remains a fundamental challenge in AI, as it requires …
Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models
Jiayi Xu
Zhang Zhang
Yuanrui Zhang
Ruitao Chen
Yixian Xu
Tianyu He
Di He
Arxiv
In this paper, we introduce Luminark, a training-free and probabilistically-certified …
CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Zhuoyuan Yu*
Yuxing Long*
Zihan Yang
Chengyan Zeng
Hongwei Fan
Jiyao Zhang
Hao Dong†
Arxiv AAAI CCF A
Existing vision-and-language navigation models often deviate from the correct trajectory when …
SimLauncher: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training
Mingdong Wu
Lehong Wu
Yizhuo Wu
Weiyao Huang
Hongwei Fan
Zheyuan Hu
Haoran Geng
Jinzhou Li
Jiahe Ying
Long Yang
Yuanpei Chen
Hao Dong
Arxiv IROS 2025
Autonomous learning of dexterous, long-horizon robotic skills has been a longstanding pursuit of …
Playing with Transformer at 30+ FPS via Next-Frame Diffusion
Xinle Cheng
Tianyu He†
Jiayi Xu
Junliang Guo
Di He
Jiang Bian
Arxiv NeurIPS 2025
In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that …
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu†
Arxiv CVPR 2025
OmniPhysGS: 3D Constitutive Gaussians for General Physics-based Dynamics Generation
Yuchen Lin
Chenguo Lin†
Jianjin Xu
Yadong Mu‡
Arxiv ICLR 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Xiangru Tang*
Tianyu Hu*
Muyang Ye*
Yanjun Shao*
Xunjian Yin
Siru Ouyang
Wangchunshu Zhou
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark Gerstein
Arxiv ICLR 2025
We present ChemAgent, a novel framework designed to improve the performance of LLMs through a …
Autonomous Character-Scene Interaction Synthesis from Text Instruction
Nan Jiang
Zimo He
Zi Wang
Hongjie Li
Yixin Chen
Siyuan Huang†
Yixin Zhu†
Arxiv SIGGRAPH Asia 2024
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
Jialiang Zhang*
Haoran Liu*
Danshi Li*
Xinqiang Yu*
Haoran Geng
Yufei Ding
Jiayi Chen
He Wang†
Arxiv CoRL 2024
Scaling up dynamic human-scene interaction modeling
Nan Jiang
Zhiyuan Zhang
Hongjie Li
Xiaoxuan Ma
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu†
Siyuan Huang†
Arxiv Github CVPR 2024