
Research

2025

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin
Ting Lei
Yang Liu†
ICCV 2025
Playing with Transformer at 30+ FPS via Next-Frame Diffusion
Xinle Cheng
Tianyu He†
Jiayi Xu
Junliang Guo
Di He
Jiang Bian
Arxiv NeurIPS 2025
In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that incorporates block-wise causal attention, enabling iterative sampling and efficient inference via parallel token generation within each frame.
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu†
Arxiv CVPR 2025
OmniPhysGS: 3D Constitutive Gaussians for General Physics-based Dynamics Generation
Yuchen Lin
Chenguo Lin†
Jianjin Xu
Yadong Mu‡
Arxiv ICLR 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Xiangru Tang*
Tianyu Hu*
Muyang Ye*
Yanjun Shao*
Xunjian Yin
Siru Ouyang
Wangchunshu Zhou
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark Gerstein
Arxiv ICLR 2025
We present ChemAgent, a novel framework designed to improve the performance of LLMs through a dynamic, self-updating library.

2024

ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu*†
Yang Zhang*
Xuchuan Huang
Jasmine Xinze Li
Jiaming Ji
Yaodong Yang
Github NeurIPS 2024
Autonomous Character-Scene Interaction Synthesis from Text Instruction
Nan Jiang
Zimo He
Zi Wang
Hongjie Li
Yixin Chen
Siyuan Huang†
Yixin Zhu†
Arxiv SIGGRAPH Asia 2024
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
Jialiang Zhang*
Haoran Liu*
Danshi Li*
Xinqiang Yu*
Haoran Geng
Yufei Ding
Jiayi Chen
He Wang†
Arxiv CoRL 2024
Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
Yufei Ding*
Haoran Geng*
Chaoyi Xu
Xiaomeng Fang
Jiazhao Zhang
Songlin Wei
Qiyu Dai
Zhizheng Zhang
He Wang†
Github IROS 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu
Lilang Lin
Jiahang Zhang
Yiyang Ma
Jiaying Liu†
Github ECCV 2024
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei
Shaofeng Yin
Yuxin Peng
Yang Liu†
Github ECCV 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
Github ICML 2024
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang
Zhiyuan Zhang
Hongjie Li
Xiaoxuan Ma
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu†
Siyuan Huang†
Arxiv Github CVPR 2024
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei
Shaofeng Yin
Yang Liu†
Github CVPR 2024
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li
Yiming Qin
Minghang Zheng
Xin Jin
Yang Liu†
Github CVPR 2024