Research
2025
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu†
ICCV 2025
Playing with Transformer at 30+ FPS via Next-Frame Diffusion
Xinle Cheng, Tianyu He†, Jiayi Xu, Junliang Guo, Di He, Jiang Bian
arXiv
NeurIPS 2025
In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that incorporates block-wise causal attention, enabling iterative sampling and efficient inference via parallel token generation within each frame.
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin, Zhu Xu, Yang Liu†
arXiv
CVPR 2025
OmniPhysGS: 3D Constitutive Gaussians for General Physics-based Dynamics Generation
Yuchen Lin, Chenguo Lin†, Jianjin Xu, Yadong Mu‡
arXiv
ICLR 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Xiangru Tang*, Tianyu Hu*, Muyang Ye*, Yanjun Shao*, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein
arXiv
ICLR 2025
We present ChemAgent, a framework that improves the chemical reasoning performance of LLMs through a dynamic, self-updating memory library.
2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu*†, Yang Zhang*, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
GitHub
NeurIPS 2024
Autonomous Character-Scene Interaction Synthesis from Text Instruction
Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang†, Yixin Zhu†
arXiv
SIGGRAPH Asia 2024
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
Jialiang Zhang*, Haoran Liu*, Danshi Li*, Xinqiang Yu*, Haoran Geng, Yufei Ding, Jiayi Chen, He Wang†
arXiv
CoRL 2024
Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
Yufei Ding*, Haoran Geng*, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang†
GitHub
IROS 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu, Lilang Lin, Jiahang Zhang, Yiyang Ma, Jiaying Liu†
GitHub
ECCV 2024
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu†
GitHub
ECCV 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu, Zhining Zhang, Yizhou Wang
GitHub
ICML 2024
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu†, Siyuan Huang†
arXiv
GitHub
CVPR 2024
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei, Shaofeng Yin, Yang Liu†
GitHub
CVPR 2024
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu†
GitHub
CVPR 2024