Bin Xiao

Bin Xiao is a Research Engineer on the Meta GenAI team, where he works on multimodal Llama model development. His research interests include computer vision, deep learning, and multimodal large language models. His representative works include Phi-3-vision, the Florence models, and the High-Resolution Network (HRNet).

Research Highlights

2024-present: Multimodal Llama post-training.
2024: Led the Phi-3-vision and Phi-3.5-vision projects, developing one of the best “small” multimodal LLMs.
2020-2023: Led the Florence project and co-authored Florence-1 / Florence-2; Florence-2 was accepted as an oral presentation at CVPR 2024 (90 of 2719 accepted papers).
2021: Co-authored CvT, one of the first hybrid transformer and CNN architectures; ranks 6th among the most cited papers in ICCV 2021 (Citations: 2,000+).
2019-2020: Co-authored HRNet, the first neural architecture backbone maintaining high-resolution representations for dense recognition tasks; ranks 4th among the most cited papers in CVPR 2019 (Citations: 5,000+).
2018-2019: Co-authored SimpleBaseline, which established the baseline for human pose estimation and tracking; ranks 12th among the most cited papers in ECCV 2018 (Citations: 2,100+).

Selected publications

Please refer to my Google Scholar profile for a full list of my publications.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Microsoft GenAI team
[arXiv] [HuggingFace]
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
[arXiv] [HuggingFace] CVPR 2024 (oral)
Deep High-Resolution Representation Learning for Human Pose Estimation
Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
[Paper] [Code] Paper Digest Most Influential CVPR Papers
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang
International Conference on Computer Vision (ICCV), 2021
[Paper] [Code] Paper Digest Most Influential ICCV Papers
Simple Baselines for Human Pose Estimation and Tracking
Bin Xiao, Haiping Wu, Yichen Wei
European Conference on Computer Vision (ECCV), 2018
[Paper] [Code] Paper Digest Most Influential ECCV Papers
DaViT: Dual Attention Vision Transformers
Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan
European Conference on Computer Vision (ECCV), 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[Paper]
Florence: A New Foundation Model for Computer Vision
Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
[arXiv]
Lite-HRNet: A Lightweight High-Resolution Network
Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[Paper] [Code]
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S Huang, Lei Zhang
Conference on Computer Vision and Pattern Recognition (CVPR), 2020
[Paper] [Code]
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2020
[Paper]
Integral Human Pose Regression
Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, Yichen Wei
European Conference on Computer Vision (ECCV), 2018
[Paper]

Honors and Awards

1st place in Look into Person Challenge 2019: Single-Person Human Pose Estimation Track
2nd place in Objects365 Challenge 2019: Full Track
1st place in PoseTrack Multi-Person Pose Tracking Challenge 2018
2nd place in COCO Keypoint Detection Challenge 2018

Professional Activities

Conference reviewer: CVPR, ICCV, ECCV, ICLR, and others.
Journal reviewer: T-PAMI, T-MM, IJCV, and others.