About
Microsoft AI
I am a Member of Technical Staff at Microsoft AI, based in Redmond, WA, building multimodal and agentic AI systems that people can actually use. Representative work includes Phi-3 Vision, Florence, CvT, and HRNet.
Research Interests
- Multimodal & Agentic AI
- Neural Architecture
- Vision-Language Models
- Pose Estimation & Dense Prediction
- Reasoning & Coding Models
Trajectory
2018 – present
- 2025 – present: Reasoning, agentic, and coding model training.
- 2024 – 2025: Multimodal Llama post-training.
- 2024: Led Phi-3 Vision and Phi-3.5 Vision, defining a strong generation of compact multimodal LLMs.
- 2020 – 2023: Led and co-authored the Florence-1 and Florence-2 projects. Florence-2 selected as a CVPR 2024 oral presentation.
- 2018 – 2021: Led and co-authored CvT, HRNet, and SimpleBaseline, three durable reference points in vision research.
Selected Publications
5 papers
Recognition
8 highlights
- Florence-2 accepted as a CVPR 2024 oral presentation.
- HRNet: among the most cited CVPR 2019 papers, 7,000+ citations.
- CvT: among the most cited ICCV 2021 papers, 3,000+ citations.
- SimpleBaseline: among the most cited ECCV 2018 papers, 2,800+ citations.
- 1st place: Look into Person Challenge 2019, Single-Person Pose Estimation Track.
- 2nd place: Objects365 Challenge 2019, Full Track.
- 1st place: PoseTrack Multi-Person Pose Tracking Challenge 2018.
- 2nd place: COCO Keypoint Detection Challenge 2018.