About

Microsoft AI

I am a Member of Technical Staff at Microsoft AI, based in Redmond, WA, building multimodal and agentic AI systems that people can actually use. Representative work includes Phi-3 Vision, Florence, CvT, and HRNet.

View Google Scholar Profile →

Research Interests

  • Multimodal & Agentic AI
  • Neural Architecture
  • Vision-Language Models
  • Pose Estimation & Dense Prediction
  • Reasoning & Coding Models

Key Projects

Trajectory

2018 – present
  1. 2025 – present

    Reasoning, agentic, and coding model training.

  2. 2024 – 2025

    Multimodal Llama post-training.

  3. 2024

    Led Phi-3 Vision and Phi-3.5 Vision, defining a strong generation of compact multimodal LLMs.

  4. 2020 – 2023

    Led and co-authored the Florence-1 and Florence-2 projects. Florence-2 was selected as a CVPR 2024 oral presentation.

  5. 2018 – 2021

    Led and co-authored CvT, HRNet, and SimpleBaseline — three durable reference points in vision research.

Selected Publications

5 papers
  1. 05

    2024 · Technical Report · Multimodal LLM

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Compact models that are practical to run on-device, with strong multimodal capability through Phi-3 Vision and Phi-3.5 Vision.

  2. 04

    2024 · CVPR · Oral · 27M+ downloads

    Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    A unified model spanning captioning, OCR, grounding, and segmentation — one architecture, many tasks.

  3. 03

    ICCV 2021 · 3,000+ citations

    CvT: Introducing Convolutions to Vision Transformers

    An early hybrid CNN-transformer architecture. Among the most cited ICCV 2021 papers.

  4. 02

    CVPR 2019 · 7,000+ citations

    Deep High-Resolution Representation Learning for Human Pose Estimation

    The HRNet family, which maintains high-resolution representations throughout the network for dense recognition. Among the most cited CVPR 2019 papers.

  5. 01

    ECCV 2018 · 2,800+ citations

    Simple Baselines for Human Pose Estimation and Tracking

    A high-performing baseline that became a durable reference for pose estimation and tracking research.

Recognition

8 highlights
  • Florence-2 accepted as a CVPR 2024 oral presentation.
  • HRNet: among the most cited CVPR 2019 papers, 7,000+ citations.
  • CvT: among the most cited ICCV 2021 papers, 3,000+ citations.
  • SimpleBaseline: among the most cited ECCV 2018 papers, 2,800+ citations.
  • 1st place: Look into Person Challenge 2019, Single-Person Pose Estimation Track.
  • 2nd place: Objects365 Challenge 2019, Full Track.
  • 1st place: PoseTrack Multi-Person Pose Tracking Challenge 2018.
  • 2nd place: COCO Keypoint Detection Challenge 2018.

Contact

Links
Location
Microsoft AI · Redmond, WA