I am currently a final-year master's student at the School of Software, Tsinghua University. Previously, I received my B.S. from the School of Computer Science and Engineering, Central South University, where I received the Outstanding Undergraduate Thesis Award (Top 2%). I am now a research intern at Microsoft Research Asia, mentored by Dr. Shaohan Huang, focusing primarily on Mixture-of-Experts and Multimodal Pre-training. Prior to this, I spent two years as a research intern at SenseTime, mainly developing advanced imaging technologies and integrating AI algorithms into imaging sensors and chips. Additionally, I have been a core team member in several international competitions, achieving notable rankings. I have published several papers at top international AI conferences such as ICLR, KDD, and MM.
My research aims to build a unified AI system capable of simultaneously processing information from multiple modalities and addressing various downstream tasks. Toward this goal, I have explored the following topics:
- mixture-of-experts: architecture, pretraining, and domain adaptation.
- language-based multi-modal intelligence: multi-modal learning (acoustic, vision, language, and point-cloud modalities, etc.), multi-modal retrieval, downstream adaptation (zero-shot learning, parameter-efficient fine-tuning).
- alignment & debiasing: alignment for large language models (LLMs) and diffusion models, RLAIF.
News
- 05/2024 Two papers were accepted to KDD2024.
- 01/2024 One paper was accepted to ICLR2024.
- 12/2023 One paper was accepted to ICASSP2024.
- 11/2023 One paper was accepted to Information Sciences.
- 07/2023 One paper was accepted to ICIP 2023.
- 04/2023 Joined the Natural Language Computing group of Microsoft Research Asia (MSRA) as a research intern.
- 03/2023 One paper was accepted to ICASSP 2023 (Top 3%).
- 03/2022 Won the 2nd Place Award at the Mobile Intelligent Photography & Imaging (MIPI) Workshop for RGBW Remosaic @ CVPR 2023.
- 09/2022 One paper was accepted to Information Processing & Management (IPM, IF=8.6).
- 09/2022 Won the 3rd Place Award at the Mobile AI (AIM) workshop for the Learned Smartphone ISP Challenge @ ECCV 2022.
- 06/2022 Won the Winner Award at the New Trends in Image Restoration and Enhancement (NTIRE) Workshop @ CVPR 2022.
- 06/2022 One paper was accepted to ACMMM 2022.
Research Experience
- Natural Language Computing Group (NLC), Microsoft Research Asia
- 2023.04 - present, Research intern
- mentored by Dr. Shaohan Huang
- AI Sensing & Imaging Group, SenseTime
- 2021.04 - 2023.04, Computer Vision Research intern
- mentored by Yaqi Wu and Dr. Feng Zhang
Publications
$^{*}$ indicates co-first author
🧑🎨 Mixture-of-Experts
Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei
- Significantly increase the expert activation ratio and enhance fine-grained understanding capabilities in the SMoE model by partitioning tokens before the gating functions.
International Conference on Learning Representations (ICLR), 2024
Xun Wu, Shaohan Huang, Furu Wei
- Achieve a flexible and dynamic combination of multiple trained LoRAs through a gating network while preserving their individual characteristics.
Routing Evidence for Unseen Actions in Video Moment Retrieval (paper released soon)
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2024
Guolong Wang *, Xun Wu *, Zhen Qin, Liangliang Shi
- Propose Routing Evidence (RE) by investigating a mixture of prediction heads with an uncertainty estimator (e.g., evidence learning), and achieve state-of-the-art out-of-domain (e.g., unseen actions) retrieval performance on the video moment retrieval task.
📚 Multi-modal Intelligence
Prompt-based Zero-shot Video Moment Retrieval
ACM Multimedia Conference (ACMMM), 2022
Guolong Wang *, Xun Wu *, Zhaoyuan Liu, JunChi Yan
- Propose.
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Xun Wu, Shaohan Huang, Guolong Wang, Furu Wei
- Propose VisionPrefer and VP-Score.
ICASSP 2023
Instance-Aware Hierarchical Structured Policy for Prompt Learning in Vision-Language Models. Xun Wu *, Guolong Wang *, Zhaoyuan Liu, Xuan Dang, Zhen Qin. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. (Top 3% recognition)
IPM
Reducing 0s bias in video moment retrieval with a circular competence-based captioner. Guolong Wang, Xun Wu, Zhaoyuan Liu, Zhen Qin. Information Processing & Management (IPM, IF=8.6), 2022.
Information Sciences
Progressive Reinforcement Learning for Video Summarization. Guolong Wang *, Xun Wu *, JunChi Yan. Information Sciences (IS, IF=8.1), 2023.
Imaging & Low-level Vision (no longer my primary focus)
ICME 2024
Improving Image Reconstruction and Synthesis by Balancing the Optimization from Frequency Perspective. Xuan Dang *, Xun Wu *, Guolong Wang, Zhen Qin. IEEE International Conference on Multimedia and Expo (ICME), 2024. (Oral)
ICIP 2023
Joint Demosaicing and Denoising with Gradient Guidance in Quad Bayer CFA. Xun Wu *, Yaqi Wu *, Jiawei Zhang, Feng Zhang, Jimmy S. Ren. IEEE International Conference on Image Processing (ICIP), 2023.
Competition
- New Trends in Image Restoration and Enhancement (NTIRE) Workshop on Multi-Spectral Filter Array Demosaicing @ CVPR2022, Winner Award (157 teams)
- Mobile AI (AIM) workshop for Learned Smartphone ISP Challenge @ ECCV2022, 3rd Place Award
ECCVW 2022
Residual Feature Distillation Channel Spatial Attention Network for ISP on Smartphone. Jiesi Zheng, Zhihao Fan, Xun Wu, Yaqi Wu, Feng Zhang. European Conference on Computer Vision - Advances in Image Manipulation workshop and challenges (ECCVW), 2022.
- Mobile Intelligent Photography & Imaging (MIPI) Workshop for RGBW Remosaic @ ECCV2022, 2nd Place Award (81 teams)
CVPRW 2023
OTST: A Two-Phase Framework for Joint Denoising and Remosaicing in RGBW CFA. Zhihao Fan *, Xun Wu *, Fanqing Meng, Yaqi Wu, Feng Zhang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition - 2nd Mobile Intelligent Photography & Imaging Workshop (CVPRW), 2023.
- Mobile Intelligent Photography & Imaging (MIPI) Workshop for Quad Bayer Remosaic @ ECCV2022, 5th Place
ECCVW 2022
Learning to Joint Remosaic and Denoise in Quad Bayer CFA via Universal Multi-scale Channel Attention Network. Xun Wu, Zhihao Fan, Jiesi Zheng, Yaqi Wu, Feng Zhang. European Conference on Computer Vision - 1st Mobile Intelligent Photography & Imaging Workshop (ECCVW), 2022.
Honors and Awards
- 2023.11 Ubiquant Scholarship, Tsinghua University.
- 2023.06 Top-3% Paper Recognition, ICASSP 2023.
- 2021.10 - 2023.10 Outstanding Student Award & Second-class Scholarship (×3), Tsinghua University.
- 2021.06 Province-level Outstanding Graduate Student Award (Top 5 of 242 students), Central South University.
- 2021.06 Outstanding Student Paper Award, Central South University.
- 2020.10 Wanxing Technology Innovation Scholarship (Top 10 of 242 students), Central South University.
- 2017.10 - 2021.10 Outstanding Student Award & Second-class Scholarship (×3), Central South University.
- 2017.02 Second Prize (provincial level), Chinese Physics Olympiad.
Education
- 2021.09 - present, Tsinghua University
- 2017.09 - 2021.06, Central South University.
Academic Services
Conference Reviewer:
- 2024 CVPR, NeurIPS, MM, KDD
- 2023 MM, ICASSP, CVPR
Miscellaneous
- 🏀 I am a big fan of basketball. I have played on the THU SS basketball team. Back in my undergraduate years, I was a member of the CSU SCSE basketball team, which took 4th place in the 2018 CSU Cup basketball tournament.
- 😉 I am very interested in painting as well as traditional Chinese calligraphy.