About

Hi there, welcome to my homepage! :)

I am currently an applied scientist at Microsoft. Before joining Microsoft, I received my M.Eng. from Tokyo Institute of Technology (Tokyo Tech), Japan, where I was fortunate to be advised by Prof. Takahiro Shinozaki at Shinozaki Lab. Prior to that, I received my B.Eng. from Nanjing University (NJU), China, in 2019.

News

Apr 17, 2023: The ChatGPT robustness paper is accepted by the workshop on Trustworthy and Reliable Large-Scale Machine Learning Models (RTML) at ICLR 2023 as a highlight paper!

Feb 22, 2023: We conducted an out-of-distribution / adversarial robustness analysis of ChatGPT in this paper!

Jan 21, 2023: FreeMatch is accepted to ICLR 2023!

Sep 19, 2022: Check out our new SSL work USB: A Unified Semi-supervised Learning Benchmark accepted to NeurIPS 2022 Datasets and Benchmarks!

Sep 17, 2022: Margin Calibration is accepted to ACML 2022!

Aug 15, 2022: Check out our new SSL work Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction, accepted to COLING 2022!

Jul 09, 2022: Our paper on Automatic Spoken Language Acquisition is accepted to IEEE Journal of Selected Topics in Signal Processing Special Issue on Self-Supervised Learning for Speech and Audio Processing!

May 17, 2022: Check out our new state-of-the-art method FreeMatch on semi-supervised learning!

Dec 15, 2021: Check out our new preprint Margin Calibration for long-tailed visual recognition! We propose a simple and effective method to calibrate the biased margins for unbiased logits.

Research Experiences

  • Speech Recognition: I have experience in multilingual & cross-lingual [Hou+, TASLP2022, Hou+, ICASSP2021, Hou+, Interspeech2020] / cross-domain [Hou+, Interspeech2021] speech recognition. I am familiar with and have contributed to ESPnet, an end-to-end speech processing toolkit. We made a wrapper to simplify its use (ASR part only): EasyEspnet.
  • Semi-supervised Learning (SSL): In [Zhang+, NeurIPS2021], we proposed a simple and effective technique, namely Curriculum Pseudo Labeling (CPL), to boost the performance of FixMatch and various other SSL algorithms. We also open-sourced TorchSSL with 9 popular algorithms to enable fair comparison and boost the research and development of SSL.
  • Transfer Learning: We implemented several algorithms like DAN, DANN, DeepCoral and DSAN in the DeepDA toolkit.
  • Spoken Language Acquisition: We developed software robots that automatically acquire spoken language to solve a series of tasks through unsupervised and reinforcement learning [Komatsu+, JSTSP2022] [Zhang+, Interspeech2020] [Gao+, ICASSP2020].

Software

  • NeuralSpeech: A research project in Microsoft Research Asia focusing on neural network based speech processing, including automatic speech recognition (ASR), text to speech (TTS), etc.
  • TorchSSL: An all-in-one toolkit based on PyTorch for semi-supervised learning (SSL), containing code for 9 popular SSL algorithms to enable fair comparison and boost the development of SSL algorithms.
  • EasyEspnet: A wrapper for easier usage of ESPnet that helps you write, run, and debug code in a more friendly Python style.
  • DeepDA: A lightweight, easy-to-extend, easy-to-learn and high-performance toolkit based on PyTorch for domain adaptation (DA) of deep neural networks.
  • spolacq: Spoken Language Acquisition From Conversation Based On Reinforcement Learning

Publications

Preprint

  1. Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan, “Boosting Cross-Domain Speech Recognition with Self-Supervision”, arxiv, 2022. [paper]

Journals

  1. Ryota Komatsu, Shengzhou Gao, Wenxin Hou, Mingxin Zhang, Tomohiro Tanaka, Keisuke Toyoda, Yusuke Kimura, Kent Hino, Yu Iwamoto, Kosuke Mori, Takuma Okamoto, and Takahiro Shinozaki, “Automatic Spoken Language Acquisition Based on Observation and Dialogue”, in IEEE Journal of Selected Topics in Signal Processing (JSTSP) Special Issue on Self-Supervised Learning for Speech and Audio Processing, 2022. [paper] [code]
  2. Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki, “Exploiting Adapters for Cross-lingual Low-resource Speech Recognition”, in IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2022. [paper] [code] [知乎]

International Conferences

  1. Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie, “FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning”, accepted to The 11th International Conference on Learning Representations (ICLR), 2023. [paper] [code]
  2. Yidong Wang, Bowen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, “Margin Calibration for Long-Tailed Visual Recognition”, in Asian Conference on Machine Learning (ACML), 2022. [paper]
  3. Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang, “USB: A Unified Semi-supervised Learning Benchmark”, in Advances in Neural Information Processing Systems (NeurIPS) 35 Track Datasets and Benchmarks, 2022. [paper] [code]
  4. Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang, “Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction”, in Proc. The 29th International Conference on Computational Linguistics (COLING), 2022. [paper] [code]
  5. Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki, “FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling”, in Proc. Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021), Online, December 2021. [paper] [poster] [code] [video] [知乎]
  6. Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, “Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching”, in Proc. Interspeech 2021, Brno, Czech Republic, August 2021. [paper] [slides] [code] [video] [知乎]
  7. Wenxin Hou, Yidong Wang, Shengzhou Gao, Takahiro Shinozaki, “Meta-Adapter: Efficient Cross-Lingual Adaptation with Meta-Learning”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Ontario, Canada, June 2021. [paper] [slides] [code]
  8. Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi and Takahiro Shinozaki, “Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning”, in Proc. Interspeech 2020, Shanghai, China, October 2020. [paper] [slides] [code] [demo]
  9. Mingxin Zhang, Tomohiro Tanaka, Wenxin Hou, Shengzhou Gao and Takahiro Shinozaki, “Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition”, in Proc. Interspeech 2020, Shanghai, China, October 2020. [paper] [code]
  10. Shengzhou Gao, Wenxin Hou, Tomohiro Tanaka and Takahiro Shinozaki, “Spoken Language Acquisition Based on Reinforcement Learning and Word Unit Segmentation”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020. [paper] [slides] [code]

Workshops

  1. Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxing Jiao, Yue Zhang, Xing Xie, “On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective”, accepted by the workshop on Trustworthy and Reliable Large-Scale Machine Learning Models (RTML) at ICLR 2023. [paper] [code]

Education

  • 2019.09 - 2021.09: M.Eng., Tokyo Institute of Technology, Tokyo, Japan
  • 2015.09 - 2019.07: B.Eng., Nanjing University, Nanjing, China

Experience

  • 2021.06 - 2021.08: Intern, Trip.com Group, Shanghai, China
  • 2020.12 - 2021.06: Research Intern, Microsoft Research Asia, Beijing, China
  • 2019.11 - 2020.12: Research Assistant, Tokyo Institute of Technology, Tokyo, Japan
  • 2019.04 - 2019.08: Intern, Nanjing Turing AI Institute, Nanjing, China

Honors and Awards

  • Stars of Tomorrow, Microsoft Research Asia, 2021
  • Student Travel Grant for Interspeech, ISCA, 2021
  • JASSO Honors Scholarship, Tokyo Tech, 2019
  • Excellence in Nanjing University Training Program of Innovation for Undergraduates, 2019
  • Renmin Scholarship (2nd class), Nanjing University, 2018
  • Outstanding Student, Nanjing University, 2018
  • Renmin Scholarship (3rd class), Nanjing University, 2017
  • Third Prize, China Undergraduate Mathematical Contest in Modeling (Jiangsu Province), 2017

Academic Services

  • Transaction Reviewer: TASLP 2022
  • Conference Reviewer: Interspeech 2022

Teaching

  • Teaching Assistant, Speech Information Technology, Tokyo Institute of Technology, 2020

Miscellaneous

  • My hometown is Yangzhou, China, a small but beautiful garden city!
  • I have been a fan of Jay Chou since 2005 😄
  • I can speak Chinese (native), English (fluent) and Japanese (intermediate). (Sad that my Japanese listening is really poor…)
  • Special thanks to Dr. Jindong Wang, my internship mentor at Microsoft Research Asia!