default search action
Zhuo Chen 0006
Person information
- affiliation: Microsoft, Redmond, WA, USA
- affiliation (PhD 2017): Columbia University, New York, NY, USA
Other persons with the same name
- Zhuo Chen — disambiguation page
- Zhuo Chen 0001 — CSIRO DATA61, Marsfield, NSW, Australia (and 1 more)
- Zhuo Chen 0002 — University of Science and Technology of China, Chinese Academy of Sciences, School of Information Science and Technology, Hefei, China
- Zhuo Chen 0003 — Tongji University, College of Surveying and Geo-Informatics, Shanghai, China
- Zhuo Chen 0004 — National University of Singapore, Singapore (and 1 more)
- Zhuo Chen 0005 — National University of Defense Technology, College of Artificial Intelligence, Changsha, China
- Zhuo Chen 0007 — Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [j10]Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3355-3364 (2024) - [j9]Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei:
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3709-3716 (2024) - [c97]Sha Guo, Lin Sui, Chen-Lin Zhang, Zhuo Chen, Wenhan Yang, Lingyu Duan:
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks. ECCV (71) 2024: 342-359 - [c96]He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li:
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. ICASSP Workshops 2024: 63-64 - [c95]Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability. ICASSP 2024: 11531-11535 - [i78]He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li:
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. CoRR abs/2401.03473 (2024) - [i77]Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan:
Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach. CoRR abs/2410.06149 (2024) - 2023
- [c94]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. ASRU 2023: 1-8 - [c93]Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu:
On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration. ASRU 2023: 1-8 - [c92]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech Separation with Large-Scale Self-Supervised Learning. ICASSP 2023: 1-5 - [c91]Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler:
Real-Time Speech Interruption Analysis: from Cloud to Client Deployment. ICASSP 2023: 1-5 - [c90]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition. ICASSP 2023: 1-5 - [c89]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. ICASSP 2023: 1-5 - [c88]Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-Modality Clues. ICASSP 2023: 1-5 - [c87]Heming Wang, Yao Qian, Hemin Yang, Nauyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, DeLiang Wang, Michael Zeng:
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. ICASSP 2023: 1-5 - [c86]Jian Wu, Zhuo Chen, Min Hu, Xiong Xiao, Jinyu Li:
Speaker Change Detection For Transformer Transducer ASR. ICASSP 2023: 1-5 - [c85]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR. ICASSP 2023: 1-5 - [c84]Kai Feng, Zhuo Chen, Fei Gao, Zhe Wang, Long Xu, Weisi Lin:
Post-Training Quantization for Vision Transformer in Transformed Domain. ICME 2023: 1457-1462 - [c83]Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Wanxiang Che, Xiangzhan Yu, Furu Wei:
BEATs: Audio Pre-Training with Acoustic Tokenizers. ICML 2023: 5178-5193 - [c82]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. INTERSPEECH 2023: 1314-1318 - [c81]Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. INTERSPEECH 2023: 3502-3506 - [c80]Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan:
Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach. ACM Multimedia 2023: 1431-1442 - [i76]Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei:
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. CoRR abs/2301.02111 (2023) - [i75]Jian Wu, Zhuo Chen, Min Hu, Xiong Xiao, Jinyu Li:
Speaker Change Detection for Transformer Transducer ASR. CoRR abs/2302.08549 (2023) - [i74]Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei:
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. CoRR abs/2303.03926 (2023) - [i73]Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-modality Clues. CoRR abs/2303.08372 (2023) - [i72]Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei:
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation. CoRR abs/2305.16107 (2023) - [i71]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR abs/2305.18747 (2023) - [i70]Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu:
On decoder-only architecture for speech-to-text and large language model integration. CoRR abs/2307.03917 (2023) - [i69]Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. CoRR abs/2308.06873 (2023) - [i68]Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability. CoRR abs/2309.08131 (2023) - [i67]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The second multi-channel multi-party meeting transcription challenge (M2MeT) 2.0): A benchmark for speaker-attributed ASR. CoRR abs/2309.13573 (2023) - [i66]Jing Pan, Jian Wu, Yashesh Gaur, Sunit Sivasankaran, Zhuo Chen, Shujie Liu, Jinyu Li:
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning. CoRR abs/2311.02248 (2023) - 2022
- [j8]Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1505-1518 (2022) - [j7]Chenda Li, Zhuo Chen, Yanmin Qian:
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1508-1520 (2022) - [c79]Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang:
One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement. ICASSP 2022: 271-275 - [c78]Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized speech enhancement: new models and Comprehensive evaluation. ICASSP 2022: 356-360 - [c77]Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li:
Continuous Speech Separation with Recurrent Selective Attention Network. ICASSP 2022: 6017-6021 - [c76]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. ICASSP 2022: 6027-6031 - [c75]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-Neural Beamformer for Continuous Speech Separation. ICASSP 2022: 6032-6036 - [c74]Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu:
Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training. ICASSP 2022: 6152-6156 - [c73]Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li:
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers. ICASSP 2022: 7317-7321 - [c72]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR. ICASSP 2022: 8082-8086 - [c71]Wei Wang, Zhuo Chen, Zhe Wang, Jie Lin, Long Xu, Weisi Lin:
Channel-Wise Bit Allocation for Deep Visual Feature Quantization. ICIP 2022: 3978-3982 - [c70]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. INTERSPEECH 2022: 521-525 - [c69]Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei:
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? INTERSPEECH 2022: 3699-3703 - [c68]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. INTERSPEECH 2022: 3774-3778 - [c67]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-wise Permutation Invariant Training. INTERSPEECH 2022: 5383-5387 - [c66]Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. SLT 2022: 451-457 - [i65]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. CoRR abs/2202.00842 (2022) - [i64]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. CoRR abs/2203.16685 (2022) - [i63]Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei:
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? CoRR abs/2204.12765 (2022) - [i62]Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu:
Ultra Fast Speech Separation Model with Teacher Student Learning. CoRR abs/2204.12777 (2022) - [i61]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. CoRR abs/2209.04974 (2022) - [i60]Gang Liu, Tianyan Zhou, Yong Zhao, Yu Wu, Zhuo Chen, Yao Qian, Jian Wu:
The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022. CoRR abs/2209.11266 (2022) - [i59]Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler:
Real-time Speech Interruption Analysis: From Cloud to Client Deployment. CoRR abs/2210.13334 (2022) - [i58]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating realistic speech overlaps improves multi-talker ASR. CoRR abs/2210.15715 (2022) - [i57]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech separation with large-scale self-supervised learning. CoRR abs/2211.05172 (2022) - [i56]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition. CoRR abs/2211.05564 (2022) - [i55]Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka:
Breaking trade-offs in speech separation with sparsely-gated mixture of experts. CoRR abs/2211.06493 (2022) - [i54]Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. CoRR abs/2211.09988 (2022) - [i53]Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei:
BEATs: Audio Pre-Training with Acoustic Tokenizers. CoRR abs/2212.09058 (2022) - 2021
- [j6]Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong:
Speaker Separation Using Speaker Inventories and Estimated Speech. IEEE ACM Trans. Audio Speech Lang. Process. 29: 537-546 (2021) - [c65]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. ASRU 2021: 296-303 - [c64]Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. EUSIPCO 2021: 1100-1104 - [c63]Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani:
Rethinking The Separation Layers In Speech Separation Networks. ICASSP 2021: 1-5 - [c62]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. ICASSP 2021: 5739-5743 - [c61]Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou:
Continuous Speech Separation with Conformer. ICASSP 2021: 5749-5753 - [c60]Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020. ICASSP 2021: 5824-5828 - [c59]Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jin-Yu Li, Xiangzhan Yu:
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer. ICASSP 2021: 6139-6143 - [c58]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. ICASSP 2021: 6503-6507 - [c57]Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka:
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement. Interspeech 2021: 2686-2690 - [c56]Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu:
Ultra Fast Speech Separation Model with Teacher Student Learning. Interspeech 2021: 3026-3030 - [c55]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Recording. Interspeech 2021: 3036-3040 - [c54]Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. Interspeech 2021: 3066-3070 - [c53]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. Interspeech 2021: 3430-3434 - [c52]Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe:
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker. Interspeech 2021: 3555-3559 - [c51]Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen:
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario. Interspeech 2021: 3665-3669 - [c50]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. Interspeech 2021: 4413-4417 - [c49]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. SLT 2021: 785-792 - [c48]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. SLT 2021: 809-816 - [c47]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. SLT 2021: 833-840 - [c46]Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen:
Dual-Path RNN for Long Recording Speech Separation. SLT 2021: 865-872 - [c45]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904 - [c44]Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey:
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. SLT 2021: 905-911 - [i52]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. CoRR abs/2102.11634 (2021) - [i51]Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. CoRR abs/2103.02378 (2021) - [i50]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. CoRR abs/2103.16776 (2021) - [i49]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. CoRR abs/2104.02128 (2021) - [i48]Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen:
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario. CoRR abs/2104.03603 (2021) - [i47]Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. CoRR abs/2107.01922 (2021) - [i46]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. CoRR abs/2107.02852 (2021) - [i45]Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li:
Continuous Streaming Multi-Talker ASR with Dual-path Transducers. CoRR abs/2109.08555 (2021) - [i44]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR. CoRR abs/2110.03151 (2021) - [i43]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. CoRR abs/2110.05745 (2021) - [i42]Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu:
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training. CoRR abs/2110.05752 (2021) - [i41]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-neural beamformer for continuous speech separation. CoRR abs/2110.06428 (2021) - [i40]Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized Speech Enhancement: New Models and Comprehensive Evaluation. CoRR abs/2110.09625 (2021) - [i39]Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang:
One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement. CoRR abs/2110.10330 (2021) - [i38]Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. CoRR abs/2110.13900 (2021) - [i37]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. CoRR abs/2110.14142 (2021) - [i36]Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li:
Continuous Speech Separation with Recurrent Selective Attention Network. CoRR abs/2110.14838 (2021) - [i35]Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang, Zhengguang Li, Huaxiang Zhang:
A New Image Codec Paradigm for Human and Machine Uses. CoRR abs/2112.10071 (2021) - 2020
- [j5]Zhuo Chen, Kui Fan, Shiqi Wang, Lingyu Duan, Weisi Lin, Alex ChiChung Kot:
Toward Intelligent Sensing: Intermediate Deep Feature Compression. IEEE Trans. Image Process. 29: 2230-2243 (2020) - [c43]Yi Luo, Zhuo Chen, Takuya Yoshioka:
Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation. ICASSP 2020: 46-50 - [c42]Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka:
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation. ICASSP 2020: 6394-6398 - [c41]Yong Zhao, Tianyan Zhou, Zhuo Chen, Jian Wu:
Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification. ICASSP 2020: 6834-6838 - [c40]Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li:
Continuous Speech Separation: Dataset and Analysis. ICASSP 2020: 7284-7288 - [c39]Zhuo Chen, Ling-Yu Duan, Shiqi Wang, Weisi Lin, Alex C. Kot:
Data Representation in Hybrid Coding Framework for Feature Maps Compression. ICIP 2020: 3094-3098 - [c38]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. INTERSPEECH 2020: 36-40 - [c37]Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-End Architecture of Online Multi-Channel Speech Separation. INTERSPEECH 2020: 81-85 - [c36]Dongmei Wang, Zhuo Chen, Takuya Yoshioka:
Neural Speech Separation Using Spatially Distributed Microphones. INTERSPEECH 2020: 339-343 - [i34]Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Jinyu Li:
Continuous speech separation: dataset and analysis. CoRR abs/2001.11482 (2020) - [i33]Dongmei Wang, Zhuo Chen, Takuya Yoshioka:
Neural Speech Separation Using Spatially Distributed Microphones. CoRR abs/2004.13670 (2020) - [i32]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. CoRR abs/2006.10930 (2020) - [i31]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. CoRR abs/2008.04546 (2020) - [i30]Sanyuan Chen, Yu Wu, Zhuo Chen, Jinyu Li, Chengyi Wang, Shujie Liu, Ming Zhou:
Continuous Speech Separation with Conformer. CoRR abs/2008.05773 (2020) - [i29]Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-end Architecture of Online Multi-channel Speech Separation. CoRR abs/2009.03141 (2020) - [i28]Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong:
Speaker Separation Using Speaker Inventories and Estimated Speech. CoRR abs/2010.10556 (2020) - [i27]Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. CoRR abs/2010.11458 (2020) - [i26]Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li:
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer. CoRR abs/2010.12180 (2020) - [i25]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020) - [i24]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. CoRR abs/2011.02921 (2020) - [i23]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription. CoRR abs/2011.03110 (2020) - [i22]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration. CoRR abs/2011.03706 (2020) - [i21]Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani:
Rethinking the Separation Layers in Speech Separation Networks. CoRR abs/2011.08400 (2020) - [i20]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording. CoRR abs/2012.09727 (2020)
2010 – 2019
- 2019
- [c35]Peidong Wang, Zhuo Chen, Xiong Xiao, Zhong Meng, Takuya Yoshioka, Tianyan Zhou, Liang Lu, Jinyu Li:
Speech Separation Using Speaker Inventory. ASRU 2019: 230-236 - [c34]Takuya Yoshioka, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Igor Abramovski, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang:
Advances in Online Audio-Visual Meeting Transcription. ASRU 2019: 276-283 - [c33]Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, Dimitrios Dimitriadis, Jasha Droppo, Yifan Gong:
Single-channel Speech Extraction Using Speaker Inventory and Attention Network. ICASSP 2019: 86-90 - [c32]Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis:
Low-latency Speaker-independent Continuous Speech Separation. ICASSP 2019: 6980-6984 - [c31]Zhuo Chen, Jie Lin, Zhe Wang, Vijay Chandrasekhar, Weisi Lin:
Beyond Ranking Loss: Deep Holographic Networks for Multi-Label Video Search. ICIP 2019: 879-883 - [c30]Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn, Zhuo Chen, Michael Zeng, Xuedong Huang:
Meeting Transcription Using Asynchronous Distant Microphones. INTERSPEECH 2019: 2968-2972 - [c29]Zhuo Chen, Kui Fan, Shiqi Wang, Ling-Yu Duan, Weisi Lin, Alex C. Kot:
Lossy Intermediate Deep Learning Feature Compression and Evaluation. ACM Multimedia 2019: 2414-2422 - [i19]Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis:
Low-Latency Speaker-Independent Continuous Speech Separation. CoRR abs/1904.06478 (2019) - [i18]Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng:
Meeting Transcription Using Virtual Microphone Arrays. CoRR abs/1905.02545 (2019) - [i17]Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong:
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch. CoRR abs/1907.05955 (2019) - [i16]Yi Luo, Zhuo Chen, Takuya Yoshioka:
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation. CoRR abs/1910.06379 (2019) - [i15]Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka:
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation. CoRR abs/1910.14104 (2019) - [i14]Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou:
Advances in Online Audio-Visual Meeting Transcription. CoRR abs/1912.04979 (2019) - 2018
- [j4]Yi Luo, Zhuo Chen, Nima Mesgarani:
Speaker-Independent Speech Separation With Deep Attractor Network. IEEE ACM Trans. Audio Speech Lang. Process. 26(4): 787-796 (2018) - [c28]Zhuo Chen, Takuya Yoshioka, Xiong Xiao, Linyu Li, Michael L. Seltzer, Yifan Gong:
Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation. ICASSP 2018: 5384-5388 - [c27]Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong:
Developing Far-Field Speaker System Via Teacher-Student Learning. ICASSP 2018: 5699-5703 - [c26]Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Fil Alleva:
Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition. ICASSP 2018: 5739-5743 - [c25]Zhong Meng, Jinyu Li, Zhuo Chen, Yang Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang:
Speaker-Invariant Training Via Adversarial Learning. ICASSP 2018: 5969-5973 - [c24]Zhuo Chen, Weisi Lin, Shiqi Wang, Long Xu, Leida Li:
Image Quality Assessment Based Label Smoothing in Deep Neural Network Learning. ICASSP 2018: 6742-6746 - [c23]Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva:
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks. INTERSPEECH 2018: 3038-3042 - [c22]Zhuo Chen, Xiong Xiao, Takuya Yoshioka, Hakan Erdogan, Jinyu Li, Yifan Gong:
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network. SLT 2018: 558-565 - [i13]Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong:
Cracking the cocktail party problem by multi-beam deep attractor network. CoRR abs/1803.10924 (2018) - [i12]Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang:
Speaker-Invariant Training via Adversarial Learning. CoRR abs/1804.00732 (2018) - [i11]Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong:
Developing Far-Field Speaker System Via Teacher-Student Learning. CoRR abs/1804.05166 (2018) - [i10]Zhuo Chen, Weisi Lin, Shiqi Wang, Lingyu Duan, Alex C. Kot:
Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing. CoRR abs/1809.06196 (2018) - [i9]Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva:
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks. CoRR abs/1810.03655 (2018) - 2017
- [b1]Zhuo Chen:
Single Channel auditory source separation with neural network. Columbia University, USA, 2017 - [j3]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. Comput. Speech Lang. 46: 401-418 (2017) - [j2]Lin Ma, Zhuo Chen, Long Xu, Yihua Yan:
Multimodal deep learning for solar radio burst classification. Pattern Recognit. 61: 573-582 (2017) - [c21]Wan Ding, Dong-Yan Huang, Zhuo Chen, Xinguo Yu, Weisi Lin:
Facial action recognition using very deep networks for highly imbalanced class distribution. APSIPA 2017: 1368-1372 - [c20]Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong:
Unsupervised adaptation with domain separation networks for robust speech recognition. ASRU 2017: 214-221 - [c19]Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong:
Cracking the cocktail party problem by multi-beam deep attractor network. ASRU 2017: 437-444 - [c18]James O'Sullivan, Zhuo Chen, Sameer A. Sheth, Guy McKhann, Ashesh D. Mehta, Nima Mesgarani:
Neural decoding of attentional selection in multi-speaker environments without access to separated sources. EMBC 2017: 1644-1647 - [c17]Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani:
Deep clustering and conventional networks for music separation: Stronger together. ICASSP 2017: 61-65 - [c16]Zhuo Chen, Yi Luo, Nima Mesgarani:
Deep attractor network for single-microphone speaker separation. ICASSP 2017: 246-250 - [c15]Sisi Chen, Long Xu, Lin Ma, Weiqiang Zhang, Zhuo Chen, Yihua Yan:
Convolutional neural network for classification of solar radio spectrum. ICME Workshops 2017: 198-201 - [c14]Xuexin Yu, Long Xu, Lin Ma, Zhuo Chen, Yihua Yan:
Solar radio spectrum classification with LSTM. ICME Workshops 2017: 519-524 - [c13]Wenqing Sun, Long Xu, Xin Huang, Weiqiang Zhang, Tianjiao Yuan, Zhuo Chen, Yihua Yan:
Forecasting of ionospheric vertical total electron content (TEC) using LSTM networks. ICMLC 2017: 340-344 - [c12]Zhuo Chen, Yan Huang, Jinyu Li, Yifan Gong:
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection. INTERSPEECH 2017: 3632-3636 - [p1]John R. Hershey, Jonathan Le Roux, Shinji Watanabe, Scott Wisdom, Zhuo Chen, Yusuf Ziya Isik:
Novel Deep Architectures in Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 135-164 - [i8]Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong:
End-to-End Attention based Text-Dependent Speaker Verification. CoRR abs/1701.00562 (2017) - [i7]Zhuo Chen, Yi Luo, Nima Mesgarani:
Speaker-independent Speech Separation with Deep Attractor Network. CoRR abs/1707.03634 (2017) - [i6]Zhuo Chen, Weisi Lin, Shiqi Wang, Long Xu, Leida Li:
Image Quality Assessment Guided Deep Neural Networks Training. CoRR abs/1708.03880 (2017) - [i5]Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong:
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition. CoRR abs/1711.08010 (2017) - 2016
- [j1]Zhuo Chen, Lin Ma, Long Xu, Chengming Tan, Yihua Yan:
Imaging and representation learning of solar radio spectrums for classification. Multim. Tools Appl. 75(5): 2859-2875 (2016) - [c11]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. ICASSP 2016: 31-35 - [c10]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. INTERSPEECH 2016: 545-549 - [c9]Tasha Nagamine, Zhuo Chen, Nima Mesgarani:
Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations. INTERSPEECH 2016: 1583-1587 - [c8]Long Xu, Lin Ma, Zhuo Chen, Xianyou Zeng, Yihua Yan:
Perceptual image quality enhancement for solar radio image. QoMEX 2016: 1-6 - [c7]Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong:
End-to-End attention based text-dependent speaker verification. SLT 2016: 171-178 - [i4]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation using Deep Clustering. CoRR abs/1607.02173 (2016) - [i3]Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani:
Deep Clustering and Conventional Networks for Music Separation: Stronger Together. CoRR abs/1611.06265 (2016) - [i2]Zhuo Chen, Yi Luo, Nima Mesgarani:
Deep attractor network for single-microphone speaker separation. CoRR abs/1611.08930 (2016) - 2015
- [c6]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition. ASRU 2015: 475-481 - [c5]Roger Hsiao, Jeff Z. Ma, William Hartmann, Martin Karafiát, Frantisek Grézl, Lukás Burget, Igor Szöke, Jan Cernocký, Shinji Watanabe, Zhuo Chen, Sri Harish Reddy Mallidi, Hynek Hermansky, Stavros Tsakalidis, Richard M. Schwartz:
Robust speech recognition in unknown reverberant and noisy conditions. ASRU 2015: 533-538 - [c4]Long Xu, Ying Weng, Zhuo Chen:
Solar Radio Astronomical Big Data Classification. HPCA (China) 2015: 126-133 - [c3]Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey:
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks. INTERSPEECH 2015: 3274-3278 - [c2]Long Xu, Lin Ma, Zhuo Chen, Yihua Yan, Jinjian Wu:
Perceptual Quality Improvement for Synthesis Imaging of Chinese Spectral Radioheliograph. PCM (2) 2015: 94-105 - [c1]Zhuo Chen, Lin Ma, Long Xu, Ying Weng, Yihua Yan:
Multimodal Learning for Classification of Solar Radio Spectrum. SMC 2015: 1035-1040 - [i1]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. CoRR abs/1508.04306 (2015)
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2025-02-05 21:30 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint