Very deep convolutional neural networks for raw waveforms W Dai, C Dai, S Qu, J Li, S Das 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 500 | 2017 |
Learning joint embedding with multimodal cues for cross-modal video-text retrieval NC Mithun, J Li, F Metze, AK Roy-Chowdhury Proceedings of the 2018 ACM on International Conference on Multimedia …, 2018 | 286 | 2018 |
A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling Y Wang, J Li, F Metze IEEE International Conference on Acoustics, Speech and Signal Processing …, 2019 | 209 | 2019 |
Masked autoencoders that listen PY Huang, H Xu, J Li, A Baevski, M Auli, W Galuba, F Metze, ... Advances in Neural Information Processing Systems 35, 28708-28720, 2022 | 208 | 2022 |
Adversarial camera stickers: A physical camera-based attack on deep learning systems J Li, FR Schmidt, JZ Kolter Proceedings of the 36th International Conference on Machine Learning, 2019 | 194 | 2019 |
A comparison of deep learning methods for environmental sound detection J Li, W Dai, F Metze, S Qu, S Das 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 189 | 2017 |
Universal phone recognition with a multilingual allophone system X Li, S Dalmia, J Li, M Lee, P Littell, J Yao, A Anastasopoulos, ... ICASSP 2020, 2020 | 131 | 2020 |
Adversarial music: Real world audio adversary against wake-word detection system J Li, S Qu, X Li, J Szurley, JZ Kolter, F Metze Advances in Neural Information Processing Systems 32, 2019 | 82 | 2019 |
Real-time fine grained occupancy estimation using depth sensors on arm embedded platforms S Munir, RS Arora, C Hesling, J Li, J Francis, C Shelton, C Martin, A Rowe, ... 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS …, 2017 | 58 | 2017 |
Joint embeddings with multimodal cues for video-text retrieval NC Mithun, J Li, F Metze, AK Roy-Chowdhury International Journal of Multimedia Information Retrieval 8, 3-18, 2019 | 34 | 2019 |
Towards Zero-shot Learning for Automatic Phonemic Transcription X Li, S Dalmia, DR Mortensen, J Li, AW Black, F Metze AAAI 2020, 2020 | 33 | 2020 |
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection SY Tseng, J Li, Y Wang, J Szurley, F Metze, S Das InterSpeech 2018, 2017 | 23 | 2017 |
Eventness: Object detection on spectrograms for temporal localization of audio events P Pham, J Li, J Szurley, S Das 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 21 | 2018 |
Understanding audio pattern using convolutional neural network from raw waveforms S Qu, J Li, W Dai, S Das arXiv preprint arXiv:1611.09524, 2016 | 21 | 2016 |
Comparing the max and noisy-or pooling functions in multiple instance learning for weakly supervised sequence learning tasks Y Wang, J Li, F Metze InterSpeech 2018, 2018 | 16 | 2018 |
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification JB Li, S Qu, PB Huang, F Metze InterSpeech 2022, 2022 | 12 | 2022 |
Sound event detection for real life audio DCASE challenge JL Dai Wei, P Pham, S Das, S Qu, F Metze Proc. Workshop Detection and Classification of Acoustic Scenes and Events, 2016 | 12 | 2016 |
Audio-visual event recognition through the lens of adversary JB Li, K Ma, S Qu, PY Huang, F Metze ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 10 | 2021 |
On adversarial robustness of large-scale audio visual learning JB Li, S Qu, X Li, PYB Huang, F Metze ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 8 | 2022 |
Acoustic scene recognition with deep neural networks (DCASE challenge 2016) JL Dai Wei, P Pham, S Das, S Qu Robert Bosch Research and Technology Center 3, 2016 | 8 | 2016 |