VALSE Webinar 20240124-04 (Issue 338 overall): A Duet of Audio and Visual Generation

Posted: 2024-1-19 19:23 | Publisher: 程一 (Institute of Computing Technology)


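Speaker: Wei Xue (The Hong Kong University of Science and Technology)

Talk title: Building the Singing Voice Foundation Model

Speaker: Jun Liu (Singapore University of Technology and Design)

Talk title: Beyond Image Generation - Diffusion Models for 3D Pose and Mesh Recovery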

Speaker: Wei Xue (The Hong Kong University of Science and Technology)

Time: Wednesday, January 24, 2024, 20:00 (Beijing time)

Talk title: Building the Singing Voice Foundation Model

Speaker bio:

Wei Xue is currently an Assistant Professor in the Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology (HKUST). He received a Bachelor's degree in automatic control from Huazhong University of Science and Technology in 2010 and a Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2015. From August 2015 to September 2018, he was first a Marie Curie Experienced Researcher and then a Research Associate in the Speech and Audio Processing Group, Department of Electrical & Electronic Engineering, Imperial College London, UK. From November 2018 to December 2021, he was a Senior Research Scientist at JD AI Research, Beijing, where he led R&D on front-end speech processing and acoustic modelling for robust speech recognition. From January 2022 to April 2023, he was an Assistant Professor in the Department of Computer Science, Hong Kong Baptist University. He has been a visiting scholar at Université de Toulon and KU Leuven. His research interests are in speech and music intelligence, including AI music generation, speech enhancement and separation, room acoustics, and speech and audio event recognition. He is a former Marie Curie Fellow and was selected for the Beijing Overseas Talent Aggregation Project. He currently leads the AI music research in the theme-based Art-Tech project, which received a total of HK$52.8 million from the Hong Kong RGC.

Homepage:

http://wei-xue.com/

Abstract:

We built a large singing voice foundation model that enables fast singing voice synthesis across genders, languages, and vocal ranges, including in zero-resource settings. Unlike traditional AI singers, which require hours of training data and have a fixed repertoire, this model supports modifications to both lyrics and melody. Using only tens of seconds of data, it can sing any new song, performing true song synthesis rather than simple voice conversion. This talk will introduce a series of supporting technologies, including the NAS-FM-based timbre synthesizer, CoMoSpeech, and CoMoSVC.

References:

[1] Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo, “NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation,” in Proc. IJCAI, 2023.

[2] Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo, “CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model,” in Proc. ACM Multimedia, 2023.

[3] Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo, “CoMoSVC: Consistency Model-based Singing Voice Conversion,” arXiv preprint arXiv:2401.01792, 2024.

Speaker: Jun Liu (Singapore University of Technology and Design)

Time: Wednesday, January 24, 2024, 20:30 (Beijing time)

Talk title: Beyond Image Generation - Diffusion Models for 3D Pose and Mesh Recovery

Speaker bio:

Jun Liu is currently an assistant professor at SUTD. His research interests include computer vision and artificial intelligence. His work has been published in premier computer vision journals and conferences, including TPAMI, CVPR, ICCV, and ECCV. His Google Scholar citation count is 11,400. He is an Associate Editor of IEEE Transactions on Image Processing, IEEE Transactions on Biometrics, Behavior, and Identity Science, IET Image Processing, IET Computer Vision, and Visual Intelligence, and serves or has served as an Area Chair for CVPR, ECCV, ICML, NeurIPS, ICLR, MM, WACV, and other venues.

Homepage:

https://people.sutd.edu.sg/~jun_liu/

Abstract:

AI-generated content (AIGC) has recently attracted considerable research attention. The great success of text-to-image generation models is partly due to the emergence of diffusion models. Diffusion models generally rely on progressive noising and denoising steps based on the Gaussian distribution. However, in some scenarios the noise follows a non-Gaussian distribution. In our work, we investigated two techniques that can handle such non-Gaussian noise. We demonstrate their efficacy on 3D pose estimation and 3D mesh recovery, where our methods achieve state-of-the-art results.
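For readers new to the Gaussian noising process mentioned above, the following is a minimal sketch of the standard DDPM-style forward (noising) step, which samples $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The linear beta schedule, the function names `linear_beta_schedule` and `q_sample`, and the 17-joint pose array are illustrative assumptions, not details from the speaker's papers; the sketch shows only the Gaussian baseline, not the non-Gaussian techniques presented in the talk.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t (assumed linear schedule, as in DDPM)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form.

    Returns the noisy sample and the Gaussian noise used to produce it.
    """
    eps = rng.standard_normal(x0.shape)  # Gaussian noise: the assumption the talk relaxes
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


betas = linear_beta_schedule(1000)
# abar_t = prod_{s <= t} (1 - beta_s): fraction of the clean signal variance kept at step t
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((17, 3))  # hypothetical 3D pose: 17 joints x (x, y, z)
x_early, eps_early = q_sample(x0, 10, alpha_bars, rng)  # still close to x0
x_late, eps_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```

Because $\bar\alpha_t$ shrinks toward zero as $t$ grows, late samples are dominated by the Gaussian noise term; a denoising network is then trained to reverse this process step by step.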

References:

[1] Jia Gong, Lin Geng Foo, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, Jun Liu, “DiffPose: Toward More Reliable 3D Pose Estimation,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[2] Lin Geng Foo, Jia Gong, Hossein Rahmani, Jun Liu, “Distribution-Aligned Diffusion for Human Mesh Recovery,” in Proc. International Conference on Computer Vision (ICCV), 2023.

Host: Renjie Wan (Hong Kong Baptist University)

Host bio:

Renjie Wan received a B.Eng. degree from the University of Electronic Science and Technology of China in 2012 and a Ph.D. degree from Nanyang Technological University, Singapore, in 2019. He is currently an assistant professor at Hong Kong Baptist University, Hong Kong. He was an Outstanding Reviewer of ICCV 2019 and is a recipient of the Microsoft CRSF Award, the VCIP 2020 Best Paper Award, and the Wallenberg-NTU Presidential Postdoctoral Fellowship.

Homepage:

https://www.comp.hkbu.edu.hk/~renjiewan/

Special thanks to the main organizer of this webinar:

Organizing AC: Renjie Wan (Hong Kong Baptist University)
