VALSE


VALSE Paper Quick Look, Issue 190: TMac for Acoustic Event Classification

2024-8-27 10:30 | Posted by: 程一-计算所


Paper title:

TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification

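Authors:

Meng Liu (National University of Defense Technology), Ke Liang (National University of Defense Technology), Dayu Hu (National University of Defense Technology), Hao Yu (National University of Defense Technology), Yue Liu (National University of Defense Technology), Lingyuan Meng (National University of Defense Technology), Wenxuan Tu (National University of Defense Technology), Sihang Zhou (National University of Defense Technology), Xinwang Liu (National University of Defense Technology)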


Watch on Bilibili:

https://www.bilibili.com/video/BV14T421Y7J4/


Abstract:

Audiovisual data is everywhere in this digital age, which places higher demands on the deep learning models built on it. Handling multi-modal information well is the key to a better audiovisual model. We observe that audiovisual data naturally has temporal attributes, such as the time information of each frame in a video. More concretely, such data is inherently multi-modal, combining audio and visual cues that proceed in strict chronological order. This indicates that temporal information is important for multi-modal acoustic event modeling, both within and across modalities. However, existing methods process each modality's features independently and simply fuse them, which neglects temporal relations and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, which models such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event by dividing its audio and video data into multiple segments. Each segment is treated as a node, and the temporal relationships between nodes are treated as timestamps on their edges. In this way, we can smoothly capture intra-modal and inter-modal dynamic information. Several experiments demonstrate that TMac outperforms other SOTA models.
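The graph construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the code link below for that). The node numbering, the `TemporalEdge` type, the `build_temporal_graph` helper, and the specific choice of edges (each segment linked to its predecessor in the same modality, and to the other modality's segment in the same time slot) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalEdge:
    src: int        # index of the source node
    dst: int        # index of the destination node
    timestamp: int  # segment index at which the interaction occurs
    kind: str       # "intra" (same modality) or "inter" (across modalities)


def build_temporal_graph(num_segments: int):
    """Build a toy temporal graph for one audiovisual event.

    Assumed layout (hypothetical, not taken from the paper):
    audio segment t is node t, video segment t is node num_segments + t.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")

    nodes = [("audio", t) for t in range(num_segments)]
    nodes += [("video", t) for t in range(num_segments)]

    edges = []
    for t in range(num_segments):
        audio, video = t, num_segments + t
        # Inter-modal edge: audio and video segments covering the same time slot.
        edges.append(TemporalEdge(audio, video, t, "inter"))
        if t > 0:
            # Intra-modal edges: each segment is linked to its predecessor.
            edges.append(TemporalEdge(audio - 1, audio, t, "intra"))
            edges.append(TemporalEdge(video - 1, video, t, "intra"))

    # Edges are appended in chronological order, so they can be read as a stream.
    return nodes, edges


nodes, edges = build_temporal_graph(4)
```

With four segments per modality, this toy layout yields 8 nodes and 10 timestamped edges (4 inter-modal, 6 intra-modal). A temporal graph model would then consume the edges in timestamp order; how TMac actually does this is specified in the paper and its repository, not here.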


Reference:

[1] Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu. "TMac: Temporal multi-modal graph learning for acoustic event classification." Proceedings of the 31st ACM International Conference on Multimedia. 2023.


Paper link:

https://arxiv.org/abs/2309.11845

 

Code link:

https://github.com/MGitHubL/TMac

 

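About the speaker:

Meng Liu is a PhD student in computer science at the National University of Defense Technology. Research interests include graph learning and clustering analysis. As first author, Liu has published 8 papers at conferences and in journals including ICLR, SIGIR, ACM MM, TNNLS, and CIKM, with more than 300 Google Scholar citations. Honors include the CCHI 2023 Best Student Paper award, the National Scholarship, and a university-level Outstanding Master's Thesis award. Liu has been invited multiple times to give talks at VALSE, AI TIME, Mila, and other venues, and serves as a reviewer for journals and conferences including TKDE, TOIS, TNNLS, TOMM, NeurIPS, ICML, ICLR, and KDD.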


Homepage:

https://scholar.google.com/citations?user=6DtqpM8AAAAJ&hl=zh-CN&authuser=1



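Special thanks to the main organizer of this Paper Quick Look:

Monthly rotating AC: Shiming Chen (Mohamed bin Zayed University of Artificial Intelligence)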


How to participate

1. The weekly VALSE Webinar is streamed live on Bilibili. Search for VALSE_Webinar on Bilibili to follow us!

Live stream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/ 


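2. The VALSE Webinar usually takes place on Wednesdays at 20:00, but the time is occasionally adjusted slightly to accommodate the speaker's time zone. To stay informed, follow the VALSE WeChat official account (valse_wechat) or join the VALSE QQ T group (group number: 863867505).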


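*Note: When applying to join the VALSE QQ group, you must provide your name, affiliation, and role; all three are required. After joining, please set your display name to your real name, role, and affiliation. Role codes: T for university and research institute staff; I for industry R&D; D for PhD students; M for master's students.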


3. The VALSE WeChat official account usually posts the announcement for the following week's Webinar talk every Thursday.


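4. You can also visit the VALSE homepage, http://valser.org/, to view Webinar information directly. Slides for Webinar talks (with the speaker's permission) are posted at the bottom of each talk announcement on the VALSE website.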
