VALSE Webinar 20230412, Session 07 (No. 307 overall): Video Creation in the Dawn of Generative AI

Posted 2023-4-6 19:00 | Posted by: 程一-计算所


Time	April 12, 2023 (Wednesday), 20:00 (Beijing time)
Topic	Video Creation in the Dawn of Generative AI
Hosts	孟德宇 (Xi'an Jiaotong University), 蒋路 (Google Research & Carnegie Mellon University)
Livestream	https://live.bilibili.com/22300737

Speaker: 于力军 (Carnegie Mellon University)

Talk title: Masked Generative Video Transformer

Speaker: Kihyuk Sohn (Google Research)

Talk title: Video Probabilistic Diffusion Models in Projected Latent Space

Panelists:

于力军 (Carnegie Mellon University), Kihyuk Sohn (Google Research), 陈启峰 (Hong Kong University of Science and Technology), 寿政 (National University of Singapore), 付彦伟 (Fudan University)

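Panel topics:

1. Spotlights on recent video editing works?

2. What do you see as the future trends in video generation?

3. How can generative AI help content creators (up主) produce more creative videos?

4. What will be the ChatGPT moment for large video models?

5. In the era of large models, how can ordinary research labs do research on video generation?

*You are welcome to leave topic-related questions in the comments below. The hosts and panelists will pick several of the most popular ones to add to the panel topics!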

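Speaker: 于力军 (Carnegie Mellon University)

Talk time: April 12, 2023 (Wednesday), 20:00 (Beijing time)

Talk title: Masked Generative Video Transformer

Speaker bio:

于力军 is a Ph.D. student in artificial intelligence at the School of Computer Science, Carnegie Mellon University, advised by Prof. Alex Hauptmann. He has also long served as a student researcher at Google under the guidance of Dr. 蒋路, working on multimodal foundation models and on video understanding and generation. He was named a 2021 Siebel Scholar, and in 2019 he received bachelor's degrees in computer science and in economics from Peking University with highest honors. The MAGVIT video generation model he designed was accepted to CVPR 2023 as a Highlight with near-perfect review scores, and he has published multiple papers at CVPR, ICLR, ECCV, and other venues. The visual activity detection systems he developed have won more than ten public competitions at international conferences such as CVPR, ICCV, and WACV, and a public-interest video analysis project he led has been covered by The Washington Post and other well-known media.

Homepage: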

https://me.lj-y.com/

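Abstract:

We introduce the Masked Generative Video Transformer, MAGVIT, to handle a variety of video synthesis tasks with a single model. We design a 3D tokenizer that quantizes a video into spatio-temporal visual tokens, and we propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the generation quality, efficiency, and flexibility of MAGVIT. Our experiments show that (1) MAGVIT outperforms existing state-of-the-art methods and establishes the best FVD on three video generation benchmarks, including the challenging Kinetics-600; (2) MAGVIT outperforms existing methods in inference time, running two orders of magnitude faster than diffusion models and 60 times faster than autoregressive models; (3) a single MAGVIT model supports ten different generation tasks and generalizes to videos from different domains.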
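As a rough illustration of masked token modeling on a spatio-temporal token grid, the sketch below replaces a random subset of token ids with a mask id; a model would then be trained to predict the original ids at those positions. The grid shape, codebook size, `MASK_ID`, the 50% ratio, and the helper `mask_tokens` are all illustrative assumptions, not MAGVIT's actual tokenizer, embedding method, or masking schedule.

```python
import random

MASK_ID = -1  # hypothetical id reserved for masked positions


def mask_tokens(tokens, ratio, rng):
    """Mask a fraction `ratio` of a T x H x W grid of token ids.

    Returns the masked grid and the sorted list of masked (t, h, w) positions.
    """
    T, H, W = len(tokens), len(tokens[0]), len(tokens[0][0])
    positions = [(t, h, w) for t in range(T) for h in range(H) for w in range(W)]
    num_masked = round(ratio * len(positions))
    masked_positions = sorted(rng.sample(positions, num_masked))
    masked = [[row[:] for row in frame] for frame in tokens]  # copy, keep input intact
    for t, h, w in masked_positions:
        masked[t][h][w] = MASK_ID
    return masked, masked_positions


rng = random.Random(0)
T, H, W, codebook_size = 4, 8, 8, 1024  # toy sizes, chosen only for illustration
tokens = [[[rng.randrange(codebook_size) for _ in range(W)]
           for _ in range(H)] for _ in range(T)]
masked, masked_positions = mask_tokens(tokens, ratio=0.5, rng=rng)
print(len(masked_positions), "of", T * H * W, "tokens masked")
```

Different generation tasks can be thought of as different choices of which positions start out masked; this sketch only covers uniform random masking.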

Project page:

https://magvit.cs.cmu.edu/

References:

[1] Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, and Lu Jiang. MAGVIT: Masked Generative Video Transformer. In CVPR 2023.

Speaker: Kihyuk Sohn (Google Research)

Talk time: April 12, 2023 (Wednesday), 20:30 (Beijing time)

Talk title: Video Probabilistic Diffusion Models in Projected Latent Space

Speaker bio:

Dr. Kihyuk Sohn is a Research Scientist at Google Research in Mountain View, CA. Prior to joining Google, Kihyuk was a researcher in the Media Analytics group of NEC Laboratories America. He completed his Ph.D. at the University of Michigan under the supervision of Professor Honglak Lee. Kihyuk has a broad interest in machine learning and computer vision. Specifically, his research focuses on generative models and on supervised and unsupervised deep representation learning with applications to computer vision, audio recognition, and text processing, using graphical models that are invariant to many factors of variation for robust perception from complex and multimodal data.

Homepage:

https://sites.google.com/site/kihyuksml/

Abstract:

Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos remains a challenge due to their high dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation and memory inefficiency that limits their scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model which learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources. Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video as 2D-shaped latent vectors that factorize the complex cubic structure of video pixels and (b) a diffusion model architecture specialized for our new factorized latent space and the training/sampling procedure to synthesize videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM compared with previous video synthesis methods; e.g., PVDM obtains an FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, improving on the prior state-of-the-art score of 1773.4.
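To make the efficiency argument concrete, the sketch below projects a cubic T x H x W array onto three 2D planes, one per axis, and compares element counts. Mean pooling is only a stand-in for the learned autoencoder, and the one-plane-per-axis layout, the toy sizes, and the helper `project_to_planes` are assumptions made for illustration rather than details taken from the abstract.

```python
def project_to_planes(video):
    """Average a T x H x W nested list along each axis, giving three 2D planes."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    # Pool over time: an H x W plane.
    plane_hw = [[sum(video[t][h][w] for t in range(T)) / T for w in range(W)]
                for h in range(H)]
    # Pool over height: a T x W plane.
    plane_tw = [[sum(video[t][h][w] for h in range(H)) / H for w in range(W)]
                for t in range(T)]
    # Pool over width: a T x H plane.
    plane_th = [[sum(video[t][h][w] for w in range(W)) / W for h in range(H)]
                for t in range(T)]
    return plane_hw, plane_tw, plane_th


T, H, W = 16, 32, 32  # toy sizes, chosen only for illustration
video = [[[float(t + h + w) for w in range(W)] for h in range(H)] for t in range(T)]
plane_hw, plane_tw, plane_th = project_to_planes(video)

cubic_size = T * H * W                   # values in the full cube
planar_size = H * W + T * W + T * H      # values across the three planes
print(cubic_size, planar_size)
```

For these toy sizes the three planes hold 2048 values against 16384 for the full cube, which is the kind of reduction that lets a diffusion model operate on 2D-shaped latents instead of a 3D volume.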

References:

[1] Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin. Video Probabilistic Diffusion Models in Projected Latent Space.

https://arxiv.org/abs/2302.07685

https://sihyun.me/PVDM/

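Panelist: 陈启峰 (Hong Kong University of Science and Technology)

Bio:

陈启峰 is an assistant professor at the Hong Kong University of Science and Technology (HKUST). He received his Ph.D. in computer science from Stanford University and his bachelor's degree from HKUST. His research mainly covers AI content generation, computational photography, and autonomous driving. He was named to the China list of MIT Technology Review's 35 Innovators Under 35. He received a Google Faculty Research Award for his innovative research on image enhancement. He currently serves as associate director of the Intelligent Autonomous Driving Center at HKUST. He has also served as an area chair for CVPR and NeurIPS, an associate editor for IROS, and a senior program committee member for AAAI and IJCAI. He has won gold medals at the ACM-ICPC World Finals and at the IOI.

Homepage: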

cqf.io

Panelist: 寿政 (National University of Singapore)

Bio:

Prof. Shou is a tenure-track Assistant Professor at the National University of Singapore. He was a Research Scientist at Facebook AI in the Bay Area. He obtained his Ph.D. degree at Columbia University in the City of New York. He was awarded the Wei Family Private Foundation Fellowship. He was a best paper finalist at CVPR'22 and received a best student paper nomination at CVPR'17. His team won first place in international challenges including ActivityNet 2017, Ego4D 2022, and EPIC-Kitchens 2022. He is a Fellow of the National Research Foundation (NRF) Singapore. He is on the Forbes 30 Under 30 Asia list.

Homepage:

https://sites.google.com/view/showlab

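Panelist: 付彦伟 (Fudan University)

Bio:

Dr. 付彦伟 is a young research fellow (青年研究员) and doctoral advisor at the School of Data Science, Fudan University, a Shanghai Distinguished Professor (Eastern Scholar), and a Fellow of the British Computer Society (BCS Fellow). He received his Ph.D. from Queen Mary University of London in 2014 and was a postdoctoral researcher at Disney Research, Pittsburgh, USA, from January 2015 to July 2016. He has published more than 100 high-quality papers, including 11 in IEEE TPAMI as corresponding or first author, and one of his papers received the IEEE ICME 2019 Best Paper award. He holds 7 US patents and more than 20 Chinese patents. His research focuses on transfer-learning-based tasks such as zero-shot and few-shot learning; modeling of 3D/4D objects; sparse learning for neural networks and robotic arm grasping; and image editing and inpainting. He is a long-standing reviewer and program committee member for several international journals and conferences (e.g., IEEE TPAMI, IJCV, ACM MM, NIPS, ICCV), an area chair for ICML and NeurIPS, and an action editor for TMLR.

Homepage: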

http://yanweifu.github.io

Panel moderator: 蒋路 (Google Research & Carnegie Mellon University)

Panel moderator bio:

Lu Jiang is a staff research scientist and TLM at Google Research, as well as an adjunct faculty member at the Language Technologies Institute of Carnegie Mellon University. His research interests focus on robust deep learning, generative AI, and video creation. His research has been integral in the development of multiple Google products, such as YouTube, Cloud AutoML, Ads, Waymo, and Translate, impacting the daily lives of billions of users worldwide. His work has been nominated for best paper at top conferences in natural language processing (ACL) and computer vision (CVPR). Lu Jiang is an active member of the research community, serving as an AI panelist for America's Seed Fund (NSF SBIR) and regularly acting as an area chair for conferences such as CVPR, ICCV, NeurIPS, ACM Multimedia, and AAAI.

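Host: 孟德宇 (Xi'an Jiaotong University)

Host bio:

孟德宇 is a professor and doctoral advisor at Xi'an Jiaotong University, where he heads the machine learning teaching and research section of the National Engineering Laboratory for Big Data Algorithms and Analysis Technology. He has published more than 100 papers, including more than 60 in IEEE Transactions journals and 40 at China Computer Federation (CCF) class-A conferences, with more than 21,000 Google Scholar citations. He currently serves on the editorial boards of 7 domestic and international journals, including IEEE Trans. PAMI and Science China: Information Sciences. His current research focuses on fundamental machine learning problems such as meta-learning, probabilistic machine learning, and interpretable neural networks.

Homepage: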

https://gr.xjtu.edu.cn/web/dymeng

Special thanks to the main organizers of this Webinar:

Organizing AC: 孟德宇 (Xi'an Jiaotong University)

Co-organizing AC: 蒋路 (Google Research & Carnegie Mellon University)

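How to participate

1. VALSE's weekly Webinar is streamed live on Bilibili. Search for VALSE_Webinar on Bilibili to follow us!

Livestream:

https://live.bilibili.com/22300737

Past videos:

https://space.bilibili.com/562085182/

2. The VALSE Webinar usually takes place every Wednesday at 20:00, but the time is occasionally adjusted to accommodate speakers' time zones. To make it easier to attend, please follow the VALSE WeChat official account (valse_wechat) or join VALSE QQ group S (group number: 317920537).

*Note: When applying to join a VALSE QQ group, you must provide your name, affiliation, and role; all three are required. After joining, please set your display name to your real name, role, and affiliation. Roles: university and research institute staff T; industry R&D I; Ph.D. D; master's M.

3. The VALSE WeChat official account usually posts the notice for the following week's Webinar every Thursday.

4. You can also visit the VALSE homepage at http://valser.org/ to view Webinar information directly. Slides from Webinar talks (with the speaker's permission) are posted at the bottom of each talk announcement on the VALSE website.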
