
VALSE Student Webinar 20200920-02: From Internet AI to Embodied AI



Time

Sunday, September 20, 2020

10:00 AM (Beijing Time)

Theme

From Internet AI to Embodied AI

Hosts

Xu Yang (Institute of Automation, Chinese Academy of Sciences)

Wenguan Wang (ETH Zurich)


Speaker: Fei Xia (Stanford University)

Title: Gibson Environment: Photorealistic Simulation for Embodied Visual Tasks


Speaker: Fengda Zhu (Monash University)

Title: Vision-Language Navigation with Self-supervised Auxiliary Reasoning Tasks


Speaker: Hanqing Wang (Beijing Institute of Technology)

Title: Active Perception in Vision-Language Navigation Task



Panelists:

Caiming Xiong (Salesforce), Gansha Wu (UISEE), Fei Xia (Stanford University), Fengda Zhu (Monash University), Hanqing Wang (Beijing Institute of Technology)


Panel topics:

1. How should the scope of Embodied AI (its intension and extension) be defined, and where do the significance and core value of simulation environments for robot learning lie?

2. Compared with established robotics simulators (V-REP, Gazebo, etc.) and the in-house simulators built by autonomous-driving companies, what are the advantages of the newer simulation environments (iGibson, Habitat, etc.)? Are they the inevitable path for robot learning tasks?

3. Which research problems in Embodied AI most urgently need to be solved? What are the prospects for Embodied AI in industry, and which problems stand in the way there?

4. How can sim2real transfer be done better: through more accurate simulation, or through better domain adaptation methods?

5. For manipulation tasks, how can realistic simulation be achieved, and how can physical fidelity be guaranteed?

6. Embodied AI is a cutting-edge but demanding direction that involves virtual simulation, multi-modality, reinforcement learning, and more. How did the speakers get started in it, what difficulties did they encounter and how did they overcome them, and what advice do they have for students who intend to enter, or have just entered, this field?


*Feel free to post topic-related questions in the comments below; the hosts and panelists will pick several of the most popular ones and add them to the panel discussion!



Speaker: Fei Xia (Stanford University)

Time: Sunday, September 20, 2020, 10:00-10:20 (Beijing Time)

Title: Gibson Environment: Photorealistic Simulation for Embodied Visual Tasks


Speaker bio:

Fei Xia is a fifth-year PhD student at Stanford University, advised by Silvio Savarese and Leo Guibas. He received his BE from the Department of Automation at Tsinghua University and his MS from Stanford University. His research interests lie in computer vision and machine learning; in particular, he is interested in simulation-to-real-world transfer and domain adaptation for vision and robotics tasks.


Homepage:

http://fxia.me/


Abstract:

Gibson Environment is a simulation environment that supports the rendering of high-fidelity images from 3D-reconstructed buildings. This environment is fundamental for training visuomotor navigation skills for robotic agents: models learn to navigate within the simulated environment based on visual information. We further developed our simulator to enable not only navigation but also interaction with the simulated environment. This opens new avenues in robotics, enabling agents to be trained for new tasks in simulation while keeping sim2real transfer simple, thanks to the visual realism. We will present our techniques for improving photorealism (computer vision and computer graphics) and examples of navigation and interactive agents trained in Gibson (robotics, machine learning, and reinforcement learning based on visual information).
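For readers unfamiliar with this setup, here is a minimal sketch of the agent-environment loop that a photorealistic simulator like Gibson exposes for training visuomotor navigation. The NavEnv wrapper and its observation/action spaces are illustrative assumptions for this announcement, not the actual Gibson API.

```python
# A minimal sketch of the agent-environment loop used to train visuomotor
# navigation in a photorealistic simulator. NavEnv is a hypothetical
# gym-style stand-in, NOT the actual Gibson API; the observation and
# action spaces are illustrative assumptions.
import numpy as np

class NavEnv:
    """Hypothetical environment: RGB frames in, 2D velocity commands out."""
    def reset(self):
        return np.zeros((128, 128, 3), dtype=np.float32)  # first rendered frame

    def step(self, action):
        obs = np.zeros((128, 128, 3), dtype=np.float32)   # next rendered frame
        reward, done = -0.01, False                        # e.g. a step penalty
        return obs, reward, done

def random_policy(obs):
    # A learned visuomotor policy would map the rendered image to an action.
    return np.random.uniform(-1.0, 1.0, size=2)            # (linear, angular)

env = NavEnv()
obs = env.reset()
for _ in range(100):
    obs, reward, done = env.step(random_policy(obs))
    if done:
        obs = env.reset()
```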


References:

[1] Gibson Env: Real-World Perception for Embodied Agents, Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese, CVPR, 2018.

[2] Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments, Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Roberto Martín-Martín, Silvio Savarese, IEEE Robotics and Automation Letters, 2020.



Speaker: Fengda Zhu (Monash University)

Time: Sunday, September 20, 2020, 10:20-10:40 (Beijing Time)

Title: Vision-Language Navigation with Self-supervised Auxiliary Reasoning Tasks


Speaker bio:

Fengda Zhu received his bachelor's degree from Beihang University in 2017. He is currently pursuing a Ph.D. with the Faculty of Information Technology, Monash University, under the supervision of Prof. Xiaojun Chang. He has published three CVPR papers, including two as first author and one conference Oral. His research interests include deep learning, reinforcement learning, and vision-language navigation.


Homepage:

https://www.zhufengda.net/


Abstract:

Vision-Language Navigation (VLN) is a task where agents learn to navigate by following natural language instructions. The key to this task is to perceive both the visual scene and the natural language sequentially. Conventional approaches exploit vision and language features in cross-modal grounding. However, the VLN task remains challenging, since previous works have neglected the rich semantic information contained in the environment (such as implicit navigation graphs or sub-trajectory semantics). In this work, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks that take advantage of the additional training signals derived from this semantic information. The auxiliary tasks have four reasoning objectives: explaining the previous actions, estimating the navigation progress, predicting the next orientation, and evaluating the trajectory consistency. These additional training signals help the agent acquire knowledge of semantic representations in order to reason about its activity and build a thorough perception of the environment.
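As a rough illustration of the idea, the sketch below shows how a main navigation loss can be combined with the four auxiliary objectives as extra self-supervised training signals. The function names and weights are illustrative placeholders, not the paper's exact formulation.

```python
# Illustrative only: combining a navigation loss with the four auxiliary
# reasoning objectives as extra self-supervised training signals. Names
# and weights are placeholders, not the paper's exact formulation.
import torch

def total_loss(nav_loss, explain_loss, progress_loss,
               orientation_loss, consistency_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Sum the main navigation loss with the weighted auxiliary losses."""
    w1, w2, w3, w4 = weights
    return (nav_loss
            + w1 * explain_loss       # explaining the previous actions
            + w2 * progress_loss      # estimating the navigation progress
            + w3 * orientation_loss   # predicting the next orientation
            + w4 * consistency_loss)  # evaluating trajectory consistency

# Toy usage with scalar stand-ins for the real task-head losses.
loss = total_loss(torch.tensor(1.0), torch.tensor(0.2), torch.tensor(0.3),
                  torch.tensor(0.1), torch.tensor(0.4))
print(loss.item())  # 2.0
```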


References:

[1] Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks, Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang, CVPR, 2020 (Oral).



Speaker: Hanqing Wang (Beijing Institute of Technology)

Time: Sunday, September 20, 2020, 10:40-11:00 (Beijing Time)

Title: Active Perception in Vision-Language Navigation Task


Speaker bio:

Hanqing Wang joined the Media Computing and Intelligent Systems Lab (MCIS Lab) at Beijing Institute of Technology in 2018 as a Ph.D. student, advised by Prof. Wei Liang. In 2017, he interned with the Visual Computing Group at Microsoft Research Asia (MSRA VC Group), where he was named a "Star of Tomorrow" outstanding intern; Melody Master, which he helped design during the internship, won the "Most Creative Award" at the 2017 Microsoft Hackathon. In 2019, he interned with the vision group at the Inception Institute of Artificial Intelligence (IIAI), during which he took part in the CVPR 2020 Embodied-AI Habitat Challenge and won third place. He has published first-author papers at AAAI, ICCV, ECCV, and SIGGRAPH Asia. His main research interests are vision-and-language multimodal navigation and reinforcement learning.


Homepage:

https://hanqingwangai.github.io


Abstract:

Conducting robust navigation by following navigational instructions is very challenging for current navigation approaches. One of the crucial problems in this task is the uncertainty caused by ambiguous instructions and insufficient observation of the environment. Agents in current approaches typically suffer from this uncertainty and fail to make efficient navigation decisions. When facing such uncertainty, however, humans maintain robust navigation by actively exploring their surroundings and gathering the information needed to make more confident navigation decisions. Inspired by this human behavior, we introduce the ability of active perception into the agent so that it can make more intelligent navigation decisions. In this talk, I will present our end-to-end framework for learning the active exploration policy and demonstrate how active perception helps the navigation task.
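To make the idea concrete, here is a schematic sketch of one simple way an agent could trigger exploration under uncertainty, using the entropy of its action distribution as the trigger. The entropy threshold and helper functions are illustrative assumptions, not the paper's actual mechanism.

```python
# A schematic sketch of active perception: when the action distribution is
# too uncertain, gather more observations before committing to a move. The
# entropy trigger and helpers are illustrative assumptions, not the
# paper's actual mechanism.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def navigate_step(action_probs, explore_fn, act_fn, threshold=1.0):
    """Explore first if the policy is uncertain, then act."""
    if entropy(action_probs) > threshold:
        action_probs = explore_fn(action_probs)  # look around, re-estimate
    return act_fn(action_probs)                  # commit to an action

# Toy usage: exploration sharpens a flat distribution before acting.
flat = [0.25, 0.25, 0.25, 0.25]                  # entropy ~1.39 > threshold
explore = lambda p: [0.7, 0.1, 0.1, 0.1]         # pretend new views helped
act = lambda p: max(range(len(p)), key=lambda i: p[i])
print(navigate_step(flat, explore, act))         # -> 0
```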


References:

[1] Active Visual Information Gathering for Vision-Language Navigation, Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen, ECCV, 2020.



Panelist: Caiming Xiong (Salesforce)


Bio:

Caiming Xiong is currently the Senior Director of AI Research at Salesforce. Before that, he was a senior researcher at MetaMind (acquired by Salesforce in 2016) and a postdoctoral scholar at the University of California, Los Angeles (UCLA) from June 2014 to September 2015. His research interests include deep learning, natural language processing, video/image understanding, reinforcement learning, and speech and dialogue learning.


Homepage:

http://cmxiong.com/index.html



Panelist: Gansha Wu (UISEE)


Bio:

Gansha Wu is the co-founder and CEO of UISEE, dedicated to developing state-of-the-art AI technology to transform mobility, logistics, and ultimately the way people live. As a commercialization frontrunner in autonomous driving in China, UISEE has deployed and commercially explored autonomous driving with more than 20 leading customers in more than 50 scenarios. Before founding the company, as director of Intel Labs China and an Intel Principal Engineer, he led Intel's long-term strategic planning for big-data technology and established 5G communications, intelligent computing, and robotics as the lab's three research directions. Since 2000, he has published more than 10 academic papers and holds 28 US/international patents, with 14 more pending. He describes himself as a senior engineer with a measure of business sense, a technology manager experimenting with disruptive organizational change, and a wave-rider using technology to drive social innovation.


Homepage:

https://www.uisee.com/index.aspx



Host/Organizer: Xu Yang (Institute of Automation, Chinese Academy of Sciences)


Bio:

Xu Yang is an associate professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, where he received his Ph.D. in 2014. His research interests include computer/robot vision, visual perception and cognitive development, pattern analysis, and graph algorithms. He has published more than 50 papers in international journals and conferences, including IEEE TPAMI, IJCV, IEEE T-CYB, IEEE TNNLS, IEEE TSMC-Systems, and PR, and has filed more than 20 patent applications. He has led more than 10 projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology, and other agencies. He serves on the editorial board of the International Journal of Advanced Robotic Systems (IJARS) for vision systems and as a guest editor of Neurocomputing, and is a registered expert of the China Association for Science and Technology (fifth batch). He led teams in the Underwater Robot Picking Contest organized by the National Natural Science Foundation of China, winning first place in the autonomous grasping track at the inaugural 2017 contest and first place in the object recognition track in 2018. He received a second prize of the Beijing Science and Technology Award.


Homepage:

http://people.ucas.edu.cn/~XuYang



Host/Organizer: Wenguan Wang (ETH Zurich)


Bio:

Wenguan Wang is a postdoctoral researcher at ETH Zurich. He received his Ph.D. from Beijing Institute of Technology in 2018 and was a visiting student at the University of California, Los Angeles (UCLA) from 2016 to 2018. From 2018 to 2019, he was a Senior Scientist at the Inception Institute of Artificial Intelligence (IIAI). He has published nearly 40 papers in the top computer vision journal T-PAMI and the conferences CVPR/ICCV/ECCV. His honors include a Baidu Scholarship, the ACM China Doctoral Dissertation Award, the CAAI Outstanding Doctoral Dissertation Award, and the Outstanding Youth Paper Award of the World Artificial Intelligence Conference. He led teams to top rankings in several tracks of international challenges: first place in CVPR20 LID and ICCV19 PIC; second place in CVPR20 DAVIS and CVPR20 LIP; and third place in CVPR20 Embodied-AI Habitat, CVPR19 WAD, and CVPR19 LIP. His main research interests are image/video analysis, human-centric visual understanding, point cloud segmentation/detection, and embodied AI.


Homepage:

https://sites.google.com/view/wenguanwang



Organizer: Xiaowei Zhou (Zhejiang University)


Bio:

Xiaowei Zhou is a "Hundred Talents Program" professor and doctoral supervisor at the State Key Laboratory of CAD&CG, Zhejiang University. He received his bachelor's degree from Zhejiang University in 2008 and his Ph.D. from the Hong Kong University of Science and Technology in 2013. From 2014 to 2017, he was a postdoctoral researcher at the GRASP Laboratory, University of Pennsylvania; in 2017, he was selected for a national-level young talents program and joined Zhejiang University. His research focuses on computer vision and its applications in augmented reality and robotics, with a series of results in 3D object detection, pose estimation, motion capture, and feature matching. Over the past five years, he has published nearly 30 papers in the top computer vision journal T-PAMI and the conferences CVPR/ICCV/ECCV, including 9 oral presentations. His honors include being a CVPR19 Best Paper Finalist, the CVPR18 3DHUMANS Workshop Best Poster Award, and a first prize of the Lu Zengyong CAD&CG High-Tech Award. He has organized several workshops and tutorials at CVPR/ICCV/ECCV, and serves as an area chair for CVPR21 and ACCV21 and a senior program committee member for AAAI20.


Homepage:

xzhou.me




How to join VALSE Student Webinar 20-02:

Scan the QR code below to follow the "VALSE" WeChat official account (valse_wechat), then reply "02期" to the account to receive the live-stream link.


Special thanks to the main organizers of this Webinar:

Hosting AC/VSC: Xu Yang (Institute of Automation, Chinese Academy of Sciences), Wenguan Wang (ETH Zurich)

Co-hosting AC: Xiaowei Zhou (Zhejiang University)

Responsible AC: Si Liu (Beihang University)



How to participate

1. VALSE Webinars run on a live-streaming platform. During a session, the speaker uploads slides or shares their screen; the audience can see the slides, hear the speaker, and interact with the speaker via the chat function;

2. To participate, follow the VALSE WeChat official account (valse_wechat) or join a VALSE QQ group (groups A through N are currently full; apart from speakers and other invited guests, applicants may only join VALSE group O, group number: 1149026774);

*Note: When applying to join a VALSE QQ group, you must provide your name, affiliation, and status; all three are required. After joining, please use your real name plus your status and affiliation as your group nickname. Status codes: T for faculty and researchers at universities and research institutes; I for industry R&D staff; D for Ph.D. students; M for master's students.

3. About 5 minutes before the session starts, the speaker opens the live stream; click the live-stream link to join. Windows PCs, Macs, mobile phones, and other devices are supported;

4. Please refrain from off-topic messages during the session so as not to disrupt it;

5. If you cannot hear the audio or see the video during a session, leaving and rejoining usually resolves the problem;

6. Join from a fast network whenever possible, preferably over a wired connection;

7. Every Thursday, the VALSE WeChat official account posts the announcement and live-stream link for the following week's Webinar.

8. Webinar slides (with the speaker's permission) are posted at the bottom of the corresponding report announcement on the VALSE website, marked [slides].

9. Webinar videos (with the speaker's permission) are posted on VALSE's Bilibili and Xigua Video channels; search for "VALSE Webinar" to watch them.


Fengda Zhu [slides]

Hanqing Wang [slides]
