Abstract: In this talk, we will examine the problem of collective visual inference across a set of images by exploiting the relations among them. Such relations take different forms, including shared text keywords, co-occurrence of visual entities, spatial geometric constraints, and temporal constraints, to mention a few. We argue that by leveraging such relations and conducting visual inference collectively across all images, we can achieve better recognition accuracy than by recognizing each visual entity in each image separately, as in the conventional visual recognition regime.
Using semantic inference in online photo collections in social media as a case study, we propose a family of networked latent topic models, dubbed Visual Topic Network, for joint semantic inference across a collection of images. Our systematic study reveals that modeling such relations for collective inference helps learn better visual representations and hence improves recognition accuracy. I will also briefly discuss the potential application of such networked topic models to inferring the intended and perceived perception of images in social media.
If time permits, I will also briefly introduce some of my other research related to collective visual inference.
Speaker: Gang Hua is a Senior Research Manager in the Visual Computing Group at Microsoft Research Asia. He was an Associate Professor of Computer Science at Stevens Institute of Technology between 2011 and 2015. He held an Academic Advisor position at IBM T. J. Watson Research Center between 2011 and 2014. He was a Visiting Researcher at Microsoft Research Asia in Summer 2013, and a Consulting Researcher at Microsoft Research in Summer 2012. Before joining Stevens, he worked as a full-time researcher at leading industrial research labs: IBM T. J. Watson Research Center, Nokia Research Center Hollywood, and Microsoft Live Labs Research. He received his Ph.D. degree in Electrical and Computer Engineering from Northwestern University in 2006.
His research in computer vision studies the interconnections and synergies among visual data, the semantic and situated context, and users in the expanded physical world. It falls into three themes: human-centered visual computing, big visual data analytics, and vision-based cyber-physical systems. He is the author of more than 100 peer-reviewed publications in prestigious international journals and conferences. His research has been funded by NSF, NIH, ARO, ONR, Adobe Research, Google Research, Microsoft Research, and NEC Labs. He is the recipient of the 2015 IAPR Young Biometrics Investigator Award. To date, he holds 14 U.S. patents and has more than 10 U.S. patents pending. He is a Senior Member of the IEEE and a life member of the ACM.
Abstract: Computer graphics is everywhere: movies, video games, mobile apps, and more. However, very few people create computer graphics, because graphics software has traditionally been designed for, and can only be used by, professional users. Driven by the rapid development of sensor technologies, enormous amounts of Internet data, and emerging applications such as 3D printing and VR/AR, computer graphics research is entering a new era: we dream of enabling everyone to create visual content in their daily lives, not only in the digital world but also in the real world. In this talk, I will briefly review several projects we recently conducted to make this dream a reality. The underlying trend of this line of research is the fusion of computer graphics, computer vision, and digital fabrication.
Speaker: Kun Zhou is a Cheung Kong Professor in the Computer Science Department of Zhejiang University, and the Director of the State Key Lab of CAD&CG. Prior to joining Zhejiang University in 2008, he was a Lead Researcher of the Internet Graphics Group at Microsoft Research Asia. He received his B.S. and Ph.D. degrees in computer science from Zhejiang University in 1997 and 2002, respectively. His research interests are in visual computing, parallel computing, human-computer interaction, virtual reality, and computational fabrication. He currently serves on the editorial/advisory boards of ACM Transactions on Graphics and IEEE Spectrum. He is a Fellow of the IEEE.
Abstract: In this talk, I will introduce our latest research in image scene understanding and interactive technologies. Our first line of research aims at rapid image scene understanding based on visual attention mechanisms (IEEE TPAMI 2015, IEEE CVPR 2014 Oral). Rather than focusing on specific algorithm design, I would like to highlight how these algorithms can be used robustly in various applications, including image composition, photo montage, image retrieval, object detection, semantic segmentation, and even deep learning. Our second line of research aims at intelligent image manipulation. We explore smart image manipulation techniques that make it easy to obtain annotated data during users' natural interaction with the real world (ACM TOG 2014, ACM TOG 2015). This work is partially motivated by the growing need in scene understanding for high-quality labeled training data, which is expensive to collect.
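Speaker: Associate Professor and doctoral supervisor at the College of Computer and Control Engineering, Nankai University, selected for the China Association for Science and Technology "Young Elite Scientists Sponsorship Program" (托举计划) and the Tianjin "Young Thousand Talents Program". His main research interests include computer graphics, computer vision, and image processing. He has published more than 20 papers in top international journals and conferences, including IEEE TPAMI, ACM TOG, ACM SIGGRAPH, IEEE CVPR, and IEEE ICCV. His work has been widely recognized by peers in China and abroad. His papers have been cited by others more than 3,000 times, with his most-cited first-author paper cited more than 1,000 times. His results have been reported by well-known international media, including the Daily Mail (UK), the BBC (UK), Der Spiegel (Germany), and The Huffington Post (US). More information: http://mmcheng.net/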
Abstract: Despite extensive research over the past decades, image restoration remains a challenging problem due to its ill-posed nature. This talk will first briefly introduce the challenges of image restoration problems, and then present our recent advances in image restoration, focusing on structured sparse priors of natural images built with the tools of statistical modeling (i.e., parametric sparse distributions). In particular, we will show that the learned shallow sparse model can achieve image restoration performance competitive with, and even better than, deep learning based methods in the context of image super-resolution.
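Speaker: Weisheng Dong (董伟生), Associate Professor and doctoral supervisor at the School of Electronic Engineering, Xidian University. Received a bachelor's degree from the Department of Electronics and Information Engineering, Huazhong University of Science and Technology, in 2004 and a Ph.D. from Xidian University in 2010; has been a visiting researcher at The Hong Kong Polytechnic University and Microsoft Research Asia. Main research areas: sparse image representation and computer vision. Has published more than 40 papers, including more than 10 as first author in IJCV, TIP, CVPR, and ICCV; 2 papers were selected as ESI top 0.1% highly cited papers. The papers have been cited more than 1,700 times, with the most-cited paper cited more than 440 times. Honors include a Best Paper Award at the IEEE VCIP international conference, the First Prize of the Shaanxi Province Science and Technology Award, and the title of Shaanxi Province Young Science and Technology Star. Currently serves on the editorial boards of 3 international journals, including IEEE Transactions on Image Processing. More information: http://see.xidian.edu.cn/faculty/wsdong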
Abstract: Humans are remarkably adept at recognizing the motion of biological entities in complex visual scenes. Biological motion can be captured with a handful of point lights attached to the head and major joints of the body, and can be further decomposed into two components: global configuration and local motion. Whereas most previous studies have emphasized the contribution of global form to biological motion perception, our recent work indicates that local motion carries unique biological properties that can be processed independently of global configuration. Taken together, these findings suggest that biological motion perception is a multilevel process encompassing the processing of both global configuration and local motion.
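Speaker: Yi Jiang (蒋毅), Researcher and doctoral supervisor at the Institute of Psychology, Chinese Academy of Sciences; selected for the Chinese Academy of Sciences "Hundred Talents Program" and for the first batch of Young Top-Notch Talents under the national "Ten Thousand Talents Program"; recipient of the National Science Fund for Distinguished Young Scholars. Main research areas: using psychophysical paradigms combined with functional brain imaging to study the neural mechanisms of unconscious visual information processing, and the specialized visual perceptual representations of faces and biological motion.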
Abstract: Domain adaptation, or transfer learning more generally, aims at transferring knowledge from a mature domain to a different but related new domain. In this talk, I will first give an overview of the progress of domain adaptation, and then introduce several of our works on domain adaptation for face recognition. These works address the domain adaptation problem mainly at the instance level; in particular, they attempt to re-weight or shift the source-domain samples so that they share the same distribution as the target domain. Finally, evaluations on several scenarios, such as domain adaptation across datasets, race, pose, and lighting, will be presented, followed by a summary.
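As a rough illustration of the instance re-weighting idea mentioned above, the textbook importance-weighting identity is sketched below. This is a generic formulation, not necessarily the speaker's method. The symbols $p_S$, $p_T$ (source and target densities), $w$, $f$, and $\ell$ are notation assumed here, and the identity assumes covariate shift, i.e. $p_T(y \mid x) = p_S(y \mid x)$, with $p_S(x) > 0$ wherever $p_T(x) > 0$.

```latex
\mathbb{E}_{(x,y)\sim p_T}\bigl[\ell(f(x), y)\bigr]
  = \mathbb{E}_{(x,y)\sim p_S}\bigl[w(x)\,\ell(f(x), y)\bigr],
\qquad
w(x) = \frac{p_T(x)}{p_S(x)}
```

Under these assumptions, training on source samples weighted by $w(x)$ targets the expected loss on the target domain.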
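Speaker: Meina Kan (阚美娜), Ph.D., graduated from the Institute of Computing Technology, Chinese Academy of Sciences, where she is now an Associate Researcher. Conducted research at Nanyang Technological University, Singapore, in 2011. Received the CCF Outstanding Doctoral Dissertation Award and the Chinese Academy of Sciences Outstanding Doctoral Dissertation Award in 2014. Research areas are computer vision and pattern recognition, with a focus on face recognition, multi-view learning, semi-supervised learning, transfer learning, and deep learning; related results have been published in mainstream international journals and conferences such as TPAMI, IJCV, CVPR, and ICCV. Currently a reviewer for several journals, including TPAMI, IJCV, TIP, TMM, TSMC, and TNN. More information: http://vipl.ict.ac.cn/members/mnkan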
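Abstract: Tracking objects in images is an important step in many computer vision systems, and vision-based tracking has therefore long been one of the core topics in computer vision. Drawing on our experience in the area, this talk gives an overall introduction to visual tracking and discusses several subtopics, including model-free visual tracking, multi-target tracking, and tracking algorithms for augmented reality. For each topic, we will cover the basic background, our team's research results, and our views on its future development in light of its applications.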
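Speaker: Dr. Haibin Ling (凌海滨) graduated from Peking University and the University of Maryland (US), was a postdoctoral researcher at the University of California, Los Angeles, and previously worked at Microsoft Research Asia and Siemens Corporate Research. Currently an Associate Professor at Temple University (US) and a Chair Professor at South China University of Technology, as well as co-founder and chief scientist of 亮风台 Information Technology. Main research areas include computer vision, augmented reality, medical imaging, and human-computer interaction. Received the ACM UIST Best Student Paper Award in 2003 and the US National Science Foundation CAREER Award in 2014. Currently on the editorial board of the journal Pattern Recognition, and an Area Chair of CVPR 2014 and CVPR 2016. More information: http://www.dabi.temple.edu/~hbling/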
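Abstract: Research topics in machine learning are defined according to the characteristics of data: for example, the multi-subspace structure of data corresponds to subspace clustering, heavy noise to data recovery, large scale to large-scale learning, real-time arrival to online learning, and so on. These characteristics, however, do not exist in isolation; they occur together. Yet most existing methods consider only one of them and therefore cannot handle real data well. We propose the Low-Rank Representation (LRR) model, which accounts for the multi-subspace structure and the heavy noise of data at the same time and unifies subspace clustering and data recovery in a single low-rank learning framework. We analyze the statistical properties of LRR theoretically, propose a fast optimization method for the optimization problem in LRR, establish theory and methods for dictionary learning in LRR, and extend LRR to applied problems such as face recognition, image segmentation, and image deblurring.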
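For orientation, the LRR objective as it is commonly written in the literature is sketched below. The abstract itself does not state a formula, so the notation is an assumption: $X$ is the data matrix with samples as columns, $Z$ the representation matrix, $E$ the error term, $\lambda > 0$ a trade-off parameter, $\|\cdot\|_{*}$ the nuclear norm, and $\|\cdot\|_{2,1}$ the sum of column-wise $\ell_2$ norms. The talk may use variants of this form.

```latex
\min_{Z,\,E}\; \|Z\|_{*} + \lambda\,\|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E
```

In this form the low-rank term on $Z$ captures the multi-subspace structure, while the column-sparse term on $E$ absorbs sample-specific corruption. Replacing $XZ$ with $AZ$ for a separate dictionary $A$ gives the more general form, which is presumably where the dictionary learning mentioned in the abstract enters.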
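Speaker: Guangcan Liu (刘光灿) received a B.S. degree from the Department of Mathematics, Shanghai Jiao Tong University, in 2004 and a Ph.D. in engineering from the Department of Computer Science and Technology, Shanghai Jiao Tong University, in 2010. From 2010 to 2014, worked as a postdoctoral researcher successively at the National University of Singapore, the University of Illinois at Urbana-Champaign, and Cornell University. Returned to China in 2014 and joined the School of Information and Control, Nanjing University of Information Science and Technology, as a Professor. Main research areas are machine learning and computer vision, with broad recent work on the theory, methods, and applications of low-rank learning. Has published more than 30 papers, including 4 first-author regular papers in T-PAMI. Received the Shanghai Outstanding Graduate Research Achievement Award in 2015. The papers have more than 2,000 citations in total on Google Scholar.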
Abstract: Self-paced learning (SPL) is a recently proposed learning regime, inspired by the learning process of humans and animals, that gradually incorporates samples into training from easy to more complex. While several simple SPL implementation strategies have been proposed, a general paradigm for guiding the construction of rational SPL regimes targeting specific applications is still lacking. To resolve this problem, we provide an axiom that formulates the underlying principles of self-paced learning. This axiomatic understanding not only includes the previous SPL schemes as special cases, but can also be used to derive a series of new SPL implementation regimes for particular application aims. Over the past two years, we have constructed several SPL realizations based on this axiom, including SPaR, SPLD, SPCL, and SPMF, and achieved the best performance on several well-known benchmark datasets, e.g., Web Query, Hollywood2, and Olympic Sports. Notably, this paradigm has been integrated into the system developed by the CMU Informedia team, which achieved leading performance in the challenging semantic query (SQ)/000Ex tasks of the TRECVID MED/MER competition organized by NIST in 2014.
In this talk, I'll introduce some of our recent developments in understanding the SPL regime. We will use these results to explain the intrinsic reason why SPL works in applications with highly noisy data.
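To make the easy-to-hard idea concrete, here is a minimal toy sketch of the classic hard-threshold form of self-paced learning, in which each sample gets weight 1 if its current loss is below an "age" parameter and 0 otherwise, and the threshold grows over rounds. It is not SPaR, SPLD, SPCL, or SPMF, and it does not reproduce the axiom from the talk. The function name `self_paced_mean`, its parameters, and the toy task (estimating a location parameter under squared loss) are all illustrative assumptions.

```python
def self_paced_mean(xs, lam=1.0, growth=1.5, rounds=5):
    """Toy self-paced estimate of a location parameter (hypothetical helper).

    Alternates between (1) selecting "easy" samples whose squared loss is
    below the threshold `lam` and (2) refitting the mean on the selection,
    while `lam` grows so harder samples may be admitted later.
    """
    # Start from the (upper) median so the first "easy" set is sensible.
    mu = sorted(xs)[len(xs) // 2]
    weights = [1.0] * len(xs)
    for _ in range(rounds):
        losses = [(x - mu) ** 2 for x in xs]
        # Hard-threshold weight update: keep a sample only if its loss < lam.
        weights = [1.0 if loss < lam else 0.0 for loss in losses]
        selected = [x for x, v in zip(xs, weights) if v > 0]
        if selected:
            # Model update: minimize the weighted squared loss.
            mu = sum(selected) / len(selected)
        # Raise the threshold for the next round.
        lam *= growth
    return mu, weights


if __name__ == "__main__":
    # Five clean samples near 1.0 and one gross outlier.
    mu, weights = self_paced_mean([1.0, 1.2, 0.8, 1.1, 0.9, 50.0])
    print(mu, weights)
```

In this toy run the outlier's loss stays far above the threshold in every round, so it never receives a nonzero weight. That is one intuitive reading of why a self-paced schedule can help with heavily noisy data.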
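Speaker: Deyu Meng (孟德宇), Ph.D., Associate Professor and doctoral supervisor at the School of Mathematics and Statistics, Xi'an Jiaotong University. Has published multiple papers in international journals such as TIP, TKDE, and Neural Computation and in top computer science conferences such as CVPR, ICCV, ICML, and NIPS. Has served as a program committee member for several CCF class A conferences and as a senior program committee member for AAAI 2016. Current research focuses on machine learning, data mining, computer vision, and multimedia analysis.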
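Abstract: This talk reviews the history of video action analysis and describes in detail the currently popular algorithms, datasets, and evaluation metrics, covering feature extraction, feature fusion, feature selection, and classifier design. In particular, it details our latest progress in action recognition, including an action recognition method based on highly discriminative mid-level semantic dictionaries, an adaptive motion feature pooling method, and the latest applications and results of deep neural networks and deep recurrent neural networks in action recognition.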
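Speaker: Bingbing Ni (倪冰冰) is a Distinguished Research Fellow (特别研究员) and doctoral supervisor in the Department of Electronic Engineering, Shanghai Jiao Tong University, selected in the 11th batch of the Young Thousand Talents Program of the Organization Department of the CPC Central Committee. From 2010 to 2015, served as a research scientist at the University of Illinois at Urbana-Champaign, Advanced Digital Science Center Singapore. Received a bachelor's degree from the Department of Electronic Engineering, Shanghai Jiao Tong University, in 2005 and a Ph.D. from the Department of Electrical and Computer Engineering, National University of Singapore, in 2011. During the Ph.D., interned at Microsoft Research Asia and at Google's US headquarters. Main research areas are computer vision, multimedia computing, and machine learning. Has published 45 papers in major international journals and conferences, including 13 in journals/conferences rated class A by the China Computer Federation (CCF), e.g., International Journal of Computer Vision, IEEE Trans. Image Processing, IEEE Trans. Knowledge and Data Engineering, ICCV, CVPR, and ACM Multimedia (Oral); 12 of these were published as first or corresponding author. Dr. Ni was named by IBM T. J. Watson Research Center as one of the ten Emerging Leaders in Multimedia and Signal Processing worldwide for 2010, and has received best paper awards at PREMIA2008 and PCM2011, first place in the ICPR2012 HARL competition, second place in the ECCV ChaLearn 2014 competition, and first place in the THUMOS2015 Action Detection Track.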
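Abstract: Progress in dissecting the brain's local neural networks has recently been rapid. I would like to point out two related questions that have not yet received wide attention in the neural computation community. The first is how feedback inputs modulate local networks. We propose the attentional neural network framework, a new framework that integrates top-down cognitive bias and bottom-up feature extraction in one unified architecture. The top-down influence is especially effective when dealing with high background noise or difficult segmentation problems. Our system is modular and extensible. The framework is easy to train and cheap to run, yet can accommodate complex behaviors. On MNIST variation datasets, we obtain classification accuracy better than or on par with the best existing results, and can successfully separate overlapping digits. The second is that the anatomical and physiological structure of local circuits follows lognormal distributions in several respects; I will discuss the computational implications.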
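Speaker: Researcher and doctoral supervisor, Ph.D., at the Center for Brain-Inspired Computing Research and the Department of Biomedical Engineering, Tsinghua University. Received a Ph.D. from Brandeis University (US) in 2002 and completed postdoctoral research at Cold Spring Harbor Laboratory and the Massachusetts Institute of Technology. Main research areas are brain-inspired computing, computational neuroscience, and neural circuits. Has published more than 20 papers, including in leading journals such as Nature Neuroscience, Neuron, PLOS Biology, and PNAS. Among them, the paper published in Nature Neuroscience opened up the field of theoretical work on spike-timing-dependent plasticity (STDP); it has been cited more than 1,500 times and is an important plasticity algorithm in brain-inspired computing.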
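Abstract: The boundary between the real physical world and the virtual world is gradually fading. In the past, we could only glimpse the virtual world through a computer monitor or a phone screen. Head-mounted displays and augmented reality are beginning to make the virtual world ubiquitous and interwoven with the real physical world. With these technologies, virtual objects are presented seamlessly in the physical world, and people can interact with them as they would with real objects. Robots, for their part, always walk the boundary between the virtual and the real: a robot must continually scan and model the real world to obtain a virtual model of it, and then optimize paths and control within that virtual model. This talk describes key 3D vision technologies for fusing the physical and virtual worlds. It covers the latest results in 3D modeling, camera motion estimation, and human skeletal motion estimation. We will first introduce new progress in global structure-from-motion (SfM), and then present, one by one, applications of SfM to 3D reconstruction, robot visual navigation, human motion capture, and video and image processing.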
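Speaker: Ping Tan (谭平) is an Associate Professor in the School of Computing Science at Simon Fraser University, Canada. Before that, he was an Associate Professor at the National University of Singapore. He received his Ph.D. from the Hong Kong University of Science and Technology in 2007, and his bachelor's and master's degrees from Shanghai Jiao Tong University in 2000 and 2003, respectively. His research areas are computer vision and computer graphics. In 2012 he received an honorable mention for the PAMI Young Researcher Award, as well as the TR35@Singapore award. He serves on the editorial boards of IJCV, CGF, and MVA.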
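Abstract: Vision technology plays an important role in analyzing and understanding traffic data and is key to making visions such as intelligent transportation and assisted/autonomous driving practical. This talk first gives an overview of the applications of vision technology in traffic scenes, and then focuses on the speaker's recent work on problems such as traffic sign detection and recognition.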
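Speaker: Associate Professor at the Center for Optical Imagery Analysis and Learning (a "talent special zone") at Northwestern Polytechnical University. Completed bachelor's, master's, and doctoral studies in the Department of Automation at the University of Science and Technology of China, followed by postdoctoral research at the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences. Main research areas are pattern recognition, computer vision, and their applications to traffic data analysis. Over the past three years, has published more than 40 papers in well-known international journals and received an IEEE international conference Best Paper Award and a Best Paper nomination. Has received support from the National Natural Science Foundation of China through a Young Scientists project and a General Program project, and has taken part as a key technical member in several National Key Basic Research Program (973 Program) projects, National Natural Science Foundation Major Research Plan projects, Key Program projects, and Shaanxi Province Key Science and Technology Innovation Team projects. Has also been invited to serve as Associate Editor of the international journals Neurocomputing (Elsevier) and Big Data Analytics (Bio-Med Central / Springer), has served as a program committee member of well-known international conferences more than 90 times, and is a long-term reviewer for more than 30 well-known international journals. http://crabwq.github.io/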
Abstract: Light fields are image-based representations that use densely sampled rays as a scene description. While the original goal of acquiring a light field was view synthesis and post-capture refocusing, recent studies have shown that light fields can be extremely useful in other computer vision applications, including stereo matching and 3D reconstruction, stereoscopy synthesis, saliency detection, surveillance, and recognition. In this talk, I present a light field approach for virtual reality (VR) and augmented reality (AR).
I first give a brief history of VR/AR and my personal experience with the evolution of their core technologies: acquisition, processing, and display. I then discuss the limitations of classical approaches and show why light fields provide a viable path. On the acquisition front, I present several light field capture systems, developed by our group and others, that can produce realistic 3D VR environments beyond regular 360 panoramas. On the processing front, I present several recent 3D reconstruction algorithms that exploit the ray geometric structure and sampling patterns of light fields. On the display front, I show how light field head-mounted displays (LF-HMD) can provide unprecedented refocusing capability, analogous to that of human eyes, to significantly enhance visual realism. I will conclude by discussing the challenges and opportunities of light field based VR/AR approaches.
Speaker: Jingyi Yu (虞晶怡) is a Distinguished Full Professor in the School of Information Science and Technology at ShanghaiTech University and Director of its Visual Computing and Virtual Reality Center. He received his B.S. in applied mathematics and computer science from Caltech in 2000 and his Ph.D. in electrical engineering and computer science from MIT in 2005. Before joining ShanghaiTech, he was a full professor at the University of Delaware. His research interests span a range of topics in computer vision and computer graphics, including computational photography, video surveillance, and non-conventional optics and camera designs. He has published over 100 papers at highly refereed conferences and journals, including over 50 papers at the premier venues CVPR/ICCV/ECCV/TPAMI. He has been granted 10 US patents on computational imaging. His research has been generously supported by the National Science Foundation (NSF), the National Institutes of Health (NIH), the Army Research Office (ARO), and the Air Force Office of Scientific Research (AFOSR). He is a recipient of the NSF CAREER Award (2008), the AFOSR YIP Award (2009), and the Outstanding Junior Faculty Award at the University of Delaware. He has previously served as general chair, program chair, and area chair of many international conferences such as ICCV, ICCP, and NIPS. He is currently an Associate Editor of IEEE TPAMI, Elsevier CVIU, Springer TVCJ, and Springer MVA.
Abstract: Motivated by previous success in mining structured data (e.g., transaction data) and semi-structured data (e.g., text), we have become curious about finding meaningful patterns in non-structured visual data such as images and videos. Although the discovery of visual patterns appears quite exciting, data mining techniques that are successful on business and text data cannot simply be applied to image and video data, which are usually described by high-dimensional features and exhibit spatial or spatio-temporal structures. Unlike transaction and text data, which are composed of discrete elements without much ambiguity (i.e., predefined items and vocabularies), visual patterns generally exhibit large variations in visual appearance and structure, thus challenging existing data mining and pattern discovery techniques. This talk will discuss our recent work on discovering and searching visual patterns in image and video data, as well as their applications in image search, video surveillance, and robotics.
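Speaker: Associate Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, and Program Director of Video Analytics. Main research areas include computer vision, video analytics, large-scale visual data retrieval and mining, and human-computer interaction. Received a bachelor's degree from Huazhong University of Science and Technology in 2002 and a master's degree from the National University of Singapore in 2005. Received a Ph.D. from Northwestern University (US) in 2009, along with the Outstanding Ph.D. Thesis Award of the Department of Electrical Engineering and Computer Science, the IEEE Conf. on CVPR'09 Doctoral Spotlight Award, and the Nanyang Assistant Professorship of Nanyang Technological University. Currently an Associate Editor of IEEE Trans. on Image Processing, IEEE Trans. on Circuits and Systems for Video Technology, and The Visual Computer, and has served as co-chair and area chair of multiple international conferences and workshops. More information: http://www.ntu.edu.sg/home/jsyuan/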
Abstract: In this talk, I will first give a brief introduction to online learning, including full-information online learning and bandit online learning. Then, I will study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has many applications, including online advertising and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt the online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of O(d√T), which matches the optimal result for stochastic linear bandits.
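One way to write down the setting described above is sketched here. The notation is assumed, not taken from the abstract: $\mathcal{D} \subset \mathbb{R}^d$ is the decision set, $x_t \in \mathcal{D}$ the action at round $t$, $y_t \in \{-1, +1\}$ the one-bit feedback, and $w_*$ the unknown parameter of the linear function. The exact definitions in the talk may differ.

```latex
% One-bit feedback drawn from the logit model
\Pr\bigl[y_t = y \mid x_t\bigr] = \frac{1}{1 + \exp\bigl(-y\, x_t^{\top} w_*\bigr)},
\qquad y \in \{-1, +1\}

% Regret with respect to the unknown linear function
R_T = T \max_{x \in \mathcal{D}} x^{\top} w_* \;-\; \sum_{t=1}^{T} x_t^{\top} w_*
```

With this reading, the claim in the abstract is that $R_T = O(d\sqrt{T})$, where $d$ is the dimension and $T$ the number of rounds.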
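Speaker: Lijun Zhang (张利军), Ph.D., Associate Professor. Received bachelor's and doctoral degrees in engineering from Zhejiang University in June 2007 and June 2012, respectively. Visited Michigan State University (US) as a visiting student from June to December 2011 and as a postdoctoral researcher from August 2012 to April 2014. Joined the Department of Computer Science and Technology, Nanjing University, in April 2014. Main research area is large-scale machine learning and optimization, with more than 40 papers published in international conferences and journals, including the top venues ICML, NIPS, COLT, AAAI, ACM MM, AISTATS, TPAMI, TIT, TIP, and TKDE. Honors include the Zhejiang University "Chu Kochen Scholarship", the Nanjing University "Dengfeng Talent Support Program", and the Best Paper award of the 26th AAAI Conference on Artificial Intelligence.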
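Abstract: The retina is the input stage of the visual nervous system. It not only converts light signals into electrical signals that the nervous system can understand and process, but also performs a considerable amount of classification and preprocessing of visual information, thereby speeding up the brain's response to visual input. The mouse retina contains more than 20 types of neural circuits that extract different visual features from images of the outside world; the visual information processed by these circuits is relayed to the brain by different subtypes of retinal ganglion cells (RGCs). The visual information output by these retinal circuits is gradually being decoded, and our understanding of their circuit mechanisms is steadily improving. I will give a brief overview of recent progress in this area and report our lab's results on the function of two types of retinal ganglion cells.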
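Speaker: Yifeng Zhang (张翼凤), born in 1972, Ph.D., is a Researcher at the Institute of Neuroscience, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. Graduated from the Department of Biochemistry and Molecular Biology, College of Life Sciences, Peking University, in 1994; received a master's degree in biochemistry and molecular biology from Peking University in 1997 and a Ph.D. in biomedical sciences from the University of California, San Diego, in 2004. Conducted postdoctoral research at the Center for Brain Science, Harvard University, from 2004 to 2011, and has been a Researcher and doctoral supervisor at the Institute of Neuroscience, Chinese Academy of Sciences, since 2011. Mainly uses electrophysiology and molecular biology methods to study the function of retinal neural circuits.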