VALSE ICCV2017 special sessions are here: the biennial vision feast, ICCV2017, is about to begin. To better promote academic exchange, the VALSE Webinar will hold three consecutive ICCV pre-conference sessions, presenting the freshest ICCV2017 papers and igniting this year's ICCV enthusiasm ahead of time. The second session, on October 11, features four talks.

Speaker 1: Lingxi Xie (The Johns Hopkins University)
Time: Wednesday, October 11, 2017, 20:00 (Beijing time)
Abstract:
It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, generally exist and cause deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection, which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.
Reference:
[1] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie and Alan Yuille, "Adversarial Examples for Semantic Segmentation and Object Detection", in IEEE International Conference on Computer Vision (ICCV, acceptance rate ~29%), Venice, Italy, 2017.
Speaker bio:
Lingxi Xie obtained his B.E. and Ph.D. degrees from Tsinghua University in 2010 and 2015, respectively. He is currently a post-doctoral researcher at The Johns Hopkins University, having moved there from the University of California, Los Angeles. From 2013 to 2015, he was a research intern at Microsoft Research Asia, and in 2014 he was a visiting researcher at the University of Texas at San Antonio. Lingxi works on computer vision and multimedia information retrieval, especially image classification, image retrieval, and object detection. He is also interested in the theory and application of deep learning. He received the Best Paper Award at ICMR 2015.
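The core of DAG, as the abstract describes it, is to aggregate a misclassification loss over all targets (pixels or proposals) that are still correctly classified, and to accumulate normalized gradient steps until (almost) every target is fooled. Below is a minimal PyTorch sketch of this idea for dense per-pixel prediction; `model`, the label tensors, and details such as the step normalization and stopping criterion are illustrative assumptions in the spirit of the paper, not the authors' exact implementation.

```python
import torch

def dag_attack(model, image, labels, adv_labels, max_iter=200, gamma=0.5):
    """Sketch of Dense Adversary Generation (DAG) for a per-pixel
    classifier. `model` maps an image [1,3,H,W] to logits [1,C,H,W];
    `labels`/`adv_labels` are [H,W] long tensors holding the true and
    the chosen adversarial classes. All names are illustrative."""
    x = image.clone()
    perturbation = torch.zeros_like(image)
    for _ in range(max_iter):
        x = x.detach().requires_grad_(True)
        logits = model(x)                       # [1, C, H, W]
        pred = logits.argmax(dim=1)[0]          # [H, W]
        # Active set: pixels still predicted as their true class.
        active = pred.eq(labels)
        if not active.any():                    # every target fooled
            break
        scores = logits[0].permute(1, 2, 0)[active]          # [N, C]
        score_true = scores.gather(1, labels[active].unsqueeze(1))
        score_adv = scores.gather(1, adv_labels[active].unsqueeze(1))
        # Push adversarial-class scores up and true-class scores down,
        # summed over the active targets only.
        loss = (score_adv - score_true).sum()
        grad, = torch.autograd.grad(loss, x)
        # Normalized step, accumulated into the total perturbation.
        step = gamma * grad / grad.abs().max().clamp_min(1e-12)
        perturbation = perturbation + step
        x = image + perturbation
    return x.detach(), perturbation
```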
Speaker 2: Hao-Shu Fang (Shanghai Jiao Tong University)
Time: Wednesday, October 11, 2017, 20:25 (Beijing time)
Abstract:
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve state-of-the-art performance on the MPII (multi-person) and COCO keypoint datasets. Our model and source code are publicly available.
Reference:
[1] Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu, "RMPE: Regional Multi-person Pose Estimation", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.
Speaker bio:
Hao-Shu Fang is an undergraduate student in computer science at Shanghai Jiao Tong University, Shanghai, China, expected to graduate in 2018. He works in the vision research group led by Prof. Cewu Lu. His research interests include computer vision and robotics.
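Of RMPE's three components, parametric pose NMS is the easiest to convey in a few lines: poses are sorted by score, and a candidate is suppressed when its distance to an already-kept pose falls below a threshold. The sketch below is a rough illustration rather than the paper's exact criterion; the keypoint-distance-based similarity, the pose format, and the parameter values are all assumptions.

```python
import numpy as np

def pose_distance(p, q, sigma=0.3):
    """Soft distance between two poses given as [K, 3] arrays of
    (x, y, score) keypoints with coordinates normalized to [0, 1].
    Illustrative stand-in for the paper's parametric distance."""
    d = np.linalg.norm(p[:, :2] - q[:, :2], axis=1)
    # Similarity is high when corresponding joints are close and confident.
    sim = np.exp(-d ** 2 / (2 * sigma ** 2)) * np.minimum(p[:, 2], q[:, 2])
    return 1.0 - sim.mean()

def pose_nms(poses, scores, dist_thresh=0.7):
    """Greedy pose NMS: keep the best-scoring pose, drop all poses
    closer than dist_thresh to it, and repeat. `poses` is [N, K, 3]."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        dists = np.array([pose_distance(poses[j], poses[i]) for j in rest])
        order = rest[dists >= dist_thresh]
    return keep
```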
Speaker 3: Xintong Han (Shanghai Jiao Tong University)
Time: Wednesday, October 11, 2017, 20:50 (Beijing time)
Abstract:
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogLeNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector representation with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further used to cluster attributes into multiple groups to form spatially-aware concepts (e.g., the neckline concept might consist of attributes like v-neck, round-neck, etc.). Finally, we decompose the visual-semantic embedding space into multiple concept-specific subspaces, which facilitates structured browsing and attribute-feedback product retrieval by exploiting multimodal linguistic regularities. We conducted extensive experiments on our newly collected Fashion200K dataset, and results on clustering quality evaluation and the attribute-feedback product retrieval task demonstrate the effectiveness of our automatically discovered spatially-aware concepts.
Reference:
[1] Xintong Han, Zuxuan Wu, Phoenix X. Huang, Xiao Zhang, Menglong Zhu, Yuan Li, Yang Zhao, Larry S. Davis, "Automatic Spatially-aware Fashion Concept Discovery", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.
Speaker bio:
Xintong Han received the B.S. degree in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2013. He is currently pursuing a Ph.D. at the University of Maryland, College Park. His research interests include computer vision, machine learning, and multimedia.

Speaker 4: Deng-Ping Fan (Nankai University)
Time: Wednesday, October 11, 2017, 21:15 (Beijing time)
Abstract:
Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection, where the purpose is to accurately detect and segment the most salient object in a scene. Several widely used measures such as Area Under the Curve (AUC), Average Precision (AP) and the recently proposed Fbw have been used to evaluate the similarity between a non-binary saliency map (SM) and a ground-truth (GT) map. These measures are based on pixel-wise errors and often ignore the structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient, and easy-to-calculate measure known as the structural similarity measure (Structure-measure) to evaluate non-binary foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between an SM and a GT map. We demonstrate the superiority of our measure over existing ones using 5 meta-measures on 5 benchmark datasets. (A toy sketch of the measure's top-level combination appears at the end of this post.)
Reference:
[1] Deng-Ping Fan, Ming-Ming Cheng, Yun Liu, Tao Li, Ali Borji, "Structure-measure: A New Way to Evaluate Foreground Maps", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.
Speaker bio:

Special thanks to the main organizers of this Webinar:
VOOC Chair: Ming-Ming Cheng (Nankai University)
VOOC Member: Weiyao Lin (Shanghai Jiao Tong University)
VOOC Executive Member: Cewu Lu (Shanghai Jiao Tong University)
VODB Coordinating Director: Jiwen Lu (Tsinghua University)

How to participate:
1. All VALSE Webinar events are held online via the "group video" feature of the VALSE QQ groups. During the event, the speaker uploads slides or shares the screen; attendees can see the slides, hear the speaker, and interact with the speaker by text or voice.
2. To participate, you need to join a VALSE QQ group. Groups A, B, C, D, and E are currently full; apart from speakers and other invited guests, new members may only apply to join VALSE group F, group number 594312623. When applying, you must provide your name, affiliation, and status; all three are required. After joining, please set your screen name to your real name plus status and affiliation. Status codes: T for staff of universities and research institutes; I for industry R&D; D for Ph.D. students; M for master's students.
3. Please download and install the latest Windows version of QQ; group video is not supported on non-Windows systems such as Mac or Linux. Mobile QQ can play the audio but cannot show the video slides.
4. About 10 minutes before the event starts, the host will open the group video and send each group an invitation link; participants simply click it to join.
5. During the event, please do not send flowers, lollipops, or other virtual gifts, and do not post off-topic messages, so as not to disturb the event.
6. If you cannot hear the audio or see the video during the event, leaving and rejoining usually solves the problem.
7. Please join from a fast network, preferably over a wired connection.
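As promised above, here is a toy illustration of the Structure-measure from talk 4 at its top level: the final score is a convex combination S = alpha * S_object + (1 - alpha) * S_region of an object-aware and a region-aware term, with alpha = 0.5 in the paper. The two terms below are deliberately simplified stand-ins, not the paper's exact formulations (the real region-aware term, for instance, partitions the image into four GT-centered regions with an SSIM-style score per region); all function names and simplifications are mine.

```python
import numpy as np

def s_object(sm, gt):
    """Simplified object-aware term: mean saliency response inside the
    GT foreground and mean (1 - saliency) in the background, weighted
    by the foreground proportion. A stand-in, not the paper's formula."""
    fg = sm[gt > 0.5]
    bg = 1.0 - sm[gt <= 0.5]
    o_fg = fg.mean() if fg.size else 0.0
    o_bg = bg.mean() if bg.size else 0.0
    mu = (gt > 0.5).mean()                 # foreground proportion
    return mu * o_fg + (1.0 - mu) * o_bg

def s_region(sm, gt):
    """Simplified region-aware term: one SSIM-like comparison over the
    whole map instead of the paper's four GT-centered regions."""
    x, y = sm.astype(np.float64), gt.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    eps = 1e-8
    return (4 * mx * my * cov + eps) / ((mx**2 + my**2) * (vx + vy) + eps)

def structure_measure(sm, gt, alpha=0.5):
    """S = alpha * S_object + (1 - alpha) * S_region, alpha = 0.5 as in
    the paper; `sm` in [0, 1] and binary `gt` share the same shape."""
    return alpha * s_object(sm, gt) + (1.0 - alpha) * s_region(sm, gt)
```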