20171018-25: VALSE ICCV2017 Special Session 3

2017-10-14 19:01 | Posted by: 程一-计算所

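The VALSE ICCV2017 special sessions are here: ICCV2017, the biennial computer vision event, is about to begin. To better promote academic exchange, VALSE Webinar will hold three consecutive ICCV Pre-Conference sessions presenting the latest ICCV2017 papers ahead of the conference.

The third session, on October 18, features five talks: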

Speaker 1: Sijia Cai (蔡思佳, Hong Kong Polytechnic University)

Time: 20:00, Wednesday, October 18, 2017 (Beijing time)

Title: Higher-order Integration of Hierarchical Convolutional Activations for Fine-grained Visual Categorization

Host: Risheng Liu (刘日升, Dalian University of Technology)

Abstract:

The success of fine-grained visual categorization (FGVC) relies heavily on modelling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial-kernel-based predictor is proposed to capture higher-order statistics of convolutional activations for modelling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective for combining convolutional activations from multiple layers. While hyper-columns simply concatenate maps from different layers, and the holistically-nested network uses weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields a more discriminative representation and achieves competitive results on the widely used FGVC datasets.
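To make "higher-order statistics" concrete: a homogeneous polynomial kernel of degree 2 between two local descriptors equals the inner product of their second-order (outer-product) features. The sketch below only checks this identity on made-up descriptors. The function names and values are hypothetical, and this is not the predictor from the paper, whose details the abstract does not give.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def outer_features(x):
    """Flattened outer product of x with itself: all pairwise channel products."""
    return [a * b for a in x for b in x]


def poly_kernel(x, y, degree=2):
    """Homogeneous polynomial kernel <x, y> ** degree."""
    return dot(x, y) ** degree


# Two hypothetical 3-channel activations taken at single spatial positions.
x = [1.0, 2.0, -1.0]
y = [0.5, -1.0, 3.0]

kernel_value = poly_kernel(x, y, degree=2)
explicit_value = dot(outer_features(x), outer_features(y))
```

Both values agree, so a degree-2 kernel captures pairwise channel interactions without building the outer-product features explicitly.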

References:

[1] Sijia Cai, Wangmeng Zuo and Lei Zhang, "Higher-order Integration of Hierarchical Convolutional Activations for Fine-grained Visual Categorization", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.

Speaker bio:

Sijia Cai received his B.S. and M.S. degrees from Tianjin University in 2011 and 2014, respectively. He is currently a Ph.D. candidate in Prof. Lei Zhang’s group at the Hong Kong Polytechnic University. His research interests include optimization methods and machine learning algorithms for computer vision applications.

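Speaker 2: Xudong Mao (毛旭东, City University of Hong Kong)

Time: 20:25, Wednesday, October 18, 2017 (Beijing time)

Title: Least Squares Generative Adversarial Networks

Host: Risheng Liu (刘日升, Dalian University of Technology)

Abstract: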

Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome this problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN amounts to minimizing the Pearson χ^2 divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs are more stable during the learning process. We evaluate LSGANs on the LSUN and CIFAR-10 datasets, and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.
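The vanishing-gradient argument can be illustrated numerically. The snippet below is a minimal sketch, not the authors' implementation: the target codes `a`, `b`, `c` (set here to 0, 1, 1), the function names, and the example scores are assumptions made for illustration. It compares the gradient a generated sample receives through the generator objective under the sigmoid cross entropy loss and under a least squares loss, as a function of the raw discriminator score `s`.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sce_generator_grad(s):
    """Derivative w.r.t. s of -log(sigmoid(s)), a common generator loss in regular GANs."""
    return sigmoid(s) - 1.0


def ls_generator_grad(s, c=1.0):
    """Derivative w.r.t. s of 0.5 * (s - c) ** 2, with c an assumed target code."""
    return s - c


def ls_discriminator_loss(real_scores, fake_scores, a=0.0, b=1.0):
    """Least squares discriminator loss with assumed codes: b for real, a for fake."""
    real_term = sum((s - b) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - a) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * real_term + 0.5 * fake_term


# A generated sample that the discriminator already scores far on the "real" side.
far_score = 8.0
grad_sce = sce_generator_grad(far_score)  # magnitude close to zero
grad_ls = ls_generator_grad(far_score)    # still pulls the score back toward c
```

With the cross entropy loss the gradient for such a sample is almost zero, whereas the least squares loss keeps penalizing the distance to the target code, which is the intuition the abstract gives for more stable training.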

References:

[1] Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley, "Least Squares Generative Adversarial Networks", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.

Speaker bio:

Xudong Mao received his BEng degree from Nankai University in 2011 and his MPhil degree from City University of Hong Kong in 2014. He is currently a PhD student at City University of Hong Kong, advised by Prof. Qing Li. During 2014-2016, he worked as a senior algorithm engineer at the Institute of Data Science and Technology (iDST) of Alibaba. His research interests are in the areas of computer vision and deep learning, especially generative adversarial networks and unsupervised learning.

Speaker 3: Wei Wei (魏玮, Xi’an Jiaotong University)

Time: 20:50, Wednesday, October 18, 2017 (Beijing time)

Title: Should We Encode Rain Streaks in Video as Deterministic or Stochastic?

Host: Risheng Liu (刘日升, Dalian University of Technology)

Abstract:

Videos taken in the wild sometimes contain unexpected rain streaks, which make subsequent video processing tasks more difficult. Rain streak removal in a video (RSRV) is thus an important issue and has been attracting much attention in computer vision. Unlike previous RSRV methods, which formulate rain streaks as deterministic information, this work is the first to encode rain in a stochastic manner, i.e., as a patch-based mixture of Gaussians. This modification makes the proposed model capable of finely adapting to a wider range of rain variations, rather than only the specific types of rain configurations assumed by traditional methods. By integrating this with the spatiotemporal smoothness of moving objects and the low-rank structure of the background scene, we propose a concise model for RSRV, containing one likelihood term imposed on the rain streak layer and two prior terms on the moving object and background scene layers of the video. Experiments on videos with synthetic and real rain verify the superiority of the proposed method over state-of-the-art methods, both visually and quantitatively in various performance metrics.
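As a rough illustration of what a likelihood term under a patch-based mixture of Gaussians looks like, the sketch below evaluates the log-likelihood of one flattened patch. It is a simplified assumption-laden example, not the model from the paper: each component uses an isotropic covariance, and the function names, weights, means, and variances are hypothetical.

```python
import math


def log_gaussian_isotropic(patch, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at a flattened patch."""
    dim = len(patch)
    sq_dist = sum((p - m) ** 2 for p, m in zip(patch, mean))
    return -0.5 * (dim * math.log(2.0 * math.pi * var) + sq_dist / var)


def mog_log_likelihood(patch, weights, means, variances):
    """Log-likelihood of one patch under a mixture of isotropic Gaussians."""
    logs = [
        math.log(w) + log_gaussian_isotropic(patch, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))


# Hypothetical two-component model over 2-pixel patches of the rain layer:
# one component for "no rain" (near 0) and one for "streak" (near 1).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [0.01, 0.01]

ll_streak = mog_log_likelihood([1.0, 1.0], weights, means, variances)
ll_between = mog_log_likelihood([0.5, 0.5], weights, means, variances)
```

A patch close to one of the component means scores a higher log-likelihood than a patch lying between them, which is how a mixture can adapt to several rain patterns at once.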

References:

[1] Wei Wei, Lixuan Yi, Qi Xie, Qian Zhao, Deyu Meng, Zongben Xu, "Should We Encode Rain Streaks in Video as Deterministic or Stochastic?", in IEEE International Conference on Computer Vision, Venice, Italy, 2017.

Speaker bio:

Wei Wei obtained his B.S. degree from the Mathematics Elite Class, School of Mathematics and Statistics, Xi’an Jiaotong University, in 2015. He is currently a master's student majoring in Statistics at the School of Mathematics and Statistics, Xi’an Jiaotong University, supervised by Professor Zongben Xu. His research interests include computer vision and machine learning. He works in the Machine Learning Group led by Professor Deyu Meng, especially in the area of noise modelling.

Speaker 4: Jiangtao Xie (谢江涛, Dalian University of Technology)

Time: 21:15, Wednesday, October 18, 2017 (Beijing time)

Title: Is Second-order Information Helpful for Large-scale Visual Recognition?

Host: Risheng Liu (刘日升, Dalian University of Technology)

Abstract:

By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing the full potential of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method applies covariance pooling, instead of the most commonly used first-order pooling, to high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and use of the manifold structure of covariance matrices. To address these challenges, we present a Matrix Power Normalized Covariance (MPN-COV) method. We develop forward and backward propagation formulas regarding the nonlinear matrix functions such that MPN-COV can be trained end-to-end. In addition, we analyze both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric. On the ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and 2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152. The source code will be available on the project page: http://www.peihuali.org/MPN-COV.
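Covariance pooling followed by matrix power normalization can be sketched in a few lines. The following is an illustrative toy, not the released MPN-COV code: it is restricted to two-channel features so that the eigendecomposition has a closed form, and the function names, the 1/N covariance normalization, and the exponent 0.5 are assumptions made for the example. It covers the forward computation only.

```python
import math


def covariance(features):
    """Covariance of N two-channel feature vectors, normalized by 1/N (assumed)."""
    n = len(features)
    mean = [sum(f[k] for f in features) / n for k in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for f in features:
        d = [f[0] - mean[0], f[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                cov[i][j] += d[i] * d[j] / n
    return cov


def matrix_power_2x2(sym, alpha=0.5):
    """Power of a symmetric PSD 2x2 matrix via its closed-form eigendecomposition."""
    p, q, r = sym[0][0], sym[0][1], sym[1][1]
    half_gap = math.hypot((p - r) / 2.0, q)
    lam1 = max((p + r) / 2.0 + half_gap, 0.0)  # larger eigenvalue
    lam2 = max((p + r) / 2.0 - half_gap, 0.0)  # smaller eigenvalue, clamped at 0
    theta = 0.5 * math.atan2(2.0 * q, p - r)   # angle of the leading eigenvector
    c, s = math.cos(theta), math.sin(theta)
    l1, l2 = lam1 ** alpha, lam2 ** alpha
    off_diag = (l1 - l2) * c * s
    return [
        [l1 * c * c + l2 * s * s, off_diag],
        [off_diag, l1 * s * s + l2 * c * c],
    ]


def mpn_cov_2ch(features, alpha=0.5):
    """Covariance pooling followed by matrix power normalization (toy, 2 channels)."""
    return matrix_power_2x2(covariance(features), alpha)


# Hypothetical activations: four spatial positions, two channels each.
feats = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [0.0, 0.0]]
pooled = mpn_cov_2ch(feats, alpha=0.5)
```

With `alpha=0.5` the result is the matrix square root of the covariance, so multiplying `pooled` by itself recovers the original covariance matrix. For realistic channel counts one would rely on a linear algebra library for the eigendecomposition.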

References:

[1] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? IEEE Int. Conf. on Computer Vision (ICCV), pp. 2070-2078, 2017.

[2] Qilong Wang, Peihua Li, Wangmeng Zuo, Lei Zhang. RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4433-4441, 2016.

Speaker bio:

Jiangtao Xie is a fourth-year undergraduate in the Electronic Information Innovation Experimental Class of Dalian University of Technology. As a key member of the DLUT_VLG team, he achieved 5/50 in the iNaturalist Challenge at Fine-Grained Visual Categorization (FGVC) 2017, held in conjunction with CVPR2017. His research interests include computer vision and deep learning.

Speaker 5: Pingping Zhang (张平平, Dalian University of Technology)

Time: 21:40, Wednesday, October 18, 2017 (Beijing time)

Title: Diving into Deep Features for Saliency Detection

Host: Risheng Liu (刘日升, Dalian University of Technology)

Abstract:

As a preprocessing step in computer vision, saliency detection has shown great success in applications. However, salient object detection remains an unsolved problem because a large variety of factors contribute to defining visual saliency, and it is hard to combine all factors or cues in an appropriate way. Based on the hierarchical nature of deep neural networks, we dive into the convolutional features of pre-trained FCN models and propose the following methods. 1) We present Amulet, a generic framework that aggregates multi-level convolutional features for salient object detection. Our framework integrates multi-level feature maps into multiple resolutions, adaptively learns to combine these feature maps, and predicts saliency maps from the combined features. Finally, the predicted results are efficiently fused to generate the final saliency map. The model provides accurate salient object labeling. 2) Considering that the object boundary strongly affects prediction accuracy, we propose a new dropout method to learn deep uncertain convolutional features (UCF), which improves the robustness and accuracy of saliency detection and can infer confident object boundaries. In addition, we present a new upsampling method to reduce the checkerboard artifacts of deconvolutions. 3) We also propose a stagewise refinement model, which integrates deep features with local context information and refines the coarse saliency maps generated in the master branch in a stagewise manner. Further, a pyramid pooling module is applied for global context aggregation. Experimental evaluations on public large-scale benchmark datasets show that our proposed methods compare favorably against state-of-the-art approaches.

References:

[1] Pingping Zhang, Dong Wang, Huchuan Lu, Hongyu Wang and Xiang Ruan, "Amulet: Aggregating Multi-Level Convolutional Features for Salient Object Detection", in IEEE International Conference on Computer Vision (ICCV, Acceptance Rate ~ 29%), Venice, Italy, 2017.

[2] Pingping Zhang, Dong Wang, Huchuan Lu, Hongyu Wang and Baocai Yin, "Learning Uncertain Convolutional Features for Accurate Saliency Detection", in IEEE International Conference on Computer Vision (ICCV, Acceptance Rate ~ 29%), Venice, Italy, 2017.

[3] Tiantian Wang, Ali Borji, Lihe Zhang, Pingping Zhang and Huchuan Lu, "A Stagewise Refinement Model for Detecting Salient Objects in Images", in IEEE International Conference on Computer Vision (ICCV, Acceptance Rate ~ 29%), Venice, Italy, 2017.

Speaker bio:

Pingping Zhang received the B.S. degree in mathematics and applied mathematics from Henan Normal University (HNU), Xinxiang, China, in 2012. He is currently pursuing a Ph.D. degree at Dalian University of Technology (DUT), Dalian, China. His research interests are in deep learning, saliency detection, object tracking and object segmentation.

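Special thanks to the main organizers of this Webinar:

VOOC Chair: Mingming Cheng (程明明, Nankai University)

VOOC Responsible Committee Member: Risheng Liu (刘日升, Dalian University of Technology)

VODB Coordinating Council Member: Deyu Meng (孟德宇, Xi’an Jiaotong University)

How to participate:

1. VALSE Webinar events are held entirely online using the "group video" feature of the VALSE QQ groups. During the event the speaker uploads slides or shares the screen; attendees can see the slides, hear the speaker, and interact with the speaker by text or voice.

2. To participate, you need to join a VALSE QQ group. Groups A, B, C, D, E, and F are currently full; except for speakers and other guests, you can only apply to join VALSE Group G, group number: 669280237. When applying, you must provide your name, affiliation, and status; all three are required. After joining, please set your display name to your real name, status, and affiliation. Status codes: university and research institute staff T; industry R&D I; Ph.D. student D; master's student M.

3. To participate, please download and install the latest Windows version of QQ. Group video does not support non-Windows systems such as Mac or Linux. Mobile QQ can play the audio but cannot display the video slides.

4. About 10 minutes before the event starts, the host will open the group video and send a link inviting members of each group to join; participants simply click the link to enter.

5. During the event, please do not send flowers, lollipops, or other virtual gifts, and do not post unrelated remarks, so as not to disrupt the event.

6. If you cannot hear the audio or see the video during the event, try leaving and re-entering; this usually resolves the problem.

7. We strongly recommend joining from a fast network, preferably over a wired connection.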
