Multi-Modal Object Recognition and Scene Understanding Employing Computer Vision and Machine Learning Techniques (RDF16/CSDT/HAN)

Project Description

Recent advances in imaging, networking, data processing and storage technology have resulted in an explosion in the use of multi-modal image/video data in a variety of fields, including video surveillance, urban monitoring, cultural heritage protection and many others. The integration of videos/images from multiple channels can provide complementary information and therefore increase the accuracy of the overall decision-making process. Seeking an efficient way to analyse, mine and understand such large-scale, multi-modal and noisy data is currently a challenging and interesting research topic, whose core problem is learning representations from the data.
The problem of learning representations from data has received considerable attention in machine learning. Deep learning approaches in particular have achieved close to human accuracy at recognition tasks in limited domains (e.g. ImageNet image recognition). However, these approaches usually require vast, high-quality datasets in the training phase to obtain good performance, which makes their use expensive and limits them to domains where gathering large numbers of training examples with correct labels is feasible. For real-life computer vision applications such as video surveillance and ambient assisted living, such approaches seem impractical. In contrast to deep neural networks, humans can learn concepts from very few examples and can generalize them effortlessly across domains: even 2-year-olds can learn new words and generalize them to new situations after seeing only a few examples. Human abilities such as integrating information acquired through different modalities, reasoning, planning, and problem solving remain highly challenging for current artificial intelligence models.
In this PhD work, we will formalize the ideas of representational geometry/conceptual spaces in a tractable, deep probabilistic framework, and use it to develop a new type of bio-inspired model capable of several novel aspects of human-like learning, including 1) learning from few examples, 2) learning from multiple modalities (visual, spatio-temporal, and text data), and 3) grounding concept representations in perceptions and action possibilities. The targeted applications include video surveillance, robot vision and smart environments for assisted living of the elderly.
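To make the few-shot setting concrete: a minimal illustration of learning from few examples is nearest-class-mean (prototype-based) classification, where each class is summarised by the mean of a handful of labelled feature vectors. This sketch is purely illustrative and not the project's proposed method; the data here is synthetic, whereas in the project features would come from learned multi-modal representations.

```python
# Minimal sketch of few-shot classification via class prototypes
# (nearest-class-mean in a feature space). Synthetic data only;
# not the deep probabilistic framework proposed in the project.
import numpy as np

def prototypes(support_x, support_y):
    """Compute one mean feature vector (prototype) per class
    from a handful of labelled support examples."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_x, classes, protos):
    """Assign each query to the class whose prototype is nearest
    in Euclidean distance."""
    dists = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two classes, only 3 support examples each (the "few examples" regime).
    support_x = np.concatenate([rng.normal(0.0, 0.1, (3, 2)),
                                rng.normal(1.0, 0.1, (3, 2))])
    support_y = np.array([0, 0, 0, 1, 1, 1])
    classes, protos = prototypes(support_x, support_y)
    queries = np.array([[0.05, -0.02], [0.95, 1.10]])
    print(classify(queries, classes, protos))  # -> [0 1]
```

The appeal of this family of methods for the scenarios above is that adding a new class requires only a few labelled examples and no retraining, in contrast to the large datasets conventional deep networks need.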
Eligibility requirements:
* Academic excellence of the proposed student, i.e. normally an Honours degree at 1st or 2:1 level (or equivalent), or possession of a Master's degree with merit (or equivalent study at postgraduate level).
* Experience in computer vision, video analysis, and machine learning, as well as good mathematical and programming skills (either C/C++ or MATLAB).
* Appropriate IELTS score (overall score of 6.5 with no component below 6.0), if required (evidence required by 1 August).

Please ensure you quote the advert reference above on your application form.

Deadline for applications: 18 March 2016
Interview date/s (if known): to be confirmed
Start Date: 1 Oct 2016

Funding Notes

The studentship includes a full stipend, paid for three years at RCUK rates (in 2016/17 this is £14,296 p.a.), and fees (Home/EU £4,350 / International £13,000).

References

1. D. Zhang, J. Han, J. Han, and L. Shao, "Co-saliency Detection Based on Intra-saliency Prior Transfer and Deep Inter-saliency Mining", IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2015.2495161.
2. M. Yu, L. Liu, and L. Shao, "Structure-Preserving Binary Representations for RGB-D Action Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), doi: 10.1109/TPAMI.2015.2491925, 2015.
3. L. Shao, L. Liu, and M. Yu, "Kernelized Multiview Projection for Robust Action Recognition", International Journal of Computer Vision (IJCV), doi: 10.1007/s11263-015-0861-6, 2015.
4. J. Han, C. Chen, L. Shao, X. Hu, J. Han, and T. Liu, "Learning Computational Models of Video Memorability from fMRI Brain Imaging", IEEE Transactions on Cybernetics, 45(8), Aug. 2015.
5. X. Ji, J. Han, X. Jiang, X. Hu, L. Guo, J. Han, L. Shao, and T. Liu, "Analysis of Music/Speech via Integration of Audio Content and Functional Brain Response", Information Sciences, 297, 2015.
6. J. Han, L. Shao, D. Xu, and J. Shotton, "Enhanced Computer Vision with Microsoft Kinect Sensor: A Review", IEEE Transactions on Cybernetics, Oct. 2013 (top downloaded paper 2014-2015; 325+ Google citations).
7. J. Han, E. Pauwels, and P. de Zeeuw, "Fast Saliency-Aware Multi-Modality Image Fusion", Neurocomputing, July 2013.