ECCV 2010, 11th European Conference on Computer Vision: Demos


Demo 01: Real-time Facial Expression Recognition with Spatiotemporal Local Descriptors
Demo 02: A Novel Approach on Interactive, Visual Analysis of Multi-spectral Data for Reflectance Analysis
Demo 03: From Multiple Views to Textured 3D Meshes: a GPU-powered Approach
Demo 04: Robust Camera Tracking with Dense Depth Recovery and Its Applications
Demo 05: Real-time Head Pose Tracking and Its Applications in Telecommunication
Demo 06: Gradient Response Maps and their Efficient Computation for Real-Time Detection of Texture-Less Objects
Demo 07: An Active Observer Extracting Objects from the Scene
Demo 08: Real-Time Multi-Person Tracking from a Mobile Platform
Demo 09: Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid
Demo 10: Kinship Verification with Facial Images for Family Album Organization
Demo 11: Large Scale Image Semantic Search
Demo 12: SAMANTHA: a Hierarchical, Efficient, Available Structure and Motion Pipeline
Demo 13: Feature Tracking and Object Recognition on a Hand-held
Demo 14: 3D Multiple Target Tracking and Face Pose Estimation with a Rotating and Zooming Camera
Demo 15: Image Manipulation Detection using Computer Vision Methods
Demo 16: Human Visual Filters in Handheld Video
Demo 17: A Large-Scale Taxonomic Classification System for Web-based Videos
Demo 18: Mini-Dome: Transportable Photometric Stereo/IBR Setup
Demo 19: A Novel Lightfield Camera for Plenoptic Image Processing
Demo 20: Building Rome on a Cloudless Day
Demo 21: Tag Recommendation and Category Discovery on YouTube
Demo 22: On-board Computer Vision-Based Control of a Micro Helicopter
Demo 23: Unstructured VBR
Demo 24: Automated Airborne Surveillance
Demo 25: Real-Time Spherical Mosaicing using Whole Image Alignment



Real-time Facial Expression Recognition with Spatiotemporal Local Descriptors
Contributors: G. Zhao, J. Holappa, X. Huang, M. Pietikäinen
Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, University of Oulu, Finland
Abstract: This demonstration shows a real-time facial expression recognition system using an effective spatiotemporal local descriptor. An ordinary web camera is used to capture videos at a resolution of 640 by 480 pixels or lower in indoor environments. The subjects are at a distance of 0.5-1.0 meters from the camera. The face is detected automatically using boosted Haar-like features, and the size of the cropped faces is around 40 by 50 pixels. No eye detection is used; instead, the coarsely located face images are normalized to the same size. Spatiotemporal LBP-TOP features are then extracted; they combine appearance and motion and describe local transition information in space and time. A sequence of 20 frames is used as the input video. LBP-TOP slice histograms selected by AdaBoost learning are extracted and fed into pre-learned SVM classifiers. The system can recognize happiness, sadness, surprise, fear, anger and disgust.
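As a rough illustration of such a pipeline (not the authors' implementation), the sketch below detects the face with an OpenCV Haar cascade and, as a simplified stand-in for the full LBP-TOP descriptor, averages per-frame uniform LBP histograms over a 20-frame clip before feeding them to an SVM; the cascade file, training data and all parameters are assumptions.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

EXPRESSIONS = ["happiness", "sadness", "surprise", "fear", "anger", "disgust"]
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame, size=(40, 50)):
    """Detect the largest face with boosted Haar-like features and crop it
    (coarse localization only, no eye alignment, as in the demo)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(gray[y:y + h, x:x + w], size)

def clip_features(face_seq):
    """Per-frame uniform LBP histograms averaged over the clip: a simplified
    stand-in for LBP-TOP, which additionally encodes the XT and YT planes."""
    hists = []
    for face in face_seq:                      # e.g. 20 cropped face frames
        lbp = local_binary_pattern(face, P=8, R=1, method="uniform")
        h, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        hists.append(h)
    return np.mean(hists, axis=0)

# Hypothetical training data: one feature vector per labelled video clip.
# clf = SVC(kernel="rbf").fit(train_features, train_labels)
# label = EXPRESSIONS[int(clf.predict([clip_features(face_seq)])[0])]
```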
References
[1] Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915-928.
A Novel Approach on Interactive, Visual Analysis of Multi-spectral Data for Reflectance Analysis
Contributors: J. Jordan, E. Angelopoulou
Pattern Recognition Lab, University of Erlangen-Nuremberg
Abstract: Multispectral imaging is an important tool for better understanding image formation and reflectance phenomena. Wideband RGB data are not sufficient to draw meaningful interpretations from the captured data; instead, a significant number of filter bands needs to be available. Research on computer vision methods that interpret, or rely on, scene reflectance often profits from analyzing multispectral images. However, due to the high dimensionality of the data, both human observers and computers have difficulty interpreting this wealth of information. In this demonstration, we will show how analysis of a multispectral image can be conducted interactively by the researcher within a novel, powerful, visually focused framework. Our software incorporates a new paradigm for visual assistance of multispectral analysis that specifically addresses the lack of a seamless integration of spectral distribution and topology. It puts emphasis on the spectral gradient, which has been shown to provide enhanced information for many reflectance analysis tasks. It also includes a rich toolbox for evaluating image segmentation and other algorithms in the multispectral domain. As part of the demo, we will show how several specific research interests in scene reflectance can be tackled in a simple workflow on captured image data. The software presented in the demo will be released as open source, and researchers are encouraged to apply it to their own analysis tasks.
From Multiple Views to Textured 3D Meshes: a GPU-powered Approach
Contributors: K. Tzevanidis (Univ. of Crete, FORTH), X. Zabulis (FORTH), T. Sarmis (FORTH), P. Koutlemanis (FORTH), N. Kyriazis (Univ. of Crete, FORTH), A. Argyros (Univ. of Crete, FORTH)
Abstract: We will present work on exploiting modern graphics hardware towards the real-time production of a textured 3D mesh representation of a scene observed by a multicamera system. The employed computational infrastructure consists of a network of four PC workstations each of which is connected to a pair of cameras. One of the PCs is equipped with a GPU that is used for parallel computations. The result of the processing is a list of texture mapped triangles representing the reconstructed surfaces. In contrast to previous works, the entire processing pipeline (foreground segmentation, 3D reconstruction, 3D mesh computation, 3D mesh smoothing and texture mapping) has been implemented on the GPU. The demonstration will show that accurate, high resolution, texture-mapped 3D reconstruction of a scene observed by eight cameras can be achieved in real time.
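The demo runs the whole pipeline on the GPU; as a plain CPU illustration of the shape-from-silhouette stage only, the sketch below carves a voxel grid from binary foreground masks and 3x4 projection matrices, both of which are assumed to be given.

```python
import numpy as np

def carve_visual_hull(silhouettes, projections, grid_min, grid_max, res=64):
    """Shape-from-silhouette on a regular voxel grid: a voxel is kept only if
    its projection falls inside the foreground silhouette of every camera.
    silhouettes: list of HxW boolean masks; projections: list of 3x4 matrices."""
    axes = [np.linspace(lo, hi, res) for lo, hi in zip(grid_min, grid_max)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel(), Z.ravel(), np.ones(X.size)], axis=0)  # 4 x N homogeneous
    occupied = np.ones(X.size, dtype=bool)
    for sil, P in zip(silhouettes, projections):
        uvw = P @ pts                                        # project every voxel centre
        u = np.round(uvw[0] / uvw[2]).astype(int)
        v = np.round(uvw[1] / uvw[2]).astype(int)
        inside = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
        hit = np.zeros(X.size, dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        occupied &= hit                                      # carve away misses
    return occupied.reshape(res, res, res)

# A triangle mesh could then be extracted from the voxel grid, e.g. with
# skimage.measure.marching_cubes, before texture-mapping it from the views.
```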
For more information, please see:
http://www.ics.forth.gr/~argyros/research/gpu3Drec.htm
References
[1] K. Tzevanidis, X. Zabulis, T. Sarmis, P. Koutlemanis, N. Kyriazis, A.A. Argyros, "From multiple views to textured 3D meshes: a GPU-powered approach", to appear in Proceedings of the Computer Vision on GPUs Workshop, CVGPU’2010, In conjunction with ECCV’2010, Heraklion, Crete, Greece, 10 September 2010.
Robust Camera Tracking with Dense Depth Recovery and Its Applications
Contributors: G. Zhang (Zhejiang University), Z. Dong (Zhejiang University), J. Jia (The Chinese University of Hong Kong), H. Bao (Zhejiang University)
Abstract: We will demonstrate an automatic camera tracking system with a dense depth recovery technique (ACTS: http://www.zjucvg.net/acts/acts.html). It is useful for robust structure-from-motion (SfM) thanks to two new features. On the one hand, the system can efficiently and reliably handle long sequences with varying focal length [1]. On the other hand, it employs efficient non-consecutive feature tracking to address the reconstruction drift problem for loop-back sequences [2]. Besides camera tracking, ACTS also allows high-quality depth map estimation from video sequences [3]. The recovered high-quality dense depth maps can benefit many editing, enhancement and reconstruction applications. In particular, we will show videos for applications such as video composition, refilming [4] and 3D reconstruction. In addition, a live demo of markerless real-time camera tracking [5], which uses ACTS for offline environment modeling, will be presented. One or more augmented reality games based on it will be available for attendees to play.
References
[1] Guofeng Zhang, Xueying Qin, Wei Hua, Tien-Tsin Wong, Pheng-Ann Heng, and Hujun Bao. Robust Metric Reconstruction from Challenging Video Sequences. In CVPR, 2007.
[2] Guofeng Zhang, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, and Hujun Bao. Efficient Non-Consecutive Feature Tracking for Structure-from-Motion. In ECCV, 2010.
[3] Guofeng Zhang, Jiaya Jia, Tien-Tsin Wong, and Hujun Bao. Consistent Depth Maps Recovery from a Video Sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):974-988, 2009.
[4] Guofeng Zhang, Zilong Dong, Jiaya Jia, Liang Wan, Tien-Tsin Wong, and Hujun Bao. Refilming with Depth-Inferred Videos. IEEE Transactions on Visualization and Computer Graphics, 15(5):828-840, 2009.
[5] Zilong Dong, Guofeng Zhang, Jiaya Jia, and Hujun Bao. Keyframe-Based Real-Time Camera Tracking. In ICCV, pages 1538-1545, 2009.
Real-time Head Pose Tracking and Its Applications in Telecommunication
Contributors: Q. Cai, C. Zhang, Z. Zhang, D. Florencio
Microsoft Research
Abstract: Knowledge of a user's head pose can improve the overall user experience in many applications, including telecommunication. We will show a real-time demo that robustly tracks 3D head pose using multiple loosely placed web cameras with unknown geometric placement. Compared to single-camera tracking, the use of multiple cameras increases the reliability of tracking by covering a wider range of poses and providing more accurate results. Our approach is also useful in applications where, for ease of use, prior calibration of the camera placement cannot be performed. The head pose information can be used to create immersive telecommunication experiences driven by motion parallax. In particular, we will show two demo applications: monocular 3D (Mono3D) rendering, and a 3D audio system with loudspeakers. In Mono3D, different views of the remote party are rendered depending on the viewer's head position, creating a pseudo-3D effect even on regular 2D displays. In the 3D audio system, dynamic crosstalk cancellation is applied so that the sweet spot of the system follows the user wherever they move, overcoming one of the biggest challenges in 3D audio spatialization.
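As a toy illustration of the motion-parallax idea behind Mono3D (not the demo's renderer), the sketch below shifts a foreground layer of the remote party against the background in proportion to the tracked head offset; the layer decomposition, mask and gain factor are assumptions.

```python
import numpy as np

def mono3d_composite(background, foreground, fg_mask, head_xy, gain=0.15):
    """Pseudo-3D motion parallax: translate the foreground layer against the
    background in proportion to the viewer's head displacement.
    background/foreground: HxWx3 images, fg_mask: HxW boolean,
    head_xy: (x, y) head offset from the screen centre (assumed units)."""
    dx = int(round(gain * head_xy[0]))
    dy = int(round(gain * head_xy[1]))
    shifted_fg = np.roll(foreground, (dy, dx), axis=(0, 1))
    shifted_mask = np.roll(fg_mask, (dy, dx), axis=(0, 1))
    out = background.copy()
    out[shifted_mask] = shifted_fg[shifted_mask]   # composite foreground over background
    return out
```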
For more information, please see:
http://research.microsoft.com/en-us/UM/people/qincai/Videos/TwoCamTrack.wmv
http://research.microsoft.com/en-us/UM/people/qincai/Videos/mono3D.wmv
Gradient Response Maps and their Efficient Computation for Real-Time Detection of Texture-Less Objects
Contributors: S. Hinterstoisser (CAMP-TUM), S. Ilic (CAMP-TUM), P. Sturm (INRIA Grenoble - Rhône Alpes), P. Fua (CVLAB-EPFL), N. Navab (CAMP-TUM), V. Lepetit (CVLAB-EPFL)
Abstract: We propose to demonstrate LINE, a method we developed for our paper "Gradient Response Maps and their Efficient Computation for Real-Time Detection of Texture-Less Objects" [1], currently under PAMI review. Our new approach is a method for real-time 3D object detection that can handle textured and untextured objects even in the presence of strong background clutter, with an almost instantaneous training time. At its core is a novel image representation designed to be both robust to small object transformations and efficient on the architecture of modern computers. While being faster than our previous approach DOT [2], presented at CVPR 2010, it is significantly more robust to strong background clutter. This is made possible by precomputing efficient but highly discriminant representations of the input image (so-called response maps), one for each discretized gradient orientation. Each of these maps stores, at every location, the similarity of the corresponding orientation to the gradient orientations observed in the input image. We store these maps in linear memory and, as a result, can handle thousands of templates in real time. Additionally, the response maps allow us to keep the exact locations of the gradients instead of representing the templates with the dominant gradients found in regularly spaced grid cells [2], which makes the new approach less sensitive to background clutter.
The corresponding source code (GPL license) can be downloaded at: http://campar.in.tum.de/personal/hinterst/index/index.html
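A minimal Python sketch of the quantized-orientation idea is given below; it is not the optimized LINE implementation above (which linearizes memory and exploits modern CPU architecture), and the bin count and magnitude threshold are assumptions. Gradients are quantized into a few orientation bins and, for every reference orientation, a response map stores a cosine similarity at each pixel.

```python
import cv2
import numpy as np

def quantized_orientations(img, n_bins=8, mag_thresh=20.0):
    """Quantize image gradient orientations into n_bins (direction only, sign ignored)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)               # orientation in [0, pi)
    bins = np.floor(ang / np.pi * n_bins).astype(int) % n_bins
    bins[mag < mag_thresh] = -1                           # suppress weak gradients
    return bins

def response_maps(bins, n_bins=8):
    """One map per reference orientation: cosine similarity between that
    orientation and the quantized orientation at each pixel (0 where none)."""
    maps = []
    for o in range(n_bins):
        diff = np.abs(bins - o)
        diff = np.minimum(diff, n_bins - diff)            # circular bin distance
        resp = np.where(bins >= 0, np.cos(diff * np.pi / n_bins), 0.0)
        maps.append(resp.astype(np.float32))
    return maps
```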
References
[1] Hinterstoisser, S., Ilic, S., Fua, P., Navab, N., Lepetit, V.: Gradient Response Maps and their Efficient Computation for Real-Time Detection of Texture-Less Objects. Under review for IEEE PAMI (2010)
[2] Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., Navab, N.: Dominant orientation templates for real-time detection of texture-less objects. In: Conference on Computer Vision and Pattern Recognition. (2010)
An Active Observer Extracting Objects from the Scene
Contributors: A. Mishra, Y. Aloimonos
University of Maryland
Abstract: In this demo, we demonstrate the ability of the fixation-based segmentation framework [1,2] to simultaneously localize and segment objects in a scene. Specifically, we place a set of common objects such as bottles, books, a watch and scissors on a table. Our camera, mounted on a moving platform, automatically localizes the objects and then fixates on each of them to segment it out. The exact segmentation of the objects pops up. Please visit http://www.umiacs.umd.edu/~mishraka/Files/tableTopDemo.gif for an example of what the segmented results look like. The demo will highlight two primary strengths of the fixation-based segmentation framework in comparison with prevailing segmentation strategies: first, the fixation-based framework is completely automatic and needs no user input; second, since motion and intensity cues are both used, the closed boundary of the segmented region is the exact object boundary. Thus, entire objects are segmented using only low-level information. To demonstrate the robustness of our segmentation process, we choose objects of all kinds of shapes and sizes. Some objects have rich texture, while others have smooth interiors.
References
[1] Ajay Mishra, Yiannis Aloimonos, Active Segmentation, International Journal of Humanoid Robotics, 2009.
[2] Ajay Mishra, Yiannis Aloimonos, Cheong Loong Fah, Active Segmentation With Fixation, in ICCV 2009.
Real-Time Multi-Person Tracking from a Mobile Platform
Contributors: G. Floros, D. Mitzel, P. Sudowe, B. van der Zander, B. Leibe
UMIC Research Centre, RWTH Aachen University
Abstract: Multi-object tracking from a moving platform is an important problem with direct applications in the near future of mobile robotics and smart vehicles. Considerable progress has been achieved on this task in recent years, and several approaches have been proposed that reach good performance in challenging real-world scenarios. However, computational complexity is still a major issue, since the difficulty of the task requires the successful integration of several complex vision components. In this demo, we present a simplified version of the multi-person tracking approach of [1], optimized for real-time processing. Our approach combines stereo visual odometry estimation, HOG pedestrian detection, and multi-hypothesis tracking-by-detection, running together on a single laptop with a CUDA-enabled graphics card. Real-time processing is achieved by shifting expensive computations to the GPU and making extensive use of scene geometry constraints. As a result of this integration, our system can robustly track multiple persons at interactive frame rates from a mobile setup. It thus provides a good example of the vision capabilities that are already achievable for mobile robotics applications.
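The demo uses a GPU HOG detector inside a full odometry-and-tracking system; as a much simpler single-component baseline, OpenCV's built-in HOG pedestrian detector can be run on a webcam as sketched below.

```python
import cv2

# Baseline HOG + linear SVM pedestrian detector shipped with OpenCV
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:                         # draw one box per detection
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("pedestrians", frame)
    if cv2.waitKey(1) == 27:                           # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```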
References
[1] A. Ess, B. Leibe, K. Schindler, and L. van Gool, "Robust Multi-Person Tracking from a Mobile Platform," IEEE Trans. PAMI, vol. 31, pp. 1831-1846, 2009.
Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid
Contributors: C. Richardt (University of Cambridge), D. Orr (University of Cambridge), I. Davies (University of Cambridge), A. Criminisi (Microsoft Research Cambridge), N.A. Dodgson (University of Cambridge)
Abstract: We introduce a real-time stereo matching technique based on a reformulation of Yoon and Kweon’s adaptive support weights algorithm [1]. Our implementation uses the bilateral grid to achieve a speedup of 200× compared to a straightforward full-kernel GPU implementation, making it the fastest technique on the Middlebury website. We introduce a colour component into our greyscale approach to recover precision and increase discriminability. Using our implementation, we speed up spatial-depth superresolution 100×. We further present a spatiotemporal stereo matching approach based on our technique that incorporates temporal evidence in real time (>14 fps). Our technique visibly reduces flickering and outperforms per-frame approaches in the presence of image noise. We have created five synthetic stereo videos, with ground truth disparity maps, to quantitatively evaluate depth estimation from stereo video. Source code and datasets are available on our project website.
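A slow CPU sketch of the underlying adaptive support-weight aggregation is shown below (weights computed from the left image only, wrap-around at the borders ignored, parameters assumed); the demo's contribution is precisely to make this kind of aggregation real-time with the dual-cross-bilateral grid on the GPU, which the sketch does not attempt.

```python
import numpy as np

def asw_disparity(left, right, max_disp=32, radius=5, gamma_c=14.0, gamma_s=8.0):
    """Winner-take-all stereo with adaptive support-weight cost aggregation
    (Yoon & Kweon style). left/right: rectified images, grayscale or colour."""
    L, R = left.astype(np.float32), right.astype(np.float32)
    H, W = L.shape[:2]
    cost = np.full((max_disp, H, W), 1e3, dtype=np.float32)   # large cost where undefined
    for d in range(max_disp):
        diff = np.abs(L[:, d:] - R[:, :W - d]).reshape(H, W - d, -1).sum(-1)
        cost[d, :, d:] = diff
    agg = np.zeros_like(cost)
    wsum = np.full((H, W), 1e-6, dtype=np.float32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            Ls = np.roll(L, (-dy, -dx), axis=(0, 1))           # left image at p + (dy, dx)
            dcol = np.abs(Ls - L).reshape(H, W, -1).sum(-1)    # colour distance to window centre
            w = np.exp(-dcol / gamma_c - np.hypot(dy, dx) / gamma_s)
            wsum += w
            for d in range(max_disp):
                agg[d] += w * np.roll(cost[d], (-dy, -dx), axis=(0, 1))
    return np.argmin(agg / wsum, axis=0).astype(np.uint8)     # winner-take-all disparity
```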
References
[1] Richardt, C., Orr, D., Davies, I., Criminisi, A., Dodgson, N.A.: Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) Proc. ECCV 2010, Lecture Notes in Computer Science, vol. 6311–6316. Springer, Berlin (2010)
Kinship Verification with Facial Images for Family Album Organization
Contributors: R. Fang, K.D. Tang, N. Snavely, T. Chen
Cornell University
Abstract: Over the past few years, personal photo collection applications like Picasa and genealogy software like MyHeritage have become pervasive on the web. Efficient and effective analysis of the family linkage and ancestry history present in the facial images on these services could prove very valuable for electronic family album organization, social data mining and web search. Novel computer vision approaches to kinship recognition will add another dimension to far-reaching real-world applications, including large-scale search for lost children and historic genealogy studies. Based on a large dataset of parent-child image pairs collected from the Internet, we demonstrate a software application for kinship verification using facial images. By selecting a pair of potential parent and child facial images from the database, or providing two facial photos of your own family members, the system computes representative facial features and classifies the pair as related or unrelated (parent and child) using k-nearest neighbors. We also include a recognition challenge in which humans compete against the computer on kinship verification from facial images; the computer reaches 70% accuracy on our database. The photos provided for kinship verification must be upright frontal images with normal illumination and natural facial expression for accurate feature extraction.
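A minimal sketch of the verification step, assuming precomputed face features and hypothetical file names (feature extraction and the actual dataset are not reproduced here): concatenated parent/child features are classified as related or unrelated with k-nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: one feature vector per (parent, child) image pair,
# e.g. concatenated appearance statistics of the two faces.
X_train = np.load("pair_features_train.npy")    # shape (n_pairs, n_dims), assumed file
y_train = np.load("pair_labels_train.npy")      # 1 = related, 0 = unrelated

knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)

def verify_kinship(parent_feat, child_feat):
    """Return True if the pair is classified as parent and child."""
    pair = np.concatenate([parent_feat, child_feat])[None, :]
    return bool(knn.predict(pair)[0] == 1)
```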
References
[1] Ruogu Fang, Kevin D. Tang, Noah Snavely, Tsuhan Chen, Towards Computational Models of Kinship Verification, International Conference on Image Processing, 2010.
Large Scale Image Semantic Search
Contributors: K. Yu, F. Lv, Y. Lin, S. Zhu, M. Yang (NEC Labs America), L. Cao, T. Huang (University of Illinois at Urbana-Champaign)
Abstract: A large-scale image semantic search system is presented in this demo. Given a query image, our system returns semantic information such as the object class, an object description and links to online references for the query image. We build the system on the ImageNet dataset, consisting of more than 1.2 million images in 1000 object classes. The system is web based and can be accessed from web browsers. Users can submit any online query image that belongs to the 1000 object classes. The web server runs on a single laptop computer because the computation is very lightweight.
SAMANTHA: a Hierarchical, Efficient, Available Structure and Motion Pipeline
Contributors: M. Farenzena, A. Fusiello, R. Gherardi, R. Toldo
Computer Science Department, University of Verona, Italy
Abstract: We will demonstrate our Structure and Motion pipeline, called Samantha, an approach to 3D reconstruction from images that is both more robust and computationally cheaper than current competing approaches. Pictures are organized into a hierarchical tree with single images as leaves and partial reconstructions as internal nodes. The method proceeds bottom up until it reaches the root node, which corresponds to the final result. This framework is one order of magnitude faster than sequential approaches, inherently parallel, less sensitive to the error accumulation that causes drift, and truly uncalibrated, not needing EXIF metadata to be present in the pictures. Our results have been verified both qualitatively, producing compelling point clouds, and quantitatively, by comparing them with laser scans serving as ground truth. Our demo will consist of a poster, several videos of our results and a live demonstration.
[Figure: Verona dataset, a thousand images from the Verona city center. From left to right: the point cloud superimposed on an aerial view, and three sightseeing places from the reconstructed data.]
Feature Tracking and Object Recognition on a Hand-held
Contributors: T. Lee, S. Soatto
University of California, Los Angeles
Abstract: We demonstrate a visual recognition system operating on a hand-held device, built on efficient and robust feature tracking and an object recognition mechanism that can be used for interactive mobile applications. In our recognition system, corner features are detected in captured video frames in a multi-scale image pyramid and are tracked efficiently between consecutive frames. To perform object recognition, local descriptors are computed on the tracked features and quantized using a vocabulary tree. For each object, a bag-of-words model is learned from multiple views. Learned objects are recognized by computing a ranking score for the set of features in a single video frame. Our feature tracking algorithm and local descriptors differ from pyramidal Lucas-Kanade tracking and the SIFT descriptor, improving both efficiency and accuracy. For our implementation on a mobile phone, we used an iPhone 3GS with a 600 MHz ARM CPU. Video frames are captured from the camera preview screen at a rate of 15 frames per second using the public API. Object recognition on the phone runs at around 7 frames per second, including feature tracking and descriptor calculation.
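The demo's tracker and descriptors are custom, as noted above; for comparison, the sketch below shows the standard pyramidal Lucas-Kanade baseline with OpenCV, tracking Shi-Tomasi corners between consecutive webcam frames.

```python
import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if pts is None or len(pts) < 50:                   # re-detect when tracks thin out
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                      qualityLevel=0.01, minDistance=7)
    if pts is None:
        prev_gray = gray
        continue
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)   # keep successfully tracked points
    for x, y in pts.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imshow("tracks", frame)
    if cv2.waitKey(1) == 27:                           # Esc quits
        break
    prev_gray = gray
cap.release()
cv2.destroyAllWindows()
```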
References
[1] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Intl. Journal of Computer Vision, 60(2):91–110, 2004.
[2] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Image Understanding Workshop, pages 121–130, 1981.
[3] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In IEEE CVPR, volume 2, pages 2161–2168, 2006.
[4] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In ECCV, volume 1, pages 430–443, May 2006.
[5] Willow Garage. OpenCV: Open source computer vision library. http://opencv.willowgarage.com/wiki/, 2009.
3D Multiple Target Tracking and Face Pose Estimation with a Rotating and Zooming Camera
Contributors: A. Del Bimbo, F. Dini, G. Lisanti, I. Masi and F. Pernici
Media Integration and Communication Center (MICC) University of Florence
Abstract: We present two demos of two modules of a surveillance system under development at MICC in collaboration with Thales. The first demo presents a method for tracking multiple targets in world coordinates using a zooming camera [1]. The method achieves an almost constant standard-deviation error of less than 0.3 meters in recovering the world-coordinate trajectories of multiple moving targets in an area of 70x15 meters. The main novelty lies in exploiting device-tagged text information from the camera encoders to index, and then refine at runtime, a set of pre-built camera poses (i.e., internal and external camera parameters). Real-time camera pose estimation makes it possible to recover world-coordinate localizations of moving targets in the presence of large focal-length variations [1][2]. In addition, scene geometry and appearance variation are taken into account, allowing the system to operate unattended for several hours [3]. The second module estimates the orientation of the target's face, provided that the target is imaged at sufficiently high resolution. Once the head is localized within the frame using the first module (a fixed person height is assumed), the camera moves and zooms to center it within the camera view [4][5].
References
[1] A. Del Bimbo, G. Lisanti, F. Pernici. Scale Invariant 3D Multi-Person Tracking using a Base Set of Bundle Adjusted Visual Landmarks. Proc. of ICCV Int'l Workshop on Visual Surveillance (VS) 2009. Kyoto, Japan.
[2] A. Del Bimbo, G. Lisanti, I. Masi, F. Pernici. Person Detection using Temporal and Geometric Context with a Pan Tilt Zoom Camera. In Proc. of the International Conference on Pattern Recognition (ICPR) 2010. Istanbul, Turkey.
[3] A. Del Bimbo, G. Lisanti, I. Masi, F. Pernici. Device-Tagged Feature-based Localization and Mapping of Wide Areas with a PTZ Camera. In Proc. of CVPR International Workshop on Socially Intelligent Surveillance and Monitoring (SISM) 2010. San Francisco, CA, USA
[4] A. Del Bimbo, F. Dini, and G. Lisanti, "A real time solution for face logging," in Proc. of Int’l Conference on Imaging for Crime Detection and Prevention (ICDP), London, United Kingdom, 2009.
[5] A. D. Bagdanov, A. Del Bimbo, and W. Nunziati, "Improving Evidential Quality of Surveillance Imagery Through Active Face Tracking" 18th Int’l Conf. on Pattern Recognition, 2006.
Image Manipulation Detection using Computer Vision Methods
Contributors: C. Riess and E. Angelopoulou
Pattern Recognition Lab, University of Erlangen
Abstract: In blind image forensics, researchers aim to assess the authenticity of an image without an embedded security scheme. The main focus of this demo is to show how insights from computer vision improve the detection of image manipulations. In particular, we will demonstrate a new method for the analysis of the illuminant color [1] and its connection to the estimation of the illuminant direction. In image forensics, illumination and image geometry are among the most innovative methods for validating scene properties. Within our demonstration framework, we will also show how these ideas nicely complement other CV-based methods, such as consistency measures based on camera properties: the camera response function [2], chromatic aberration [3] and image compression artifacts [4]. For the purposes of the demo, we have compiled a diverse set of images. Visitors will also be able to use our software to analyze their own images. Though the focus of our demo is to show how computer vision techniques can be used in image forensics, our software also includes two popular forensic analysis methods for a better overview of the topic.
References
[1] C. Riess, E. Angelopoulou: "Scene Illumination as an Indicator of Image Manipulation", Information Hiding Conference, 2010.
[2] T.-T. Ng, S.-F. Chang, M.-P. Tsui: "Using Geometry Invariants for Camera Response Function Estimation", IEEE Conference on Computer Vision and Pattern Recognition, 2007.
Human Visual Filters in Handheld Video
Contributors: D. Summers-Stay, Y. Aloimonos
University of Maryland
Abstract: We will demonstrate detection and soft segmentation of humans in low-frame-rate video using visual filters. These filters use a learned model of the patch surrounding a pixel to estimate the probability of the pixel belonging to the class "human". The demo will use a handheld webcam, showing that the technique works where other detection and tracking methods that depend on a fixed background would fail.
A Large-Scale Taxonomic Classification System for Web-based Videos
Contributors: Y. Song (Google, USA), M. Zhao (Google, USA), R. Strobl (Google, Switzerland), J. Zhang (Google, USA), J. Yagnik (Google, USA)
Abstract: This demo presents a large-scale taxonomic classification system with more than 1000 categories in the taxonomy. A user can select a video from YouTube.com, and the system returns one or more categories (from the taxonomy) to which the video belongs. Technical challenges for such a large-scale video classification system include large data diversity within a category, the lack of manually labeled training data, and degradation of video quality. We use the taxonomic structure of the categories in classifier training. To compensate for the lack of labeled video data and to take advantage of a large corpus of labeled text documents, we propose a novel scheme to transfer knowledge from web-based text documents to the video domain. State-of-the-art video content-based features are integrated with text-based features so that the system remains robust when one type of feature degrades. This demo shows the effectiveness of our algorithms. The system works on a very large fraction of the YouTube corpus, so almost any video from YouTube can be selected, and the categorization results are shown.
References
[1] Yang Song, Ming Zhao, Jay Yagnik, and Xiaoyun Wu, Taxonomic Classification for Web-based Videos, in IEEE CVPR'2010
[2] Zheshen Wang, Ming Zhao, Yang Song, Sanjiv Kumar and Baoxin Li, YouTubeCat: Learning to Categorize Wild Web Videos, in IEEE CVPR'2010
Mini-Dome: Transportable Photometric Stereo/IBR Setup
Contributors: W. Moreau, F. Verbiest, G. Willems, L. Van Gool
Abstract: Many sophisticated Image-Based Rendering and Photometric Stereo systems have already been built. An issue with most of them is that they are fixed laboratory setups: the objects to be captured have to be brought to the system, which strongly limits their applicability. We present a half-spherical structure, the mini-dome, that can be easily assembled and disassembled. All components can be neatly packed into a standard flight case. The inner side of the dome is uniformly covered by 260 computer-controlled white LEDs. One overhead camera takes images of the object, which is placed at the center; typically an image is taken for each lit LED. Accompanying software supports both Image-Based Rendering and 3D reconstruction of fine surface detail through photometric stereo. The software also allows the user to further enhance the visualization by applying different filters or to create line-drawing-style output. The first prototype of the mini-dome concept was built in 2004 and has since been tested in multiple museums and at several excavations. It has been used to reveal the structure of textiles and the brush strokes on paintings, and to enhance the legibility of cuneiform tablets.
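A minimal sketch of the classic Lambertian photometric-stereo step that such a dome enables (not the accompanying software): with one grayscale image per lit LED and known light directions, per-pixel albedo and surface normals follow from a linear least-squares fit.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Classic Lambertian photometric stereo: recover per-pixel albedo and
    surface normals from images lit by known directional lights.
    images:     (n, H, W) grayscale stack, one image per lit LED
    light_dirs: (n, 3) unit light directions"""
    n, H, W = images.shape
    I = images.reshape(n, -1)                                # (n, H*W)
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)       # (3, H*W), G = albedo * normal
    albedo = np.linalg.norm(G, axis=0)
    normals = G / (albedo + 1e-8)
    return albedo.reshape(H, W), normals.T.reshape(H, W, 3)
```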
References
[1] G. Willems, F. Verbiest, W. Moreau, H. Hameeuw, K. Van Lerberghe, and L. Van Gool, Easy and cost-effective cuneiform digitizing, The 6th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST 2005) pages:73-80, 2005
A Novel Lightfield Camera for Plenoptic Image Processing
Contributors: O. Fleischmann, L. Wietzke, C. Perwass
Abstract: In the field of computational photography, interest in lightfield image processing has grown recently due to new lightfield capturing devices. We present a novel lightfield camera which, compared to conventional lightfield capturing devices, provides a very high spatial resolution of up to 3 megapixels from a 10-megapixel sensor. We introduce a novel, highly accurate digital refocus algorithm based on the images from our lightfield camera. The algorithm is able to render a refocused image in less than 20 ms. Apart from refocusing the image after the capturing process, it is possible to create completely focused images as well as images with custom refocused areas. Furthermore, we propose a depth estimation algorithm based on the lightfield images, which motivates replacing classical stereo systems in many applications with a single lightfield camera. We will show a live demonstration of the refocusing, stereo vision and depth estimation.
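As a toy illustration of lightfield refocusing (the camera's actual algorithm is not described here), the shift-and-add sketch below translates each sub-aperture view in proportion to its aperture offset and averages the results; the views, their (u, v) offsets and the refocus parameter are assumed inputs.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(subapertures, offsets, alpha):
    """Shift-and-add refocusing: each sub-aperture view is translated in
    proportion to its (u, v) offset from the main-lens centre and averaged.
    subapertures: list of (H, W[, 3]) views; offsets: list of (u, v) values;
    alpha: refocus parameter (0 keeps the original focal plane)."""
    acc = np.zeros_like(subapertures[0], dtype=np.float64)
    for img, (u, v) in zip(subapertures, offsets):
        s = [alpha * v, alpha * u] + [0] * (img.ndim - 2)    # row, column (and channel) shifts
        acc += nd_shift(img.astype(np.float64), s, order=1, mode="nearest")
    return acc / len(subapertures)
```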
Building Rome on a Cloudless Day
Contributors: J-M. Frahm, P. Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y.-H. Jen, E. Dunn, B. Clipp, S. Lazebnik, M. Pollefeys
University of North Carolina at Chapel Hill, ETH Zurich
Abstract: This demo presents results for dense 3D reconstruction from unregistered Internet-scale photo collections of about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance on large photo collections. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches. During the demo, live processing will run on a smaller dataset so that visitors can see each step of the method.
References
[1] Jan-Michael Frahm, Pierre Georgel, David Gallup, Tim Johnson, Rahul Raguram, Changchang Wu, Yi-Hung Jen, Enrique Dunn, Brian Clipp, Svetlana Lazebnik, Marc Pollefeys, "Building Rome on a Cloudless Day", ECCV 2010.
Tag Recommendation and Category Discovery on YouTube
Contributors: G. Toderici, H. Aradhye, M. Pasca, L. Sbaiz, J. Yagnik
Google
Abstract: We present a very large scale tag recommendation system for YouTube videos which uses only the video content for predictions [1]. The system learns tags from noisy labels and uses web data to smooth noisy tags by grouping them into feasible categories [2]. The feasibility of the categories is determined completely automatically using classifier agreement. The resulting system comprises a wide variety of tags/categories that demonstrate the effectiveness of large-scale learning and cross-domain knowledge integration. The demo displays various visualizations of the predicted tags and categories: (1) for each tag, the user can browse the top predictions; (2) for each video, we display the top predicted tags; and (3) for each learned category, we display the top videos. Categories deemed “unfeasible” on a visual level are highlighted.
References
[1] Hrishikesh Aradhye, George Toderici, Jay Yagnik, "Video2Text: Learning to Annotate Video Content", ICDM Workshop on Internet Multimedia Mining, 2009.
[2] George Toderici, Hrishikesh Aradhye, Marius Pasca, Luciano Sbaiz, Jay Yagnik, "Finding Meaning on YouTube: Tag Recommendation and Category Discovery", Computer Vision and Pattern Recognition, 2010.
On-board Computer Vision-Based Control of a Micro Helicopter
Contributors: L. Meier, P. Tanskanen, F. Fraundorfer, M. Pollefeys
ETH Zurich
Abstract: The demo shows a small quadrotor helicopter that is fully autonomously stabilized with the support of computer vision. The helicopter carries an onboard mini-computer with an Intel Core 2 Duo mobile CPU. This platform enables optimized algorithms to run directly onboard while still providing acceptable battery life, without the need for error-prone and high-latency video transmission. The position controller of the helicopter uses position estimates computed by computer vision from the onboard cameras. Because of the low computing resources and the challenging real-time constraints on a flying helicopter, efficient and robust algorithms have to be used. For position estimation, adapted algorithms are implemented: marker-based localization, natural-feature-based position estimation, and tight coupling of the vision algorithms with inertial measurements. The helicopter is able to perform basic autonomous flight operations such as take-off and landing, position hold and waypoint following.
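A minimal sketch of the marker-based localization component, with assumed intrinsics and marker size and with corner detection omitted: given four known marker corners in the image, the camera pose follows from cv2.solvePnP.

```python
import cv2
import numpy as np

# Hypothetical square ground marker of known side length (metres)
side = 0.20
object_pts = np.array([[0, 0, 0], [side, 0, 0],
                       [side, side, 0], [0, side, 0]], dtype=np.float32)

K = np.array([[520.0, 0.0, 320.0],        # assumed camera intrinsics
              [0.0, 520.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                        # assume negligible lens distortion

def camera_pose_from_marker(corner_px):
    """corner_px: (4, 2) detected marker corners in pixel coordinates,
    ordered consistently with object_pts. Returns rotation, translation and
    the camera centre expressed in the marker frame."""
    ok, rvec, tvec = cv2.solvePnP(object_pts, corner_px.astype(np.float32), K, dist)
    R, _ = cv2.Rodrigues(rvec)
    cam_pos_world = -R.T @ tvec
    return R, tvec, cam_pos_world
```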
You may have a look at the following video:
http://pixhawk.ethz.ch/_media/videos/pixhawk-imav2010.mp4
Unstructured VBR
Contributors: L. Ballan (ETHZ), G.J. Brostow (UCL), J. Puwein (ETHZ), M. Pollefeys (ETHZ)
Abstract: In the referenced paper we presented an algorithm for navigating around a performance that was filmed as a "casual" multi-view video collection: real-world footage captured on hand-held cameras by a few audience members. The objective was to navigate easily in 3D, generating a video-based rendering (VBR) of a performance filmed with widely separated cameras. The point of the demo is to show our GPU-accelerated interface, which allows a user to navigate a video collection in real time. We will prepare a set of 3 or 4 video collections so that people can browse them and see how VBR looks and feels.
Some videos of the working system can be seen on the project webpage: http://cvg.ethz.ch/research/unstructured-vbr/
References
[1] L. Ballan, G. J. Brostow and J. Puwein and M. Pollefeys, "Unstructured Video-Based Rendering: Interactive Exploration of Casually Captured Videos", ACM Transactions on Graphics (Proceedings of SIGGRAPH 2010)
Automated Airborne Surveillance
Contributors: M. E. Bazakos, E. Taipale, C. Poling
Lockheed Martin Maritime Systems and Sensors (MS2), Aviation Systems, Eagan MN
Abstract: Persistent airborne Intelligence, Surveillance and Reconnaissance (ISR) requires all-weather, 24/7 integrated multi-sensor surveillance systems capable of detecting, holding at risk, and prosecuting mission-specific targets of interest. Lockheed Martin, as the systems integrator of manned P-3C and UAV platforms, has been providing custom solutions to the U.S. Navy and several international customers for over 40 years to meet their evolving mission requirements. These solutions involve the integration of state-of-the-art mission systems composed of a heterogeneous multi-sensor suite, along with communication, processing, storage, display, and weapons subsystems.
Real-Time Spherical Mosaicing using Whole Image Alignment
Contributors: S. Lovegrove, A.J. Davison
Imperial College London
Abstract: When a purely rotating camera observes a general scene, overlapping views are related by a parallax-free warp which can be estimated by direct image alignment methods that iterate to optimise photo-consistency. However, building globally consistent mosaics from video has usually been tackled as an off-line task, while sequential methods suitable for real-time implementation have often suffered from long-term drift. We present a high-performance real-time video mosaicing algorithm based on parallel image alignment via ESM (Efficient Second-order Minimisation) and global optimisation of a map of keyframes over the whole viewsphere. We show real-time results for drift-free camera rotation tracking and globally consistent spherical mosaicing, demonstrating high global accuracy and the ability to track rapid rotation while maintaining solid 30 Hz operation. We also show that automatic camera calibration refinement can be straightforwardly built into our framework. Real-time performance is made possible through an efficient implementation that runs on commodity graphics hardware.
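A minimal sketch of the parallax-free warp that the system exploits (the actual demo performs ESM-based whole-image alignment and global keyframe optimisation on the GPU): for a purely rotating camera with intrinsics K, a frame taken under rotation R maps into the reference view through the homography H = K R K^-1.

```python
import cv2
import numpy as np

def warp_into_reference(image, R, K, out_size):
    """Warp a frame into the reference/keyframe image plane using the
    rotation-only homography H = K R K^-1 (valid for a purely rotating camera).
    out_size is (width, height) of the output mosaic plane."""
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, out_size)

# Example usage with assumed K and R (e.g. from tracking or an IMU):
# mosaic = np.maximum(mosaic, warp_into_reference(frame, R, K, (mosaic_w, mosaic_h)))
```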


Pages designed and maintained by George Georgiadis