Recordings of Jan Koenderink’s talk [in English]:
- Audio recording: listen (MP3 audio, 29.64 MB)
- Video recording: view or download (QuickTime MOV, video on demand / streaming)
- Video recording: download (MP4 video, 209.48 MB)
- Video recording: download (Windows Media Video, 153.08 MB)
- Accompanying documents for the talk: download (PDF, 58.63 MB)
Jan Koenderink, Department Physics of Man, Helmholtz Institute, Utrecht University
Computer Vision Seminar / Willow Team
Complete list of recordings in this series, in chronological order:
- Building local part models for category-level recognition (6 October 2005), Cordelia Schmid
This talk addresses the problem of building semi-local part models
for category-level recognition. In the context of category recognition, it
is no longer sufficient to use individual local features, and it becomes
necessary to model intra-class variations, to select discriminant features,
and to model spatial relations. This leads to a part-based approach to
category-level recognition that I will illustrate with two examples. The
first one represents images as distributions of local parts and learns a
Support Vector Machine classifier with kernels based on two effective
measures for comparing distributions, the Earth Mover’s Distance and the
Chi-square distance. The second one represents object classes with a
dictionary of composite semi-local parts, i.e., groups of neighboring
keypoints with stable and distinctive appearance and geometric layout. A
discriminative maximum entropy framework is used to learn the posterior
distribution of the class label given the occurrences of parts from the
dictionary in the training set.
This is joint work with S. Lazebnik, M. Marszalek, J. Zhang and J. Ponce.
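As a rough illustration of the two histogram comparison measures named in this abstract, the Python sketch below computes a Chi-square distance and a one-dimensional Earth Mover’s Distance between two histograms, then turns a distance into a kernel value. The function names, the 0.5 factor, the exponential kernel form and the `scale` parameter are assumptions made for illustration, and the EMD shown is only the 1D special case with unit spacing between adjacent bins; none of it is taken from the talk itself.

```python
import math


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def emd_1d(h1, h2):
    """1D Earth Mover's Distance for equal-mass histograms (unit bin spacing).

    In one dimension the minimal transport cost equals the sum of absolute
    differences between the two cumulative histograms.
    """
    cum1 = cum2 = 0.0
    work = 0.0
    for a, b in zip(h1, h2):
        cum1 += a
        cum2 += b
        work += abs(cum1 - cum2)
    return work


def distance_kernel(distance, scale=1.0):
    """Assumed kernel form: exp(-distance / scale)."""
    return math.exp(-distance / scale)
```

Kernel values of this kind could fill a precomputed Gram matrix for a Support Vector Machine; how `scale` should be chosen is left open here.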
- Three-Dimensional Computer Vision: Challenges and Opportunities (12 October 2005), Jean Ponce
This talk addresses two of the main challenges of computer vision: automatically identifying three-dimensional (3D) objects in photographs despite arbitrary viewpoint variations, occlusion, and clutter; and recovering accurate models of 3D shapes observed in multiple images. I will first present a new approach to object recognition that combines local invariants with global geometric constraints to construct 3D object models from multiple images and/or stereo views and effectively identify them in heavily cluttered photographs taken from unknown viewpoints. I will then discuss a novel algorithm that uses the geometric and photometric constraints associated with multiple calibrated photographs to construct high-quality solid models of complex 3D shapes in the form of carved visual hulls. I will conclude with a brief discussion of exciting
new application domains and wide open research issues.
Joint work with Yasutaka Furukawa, Akash Kushal, Svetlana Lazebnik,
Fred Rothganger, and Cordelia Schmid.
- 3D Modeling of Dynamic Scenes from Multiple Views (Modélisation 3D de scènes dynamiques à partir de plusieurs vues) (16 November 2005), Edmond Boyer
In this talk, I will present some of the work carried out in the MOVI team at INRIA Rhône-Alpes on acquiring dynamic models from video streams. I will focus in particular on results obtained with the Grimage experimental platform, a multi-camera environment for motion and shape capture aimed at interactive applications. This environment consists of an acquisition space surrounded by cameras, a high-resolution display and a PC cluster. It makes it possible to extract and visualize, in real time, 3D information about the scene observed by the cameras. I will discuss issues relating to the various components of the platform, from image acquisition to motion modeling and recognition.
- Detecting people in images and videos and reconstructing their movements (16 November 2005), Bill Triggs
Detecting humans in images is a challenging task owing to
their variable appearance and the wide range of poses that they can
adopt. I will present detectors for upright humans in static images
and in videos. The detectors use a linear SVM classifier over a robust
visual feature set based on well normalized local histograms of image
gradient orientations. The video detector also incorporates oriented
histograms of differential optical flow to capture cues for human
motion despite moving cameras and backgrounds.
In the second part of the talk, I will give an overview of some of our
work on reconstructing human body motions from monocular image
sequences. We avoid using an explicit 3-D body model, instead taking
a learning based approach that directly regresses 3-D pose (joint
angles) from robust shape descriptors extracted from image
silhouettes. A kernelized Relevance Vector Machine is used for
regression. Ambiguities in the silhouette representation cause
occasional failures and we present two methods to correct this:
incorporating a learned dynamical model, and using multi-valued
regression to generate several reconstruction hypotheses along with
their associated probabilities of being correct.
Work done with my students Navneet Dalal and Ankur Agarwal.
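To make the idea of a normalized local histogram of image gradient orientations more concrete, here is a minimal Python sketch for a single grayscale cell. It is an illustration under stated assumptions, not the detector described in the talk: the function name, the nine unsigned orientation bins, the central-difference gradients and the plain L2 normalization are all choices made here for the example.

```python
import math


def orientation_histogram(cell, n_bins=9):
    """L2-normalized histogram of unsigned gradient orientations for one cell.

    `cell` is a list of rows of grayscale values. Border pixels are skipped
    because central differences need both neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) + 1e-9  # avoid division by zero
    return [v / norm for v in hist]
```

In a full detector, histograms like this one would be computed over a grid of cells and concatenated into the feature vector passed to the linear classifier; that assembly step is not shown.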
- Using Context in Scene Analysis and Object Detection (16 November 2005), Martial Hebert
This talk will include a review of some of our current
activities in the general area of object recognition and scene
understanding. In particular, I’ll review some recent ideas for
incorporating representations of context in object recognition and
scene understanding approaches. Context may include geometric relations
between object parts, relations between objects, relations between
regions in images, and geometric cues. These ideas are applied to
object detection and scene interpretation. If time allows, ideas for
extensions to the temporal domain for video analysis will be
discussed.
- Object and Scene Recognition in Large Datasets (16 November 2005), David Lowe
Many real-world applications of computer vision require
recognizing small objects within large datasets containing thousands
of images. This talk will describe some new algorithms for efficient
indexing within large datasets, including randomized tree algorithms
for fast nearest-neighbour matching and probabilistic methods that
determine the minimal number of matches needed for reliable object
detection. Some applications of these methods will be described for
panorama recognition, location recognition for augmented reality, and
a system that can identify any product in a supermarket from a partial
image.
- Computer vision seminar (23 November 2005), Renaud Keriven
- Image, Texture, Video & ’Structural’ Completion: from LEGO’s to Combinatorial Optimization (30 November 2005), Nikos Paragios
Image completion (often called inpainting) has emerged as
a high-level task of low-level vision. Such a procedure consists of
completing missing content in images. The central idea within such an
approach is often the principle of good continuation, which consists
of adding content using information from the borders of the area to be
inpainted. While such methods can be quite efficient when dealing with
smooth content, they fail to account for texture, and their extension to
completing missing content in video as well as in 3D is not
straightforward. In this talk we propose a novel technique that
addresses image renaissance, video inpainting and structure completion
through a "multi-level" graph-based matching process. To this
end, numerous patches that do present similarities with the local
content around the missing part are considered. The selection of these
patches is done through a particle filter method to address the task
of hypotheses evaluation. These patches are positioned on top of
the missing segment, ordered according to their similarity weight, and
form in some fashion a multi-layered graph over time. Markov Random
Fields are used to formalize inpainting as a labelling estimation
problem while a combinatorial approach is used to recover the optimal
combination of patches to complete the missing structure. The
min-cut/max-flow algorithm within the α-expansion process is used to determine
the optimal cut that, in an implicit fashion, completes the missing
image structure. Promising results in image and texture completion
demonstrate the potential of the proposed method.
Joint work with Cedric Allene.
- Computer Vision and the Art of Special Effects (2 December 2005), Steve Sullivan
Industrial Light and Magic did not authorize distribution of the illustrations from this talk.
Computer vision techniques are now quite common in visual effects
production. Camera matchmove, object tracking, motion capture, and
image-based modeling have been used in hundreds of films and TV shows
and are no longer considered exotic. In practice, however, they are
far from robust or automatic, and the next generation of production
technologies will demand major advancements in reliability, user
interface, and real-time performance.
In this talk, I’ll discuss how computer vision techniques are changing
the way movies are made, then cover a few technologies which promise
major advances in the near future. Particular attention will be paid
to virtual production, and the need for interactive data acquisition
to bring directors onto the virtual set.
- Video Google - Faces (7 December 2005), Andrew Zisserman
Matching people based on their imaged face is hard because
of the well known problems of pose, size and expression variation.
Indeed these variations can exceed those due to identity.
Fortunately, videos of people have the happy benefit of containing
multiple exemplars of each person in a form that can easily be
associated automatically using straightforward visual tracking.
We describe progress in harnessing these multiple exemplars in order
to retrieve humans automatically in videos, given a query face in a
shot. There are three areas of interest: (i) the matching of sets of
exemplars provided by "tubes" of the spatial-temporal volume; (ii) the
description of the face using a spatial orientation field; and (iii)
the structuring of the problem so that retrieval is immediate at run
time.
The result is a preliminary "Video Google - Faces", able to retrieve a
ranked list of shots containing a particular person in the manner of
Google. The method will be demonstrated on several feature length
films.
Joint work with Josef Sivic and Mark Everingham.
- People Tracking with a Multi-Camera Setup (11 January 2006), François Fleuret
In this talk, I will show that in a multi-camera context, we can
effectively track and estimate the locations of an a priori unknown
number of individuals with good accuracy, despite complex occlusions.
Our algorithm initially estimates for each isolated frame a
conditional probability of occupancy for every location on the ground
plane, given binary images produced by a simple background subtraction
procedure. We show that a simple Bayesian formulation leads to a large
system of equations whose variables are the conditional marginal
probabilities of occupancy at each location. This system can be solved
iteratively at a reasonable speed (10 frames per second with two
cameras and a 25 cm accuracy). Despite the absence of temporal
consistency and the poor quality of the input data, this procedure by
itself provides accurate detection of individuals on isolated frames.
The results can be improved by combining these estimates obtained on a
few tens of isolated frames into a classical HMM, taking into account
both the color consistency and a simple motion model.
We demonstrate the quality of our results on several sequences. The
full algorithm performs reliably on these test sequences, with no
false negatives or false positives, and an error of less than 30 cm for
more than 90% of the predicted locations.
If there is time left after this main subject, I will briefly
introduce a more prospective topic: learning the appearance of an
object from a single example. Instead of using a large number of
pictures of the object to recognize, we use a labeled reference
database of pictures of other objects to learn high-level
invariance. We propose to build hundreds of random binary splits of
the training set, chosen to keep together the images of any given
object, and to combine those splits with a Bayesian rule into a
posterior probability of similarity.
Joint work with J. Berclaz, R. Lengagne and P. Fua.
- Collaboration between Computer Vision and Computer Graphics - Applications (11 January 2006), André Gagalowicz
This talk illustrates the presentation given by Steve Sullivan on 2 December. I will first explain what post-production is and introduce 3D rotoscopy, the most important technique in post-production applications. Then I will discuss the computer vision/computer graphics strategy used to perform this task. The case of rigid objects, where the strategy appears clearly, will be described first. I will then proceed to the case of articulated objects, and especially to full human body tracking (when the subjects wear rather tight garments). Some results related to tracking professional golfers’ swings will be discussed. Finally, I will give some results on 3D face tracking, a case of deformable objects. I will conclude with a presentation of other possible applications of the research done at the MIRAGES laboratory at INRIA Rocquencourt.
- Toward a Geometrically Coherent Image Interpretation (26 September 2006), Alexei Efros
Image interpretation, the ability to see and understand the three-dimensional world behind a two-dimensional image, goes to the very heart of the computer vision problem. The ultimate objective is, given an image, to automatically produce a coherent interpretation of the depicted scene. This requires not only recognizing specific objects (e.g. people, houses, cars, trees), but understanding the underlying structure of the 3D scene where these objects reside.
In this talk I will describe some of our recent efforts toward this lofty goal. I will present an approach for estimating the coarse geometric properties of a scene by learning appearance-based models of geometric classes. Geometric classes describe the 3D orientation of image regions with respect to the camera. This geometric information is then combined with camera viewpoint estimation and local object detection producing a prototype for a coherent image-interpretation framework.
Joint work with Derek Hoiem and Martial Hebert at CMU.
- Color Space (13 November 2007), Jan Koenderink
Structure of the space of colors as related to the space of radiant power spectra
- Image Space (13 November 2007), Jan Koenderink
Structure of images, image transformations, etc.
- Image Texture and the "Flow of Light" (14 November 2007), Jan Koenderink
Light field, light flow over surfaces, novel SFS algorithms
- Pictorial Space (14 November 2007), Jan Koenderink
Psychophysics, nature of the geometry