Current Projects

 Example of text spotting. The words visible in the image have been detected, 

Text spotting with smartphone cameras

We are developing fast text spotting algorithms that can reliably detect and localize any text visible in images acquired by a smartphone camera.

Qin, Manduchi. Research funded by CITRIS.


 Floor plan of a building, shown with reconstructed path

Geometric reconstruction of indoor environments

Indoor environments are often characterized by simple geometric features such as lines and planes that are either parallel or orthogonal (Manhattan geometry). Our algorithms exploit the Manhattan geometry for robust structure and motion reconstruction. 

Kim, Kim, Manduchi.


 Plots of inertial sensors 

Blind indoor localization using inertial sensors

Personal indoor localization has received a lot of attention in recent years. We are interested in simple techniques that use the inertial sensors embedded in a smartphone to help blind travelers localize themselves. However, the gait of blind walkers using a long cane or a dog is different from that of sighted walkers. We have collected an extensive inertial data set (WeAllWalk) from blind walkers, and are developing robust algorithms for self-localization and safe return based on step counting and turn detection.

Flores, Manduchi. Research funded by CITRIS. The WeAllWalk data set can be accessed here.
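
As a rough illustration of step counting and turn detection from inertial data, the sketch below counts upward threshold crossings in the accelerometer magnitude and integrates the gyroscope yaw rate over spans of sustained rotation. It is a minimal sketch, not the project's algorithm: the function names, thresholds, and sample spacing are all assumptions chosen for the example, and real gait data (especially from walkers using a long cane or a dog) would need more careful filtering.

```python
import math


def count_steps(accel_magnitudes, threshold=11.0, min_gap=3):
    """Count steps as upward crossings of `threshold` (m/s^2, assumed value).

    `min_gap` is the minimum number of samples between two counted steps,
    which suppresses double counting of a single heel strike.
    """
    steps = 0
    last_step = -min_gap
    for i in range(1, len(accel_magnitudes)):
        crossed = accel_magnitudes[i - 1] < threshold <= accel_magnitudes[i]
        if crossed and i - last_step >= min_gap:
            steps += 1
            last_step = i
    return steps


def detect_turns(yaw_rates, dt, min_angle_deg=60.0, rate_floor=0.2):
    """Return signed turn angles (degrees) from gyroscope yaw rates (rad/s).

    The rate is integrated while its magnitude stays above `rate_floor`;
    a span is reported as a turn only if the accumulated angle reaches
    `min_angle_deg`. Both thresholds are hypothetical.
    """
    turns = []
    angle = 0.0
    for rate in yaw_rates:
        if abs(rate) >= rate_floor:
            angle += rate * dt
        else:
            if abs(math.degrees(angle)) >= min_angle_deg:
                turns.append(round(math.degrees(angle)))
            angle = 0.0
    # Close a rotation span that runs to the end of the recording.
    if abs(math.degrees(angle)) >= min_angle_deg:
        turns.append(round(math.degrees(angle)))
    return turns


if __name__ == "__main__":
    accel = [9.8, 12.0, 9.8, 9.5, 12.5, 9.8, 9.6, 12.1, 9.8]
    yaw = [0.0] * 5 + [1.6] * 10 + [0.0] * 5
    print(count_steps(accel), detect_turns(yaw, dt=0.1))
```

A path for "safe return" could then be approximated by replaying the counted steps and detected turns in reverse order, under the assumption of a roughly constant step length.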


 A woman holding an iPhone over a document

Assisted mobile OCR

Mobile OCR apps allow blind people to access printed text. However, for OCR to work, a well-framed, well-resolved picture of the document must be taken, something that is difficult to do without sight. We are developing computer vision-based mechanisms that can provide directions to a blind user in real time about where to move the phone to increase the chance of capturing an OCR-readable snapshot of a document.

Cutter, Manduchi. Research funded by NIH NEI.


 Picture of a blind person waiting for the bus, with a bus arriving

Accessible public transportation

Using public transportation may be challenging for those who cannot see or have cognitive impairments. We are developing a cloud-supported sensory infrastructure that will facilitate access to real-time information for travelers using public transit. 

Flores, Manduchi. Research funded by TRB and NSF. Press release here.



Gaze-contingent screen magnification

People with low vision may need screen magnification to access a computer. We are developing algorithms for eye gaze-based control of screen magnification. Our system uses images from the camera embedded in the computer screen to detect and track the user's eye gaze.

Kaur, Cazzato, Dominio, Manduchi. Research funded by Research to Prevent Blindness and Reader’s Digest Partners for Sight Foundation. Press release here.


Completed Projects

 Image showing a blind person using a mobile vision system

(Computer) vision without sight

Mobile computer vision applications normally assume that the user can look through the viewfinder. But what if the user has no sight? This project aims to understand how a blind person can access information about the environment using mobile vision, and to identify the requirements of a mobile computer vision system to support information access without sight.

Manduchi. Work in collaboration with J. Coughlan of SKERI. Research funded by NIH NEI.


 Image showing a color pattern

High-efficiency color barcodes

Color barcodes (e.g. Microsoft Tag) enable higher information density than regular 2-D barcodes. We are developing a new approach to increase the number of colors that can be used in a barcode (and thus the achievable information density) while ensuring robust information recovery via mobile vision.

Bagherinia, Wang, Manduchi. Research funded by NSF.


 Image showing a camera cell phone detecting a color marker

Wayfinding for blind persons using camera cell phones

We have developed a system that allows blind persons to find their way in an unfamiliar environment using a regular camera cell phone. Locations of interest are labeled with specialized color markers that are detected quickly and robustly by the cell phone. A blind user can thus be guided through these landmarks to their destination. Here you can download the software for color marker detection and design.

Bagherinia, Gray, Manduchi. Work in collaboration with J. Coughlan of SKERI. Research funded by NIH and NSF.


 Image showing a barcode and a detail of a scanline

Reading difficult bar codes with cell phones

There is a growing interest in cell phone apps that can read bar codes printed on products. Unfortunately, a number of factors (low resolution, motion blur, poor lighting) make bar code reading by cell phones a challenging problem. We have developed an algorithm for maximum likelihood bar code reading that outperforms all other published state-of-the-art techniques.

Gallo, Manduchi. Research funded by NIH and NSF.


 Image showing a prototype laser cane

Environment exploration using a virtual white cane

The long cane is the most widely used mobility tool for blind people. It allows one to extend touch and to "preview" the lower portion of the space in front of oneself. We are designing laser-based hand-held devices that enable environment exploration without the need for physical contact. Using active triangulation, our devices can identify obstacles and other features that are important for safe ambulation (such as steps and drop-offs).

Ilstrup, Yuan, Manduchi. Research funded by NSF.
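
To make the idea of active triangulation concrete, the sketch below assumes a simplified geometry that is not taken from the project: a laser beam parallel to the camera's optical axis, offset from it by a known baseline, so that the range follows from the position of the laser spot in the image as z = f * b / x. The function names, the focal length, the baseline, and the jump threshold are all hypothetical.

```python
def range_from_spot(pixel_offset, focal_length_px, baseline_m):
    """Range (meters) to the laser spot under the parallel-beam assumption.

    pixel_offset: distance of the spot from the principal point, in pixels.
    A spot at (or beyond) the principal point corresponds to infinite range.
    """
    if pixel_offset <= 0:
        return float("inf")
    return focal_length_px * baseline_m / pixel_offset


def find_discontinuities(ranges, jump_m=0.15):
    """Indices where consecutive range readings differ by more than jump_m.

    As the device is swept across the ground, a sudden jump in range is a
    crude cue for a step or a drop-off (threshold is an assumed value).
    """
    return [
        i
        for i in range(1, len(ranges))
        if abs(ranges[i] - ranges[i - 1]) > jump_m
    ]


if __name__ == "__main__":
    print(range_from_spot(50, focal_length_px=1000, baseline_m=0.1))
    print(find_discontinuities([1.0, 1.02, 1.05, 1.40, 1.42]))
```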


 Image showing the result of deghosting for a HDR image

High-dynamic-range Imaging

For scenes characterized by a range of irradiance values that is too large to be captured with a single shot, one can resort to High-dynamic-range Imaging (HDR). A common strategy is to capture a stack of differently exposed pictures and combine them into an HDR image. We investigate different stages of this pipeline, from radiometric calibration, to the weighting schemes used in the combination, to the correction of artifacts such as ghosting in the final image.

Gallo, Manduchi. Work in collaboration with M. Tico, N. Gelfand, and K. Pulli of the Nokia Research Lab, Palo Alto. Research funded in part by Nokia.
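
The combination stage of such a pipeline can be sketched as a weighted average of per-exposure radiance estimates. The example below is a minimal sketch under stated assumptions, not the weighting scheme studied in this project: it assumes a linear sensor response (so radiometric calibration is skipped), pixel values normalized to [0, 1], a static scene (no ghosting), and a simple "hat" weight that discounts values near the dark and saturated ends. All names are hypothetical.

```python
def hat_weight(value):
    """Weight that peaks at mid-gray and falls to zero at 0 and 1."""
    return max(0.0, 1.0 - abs(2.0 * value - 1.0))


def merge_exposures(stack, exposure_times):
    """Merge a stack of exposures into relative irradiance estimates.

    stack: list of images, each a flat list of pixel values in [0, 1].
    exposure_times: one exposure time (seconds) per image.
    Each exposure yields the estimate value / time; the estimates are
    averaged with hat weights.
    """
    merged = []
    for p in range(len(stack[0])):
        weighted_sum = 0.0
        weight_total = 0.0
        for image, t in zip(stack, exposure_times):
            w = hat_weight(image[p])
            weighted_sum += w * image[p] / t
            weight_total += w
        if weight_total > 0.0:
            merged.append(weighted_sum / weight_total)
        else:
            # Every exposure is fully dark or saturated at this pixel:
            # fall back to an unweighted average of the estimates.
            estimates = [img[p] / t for img, t in zip(stack, exposure_times)]
            merged.append(sum(estimates) / len(estimates))
    return merged


if __name__ == "__main__":
    stack = [[0.25, 0.5], [0.5, 1.0]]
    print(merge_exposures(stack, exposure_times=[1.0, 2.0]))
```

In the example, the second pixel is saturated in the longer exposure, so its weight drops to zero and the estimate comes from the shorter exposure alone.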


 Image illustrating the people counting algorithm

Video surveillance using an ultra-low-power contrast-based camera node

Energy consumption is a critical constraint for wireless camera networks. We have designed and implemented a fast recognition algorithm on a self-standing node based on an ultra-low-power contrast-based camera and a Flash FPGA processor. The whole node consumes less than 10 mW for image acquisition and processing.

Gasparini, Manduchi, Gottardi. Research in collaboration with the Fondazione Bruno Kessler, Trento, Italy.


 Image illustrating the VIPeR dataset

Viewpoint Invariant Pedestrian Recognition

Recognizing people in images and video is one of the most fundamental problems in computer vision. We focus on matching images of pedestrians from single image frames of different pose and viewpoint. Our approach focuses on finding methods of comparing pedestrian images that are invariant to elements not associated with the person's identity. Our VIPeR dataset (Details) is freely available for use by the community.

Gray, Brennan, Tao


 Image illustrating the Haar-like features

Efficient image representation using Haar-like features

The efficient and compact representation of images is a fundamental problem in computer vision. In this project, we propose methods that use Haar-like binary box functions to represent a single image or a set of images. A desirable property of these box functions is that their inner product operation with an image can be computed very efficiently. We show that using this efficient representation, many vision applications can be significantly accelerated, for example template matching, image filtering, PCA projection, and image reconstruction.

Tang, Crabb, Tao
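
The reason inner products with box functions are cheap can be illustrated with the standard integral image (summed-area table): once the table is built, the sum of the image over any axis-aligned box, which is the inner product with a binary box function, takes four lookups regardless of box size. The sketch below shows that general technique only; it is not the project's representation method, and the function names are hypothetical.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the image over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_box_feature(ii, top, left, bottom, right):
    """Haar-like feature: left half of the box minus its right half."""
    mid = (left + right) // 2
    return box_sum(ii, top, left, bottom, mid) - box_sum(ii, top, mid, bottom, right)


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    table = integral_image(image)
    print(box_sum(table, 0, 0, 2, 4))
    print(two_box_feature(table, 0, 0, 2, 4))
```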


 Image illustrating the co-tracking algorithm

Co-tracking using semi-supervised support vector machines

We treat tracking as a foreground/background classification problem and propose an online semi-supervised learning framework. Classification of new data and updating of the classifier are achieved simultaneously in a co-training framework. Experiments show that this framework performs better than state-of-the-art tracking algorithms on challenging sequences.

Tang, Brennan, Tao