(Professor and computer vision expert Rich Radke was kind enough to write the below post for The Approach – Enjoy!)
I’m Rich Radke, an associate professor in the ECSE (Electrical, Computer, and Systems Engineering) Department. For the past several years, I’ve been part of a research group working with a 3-D scanning technology called LiDAR, which stands for Light Detection and Ranging. The principle is the same as the “laser measuring tape” device you can buy at a hardware store: you point a laser beam at an object, and the time it takes to bounce back tells you how far away the surface is. We have a much more sophisticated machine that uses a rapidly spinning mirror to sweep laser pulses across the scene at many different angles, which lets us scan large buildings. Architects and builders have been using these types of devices for years for construction and surveying, and they’re also used for scanning actors’ faces or bodies to create digital stunt doubles and special effects in movies. Just recently, many bloggers discovered LiDAR due to a Radiohead video that was partially created with the technology.
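The arithmetic behind that principle is simple: a pulse travels to the surface and back, so the range is half the round-trip time multiplied by the speed of light, and the mirror angles recorded for each pulse turn every range measurement into a 3-D point. Here is a minimal Python sketch of that bookkeeping (the angle conventions here are illustrative assumptions, not the scanner's actual ones):

```python
import math

C = 299_792_458.0  # speed of light, meters per second

def range_from_round_trip(t_seconds):
    """Half the round-trip time of the laser pulse, times the speed of light."""
    return C * t_seconds / 2.0

def to_xyz(r, azimuth, elevation):
    """Turn a range plus the mirror's azimuth/elevation angles (radians) into a
    3-D point; this is ordinary spherical-to-Cartesian conversion."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)

# A roughly 67-nanosecond round trip corresponds to a surface about 10 meters away.
print(range_from_round_trip(67e-9))          # ~10.0
print(to_xyz(10.0, math.radians(30), 0.0))   # that point, 30 degrees off to one side
```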
Our interest in LiDAR involves acquiring and automatically processing very large-scale scans of buildings. We’re in the process of building an entire 3-D digital model of our campus, although it’s going to take a while. Each scan takes about an hour to acquire, and looks like this:
Although it may look like the building is a solid surface, it’s actually composed of hundreds of thousands of individual points; we need to design algorithms to “connect the dots” into building faces. The points are colored based on the image we get from a digital camera attached to the scanner. Since the laser can’t penetrate solid surfaces, a single scan typically has many “shadows” where one object blocked the beam’s path to another (for example, the tree casts a shadow of missing data on the building in the above example). We need many scans to get an accurate building model, and a key problem is to automatically stitch all the scans together into a single frame of reference. We’ve developed some pretty good algorithms for this problem; this figure shows more than 20 scans of the Voorhees Computing Center automatically stitched together with no human intervention.
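To give a flavor of what the stitching involves (our actual algorithms are described in the papers linked at the end of this post), the core computation in most registration pipelines is finding the rotation and translation that best align corresponding points from two scans. Below is a minimal sketch of that least-squares alignment step in Python, assuming the correspondences are already known, which in practice is the hard part:

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rotation R and translation t mapping points P onto points Q.

    P and Q are (N, 3) arrays of corresponding 3-D points from two scans.
    This is the classic SVD-based (Kabsch) step that methods like ICP repeat
    as they re-estimate which points correspond to which.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)        # centroid of each point set
    H = (P - cP).T @ (Q - cQ)                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Sanity check: recover a known rotation and translation from noiseless points.
rng = np.random.default_rng(1)
P = rng.uniform(-10, 10, (100, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal matrix
R_true *= np.linalg.det(R_true)                    # force det = +1 (a proper rotation)
t_true = np.array([2.0, -1.0, 0.5])
R, t = rigid_align(P, P @ R_true.T + t_true)
print(np.allclose(R, R_true), np.allclose(t, t_true))   # True True
```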
We’ve also designed a kind of “visual GPS” algorithm: if we take a picture of an environment, can we automatically tell where we were standing and looking with respect to the LiDAR model? The three pictures below show one of our example results: the top image was taken with a digital camera, but we have no information about where the camera was. Our algorithm automatically determines where the picture was taken from, and we can see that the result is correct by looking at the 3-D scan from the perspective predicted by the algorithm (bottom). The algorithm even works when the images are tricky: for example, taken in the middle of the night or after a snowstorm.
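Under the hood, this boils down to the classic camera resection problem: given some points in the 3-D model and the pixels where those points appear in the photograph, solve for the camera’s position and orientation. Automatically finding those matches is the hard part, and is what our papers describe; the toy sketch below only illustrates the resection step itself, using OpenCV’s solvePnP on synthetic data:

```python
import numpy as np
import cv2

# Make up some "LiDAR model" points in front of a camera with a hidden pose,
# project them to pixels, then recover that pose from the 2-D/3-D matches alone.
rng = np.random.default_rng(0)
pts_3d = rng.uniform(-5.0, 5.0, (20, 3)) + np.array([0.0, 0.0, 20.0])

K = np.array([[800.0,   0.0, 320.0],              # assumed camera intrinsics
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
rvec_true = np.array([[0.10], [-0.20], [0.05]])   # "unknown" rotation (Rodrigues vector)
tvec_true = np.array([[0.50], [-0.30], [2.00]])   # "unknown" translation

pts_2d, _ = cv2.projectPoints(pts_3d, rvec_true, tvec_true, K, None)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None)
print(ok, rvec.ravel(), tvec.ravel())             # recovers rvec_true and tvec_true
```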
We still have many unsolved problems to address. We recently took a few scans of the new EMPAC building on campus. The building is mostly glass, and we found that some laser beams reflect off the glass surfaces (left wall), many pass through them to hit objects inside the building (front wall), and some never return to the scanner at all (right side wall). This makes it difficult to get an accurate picture of where the outside of the building actually is, and it poses a challenge for our stitching and image localization algorithms.
We are also particularly interested in a technology called flash LiDAR, which acquires thousands of range points simultaneously in each “frame,” as fast as a video camera. Working with this data will be challenging: either the points in a frame are packed very close together (like looking through a soda straw), which makes the frames hard to stitch, or they are spread far enough apart that we can’t reliably guess what lies between them. We’re currently investigating a representation of the LiDAR data that doesn’t make rigid decisions about where every surface in the environment is, to help us handle these types of problems and deal with complex regions like wires, trees, and fences.
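One simple example of a representation that avoids hard decisions is an occupancy-grid-style model: each small cell of space keeps a running probability that something is there, and every laser return nudges those probabilities rather than declaring an exact surface. Our actual representation is different, but this toy one-ray Python sketch conveys the flavor:

```python
import numpy as np

n_cells = 100                       # cells along a single laser ray (toy resolution)
log_odds = np.zeros(n_cells)        # log-odds of occupancy; 0 means 50/50

def integrate_return(hit_cell, l_free=-0.4, l_occ=0.85):
    """Cells the beam passed through become more likely empty; the cell where it
    hit becomes more likely occupied. Repeated measurements accumulate evidence."""
    log_odds[:hit_cell] += l_free
    log_odds[hit_cell] += l_occ

# A few noisy returns along the same ray: the surface is probably near cell 60,
# but no single measurement forces us to commit to an exact location.
for hit in [60, 61, 60, 59, 60]:
    integrate_return(hit)

prob = 1.0 - 1.0 / (1.0 + np.exp(log_odds))              # back to probabilities
print(int(prob.argmax()), round(float(prob.max()), 3))   # most likely surface cell
```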
Once we have a good LiDAR representation of an environment, what can we do with it? We hope to detect changes in scenes based on images and scans taken at widely different times and perspectives; to figure out which regions have not been adequately sampled and where we should put the scanner next; and to detect different kinds of objects, like vehicles.
The research I described was largely funded by the US Army Intelligence and Security Command, and continues to be funded through my membership in DARPA’s Computer Science Study Group. The latter included a great opportunity for me to get up-close tours and briefings at military bases and Department of Defense facilities throughout the country; you can read more about it here. For the technical details, our research papers are available here and here. All of the research described here was done jointly with my colleague Chuck Stewart in the Computer Science Department, with whom I co-advised the students involved: Brad King, Eric Smith, Jacob Becker, Ted Yapo, and David Doria.