ABSTRACT: In the 1980s and 1990s, computer vision architectures were commonly built on a sparse sample of image points. In the 2000s, dense models became popular for visual recognition because heuristically defined sparse models fail to cover all the important parts of an image. With deep learning and end-to-end training, however, this trade-off no longer holds: sparse models can still offer significant advantages, saving unnecessary computation while being more flexible. In this talk, I will present the deep point cloud convolutional backbones we have developed over the past few years, including our most recent work, PointConvFormer, which outperforms grid-based convolutional approaches. I will also present a recent work, AutoFocusFormer, which applies point cloud transformer backbones and decoders to 2D image recognition, with a novel adaptive downsampling module that enables end-to-end learning of adaptive downsampling. Results show significant improvements in both 3D and 2D recognition tasks. In particular, on the Cityscapes benchmark, a model with only 42 million parameters built with our approach outperforms the state-of-the-art Mask2Former Large model, which has 197 million parameters.
BIO: Fuxin Li is currently an associate professor in the School of Electrical Engineering and Computer Science at Oregon State University. He has held research positions at Apple Inc., the University of Bonn, and the Georgia Institute of Technology. He obtained his Ph.D. from the Institute of Automation, Chinese Academy of Sciences, in 2009. He has won an NSF CAREER award and an Amazon Research Award, (co-)won the PASCAL VOC semantic segmentation challenges from 2009 to 2012, and led a team to a fourth-place finish in the 2017 DAVIS Video Segmentation challenge. He has published more than 70 papers in computer vision, machine learning, and natural language processing. His main research interests are point cloud deep networks, human understanding of deep learning, video object segmentation, multi-target tracking, and uncertainty estimation in deep learning.