Systems Hub "Robotics and Autonomous Systems" Thrust Seminar (Speaker: Dr. Hengshuang ZHAO)
Building intelligent visual systems is essential for the next generation of artificial intelligence systems. It is a fundamental tool for many disciplines and benefits a range of applications such as autonomous driving, robotics, surveillance, and augmented reality, to name a few. An accurate and efficient intelligent visual system has a deep understanding of the scene, objects, and humans. It can automatically understand the surrounding scenes. In general, 2D images and 3D point clouds are the two most common data representations in our daily life. Powerful image understanding and point cloud processing systems are the two pillars of visual intelligence, enabling artificial intelligence systems to understand and interact with the current status of the environment automatically. In this talk, I will first present our efforts in designing modern neural systems for 2D image understanding, including high-accuracy and high-efficiency semantic parsing structures and a unified panoptic parsing architecture. Then, we go one step further to design neural systems for processing complex 3D scenes, including semantic-level and instance-level understanding. Further, we show our latest work on unified 2D-3D reasoning frameworks, which are fully based on self-attention mechanisms. In the end, the challenges, up-to-date progress, and promising future directions for building advanced intelligent visual systems will be discussed.