PointConvFormer: Revenge of the Point-based Convolution

We introduce PointConvFormer, a new building block for point cloud-based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights depend only on relative position, and Transformers, which use feature-based attention. In PointConvFormer, attention computed from the feature differences between points in a neighborhood is used to modify the convolutional weights at each point. We therefore preserve the invariances of point convolution, while attention helps select the relevant points in the neighborhood for convolution. PointConvFormer is suitable for many tasks that require point-level detail, such as segmentation and scene flow estimation. We evaluate on both tasks with multiple datasets, including ScanNet, SemanticKITTI, FlyingThings3D, and KITTI. Our results show that PointConvFormer offers a better accuracy/speed tradeoff than classic convolutions, regular Transformers, and voxelized sparse convolution approaches. Visualizations show that PointConvFormer behaves like convolution on flat areas, whereas the neighborhood selection effect is stronger on object boundaries, showing that it gets the best of both worlds. The code will be made available.
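The core idea, convolution weights from relative position rescaled by attention from feature differences, can be sketched for a single output point as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. Assumptions: the class and method names are invented; the weight network is a small two-layer MLP on the 3-D relative position; the attention score is a sigmoid of a linear function of the feature difference; the two are combined by simple multiplication; and fixed random weights stand in for trained parameters. The paper's exact networks and normalization may differ.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class PointConvFormerSketch:
    """Hypothetical single-layer sketch (random weights stand in for training)."""

    def __init__(self, c_in, c_out, hidden=8, seed=0):
        rng = random.Random(seed)
        self.c_in, self.c_out = c_in, c_out
        # Weight network (assumed MLP): relative position (3-D) -> c_out x c_in filter.
        self.w1 = random_matrix(hidden, 3, rng)
        self.b1 = [0.0] * hidden
        self.w2 = random_matrix(c_out * c_in, hidden, rng)
        self.b2 = [0.0] * (c_out * c_in)
        # Attention network (assumed linear + sigmoid): feature difference -> scalar.
        self.a = random_matrix(1, c_in, rng)
        self.a_b = [0.0]

    def filter_weights(self, rel_pos):
        """Filter that depends only on geometry (the point-convolution part)."""
        flat = linear(relu(linear(rel_pos, self.w1, self.b1)), self.w2, self.b2)
        return [flat[o * self.c_in:(o + 1) * self.c_in] for o in range(self.c_out)]

    def attention(self, feat_diff):
        """Score that depends only on features (the attention part)."""
        return sigmoid(linear(feat_diff, self.a, self.a_b)[0])

    def forward_point(self, center_pos, center_feat, neigh_pos, neigh_feat):
        """Aggregate the neighborhood of one point into a c_out-dim output."""
        out = [0.0] * self.c_out
        for p_j, f_j in zip(neigh_pos, neigh_feat):
            rel = [pj - pc for pj, pc in zip(p_j, center_pos)]
            diff = [fj - fc for fj, fc in zip(f_j, center_feat)]
            weights = self.filter_weights(rel)
            score = self.attention(diff)
            # Attention rescales this neighbor's convolution weights.
            for o in range(self.c_out):
                out[o] += score * sum(w * f for w, f in zip(weights[o], f_j))
        return out


if __name__ == "__main__":
    layer = PointConvFormerSketch(c_in=4, c_out=2, seed=1)
    center_p, center_f = [0.0, 0.0, 0.0], [1.0, 0.5, -0.2, 0.3]
    neigh_p = [[0.1, 0.0, 0.0], [0.0, -0.2, 0.1]]
    neigh_f = [[1.0, 0.4, -0.1, 0.3], [0.2, 0.9, 0.0, -0.5]]
    print(layer.forward_point(center_p, center_f, neigh_p, neigh_f))
```

Because the filter is a function of relative position only, translating the whole neighborhood leaves the output unchanged, which is the point-convolution invariance the abstract mentions. In this sketch, a neighbor whose features match the center's gets a score of exactly 0.5, so a perfectly uniform (flat) region reduces to a uniformly scaled point convolution, while neighbors with dissimilar features are up- or down-weighted.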
