Deep learning and convolutional neural networks (CNNs) have advanced the state of the art in object recognition and scene understanding [19,20]. Unlike traditional handcrafted features, a CNN typically consists of multiple layers and can learn abstract representations of input images. Researchers have investigated the use of deep learning in loop closure detection. However, training a deep CNN model is time-consuming and requires expertise in parameter tuning. Choosing a deep learning network that is suitable for loop closure detection and can be easily deployed on robots remains an open problem.

In this paper, we present our investigation of using a cascaded, PCA-based network, PCANet [21], in visual loop closure detection. We use a Microsoft Kinect as the visual sensor and build an RGB-D SLAM system [9] to test our loop closure detection method.

The structure of this paper is as follows. Section II introduces related work on loop closure detection in SLAM. Section III describes our SLAM system and the details of how we train PCANet to extract features. Section IV presents experimental results on three open datasets to evaluate the performance of PCANet features in loop closure detection. Finally, Section V concludes the paper.

II. RELATED WORK

In this section, we review previous work on image features and image descriptors used in visual loop closure detection. We introduce approaches using traditional handcrafted features and deep learning-based features, especially the PCANet feature.

A. Traditional Approaches

Bag-of-Words (BoW) is widely used in loop closure detection for visual SLAM [22,23,24]. The BoW model was originally designed for document classification, where documents are treated as unordered sets of words. In computer vision, the BoW model represents an image as an unordered set of visual features (called “visual words”). Local image features (e.g., SIFT, SURF) are extracted and then clustered to generate a visual vocabulary. The image is then described by a histogram that counts the occurrences of each visual word. One of the most successful applications of BoW in loop closure detection is Fast Appearance-Based Mapping (FAB-MAP) [24].

Other approaches using handcrafted local features include the Fisher vector (FV) [11,12] and the vector of locally aggregated descriptors (VLAD) [13,14]. FV uses a Gaussian mixture model (GMM) to build a visual word dictionary. It carries richer image information than BoW because it encodes first- and second-order statistics of the local descriptors with respect to each mixture component, rather than occurrence counts alone. VLAD is a simplification of FV that strikes a good balance between performance and computational efficiency.
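
To make the difference between the BoW and VLAD descriptions concrete, the following is a minimal sketch rather than the implementation used in any of the cited systems. It assumes local descriptors are already available as rows of a NumPy array (random vectors stand in for SIFT or SURF descriptors here), builds the vocabulary with a plain k-means loop, and uses hard nearest-word assignment. The function names (build_vocabulary, assign_words, bow_histogram, vlad) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Return the index of the nearest visual word for each descriptor."""
    # Squared Euclidean distance between every descriptor and every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words with plain k-means (assumed)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """BoW: normalized count of how often each visual word occurs."""
    labels = assign_words(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)


def vlad(descriptors, centers):
    """VLAD: per-word sum of residuals (descriptor minus its nearest word)."""
    labels = assign_words(descriptors, centers)
    v = np.zeros_like(centers, dtype=float)
    for j in range(len(centers)):
        members = descriptors[labels == j]
        if len(members):
            v[j] = (members - centers[j]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # L2-normalize the final vector


# Stand-in data: random 128-D vectors in place of real SIFT descriptors.
rng = np.random.default_rng(42)
training_descriptors = rng.normal(size=(500, 128))
image_descriptors = rng.normal(size=(60, 128))

vocabulary = build_vocabulary(training_descriptors, k=8)
bow = bow_histogram(image_descriptors, vocabulary)   # length k
vlad_vec = vlad(image_descriptors, vocabulary)       # length k * 128
```

With a vocabulary of k words and D-dimensional descriptors, the BoW description has length k while the VLAD description has length k·D, which is one way to see why VLAD retains more information than occurrence counts at a moderate computational cost.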