Other than those local features, a global image descriptor, GIST [18], has also been reported to be used in SLAM [25,26,27].

B. Deep Learning-based Approaches

Recent advances in deep learning have motivated researchers to adopt it in SLAM. A stacked auto-encoder was trained to learn a feature representation and to find loops in a similarity or difference matrix [20]. A pre-trained CNN model can generate abstract image representations and outperforms state-of-the-art hand-crafted features in illumination-variant environments [19]. PCANet [21] is a simple deep learning network that has shown effective image representation abilities in face recognition, handwritten digit recognition, and texture classification. PCANet consists of three parts: cascaded principal component analysis (PCA), binary hashing, and block-wise histograms (a minimal code sketch of these stages is given at the end of Section III-A below). Compared with a CNN, PCANet has a simpler network architecture and requires neither expertise in parameter tuning nor long training times. These advantages make PCANet a very good choice for robotic applications, which motivates us to apply it to loop closure detection.

III. OUR APPROACH

In this section, we describe the details of our SLAM system and the proposed loop closure detection algorithm. Our SLAM system uses a graph optimization method and is made up of three main parts: 1) the frontend, 2) the loop closure detection module based on PCANet features, and 3) the backend. The detailed system architecture is shown in Figure 1.

A. Frontend

We employ an RGB-D sensor, which provides chromatic and depth information of the scene. The frontend of our SLAM system mainly solves the localization problem by estimating transformations from the sensor data. The system first extracts features from two consecutive RGB images and then matches these features to obtain correspondences. We use standard visual algorithms for the detection, extraction, and matching of various visual features, e.g., SIFT, SURF, and ORB. We then use the RANSAC [28] algorithm to compute the transformation estimate, which consists of a rotation matrix and a translation vector. Visual odometry is built from the estimated transformations, i.e., the relative poses between consecutive frames. If the current frame can be matched to the previous frame, a node is added to the pose graph of the SLAM backend, and the estimated transformation is added as an edge connecting the existing nodes.
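To make the frontend pipeline concrete, the following is a minimal sketch, assuming Python with OpenCV and NumPy: ORB matching between consecutive RGB frames, back-projection of matched keypoints to 3D via the depth image, and a RANSAC loop that fits a rigid transformation (R, t) with the SVD-based Kabsch method. The intrinsics FX, FY, CX, CY, the iteration count, and the inlier threshold are illustrative placeholders rather than values from the paper, and the same pipeline could use SIFT or SURF instead of ORB.

```python
import numpy as np
import cv2

# Hypothetical pinhole intrinsics; real values come from RGB-D calibration.
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

def backproject(pt, depth):
    """Lift a 2D keypoint to 3D camera coordinates using the depth map (meters)."""
    u, v = int(round(pt[0])), int(round(pt[1]))
    z = depth[v, u]
    if z <= 0:                                  # missing depth reading
        return None
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

def rigid_transform(P, Q):
    """Kabsch/SVD: least-squares R, t such that Q ~= P @ R.T + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def estimate_motion(rgb1, depth1, rgb2, depth2, iters=200, thresh=0.03):
    # 1) Detect and match ORB features between consecutive RGB frames.
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(rgb1, None)
    kp2, des2 = orb.detectAndCompute(rgb2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    # 2) Back-project matched keypoints to 3D, dropping invalid depth.
    P, Q = [], []
    for m in matches:
        p = backproject(kp1[m.queryIdx].pt, depth1)
        q = backproject(kp2[m.trainIdx].pt, depth2)
        if p is not None and q is not None:
            P.append(p)
            Q.append(q)
    P, Q = np.asarray(P), np.asarray(Q)

    # 3) RANSAC: fit (R, t) to minimal 3-point samples, score by inlier count.
    best = None
    for _ in range(iters):
        idx = np.random.choice(len(P), 3, replace=False)
        R, t = rigid_transform(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + t - Q, axis=1)
        inliers = err < thresh
        if best is None or inliers.sum() > best.sum():
            best = inliers

    # 4) Refine on all inliers: this (R, t) becomes an edge in the pose graph.
    return rigid_transform(P[best], Q[best])
```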
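The PCANet descriptor referenced in Section II-B and used by our loop closure detection module can be sketched in the same spirit. Below is a minimal NumPy/SciPy sketch of the three stages named in [21]: two cascaded PCA filter banks, binary hashing of the stage-2 responses, and block-wise histograms. The hyperparameters (5x5 filters, L1 = L2 = 8 filters per stage, non-overlapping 16x16 blocks) and the pooled stage-2 training are assumptions for illustration, not the exact configuration of the original network.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import convolve2d

def pca_filters(images, k, L):
    """Learn L convolution filters of size k x k as the top PCA
    eigenvectors of all mean-removed k x k patches."""
    patches = []
    for img in images:                          # images: 2D grayscale arrays
        p = sliding_window_view(img, (k, k)).reshape(-1, k * k)
        patches.append(p - p.mean(axis=1, keepdims=True))  # remove patch mean
    X = np.vstack(patches)
    _, vecs = np.linalg.eigh(X.T @ X)           # eigenvalues ascending
    return vecs[:, ::-1][:, :L].T.reshape(L, k, k)

def convolve_bank(img, filters):
    return [convolve2d(img, f, mode='same') for f in filters]

def train_pcanet(images, k=5, L1=8, L2=8):
    """Learn both filter banks once, offline, from a set of training images."""
    W1 = pca_filters(images, k, L1)
    stage1 = [m for img in images for m in convolve_bank(img, W1)]
    W2 = pca_filters(stage1, k, L2)
    return W1, W2

def pcanet_feature(img, W1, W2, block=16):
    """Cascaded PCA convolutions, binary hashing, block-wise histograms."""
    L2 = len(W2)
    feats = []
    for m in convolve_bank(img, W1):            # one group per stage-1 map
        maps2 = convolve_bank(m, W2)
        # Binary hashing: threshold each stage-2 map at zero and pack the
        # resulting L2 bits of every pixel into one integer in [0, 2^L2).
        code = sum((o > 0).astype(np.int64) << l for l, o in enumerate(maps2))
        # Block-wise histograms of the hashed codes, concatenated over blocks.
        for bi in range(0, code.shape[0] - block + 1, block):
            for bj in range(0, code.shape[1] - block + 1, block):
                h, _ = np.histogram(code[bi:bi + block, bj:bj + block],
                                    bins=2 ** L2, range=(0, 2 ** L2))
                feats.append(h)
    return np.concatenate(feats)                # the loop closure descriptor
```

In such a scheme the filter banks are learned once offline, so extracting a descriptor at runtime costs only convolutions, thresholding, and histogramming, which is consistent with PCANet needing no long training or parameter tuning.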