Data Clustering - Even though the data was clean enough to train ML models, it was not labeled. Classification is a supervised learning task that requires labeled data for training. Understanding the traffic patterns in the dataset is a complicated and time-consuming task, and because the dataset is very large, labeling traffic flows manually is impractical. To avoid manual labeling, an unsupervised learning model can be used instead: the network traffic data is clustered according to the similarities among its features. For this process, the K-means unsupervised learning model was used, as shown in Figure 1. K-means is a fast learning model that scales well to large datasets. The number of clusters k was selected using the Davies-Bouldin index [8]. This index uses Euclidean distances to compare, for each cluster, its within-cluster scatter with its distance to the most similar other cluster; the lower the score, the better the clustering, meaning clusters that are compact and well separated from one another. The dataset was clustered with the k value that produced the lowest Davies-Bouldin score, and the resulting cluster assignments were used as labels.
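The selection procedure can be sketched in plain Python. This is a minimal, illustrative sketch and not the authors' implementation: it uses only the standard library, runs Lloyd's K-means with a deterministic farthest-point seeding (swapped in for the usual random or k-means++ initialization so that runs are reproducible), and scores each candidate k with the Davies-Bouldin index computed from Euclidean distances. The function names `farthest_point_init`, `kmeans`, `davies_bouldin` and `select_k`, and the synthetic 2-D points standing in for traffic-flow features, are hypothetical. In practice, scikit-learn's `KMeans` and `davies_bouldin_score` would be the more likely tools.

```python
import math
import random


def farthest_point_init(points, k):
    """Hypothetical deterministic seeding: first point, then repeatedly the
    point farthest from all centroids chosen so far (maximin)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means on a list of equal-length numeric tuples."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j), where S_i is
    the mean Euclidean distance of cluster i's points to its centroid c_i."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(
            sum(math.dist(p, centroids[c]) for p in members) / len(members)
            if members else 0.0
        )
    total = 0.0
    for i in range(k):
        worst = 0.0  # similarity to the most similar other cluster
        for j in range(k):
            if j == i:
                continue
            d = math.dist(centroids[i], centroids[j])
            ratio = math.inf if d == 0 else (scatter[i] + scatter[j]) / d
            worst = max(worst, ratio)
        total += worst
    return total / k


def select_k(points, k_values):
    """Cluster for every candidate k (each >= 2); keep the k with the lowest DB score."""
    scores, runs = {}, {}
    for k in k_values:
        labels, centroids = kmeans(points, k)
        scores[k] = davies_bouldin(points, labels, centroids)
        runs[k] = labels
    best_k = min(scores, key=scores.get)
    return best_k, runs[best_k], scores


if __name__ == "__main__":
    # Synthetic stand-in for traffic-flow feature vectors: three separated blobs.
    rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(30)]
    best_k, labels, scores = select_k(data, range(2, 6))
    print(best_k)  # 3 for these well-separated blobs
```

The cluster index returned for each point then plays the role of its class label in the supervised classification step. The Davies-Bouldin index is undefined for a single cluster, which is why the candidate k values start at 2.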