Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification

Authors

  • Nurdalia, Universitas Negeri Padang
  • Zilrahmi
  • Dony Permana
  • Admi Salma

DOI:

https://doi.org/10.24036/ujsds/vol1-iss2/29

Keywords:

Confusion Matrix, Data Mining, KNN, Naive Bayes

Abstract

Data mining is the process of extracting useful knowledge and information from data using algorithms or methods suited to the knowledge or information sought. This study uses two data mining classification methods, Naïve Bayes and K-Nearest Neighbor, to classify the 2021 DKI Jakarta air pollution standard index based on six air pollutants: dust particles (PM10 and PM2.5), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). Testing was carried out to determine how accurately each method predicts the index, using evaluation values derived from the confusion matrix. Of the two methods, Naïve Bayes performed better: its sensitivity was high for all categories even though some categories are in the minority, that is, the category frequencies are unbalanced. Despite this imbalance, the Naïve Bayes algorithm showed good performance in accuracy, sensitivity, and specificity.
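The abstract evaluates both classifiers through values derived from a confusion matrix. As a minimal sketch of how accuracy, per-category sensitivity, and per-category specificity can be computed for a multi-class problem, the Python below uses a one-vs-rest reading of the matrix. The function names (`confusion_matrix`, `evaluate`), the three category labels, and the sample predictions are all made up for illustration. They are not the study's code, categories, or results.

```python
def confusion_matrix(actual, predicted, labels):
    """Count outcomes; rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def evaluate(matrix, labels):
    """Overall accuracy plus one-vs-rest sensitivity and specificity per class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total, "per_class": {}}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                          # class i predicted as class i
        fn = sum(matrix[i]) - tp                   # class i predicted as something else
        fp = sum(row[i] for row in matrix) - tp    # other classes predicted as class i
        tn = total - tp - fn - fp                  # everything not involving class i
        report["per_class"][label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return report


# Hypothetical labels and predictions, for illustration only (not data from the study).
labels = ["good", "moderate", "unhealthy"]
actual = ["good", "good", "moderate", "moderate",
          "moderate", "moderate", "unhealthy", "unhealthy"]
predicted = ["good", "moderate", "moderate", "moderate",
             "moderate", "unhealthy", "unhealthy", "unhealthy"]

matrix = confusion_matrix(actual, predicted, labels)
report = evaluate(matrix, labels)

print("accuracy:", report["accuracy"])
for label in labels:
    print(label, report["per_class"][label])
```

Reporting sensitivity per category, rather than accuracy alone, is what makes a weak result on a minority category visible when the categories are unbalanced.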

Published

2023-03-08

How to Cite

Nurdalia, Zilrahmi, Permana, D., & Salma, A. (2023). Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification. UNP Journal of Statistics and Data Science, 1(2), 67–73. https://doi.org/10.24036/ujsds/vol1-iss2/29