Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification

Authors

  • Nurdalia, Universitas Negeri Padang
  • Zilrahmi
  • Dony Permana
  • Admi Salma

DOI:

https://doi.org/10.24036/ujsds/vol1-iss2/29

Keywords:

Confusion Matrix, Data Mining, KNN, Naive Bayes

Abstract

Data mining is the process of extracting and searching for useful knowledge and information in data using particular algorithms or methods. The data mining classification methods used in this study are Naïve Bayes and K-Nearest Neighbor (KNN). These two methods are used to classify the 2021 DKI Jakarta air pollution standard index based on six air pollutants: particulate matter (PM10), fine particulate matter (PM2.5), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). Each method was tested for its accuracy in predicting the 2021 DKI Jakarta air pollution standard index, using evaluation values derived from the confusion matrix. Of the two methods, Naïve Bayes performed best. It achieved high sensitivity for every category even though some categories are minorities, meaning the number of observations per category is imbalanced, and it showed good accuracy, sensitivity, and specificity overall.
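The abstract compares two classifiers on six pollutant readings and scores them with confusion-matrix metrics. Below is a minimal, standard-library-only sketch of that kind of pipeline. It is not the authors' implementation. It assumes a Gaussian Naïve Bayes variant, KNN with Euclidean distance and a hypothetical k = 3, and one-vs-rest sensitivity and specificity per class. The readings and the category labels "Good", "Moderate", and "Unhealthy" are hypothetical toy values, not data or results from the study.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature order: PM10, PM2.5, SO2, CO, O3, NO2 (toy values, not study data)
LABELS = ["Good", "Moderate", "Unhealthy"]  # hypothetical category names

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance (Gaussian NB assumed)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A tiny epsilon keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict_nb(model, row):
    """Return the class with the largest log prior plus log Gaussian likelihoods."""
    best, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best, best_score = label, score
    return best

def predict_knn(X_train, y_train, row, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for x, label in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def confusion_metrics(y_true, y_pred, labels):
    """Overall accuracy plus one-vs-rest (sensitivity, specificity) per class."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        per_class[c] = (sens, spec)
    return accuracy, per_class

# Hypothetical training data; "Unhealthy" is deliberately the minority class.
X_train = [
    [20, 10, 5, 5, 20, 10], [25, 12, 6, 6, 22, 12], [22, 11, 5, 5, 25, 11],
    [60, 40, 20, 15, 50, 30], [65, 45, 22, 16, 55, 32], [58, 38, 19, 14, 48, 29],
    [150, 110, 40, 30, 120, 60], [160, 120, 45, 35, 130, 65],
]
y_train = ["Good"] * 3 + ["Moderate"] * 3 + ["Unhealthy"] * 2
X_test = [[23, 11, 5, 5, 21, 11], [62, 42, 21, 15, 52, 31], [155, 115, 42, 32, 125, 62]]
y_test = ["Good", "Moderate", "Unhealthy"]

nb_model = fit_gaussian_nb(X_train, y_train)
nb_preds = [predict_nb(nb_model, row) for row in X_test]
knn_preds = [predict_knn(X_train, y_train, row, k=3) for row in X_test]

for name, preds in (("Naive Bayes", nb_preds), ("KNN", knn_preds)):
    acc, per_class = confusion_metrics(y_test, preds, LABELS)
    print(name, "accuracy:", acc)
    for label, (sens, spec) in per_class.items():
        print(f"  {label}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy run the minority "Unhealthy" test point still gets one "Moderate" vote among its three nearest neighbors, because only two "Unhealthy" rows exist. Naïve Bayes instead models each class through its own prior, means, and variances. This is one plausible way class imbalance can affect KNN more than Naïve Bayes, offered as an illustration rather than as the paper's explanation.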

Published

2023-03-08

How to Cite

Nurdalia, Zilrahmi, Permana, D., & Salma, A. (2023). Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification. UNP Journal of Statistics and Data Science, 1(2), 67–73. https://doi.org/10.24036/ujsds/vol1-iss2/29
