UNP Journal of Statistics and Data Science https://ujsds.ppj.unp.ac.id/index.php/ujsds Departemen Statistika Universitas Negeri Padang en-US 3025-5511 Error Correction Model Approach for Analysis of Original Regional Income in West Sumatra https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/332 <p>This research uses an error correction model approach, which examines both long-term and short-term relationships. Original Regional Income (PAD) is all regional income originating from the region's own economic sources. According to Law Number 33 of 2004, Chapter V, Article 6, the sources of Original Regional Income consist of Regional Taxes, Regional Levies, Separated Regional Wealth Management Results, and Other Legal PAD. The analysis shows that only variables x1 and x3 have a long-term relationship with the dependent variable, and the same two variables have a short-term relationship. It can therefore be concluded that not all independent variables are related to the dependent variable.</p> Herlena Purnama Sari Fadhilah Fitri Nonong Amalita Tessy Octavia Mukhti Copyright (c) 2025 Herlena Purnama Sari, Fadhilah Fitri, Nonong Amalita, Tessy Octavia Mukhti https://creativecommons.org/licenses/by/4.0 2025-03-02 2025-03-02 3 1 10.24036/ujsds/vol3-iss1/332 Analysis of Public Sentiment towards Corruption Based on Tweets Using Naive Bayes Classifier https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/345 <p><em>Corruption is one of the major problems facing Indonesia. The persistently high rate of corruption can damage the integrity of government, hamper economic growth, and reduce public trust in public institutions. Even though the government has made efforts to eradicate corruption, such as forming the Corruption Eradication Commission (KPK), these challenges remain. 
Social media, especially Twitter, has become an important platform for people to voice opinions and criticize corruption. Sentiment analysis is used to detect opinions in the form of a person's judgments, evaluations, attitudes, and emotions. The text classification algorithm used in this research is Naive Bayes. This research aims to determine public sentiment towards corruption in Indonesia in positive, negative, and neutral categories, through data preprocessing, data labeling, and classification. Sentiment classification using the Naive Bayes method yielded 11 positive, 14 negative, and 1485 neutral results. It can therefore be concluded that Indonesian society tends to have neutral sentiment towards corruption in Indonesia.</em></p> Alivia Zulzila Latifah Jayatri Febiola Dodi Vionanda Copyright (c) 2025 Alivia Zulzila, Latifah Jayatri Febiola, Dodi Vionanda https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 72 78 10.24036/ujsds/vol3-iss1/345 Analysis of The Effect of Unemployment, Economic Growth and Inflation on Poverty in West Sumatra Province https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/329 <p><em>Poverty remains a major challenge in West Sumatra, although various efforts have been made to improve community welfare. In this context, it is important to understand the factors that influence poverty levels. Unemployment, economic growth, and inflation are important variables that can have a significant effect on poverty levels. Unemployment is a problem often associated with poverty. On the other hand, strong economic growth has the potential to reduce poverty levels by creating new job opportunities and increasing people's incomes. However, non-inclusive economic growth can increase social inequality and uneven income distribution, which can ultimately worsen poverty. 
In addition, inflation can affect poverty levels by reducing people's purchasing power, especially for those with low incomes. This research aims to analyze the effect of unemployment, economic growth, and inflation on poverty levels. Multiple linear regression analysis is used to test the relationship between the independent variables (unemployment, economic growth, and inflation) and the dependent variable (poverty). Based on the research findings, unemployment, economic growth, and inflation together explain 49.35% of poverty in West Sumatra, while the remaining 50.65% is explained by other factors outside the model. The analysis indicates a significant linear influence of unemployment and economic growth on poverty in West Sumatra, and no significant linear impact of inflation on poverty in West Sumatra.</em></p> Ulya Syafitri.J Zilrahmi Admi Salma Copyright (c) 2025 Ulya Syafitri.J, Zilrahmi, Admi Salma https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 9 16 10.24036/ujsds/vol3-iss1/329 Sentiment Analysis of Using the YouTube Application Using the Naïve Bayes Method https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/343 <p><em>This study aims to analyze user sentiment towards the YouTube application using the Naive Bayes method. With the rapid growth of YouTube users worldwide, understanding user preferences and experiences is crucial. Sentiment analysis is the process of extracting information from textual data by categorizing it as positive or negative sentiment. The Naive Bayes algorithm, a statistical approach commonly used in natural language processing and sentiment analysis, was applied due to its simplicity and efficiency. 
The research involved data collection through web scraping, followed by preprocessing steps such as cleaning, case folding, tokenization, stopword removal, and stemming. Features were weighted using TF-IDF (Term Frequency-Inverse Document Frequency), which assigns weights to words based on their importance. The Naive Bayes classifier was then trained on the preprocessed data, and its performance was evaluated using accuracy, precision, recall, and F1-score metrics. The results showed an accuracy of 82%, precision of 83%, recall of 98%, and an F1-score of 89%, indicating the effectiveness of the Naive Bayes method in sentiment analysis for the YouTube application. This study provides valuable insights into user sentiment towards YouTube, enabling developers and content creators to enhance user experiences and marketing strategies.</em></p> Triana Putri Siti Nurhaliza Dodi Vionanda Copyright (c) 2025 Triana Putri, Siti Nurhaliza, Dodi Vionanda https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 60 66 10.24036/ujsds/vol3-iss1/343 Classification of Determining Factors for Eligibility of Extreme Poverty Social Assistance Recipients in Dumai City for 2024 Using CHAID https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/354 <p><em>Ending poverty is one of the Sustainable Development Goals (SDGs). Poverty is a condition in which an individual falls below the minimum standard of basic needs, both food and non-food. One of the Indonesian government's efforts to alleviate poverty is fulfilling needs in various sectors. Although social assistance has been distributed successfully, beneficiaries are still not always properly targeted. Therefore, it is necessary to identify the significant factors influencing the eligibility of social assistance recipients. 
Applying the CHAID method to the eligibility of extreme poverty social assistance recipients in Dumai City for 2024 shows that the significant factors influencing eligibility status are house size and neighbors' testimonies. The classification model's accuracy is 87.70%.</em></p> Nurul Hasni Pajrini Dina Fitria Tessy Octavia Mukhti Copyright (c) 2025 Nurul Hasni Pajrini, Dina Fitria, Tessy Octavia Mukhti https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 116 122 10.24036/ujsds/vol3-iss1/354 Regression Model Selection Analysis of Methanol Conversion Based on Temperature, Residence Time, Concentration, Oxygen Ratio, and Reactor System https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/339 <p><em>This study aims to determine the best regression model that explains the effect of temperature, residence time, methanol concentration, oxygen-to-methanol ratio, and reactor system on methanol conversion in supercritical water. Preliminary analysis showed multicollinearity among the predictors, which violates an assumption of multiple linear regression and affected the validity of the model. To overcome this and determine the optimal model, variable selection was performed using the stepwise selection method. The method was evaluated based on predictive power, model accuracy, and statistical validity. The results showed that the stepwise method produced an optimal model for predicting conversion. Reactor system and temperature were the most significant variables affecting methanol conversion. The study concludes that variable selection with the stepwise method can be used effectively to identify the best regression model when the classical assumptions are met. 
These findings make an important contribution to the optimization of supercritical water-based chemical processes.</em></p> Andre Marvero Fahmi Amri Muhammad Fadhil Irsyad Yenni Kurniawati Copyright (c) 2025 Andre Marvero, Fahmi Amri, Muhammad Fadhil Irsyad, Yenni Kurniawati https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 47 52 10.24036/ujsds/vol3-iss1/339 Implementation of Text Mining for Emotion Detection Using The Lexicon Method (Case Study: Tweets About Pemilu 2024) https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/348 <p><em>The presidential election is a five-yearly event and a crucial moment in the realisation of democracy in the Unitary State of the Republic of Indonesia (NKRI). In the modern political era, the development of information technology has significantly changed the way people interact and express their views on political issues, including the presidential election. One of the social media platforms often used to debate political and social issues is Twitter. The analysis method used in this research is sentiment and emotion analysis with a lexicon-based approach. The research stages consist of Twitter data collection, data preprocessing, and emotion feature extraction. The most prominent word in the 2024 election discussion on Twitter is Anies. 
Trust is the most dominant emotion towards the three candidate pairs, namely Anies Muhaimin, Prabowo Gibran, and Ganjar Mahfud, indicating high public trust.</em></p> Afifah Salsabilah Putri Eujeniatul Jannah Dodi Vionanda Syafriandi Copyright (c) 2025 Afifah Salsabilah Putri, Eujeniatul Jannah, Dodi Vionanda, Syafriandi https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 100 107 10.24036/ujsds/vol3-iss1/348 Implementation of the Self Organizing Maps (SOMS) Method in Grouping Provinces in Indonesia Based on the Number of Crimes by Type of Crime https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/334 <p><em>Crime cases are often the main topic of daily news in various media in Indonesia. Some of these crimes harm the surrounding community, and such acts cannot be entirely avoided because they have become a social phenomenon. To protect the community and provide a sense of security and peace, the Indonesian government, especially the police, must pay attention to these conditions. Using the Self Organizing Maps (SOMs) method, this study obtained 3 clusters, each with its own characteristics. The first cluster, with a low-impact crime rate, consists of 29 provinces. The second cluster, with a moderate impact, consists of 3 provinces, where the most dominant crimes compared to other clusters are those related to fraud, embezzlement, smuggling &amp; corruption. 
The third cluster, with a high impact, consists of 2 provinces and is characterized by the highest average number of cases, compared to other clusters, on almost all indicators of crime by type.</em></p> Putri fajriyanti nur Tessy Octavia Mukhti Nonong Amalita Admi Salma Copyright (c) 2025 Putri fajriyanti nur, Tessy Octavia Mukhti, Nonong Amalita, Admi Salma https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 39 46 10.24036/ujsds/vol3-iss1/334 Analysis on Scopus Articles Padang State University Based on SINTA Website https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/346 <p><em>Universities have the responsibility to carry out education, research, and community service, as mandated by Article 20 of Law Number 20 of 2003 on the National Education System. The flagship research theme set by Universitas Negeri Padang (UNP) for the period 2020-2024 is "Development of Digital Learning Services and Development of Minangkabau Cuisine based on Local Potential." The flagship research activities at UNP cover two main research areas: 1) Digital Learning Services; and 2) Minangkabau Cuisine. The objective of this research is to compare the flagship research theme with UNP's Scopus articles listed on the SINTA website. By analyzing the trends of Scopus article topics on the SINTA website using web scraping and wordcloud visualization, it is concluded that the trending topics of UNP's Scopus articles match UNP's flagship research theme, particularly in the field of Digital Learning Services. This is reflected in the wordcloud, which shows keywords such as Learning, Development, Student, and Model. 
This research makes it easy to observe, through wordcloud visualization, the trend of research topics in UNP's Scopus articles on SINTA, which reflects the realization of UNP's flagship research theme for the period 2020-2024.</em></p> Kerin Hagia Aidillah Dodi Vionanda Dony Permana Copyright (c) 2025 Kerin Hagia Aidillah, Dodi Vionanda, Dony Permana https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 79 85 10.24036/ujsds/vol3-iss1/346 Forecasting the Price of Shallots in Padang City Using the SARIMA Method https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/330 <p><em>The fluctuation of shallot prices in Padang City has become a major concern for consumers, producers, and the government. This study applies the Seasonal Autoregressive Integrated Moving Average (SARIMA) method to forecast shallot prices, using monthly time-series data from January 2020 to August 2024. The analysis identifies ARIMA(1,1,2)(0,1,1)<sup>12</sup> as the optimal model for predicting shallot prices in Padang City, effectively capturing seasonal and non-seasonal patterns. Predictions for September 2024 to August 2025 indicate an upward price trend, peaking in May 2025 before declining. The findings are expected to serve as a reference for planning production, distribution, and price control of shallots.</em></p> Dwika Larissa Fadhilah Fitri Dina Fitria Copyright (c) 2025 Dwika Larissa, Fadhilah Fitri, Dina Fitria https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 17 25 10.24036/ujsds/vol3-iss1/330 K-means Cluster Analysis in Grouping Districts/Cities in West Sumatra Province Based on Types of Violence Against Women 2023 https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/344 <p><strong><em>Violence against women</em></strong><em> is a serious social issue and a violation of human rights. 
Women are often vulnerable to violence, whether physical, psychological, or sexual, which negatively impacts their physical and mental health. To understand the distribution of cases of violence against women in West Sumatra Province, an analytical method is needed to group regions based on the number of reported cases. K-Means Clustering is a cluster analysis method used to group districts/cities based on similarities in the number of violence cases. This study aims to group districts/cities in West Sumatra based on the number of female victims of violence using the K-Means Clustering algorithm. The optimal number of clusters was determined using the silhouette method, resulting in three clusters. Cluster 3 has the highest average number of physical and sexual violence cases and consists of four districts/cities: Solok Regency, Lima Puluh Kota, Solok City, and Payakumbuh City. Cluster 2 represents areas with a moderate level of violence, dominated by psychological abuse, and consists of five districts/cities. Meanwhile, Cluster 1 comprises ten districts/cities with the lowest recorded violence cases. This grouping provides insight into the regional distribution of violence against women in West Sumatra, identifying areas that require more attention. 
The findings suggest that the government should prioritize regions with high levels of violence through stricter law enforcement, the provision of support services for victims, gender equality campaigns, and increased awareness of women's rights.</em></p> Latifah Jayatri Febiola Fadhilah Fitri Fenni Kunia Mutiya Copyright (c) 2025 Latifah Jayatri Febiola, Fadhilah Fitri, Fenni Kunia Mutiya https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 67 71 10.24036/ujsds/vol3-iss1/344 Panel Data Regression on Gross Regional Domestic Product in West Sumatra https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/328 <p><em>Economic growth is assessed by the amount of gross regional domestic product (GRDP), as part of the development of people's welfare. West Sumatra Province needs a development plan that is able to produce GRDP per capita of 9 to 11 times the current economic growth. Examining economic growth requires more than cross-section data, because it is important to observe the behavior of each research unit over several time periods. This research therefore uses panel data regression to examine whether the labor force participation rate, average length of schooling, life expectancy, and the number of poor people influence GRDP per capita in the districts/cities of West Sumatra in 2020-2023. This is applied research using secondary data obtained from the Regency/City RPJPD documents and the official website of the West Sumatra Statistics Agency, covering 19 districts/cities over the period 2020-2023. The factors that significantly affect GRDP per capita are average years of schooling and life expectancy, and the selected model is the fixed effect model. 
The model explains the dependent variable well, with a value of 82.72%.</em></p> Eujeniatul Jannah Admi Salma Syafriandi Syafriandi Copyright (c) 2025 Eujeniatul Jannah, Admi Salma, Syafriandi Syafriandi https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 1 8 10.24036/ujsds/vol3-iss1/328 Clustering Regions in West Sumatera Based on the Special Protection Index for Children Using K-Means Clustering with Silhouette Coefficient https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/356 <p><em>Child protection is a crucial aspect of social development, especially in West Sumatra Province, which consists of 19 regencies/cities with diverse child protection characteristics. This study aims to cluster regencies/cities in West Sumatra based on the 2021 Child Special Protection Index (IPKA) using the K-Means Clustering method with the Silhouette Coefficient. Secondary data were obtained from the Office of Women's Empowerment and Child Protection, Population Control, and Family Planning (DP3AP2KB) of West Sumatra Province, covering variables such as the percentage of working children, internet access, education level, poverty, and child neglect. The results show that the K-Means method is effective in quickly and accurately grouping data into homogeneous clusters, while a Silhouette Coefficient value of 0.70 indicates a strong cluster structure and high-quality grouping.</em></p> Siti Nurhaliza Tessy Octavia Mukhti Copyright (c) 2025 Siti Nurhaliza, Tessy Octavia Mukhti https://creativecommons.org/licenses/by/4.0 2025-03-02 2025-03-02 3 1 123 129 10.24036/ujsds/vol3-iss1/356 Classification of Factors Affecting Preeclampsia in Pregnant Women at RSUP. Dr. M. 
Djamil Padang using the CART Algorithm https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/341 <p><em>Preeclampsia is a pregnancy-specific disease characterized by hypertension and proteinuria that occurs after 20 weeks of gestation. Various factors can influence the occurrence of preeclampsia in pregnant women, including age, parity, history of hypertension, obesity, and kidney disorders. This study aims to determine the risk factors influencing preeclampsia, based on preeclampsia diagnoses at RSUP Dr. M. Djamil Padang, by classifying each variable using a decision tree. This research employs the CART (Classification and Regression Tree) algorithm. CART builds binary trees, can analyze response variables that are either categorical or continuous, handles data with missing values, and produces an interpretable tree structure. The study results indicate that the primary risk factor for preeclampsia is parity. The model developed using the CART algorithm was tested using a confusion matrix, yielding an accuracy of 54%, a precision of 33.3% in correctly classifying patients with mild preeclampsia (PER), and a recall of 23.8% in classifying patients with severe preeclampsia (PEB).</em></p> AULIA YUSWITA Dina Fitria Dony Permana Admi Salma Copyright (c) 2025 AULIA YUSWITA, Dina Fitria, Dony Permana, Admi Salma https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 53 39 10.24036/ujsds/vol3-iss1/341 Sentiment Analysis of Twitter Users on Moscow Attack by ISIS with Naive Bayes Algorithm https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/349 <p><em>This study aims to analyze public sentiment towards the ISIS attack in Moscow, Russia, on March 22, 2024, through Twitter data using the Naive Bayes classification method. 
The attack had a significant impact on people's perceptions and reactions, as reflected in the tweets of Twitter users. To analyze this, 3005 English tweets relating to the event, posted from 22 March 2024 to 30 April 2024, were collected by crawling with the Python programming language. The data were preprocessed for cleaning, then labeled using Python's TextBlob library. The Naive Bayes algorithm is used to classify the sentiment of tweets into positive and negative classes. The results show that public sentiment tends to be negative towards the attack. The Naive Bayes classification results are quite good, with an accuracy of 70%, but the data are imbalanced and tend to be biased towards negative sentiment. This research provides insight into how public opinion responds to such events and into the performance of the Naive Bayes model in classification.</em></p> Cindy Pratiwi Dodi Vionanda Fayyadh Ghaly Copyright (c) 2025 Cindy Pratiwi, Dodi Vionanda, Fayyadh Ghaly https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 108 115 10.24036/ujsds/vol3-iss1/349 Implementation of Association Rule on Agricultural Commodity Exports in Indonesia Using Apriori Algorithm https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/336 <p><em>Exports of agricultural commodities in Indonesia make the smallest contribution to state revenues, and export values in the last decade have not shown a significant increase compared to other export sectors. This shows that there are weaknesses in agricultural commodity exports, so an analysis is needed to optimize exports to other countries. These weaknesses can be seen in terms of quality, price, infrastructure, and technology. 
This study uses association rule analysis with the apriori algorithm to find out which agricultural commodities are exported together and the resulting association rules. The apriori algorithm finds association rules between items in a database by considering two main parameters, namely Support and Confidence. The data used are agricultural commodity export data obtained from the publication of the Central Statistics Agency of Indonesia in 2023. Based on the analysis, 32 association rules were generated with a minimum Support of 25% and a minimum Confidence of 80%. All of the generated rules passed the Lift Ratio test, with values greater than 1. Each rule contains between 2 and 4 agricultural export commodities. With these association rules known, it is hoped that export distribution in the agricultural sector can be further optimized for trade abroad and thereby address the existing weaknesses.</em></p> Asra Dinul Haq Dina Fitria Dony Permana Zamahsary Martha Copyright (c) 2025 Asra Dinul Haq, Dina Fitria, Dony Permana, Zamahsary Martha https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 26 32 10.24036/ujsds/vol3-iss1/336 Sentiment Analysis of Chatting Application Reviews on Google Play Store Using Naïve Bayes Classifier Algorithm https://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/347 <p><em>A chatting application is a medium used to connect two or more people through social media platforms. 
Based on the results of a survey report, there are 5 chat applications often used as a medium of communication: WhatsApp, Facebook, Telegram, Instagram, and Line. This research aims to examine the sentiment of chat application users and to see how Naïve Bayes performs in analyzing that sentiment. The purpose of sentiment analysis in this research is to assess whether a comment related to an issue is negative or positive, and to serve as a guide for improving the quality or service of a product. The Naïve Bayes model showed mixed performance depending on the application and the sentiment class. The model generally performed better in identifying positive reviews, especially on the Facebook, Telegram, and Instagram apps, where recall reached 100%. However, the model performed very poorly in identifying neutral reviews across all apps. To improve accuracy and achieve more balanced sentiment detection, improvements in data preprocessing, handling of data imbalance, or the use of more complex classification methods are needed.</em></p> Muhammad Luthfi Alfathan Dodi Vionanda Nufhika Fishuri Copyright (c) 2025 Muhammad Luthfi Alfathan, Dodi Vionanda, Nufhika Fishuri https://creativecommons.org/licenses/by/4.0 2025-02-28 2025-02-28 3 1 89 99 10.24036/ujsds/vol3-iss1/347
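
The pattern reported in the last abstract above, perfect recall on positive reviews alongside very poor detection of neutral reviews, is typical of imbalanced classes. The following is a minimal Python sketch of how per-class precision, recall, and F1 are read off a confusion matrix. The counts are hypothetical, invented only for illustration, and are not the study's data; `per_class_metrics` is an illustrative helper, not code from the article.

```python
from typing import Dict, List


def per_class_metrics(
    labels: List[str], confusion: List[List[int]]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each class.

    confusion[i][j] counts items whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        actual_total = sum(confusion[k])                    # row sum
        predicted_total = sum(row[k] for row in confusion)  # column sum
        # Guard against division by zero when a class is never predicted.
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical counts (assumption): a classifier that favours the
# majority "positive" class and never predicts "neutral".
LABELS = ["negative", "neutral", "positive"]
CONFUSION = [
    [20, 0, 30],   # true negative reviews
    [2, 0, 18],    # true neutral reviews
    [0, 0, 130],   # true positive reviews
]

METRICS = per_class_metrics(LABELS, CONFUSION)
TOTAL = sum(sum(row) for row in CONFUSION)
ACCURACY = sum(CONFUSION[i][i] for i in range(len(LABELS))) / TOTAL

for name, values in METRICS.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
print("accuracy", round(ACCURACY, 3))
```

In this invented example the positive class reaches full recall only because almost everything is predicted as positive, which is why per-class figures are worth reporting alongside overall accuracy.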