https://ujsds.ppj.unp.ac.id/index.php/ujsds/issue/feedUNP Journal of Statistics and Data Science2026-05-31T12:23:01+00:00Open Journal SystemsUNP Journal of Statistics and Data Sciencehttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/474Random Forest Algorithm Implementation for Air Quality Classification in DKI Jakarta Based on ISPU 2026-04-27T01:42:41+00:00Khairanisa Salsabilakhairanisa2004@gmail.comTessy Octavia Mukhtitessyoctaviam@fmipa.unp.ac.id<p><em>Air quality is an essential factor that has a direct impact on human health. High concentrations of air pollutants have the potential to cause various health impacts in both the short and long term. This study aims to classify air quality in DKI Jakarta using the Air Pollution Standard Index (ISPU) data via the random forest algorithm. The dataset covers a timeframe from 2021 to 2025 and includes air pollutant parameters, namely PM10 and PM2.5 particulate matter, carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), and ozone (O3). The research method employs a supervised learning approach, in which the data are stratified and evaluated using K-Fold Cross Validation (k = 10) to ensure objective and stable model performance. Model performance was measured using Accuracy, Precision, Recall, and F1-Score metrics, along with Confusion Matrix and Feature Importance analyses. The results show that the Random Forest model classifies air quality categories with excellent performance, reaching 100% Accuracy on training data and 98.44% on testing data. The Confusion Matrix analysis indicates that most observations in each air quality category are correctly classified. Furthermore, the Feature Importance analysis reveals that PM2.5 is the most influential parameter in determining air quality categories. 
Therefore, this study indicates that the Random Forest algorithm proves effective for air quality classification and can function as a decision-support tool for air pollution control and management in DKI Jakarta.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Khairanisa Salsabila, Tessy Octavia Mukhtihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/473Predicting the Future: A Forecast of Bukittinggi's Original Local Revenue from 1996 to 20242026-05-25T03:40:26+00:00Fedisha Elfiri Fedishafedisha01@gmail.comFadhilah Fitrifadhilahfitri@fmipa.unp.ac.idZilrahmizilrahmi@fmipa.unp.ac.id<p><em>In the past decade, Bukittinggi City’s locally generated revenue (PAD) has experienced considerable instability. A significant decline occurred during the 2020 pandemic, followed by external disruptions such as the 2024 Mount Marapi eruption. These conditions complicate regional financial planning and highlight the importance of reliable forecasting. This study aims to forecast PAD for the 2025–2029 period using the ARIMA (Autoregressive Integrated Moving Average) method. Annual data from 1996–2024 were obtained from official publications of Indonesia’s Central Bureau of Statistics (BPS) Bukittinggi. The analysis procedure included exploratory data analysis, variance stationarity testing using Box-Cox transformation, mean stationarity testing through the Augmented Dickey-Fuller test supported by ACF and PACF plots, tentative model identification, parameter estimation, residual diagnostics using the Ljung-Box and Shapiro-Wilk tests, and model selection based on the smallest MAPE value. The results showed that the data became stationary after Box-Cox transformation and second-order differencing. Among the candidate models, ARIMA(3,2,0) was selected as the best model because all parameters were statistically significant (p-value < 0.05), the residuals satisfied the white noise assumption, and the model produced the lowest MAPE value. 
Forecasting results indicate an increasing PAD trend from approximately 240.23 million Rupiah in 2025 to 429.57 million Rupiah in 2029. However, prediction intervals widened over time, indicating increasing uncertainty in long-term forecasts. Therefore, the local government should implement adaptive fiscal policies and strengthen regional revenue sources to anticipate future PAD fluctuations</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Fedisha Elfiri Fedisha, Fadhilah Fitri, Zilrahmihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/475Factors Affecting Turnover Intention: A Survival Analysis Approach with the Stratified Cox Model2026-05-05T09:27:05+00:00Reihan Dani Eka Saputrarehantanjung199@gmail.comTessy Octavia Mukhtitessyoctaviam@fmipa.unp.ac.id<p><em>The phenomenon of employee resignation or turnover in Indonesia has reached a critical point that threatens operational stability and organizational competitiveness in the global market. The primary challenge faced by human resource practitioners is a reliance on static statistical models that fail to capture the temporal dimension and the evolving dynamics of risk. Conventional linear or logistic regression models often cannot accommodate censored data and may violate the proportionality assumption when applied to complex categorical variables such as profession. This study aims to model the determinants of turnover intention—including age, gender, and mode of transportation—by employing a more adaptive survival analysis approach. The main focus of the research is the application of a stratified Cox Proportional Hazards model to address violations of the Proportional Hazards assumption for the profession variable. Based on an analysis of 1,129 observations, the study identifies how turnover risk varies significantly across profession strata. We developed and compared two model configurations—with and without interaction terms—using the Akaike Information Criterion (AIC). 
While the non-interaction model proved most optimal for overall prediction (AIC: 5124.104), the interaction model revealed nuanced dynamics across professional strata. Key findings indicate that age generally increases turnover risk by 6.3% per year (HR: 1.063), and walking to work provides a protective effect, reducing risk by 13.6% (HR: 0.864) compared to bus usage. However, professional context significantly modulates these effects: in the 'Manage' stratum, age serves as a stabilizer (HR: 0.822), whereas male teachers face a risk 200.8% higher than their female counterparts (HR: 3.008). Furthermore, car usage in the 'Consult' stratum leads to a dramatic 423.5% increase in turnover risk (HR: 5.235). These results underscore the necessity of strata-specific retention strategies that prioritize workplace accessibility and demographic inclusivity. This study provides a robust data-driven framework for organizations to maintain workforce stability amidst the evolving labor landscape in Indonesia.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Reihan Dani Eka Saputra, Tessy Octavia Mukhtihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/478Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction2026-05-12T09:00:59+00:00Hana Azizahazizah.hanasa@gmail.comEni Sumarminingsiheni_stat@ub.ac.idAdji Achmad Rinaldo Fernandesfernandes@ub.ac.id<p><em>Feature selection and feature engineering are essential steps in developing accurate machine learning models, particularly when dealing with imbalanced datasets and redundant variables. However, many feature augmentation methods are often applied without a consistent preprocessing strategy, which can reduce model reliability and increase the risk of information leakage. 
To overcome this issue, this study proposes a hybrid classification framework that combines CatBoost-based feature selection with two feature augmentation techniques: LOGIT transformation and Log Density Ratio (LDR). A structured preprocessing pipeline was designed to ensure consistency throughout the modeling process. One-hot encoding was applied for the LOGIT transformation, while numerical standardization was used for LDR estimation. The generated features were then integrated with the selected original variables to produce richer feature representations for classification. The proposed framework was evaluated using the Heart Disease dataset with three gradient boosting algorithms, namely LightGBM, XGBoost, and CatBoost. Model performance was assessed using accuracy, precision, sensitivity, specificity, and F1-score. The results show that the proposed approach consistently improved classification performance across all models. Among the tested models, LightGBM combined with LOGIT and LDR achieved the best performance, obtaining an accuracy of 0.9618, precision of 0.9485, sensitivity of 0.9620, specificity of 0.9625, and F1-score of 0.9552. These findings suggest that combining feature selection with structured feature augmentation can significantly improve predictive performance in imbalanced classification tasks</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Hana Azizah, Eni Sumarminingsih, Adji Achmad Rinaldo Fernandeshttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/476Stock Price Forecasting of PT Bank Rakyat Indonesia (Persero) Tbk Using the Support Vector Regression Method 2026-05-06T04:46:29+00:00Widya Febriani Widyawfebriani978@gmail.comDony Permanadonypermana@fmipa.unp.ac.id<p><em>Stock price forecasting is an important activity in the capital market because stock price movements tend to be nonlinear and volatile over time. 
PT Bank Rakyat Indonesia (Persero) Tbk (BBRI) is a blue-chip stock with high liquidity and strong fundamentals, making it an appropriate subject for forecasting research. This study aims to predict BBRI’s stock price using the Support Vector Regression (SVR) method, which is known for its ability to model nonlinear relationships and minimize overfitting. The data used consist of BBRI’s daily closing prices from January 2020 to December 2024. Before modeling, the data were normalized using the Min–Max method and divided into training and testing sets with an 80:20 ratio. The initial baseline model employed an SVR with a linear kernel. The model was then optimized using the Radial Basis Function (RBF) kernel through Grid Search Optimization combined with time-series cross-validation to determine the best parameter combination. Optimal parameters were selected based on the lowest Root Mean Square Error (RMSE). The results show that the SVR RBF model outperformed the linear model in capturing the nonlinear patterns of BBRI’s stock price. During testing, the optimized model achieved an RMSE of 0.022054, indicating high predictive accuracy. The optimized SVR model was subsequently used to forecast stock prices for the next period and demonstrated relatively stable yet dynamic price movements. 
Overall, the findings confirm that the SVR method is effective and reliable for stock price forecasting and can serve as a valuable reference for investors and future financial research.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Widya Febriani Widya, Dony Permanahttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/479Panel Data Regression with Driscoll-Kraay Standard Errors: Examining Crime and Socioeconomic Indicators in West Sumatra (2017-2024)2026-05-07T03:40:26+00:00Andini Diva Luthfiyahandinidiva2004@gmail.comDhio Ervandidhioervandi@gmail.comTessy Octavia Mukhtitessyoctaviam@fmipa.unp.ac.id<p>Criminal behavior is a complex social issue that threatens public safety and hinders regional development. In Indonesia, the crime rate varies across provinces and is influenced by multiple socioeconomic and structural factors. In West Sumatra Province, fluctuations in crime risk over time highlight the need for a deeper analysis of its determining factors. Understanding these factors is essential for the government to formulate effective and targeted crime prevention policies. This study aims to analyze the determinants of crime risk in West Sumatra Province using panel data from 2017 to 2024, covering 19 districts and cities, allowing for a more robust and comprehensive evaluation of both temporal and cross-sectional variations. The variables examined include the open unemployment rate, poverty rate, percentage of youth not in employment, education, or training (NEET), and the COVID-19 pandemic as a dummy variable. Panel data regression analysis was employed, and the results indicate that the most appropriate model is the Random Effects Model (REM). The findings show that the open unemployment rate and the pandemic variable have a significant effect on crime risk at the 5% significance level, while the poverty rate is significant at the 10% level. 
These results provide valuable insights for policymakers in addressing the root causes of crime in West Sumatra through employment generation, poverty alleviation, and preparedness for crisis situations.</p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Andini Diva Luthfiyah, Dhio Ervandi, Tessy Octavia Mukhtihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/482Spatial Analysis of Open Unemployment Rate in West Java Province Using the Spatial Autoregressive Model2026-04-27T02:28:55+00:00Zulfadly Harman Harahapzulfadlyhrphrp@gmail.comTessy Octavia Mukhtitessyoctaviam@fmipa.unp.ac.id<p><em>Unemployment remains a major socio-economic issue in West Java Province. The Open Unemployment Rate (OUR) is affected not just by local regional elements but also by the circumstances of adjacent regions, showing that spatial interdependence exists. The research aims to analyze the spatial pattern of OUR in West Java and identify the influencing factors using the Spatial Autoregressive (SAR) approach. The study uses cross-sectional secondary data from all regencies and cities in West Java for the year 2023. Moran’s I findings indicate a positive spatial dependence, suggesting that regions with high OUR are typically surrounded by regions with similarly high unemployment rates. According to the analysis using the Lagrange Multiplier test, the SAR model was chosen. Estimation results show that population growth rate and government expenditure significantly affect OUR. Additionally, the spatial lag coefficient shows a positive and significant value, suggesting spatial spillover effects. 
These findings highlight the importance of incorporating spatial perspectives in formulating regional employment policies.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Zulfadly Harman Harahap, Tessy Octavia Mukhtihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/484Sentiment Analysis of Public Opinion on Rupiah Redenomination on Twitter Using Naive Bayes Classification2026-05-24T09:10:12+00:00FIGO RAHMATULLAHfigorahmatullah868@gmail.comDila Sarifigorahmatullah868@gmail.comRahmat Kurniawanraehmatkurniawan@gmail.comFadhilah Fitrifadhilahfitri@fmipa.unp.ac.id<p><em>This study examines public opinion on the Rupiah redenomination policy through sentiment analysis of Twitter data. Redenomination refers to the simplification of currency denominations without changing their real value, a policy that often triggers varied public responses due to concerns such as inflation perception and money illusion. In the digital era, Twitter (currently X) serves as a major platform for real-time public expression, generating large volumes of unstructured textual data suitable for analysis. The objective of this research is to classify public sentiment toward the Rupiah redenomination policy into positive, negative, and neutral categories using the Naive Bayes Classifier, as well as to evaluate the model’s performance. The dataset consists of Indonesian-language tweets collected via the Twitter API using keywords related to redenomination. Data processing involves several stages, including data cleaning, manual labeling, text preprocessing (case folding, tokenization, stopword removal, and stemming), and feature extraction using Term Frequency–Inverse Document Frequency (TF–IDF). The classification results are evaluated using a confusion matrix. The Naive Bayes Classifier achieved an accuracy of approximately 74.84% and a precision of 80%, indicating that the model performs adequately in identifying sentiment patterns. 
The findings show that neutral sentiment dominates the discussion, suggesting that most users tend to provide informational or observational opinions rather than strong support or opposition. These results are expected to provide insights for policymakers, particularly Bank Indonesia and the government, regarding public acceptance of the redenomination policy, while also contributing to the development of sentiment analysis research on Indonesian social media data.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 FIGO RAHMATULLAH, Dila Sari, Rahmat Kurniawan, Fadhilah Fitrihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/485Application of the Cox Proportional Hazards Model to Analyze Survival Times in Women with Breast Cancer2026-05-06T05:45:38+00:00Rahmadanirahma0danird@gmail.comVinna Sulviavinnasulvialubis@gmail.comFathina Nafisafthnanafisaa13@gmail.comSeptrina Kiki Arisandiseptrinakikiars@gmail.comTessy Octavia Mukhtitessyoctaviam@fmipa.unp.ac.id<p><em>Breast cancer remains one of the leading causes of cancer-related mortality worldwide, highlighting the importance of identifying factors that influence patient survival time. Variations in clinical outcomes among patients indicate the need for appropriate statistical methods to evaluate prognostic factors. This study aims to analyze factors affecting survival time by applying the Cox Proportional Hazards (Cox PH) model. The data consist of breast cancer patient records with several predictor variables, including age at diagnosis, type of breast surgery, chemotherapy, hormone therapy, Nottingham Prognostic Index, and tumor size. The analysis procedure includes testing the proportional hazards assumption and assessing parameter significance using the likelihood ratio test for simultaneous effects and the Wald test for partial effects. The results show that the proportional hazards assumption is satisfied, indicating that the Cox PH model is appropriate for the data. 
Simultaneous testing reveals that at least one predictor significantly affects survival time, while partial testing identifies type of surgery and chemotherapy as significant factors. The hazard ratio estimates indicate that patients undergoing mastectomy have a lower risk of death compared to those receiving breast-conserving surgery. Conversely, chemotherapy and hormone therapy are associated with a higher risk of death, which may reflect the more severe clinical conditions of patients receiving these treatments. In conclusion, the Cox PH model provides a reliable approach for identifying key factors influencing breast cancer survival and offers important implications for clinical decision-making and treatment planning.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Rahmadani, Vinna Sulvia, Fathina Nafisa, Septrina Kiki Arisandi, Tessy Octavia Mukhtihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/486IHSG Closing Price Prediction on the Indonesian Stock Exchange using the Geometric Brownian Motion Model2026-04-21T08:28:10+00:00Sukra Hamnasukrahamna07@gmail.comDevni Prima Saridevniprimasari@fmipa.unp.ac.id<p><em>Being among the leading primary benchmarks reflecting the health of the equity market in Indonesia, the Jakarta Composite Index (IHSG) experiences ongoing price movements shaped by a wide spectrum of domestic and international forces</em><em>. The inherent unpredictability of these movements underscores the critical need for reliable forecasting methods to guide investors in their decision-making process. </em><em>In response to this, the present study applies the Geometric Brownian Motion model as a tool for projecting the daily closing values of the IHSG, owing to its well-recognized ability to represent the random characteristics inherent in financial time series. The dataset utilized comprises daily closing price records of the IHSG throughout 2025</em><em>. 
The analysis includes the calculation of log returns, normality testing using the Kolmogorov-Smirnov test, and estimation of drift and volatility parameters. Forecasting is performed using simulation with 50 and 1000 iterations, where the initial value is based on the last observed closing price. </em><em>The findings reveal that the Geometric Brownian Motion model demonstrates a solid capacity to reflect the volatile behavior of IHSG movements, yielding MAPE figures of 4.50% and 2.81%, which correspond to a very high level of predictive precision. A greater number of iterations was found to produce more consistent and dependable projections, while the estimated values broadly align with the overall trajectory of historical data, notwithstanding the element of randomness embedded in the model</em><em>. Therefore, the GBM model can be considered an effective method for forecasting stock price movements, particularly for highly volatile market indices such as the IHSG.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Sukra Hamna, Devni Prima Sarihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/491Mapping Anxiety, Developing Solutions: A Statistical Study of Student Anxiety Using The K-Modes Clustering Method2026-04-27T03:28:34+00:00Fadhilah Fitrifadhilahfitri14@gmail.comFitri Mudia Sarifitrimudiasari@fmipa.unp.ac.idFauziah Taslimfauziahtaslim@fpk.unp.ac.idSri Wahyuni513sriwahyuni@gmail.com<p><em>Statistics anxiety is a common issue among university students that can negatively affect their learning process and academic performance. This study aims to identify patterns of statistics anxiety among undergraduate students at Universitas Negeri Padang using the Statistics Anxiety Rating Scale (STARS), which consists of six dimensions. A total of 479 valid responses were analyzed using the k-modes clustering method, which is appropriate for categorical data. 
The optimal number of clusters was determined using the elbow and silhouette methods, resulting in three clusters. The clustering results reveal three distinct groups of students characterized by high, moderate, and low levels of statistics anxiety. The average silhouette value of 0.52 indicates a moderately well-defined cluster structure. Further analysis shows that each cluster exhibits different patterns across the six anxiety dimensions, highlighting the heterogeneity of students’ responses to statistics. These findings suggest that clustering provides a more informative approach than conventional descriptive analysis in understanding statistics anxiety. The results of this study can serve as a basis for developing targeted strategies to reduce student anxiety in statistics learning</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Fadhilah Fitri, Fitri Mudia Sari, Fauziah Taslim, Sri Wahyunihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/492Evaluating Local Parameter Reliability in Hierarchical Geographically Weighted Regression: A Bootstrap and Sign Consistency Approach2026-05-28T23:13:59+00:00Fitri Mudia Sarifitrimudiasari@fmipa.unp.ac.idMuhammad Nur Aidimuhammadai@apps.ipb.ac.idAgus Mohamad Solehagusms@apps.ipb.ac.idFarit Mochamad Afendifmafendi@apps.ipb.ac.id<p><em>The Hierarchical Geographically Weighted Regression (HGWR) model is widely used to capture spatial heterogeneity and hierarchical data structures simultaneously. However, the reliability of its local parameter estimates remains a critical issue due to potential variability across locations. This study aims to evaluate the reliability of local parameters in the HGWR model using a bootstrap-based approach combined with sign consistency analysis</em><em>,</em> <em>using an empirical stunting prevalence dataset in Indonesia</em><em>. 
A cluster bootstrap procedure at the provincial level was implemented with 500 replications to generate empirical distributions of parameter estimates, enabling the assessment of statistical significance through confidence intervals. In addition, sign consistency was employed to examine the stability of the direction of local effects across bootstrap replications. The results show that while some local parameters are statistically significant, they do not always exhibit consistent directional effects, indicating potential instability. Conversely, several parameters demonstrate both statistical significance and high sign consistency, suggesting robust local relationships. These findings highlight that relying solely on statistical significance may lead to misleading interpretations of local effects in HGWR models. The combination of bootstrap and sign consistency provides a more comprehensive framework for assessing parameter reliability. This approach contributes to improving the interpretability and robustness of spatial multilevel modeling, particularly in applications involving complex hierarchical and spatial data.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Fitri Mudia Sari, Muhammad Nur Aidi, Agus Mohamad Soleh, Farit Mochamad Afendihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/487A Self-Organizing Map Approach for Clustering Provinces Based on Multisectoral Indicators of Stunting Determinants 2026-05-26T03:54:03+00:00Admi Salmaadmisalma1@fmipa.unp.ac.idRiwi Dyah Pangestiriwi.pangesti@unib.ac.idReny Wulandarirenywulandari58@gmail.com<p><em>Stunting is a national issue in Indonesia and also a global challenge. It becomes one of the key priorities outlined in the Sustainable Development Goals (SDGs). The heterogeneity of multisectoral conditions across provinces also contributes to the variation in stunting prevalence in Indonesia. 
The implementation of uniform policies to address stunting may not yield optimal results due to the diverse needs of each province. Therefore, </em><em>specific interventions are required to overcome stunting issues. Based on this condition, it is important to cluster provinces based on their characteristics so that the government can determine appropriate interventions for each provincial cluster. Visualization of stunting conditions and multisectoral indicators can also enrich the understanding of each cluster. This study aims to construct clusters of provinces with similar characteristics in terms of multisectoral indicators of stunting determinants. This study applies cluster analysis using a Self-Organizing Map (SOM) algorithm to group provinces. The research steps include data preprocessing, clustering using the SOM algorithm, SOM mapping, and cluster characterization analysis. The results of this study show that three clusters were obtained. The first cluster consists of three provinces characterized by a high maternal mortality rate and a high percentage of exclusive breastfeeding. The second cluster includes nine provinces and is characterized by high risks in maternal and child health as well as economic vulnerability. In addition, the third cluster consists of 26 provinces characterized by relatively good living conditions and quality education.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Admi Salma, Riwi Dyah Pangesti, Reny Wulandarihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/493Poverty Modeling in East Nusa Tenggara Using Fourier Nonparametric Regression with Cosine–Sine Comparison and Hypothesis Testing2026-05-18T02:22:12+00:00Narita Yuri AdrianingsihNaritayuria98@gmail.comAndrea Tri Rian Daniandreatririandani@fmipa.unmul.ac.idI Nyoman Budiantaranyomanbudiantara65@gmail.comVita Ratnasarivitaratnasari.its@gmail.comYossy Candrayondra2022@gmail.comBintang A. Banewangbintang.aries1104@gmail.comLeti S. 
Gaimaulethygaimau22@gmail.com<p><em>Poverty is a complex multidimensional issue and remains a major development challenge in Indonesia, particularly in East Nusa Tenggara (NTT), which consistently records one of the highest poverty rates nationally. Conventional parametric approaches, such as linear regression, are often inadequate to capture the nonlinear and complex relationships between socioeconomic factors and poverty levels. Therefore, this study proposes a nonparametric regression approach based on Fourier series to model poverty in NTT. The novelty of this research lies in the systematic comparison between cosine-based and sine-based Fourier components within a nonparametric regression framework, combined with inferential statistical testing to identify significant determinants of poverty. The study uses cross-sectional data from 22 districts/cities in NTT for the year 2025. Model estimation is conducted using the Ordinary Least Squares (OLS) method, while the optimal oscillation parameter is determined using Generalized Cross-Validation (GCV). Model performance is evaluated using MSE, RMSE, MAPE, and coefficient of determination (R²). The results show that the cosine-based Fourier model with three oscillations outperforms the sine-based model, achieving MSE of 1.903, RMSE of 1.379, MAPE of 5.817%, and R² of 95.146%. Hypothesis testing indicates that all predictor variables significantly influence poverty levels both simultaneously and partially. These findings demonstrate that the Fourier nonparametric regression approach is highly effective in capturing complex and fluctuating poverty patterns, and it provides a more accurate and interpretable model for supporting targeted poverty alleviation policies.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Narita Yuri Adrianingsih, Andrea Tri Rian Dani, I Nyoman Budiantara, Vita Ratnasari, Yossy Candra, Bintang A. Banewang, Leti S. 
Gaimauhttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/494Agricultural Involution in Indonesia: A Generalized Structured Component Analysis (GSCA) Approach with Land and Labor Interaction Effects2026-05-18T01:47:18+00:00Urwawuska Ladiniurwawuskaladini@uinjambi.ac.idSella Nofriska Sudrimosellans@iainsorong.ac.idDahlia Misrikadahlia.misrika@sci.unand.ac.id<p><em>Indonesian agriculture faces an economic paradox where the sector remains a primary employer despite low wages and stagnant GDP contributions compared to industry. This study aims to analyze and quantify the phenomenon of agricultural involution in Indonesia from 2017 to 2023 by simultaneously examining the effects of land, labor, productivity, and land–labor interaction on agricultural output across 34 provinces. Generalized Structured Component Analysis (GSCA) with an Alternating Least Squares (ALS) approach is employed because of its ability to handle mixed formative–reflective measurement models and accommodate latent variable interaction effects — capabilities unavailable in conventional covariance-based SEM or linear regression. The results indicate that land capacity is the dominant determinant of agricultural output with a path coefficient of 0.958, signaling that growth remains extensive rather than intensive. Crucially, labor intensity is found to have a significant negative effect on productivity and total output, confirming the law of diminishing marginal returns and the presence of labor surpluses that exceed optimal points. Furthermore, the interaction between land and labor yields a significant negative coefficient (-0.109), proving that demographic pressure on limited land exacerbates inefficiency and output destruction. Spatial post-hoc analysis indicates that agricultural involution is no longer confined to Java but has evolved into a national phenomenon, as demonstrated by the absence of significant disparities in labor-to-land ratios and productivity between Java and other regions. 
These findings suggest that sustainable transformation requires integrated policies for land protection, labor restructuring toward non-agricultural sectors, and technological modernization to break the cycle of involution.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Urwawuska Ladini, Sella Nofriska Sudrimo, Dahlia Misrikahttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/495Comparison of District/City Clusters in West Sumatra Province 2019–2025 Based on Labor Indicators Using K-Means Method2026-05-29T10:19:15+00:00Naila Marettanianailatania03@gmail.comZilrahmizilrahmi@fmipa.unp.ac.idMellisa Ayuningtyasmellisa@bps.go.id<p><em>This study is motivated by the differences in labor conditions among regencies/cities in West Sumatra Province, as indicated by the Open Unemployment Rate (OUR) and the Labor Force Participation Rate (LFPR). In addition, the impact of the COVID-19 pandemic and the economic recovery process during the 2019–2025 period are assumed to have caused changes in labor characteristics across regions. However, the patterns of similarities and differences in labor conditions among regions have not been clearly identified, making it necessary to conduct a regional clustering analysis based on labor characteristics. This study aims to analyze the clustering of regencies/cities in West Sumatra Province based on the OUR and LFPR indicators during 2019–2025. The data used were obtained from the Central Statistics Agency, covering 19 regencies/cities. The analytical method applied was K-Means clustering using Euclidean distance, while cluster validation was conducted using the Silhouette Coefficient. This study used two clusters to facilitate the interpretation of results. The findings show that the regencies/cities in West Sumatra Province were divided into two clusters with different characteristics. 
Cluster 1 represents regions with better labor conditions, characterized by lower OUR and higher LFPR, while Cluster 2 represents regions with relatively poorer labor conditions, characterized by higher OUR and lower LFPR. Cluster membership changed from year to year, indicating dynamic labor conditions across regions. The results of this study are expected to serve as a basis for formulating more targeted labor policies according to the characteristics of each region.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Naila Marettania, Zilrahmi, Mellisa Ayuningtyashttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/499Classification of Stroke Disease Using the Learning Vector Quantization Algorithm2026-05-25T03:42:42+00:00Andriarmi Andriarmiandri.armi03@gmail.comChairina Wirdiastutichairinawirdiastuti01@gmail.comSyafriandi Syafriandisyafriandi_math@fmipa.unp.ac.id<p><em>Stroke is one of the leading causes of death and disability worldwide, thereby making early detection crucial for timely and appropriate medical treatment. In clinical practice, stroke diagnosis is generally carried out through medical examinations and patient history analysis, but this process is time-consuming and depends on the subjective judgment of medical personnel. Therefore, machine learning approaches can be utilized to support disease classification more quickly and objectively. This study aims to analyze the performance of the Learning Vector Quantization (LVQ) method in classifying stroke disease using a dataset obtained from Kaggle. The dataset used in this study is imbalanced; therefore, the SMOTE (Synthetic Minority Over-sampling Technique) method was applied to handle class imbalance. The research stages included data preprocessing, splitting data into training and testing sets, LVQ model training, parameter optimization using learning rate and maximum epoch, and model evaluation using accuracy and sensitivity. 
The results show that the LVQ model trained on the original dataset achieved an accuracy of 95.72%, but failed to detect stroke cases with a sensitivity of 0%. After applying SMOTE, the best model achieved a stroke sensitivity of 90%, although the accuracy decreased to 49.49% due to the high number of false positives. These findings indicate that LVQ is highly sensitive to data distribution and model parameters, making its performance on this dataset less optimal for stroke classification and more suitable as an initial screening tool.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Andriarmi Andriarmi, Chairina Wirdiastuti, Syafriandi Syafriandihttps://ujsds.ppj.unp.ac.id/index.php/ujsds/article/view/496K-Medoids Clustering Analysis of Regional Development in West Sumatra Based on Socioeconomic Indicators2026-05-30T14:35:48+00:00Kayla Faradinakaylafaradina0@gmail.comFadhilah Fitrifadhilahfitri@fmipa.unp.ac.id<p><em>Regional development disparities among districts and cities in West Sumatra Province remain a persistent challenge, reflected in significant differences across economic, social, and employment indicators. This study aims to cluster 19 districts/cities in West Sumatra Province based on socioeconomic indicators using the K-Medoids clustering method. The variables include GRDP per capita, economic growth rate, GRDP percentage distribution, Human Development Index (HDI), poverty rate, and open unemployment rate, using 2024 data obtained from the Central Bureau of Statistics (BPS) of West Sumatra Province. The optimal number of clusters was determined using the Elbow method, resulting in three clusters. Cluster 1 consists of 12 districts characterized by the lowest average GRDP per capita and HDI, along with the highest poverty rate. Cluster 2 comprises only Kota Padang, which recorded the highest values across most indicators including GRDP per capita, economic growth rate, and HDI, yet also exhibited the highest open unemployment rate. 
Cluster 3 includes 6 cities with relatively high HDI and the lowest poverty rate among the three clusters. Cluster validation using the Davies-Bouldin Index (DBI) produced a value of 0.8341, indicating that the clustering results are optimal. The findings are expected to provide a reference for local governments and the Regional Development Planning Agency (Bappeda) of West Sumatra Province in formulating more targeted regional development policies based on the characteristics of each cluster.</em></p>2026-05-31T00:00:00+00:00Copyright (c) 2026 Kayla Faradina, Fadhilah Fitri