UNP Journal of Statistics and Data Science

Rice Price Forecasting in Padang City for 2025 Using Artificial Neural Network with Backpropagation

2025-08-06T02:32:43+00:00

Rice is a staple food commodity in Indonesia that significantly influences economic stability and food security. In Padang City, rice price fluctuations frequently occur due to high dependence on external supply sources and limited local production, highlighting the need for a reliable predictive system. This study aims to forecast the monthly average retail price of rice in Padang City for the year 2025 using an Artificial Neural Network (ANN) based on the Backpropagation algorithm. The forecasting model is developed using historical rice price data from January 2017 to December 2024. In addition to building the forecasting model, this study evaluates the model’s accuracy in capturing the complex and nonlinear patterns of rice price fluctuations. The forecasting results are expected to serve as a valuable reference for local policymakers, market participants, and consumers in making strategic decisions to anticipate future price volatility.

Clustering of Regencies/Cities Based on Factors Influencing Poverty in West Sumatra Using K-Medoids

2025-08-06T02:33:06+00:00

Poverty remains a significant issue in Indonesia, particularly in West Sumatra Province, where regional disparities persist despite a national decline in poverty rates. This study aims to classify the 19 regencies/cities in West Sumatra based on key socioeconomic indicators to support more targeted and effective poverty alleviation policies. Using a quantitative descriptive approach, the research applies the K-Medoids clustering method to group regions according to four indicators: Gross Regional Domestic Product (GRDP) per capita, Human Development Index (HDI), Open Unemployment Rate (OUR), and Gini Ratio. Secondary data for the year 2024 were obtained from the official website of the Central Bureau of Statistics of West Sumatra. Prior to clustering, data standardization using Z-score transformation was performed, and multicollinearity was tested using the Variance Inflation Factor (VIF). The silhouette method indicated that the optimal number of clusters is four. The clustering analysis revealed four distinct groups: (1) underdeveloped areas with low income and human development but high inequality; (2) moderately developed areas with stable unemployment and low income inequality; (3) urbanized areas with high income and human development but also high unemployment and inequality; and (4) a single metropolitan area with high economic and human development and moderate inequality. The findings highlight the importance of region-specific strategies in addressing poverty, considering the diverse economic and social conditions across regions. The results can serve as a basis for designing equitable and effective socioeconomic development policies.
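As a rough illustration of the preprocessing and clustering steps named in the abstract above, the following Python sketch standardizes indicator columns with Z-scores and then runs a simple alternating K-Medoids loop. The toy values, the function names, and the naive initialization are assumptions made for illustration only; the abstract does not state which software or K-Medoids variant the study used.

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def k_medoids(points, k, max_iter=100):
    """Alternate between assigning points to the nearest medoid and re-picking medoids."""
    medoids = list(range(k))  # naive initialization (assumption): first k points
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
            for p in points
        ]
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[j])
                continue
            # The medoid is the member with the smallest total distance to the others
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[m]) for m in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Hypothetical toy data: six regions described by two indicators
toy = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
scaled = zscore_columns(toy)
medoids, labels = k_medoids(scaled, k=2)
print(medoids, labels)
```

Unlike K-Means, the cluster representatives here are always actual observations, which is one reason K-Medoids is often preferred when outlying regions are present.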

Stratified Cox Regression Approach to Identifying Prognostic Factors for Survival in Breast Cancer Patients

2025-10-03T06:37:33+00:00

Breast cancer is the most common type of cancer affecting women. In 2022, 2.3 million women were diagnosed with breast cancer, and 670,000 deaths were recorded globally. By 2040, the number of new breast cancer cases is estimated to increase by 40%, reaching 3 million annually, with the number of deaths increasing by 50% to 1 million annually, relative to 2020. This highlights breast cancer as a serious threat to world health. This study utilized secondary data from METABRIC (the Molecular Taxonomy of Breast Cancer International Consortium), obtained from www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric/data. The independent variables analyzed were Age at Diagnosis (X₁), Surgery Type (X₂), Chemotherapy (X₃), Hormone Therapy (X₄), Tumor Size (X₅), Radio Therapy (X₆), and Pam50. The dependent variables were Survival Time (Overall Survival Month) and Patient Status. The Stratified Cox model was used to identify predictors of survival time. A total of 1,868 patients were included, with 1,080 censored and 788 uncensored patients. The Stratified Cox model without interaction revealed that patients who underwent breast-conserving surgery had a 1.35 times higher risk of death compared to those who underwent mastectomy. Patients who received chemotherapy had a 2.01 times higher risk of death than those who did not, while patients who received hormone therapy had a 1.83 times higher risk of death than those who did not undergo this therapy.
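For readers less familiar with the method, the general form of a stratified Cox model without interaction can be written as below. This is the standard textbook form, not an equation taken from the paper; the stratum index g and the number of covariates p are notation introduced here for illustration, and the abstract does not state which variable defines the strata.

```latex
h_g(t \mid X) = h_{0g}(t)\,\exp\!\left(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p\right),
\qquad g = 1, \dots, G

\mathrm{HR}_j = \exp(\beta_j)
```

Each stratum g has its own baseline hazard, while the coefficients are shared across strata. Under this form, a reported hazard ratio of 1.35 corresponds to a coefficient of ln(1.35) ≈ 0.30.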

Fuzzy C-Means Based Clustering of Central Java’s Regencies and Cities Using Economic Welfare Indicators 2023

2025-10-03T06:40:44+00:00

This study aims to cluster the regencies and cities in Central Java Province based on economic welfare indicators using the Fuzzy C-Means (FCM) method. The motivation for this research arises from the evident disparities in development outcomes across regions in Indonesia, particularly in Central Java. Several areas in this province continue to experience high poverty rates, low income, and poor human development despite improvements in labor force participation in others. Five key indicators were used: Labor Force Participation Rate (TPAK), Open Unemployment Rate (TPT), Percentage of Poor Population (PPM), Average Net Income (RPB), and Human Development Index (HDI). The data, obtained from Badan Pusat Statistik (2023), were standardized and analyzed using the FCM algorithm with optimal clusters determined via the elbow method. The clustering results show three distinct regional groupings: Cluster 0 includes areas with relatively high HDI and income despite lower labor participation and higher poverty; Cluster 1 comprises urbanized areas with high labor participation but lower HDI; and Cluster 2 represents the most disadvantaged areas with low income, high unemployment, and poor development outcomes. These findings offer a valuable foundation for targeted policy interventions and strategic regional development planning. Fuzzy C-Means proves to be an effective approach for uncovering nuanced regional profiles in socio-economic development.
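To make the "fuzzy" part of Fuzzy C-Means concrete, the sketch below computes the membership degrees of a single point given fixed cluster centers, using the standard FCM membership update. The function name, the fuzzifier value m = 2, and the toy coordinates are assumptions for illustration; the abstract does not report the study's parameter settings.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership degree of one point in each cluster, given fixed centers."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# Hypothetical point at distance 1 from the first center and 3 from the second
u = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
print(u)
```

The memberships always sum to 1, so a region near the boundary between two groups receives a split membership instead of being forced into a single cluster.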

An Examination of Determinants Affecting the Survival Duration Pediatric Brain Cancer Patients Through Stratified Cox Regression Analysis

2025-10-03T14:10:47+00:00

Brain cancer is the second most common pediatric malignancy and the leading cause of cancer-related mortality in children. Pediatric brain tumors (PBTs) represent around 25% of all pediatric cancers and consist of clinically and biologically diverse subtypes, with an estimated incidence of 0.3–2.9 cases per 100,000 children annually. The high prevalence emphasizes the importance of identifying factors that influence patient survival. This study aims to identify and analyze the factors that significantly affect the survival duration of pediatric brain cancer patients by applying the Stratified Cox regression model. This study utilized secondary data from the Pediatric Brain Cancer database (www.cbioportal.org). Independent variables included cancer type, ethnicity, other medical conditions, sex, tumor type, and treatment type, while the dependent variables were survival time (OS Months) and patient status (OS Status). Data were analyzed using the Stratified Cox regression method. A total of 203 patients were observed, consisting of 39 uncensored cases (19.21%) and 164 censored cases (80.79%). The majority of patients were male (58.62%), diagnosed with low-grade glioma/astrocytoma (43.35%), classified as non-Hispanic or Latino (93.52%), had no additional medical conditions (51.72%), received new treatment (85.22%), and were categorized with primary tumor type (74.38%). Results from the stratified Cox model indicated that cancer type was a significant predictor of survival. Children with embryonal tumors were found to have an 8.9 times greater risk of experiencing an event compared to those with CNS cancer types, whereas children with high-grade glioma/astrocytoma had a 24.85 times higher risk compared to the CNS cancer group.

Logit and Complementary Log-Log Modeling in the Case of Factors Affecting Heart Failure Disease

2025-10-17T02:23:43+00:00

Heart failure is one of the leading causes of morbidity and mortality globally. Heart disease is caused by plaque that builds up in the coronary arteries that supply oxygen to the heart muscle. This study aims to identify the factors that affect heart failure and how much influence each has. The analysis used logistic regression with logit and complementary log-log modeling on data from 918 patients with heart failure disease. The study also determines which of the two models performs better. The results of this analysis indicate that Age, Gender, Blood Sugar, and Chest Pain have significant effects on the likelihood of Heart Failure. Specifically, higher blood sugar levels and the presence of chest pain were found to increase the probability of heart failure, while gender and age showed varying effects across different age groups. Based on the model comparison, the Logit model demonstrated better fit and predictive accuracy than the Complementary Log-Log model, as reflected by its lower AIC value of 897.43.
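The two link functions compared in the abstract above differ only in how a linear predictor is mapped to a probability, and AIC then trades off fit against the number of parameters. The following sketch shows both inverse links and the AIC formula; the function names and the numeric inputs are hypothetical and are not taken from the study.

```python
import math

def inv_logit(eta):
    """Logit link: p = 1 / (1 + exp(-eta)); symmetric around p = 0.5."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Complementary log-log link: p = 1 - exp(-exp(eta)); asymmetric."""
    return 1.0 - math.exp(-math.exp(eta))

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# The same linear predictor gives different probabilities under each link
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 4), round(inv_cloglog(eta), 4))

# Hypothetical log-likelihood and parameter count, for illustration only
print(aic(log_likelihood=-440.0, n_params=5))
```

Because the complementary log-log curve approaches 1 faster than it approaches 0, the two models can rank differently on the same data, which is why a criterion such as AIC is used to choose between them.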

DBSCAN Method in Clustering Provinces in Indonesia Based on Ratio of Health and Medical Personnel in 2023

2025-10-17T02:22:20+00:00

Health is a fundamental right of every citizen. This right is realized in the form of health services. Good health services have an adequate ratio of health and medical personnel. However, in reality, many provinces still have a shortage of health and medical personnel. Therefore, clustering is carried out to make it easier for the government to group provinces that have similarities in terms of the ratio of health and medical personnel in Indonesia in 2023. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is the clustering method used. Using the DBSCAN method, two clusters were obtained with a silhouette coefficient value of 0.49. Cluster 0 is called noise because the observation points in group 0 are outliers. Cluster 0 consists of provinces with a higher ratio of healthcare and medical personnel than cluster 1.
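The way DBSCAN separates dense groups from noise can be sketched in a few lines of plain Python. In the sketch below, noise is labeled 0 to mirror the naming used in the abstract above (some libraries, such as scikit-learn, label noise as -1 instead). The toy points and the values of eps and min_pts are assumptions for illustration and are not the study's settings.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0 = noise, 1..k = cluster ids."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = 0  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels

# Hypothetical toy data: four nearby points and one isolated point
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
print(toy_labels)
```

The isolated point has too few neighbors within eps to join any cluster, so it keeps the noise label.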

Forecasting PM2.5 Concentration in Medan City Using the ARIMAX Method with Meteorological Factors as Exogenous Variables

2025-10-22T02:52:02+00:00

Particulate Matter 2.5 (PM2.5) is a fine particle measuring less than 2.5 micrometers which is dangerous for human health because it can penetrate the respiratory system and cause cardiovascular disorders. High PM2.5 concentrations reflect a decline in air quality, so forecasting efforts are needed to support pollution control and environmental policies. This study aims to forecast daily PM2.5 concentrations in Medan City using the Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) method by considering meteorological factors as exogenous variables. The data used consist of PM2.5 concentrations and average temperature, humidity, rainfall, and wind speed data for the period from June 1, 2024 to June 10, 2025. The analysis results show that the best model is ARIMAX(4,1,0) with exogenous variables of average temperature and rainfall, where temperature has a positive effect and rainfall has a negative effect on PM2.5. This model meets the assumptions of white noise and residual normality, with a MAPE value of 20.635%, indicating a fairly good level of forecasting accuracy. The forecasting results show PM2.5 concentrations in the range of 19–26 µg/m³ with a downward trend at the end of June 2025, indicating improved air quality in Medan City. Thus, the ARIMAX method with meteorological factors is considered effective in modeling and forecasting PM2.5 dynamics in urban areas.
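The accuracy measure quoted in the abstract above, the mean absolute percentage error (MAPE), is straightforward to compute. The sketch below uses made-up actual and forecast values purely for illustration; they are not the study's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical daily concentrations (µg/m³) and forecasts
toy_actual = [20.0, 25.0]
toy_forecast = [22.0, 20.0]
toy_mape = mape(toy_actual, toy_forecast)
print(round(toy_mape, 2))
```

Because each error is divided by the observed value, MAPE is scale-free, but it becomes unstable when observed values are close to zero.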

Evaluation of Prognosis and Duration of Survival in Breast Cancer Patients Using the Cox PH Model

2025-11-11T06:05:08+00:00

Breast cancer is the leading cause of cancer-related deaths among women in Indonesia. Late detection and delayed treatment contribute significantly to this high mortality rate, as many patients seek medical care only after reaching advanced stages. Early detection through Breast Self Examination (BSE) and timely intervention can improve survival rates and quality of life. This study aims to evaluate the survival duration and influencing factors for breast cancer patients using clinical and genomic data from the METABRIC dataset, encompassing 1,980 primary breast cancer cases. The study employs survival analysis using Kaplan-Meier curves, Log-rank tests, and Cox proportional hazards regression to analyze the data. Results indicate significant differences in survival rates based on type of surgery and chemotherapy, while age at diagnosis shows no significant effect. The Cox proportional hazards model reveals that patients undergoing mastectomy have 0.725 times the risk of death of those not undergoing the procedure, and patients receiving chemotherapy have 1.869 times the risk of death of those not receiving it. The findings underscore the importance of early and appropriate treatment in improving survival outcomes. This study contributes to the understanding of factors influencing breast cancer survival, aiding in better clinical decision-making and patient management strategies.

Keywords: Breast Cancer, Cox Regression, Kaplan-Meier, Survival Analysis, Treatment Factors.
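The Kaplan-Meier curves mentioned in the abstract above are built from a simple product of conditional survival probabilities at each event time. The following sketch implements that product-limit estimator in plain Python; the function name and the four toy observations are hypothetical and unrelated to the METABRIC data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = event observed (death), 0 = censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored cases leave the risk set without changing S(t)
    return curve

# Hypothetical follow-up times in months; the second patient is censored
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
```

Censored patients contribute to the risk set up to their last follow-up time, which is how the estimator uses partial information that a naive proportion would discard.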

Modeling Infant Mortality in West Pasaman Regency With Negative Binomial Regression to Overcome Overdispersion

2025-11-11T06:04:17+00:00

Infant mortality serves as a vital indicator of public health and an essential benchmark of development progress. Although the general trend shows a decline, several sub-districts in West Pasaman Regency continue to report relatively high infant mortality rates, raising concerns about the effectiveness of current health services. This study seeks to examine the determinants of infant mortality using count data regression models. The data were obtained from the publication West Pasaman Regency in Figures 2025 by Statistics Indonesia (BPS), consisting of one response variable, the number of infant deaths, and five independent variables: the percentage of Low Birth Weight (LBW), the proportion of deliveries assisted by medical personnel, the proportion of pregnant women enrolled in the K4 program, the number of health workers, and the number of health facilities. The initial analysis employed a Poisson regression model, which assumes equidispersion, but the results revealed evidence of overdispersion. To address this issue, negative binomial regression was adopted as an alternative approach. Model evaluation using the Akaike Information Criterion (AIC) and the Likelihood Ratio Test confirmed that the negative binomial regression provided a better fit than Poisson regression. The results indicate that the percentage of LBW and the number of health facilities significantly influence infant mortality. Low birth weight (LBW) had a positive association with infant mortality, consistent with theory, while the positive effect of health facilities differed from expectations, possibly due to issues of quality, distribution, or reverse causality.
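The overdispersion problem described in the abstract above can be checked with the Pearson dispersion statistic, which should be close to 1 when the Poisson assumption of equal mean and variance holds. The sketch below also shows the variance function of the common NB2 form of negative binomial regression. The counts, the fitted means, and the alpha value are invented for illustration; the abstract does not report the study's diagnostic values or which negative binomial parameterization was used.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to Poisson.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def nb2_variance(mu, alpha):
    """NB2 variance function: Var(Y) = mu + alpha * mu^2 (Poisson when alpha = 0)."""
    return mu + alpha * mu ** 2

# Hypothetical counts with an intercept-only Poisson fit (fitted mean = sample mean)
toy_counts = [0, 2, 10, 1, 12]
toy_mean = sum(toy_counts) / len(toy_counts)
toy_dispersion = pearson_dispersion(toy_counts, [toy_mean] * len(toy_counts), n_params=1)
print(toy_dispersion, nb2_variance(toy_mean, alpha=0.5))
```

The extra alpha term lets the variance grow faster than the mean, which is what allows the negative binomial model to absorb the excess variability that Poisson regression cannot.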

Penalized Spline Regression Modeling on the Human and Cultural Development Index (IPMK) for 2022

2025-10-17T02:21:22+00:00

Human and cultural development is a multidimensional phenomenon whose relationship with socioeconomic factors is often complex and nonlinear, making it challenging to model with conventional parametric approaches. This study aims to model the influence of socioeconomic variables on the Human and Cultural Development Index (IPMK) across 34 provinces in Indonesia in 2022 using the nonparametric Penalized Spline (P-spline) regression method within a Generalized Additive Model (GAM) framework. Secondary data from the Central Statistics Agency (BPS) were used, with predictor variables including School Participation Rate (APS), percentage of access to safe drinking water, Gini Ratio, per capita expenditure, average years of schooling (RLS), and open unemployment rate (TPT). Initial data exploration via scatterplots confirmed nonlinear relationship patterns between the predictor variables and IPMK. The best model was obtained using a first-order cubic spline with 10 knot points, selected based on the minimum Generalized Cross Validation (GCV) criterion. The modeling results demonstrated excellent performance, with an Adjusted R² value of 0.842 and a Deviance Explained of 92.3%. Significance analysis indicated that access to safe drinking water, per capita expenditure, average years of schooling, and the open unemployment rate significantly influence IPMK. Visual interpretation of the significant spline curves revealed informative relationship patterns, such as the diminishing returns effect of per capita expenditure. This study concludes that the P-spline approach is effective and interpretable for modeling complex nonlinear relationships in development data, providing a richer evidence base for policy formulation.
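For orientation, one common way of writing the penalized spline fit and the GCV criterion for a single smooth term is shown below. This is a generic formulation and not the paper's own notation: the basis matrix B, the penalty matrix P, the smoothing parameter λ, and the hat matrix H are symbols assumed here for illustration, and the study's additive model would combine one such term per predictor.

```latex
\hat{\theta}_\lambda = \arg\min_{\theta}\; \lVert y - B\theta \rVert^2 + \lambda\, \theta^{\top} P\, \theta

H_\lambda = B\,(B^{\top}B + \lambda P)^{-1} B^{\top}

\mathrm{GCV}(\lambda) = \frac{n\,\lVert y - H_\lambda y \rVert^2}{\bigl(n - \operatorname{tr}(H_\lambda)\bigr)^2}
```

A larger λ shrinks the fit toward a smoother curve and lowers the effective degrees of freedom tr(H), so minimizing GCV balances goodness of fit against flexibility.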

Application of K-Means Clustering for Grouping Plantation Production in West Pasaman Regency in 2024

2025-12-01T05:08:16+00:00

The plantation sector plays a strategic role in supporting the economy of West Pasaman Regency, with major commodities including oil palm, coconut, rubber, cocoa, and patchouli. However, disparities in production across subdistricts require further analysis to identify regions with similar characteristics. This study applies the K-Means Clustering method, with the optimal number of clusters determined using the Elbow Method. The results show three clusters: the first with relatively balanced production, the second dominated by rubber and cocoa, and the third represented by Kinali District with high dominance of oil palm, coconut, and patchouli. These findings indicate that K-Means Clustering can effectively map regional plantation potentials and provide a useful basis for formulating targeted development strategies to optimize resource allocation and support sustainable agricultural planning in West Pasaman Regency.
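The Elbow Method referred to in the abstract above compares the within-cluster sum of squares (WCSS) across candidate numbers of clusters and looks for the point where further clusters stop helping much. A minimal sketch follows, with a plain K-Means loop, a naive initialization, and made-up two-dimensional points, all of which are assumptions for illustration and not the study's data or settings.

```python
import math

def kmeans(points, k, iters=50):
    """Plain K-Means; returns (labels, within-cluster sum of squares)."""
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center unchanged
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, wcss

# Hypothetical production profiles for six subdistricts
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
wcss_by_k = {k: kmeans(toy_points, k)[1] for k in (1, 2, 3)}
for k, value in wcss_by_k.items():
    print(k, round(value, 3))
```

In this toy example the WCSS drops sharply from one to two clusters and only slightly afterwards, which is the "elbow" pattern used to choose the number of clusters.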

Grouping Of Universities In Indonesia In 2025 Based On The QS World University Rankings Ranking Indicator Using The Kohonen Self-Organizing Maps Algorithm

2025-11-12T08:04:38+00:00

Increasing the competitiveness of higher education is one of the main focuses in facing global competition. One of the important indicators in assessing the quality of higher education institutions is the QS World University Rankings, which assesses universities based on indicators such as academic reputation, citations per lecturer, sustainability, and international collaboration. This study aims to group universities in Indonesia that are included in the QS World University Rankings in 2025 using the Kohonen Self-Organizing Maps (SOM) algorithm. The data used consisted of 10 QS assessment indicators for 26 universities in Indonesia. Normalization was carried out using the min-max method, and the optimal number of clusters was determined using internal validation indices such as Connectivity, Dunn, and Silhouette. The results of the analysis show that the best model forms three main clusters. Cluster 1 contains universities with superior performance in reputation and research, cluster 2 contains universities with a fairly balanced medium performance, and cluster 3 consists of universities with low performance in key indicators. The results of this study are expected to be the basis for policy makers and university managers to develop strategies to improve the quality of higher education in a targeted manner.
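Two building blocks named in the abstract above can be illustrated briefly: min-max normalization of an indicator, and the basic weight update that a Self-Organizing Map applies to a unit when it is pulled toward an input. The function names, the learning rate, the neighborhood influence, and the toy scores are assumptions for illustration; the abstract does not give the study's SOM grid or training settings.

```python
def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant column carries no information; map it to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def som_update(weight, x, learning_rate, influence):
    """Move one SOM unit toward input x: w <- w + lr * h * (x - w)."""
    return [w + learning_rate * influence * (xi - w) for w, xi in zip(weight, x)]

# Hypothetical indicator scores for three universities
toy_scaled = min_max([10.0, 20.0, 30.0])
print(toy_scaled)

# The winning unit (influence 1.0) moves halfway toward the input
toy_weight = som_update([0.0, 0.0], [1.0, 1.0], learning_rate=0.5, influence=1.0)
print(toy_weight)
```

Normalizing first matters because SOM training is distance-based, so an indicator on a larger scale would otherwise dominate the map.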

Analysis of ChatGPT Use on University Students Academic Achievement with Motivation as Intervening Using SEM-PLS

2025-12-01T09:14:48+00:00

This study aims to analyze the factors that influence student academic achievement through the use of ChatGPT using the Structural Equation Modeling (SEM) method based on the Partial Least Squares (PLS) approach. Three main factors were identified as elements that can influence the use of ChatGPT, namely knowledge about ChatGPT (PTC), willingness to use the technology (KUMT), and concerns that may arise (KYDT), with learning motivation as an intervening variable. The total sampling method was used, in which the entire population that met the criteria was designated as respondents. The research population included students in the Statistics Study Program at Padang State University in semesters 4–8 who had used ChatGPT for at least six months, with a total of 216 student respondents. Data were collected through a survey using an online questionnaire. The results show that knowledge about ChatGPT (PTC) and willingness to use the technology (KUMT) have a significant positive effect on learning motivation, while concerns that may arise (KYDT) have no significant effect. Furthermore, only concerns that may arise (KYDT) had a significant direct effect on academic achievement, while the mediation effect test showed that only willingness to use the technology (KUMT) had a significant indirect effect on academic achievement through learning motivation.

Classification of Recipients of the Family Hope Program in West Sumatra Province Using the Random Forest Algorithm

2025-10-22T02:53:17+00:00

According to the Central Statistics Agency (BPS), the percentage of poor people in West Sumatra Province increased by 0.02% in 2024. One of the government's efforts to overcome poverty is a social assistance program to help people who are economically disadvantaged. The targeted distribution of social assistance is an important challenge in improving community welfare, especially for families receiving PKH benefits. This study aims to classify households receiving the Family Hope Program (PKH) in West Sumatra Province using a random forest algorithm with the Synthetic Minority Oversampling Technique (SMOTE). This study uses data on PKH recipient households in West Sumatra Province in 2024, which have a significant class imbalance. Therefore, the SMOTE method was applied to balance the data. The data were divided into training and testing sets with a ratio of 80%:20%, then parameter tuning was performed to optimize mtry and ntree. The model was evaluated using a confusion matrix. The results show that the accuracy obtained is 76%. The precision value is 72%, the recall is 84%, and the F1-score is 78%. Based on the Mean Decrease Gini value, the head of household's diploma was the most important attribute in determining whether a household received PKH or not. This study concluded that the random forest algorithm with SMOTE performed well and was quite reliable in identifying PKH recipients in West Sumatra Province.
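The evaluation measures reported in the abstract above all derive from the four cells of a confusion matrix. The sketch below shows the formulas and checks that the reported precision (72%) and recall (84%) are consistent with the reported F1-score (78%). The function names and the confusion-matrix counts are hypothetical; the study's actual confusion matrix is not given in the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only
toy_metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(toy_metrics)

# Consistency check using the precision and recall quoted in the abstract
reported_f1 = f1_from(0.72, 0.84)
print(round(reported_f1, 3))
```

The harmonic mean of 0.72 and 0.84 is about 0.775, which rounds to the 78% quoted in the abstract.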

Comparison Performance of SARIMA and Exponential Smoothing Holt-Winter’s models for Forecasting turnover PT. Indah Logistik Cargo Padang

2025-11-05T03:43:54+00:00

Forecasting is an important part of corporate decision making. With forecasting, companies can predict future conditions and demand so that they can make appropriate and strategic decisions. PT. Indah Logistik Cargo Padang's turnover data contains trend and seasonal elements that are forecasted using a time series model. This study was conducted to determine the best model for forecasting PT. Indah Logistik Cargo Padang's revenue in the coming period. The methods used in this study are the SARIMA method and Holt-Winter's Exponential Smoothing. The best model was obtained from the results of a comparative analysis of the two methods, as seen in the forecasting error rate determined by the mean absolute percentage error value. For forecasting the revenue of PT. Indah Logistik Cargo Padang, the best model used was SARIMA with a MAPE value of 3.9%.

Application of Three-Dimensional Log Linear Models in Analyzing Risk Factors for a History of Gastritis

2025-10-21T02:38:25+00:00

Gastritis, commonly known as an ulcer, is an inflammatory condition caused by excess stomach acid that irritates the stomach lining. This disease is one of the most common in Indonesia and often disrupts daily activities, especially among students who face academic pressure, stress, and irregular diets. Based on Indonesia’s Health Profile Data, gastritis ranks sixth for inpatients with 330,580 cases, 60.86% of which occur in women, and seventh for outpatients with 201,083 cases, of which 77.74% occur in women. This study aims to examine the relationship between gastritis and demographic factors using a three-dimensional log-linear model. The method analyzes interactions between categorical variables to identify the best explanatory model. Results indicate that the most appropriate model involves the interaction between place of residence, gender, and history of stomach ulcers, showing that these factors collectively influence gastritis incidence. In conclusion, gastritis is not only related to physical health but also to lifestyle and demographic factors. This study underlines the importance for students of managing stress, maintaining healthy eating habits, and adopting preventive measures. The urgency of this research lies in raising awareness that untreated gastritis may reduce productivity and lead to more serious health problems.
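For reference, the saturated log-linear model for a three-way contingency table has the general form below, from which simpler candidate models are obtained by dropping interaction terms. The superscripts R (place of residence), G (gender), and H (history of stomach ulcers) are labels assumed here for illustration, and the abstract does not state exactly which terms the selected model retains.

```latex
\log m_{ijk} = \lambda
  + \lambda^{R}_{i} + \lambda^{G}_{j} + \lambda^{H}_{k}
  + \lambda^{RG}_{ij} + \lambda^{RH}_{ik} + \lambda^{GH}_{jk}
  + \lambda^{RGH}_{ijk}
```

Here m_ijk is the expected count in cell (i, j, k); a model that keeps the three-way term implies that the association between any two of the variables differs across levels of the third.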