Statistics, Optimization & Information Computing

Moments of Generalized Order Statistics from Doubly Truncated Power Linear Hazard Rate Distribution

M. I. Khan — 2024-03-13

This paper is concerned with some recurrence relations for single and product moments of doubly truncated power linear hazard rate distribution via generalized order statistics. Some deductions and related results are also considered. The characterization result is provided at the end.

Modified Bagdonavicius-Nikulin Goodness-of-fit Test Statistic for the Compound Topp Leone Burr XII Model with Various Censored Applications

Mohamed G. Khalil — 2024-04-13

The Poisson Topp Leone Burr XII distribution is extensively studied due to its broad relevance in analyzing censored real datasets from engineering, economics, and medicine. In this research, the distribution's versatility is highlighted through the analysis of four specific real datasets. The study compares the Poisson Topp Leone Burr XII distribution with nine extensions of the Burr type XII distribution to determine which offers the best fit for these datasets. To evaluate the goodness-of-fit of the Poisson Topp Leone Burr XII distribution under right censoring, a modified Bagdonavičius-Nikulin goodness-of-fit test statistic is introduced and applied. This new test statistic is utilized to validate the distributional fit for the Poisson Topp Leone Burr XII distribution across the four right-censored datasets. The modified Bagdonavičius-Nikulin test statistic is employed to assess distributional validation, specifically in the context of right censoring. The application of this statistic involves analyzing each of the four censored datasets to confirm the appropriateness of the Poisson Topp Leone Burr XII distribution for these scenarios. Additionally, to support the evaluation of the modified goodness-of-fit test statistic, the Barzilai-Borwein algorithm is utilized. This algorithm is employed within a simulation study to further assess the effectiveness and reliability of the modified Bagdonavičius-Nikulin test statistic, thereby ensuring robust validation of the Poisson Topp Leone Burr XII distribution against the observed real datasets.
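
For readers unfamiliar with the Barzilai-Borwein algorithm mentioned in this abstract, the following is a minimal sketch of its step-size rule for a generic smooth objective. The function name `barzilai_borwein`, the default step, and the quadratic test objective are illustrative assumptions; the abstract does not state which objective or which BB variant the authors use.

```python
import math

def barzilai_borwein(grad, x0, step0=1e-3, tol=1e-10, max_iter=500):
    """Gradient descent with the first Barzilai-Borwein step size (hypothetical helper).

    grad: function returning the gradient as a list of floats.
    """
    x = list(x0)
    g = grad(x)
    step = step0
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # change in iterate
        y = [a - b for a, b in zip(g_new, g)]   # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        ss = sum(si * si for si in s)
        # BB1 step: s's / s'y; fall back to step0 if curvature is not positive
        step = ss / sy if sy > 0 else step0
        x, g = x_new, g_new
    return x

# Assumed toy objective: f(x) = (x0 - 1)^2 + 10 * (x1 + 2)^2
def toy_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

solution = barzilai_borwein(toy_grad, [0.0, 0.0])
```

In a likelihood setting, `grad` would be the gradient of the negative log-likelihood; that usage is an assumption here, not something the abstract confirms.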

Empirical likelihood ratio-based goodness-of-fit test for the Lindley distribution

Hadi Alizadeh Noughabi — 2023-12-19

The Lindley distribution may serve as a useful reliability model. In this article, we propose a goodness of fit test for the Lindley distribution based on the empirical likelihood (EL) ratio. The properties of the proposed test are stated and the critical values are obtained by Monte Carlo simulation. Power comparisons of the proposed test with some known competing tests are carried out via simulations. Finally, an illustrative example is presented and analyzed.
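
As an illustration of obtaining critical values by Monte Carlo simulation, the sketch below simulates Lindley samples and takes an upper quantile of a test statistic under the null. It makes two simplifying assumptions that are not from the paper: a Kolmogorov-Smirnov distance stands in for the empirical likelihood ratio statistic, and the parameter `theta` is treated as known. All function names are hypothetical.

```python
import math
import random

def lindley_cdf(x, theta):
    # F(x) = 1 - (1 + theta*x/(1+theta)) * exp(-theta*x), x > 0
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)

def sample_lindley(theta, rng):
    # Lindley is a mixture: Exp(theta) with weight theta/(1+theta),
    # Gamma(shape=2, rate=theta) with weight 1/(1+theta).
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)
    return rng.gammavariate(2.0, 1.0 / theta)  # second argument is the scale

def ks_statistic(sample, theta):
    # Stand-in statistic (assumption): Kolmogorov-Smirnov distance to the Lindley CDF
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = lindley_cdf(x, theta)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def mc_critical_value(n, theta, alpha=0.05, reps=2000, seed=0):
    # Simulate the null distribution of the statistic and take its upper alpha quantile
    rng = random.Random(seed)
    stats = sorted(
        ks_statistic([sample_lindley(theta, rng) for _ in range(n)], theta)
        for _ in range(reps)
    )
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return stats[index]

critical_value = mc_critical_value(n=20, theta=1.5)
```

A test would then reject the Lindley hypothesis when the observed statistic exceeds `critical_value`.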

The Marshall-Olkin-Topp-Leone-Gompertz-G Family of Distributions with Applications

Broderick Oluyede — 2024-04-13

A new family of distributions called the Marshall-Olkin-Topp-Leone-Gompertz-G (MO-TL-Gom-G) distribution is developed and studied in detail. Some mathematical and statistical properties of the new family of distributions are explored. Statistical properties of the new family of distributions considered are the quantile function, moments and generating function, probability weighted moments, distribution of the order statistics and Rényi entropy. The maximum likelihood technique is used for estimating the model parameters and Monte Carlo simulation is conducted to examine the performance of the model. Finally, we give examples of real-life data applications to show the usefulness of the above-mentioned Topp-Leone-Gompertz generalization.

Systematic Literature Review on Named Entity Recognition: Approach, Method, and Application

Warto — 2024-02-28

Named entity recognition (NER) is one of the preprocessing stages in natural language processing (NLP); it detects and classifies entities in a corpus. NER results are used in various NLP applications, including sentiment analysis, text summarization, chatbots, machine translation, and question answering. Several previous reviews discussed NER only partially, for instance, reviews of NER in specific domains, NER classification, and NER with deep learning. This paper provides a comprehensive and systematic review of NER studies published from 2011 to 2020. The main contribution of this review is a systematic account of NER covering preprocessing techniques, datasets, application domains, feature extraction techniques, approaches, methods, and evaluation techniques. The results show that the deep learning approach and the bi-directional long short-term memory with a conditional random field (Bi-LSTM-CRF) method attract the most interest among NER researchers, while medicine and health are the most popular application domains. These developments have also led to an increasing number of public datasets in the medical and health fields. At the end of this review, we recommend some opportunities and challenges for NER research going forward.

On Truncated Versions of Xgamma Distribution: Various Estimation Methods and Statistical modelling

Subhradev Sen — 2023-12-19

In this article, we introduce the truncated versions (lower, upper and double) of the xgamma distribution (Sen et al. 2016). In particular, different structural and distributional properties such as moments, popular entropy measures, order statistics and survival characteristics of the upper truncated xgamma distribution are discussed in detail. We briefly describe different estimation methods, namely maximum likelihood, ordinary least squares, weighted least squares and L-moments. Monte Carlo simulation experiments are performed to compare the performance of the proposed estimation methods for both small and large samples under the lower, upper and double truncated versions. Two applications are provided, the first comparing estimation methods and the second illustrating the applicability of the new model.

E-Bayesian estimation and the corresponding E-MSE under progressive type-II censored data for some characteristics of Weibull distribution

Omid Shojaee — 2023-12-19

Estimating the parameters (or characteristics) of a distribution from censored samples has been one of the most important topics in statistical inference over the past decades. This study is concerned with the E-Bayesian estimation method for computing estimates of the parameter, the hazard rate function and the reliability function of the Weibull distribution when progressive type-II censored samples are available. The estimates are obtained under the squared error loss function (a symmetric loss) and the general entropy and LINEX loss functions (asymmetric losses). In addition, the asymptotic behaviour of the derived E-Bayesian estimators is discussed. Moreover, the E-Bayesian estimators under the different loss functions are compared through Monte Carlo simulation studies by calculating the E-MSE of the resulting estimators, which is a new measure for comparing E-Bayesian estimators. As an application, we analyze two real data sets that follow the Weibull distribution.
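
For reference, the three loss functions named in this abstract and the Bayes estimators they induce are commonly written as below. This is a sketch in generic notation (an estimate $\hat\theta$ of $\theta$, data $x$, loss constants $a$ and $q$, hyperparameter $b$ with prior $\pi(b)$); the symbols are assumptions and the paper's own parameterization may differ.

```latex
\begin{align*}
% Squared error loss and its Bayes estimator (posterior mean)
L_{\mathrm{SE}}(\hat\theta,\theta) &= (\hat\theta-\theta)^2,
  & \hat\theta_{\mathrm{SE}} &= E(\theta \mid x), \\
% LINEX loss, a \neq 0
L_{\mathrm{LINEX}}(\hat\theta,\theta) &= e^{a(\hat\theta-\theta)} - a(\hat\theta-\theta) - 1,
  & \hat\theta_{\mathrm{LINEX}} &= -\frac{1}{a}\,\ln E\!\left(e^{-a\theta} \mid x\right), \\
% General entropy loss, q \neq 0
L_{\mathrm{GE}}(\hat\theta,\theta) &\propto \left(\frac{\hat\theta}{\theta}\right)^{q}
  - q\,\ln\frac{\hat\theta}{\theta} - 1,
  & \hat\theta_{\mathrm{GE}} &= \left[E\!\left(\theta^{-q} \mid x\right)\right]^{-1/q}, \\
% E-Bayesian estimate: expectation of the Bayes estimate over the hyperparameter prior
\hat\theta_{\mathrm{EB}} &= \int \hat\theta_{B}(b)\,\pi(b)\,db .
\end{align*}
```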

Liu-Type Estimator for the Poisson-Inverse Gaussian Regression Model: Simulation and Practical Applications

Hleil Alrweili — 2024-05-11

The Poisson-Inverse Gaussian regression model (PIGRM) is commonly used to analyze count datasets with over-dispersion. While the maximum likelihood estimator (MLE) is a standard choice for estimating PIGRM parameters, its performance may be suboptimal in the presence of correlated explanatory variables. To overcome this limitation, we introduce a novel Liu-type estimator for PIGRM. Our analysis includes an examination of the matrix mean square error (MMSE) and scalar mean square error (SMSE) properties of the proposed estimator, comparing them with those of the MLE, ridge, and Liu estimators. We also present several parameters of the Liu-type estimator for PIGRM. We evaluated the performance of the proposed estimator through a simulation study and application to real-life data, using SMSE as the primary evaluation criterion. Our results demonstrate that the proposed estimators outperform the MLE, ridge, and Liu estimators in both simulated and real-world scenarios.
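
For orientation, the ridge, Liu, and Liu-type estimators for generalized linear models are often written in the literature in the forms below. This is a sketch under assumed notation: $S = X^{\top}\hat{W}X$ is the weighted cross-product matrix at the MLE, and $k$, $d$ are the biasing parameters. The definitions used in the paper may differ in detail.

```latex
\begin{align*}
\hat\beta_{\mathrm{ridge}}    &= (S + kI)^{-1} S\,\hat\beta_{\mathrm{MLE}}, & k &> 0, \\
\hat\beta_{\mathrm{Liu}}      &= (S + I)^{-1}(S + dI)\,\hat\beta_{\mathrm{MLE}}, & 0 &< d < 1, \\
\hat\beta_{\mathrm{Liu-type}} &= (S + kI)^{-1}(S - dI)\,\hat\beta_{\mathrm{MLE}}, & k &> 0,\ d \in \mathbb{R}, \\
\mathrm{MMSE}(\tilde\beta) &= \mathrm{Cov}(\tilde\beta)
  + \mathrm{bias}(\tilde\beta)\,\mathrm{bias}(\tilde\beta)^{\top}, &
\mathrm{SMSE}(\tilde\beta) &= \operatorname{tr}\,\mathrm{MMSE}(\tilde\beta).
\end{align*}
```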

Categorization of Dehydrated Food through Hybrid Deep Transfer Learning Techniques

Sm Nuruzzaman Nobel — 2024-02-28

Categorizing dry foods plays a crucial role in maintaining quality control and ensuring food safety for human consumption. The effectiveness and precision of classification methods are vital for better evaluation of food quality and streamlined logistics. To this end, we gathered a dataset of 11,500 samples from Mendeley and employed various transfer learning models, including VGG16 and ResNet50. Additionally, we introduce a novel hybrid model, VGG16-ResNet, which combines the strengths of both architectures. Transfer learning involves utilizing knowledge acquired from one task or domain to enhance learning and performance in another. By fusing multiple deep learning (DL) techniques and transfer learning strategies, such as VGG16-ResNet50, we developed a robust model capable of accurately classifying a wide array of dry foods. The integration of DL and transfer learning techniques in dry food classification signifies a drive towards automation and increased efficiency within the food industry. Notably, our approach reached a classification accuracy of 99.78% on various dry food images with VGG16-ResNet50, even with limited training data.

Generalization of Power Lindley Distribution: Properties and Applications

Fatehi Yahya Eissa — 2024-04-17

This article introduces the generalized Kumaraswamy power Lindley (GKPL) distribution, a novel probabilistic model derived by combining the generalized Kumaraswamy (GK-G) family with the power Lindley (PL) distribution. The GKPL distribution encompasses a wide range of distributions, including Kumaraswamy power Lindley, Kumaraswamy Lindley, generalized power Lindley, generalized Lindley, power Lindley, and the well-known Lindley, as special cases. Fundamental properties are derived, such as the hazard rate function, survival function, quantile function, reverse hazard function, moments, mean residual life function, entropy, and order statistics. Four estimation methods, namely maximum likelihood, least squares, Cramér-von Mises, and Anderson-Darling, are used to estimate the parameters of the GKPL distribution. The effectiveness of the estimation techniques is assessed by Monte Carlo simulations. The adaptability and validity of the proposed GKPL distribution are compared with alternative models, including the Kumaraswamy power Lindley (KPL), Extended Kumaraswamy power Lindley (EKPL), type II generalized Topp Leone-power Lindley (TIIGTLPL), exponentiated generalized power Lindley (EGPL), generalized Kumaraswamy Weibull (GKW), generalized Kumaraswamy log-logistic (GKLLo), and generalized Kumaraswamy generalized power Gompertz (GKGPGo) distributions, through analyses of three real datasets.

Enhancing Volatility Prediction: Comparison Study Between Persistent and Anti-persistent Financial Series

Youssra Bakkali — 2024-05-11

Predicting financial volatility is crucial for managing risks and making investment decisions. This research introduces a novel method for creating a prediction model that effectively handles the intricate dynamics of financial time series data. By utilizing the advantages of both time series models and recurrent neural networks, we present two hybrid models: Vanilla-RGARCH and LSTM-RGARCH. These models are designed to overcome the shortcomings of Realized GARCH (RGARCH) and HAR models in representing various stylized facts of financial data. While RGARCH models are proficient in capturing asymmetry, they fail to address long-term memory. Conversely, HAR models are adept at capturing long-term memory. The innovative model combines forecasted values from the RGARCH model with components from the HAR model, including daily, weekly, and monthly realized volatility, within a neural network framework. This combination helps to bypass the complexities involved in directly merging the HAR model with RGARCH. Through this method, our hybrid models provide a thorough depiction of the characteristics of financial data.

The proposed approach is evaluated on two distinct types of financial series, persistent and anti-persistent, to demonstrate its robustness and capacity to generalize in different contexts. The performance of the hybrid models is compared to that of conventional RGARCH and HAR models, demonstrating their superiority in precise prediction of financial volatility and their ability to capture complex trends observed in real data. In addition, a principal component analysis (PCA) is used to visualize the results and facilitate their interpretation.
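
The HAR components mentioned in this abstract (daily, weekly, and monthly realized volatility) can be sketched as simple trailing averages. The window lengths of 5 and 22 trading days are the conventional HAR choices and are an assumption here, as are the function name and the input series; the abstract does not specify them, and the RGARCH forecast that the authors feed in alongside these features is not reproduced.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build (daily, weekly, monthly) realized-volatility features (hypothetical helper).

    rv: list of daily realized volatility values, oldest first.
    Returns a list of ((daily, weekly_avg, monthly_avg), next_day_rv) pairs.
    """
    rows = []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = sum(rv[t - weekly + 1 : t + 1]) / weekly      # trailing 5-day mean
        month = sum(rv[t - monthly + 1 : t + 1]) / monthly   # trailing 22-day mean
        rows.append(((daily, week, month), rv[t + 1]))       # target: next-day RV
    return rows

# Assumed toy series, used only to show the shapes involved
series = [float(v) for v in range(1, 31)]
rows = har_features(series)
```

Each feature triple could then be concatenated with a model-based forecast and passed to a neural network, which is the general idea the abstract describes.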

Randomized density matrix renormalization group algorithm for low-rank tensor train decomposition

Huilin Jiang — 2024-05-14

Tensor train decomposition is a powerful tool for processing high-dimensional data. Density matrix renormalization group (DMRG) is an alternating scheme for low-rank tensor train decomposition of large tensors. However, it may suffer from the curse of dimensionality due to the large scale of its subproblems. In this paper, we propose a novel randomized proximal DMRG algorithm for low-rank tensor train decomposition that uses TensorSketch to alleviate the curse of dimensionality. Numerical experiments on synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed algorithm.
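
As background, the tensor train format represents each entry of a $d$-way tensor as a product of small matrices. The notation below (cores $G_k$, ranks $r_k$, mode size $n$) is the standard one and is assumed here rather than taken from the paper.

```latex
\begin{align*}
\mathcal{A}(i_1, i_2, \dots, i_d) &= G_1(i_1)\, G_2(i_2) \cdots G_d(i_d),
  \qquad G_k(i_k) \in \mathbb{R}^{r_{k-1} \times r_k}, \quad r_0 = r_d = 1, \\
\text{storage:}\quad & O(d\,n\,r^2) \ \text{entries in TT format versus } n^d \text{ for the full tensor},
  \qquad r = \max_k r_k .
\end{align*}
```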

New efficient descent direction of a primal-dual path-following algorithm for linear programming

Billel Zaoui — 2024-02-25

We introduce a new primal-dual interior-point algorithm with a full-Newton step for solving linear optimization problems. The newly proposed approach is based on applying a new function on a simple equivalent form of the centering equation of the system, which defines the central path. Thus, we get a new efficient search direction for the considered algorithm. Moreover, we prove that the method solves the studied problems in polynomial time and that the algorithm obtained has the best known complexity bound for linear optimization. Finally, a comparative numerical study is reported to show the efficiency of the proposed algorithm.
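
For context, the central path and the kind of equivalent transformation of the centering equation that this abstract alludes to can be sketched as follows. The notation is the standard one for linear optimization; the function $\psi$ is left generic because the abstract does not state which function the authors apply.

```latex
\begin{align*}
&\text{Primal: } \min\, c^{\top}x \ \ \text{s.t. } Ax = b,\ x \ge 0;
 \qquad \text{Dual: } \max\, b^{\top}y \ \ \text{s.t. } A^{\top}y + s = c,\ s \ge 0. \\
&\text{Central path, for } \mu > 0: \qquad
 Ax = b, \qquad A^{\top}y + s = c, \qquad xs = \mu e \quad (x, s > 0). \\
&\text{Transformed centering equation (generic invertible } \psi\text{)}: \qquad
 \psi\!\left(\frac{xs}{\mu}\right) = \psi(e). \\
&\text{Classical Newton system } (\psi = \text{identity}): \qquad
 A\,\Delta x = 0, \quad A^{\top}\Delta y + \Delta s = 0, \quad
 s\,\Delta x + x\,\Delta s = \mu e - xs. \\
&\text{Full-Newton step: } (x, y, s) \leftarrow (x + \Delta x,\ y + \Delta y,\ s + \Delta s).
\end{align*}
```

Here $xs$ denotes the componentwise product and $e$ the all-ones vector. Different choices of $\psi$ change the right-hand side of the third Newton equation and therefore the search direction.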

Geoadditive Semiparametric Regression For Modeling Property Price In Surabaya, Indonesia Using Marketplace Data

Wahyu Wibowo — 2024-04-28

Property price growth in Surabaya is the highest among cities in East Java, yet demand in the residential sub-sector persists. The value of a property is described by its price. Property price is one of the important factors considered in making an investment decision. The market value of a property is determined by its physical and micro-neighborhood factors, the latter consisting of location and environmental factors. Mass appraisal is an efficient and cost-effective way to value property fairly, transparently, and consistently, as properties with the same attributes will receive equal value. The existence of a property price model is vital in the context of mass appraisal. The objective of mass appraisal is to value a group of properties using data, valuation methods, and statistical tests. Mass appraisal is invaluable for the government in formulating taxes based on market value. In this research, the Geoadditive model is used to model property price based on physical and location factors. The results show that the physical (number of bedrooms, number of bathrooms, land area, and building area) and location (longitude and latitude coordinates) factors significantly influence property prices in Surabaya. The building area has more impact on the property price than the land area. The combined effect plot also shows that properties located in the eastern part of Surabaya have a relatively higher price than those in the western part.
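
A geoadditive model of the kind described in this abstract is commonly written as an additive model with an extra spatial smooth. The form below is a sketch under assumed notation; the abstract does not say whether the physical covariates enter linearly or through smooth functions, nor whether the price is transformed.

```latex
\begin{equation*}
y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij})
      + f_{\mathrm{geo}}(\mathrm{lon}_i, \mathrm{lat}_i) + \varepsilon_i ,
\qquad \varepsilon_i \sim N(0, \sigma^2),
\end{equation*}
```

where $y_i$ is the price of property $i$, $x_{ij}$ are its physical attributes (bedrooms, bathrooms, land area, building area), and $f_{\mathrm{geo}}$ is a bivariate smooth over the coordinates.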

Optimality conditions for (h, φ)-subdifferentiable multiobjective programming problems with G-type I functions

Tadeusz Antczak — 2024-04-13

In this paper, using generalized algebraic operations introduced by Ben-Tal [7], we introduce new classes of (h,φ)-subdifferentiable functions, called (h,φ)-G-type I functions and generalized (h,φ)-G-type I functions. Then, we consider a class of nonconvex (h, φ)-subdifferentiable multiobjective programming problems with locally Lipschitz functions in which the functions involved belong to aforesaid classes of (h, φ)-subdifferentiable nonconvex functions. For such (h, φ)-subdifferentiable vector optimization problems, we prove the sufficient optimality conditions for a feasible solution to be its (weak) Pareto solution. Further, we define a vector dual problem in the sense of Mond-Weir for the considered (h, φ)-subdifferentiable multiobjective programming problem and we prove several duality theorems for the aforesaid (h, φ)-subdifferentiable vector optimization problems also under (h, φ)-G-type I hypotheses.

Simulation Structure for Selecting an Optimal Error Distribution Through the GAS Model

Richard T. A. Samuel — 2024-03-23

In econometrics and finance, volatility modelling has long been a specialised field for addressing a variety of issues that pertain to the risks and uncertainties of an asset. However, volatility modelling for risk management is highly dependent on the underlying error distribution. Hence, this study presents a Monte Carlo simulation (MCS) structure for selecting an optimal or the most adequate error distribution that is relevant for modelling the persistence of volatility through the Generalized Autoregressive Score (GAS) model. The structure describes an organised approach to the MCS experiment that includes “background of the study (optional), defining the aim of the study, specifying the research questions, method of implementation, and summarised conclusion”. The method of implementation is a process that consists of writing the simulation code, setting the seed, setting the true parameter a priori, data generation, and performance evaluation through meta-statistics. Among the findings, the study used both fat-tails and √N consistency experiments to show that the GAS model with a lower unconditional shape parameter value ($\hat{\nu}^{*} = 4.1$) can generate a dataset that adequately reflects the behaviour of financial time series data, relevant for volatility modelling. This dynamic structure is intended to help interested users on MCS experiments utilising the GAS model for reliable volatility persistence calculations in finance and other areas.
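
The implementation steps listed in this abstract (set the seed, fix the true parameter, generate data, evaluate through meta-statistics) can be sketched with a toy experiment. The sketch does not implement the GAS model: as a loudly stated simplification, it estimates a location parameter from Student-t errors, borrowing only the shape value 4.1 from the abstract. All names and defaults are hypothetical.

```python
import math
import random

def student_t(nu, rng):
    # t variate as Z / sqrt(chi2_nu / nu); chi2_nu is Gamma(shape=nu/2, scale=2)
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(chi2 / nu)

def run_experiment(true_mu=0.5, nu=4.1, n=200, reps=500, seed=12345):
    rng = random.Random(seed)                 # step: set the seed
    estimates = []
    for _ in range(reps):
        # step: generate data from the known ("true") parameter
        data = [true_mu + student_t(nu, rng) for _ in range(n)]
        estimates.append(sum(data) / n)       # toy estimator: sample mean
    # step: meta-statistics summarising estimator performance
    bias = sum(estimates) / reps - true_mu
    rmse = math.sqrt(sum((e - true_mu) ** 2 for e in estimates) / reps)
    return bias, rmse

bias, rmse = run_experiment()
```

Repeating `run_experiment` for increasing `n` and checking that `rmse` shrinks roughly like 1/√N is one simple way to probe √N consistency.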

A new SVM solver applied to Skin Lesion Classification

Jonatas Silva — 2024-04-28

We present a unified framework for solving the nonlinear Support Vector Machines (SVM) training problems. The framework is based on an objective function approximation so that the problem becomes separable, with low computational cost root-finding methods to solve the resulting subproblems. Because of the diagonalization of the objective function in the first stage of the framework, we named the new SVM solver DiagSVM. To test the performance of the DiagSVM, we report preliminary numerical experiments with benchmark datasets. From the results, we chose the best combination used in the framework to solve the Skin Lesion Classification (SLC) problem. Since melanoma (skin cancer) is the most dangerous and deadliest disease that affects the skin, the application of the DiagSVM can be integrated into several Computer-Aided Diagnosis (CAD) systems to help them detect skin cancer and significantly reduce both morbidity and mortality associated with this disease.

Machine learning (ML) and deep learning (DL) based approaches have been widely used to develop robust skin lesion classification systems. For the SLC problem, three pre-trained convolutional neural networks (CNN), Xception, InceptionResNetV2 and DenseNet201, were employed as feature extractors and their dimension was reduced using Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA) and Independent Component Analysis (ICA). Finally, the samples were fed into two SVM solvers: DiagSVM and Libsvm. The experiment shows that using PCA, KPCA, or ICA, the SVM can perform better than without feature reduction. The classification performance of the proposed methodology is analyzed on the ISIC2017 and PH2 datasets. The benchmark and SLC results indicate a promising proposal for accuracy, specificity and sensitivity metrics.

Temporal regularity of stochastic differential equations driven by G-Brownian motion

Amel Redjil — 2024-03-12

This paper is devoted to studying the temporal regularity of the solution of stochastic differential equations driven by G-Brownian motion (G-SDEs) under the global Lipschitz and linear growth conditions. In addition, a numerical simulation of a particular G-SDE is provided.

Bayesian and Non-Bayesian Estimation for The Parameter of Inverted Topp-Leone Distribution Based on Progressive Type I Censoring

Hiba Z. Muhammed — 2024-06-04

In this paper, Bayesian and non-Bayesian estimations of the shape parameter of the Inverted Topp-Leone distribution are studied under a progressive Type I censoring scheme. The maximum likelihood estimator (MLE) and Bayes estimator (BE) of the unknown parameter under the squared error loss (SEL) function are obtained. Three types of confidence intervals are discussed for the unknown parameter. A simulation study is performed to compare the performances of the proposed methods, and two numerical examples have been analyzed for illustrative purposes.

In-depth Analysis of von Mises Distribution Models: Understanding Theory, Applications, and Future Directions

Said Benlakhdar — 2024-06-06

Multimodal and asymmetric circular data manifest in diverse disciplines, underscoring the significance of fitting suitable distributions for the analysis of such data. This study undertakes a comprehensive comparative assessment, encompassing diverse extensions of the von Mises distribution and the associated statistical methodologies, spanning from Richard von Mises' seminal work in 1918 to contemporary applications, with a particular focus on the field of wind energy. The primary objective is to discern the strengths and limitations inherent in each method. To illustrate the practical implications, three authentic datasets and a simulation study are incorporated to showcase the performance of the proposed models. Furthermore, this paper provides an exhaustive list of references pertinent to von Mises distribution models.
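
As a small illustration of the base model that this review extends, the sketch below evaluates the von Mises density and estimates the mean direction and mean resultant length from simulated angles. The function names, parameter values, and sample size are assumptions for illustration only; none of the extended models from the paper are implemented.

```python
import math
import random

def von_mises_pdf(theta, mu, kappa, terms=30):
    # f(theta) = exp(kappa * cos(theta - mu)) / (2 * pi * I0(kappa)),
    # with I0 computed from its power series sum_k (kappa/2)^(2k) / (k!)^2
    i0 = sum((kappa / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * i0)

def circular_mean(angles):
    # Mean direction (in [0, 2*pi)) and mean resultant length of a sample of angles
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.atan2(s, c) % (2.0 * math.pi), math.hypot(c, s)

rng = random.Random(1)
angles = [rng.vonmisesvariate(1.0, 4.0) for _ in range(5000)]  # mu = 1.0, kappa = 4.0
mean_direction, resultant_length = circular_mean(angles)
```

The ordinary arithmetic mean is not meaningful for angles, which is why the direction is recovered from the averaged sine and cosine components.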