Comparison of partial least squares with other prediction methods via generated data
Abstract
This study compares the Partial Least Squares (PLS), Ridge Regression (RR) and Principal Components Regression (PCR) methods for regressing a dependent variable on severely multicollinear regressors. To this end, a large number of datasets with differing degrees of collinearity are generated from the standard normal distribution, with 10000 replications. The simulation design covers six different multicollinearity levels and sample sizes. The methods are compared on the generated data using the mean square error of the estimated regression parameters. The findings show that each prediction method is affected by the sample size, number of regressors or multicollinearity level. However, in contrast to the literature (sayn200), PCR gave significantly better results than the other two methods regardless of the number of regressors.
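For concreteness, the comparison criterion and one possible data-generating scheme can be written out as follows. This is only a sketch under stated assumptions: the symbols $\beta$ (true coefficient vector), $\hat{\beta}^{(r)}$ (estimate from replication $r$), $R$ (number of replications), $p$ (number of regressors), $z_{ij}$ (independent standard normal draws) and $\rho$ (collinearity parameter) are notation introduced here, and the generating scheme shown is a common choice in simulation studies of this kind, not one confirmed by the text above.

```latex
% Estimated mean square error of the regression parameters
% over R replications (R = 10000 in the study).
\[
  \widehat{\mathrm{MSE}}(\hat{\beta})
  = \frac{1}{R} \sum_{r=1}^{R}
    \left( \hat{\beta}^{(r)} - \beta \right)^{\top}
    \left( \hat{\beta}^{(r)} - \beta \right)
\]

% Assumed scheme for collinear regressors built from standard normal
% draws z_{ij}, for i = 1, ..., n and j = 1, ..., p.
\[
  x_{ij} = \sqrt{1 - \rho^{2}}\, z_{ij} + \rho\, z_{i,p+1}
\]
```

Under this assumed scheme each regressor has unit variance and any two regressors have correlation $\rho^{2}$, so increasing $\rho$ toward 1 produces more severe multicollinearity.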