Use a CSV (comma-separated values) file to load the data you want to analyze
Choose a pre-loaded data set
Try our multiple regression calculator with some of the most popular open data sets from the American Statistical Association, Kaggle and the UCI Machine Learning Repository.
Select the numeric explanatory variables that you would like to include in your regression.
In this step, select only the numeric variables that you would like to include in your regression as continuous variables. In the next step you will choose the numeric variables to encode as dummies, as if they were categorical.
These are the variables used as continuous regressors by the multiple regression calculator.
Select the variables to encode as dummies
Choose the variables to include in your regression after encoding them as groups of dummies.
Dummy variables are explanatory variables that can take only two values, either 1 or 0. When you select a variable that takes n different values (shown in parentheses below), n dummies are created and included in your regression.
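As an illustration, here is a minimal sketch of this kind of encoding in Python with pandas; the data frame and the "city" column are made up for the example, and this is not SimpleR's own code.

```python
import pandas as pd

# Hypothetical example: a column taking 3 distinct values.
df = pd.DataFrame({"city": ["Rome", "Milan", "Turin", "Rome"]})

# One dummy per distinct value (3 columns of 0/1 indicators),
# mirroring the "n values -> n dummies" rule described above.
dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)
print(dummies)
```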
Unable to run: too many regressors. The number of regressors is greater than or equal to the sample size. Please reduce the number of regressors and try again.
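For illustration only, this is roughly the kind of check that triggers the error; the sketch below is hypothetical Python, not SimpleR's own code.

```python
import numpy as np

# Hypothetical regressor matrix: 10 observations, 12 regressors (too many).
X = np.random.normal(size=(10, 12))
n_observations, n_regressors = X.shape
if n_regressors >= n_observations:
    raise ValueError("Too many regressors: reduce their number and try again.")
```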
Results
Here are the results from your regression.
Multicollinearity
Sample size and degrees of freedom
Here is the calculation of the degrees of freedom (sample size - number of parameters).
Sample size, number of parameters and degrees of freedom.
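As a minimal illustration of the calculation (the numbers below are hypothetical):

```python
# Hypothetical example: 100 observations, 4 regressors plus a constant.
n_observations = 100          # sample size
n_parameters = 4 + 1          # regressors + intercept
degrees_of_freedom = n_observations - n_parameters
print(degrees_of_freedom)     # 95
```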
Coefficient estimates and their significance
In the next table, you can find the ordinary least squares (OLS) coefficient estimates, their standard errors and other statistics.
A p-value is derived from the t-statistic and can be used to decide whether to:
reject the null hypothesis (p-value < level of significance)
or not to reject it (p-value >= level of significance).
The p-value is two-sided if the null is tested against the alternative hypothesis that the coefficient can be either significantly negative or significantly positive. The p-value is one-sided if only one of the two alternatives is deemed possible.
Rows correspond to regressors.
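For readers who want to reproduce such a table outside the calculator, here is a minimal sketch in Python with NumPy and SciPy. The simulated data are made up for the example, and this is not SimpleR's own code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: 100 observations, 2 regressors plus a constant.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# OLS coefficient estimates via least squares.
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance and standard errors of the estimates.
residuals = y - X @ beta_hat
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(cov))

# t-statistics and two-sided p-values (null hypothesis: coefficient = 0).
t_stats = beta_hat / std_errors
p_values = 2 * stats.t.sf(np.abs(t_stats), df=dof)

for b, se, t, p in zip(beta_hat, std_errors, t_stats, p_values):
    print(f"coef={b: .4f}  se={se:.4f}  t={t: .3f}  p={p:.4f}")
```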
Goodness of fit
And here are some statistics about the fit of the regression model.
The R squared is a measure of how well the linear regression fits the data. When it is equal to 1, it indicates that the fit of the regression is perfect.
The smaller the R squared, the worse the fit of the regression is.
If the model includes a constant, the R squared cannot be smaller than 0.
The adjusted R squared is similar, but it takes into account the fact that more complex models (i.e., models with more regressors) tend to over-fit the data.
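The two measures can be computed as follows; this is a minimal sketch in Python with made-up numbers, not SimpleR's own code.

```python
import numpy as np

def r_squared(y, y_fitted):
    """R squared of a regression that includes a constant."""
    ss_res = np.sum((y - y_fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_fitted, n_parameters):
    """Adjusted R squared, penalizing model complexity.
    n_parameters counts all estimated parameters, including the constant."""
    n = len(y)
    r2 = r_squared(y, y_fitted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_parameters)

# Hypothetical actual and fitted values.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_fitted = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_fitted), adjusted_r_squared(y, y_fitted, n_parameters=2))
```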
Global statistics
Scatter plot of actual vs fitted values
Plot of the actual values of the dependent variable (x-axis) vs their predicted values (y-axis).
In a well-specified regression model, the points in an actual-vs-fitted scatter plot should be evenly distributed around the 45-degree line.
In a model that fits the data very well, the points should be very close to the 45-degree line.
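A plot of this kind can be reproduced outside the calculator, for example with Python and matplotlib; the actual and fitted values below are made up for the illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and fitted values of the dependent variable.
actual = np.array([2.1, 3.4, 1.8, 4.0, 2.9])
fitted = np.array([2.3, 3.1, 2.0, 3.8, 3.2])

plt.scatter(actual, fitted)
# 45-degree reference line: points on it are perfectly predicted.
lims = [min(actual.min(), fitted.min()), max(actual.max(), fitted.max())]
plt.plot(lims, lims, linestyle="--")
plt.xlabel("Actual values")
plt.ylabel("Fitted values")
plt.title("Actual vs fitted")
plt.show()
```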
Data privacy and security
Your data remains on your computer and is analyzed by your browser.
The multiple regression calculator does not send data over the Internet and does not use remote servers to perform computations. Therefore, your data remains completely private and secure.
Scientific standards
SimpleR is intended to meet the highest scientific standards. It is often tested on new data sets to make sure that the results from the regression calculator coincide with those provided by well-tested scientific and statistical software such as R and Python (with NumPy and Scikit-learn).
Here are some facts about our development standards:
we run hundreds of unit tests each time we change the code base of SimpleR;
we use data sets obtained from multiple sources to run test regressions and cross-check the results with those obtained from well-tested statistical software;
our matrix algebra routines are optimized for numerical stability;
the QR algorithm used to derive the OLS estimator is highly robust to near-perfect multicollinearity; we drop variables that might compromise numerical stability even if they have some minuscule explanatory power (the general idea is illustrated by the sketch after this list);
the routines to compute the Gamma and Beta functions (needed to compute p-values for the test statistics) are based on highly accurate numerical integration algorithms;
as much as possible, we write everything from scratch, to optimize for speed and to allow for extensive testing of results.
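The following sketch illustrates the general idea of a rank-revealing (pivoted) QR decomposition that drops near-collinear columns. It is written in Python with NumPy and SciPy for illustration only and is not SimpleR's actual code.

```python
import numpy as np
from scipy import linalg

def ols_via_pivoted_qr(X, y, tol=1e-10):
    """Sketch of QR-based OLS that drops near-collinear columns."""
    # Rank-revealing QR with column pivoting.
    Q, R, piv = linalg.qr(X, mode="economic", pivoting=True)
    # Keep columns whose diagonal entry of R is not negligible relative
    # to the largest one; near-collinear columns are dropped.
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))
    kept = piv[:rank]
    # Solve the triangular system on the retained columns.
    beta_kept = linalg.solve_triangular(R[:rank, :rank], Q[:, :rank].T @ y)
    # Coefficients of dropped columns are marked with NaN.
    beta = np.full(X.shape[1], np.nan)
    beta[kept] = beta_kept
    return beta, kept

# Hypothetical usage: the fourth regressor is an exact linear combination.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X = np.column_stack([X, X[:, 0] + X[:, 1]])
y = rng.normal(size=50)
beta, kept = ols_via_pivoted_qr(X, y)
print(kept)   # the collinear column is dropped
```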
Disclaimer
The copyright owners of SimpleR and the owners of this website do not warrant or guarantee the accuracy of the SimpleR multiple regression calculator and shall have no liability whatsoever for any damages (including, but not limited to, direct, indirect, special or consequential damages) or economic losses arising in connection with the use of SimpleR.
How to cite
Please cite as:
Taboga, Marco (2022). "SimpleR: multiple linear regression calculator", StatLect. https://www.statlect.com/fundamentals-of-statistics/SimpleR.
Help us to improve SimpleR
If you have suggestions on how to improve SimpleR (e.g., new features you would like to see added to it), please write an e-mail to [email protected].