This tutorial is about calculating the R-squared in Python with and without the sklearn package.

For an exemplary calculation we are first defining two arrays. While the y_hat is the predicted y variable out of a linear regression, the y_true are the true y values.

import numpy as np

y_hat = np.array([2,3,5,7,2,3,8,5,3,1])
y_true = np.array([5,4,2,7,4,2,1,6,5,3])



Now we are calculating the R-squared out of those two variables.

The formulas for calculating the R-squared are: where SST is: and SSE is: To understand the SST and SSE consider the following image found on Wikipedia and created by Orzetto (Please see the credits and license below the image):

On the left-hand side, you see the SST – the total sum of squares which are just the squared differences between the actual y values and the mean y.

On the right-hand side, you see the SSE – the residual sum of squares which is just the summed squared differences between the regression line (m*x+b) and the predicted y values.

You can also just use the sklearn package to calculate the R-squared.

from sklearn.metrics import r2_score

r2_score(y_true,y_hat)



For an application of the R-squared on real data, you are kindly invited to check out the video on my channel

##### You May Also Like ## Data Visualization using Matplotlib – Part 1

Matplotlib is one of the most popular library for data visualization and that’s for a reason. It has… ## DataTypes and Variables

Hey guys, this is the third tutorial of the Python for Beginners Series. If you haven’t checked out our previous…  