Guide to Linear Regression in Machine Learning

Introduction to Linear Regression

The concept of linear regression is commonly attributed to Sir Francis Galton and Karl Pearson, who developed it in the late nineteenth century. Linear regression is a statistical method used to determine and quantify the relationship between variables in a data set. Univariate statistical tests, such as Fisher’s exact test, the t-test, the chi-square test, and analysis of variance, do not take into account the impact of additional factors or confounders. Partial correlation and regression, however, allow the researcher to adjust for the influence of confounders when analysing the relationship between two variables.

In scientific or clinical research, the researcher frequently aims to understand how two or more independent factors relate to an outcome, or dependent variable, in order to predict it. In a clinical setting, this means understanding how risk factors (also called predictor variables, explanatory factors, or independent variables) account for the occurrence of disease. Examples of disease risk factors include biological characteristics, physical variables such as BMI and blood pressure, and lifestyle variables such as smoking and alcohol intake.

Both correlation and regression analysis provide this understanding of the link between risk factors and disease. Regression analysis statistically describes the form of the relationship between variables, while correlation quantifies the degree or strength of that association. Regression analysis predicts the value of a dependent variable from one or more independent variables. In correlation analysis, the correlation coefficient r is a dimensionless number ranging from -1 to +1. A value closer to -1 implies a negative relationship, a value closer to +1 indicates a positive one, and a value near 0 indicates little or no linear relationship.

For normally distributed data, Pearson’s correlation is used, whereas for data that are not normally distributed, Spearman’s rank correlation is employed. Linear regression analysis uses the equation y = mx + c to find the best-fit line for the relationship between y (the dependent variable) and x (the independent variable), where m is the slope and c is the intercept. The proportion of the variability in y that is explained by x is given by the coefficient of determination, r².
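As a rough illustration of these quantities, the sketch below computes Pearson’s r, the least-squares slope m and intercept c, and r² with NumPy. The five data points are invented for the example and do not come from any real study.

```python
import numpy as np

# Hypothetical example data: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson's correlation coefficient r (always between -1 and +1).
r = np.corrcoef(x, y)[0, 1]

# Least-squares estimates of the slope m and intercept c in y = mx + c.
m, c = np.polyfit(x, y, deg=1)

# Coefficient of determination: share of the variance in y explained by the line.
y_hat = m * x + c
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"r = {r:.3f}, m = {m:.3f}, c = {c:.3f}, r^2 = {r_squared:.3f}")
```

For a straight-line fit with a single predictor, r² is simply the square of Pearson’s r, which is why the two statistics are usually reported together.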

When is Regression Necessary?

Regression analysis is commonly required to determine whether one phenomenon affects another, or how several variables are related. For example, you can work out whether, and by how much, experience or gender affects compensation. Regression is also useful for predicting a response from a new set of inputs. For instance, given the outside temperature, the time of day, and the number of people in the house, you might try to predict how much electricity a household will use in the next hour. Regression is used in a variety of fields, including statistics, economics, computer science, and the social sciences. Its importance grows by the day as more data becomes available and people become more aware of its practical value.


Roadmap to Linear Regression in Machine Learning

Step 1: Import Libraries and Load Data

The first step is to import the libraries required to create our model. It is not necessary to import every library at the start; others can be imported as they are needed. To begin, we import pandas, numpy, and matplotlib.
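A typical set of imports might look like the following. The aliases np, pd, and plt are common conventions rather than requirements.

```python
import numpy as np               # numerical arrays and maths
import pandas as pd              # tabular data (DataFrame)
import matplotlib.pyplot as plt  # plotting

# Printing the versions is a quick check that the libraries are installed.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```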



After we’ve imported these libraries, we fetch the data set and load it. When loading the data, be sure to include the file extension (.csv/.xls) at the end of the file name. The data for this model can be downloaded straight to your device.


CSV files are most typically used for this; however, an Excel spreadsheet can also be used. The only difference is that with an Excel spreadsheet as the data set, we must use read_excel() instead of read_csv().
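The sketch below shows the loading step with pandas. So that it runs without any file on disk, it reads a few made-up rows from an in-memory string; the column names and values are placeholders, not the actual data set.

```python
import io

import pandas as pd

# Stand-in for a real file: a few rows of CSV text with hypothetical columns.
csv_text = """sqft_living,bedrooms,price
1180,3,221900
2570,3,538000
770,2,180000
1960,4,604000
"""

# pd.read_csv accepts a file path or a file-like object. With a file on disk
# this would be pd.read_csv("your_file.csv"), where the name is a placeholder.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows
```

For an Excel workbook, pd.read_excel() plays the same role, although it usually needs an extra package such as openpyxl to be installed.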

Step 2: View Data


The next step is to inspect the data after it has been successfully loaded. Data visualisation is an integral part of a data scientist’s job. It is advisable to plot the data in order to identify relationships between the various parameters.

Matplotlib and Seaborn are excellent tools for displaying our data in a variety of visualisations.
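As one possible way to view the data, the sketch below draws a scatter plot with Matplotlib. The data are synthetic, generated only so that the example runs on its own, and the column names sqft_living and price are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data set (values are made up).
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=100)
price = 150 * sqft + rng.normal(0, 50_000, size=100)
df = pd.DataFrame({"sqft_living": sqft, "price": price})

# Scatter plot of the candidate predictor against the target.
fig, ax = plt.subplots()
ax.scatter(df["sqft_living"], df["price"], s=10)
ax.set_xlabel("sqft_living")
ax.set_ylabel("price")
ax.set_title("price vs. sqft_living (synthetic data)")
fig.savefig("price_vs_sqft.png")
```

In a notebook, the matplotlib.use("Agg") line can be dropped and the figure shown inline instead of being saved to a file.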


Step 3: Feature Engineering


When we visualise our data, we see a strong correlation between two parameters: sqft_living and price. As a result, we will use these parameters to build our model.
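One way to back up the visual impression with numbers is to rank the columns by their correlation with the target. The sketch below does this on synthetic data, in which bedrooms is deliberately generated to be unrelated to price. All names and values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data set (values are made up).
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 4000, size=n)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": rng.integers(1, 6, size=n),  # unrelated to price in this toy data
    "price": 150 * sqft + rng.normal(0, 50_000, size=n),
})

# Pearson correlation of every other column with the target.
corr_with_price = df.corr()["price"].drop("price")
print(corr_with_price.sort_values(ascending=False))

# Keep the most strongly correlated column as the single predictor.
best_feature = corr_with_price.abs().idxmax()
X = df[[best_feature]]  # 2-D: one column of features
y = df["price"]         # 1-D: target
print("selected feature:", best_feature)
```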



More parameters can also be added to the model, although this may reduce accuracy. A regression model that combines several predictors to predict a response variable is called a multiple (often loosely termed multivariate) regression model.


Step 4: Fitting the Linear Regression Model

Following the selection of the required parameters, the train_test_split function is imported from sklearn.model_selection. It is used to split our data into training and testing sets. Typically, 70-80% of the data is used as the training set, with the remainder serving as the test set.
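A minimal sketch of the split, assuming an 80/20 ratio and synthetic data, might look like this. The random_state value is arbitrary and only makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: one feature column X and a target y (values are made up).
rng = np.random.default_rng(0)
X = rng.uniform(500, 4000, size=(100, 1))
y = 150 * X[:, 0] + rng.normal(0, 50_000, size=100)

# Hold out 20% of the rows for testing; the remaining 80% are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```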


After importing LinearRegression from sklearn.linear_model, the model is fitted using the training data set. Our model’s intercept and coefficient can then be read from the fitted model’s intercept_ and coef_ attributes.
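The fitting step can be sketched as follows. The data are synthetic, generated with a true slope of 150 and a true intercept of 20,000 (both arbitrary choices), so the fitted values can be compared against known ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known slope (150) and intercept (20,000).
rng = np.random.default_rng(0)
X = rng.uniform(500, 4000, size=(200, 1))
y = 150 * X[:, 0] + 20_000 + rng.normal(0, 10_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

print("intercept (c):", model.intercept_)
print("coefficient (m):", model.coef_[0])
```

The printed coefficient should land close to 150, the slope used to generate the data.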



The model’s performance can be assessed by calculating its mean squared error (MSE) on the test set, or the root mean squared error (RMSE), which is the square root of the MSE. The lower the RMSE, the better the model performs.
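As a sketch of the evaluation, the code below fits a model on synthetic data and computes the MSE and RMSE on the held-out test set. The noise level of 10,000 is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (values are made up): linear trend plus noise with sd 10,000.
rng = np.random.default_rng(0)
X = rng.uniform(500, 4000, size=(200, 1))
y = 150 * X[:, 0] + 20_000 + rng.normal(0, 10_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Evaluate on data the model has not seen during fitting.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the target, so easier to interpret

print("MSE :", mse)
print("RMSE:", rmse)
```

Because the RMSE is expressed in the units of the target, it can be read here as the typical size of a prediction error in price.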

Linear Regression with Gradient Descent

Gradient descent is an iterative optimization algorithm used to find a minimum of a function. To understand the method, imagine a person with no sense of direction trying to reach the bottom of a valley. They walk down the slope, taking large steps where it is steep and small steps where it is not. Each move is based only on their present location, and they stop when they reach the bottom. Gradient descent operates in a similar manner.

Gradient descent is applied iteratively to our m and c. First, set m and c to zero. Let L represent our learning rate, which determines how much m and c change with each step. The update is then repeated for a fixed number of iterations (epochs), for example inside a loop of the form for i in range(epochs).


To improve accuracy, L is given a small value. The next step is to compute the partial derivatives of the loss function with respect to m and c. After that, we update m and c by subtracting L times the corresponding derivative, and repeat the procedure until the loss function is very small.
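The procedure can be sketched in a few lines of NumPy, assuming the loss is the mean squared error. The data, the learning rate L = 0.1, and the 2000 epochs are illustrative choices that suit this small, well-scaled example; unscaled data usually needs a much smaller L.

```python
import numpy as np

# Synthetic data drawn around the line y = 3x + 2 (values are made up).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)

m, c = 0.0, 0.0  # start with slope and intercept at zero
L = 0.1          # learning rate (assumed value for this toy data)
epochs = 2000    # number of update steps (assumed value)
n = len(x)

for i in range(epochs):
    y_pred = m * x + c
    # Loss: (1/n) * sum((y - y_pred)^2). Its partial derivatives are:
    d_m = (-2.0 / n) * np.sum(x * (y - y_pred))  # with respect to m
    d_c = (-2.0 / n) * np.sum(y - y_pred)        # with respect to c
    # Step against the gradient, scaled by the learning rate.
    m -= L * d_m
    c -= L * d_c

loss = np.mean((y - (m * x + c)) ** 2)
print(f"m = {m:.3f}, c = {c:.3f}, final loss = {loss:.5f}")
```

After enough epochs, m and c should agree closely with the closed-form least-squares solution, for example the values returned by np.polyfit(x, y, 1).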



That’s all for this guide to the main aspects of linear regression in machine learning. Anyone learning machine learning should take the time to understand these concepts, as doing so will broaden their knowledge of the field. If you work in IT, it can also give your career a boost.



