Sklearn Linear Regression: Complete Guide with Python Examples
You have a dataset with features and a continuous target variable. You want to predict outcomes -- housing prices, sales revenue, temperature trends -- but you are not sure which approach to use or how to set it up correctly in Python. The wrong model choice or missing preprocessing steps lead to poor predictions and wasted time debugging.
Linear regression is the most widely used algorithm for continuous prediction tasks, yet getting it right involves more than calling .fit() and .predict(). You need to understand how the model works internally, when it fails, how to evaluate it properly, and when to switch to regularized variants like Ridge or Lasso. Skipping these steps means deploying models that perform well on training data but break down on new observations.
Scikit-learn provides LinearRegression along with a complete ecosystem of tools for preprocessing, evaluation, and regularization. This guide covers everything from basic usage to production-ready regression pipelines.
What Is Linear Regression?
Linear regression models the relationship between one or more input features and a continuous output by fitting a straight line (or hyperplane) that minimizes the sum of squared residuals. The equation for a model with n features is:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Where b0 is the intercept (bias term), b1...bn are the coefficients (weights) for each feature, and y is the predicted value.
The model finds the coefficients that minimize the Ordinary Least Squares (OLS) cost function:
Cost = Sum of (y_actual - y_predicted)^2

This has a closed-form solution, so training is fast even on large datasets.
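The closed-form solution can be checked directly: solving the normal equation with NumPy recovers the same intercept and slope that LinearRegression finds. A minimal sketch on synthetic data (the true coefficients 4.0 and 2.5 are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one feature with a known linear trend plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 4.0 + 2.5 * X.ravel() + rng.normal(0, 0.5, 50)

# Closed-form OLS: prepend a column of ones for the intercept,
# then solve the normal equation b = (X'X)^-1 X'y
X_design = np.column_stack([np.ones(len(X)), X])
b = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

model = LinearRegression().fit(X, y)
# Both approaches produce the same intercept and slope
print(b[0], model.intercept_)
print(b[1], model.coef_[0])
```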
Simple Linear Regression with Sklearn
Simple linear regression uses a single feature to predict the target. Here is a complete example:
from sklearn.linear_model import LinearRegression
import numpy as np
# Sample data: years of experience vs salary (in thousands)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([35, 40, 45, 55, 60, 62, 70, 75, 82, 90])
# Create and train the model
model = LinearRegression()
model.fit(X, y)
# Model parameters
print(f"Coefficient (slope): {model.coef_[0]:.4f}")
print(f"Intercept: {model.intercept_:.4f}")
# Predict salary for 12 years of experience
prediction = model.predict([[12]])
print(f"Predicted salary for 12 years: ${prediction[0]:.2f}k")
# Coefficient (slope): 5.9758
# Intercept: 28.5333
# Predicted salary for 12 years: $100.24k

Understanding the Output
| Attribute | Meaning | Example Value |
|---|---|---|
| model.coef_ | Weight for each feature | [5.98] -- salary increases ~$5,976 per year |
| model.intercept_ | Predicted y when all features are 0 | 28.53 -- base salary of $28,533 |
| model.score(X, y) | R-squared on the given data | 0.99 |
Multiple Linear Regression
When you have more than one feature, the model fits a hyperplane instead of a line:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np
# Load California housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
feature_names = housing.feature_names
print(f"Features: {feature_names}")
print(f"Dataset shape: {X.shape}") # (20640, 8)
# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Print coefficients for each feature
print("\nFeature Coefficients:")
for name, coef in zip(feature_names, model.coef_):
    print(f" {name:12s}: {coef:+.6f}")
print(f" {'Intercept':12s}: {model.intercept_:+.6f}")
# Evaluate
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"\nR² (train): {train_score:.4f}")
print(f"R² (test): {test_score:.4f}")

Model Evaluation: R-squared, MSE, and RMSE
R-squared alone does not tell the full story. Use multiple metrics to evaluate regression models:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.datasets import fetch_california_housing
import numpy as np
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate metrics
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
print(f"R² Score: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")
# R² Score: 0.5758
# MSE: 0.5559
# RMSE: 0.7456
# MAE: 0.5332

Evaluation Metrics Explained
| Metric | Formula | Range | Interpretation |
|---|---|---|---|
| R-squared (R²) | 1 - (SS_res / SS_tot) | (-inf, 1] | Proportion of variance explained. 1.0 = perfect, 0 = no better than mean |
| MSE | mean((y - y_pred)²) | [0, inf) | Average squared error. Penalizes large errors more |
| RMSE | sqrt(MSE) | [0, inf) | Same units as target variable. Easier to interpret than MSE |
| MAE | mean(|y - y_pred|) | [0, inf) | Average absolute error. Robust to outliers |
A low R-squared does not always mean a bad model. For noisy real-world data (like housing prices), R² = 0.6 can be reasonable. Always compare RMSE against the scale of your target variable. For classification tasks, see our guide on the sklearn confusion matrix for appropriate evaluation metrics.
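The formulas in the table map directly to a few lines of NumPy. A quick sketch with made-up actual and predicted values, confirming that the manual R² definition matches sklearn's r2_score and that RMSE is simply the square root of MSE:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical actual vs predicted values (made up for illustration)
y_true = np.array([2.0, 3.5, 1.0, 4.0, 2.5])
y_hat = np.array([2.2, 3.1, 1.3, 3.6, 2.4])

# R² from its definition: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE is the square root of MSE, in the target's own units
rmse = np.sqrt(mean_squared_error(y_true, y_hat))

print(f"manual R²: {r2_manual:.4f}  sklearn R²: {r2_score(y_true, y_hat):.4f}")
print(f"RMSE: {rmse:.4f}")
```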
Feature Scaling for Linear Regression
Standard LinearRegression does not require feature scaling: rescaling a feature simply rescales the corresponding coefficient, so predictions are unchanged. However, scaling becomes essential when using regularization:
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
# Without scaling (fine for basic LinearRegression)
model_no_scale = LinearRegression()
model_no_scale.fit(X_train, y_train)
print(f"LinearRegression R² (no scaling): {model_no_scale.score(X_test, y_test):.4f}")
# With scaling via Pipeline (required for regularized models)
pipeline = Pipeline([
('scaler', StandardScaler()),
('ridge', Ridge(alpha=1.0))
])
pipeline.fit(X_train, y_train)
print(f"Ridge R² (with scaling): {pipeline.score(X_test, y_test):.4f}")

Why scaling matters for regularization: Ridge and Lasso penalize large coefficients equally. If one feature ranges 0-1 and another ranges 0-100,000, the penalty disproportionately shrinks the small-range feature's coefficient. Scaling puts all features on the same scale so the penalty is applied fairly.
Polynomial Features: Modeling Non-Linear Relationships
When the relationship between features and target is not linear, polynomial features can capture curves and interactions:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import numpy as np
# Generate non-linear data
np.random.seed(42)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.ravel()**2 - 5 * X.ravel() + 10 + np.random.randn(200) * 15
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Linear model
linear = LinearRegression()
linear.fit(X_train, y_train)
print(f"Linear R²: {r2_score(y_test, linear.predict(X_test)):.4f}")
# Polynomial (degree 2) model
poly_pipeline = Pipeline([
('poly', PolynomialFeatures(degree=2, include_bias=False)),
('linear', LinearRegression())
])
poly_pipeline.fit(X_train, y_train)
print(f"Poly (d=2) R²: {r2_score(y_test, poly_pipeline.predict(X_test)):.4f}")
# Polynomial (degree 3) model
poly3_pipeline = Pipeline([
('poly', PolynomialFeatures(degree=3, include_bias=False)),
('linear', LinearRegression())
])
poly3_pipeline.fit(X_train, y_train)
print(f"Poly (d=3) R²: {r2_score(y_test, poly3_pipeline.predict(X_test)):.4f}")

Warning: High-degree polynomials overfit quickly. Use cross-validation to select the right degree, and prefer regularization for polynomial models.
Regularization: Ridge, Lasso, and ElasticNet
When your model has many features or polynomial terms, regularization prevents overfitting by adding a penalty to large coefficients.
Ridge Regression (L2 Penalty)
Ridge adds the sum of squared coefficients to the cost function. It shrinks coefficients toward zero but never sets them exactly to zero.
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import fetch_california_housing
import numpy as np
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
# Find best alpha with cross-validation
pipeline = Pipeline([
('scaler', StandardScaler()),
('ridge', Ridge())
])
param_grid = {'ridge__alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)
print(f"Best alpha: {grid.best_params_['ridge__alpha']}")
print(f"Best CV R²: {grid.best_score_:.4f}")
print(f"Test R²: {grid.score(X_test, y_test):.4f}")

Lasso Regression (L1 Penalty)
Lasso adds the sum of absolute coefficients. It can set coefficients exactly to zero, performing automatic feature selection:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import fetch_california_housing
import numpy as np
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
pipeline = Pipeline([
('scaler', StandardScaler()),
('lasso', Lasso(max_iter=10000))
])
param_grid = {'lasso__alpha': [0.001, 0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)
print(f"Best alpha: {grid.best_params_['lasso__alpha']}")
print(f"Test R²: {grid.score(X_test, y_test):.4f}")
# Show which features were selected (non-zero coefficients)
lasso_model = grid.best_estimator_.named_steps['lasso']
feature_names = housing.feature_names
for name, coef in zip(feature_names, lasso_model.coef_):
    status = "KEPT" if abs(coef) > 1e-6 else "DROPPED"
    print(f" {name:12s}: {coef:+.6f} [{status}]")

ElasticNet (L1 + L2 Penalty)
ElasticNet combines Ridge and Lasso penalties. The l1_ratio parameter controls the mix: 0 = pure Ridge, 1 = pure Lasso.
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
pipeline = Pipeline([
('scaler', StandardScaler()),
('elasticnet', ElasticNet(max_iter=10000))
])
param_grid = {
'elasticnet__alpha': [0.01, 0.1, 1.0],
'elasticnet__l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]
}
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)
print(f"Best alpha: {grid.best_params_['elasticnet__alpha']}")
print(f"Best l1_ratio: {grid.best_params_['elasticnet__l1_ratio']}")
print(f"Test R²: {grid.score(X_test, y_test):.4f}")

Comparison: LinearRegression vs Ridge vs Lasso vs ElasticNet
| Model | Penalty | Feature Selection | When to Use | Scaling Required |
|---|---|---|---|---|
| LinearRegression | None | No | Few features, no multicollinearity, good signal-to-noise ratio | No |
| Ridge | L2 (squared) | No (shrinks toward zero) | Many correlated features, want to keep all features | Yes |
| Lasso | L1 (absolute) | Yes (sets coefficients to zero) | Many features, want automatic feature selection | Yes |
| ElasticNet | L1 + L2 | Yes (partial) | Many correlated features, want some feature selection | Yes |
Choosing the Right Model
Use LinearRegression as your baseline. If the model overfits (big gap between train and test R-squared), try Ridge first. If you suspect many irrelevant features, try Lasso. If features are correlated and you want selection, try ElasticNet. For non-linear problems, consider a Random Forest instead. Always use cross-validation to compare.
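That decision procedure can be sketched as a quick diagnostic: fit the baseline, measure the train-test R² gap, and check whether Ridge narrows it. The synthetic data and the degree-12 polynomial below are illustrative choices, not a recipe:

```python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
import numpy as np

# Noisy non-linear data where a high-degree polynomial invites overfitting
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(60, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gaps = {}
for name, reg in [("LinearRegression", LinearRegression()),
                  ("Ridge", Ridge(alpha=10.0))]:
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=12, include_bias=False)),
        ("scaler", StandardScaler()),
        ("model", reg),
    ])
    pipe.fit(X_train, y_train)
    # A big train-test gap signals overfitting; regularization should shrink it
    gaps[name] = pipe.score(X_train, y_train) - pipe.score(X_test, y_test)
    print(f"{name:<16} train-test R² gap: {gaps[name]:+.4f}")
```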
Assumptions of Linear Regression
Linear regression produces reliable results when these assumptions hold:
- Linearity -- The relationship between features and target is linear (or linearizable with transformations).
- Independence -- Observations are independent of each other. Violated in time-series data without accounting for autocorrelation.
- Homoscedasticity -- The variance of residuals is constant across all levels of the predicted values.
- Normality of residuals -- Residuals follow a normal distribution. Matters most for confidence intervals and hypothesis tests, less for prediction accuracy.
- No multicollinearity -- Features are not highly correlated with each other. Multicollinearity inflates coefficient variance and makes individual coefficients unreliable.
Checking Assumptions in Code
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred
# Check residual statistics
print(f"Residual mean: {residuals.mean():.6f}") # Should be near 0
print(f"Residual std: {residuals.std():.4f}")
print(f"Residual skewness: {float(np.mean((residuals - residuals.mean())**3) / residuals.std()**3):.4f}")
# Check for multicollinearity (correlation matrix)
corr_matrix = np.corrcoef(X_train, rowvar=False)
print(f"\nMax feature correlation: {np.max(np.abs(corr_matrix - np.eye(corr_matrix.shape[0]))):.4f}")

Complete Pipeline: Real-World Regression
Here is a production-style pipeline that combines preprocessing, feature engineering, and model selection:
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_california_housing
import numpy as np
# Load data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
# Define models to compare
models = {
'LinearRegression': Pipeline([
('scaler', StandardScaler()),
('model', LinearRegression())
]),
'Ridge (alpha=1)': Pipeline([
('scaler', StandardScaler()),
('model', Ridge(alpha=1.0))
]),
'Lasso (alpha=0.01)': Pipeline([
('scaler', StandardScaler()),
('model', Lasso(alpha=0.01, max_iter=10000))
]),
'Poly(2) + Ridge': Pipeline([
('poly', PolynomialFeatures(degree=2, include_bias=False)),
('scaler', StandardScaler()),
('model', Ridge(alpha=10.0))
])
}
# Evaluate all models
print(f"{'Model':<25} {'CV R² (mean)':>12} {'CV R² (std)':>12} {'Test R²':>10}")
print("-" * 62)
for name, pipeline in models.items():
    cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='r2')
    pipeline.fit(X_train, y_train)
    test_r2 = pipeline.score(X_test, y_test)
    print(f"{name:<25} {cv_scores.mean():>12.4f} {cv_scores.std():>12.4f} {test_r2:>10.4f}")

Exploring Regression Results with PyGWalker
After training your model, understanding prediction patterns is critical. PyGWalker lets you visually explore residuals, feature importances, and predicted-vs-actual relationships through an interactive drag-and-drop interface in Jupyter:
import pandas as pd
import pygwalker as pyg
# Build a results DataFrame from the held-out test split
# (train_test_split shuffles rows, so use X_test directly rather than slicing housing.data)
results = pd.DataFrame(X_test, columns=housing.feature_names)
results['actual'] = y_test
results['predicted'] = y_pred
results['residual'] = y_test - y_pred
results['abs_error'] = np.abs(y_test - y_pred)
# Launch interactive exploration
walker = pyg.walk(results)

You can drag features to axes, color-code by residual magnitude, and identify which segments of your data the model struggles with -- all without writing plotting code.
For running experiments iteratively in Jupyter, RunCell provides an AI agent that helps you test different feature combinations, hyperparameters, and preprocessing steps without manually rewriting cells.
FAQ
What is LinearRegression in sklearn?
sklearn.linear_model.LinearRegression is an Ordinary Least Squares (OLS) regression model. It fits a linear equation to the data by minimizing the sum of squared differences between actual and predicted values. It is the simplest and most interpretable regression model in scikit-learn.
How do I interpret the R-squared score?
R-squared measures the proportion of variance in the target variable explained by the model. An R-squared of 0.80 means 80% of the variance is explained. A value of 1.0 is a perfect fit, 0.0 means the model is no better than predicting the mean, and negative values mean the model is worse than just using the mean.
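The boundary cases are easy to demonstrate with tiny made-up arrays: predicting the mean scores exactly 0, and anything worse goes negative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0])

# Predicting the mean of y_true scores exactly 0
mean_pred = np.full(3, y_true.mean())
print(r2_score(y_true, mean_pred))   # 0.0

# Predictions worse than the mean score negative:
# SS_res = 5, SS_tot = 2, so R² = 1 - 5/2 = -1.5
bad_pred = np.array([3.0, 3.0, 3.0])
print(r2_score(y_true, bad_pred))    # -1.5
```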
When should I use Ridge vs Lasso vs ElasticNet?
Use Ridge when you want to keep all features but reduce overfitting (multicollinear features). Use Lasso when you want automatic feature selection (it sets irrelevant feature coefficients to zero). Use ElasticNet when features are correlated and you want a balance of Ridge's stability and Lasso's sparsity.
Does LinearRegression need feature scaling?
Basic LinearRegression does not require feature scaling: rescaling a feature just rescales its coefficient, so predictions stay the same. However, Ridge, Lasso, and ElasticNet all require scaling because their penalties treat all coefficient magnitudes equally. Always scale features before regularized regression.
How do I handle categorical features in linear regression?
Convert categorical features to numeric using OneHotEncoder or pd.get_dummies() before fitting. Sklearn's LinearRegression only accepts numeric input. For pipelines, use ColumnTransformer to apply different transformations to numeric and categorical columns -- see our sklearn Pipeline guide for a full walkthrough.
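A minimal sketch of that pattern with ColumnTransformer. The DataFrame here is entirely made up (column names, cities, and prices are hypothetical):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical mixed-type data: one numeric and one categorical feature
df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 950, 2000, 1100],
    "city": ["austin", "dallas", "austin", "houston", "dallas", "houston"],
})
prices = [180, 260, 320, 200, 410, 230]  # target in $1000s

# Scale the numeric column; one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["sqft"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipe = Pipeline([("prep", preprocess), ("model", LinearRegression())])
pipe.fit(df, prices)

preds = pipe.predict(df)
print(preds)
```

The handle_unknown="ignore" option keeps prediction from failing when a category unseen during training appears at inference time.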
What is the difference between MSE and RMSE?
MSE (Mean Squared Error) is the average of squared differences between actual and predicted values. RMSE (Root Mean Squared Error) is the square root of MSE. RMSE has the same units as the target variable, making it easier to interpret. For example, if predicting house prices in dollars, RMSE of 50,000 means average prediction error of about $50,000.
Conclusion
Sklearn's LinearRegression is the starting point for any regression task in Python. It is fast, interpretable, and effective when the underlying relationship is approximately linear. For real-world datasets with noise, multicollinearity, or many features, Ridge, Lasso, and ElasticNet provide regularization that improves generalization. Always evaluate with multiple metrics (R-squared, RMSE, MAE), use train-test splits to avoid overfitting, and check residual patterns to verify your model's assumptions hold. Build pipelines with StandardScaler and PolynomialFeatures to keep your workflow clean and reproducible.
Related Guides
- Sklearn Pipeline -- chain preprocessing and regression into a single deployable object
- Sklearn Confusion Matrix -- evaluation metrics for classification tasks
- Sklearn Random Forest -- a non-linear alternative when linear models underperform
- Pandas read_csv -- loading datasets from CSV files for regression analysis