Chapter 5: Logistic Regression in Practice
Logistic regression is one of the most widely used classification algorithms in machine learning. Despite the "regression" in its name, it is a classification algorithm: it fits a linear model and passes the result through the logistic function to produce class probabilities. This chapter covers the principles, implementation, and applications of logistic regression.
5.1 What is Logistic Regression?
Logistic regression models class probabilities in binary classification problems using the logistic (sigmoid) function. Rather than predicting a class label directly, it predicts the probability that a sample belongs to a given class.
5.1.1 Mathematical Principles
Sigmoid Function:
σ(z) = 1 / (1 + e^(-z)), where z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
Probability Prediction:
P(y=1|x) = σ(β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ)
P(y=0|x) = 1 - P(y=1|x)
Decision Boundary:
- When P(y=1|x) ≥ 0.5, predict class 1
- When P(y=1|x) < 0.5, predict class 0
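The path from linear score to probability to label can be sketched in a few lines. This is a minimal standalone example with made-up coefficients (β₀, β₁, β₂ here are hypothetical, not values fitted by the scikit-learn model used later in the chapter):

```python
import numpy as np

def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a two-feature model (illustration only)
beta0 = 0.5
beta = np.array([-1.2, 2.0])   # [β₁, β₂]
x = np.array([1.0, 0.8])       # one sample

z = beta0 + beta @ x           # linear combination: 0.5 - 1.2 + 1.6 = 0.9
p = sigmoid(z)                 # P(y=1|x)
label = int(p >= 0.5)          # decision rule at the 0.5 threshold
print(f"z={z:.2f}, P(y=1|x)={p:.3f}, predicted class={label}")
```

Since z > 0, the probability lands above 0.5 and the sample is assigned to class 1 — exactly the decision rule stated above.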
5.1.2 Differences from Linear Regression
| Feature | Linear Regression | Logistic Regression |
|---|---|---|
| Target | Predict continuous values | Predict probability/classification |
| Output Range | (-∞, +∞) | (0, 1) |
| Activation Function | None | Sigmoid |
| Loss Function | Mean Squared Error | Log Loss (negative log-likelihood) |
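To make the loss-function row concrete: logistic regression is fit by minimizing the log loss, i.e. the average of -[y·log(p) + (1-y)·log(1-p)] over samples. A quick sanity check that this formula matches scikit-learn's `log_loss` (the labels and probabilities below are toy values chosen only for illustration):

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])
p_hat = np.array([0.9, 0.2, 0.6, 0.8])  # predicted P(y=1|x)

# Negative log-likelihood averaged over samples
manual = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))
print(f"manual log loss:  {manual:.4f}")
print(f"sklearn log_loss: {log_loss(y_true, p_hat):.4f}")  # same value
```

Note how a confident wrong probability (e.g. p close to 0 for a true positive) would be penalized heavily, which is why log loss, not mean squared error, is the natural training objective here.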
5.2 Preparing Environment and Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_curve, auc,
    precision_recall_curve, log_loss
)
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')
# Set random seed
np.random.seed(42)
# Set plot style
plt.style.use('seaborn-v0_8')
plt.rcParams['font.sans-serif'] = ['SimHei']  # CJK-capable font; optional when all labels are English
plt.rcParams['axes.unicode_minus'] = False
5.3 Binary Classification Logistic Regression
5.3.1 Generate Binary Classification Data
# Generate binary classification dataset
X_binary, y_binary = make_classification(
    n_samples=1000,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    random_state=42
)
# Create DataFrame for analysis
df_binary = pd.DataFrame(X_binary, columns=['Feature1', 'Feature2'])
df_binary['Label'] = y_binary
print("Binary Classification Dataset Info:")
print(df_binary.info())
print("\nClass Distribution:")
print(df_binary['Label'].value_counts())
# Visualize data distribution
plt.figure(figsize=(10, 8))
colors = ['red', 'blue']
for i, label in enumerate([0, 1]):
    mask = y_binary == label
    plt.scatter(X_binary[mask, 0], X_binary[mask, 1],
                c=colors[i], label=f'Class {label}', alpha=0.7)
plt.xlabel('Feature1')
plt.ylabel('Feature2')
plt.title('Binary Classification Data Distribution')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
5.3.2 Train Binary Classification Logistic Regression Model
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)
# Feature standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Create and train logistic regression model
logistic_model = LogisticRegression(random_state=42)
logistic_model.fit(X_train_scaled, y_train)
# View model parameters
print("Logistic Regression Model Parameters:")
print(f"Intercept: {logistic_model.intercept_[0]:.4f}")
print(f"Coefficients: {logistic_model.coef_[0]}")
# Predict probabilities and classes
y_pred_proba = logistic_model.predict_proba(X_test_scaled)
y_pred = logistic_model.predict(X_test_scaled)
print(f"\nPrediction Examples (first 5 samples):")
for i in range(5):
    print(f"Sample {i+1}: Actual={y_test[i]}, Predicted={y_pred[i]}, "
          f"Probability=[{y_pred_proba[i][0]:.3f}, {y_pred_proba[i][1]:.3f}]")
5.3.3 Decision Boundary Visualization
def plot_decision_boundary(X, y, model, scaler=None, title="Decision Boundary"):
    """Plot decision boundary"""
    plt.figure(figsize=(10, 8))
    # Create grid
    h = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # Predict grid points
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    if scaler:
        grid_points = scaler.transform(grid_points)
    Z = model.predict_proba(grid_points)[:, 1]
    Z = Z.reshape(xx.shape)
    # Plot contours
    plt.contourf(xx, yy, Z, levels=50, alpha=0.8, cmap='RdYlBu')
    plt.colorbar(label='P(y=1)')
    # Plot decision boundary
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linestyles='--', linewidths=2)
    # Plot data points
    colors = ['red', 'blue']
    for i, label in enumerate([0, 1]):
        mask = y == label
        plt.scatter(X[mask, 0], X[mask, 1],
                    c=colors[i], label=f'Class {label}', alpha=0.7, edgecolors='black')
    plt.xlabel('Feature1')
    plt.ylabel('Feature2')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
# Plot decision boundary
plot_decision_boundary(X_train, y_train, logistic_model, scaler, "Logistic Regression Decision Boundary")
5.3.4 Sigmoid Function Visualization
# Visualize Sigmoid function
z = np.linspace(-10, 10, 100)
sigmoid = 1 / (1 + np.exp(-z))
plt.figure(figsize=(10, 6))
plt.plot(z, sigmoid, 'b-', linewidth=2, label='Sigmoid Function')
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.7, label='Decision Threshold')
plt.axvline(x=0, color='g', linestyle='--', alpha=0.7, label='z=0')
plt.xlabel('z = β₀ + β₁x₁ + β₂x₂')
plt.ylabel('P(y=1|x)')
plt.title('Sigmoid Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Demonstrate conversion from linear combination to probability
sample_features = X_test_scaled[:10]
linear_combination = logistic_model.decision_function(sample_features)
probabilities = logistic_model.predict_proba(sample_features)[:, 1]
print("Linear Combination to Probability Conversion Example:")
print("Linear Combination(z)\tProbability P(y=1)\tPredicted Class")
print("-" * 40)
for i in range(len(sample_features)):
    pred_class = 1 if probabilities[i] >= 0.5 else 0
    print(f"{linear_combination[i]:8.3f}\t{probabilities[i]:8.3f}\t{pred_class:8d}")
5.4 Model Evaluation
5.4.1 Basic Evaluation Metrics
def evaluate_classification_model(y_true, y_pred, y_pred_proba=None, model_name="Model"):
    """Evaluate classification model performance"""
    print(f"{model_name} Evaluation Results:")
    print("-" * 50)
    # Basic metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')
    f1 = f1_score(y_true, y_pred, average='weighted')
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    # Log loss (only when probabilities are provided)
    logloss = None
    if y_pred_proba is not None:
        logloss = log_loss(y_true, y_pred_proba)
        print(f"Log Loss: {logloss:.4f}")
    print("\nDetailed Classification Report:")
    print(classification_report(y_true, y_pred))
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'log_loss': logloss
    }
# Evaluate model
metrics = evaluate_classification_model(
    y_test, y_pred, y_pred_proba, "Logistic Regression"
)
5.4.2 Confusion Matrix
# Calculate and visualize confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=['Class 0', 'Class 1'],
yticklabels=['Class 0', 'Class 1'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
# Calculate metrics from confusion matrix
tn, fp, fn, tp = cm.ravel()
print("Confusion Matrix Analysis:")
print(f"True Negative (TN): {tn}")
print(f"False Positive (FP): {fp}")
print(f"False Negative (FN): {fn}")
print(f"True Positive (TP): {tp}")
print(f"\nManually Calculated Metrics:")
print(f"Accuracy: {(tp + tn) / (tp + tn + fp + fn):.4f}")
print(f"Precision: {tp / (tp + fp):.4f}")
print(f"Recall: {tp / (tp + fn):.4f}")
print(f"Specificity: {tn / (tn + fp):.4f}")
5.4.3 ROC Curve and AUC
# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba[:, 1])
roc_auc = auc(fpr, tpr)
# Plot ROC curve
plt.figure(figsize=(10, 8))
plt.plot(fpr, tpr, color='darkorange', lw=2,
label=f'ROC Curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--',
label='Random Classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()
print(f"AUC Score: {roc_auc:.4f}")
# Performance at different thresholds
print("\nPerformance at Different Thresholds:")
print("Threshold\t\tFPR\t\tTPR\t\tPrecision\t\tRecall")
print("-" * 60)
step = max(1, len(thresholds) // 10)  # guard against step 0 for short threshold arrays
for i in range(0, len(thresholds), step):
    threshold = thresholds[i]
    y_pred_threshold = (y_pred_proba[:, 1] >= threshold).astype(int)
    if len(np.unique(y_pred_threshold)) > 1:  # Avoid division by zero
        precision_thresh = precision_score(y_test, y_pred_threshold)
        recall_thresh = recall_score(y_test, y_pred_threshold)
        print(f"{threshold:.3f}\t\t{fpr[i]:.3f}\t\t{tpr[i]:.3f}\t\t{precision_thresh:.3f}\t\t{recall_thresh:.3f}")
5.4.4 Precision-Recall Curve
# Calculate precision-recall curve
precision_curve, recall_curve, pr_thresholds = precision_recall_curve(
    y_test, y_pred_proba[:, 1]
)
pr_auc = auc(recall_curve, precision_curve)
# Plot PR curve
plt.figure(figsize=(10, 8))
plt.plot(recall_curve, precision_curve, color='blue', lw=2,
label=f'PR Curve (AUC = {pr_auc:.3f})')
# Baseline (random classifier)
baseline = np.sum(y_test) / len(y_test)
plt.axhline(y=baseline, color='red', linestyle='--',
label=f'Random Classifier (Precision = {baseline:.3f})')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"PR-AUC Score: {pr_auc:.4f}")
5.5 Multiclass Logistic Regression
5.5.1 Load Multiclass Data
# Use wine dataset (3 classes)
wine_data = load_wine()
X_wine = wine_data.data
y_wine = wine_data.target
feature_names_wine = wine_data.feature_names
target_names_wine = wine_data.target_names
print("Wine Dataset Info:")
print(f"Sample Count: {X_wine.shape[0]}")
print(f"Feature Count: {X_wine.shape[1]}")
print(f"Class Count: {len(np.unique(y_wine))}")
print(f"Class Names: {target_names_wine}")
# View class distribution
unique, counts = np.unique(y_wine, return_counts=True)
plt.figure(figsize=(8, 6))
plt.bar(target_names_wine, counts, color=['red', 'green', 'blue'], alpha=0.7)
plt.title('Wine Dataset Class Distribution')
plt.xlabel('Wine Type')
plt.ylabel('Sample Count')
plt.show()
for i, name in enumerate(target_names_wine):
    print(f"{name}: {counts[i]} samples")
5.5.2 Feature Analysis
# Create DataFrame for analysis
df_wine = pd.DataFrame(X_wine, columns=feature_names_wine)
df_wine['wine_type'] = y_wine
# Select several important features for visualization
important_features = ['alcohol', 'flavanoids', 'color_intensity', 'proline']
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Distribution of Important Features', fontsize=16)
for i, feature in enumerate(important_features):
    row = i // 2
    col = i % 2
    for wine_type in range(3):
        data = df_wine[df_wine['wine_type'] == wine_type][feature]
        axes[row, col].hist(data, alpha=0.6, label=target_names_wine[wine_type], bins=15)
    axes[row, col].set_title(feature)
    axes[row, col].set_xlabel(feature)
    axes[row, col].set_ylabel('Frequency')
    axes[row, col].legend()
    axes[row, col].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Feature correlation analysis
plt.figure(figsize=(12, 10))
correlation_matrix = df_wine[important_features + ['wine_type']].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
square=True, linewidths=0.5)
plt.title('Important Features Correlation Matrix')
plt.tight_layout()
plt.show()
5.5.3 Train Multiclass Logistic Regression
# Split data
X_train_wine, X_test_wine, y_train_wine, y_test_wine = train_test_split(
X_wine, y_wine, test_size=0.2, random_state=42, stratify=y_wine
)
# Feature standardization
scaler_wine = StandardScaler()
X_train_wine_scaled = scaler_wine.fit_transform(X_train_wine)
X_test_wine_scaled = scaler_wine.transform(X_test_wine)
# Train multiclass logistic regression
# multi_class='ovr': One-vs-Rest strategy
# multi_class='multinomial': Multinomial logistic regression
# (note: recent scikit-learn versions deprecate multi_class; multinomial is the default)
logistic_multi = LogisticRegression(
    multi_class='multinomial',
    solver='lbfgs',
    random_state=42,
    max_iter=1000
)
logistic_multi.fit(X_train_wine_scaled, y_train_wine)
print("Multiclass Logistic Regression Model Info:")
print(f"Class Count: {len(logistic_multi.classes_)}")
print(f"Coefficient Matrix Shape: {logistic_multi.coef_.shape}")
print(f"Intercepts: {logistic_multi.intercept_}")
# Predict
y_pred_wine = logistic_multi.predict(X_test_wine_scaled)
y_pred_proba_wine = logistic_multi.predict_proba(X_test_wine_scaled)
# Evaluate
wine_metrics = evaluate_classification_model(
    y_test_wine, y_pred_wine, y_pred_proba_wine, "Multiclass Logistic Regression"
)
5.5.4 Multiclass Confusion Matrix
# Multiclass confusion matrix
cm_wine = confusion_matrix(y_test_wine, y_pred_wine)
plt.figure(figsize=(8, 6))
sns.heatmap(cm_wine, annot=True, fmt='d', cmap='Blues',
xticklabels=target_names_wine,
yticklabels=target_names_wine)
plt.title('Multiclass Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
# Performance for each class
print("Detailed Performance for Each Class:")
for i, class_name in enumerate(target_names_wine):
    class_precision = precision_score(y_test_wine, y_pred_wine,
                                      labels=[i], average=None)[0]
    class_recall = recall_score(y_test_wine, y_pred_wine,
                                labels=[i], average=None)[0]
    class_f1 = f1_score(y_test_wine, y_pred_wine,
                        labels=[i], average=None)[0]
    print(f"{class_name}:")
    print(f"  Precision: {class_precision:.4f}")
    print(f"  Recall: {class_recall:.4f}")
    print(f"  F1 Score: {class_f1:.4f}")
5.5.5 One-vs-Rest vs Multinomial Comparison
# Compare different multiclass strategies
strategies = ['ovr', 'multinomial']
strategy_results = {}
for strategy in strategies:
    model = LogisticRegression(
        multi_class=strategy,
        solver='lbfgs',
        random_state=42,
        max_iter=1000
    )
    model.fit(X_train_wine_scaled, y_train_wine)
    y_pred = model.predict(X_test_wine_scaled)
    accuracy = accuracy_score(y_test_wine, y_pred)
    f1 = f1_score(y_test_wine, y_pred, average='weighted')
    strategy_results[strategy] = {'accuracy': accuracy, 'f1': f1}
    print(f"{strategy.upper()} Strategy:")
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  F1 Score: {f1:.4f}")
    print()
# Visualize comparison
strategies_df = pd.DataFrame(strategy_results).T
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
strategies_df['accuracy'].plot(kind='bar', ax=axes[0], color='skyblue')
axes[0].set_title('Accuracy Comparison')
axes[0].set_ylabel('Accuracy')
axes[0].tick_params(axis='x', rotation=0)
strategies_df['f1'].plot(kind='bar', ax=axes[1], color='lightcoral')
axes[1].set_title('F1 Score Comparison')
axes[1].set_ylabel('F1 Score')
axes[1].tick_params(axis='x', rotation=0)
plt.tight_layout()
plt.show()
5.6 Regularized Logistic Regression
5.6.1 L1 and L2 Regularization
# Create high-dimensional dataset to test regularization effects
X_high_dim, y_high_dim = make_classification(
n_samples=500,
n_features=50,
n_informative=10,
n_redundant=10,
n_clusters_per_class=1,
random_state=42
)
X_train_hd, X_test_hd, y_train_hd, y_test_hd = train_test_split(
X_high_dim, y_high_dim, test_size=0.2, random_state=42
)
# Standardization
scaler_hd = StandardScaler()
X_train_hd_scaled = scaler_hd.fit_transform(X_train_hd)
X_test_hd_scaled = scaler_hd.transform(X_test_hd)
# Compare different regularization methods
penalties = ['none', 'l1', 'l2', 'elasticnet']
C_values = [0.01, 0.1, 1, 10, 100]
results = {}
for penalty in penalties:
    if penalty == 'none':
        # Newer scikit-learn versions use penalty=None instead of the string 'none'
        model = LogisticRegression(penalty=None, solver='lbfgs',
                                   random_state=42, max_iter=1000)
        model.fit(X_train_hd_scaled, y_train_hd)
        y_pred = model.predict(X_test_hd_scaled)
        results[penalty] = accuracy_score(y_test_hd, y_pred)
    elif penalty == 'elasticnet':
        model = LogisticRegression(penalty=penalty, solver='saga',
                                   C=1.0, l1_ratio=0.5,
                                   random_state=42, max_iter=1000)
        model.fit(X_train_hd_scaled, y_train_hd)
        y_pred = model.predict(X_test_hd_scaled)
        results[penalty] = accuracy_score(y_test_hd, y_pred)
    else:
        best_accuracy = 0
        best_C = None
        for C in C_values:
            solver = 'liblinear' if penalty == 'l1' else 'lbfgs'
            model = LogisticRegression(penalty=penalty, C=C, solver=solver,
                                       random_state=42, max_iter=1000)
            model.fit(X_train_hd_scaled, y_train_hd)
            y_pred = model.predict(X_test_hd_scaled)
            accuracy = accuracy_score(y_test_hd, y_pred)
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_C = C
        results[f'{penalty} (C={best_C})'] = best_accuracy
print("Regularization Method Comparison:")
for method, accuracy in results.items():
    print(f"{method}: {accuracy:.4f}")
5.6.2 Regularization Path Visualization
from sklearn.linear_model import LogisticRegressionCV
# L1 regularization path
l1_model = LogisticRegressionCV(
penalty='l1',
solver='liblinear',
Cs=np.logspace(-4, 2, 20),
cv=5,
random_state=42
)
l1_model.fit(X_train_hd_scaled, y_train_hd)
# L2 regularization path
l2_model = LogisticRegressionCV(
penalty='l2',
solver='lbfgs',
Cs=np.logspace(-4, 2, 20),
cv=5,
random_state=42
)
l2_model.fit(X_train_hd_scaled, y_train_hd)
print(f"L1 Best C: {l1_model.C_[0]:.4f}")
print(f"L2 Best C: {l2_model.C_[0]:.4f}")
# Visualize coefficient paths
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
# L1 path
C_range = np.logspace(-4, 2, 20)
coefs_l1 = []
for C in C_range:
    model = LogisticRegression(penalty='l1', C=C, solver='liblinear',
                               random_state=42, max_iter=1000)
    model.fit(X_train_hd_scaled, y_train_hd)
    coefs_l1.append(model.coef_[0])
coefs_l1 = np.array(coefs_l1)
for i in range(min(10, coefs_l1.shape[1])):  # Only show first 10 features
    axes[0].plot(C_range, coefs_l1[:, i], label=f'Feature{i+1}')
axes[0].set_xscale('log')
axes[0].set_xlabel('C (Inverse of Regularization Strength)')
axes[0].set_ylabel('Coefficient Value')
axes[0].set_title('L1 Regularization Path')
axes[0].grid(True, alpha=0.3)
axes[0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
# L2 path
coefs_l2 = []
for C in C_range:
    model = LogisticRegression(penalty='l2', C=C, solver='lbfgs',
                               random_state=42, max_iter=1000)
    model.fit(X_train_hd_scaled, y_train_hd)
    coefs_l2.append(model.coef_[0])
coefs_l2 = np.array(coefs_l2)
for i in range(min(10, coefs_l2.shape[1])):  # Only show first 10 features
    axes[1].plot(C_range, coefs_l2[:, i], label=f'Feature{i+1}')
axes[1].set_xscale('log')
axes[1].set_xlabel('C (Inverse of Regularization Strength)')
axes[1].set_ylabel('Coefficient Value')
axes[1].set_title('L2 Regularization Path')
axes[1].grid(True, alpha=0.3)
axes[1].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Feature selection effect comparison
l1_final = LogisticRegression(penalty='l1', C=l1_model.C_[0],
                              solver='liblinear', random_state=42, max_iter=1000)
l1_final.fit(X_train_hd_scaled, y_train_hd)
l2_final = LogisticRegression(penalty='l2', C=l2_model.C_[0],
                              solver='lbfgs', random_state=42, max_iter=1000)
l2_final.fit(X_train_hd_scaled, y_train_hd)
print(f"L1 Regularization Non-zero Coefficients: {np.sum(l1_final.coef_[0] != 0)}/{len(l1_final.coef_[0])}")
print(f"L2 Regularization Non-zero Coefficients: {np.sum(l2_final.coef_[0] != 0)}/{len(l2_final.coef_[0])}")
5.7 Hyperparameter Tuning
5.7.1 Grid Search
# Use grid search to optimize hyperparameters
param_grid = {
'C': [0.01, 0.1, 1, 10, 100],
'penalty': ['l1', 'l2'],
'solver': ['liblinear'] # Supports l1 and l2
}
grid_search = GridSearchCV(
LogisticRegression(random_state=42, max_iter=1000),
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1
)
grid_search.fit(X_train_hd_scaled, y_train_hd)
print("Grid Search Results:")
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-Validation Score: {grid_search.best_score_:.4f}")
# Test set performance
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test_hd_scaled)
test_accuracy = accuracy_score(y_test_hd, y_pred_best)
print(f"Test Set Accuracy: {test_accuracy:.4f}")
# Visualize grid search results
results_df = pd.DataFrame(grid_search.cv_results_)
plt.figure(figsize=(10, 8))
pivot_table = results_df.pivot_table(
values='mean_test_score',
index='param_penalty',
columns='param_C'
)
sns.heatmap(pivot_table, annot=True, cmap='viridis', fmt='.4f')
plt.title('Grid Search Results Heatmap')
plt.xlabel('C Value')
plt.ylabel('Regularization Type')
plt.show()
5.7.2 Learning Curve Analysis
from sklearn.model_selection import learning_curve
def plot_learning_curve_classification(estimator, X, y, title="Learning Curve"):
    """Plot learning curve for classification model"""
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring='accuracy'
    )
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)
    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
    plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std,
                     alpha=0.1, color='blue')
    plt.plot(train_sizes, val_mean, 'o-', color='red', label='Validation Score')
    plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std,
                     alpha=0.1, color='red')
    plt.xlabel('Number of Training Samples')
    plt.ylabel('Accuracy')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
# Plot learning curve for best model
plot_learning_curve_classification(
    best_model, X_train_hd_scaled, y_train_hd,
    "Best Logistic Regression Model Learning Curve"
)
5.8 Practical Application Cases
5.8.1 Breast Cancer Diagnosis Case
# Load breast cancer dataset
cancer_data = load_breast_cancer()
X_cancer = cancer_data.data
y_cancer = cancer_data.target
feature_names_cancer = cancer_data.feature_names
target_names_cancer = cancer_data.target_names
print("Breast Cancer Dataset Info:")
print(f"Sample Count: {X_cancer.shape[0]}")
print(f"Feature Count: {X_cancer.shape[1]}")
print(f"Classes: {target_names_cancer}")
# View class distribution
unique, counts = np.unique(y_cancer, return_counts=True)
print(f"Benign: {counts[1]} samples")
print(f"Malignant: {counts[0]} samples")
# Split data
X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(
X_cancer, y_cancer, test_size=0.2, random_state=42, stratify=y_cancer
)
# Create complete preprocessing and modeling pipeline
cancer_pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', LogisticRegression(random_state=42, max_iter=1000))
])
# Train model
cancer_pipeline.fit(X_train_cancer, y_train_cancer)
# Predict and evaluate
y_pred_cancer = cancer_pipeline.predict(X_test_cancer)
y_pred_proba_cancer = cancer_pipeline.predict_proba(X_test_cancer)
print("\nBreast Cancer Diagnosis Model Evaluation:")
cancer_metrics = evaluate_classification_model(
    y_test_cancer, y_pred_cancer, y_pred_proba_cancer, "Breast Cancer Diagnosis Model"
)
5.8.2 Feature Importance Analysis
# Get feature importance (based on absolute coefficient values)
classifier = cancer_pipeline.named_steps['classifier']
feature_importance = np.abs(classifier.coef_[0])
# Create feature importance DataFrame
importance_df = pd.DataFrame({
'feature': feature_names_cancer,
'importance': feature_importance
}).sort_values('importance', ascending=False)
# Visualize top 15 most important features
plt.figure(figsize=(10, 8))
top_features = importance_df.head(15)
plt.barh(range(len(top_features)), top_features['importance'])
plt.yticks(range(len(top_features)), top_features['feature'])
plt.xlabel('Feature Importance (Absolute Coefficient Value)')
plt.title('Breast Cancer Diagnosis Model - Top 15 Important Features')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
print("Top 10 Most Important Features:")
for i, (_, row) in enumerate(top_features.head(10).iterrows()):
    print(f"{i+1:2d}. {row['feature']}: {row['importance']:.4f}")
5.8.3 Model Interpretation and Prediction Examples
# Predict new samples
def predict_cancer_diagnosis(model, scaler, sample_features, feature_names):
    """Predict breast cancer diagnosis result"""
    # Standardize features (used only for the coefficient-contribution analysis below)
    sample_scaled = scaler.transform(sample_features.reshape(1, -1))
    # Predict with the full pipeline on the raw features
    # (the pipeline applies its own scaling, so passing pre-scaled data would scale twice)
    proba = model.predict_proba(sample_features.reshape(1, -1))[0]
    prediction = model.predict(sample_features.reshape(1, -1))[0]
    print("Breast Cancer Diagnosis Prediction Result:")
    print(f"Predicted Class: {'Benign' if prediction == 1 else 'Malignant'}")
    print(f"Malignant Probability: {proba[0]:.3f}")
    print(f"Benign Probability: {proba[1]:.3f}")
    # Display contribution of most important features
    classifier = model.named_steps['classifier']
    coefficients = classifier.coef_[0]
    print("\nImportant Feature Contribution Analysis:")
    feature_contributions = sample_scaled[0] * coefficients
    # Get features with largest contribution
    top_indices = np.argsort(np.abs(feature_contributions))[-5:]
    for idx in reversed(top_indices):
        contribution = feature_contributions[idx]
        direction = "Supports Malignant" if contribution < 0 else "Supports Benign"
        print(f"{feature_names[idx]}: {contribution:.3f} ({direction})")
# Use a sample from test set for demonstration
sample_idx = 0
sample_features = X_test_cancer[sample_idx]
true_label = y_test_cancer[sample_idx]
print(f"True Label: {'Benign' if true_label == 1 else 'Malignant'}")
predict_cancer_diagnosis(cancer_pipeline,
                         cancer_pipeline.named_steps['scaler'],
                         sample_features,
                         feature_names_cancer)
5.9 Exercises
Exercise 1: Basic Logistic Regression
- Use make_classification to generate a binary classification dataset
- Train a logistic regression model and draw the decision boundary
- Analyze the impact of different thresholds on classification results
Exercise 2: Multiclass Problems
- Use the iris dataset to train a multiclass logistic regression model
- Compare the performance of the One-vs-Rest and Multinomial strategies
- Analyze the classification difficulty for each class
Exercise 3: Imbalanced Data Handling
- Create an imbalanced binary classification dataset (ratio 1:9)
- Use different evaluation metrics to assess model performance
- Try using the class_weight='balanced' parameter to improve performance
Exercise 4: Feature Selection
- Use a high-dimensional dataset (more than 100 features)
- Compare the feature selection effects of L1 and L2 regularization
- Analyze the impact of regularization strength on model performance
5.10 Summary
In this chapter, we studied logistic regression in depth:
Core Concepts
- Logistic Regression Principles: Sigmoid function, probability prediction, decision boundary
- Multiclass Strategies: One-vs-Rest, Multinomial
- Regularization Methods: L1, L2, ElasticNet
Main Techniques
- Model Training: Binary and multiclass logistic regression
- Performance Evaluation: Accuracy, Precision, Recall, F1, AUC
- Visualization Techniques: ROC curve, PR curve, decision boundary
- Hyperparameter Tuning: Grid search, cross-validation
Practical Skills
- Data Preprocessing: Standardization, feature selection
- Model Interpretation: Coefficient analysis, feature importance
- Real Applications: Medical diagnosis, classification prediction
- Performance Optimization: Regularization, threshold adjustment
Key Points
- Logistic regression is a linear classifier suitable for linearly separable problems
- The Sigmoid function maps linear combinations to probability space
- Regularization can prevent overfitting and perform feature selection
- The choice of evaluation metrics depends on specific business requirements
5.11 Next Steps
Now you have mastered logistic regression, a cornerstone classification algorithm! In the next chapter, Decision Tree Algorithm, we will learn a completely different kind of model: the decision tree, which offers excellent interpretability and is the foundation for understanding more complex ensemble methods.
Chapter Key Points Review:
- ✓ Understood the mathematical principles of logistic regression and Sigmoid function
- ✓ Mastered the implementation of binary and multiclass logistic regression
- ✓ Learned to use various evaluation metrics for classification models
- ✓ Understood the application of regularization in logistic regression
- ✓ Mastered the drawing and interpretation of ROC curves and PR curves
- ✓ Able to build a complete classification prediction system