
Chapter 8: Support Vector Machines

The Support Vector Machine (SVM) is one of the most powerful and elegant algorithms in machine learning. It solves classification and regression problems by finding an optimal separating hyperplane, and excels at high-dimensional data and, through kernel functions, nonlinear problems.

8.1 What is a Support Vector Machine?

The core idea of Support Vector Machine is to find an optimal decision boundary (hyperplane) that maximizes the margin between different classes. This decision boundary is determined by a few key data points (support vectors).

8.1.1 Core Concepts

  • Hyperplane: An (n-1)-dimensional subspace in n-dimensional space that separates data
  • Support Vectors: Data points closest to the decision boundary
  • Margin: The width of the band between the decision boundary and the nearest points on either side; SVM maximizes it
  • Kernel Function: A function that computes inner products in an implicit high-dimensional feature space, without constructing the mapping explicitly
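These concepts can be read directly off a fitted scikit-learn model. A minimal sketch (the toy points below are invented for illustration) fits a near-hard-margin linear SVC and recovers the hyperplane normal w, the support vectors, and the margin width 2/‖w‖:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (illustrative values only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # very large C ≈ hard margin

w = clf.coef_[0]                 # hyperplane normal vector
b = clf.intercept_[0]            # bias term
margin = 2 / np.linalg.norm(w)   # full margin width

print("support vectors:\n", clf.support_vectors_)
print(f"margin width: {margin:.3f}")
```

Here the two inner points [1, 1] and [3, 3] become the support vectors, and the margin equals the distance between them along the normal direction.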

8.1.2 Advantages of SVM

  • Effective for High-dimensional Data: Remains effective when number of features is large
  • Memory Efficient: Only uses support vectors for prediction
  • Flexible: Handles nonlinear problems through different kernel functions
  • Strong Generalization: Based on structural risk minimization principle
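The memory-efficiency point can be checked directly: after training, the decision function depends only on the support vectors, via f(x) = Σᵢ αᵢ K(svᵢ, x) + b. A sketch (the dataset is illustrative) that rebuilds SVC's decision_function from support_vectors_, dual_coef_, and intercept_:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.1, random_state=0)
gamma = 0.5
clf = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

x_new = X[:5]  # a few query points

# Manual decision function: a sum over support vectors only,
# with K(sv, x) = exp(-gamma * ||sv - x||^2)
d2 = ((clf.support_vectors_[:, None, :] - x_new[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-gamma * d2)                          # shape (n_sv, n_query)
f_manual = clf.dual_coef_ @ K + clf.intercept_   # shape (1, n_query)

assert np.allclose(f_manual.ravel(), clf.decision_function(x_new))
```

Only the support vectors enter the sum, which is why the rest of the training set can be discarded after fitting.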

8.1.3 Disadvantages of SVM

  • Slow Training on Large Datasets: Training time grows roughly quadratically to cubically with the number of samples
  • Sensitive to Noise: Outliers may affect decision boundary
  • Requires Feature Scaling: Sensitive to feature scale
  • No Probability Output: Does not directly provide prediction probabilities (scikit-learn can estimate them with probability=True, at extra training cost)
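On the last point: scikit-learn can attach calibrated probabilities via Platt scaling when probability=True is set. This fits an extra cross-validated sigmoid, so training is slower, and the probabilities may occasionally disagree with predict(). A brief sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True enables predict_proba via internal Platt scaling
clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # shape (n_samples, 2); rows sum to 1
print(proba[:3])
```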

8.2 Environment and Data Preparation

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_circles, make_moons, load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, validation_curve
from sklearn.svm import SVC, SVR, LinearSVC, LinearSVR
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_squared_error, r2_score, roc_curve, auc
)
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')

# Set random seed
np.random.seed(42)

# Set figure style
plt.style.use('seaborn-v0_8')

8.3 Linear SVM

8.3.1 Linearly Separable Data

python
# Create linearly separable data
def create_linearly_separable_data():
    """Create linearly separable binary classification data"""
    np.random.seed(42)

    # Class 1
    class1_x = np.random.normal(2, 0.5, 50)
    class1_y = np.random.normal(2, 0.5, 50)

    # Class 2
    class2_x = np.random.normal(-2, 0.5, 50)
    class2_y = np.random.normal(-2, 0.5, 50)

    X = np.vstack([np.column_stack([class1_x, class1_y]),
                   np.column_stack([class2_x, class2_y])])
    y = np.hstack([np.ones(50), np.zeros(50)])

    return X, y

X_linear, y_linear = create_linearly_separable_data()

# Visualize data
plt.figure(figsize=(10, 8))
colors = ['red', 'blue']
for i, color in enumerate(colors):
    mask = y_linear == i
    plt.scatter(X_linear[mask, 0], X_linear[mask, 1],
                c=color, label=f'Class {i}', alpha=0.7, s=50)

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Linearly Separable Data')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Data shape: {X_linear.shape}")
print(f"Class distribution: {np.bincount(y_linear.astype(int))}")

8.3.2 Train Linear SVM

python
# Split data
X_train_linear, X_test_linear, y_train_linear, y_test_linear = train_test_split(
    X_linear, y_linear, test_size=0.2, random_state=42, stratify=y_linear
)

# Feature standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_linear)
X_test_scaled = scaler.transform(X_test_linear)

# Create linear SVM classifier
linear_svm = SVC(kernel='linear', C=1.0, random_state=42)
linear_svm.fit(X_train_scaled, y_train_linear)

# Predict
y_pred_linear = linear_svm.predict(X_test_scaled)

# Evaluate
accuracy_linear = accuracy_score(y_test_linear, y_pred_linear)
print(f"Linear SVM accuracy: {accuracy_linear:.4f}")

print("\nDetailed classification report:")
print(classification_report(y_test_linear, y_pred_linear))

# Get support vector information
print(f"\nSupport vectors per class: {linear_svm.n_support_}")
print(f"Total support vectors: {len(linear_svm.support_)}")
print(f"Support vector ratio: {len(linear_svm.support_) / len(X_train_scaled) * 100:.2f}%")
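With a linear kernel, the fitted model also exposes the primal weights, and predictions are just the sign of w·x + b. A self-contained sketch (its blob data is illustrative) verifying this against scikit-learn's own decision_function:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs (illustrative data)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)
clf = SVC(kernel='linear', C=1.0).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
f = X @ w + b  # primal decision function w·x + b

assert np.allclose(f, clf.decision_function(X))
# predict() is the sign of f, mapped onto classes_
assert np.array_equal(np.where(f > 0, clf.classes_[1], clf.classes_[0]),
                      clf.predict(X))
```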

8.3.3 Visualize Decision Boundary and Support Vectors

python
def plot_svm_decision_boundary(X, y, model, scaler=None, title="SVM Decision Boundary"):
    """Plot SVM decision boundary and support vectors"""
    plt.figure(figsize=(12, 8))

    X_plot = X

    # Create grid over the original (unscaled) feature space
    h = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    # If a scaler was used for training, scale the grid before predicting
    # and map the support vectors back to the original space
    if scaler is not None:
        grid_points = scaler.transform(grid_points)
        support_vectors = scaler.inverse_transform(model.support_vectors_)
    else:
        support_vectors = model.support_vectors_

    # Predicted classes and decision function values (for plotting margins)
    Z = model.predict(grid_points).reshape(xx.shape)
    decision_values = model.decision_function(grid_points).reshape(xx.shape)

    # Plot decision boundary
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')

    # Plot decision boundary lines and margins
    plt.contour(xx, yy, decision_values, levels=[-1, 0, 1],
                colors=['red', 'black', 'red'], linestyles=['--', '-', '--'],
                linewidths=[2, 3, 2])

    # Plot data points
    colors = ['red', 'blue']
    for i, color in enumerate(colors):
        mask = y == i
        plt.scatter(X_plot[mask, 0], X_plot[mask, 1],
                   c=color, label=f'Class {i}', alpha=0.7, s=50)

    # Highlight support vectors
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1],
                s=200, facecolors='none', edgecolors='black',
                linewidths=2, label='Support Vectors')

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

# Plot linear SVM decision boundary
plot_svm_decision_boundary(X_train_linear, y_train_linear, linear_svm, scaler,
                          "Linear SVM Decision Boundary and Support Vectors")

8.3.4 Impact of C Parameter

python
# Compare impact of different C values
C_values = [0.1, 1, 10, 100]
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Impact of Different C Values on Linear SVM', fontsize=16)

for i, C in enumerate(C_values):
    row = i // 2
    col = i % 2

    # Train SVM
    svm_c = SVC(kernel='linear', C=C, random_state=42)
    svm_c.fit(X_train_scaled, y_train_linear)

    # Predict
    y_pred_c = svm_c.predict(X_test_scaled)
    accuracy_c = accuracy_score(y_test_linear, y_pred_c)

    # Create grid for visualization
    h = 0.02
    x_min, x_max = X_train_linear[:, 0].min() - 1, X_train_linear[:, 0].max() + 1
    y_min, y_max = X_train_linear[:, 1].min() - 1, X_train_linear[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                       np.arange(y_min, y_max, h))

    grid_points_scaled = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
    Z = svm_c.predict(grid_points_scaled)
    Z = Z.reshape(xx.shape)

    decision_values = svm_c.decision_function(grid_points_scaled)
    decision_values = decision_values.reshape(xx.shape)

    # Plot
    axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    axes[row, col].contour(xx, yy, decision_values, levels=[-1, 0, 1],
                          colors=['red', 'black', 'red'],
                          linestyles=['--', '-', '--'], linewidths=[1, 2, 1])

    # Plot data points
    for j, color in enumerate(['red', 'blue']):
        mask = y_train_linear == j
        axes[row, col].scatter(X_train_linear[mask, 0], X_train_linear[mask, 1],
                              c=color, alpha=0.7, s=30)

    # Support vectors
    support_vectors = scaler.inverse_transform(svm_c.support_vectors_)
    axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                          s=100, facecolors='none', edgecolors='black', linewidths=1)

    axes[row, col].set_title(f'C={C}, Accuracy={accuracy_c:.3f}, Support Vectors={len(svm_c.support_)}')
    axes[row, col].set_xlabel('Feature 1')
    axes[row, col].set_ylabel('Feature 2')
    axes[row, col].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Analyze impact of C value on performance
print("Impact of C value on model:")
print("C\tAccuracy\tSupport Vectors")
print("-" * 30)
for C in C_values:
    svm_c = SVC(kernel='linear', C=C, random_state=42)
    svm_c.fit(X_train_scaled, y_train_linear)
    y_pred_c = svm_c.predict(X_test_scaled)
    accuracy_c = accuracy_score(y_test_linear, y_pred_c)
    print(f"{C}\t{accuracy_c:.4f}\t{len(svm_c.support_)}")

8.4 Nonlinear SVM and Kernel Functions

8.4.1 Nonlinear Data

python
# Create nonlinear datasets
X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=42)
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=42)

# Visualize nonlinear data
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Concentric circles data
for i, color in enumerate(['red', 'blue']):
    mask = y_circles == i
    axes[0].scatter(X_circles[mask, 0], X_circles[mask, 1],
                   c=color, label=f'Class {i}', alpha=0.7)
axes[0].set_title('Concentric Circles Data')
axes[0].set_xlabel('Feature 1')
axes[0].set_ylabel('Feature 2')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Crescent moon data
for i, color in enumerate(['red', 'blue']):
    mask = y_moons == i
    axes[1].scatter(X_moons[mask, 0], X_moons[mask, 1],
                   c=color, label=f'Class {i}', alpha=0.7)
axes[1].set_title('Crescent Moon Data')
axes[1].set_xlabel('Feature 1')
axes[1].set_ylabel('Feature 2')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

8.4.2 Comparison of Different Kernel Functions

python
# Compare performance of different kernel functions on nonlinear data
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
datasets = [('Concentric Circles', X_circles, y_circles), ('Crescent Moon', X_moons, y_moons)]

for dataset_name, X_data, y_data in datasets:
    print(f"\nPerformance of different kernel functions on {dataset_name} dataset:")
    print("Kernel\t\tAccuracy\t\tSupport Vectors")
    print("-" * 40)

    # Data preprocessing
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, y_data, test_size=0.2, random_state=42, stratify=y_data
    )

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Test different kernel functions
    kernel_results = {}

    for kernel in kernels:
        if kernel == 'poly':
            svm = SVC(kernel=kernel, degree=3, C=1.0, random_state=42)
        else:
            svm = SVC(kernel=kernel, C=1.0, random_state=42)

        svm.fit(X_train_scaled, y_train)
        y_pred = svm.predict(X_test_scaled)
        accuracy = accuracy_score(y_test, y_pred)

        kernel_results[kernel] = {
            'accuracy': accuracy,
            'n_support': len(svm.support_),
            'model': svm
        }

        print(f"{kernel}\t\t{accuracy:.4f}\t\t{len(svm.support_)}")

    # Visualize decision boundaries of different kernel functions
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle(f'{dataset_name} Dataset - Decision Boundaries of Different Kernel Functions', fontsize=16)

    for i, kernel in enumerate(kernels):
        row = i // 2
        col = i % 2

        model = kernel_results[kernel]['model']
        accuracy = kernel_results[kernel]['accuracy']
        n_support = kernel_results[kernel]['n_support']

        # Create grid
        h = 0.02
        x_min, x_max = X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5
        y_min, y_max = X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                           np.arange(y_min, y_max, h))

        # Predict grid points
        grid_points = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
        Z = model.predict(grid_points)
        Z = Z.reshape(xx.shape)

        # Plot decision boundary
        axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')

        # Plot data points
        for j, color in enumerate(['red', 'blue']):
            mask = y_data == j
            axes[row, col].scatter(X_data[mask, 0], X_data[mask, 1],
                                 c=color, alpha=0.7, s=30)

        # Support vectors
        support_vectors = scaler.inverse_transform(model.support_vectors_)
        axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                              s=100, facecolors='none', edgecolors='black', linewidths=1)

        axes[row, col].set_title(f'{kernel} Kernel (Accuracy={accuracy:.3f}, Support Vectors={n_support})')
        axes[row, col].set_xlabel('Feature 1')
        axes[row, col].set_ylabel('Feature 2')
        axes[row, col].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()
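The kernel trick behind these boundaries can be made concrete for the classic degree-2 polynomial kernel K(x, z) = (x·z + 1)², which equals an ordinary dot product after mapping 2-D inputs into a 6-D feature space (scikit-learn's poly kernel is the generalized form (γ x·z + r)^d; phi below is an illustrative helper):

```python
import numpy as np

def phi(v):
    """Explicit degree-2 polynomial feature map for a 2-D vector."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)

kernel_value = (x @ z + 1) ** 2    # kernel, computed in the original 2-D space
explicit_value = phi(x) @ phi(z)   # dot product in the 6-D feature space

assert np.isclose(kernel_value, explicit_value)
```

The kernel evaluates this 6-D inner product while only ever touching 2-D vectors; for the RBF kernel the implicit space is infinite-dimensional, so an explicit map is not even possible.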

8.4.3 RBF Kernel Parameter Tuning

python
# Impact of gamma parameter for RBF kernel
def analyze_rbf_parameters():
    """Analyze impact of RBF kernel gamma parameter"""

    # Use concentric circles data
    X_train, X_test, y_train, y_test = train_test_split(
        X_circles, y_circles, test_size=0.2, random_state=42, stratify=y_circles
    )

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Different gamma values
    gamma_values = [0.01, 0.1, 1, 10]

    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Impact of Different Gamma Values for RBF Kernel', fontsize=16)

    print("Impact of gamma parameter for RBF kernel:")
    print("gamma\tAccuracy\tSupport Vectors")
    print("-" * 30)

    for i, gamma in enumerate(gamma_values):
        row = i // 2
        col = i % 2

        # Train SVM
        svm_rbf = SVC(kernel='rbf', gamma=gamma, C=1.0, random_state=42)
        svm_rbf.fit(X_train_scaled, y_train)

        # Predict
        y_pred = svm_rbf.predict(X_test_scaled)
        accuracy = accuracy_score(y_test, y_pred)

        print(f"{gamma}\t{accuracy:.4f}\t{len(svm_rbf.support_)}")

        # Visualization
        h = 0.02
        x_min, x_max = X_circles[:, 0].min() - 0.5, X_circles[:, 0].max() + 0.5
        y_min, y_max = X_circles[:, 1].min() - 0.5, X_circles[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                           np.arange(y_min, y_max, h))

        grid_points = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
        Z = svm_rbf.predict(grid_points)
        Z = Z.reshape(xx.shape)

        axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')

        # Plot data points
        for j, color in enumerate(['red', 'blue']):
            mask = y_circles == j
            axes[row, col].scatter(X_circles[mask, 0], X_circles[mask, 1],
                                 c=color, alpha=0.7, s=30)

        # Support vectors
        support_vectors = scaler.inverse_transform(svm_rbf.support_vectors_)
        axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                              s=100, facecolors='none', edgecolors='black', linewidths=1)

        axes[row, col].set_title(f'gamma={gamma} (Accuracy={accuracy:.3f}, Support Vectors={len(svm_rbf.support_)})')
        axes[row, col].set_xlabel('Feature 1')
        axes[row, col].set_ylabel('Feature 2')
        axes[row, col].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

analyze_rbf_parameters()

8.5 SVM Regression

8.5.1 Linear SVR

python
# Create regression data
np.random.seed(42)
X_reg = np.linspace(0, 10, 100).reshape(-1, 1)
y_reg = 2 * X_reg.ravel() + 1 + 0.5 * np.random.randn(100)

# Split data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

# Train linear SVR
linear_svr = SVR(kernel='linear', C=1.0, epsilon=0.1)
linear_svr.fit(X_train_reg, y_train_reg)

# Predict
y_pred_reg = linear_svr.predict(X_test_reg)

# Evaluate
r2 = r2_score(y_test_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))

print(f"Linear SVR performance:")
print(f"R² score: {r2:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"Number of support vectors: {len(linear_svr.support_)}")

# Visualize regression results
plt.figure(figsize=(12, 8))

# Plot data points
plt.scatter(X_train_reg, y_train_reg, alpha=0.6, label='Training Data', color='blue')
plt.scatter(X_test_reg, y_test_reg, alpha=0.6, label='Testing Data', color='green')

# Plot regression line
X_plot = np.linspace(0, 10, 100).reshape(-1, 1)
y_plot = linear_svr.predict(X_plot)
plt.plot(X_plot, y_plot, color='red', linewidth=2, label='SVR Prediction')

# Plot epsilon tube (use the same ε the model was fit with)
epsilon = linear_svr.epsilon
plt.fill_between(X_plot.ravel(), y_plot - epsilon, y_plot + epsilon,
                alpha=0.2, color='red', label=f'ε-Tube (ε={epsilon})')

# Highlight support vectors
support_vectors_x = X_train_reg[linear_svr.support_]
support_vectors_y = y_train_reg[linear_svr.support_]
plt.scatter(support_vectors_x, support_vectors_y, s=200,
           facecolors='none', edgecolors='black', linewidths=2, label='Support Vectors')

plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Linear SVR (R²={r2:.3f}, Support Vectors={len(linear_svr.support_)})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
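The ε-tube shown above comes from SVR's ε-insensitive loss: deviations smaller than ε cost nothing, so only points on or outside the tube become support vectors. A minimal sketch of the loss itself (epsilon_insensitive_loss is an illustrative helper, not a scikit-learn function):

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """max(0, |y - f(x)| - epsilon): zero inside the tube, linear outside."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([1.05, 1.10, 1.50])  # residuals 0.05, 0.10, 0.50

# The first two points lie inside the ε=0.1 tube and incur zero loss
print(epsilon_insensitive_loss(y_true, y_pred))
```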

8.5.2 Nonlinear SVR

python
# Create nonlinear regression data
np.random.seed(42)
X_nonlinear = np.linspace(0, 4*np.pi, 100).reshape(-1, 1)
y_nonlinear = np.sin(X_nonlinear.ravel()) + 0.1 * np.random.randn(100)

X_train_nl, X_test_nl, y_train_nl, y_test_nl = train_test_split(
    X_nonlinear, y_nonlinear, test_size=0.2, random_state=42
)

# Compare SVR with different kernel functions
svr_kernels = ['linear', 'poly', 'rbf']
svr_results = {}

fig, axes = plt.subplots(1, 3, figsize=(18, 6))

for i, kernel in enumerate(svr_kernels):
    if kernel == 'poly':
        svr = SVR(kernel=kernel, degree=3, C=1.0, epsilon=0.1)
    else:
        svr = SVR(kernel=kernel, C=1.0, epsilon=0.1)

    svr.fit(X_train_nl, y_train_nl)
    y_pred_nl = svr.predict(X_test_nl)

    r2_nl = r2_score(y_test_nl, y_pred_nl)
    rmse_nl = np.sqrt(mean_squared_error(y_test_nl, y_pred_nl))

    svr_results[kernel] = {'r2': r2_nl, 'rmse': rmse_nl, 'n_support': len(svr.support_)}

    # Visualization
    axes[i].scatter(X_train_nl, y_train_nl, alpha=0.6, label='Training Data', s=20)
    axes[i].scatter(X_test_nl, y_test_nl, alpha=0.6, label='Testing Data', color='green', s=20)

    # Prediction curve
    X_plot_nl = np.linspace(0, 4*np.pi, 200).reshape(-1, 1)
    y_plot_nl = svr.predict(X_plot_nl)
    axes[i].plot(X_plot_nl, y_plot_nl, color='red', linewidth=2, label='SVR Prediction')

    # Support vectors
    support_vectors_x = X_train_nl[svr.support_]
    support_vectors_y = y_train_nl[svr.support_]
    axes[i].scatter(support_vectors_x, support_vectors_y, s=100,
                   facecolors='none', edgecolors='black', linewidths=1, label='Support Vectors')

    axes[i].set_xlabel('X')
    axes[i].set_ylabel('y')
    axes[i].set_title(f'{kernel} Kernel SVR\nR²={r2_nl:.3f}, Support Vectors={len(svr.support_)}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Performance comparison
print("SVR performance comparison with different kernel functions:")
print("Kernel\t\tR²\t\tRMSE\t\tSupport Vectors")
print("-" * 50)
for kernel, results in svr_results.items():
    print(f"{kernel}\t\t{results['r2']:.4f}\t\t{results['rmse']:.4f}\t\t{results['n_support']}")

8.6 Hyperparameter Tuning

8.6.1 Grid Search

python
# Use breast cancer dataset for hyperparameter tuning
cancer = load_breast_cancer()
X_cancer, y_cancer = cancer.data, cancer.target

X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(
    X_cancer, y_cancer, test_size=0.2, random_state=42, stratify=y_cancer
)

# Create pipeline (includes standardization and SVM)
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(random_state=42))
])

# Define parameter grid
param_grid = [
    {
        'svm__kernel': ['linear'],
        'svm__C': [0.1, 1, 10, 100]
    },
    {
        'svm__kernel': ['rbf'],
        'svm__C': [0.1, 1, 10, 100],
        'svm__gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]
    },
    {
        'svm__kernel': ['poly'],
        'svm__C': [0.1, 1, 10],
        'svm__degree': [2, 3, 4],
        'svm__gamma': ['scale', 'auto']
    }
]

# Grid search
print("Performing SVM hyperparameter grid search...")
grid_search = GridSearchCV(
    svm_pipeline,
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

grid_search.fit(X_train_cancer, y_train_cancer)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

# Test best model
best_svm = grid_search.best_estimator_
y_pred_best = best_svm.predict(X_test_cancer)
test_accuracy = accuracy_score(y_test_cancer, y_pred_best)
print(f"Test set accuracy: {test_accuracy:.4f}")

# Detailed evaluation
print("\nBest SVM model detailed evaluation:")
print(classification_report(y_test_cancer, y_pred_best,
                          target_names=['Malignant', 'Benign']))

8.6.2 Validation Curve Analysis

python
# Plot C parameter validation curve
def plot_validation_curve_svm():
    """Plot SVM validation curves"""

    # Use standardized data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_cancer)

    # C parameter validation curve
    C_range = np.logspace(-3, 3, 10)
    train_scores, val_scores = validation_curve(
        SVC(kernel='rbf', gamma='scale', random_state=42),
        X_train_scaled, y_train_cancer,
        param_name='C', param_range=C_range,
        cv=5, scoring='accuracy', n_jobs=-1
    )

    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)

    plt.figure(figsize=(12, 5))

    # C parameter validation curve
    plt.subplot(1, 2, 1)
    plt.semilogx(C_range, train_mean, 'o-', color='blue', label='Training Score')
    plt.fill_between(C_range, train_mean - train_std, train_mean + train_std,
                     alpha=0.1, color='blue')

    plt.semilogx(C_range, val_mean, 'o-', color='red', label='Validation Score')
    plt.fill_between(C_range, val_mean - val_std, val_mean + val_std,
                     alpha=0.1, color='red')

    plt.xlabel('C Parameter')
    plt.ylabel('Accuracy')
    plt.title('SVM C Parameter Validation Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # gamma parameter validation curve
    gamma_range = np.logspace(-4, 1, 10)
    train_scores_gamma, val_scores_gamma = validation_curve(
        SVC(kernel='rbf', C=1.0, random_state=42),
        X_train_scaled, y_train_cancer,
        param_name='gamma', param_range=gamma_range,
        cv=5, scoring='accuracy', n_jobs=-1
    )

    train_mean_gamma = np.mean(train_scores_gamma, axis=1)
    train_std_gamma = np.std(train_scores_gamma, axis=1)
    val_mean_gamma = np.mean(val_scores_gamma, axis=1)
    val_std_gamma = np.std(val_scores_gamma, axis=1)

    plt.subplot(1, 2, 2)
    plt.semilogx(gamma_range, train_mean_gamma, 'o-', color='blue', label='Training Score')
    plt.fill_between(gamma_range, train_mean_gamma - train_std_gamma,
                     train_mean_gamma + train_std_gamma, alpha=0.1, color='blue')

    plt.semilogx(gamma_range, val_mean_gamma, 'o-', color='red', label='Validation Score')
    plt.fill_between(gamma_range, val_mean_gamma - val_std_gamma,
                     val_mean_gamma + val_std_gamma, alpha=0.1, color='red')

    plt.xlabel('gamma Parameter')
    plt.ylabel('Accuracy')
    plt.title('SVM gamma Parameter Validation Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

plot_validation_curve_svm()

8.7 Real-world Application Cases

8.7.1 Text Classification Example

python
# Create simple text classification data
from sklearn.feature_extraction.text import TfidfVectorizer

# Simulate text data
texts = [
    "Machine learning is an important branch of artificial intelligence",
    "Deep learning performs well in image recognition",
    "Support Vector Machine is a classic classification algorithm",
    "The weather is nice today, suitable for a walk",
    "The plot of this movie is very exciting",
    "The restaurant's dishes taste good and the service is also very good",
    "Neural networks can handle complex nonlinear problems",
    "Data preprocessing is an important step in machine learning",
    "Natural language processing technology is developing rapidly",
    "On a sunny afternoon, I feel particularly refreshed",
    "The content of this book is interesting and worth recommending",
    "There are many people in the shopping mall and a wide variety of goods"
]

# Labels: 0-Tech, 1-Life
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

# Text vectorization
vectorizer = TfidfVectorizer(max_features=100, stop_words=None)
X_text = vectorizer.fit_transform(texts).toarray()

print(f"Text feature dimension: {X_text.shape}")
print(f"Class distribution: {np.bincount(labels)}")

# Split data
X_train_text, X_test_text, y_train_text, y_test_text = train_test_split(
    X_text, labels, test_size=0.3, random_state=42, stratify=labels
)

# Train SVM text classifier
text_svm = SVC(kernel='linear', C=1.0, random_state=42)
text_svm.fit(X_train_text, y_train_text)

# Predict
y_pred_text = text_svm.predict(X_test_text)
accuracy_text = accuracy_score(y_test_text, y_pred_text)

print(f"\nText classification SVM accuracy: {accuracy_text:.4f}")
print(f"Number of support vectors: {len(text_svm.support_)}")

# Get most important feature words
feature_names = vectorizer.get_feature_names_out()
coef = text_svm.coef_[0]

# Find most important positive and negative features
top_positive = np.argsort(coef)[-10:]
top_negative = np.argsort(coef)[:10]

print("\nMost important life-class feature words (largest positive coefficients; positive scores push toward class 1):")
for idx in reversed(top_positive):
    print(f"{feature_names[idx]}: {coef[idx]:.4f}")

print("\nMost important tech-class feature words (most negative coefficients; negative scores push toward class 0):")
for idx in top_negative:
    print(f"{feature_names[idx]}: {coef[idx]:.4f}")

8.7.2 High-dimensional Data Processing

python
# Create high-dimensional dataset
X_high_dim, y_high_dim = make_classification(
    n_samples=1000,
    n_features=1000,
    n_informative=100,
    n_redundant=50,
    n_clusters_per_class=1,
    random_state=42
)

print(f"High-dimensional data shape: {X_high_dim.shape}")

# Split data
X_train_hd, X_test_hd, y_train_hd, y_test_hd = train_test_split(
    X_high_dim, y_high_dim, test_size=0.2, random_state=42, stratify=y_high_dim
)

# Compare performance of linear SVM and RBF SVM on high-dimensional data
import time

models = {
    'Linear SVM': SVC(kernel='linear', C=1.0, random_state=42),
    'RBF SVM': SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42),
    'LinearSVC': LinearSVC(C=1.0, random_state=42, max_iter=1000)  # Specialized for linear SVM
}

print("\nPerformance comparison of different SVMs on high-dimensional data:")
print("Model\t\tTraining Time\tPrediction Time\tAccuracy\t\tSupport Vectors")
print("-" * 70)

# Standardize data
scaler = StandardScaler()
X_train_hd_scaled = scaler.fit_transform(X_train_hd)
X_test_hd_scaled = scaler.transform(X_test_hd)

model_names = list(models.keys())
train_times = []
accuracies = []

for name, model in models.items():
    # Training time
    start_time = time.time()
    model.fit(X_train_hd_scaled, y_train_hd)
    train_time = time.time() - start_time

    # Prediction time
    start_time = time.time()
    y_pred_hd = model.predict(X_test_hd_scaled)
    pred_time = time.time() - start_time

    # Accuracy
    accuracy_hd = accuracy_score(y_test_hd, y_pred_hd)

    # Number of support vectors (LinearSVC doesn't expose support_)
    n_support = len(model.support_) if hasattr(model, 'support_') else "N/A"

    # Keep results for the plot below, so each model is trained only once
    train_times.append(train_time)
    accuracies.append(accuracy_hd)

    print(f"{name}\t\t{train_time:.3f}s\t\t{pred_time:.3f}s\t\t{accuracy_hd:.4f}\t\t{n_support}")

# Visualize relationship between training time and accuracy

plt.figure(figsize=(10, 6))
colors = ['blue', 'red', 'green']
for i, (name, train_time, accuracy) in enumerate(zip(model_names, train_times, accuracies)):
    plt.scatter(train_time, accuracy, s=200, c=colors[i], alpha=0.7, label=name)
    plt.annotate(name, (train_time, accuracy), xytext=(5, 5),
                textcoords='offset points', fontsize=10)

plt.xlabel('Training Time (seconds)')
plt.ylabel('Accuracy')
plt.title('High-dimensional Data: Training Time vs Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

8.8 SVM Pros and Cons Summary

8.8.1 Performance Comparison

python
def comprehensive_svm_comparison():
    """Comprehensively compare SVM with other algorithms"""

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    # Use wine dataset
    wine = load_wine()
    X_wine, y_wine = wine.data, wine.target

    X_train_wine, X_test_wine, y_train_wine, y_test_wine = train_test_split(
        X_wine, y_wine, test_size=0.2, random_state=42, stratify=y_wine
    )

    # Standardize
    scaler = StandardScaler()
    X_train_wine_scaled = scaler.fit_transform(X_train_wine)
    X_test_wine_scaled = scaler.transform(X_test_wine)

    # Define algorithms
    algorithms = {
        'SVM (RBF)': SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42),
        'SVM (Linear)': SVC(kernel='linear', C=1.0, random_state=42),
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
        'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5)
    }

    results = {}

    print("Comprehensive algorithm performance comparison:")
    print("Algorithm\t\tTraining Time\tAccuracy\t\tCross-Validation Score")
    print("-" * 60)

    for name, algorithm in algorithms.items():
        # Scale-sensitive models (SVM, logistic regression, k-NN) use the
        # standardized features; Random Forest is scale-invariant and uses raw data
        if name != 'Random Forest':
            X_tr, X_te = X_train_wine_scaled, X_test_wine_scaled
        else:
            X_tr, X_te = X_train_wine, X_test_wine

        # Time only the fit itself; prediction and CV are not part of training
        start_time = time.time()
        algorithm.fit(X_tr, y_train_wine)
        train_time = time.time() - start_time

        y_pred = algorithm.predict(X_te)
        cv_scores = cross_val_score(algorithm, X_tr, y_train_wine, cv=5)

        # Performance metrics
        accuracy = accuracy_score(y_test_wine, y_pred)
        cv_mean = np.mean(cv_scores)

        results[name] = {
            'train_time': train_time,
            'accuracy': accuracy,
            'cv_score': cv_mean
        }

        print(f"{name:<22}{train_time:>11.4f}s{accuracy:>12.4f}{cv_mean:>12.4f}")

    # Visualize comparison
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))

    names = list(results.keys())
    train_times = [results[name]['train_time'] for name in names]
    accuracies = [results[name]['accuracy'] for name in names]
    cv_scores = [results[name]['cv_score'] for name in names]

    # Training time
    axes[0].bar(names, train_times, color='skyblue', alpha=0.7)
    axes[0].set_title('Training Time Comparison')
    axes[0].set_ylabel('Time (seconds)')
    axes[0].tick_params(axis='x', rotation=45)

    # Accuracy
    axes[1].bar(names, accuracies, color='lightgreen', alpha=0.7)
    axes[1].set_title('Test Accuracy Comparison')
    axes[1].set_ylabel('Accuracy')
    axes[1].tick_params(axis='x', rotation=45)
    axes[1].set_ylim(0.8, 1.0)

    # Cross-validation scores
    axes[2].bar(names, cv_scores, color='lightcoral', alpha=0.7)
    axes[2].set_title('Cross-Validation Score Comparison')
    axes[2].set_ylabel('CV Score')
    axes[2].tick_params(axis='x', rotation=45)
    axes[2].set_ylim(0.8, 1.0)

    plt.tight_layout()
    plt.show()

    return results

comparison_results = comprehensive_svm_comparison()
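One disadvantage noted at the start of this chapter is that SVM does not natively produce probabilities. scikit-learn can approximate them via Platt scaling (an internal cross-validated sigmoid fit) when `probability=True` is set; a minimal sketch on the same wine split used above:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42, stratify=wine.target
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# probability=True adds training cost (extra internal CV) but enables predict_proba
svm_prob = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True, random_state=42)
svm_prob.fit(X_train_scaled, y_train)

proba = svm_prob.predict_proba(X_test_scaled)
print("Class probabilities for the first test sample:", proba[0].round(3))
```

Note that these probabilities are calibrated approximations, not a native output of the SVM optimization; when calibration quality matters, compare against a naturally probabilistic model.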

8.8.2 Usage Recommendations

python
def svm_usage_guide():
    """SVM usage guide"""

    print("SVM usage guide and best practices:")
    print("=" * 50)

    guidelines = {
        "Data Preprocessing": [
            "Always standardize or normalize features",
            "Handle missing values and outliers",
            "Consider feature selection to reduce dimensionality"
        ],
        "Kernel Function Selection": [
            "Linear kernel: linearly separable or high-dimensional sparse data",
            "RBF kernel: sensible default, suits most nonlinear problems",
            "Polynomial kernel: specific nonlinear relationships",
            "Custom kernel: specialized domain problems"
        ],
        "Parameter Tuning": [
            "C parameter: controls regularization strength; select via cross-validation",
            "gamma parameter: key RBF-kernel parameter; governs decision boundary complexity",
            "Use grid search or random search for tuning"
        ],
        "Suitable Scenarios": [
            "High-dimensional data (e.g., text classification, gene expression data)",
            "Small to medium-sized datasets",
            "Scenarios requiring stable performance",
            "Binary classification problems (native support)"
        ],
        "Unsuitable Scenarios": [
            "Large-scale datasets (>100k samples)",
            "Scenarios requiring calibrated probability output",
            "Latency-critical real-time prediction",
            "Very noisy data"
        ]
    }

    for category, items in guidelines.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")

    print("\n" + "=" * 50)
    print("Rules of thumb for parameter selection:")
    print("• C: start with 0.1, 1, 10, 100")
    print("• gamma: start with 'scale', then try 0.001, 0.01, 0.1, 1")
    print("• Use cross-validation to evaluate parameter combinations")
    print("• Watch for overfitting: training accuracy far above validation accuracy")

svm_usage_guide()
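The rules of thumb above translate directly into a grid search. A minimal sketch on the wine data, using a `Pipeline` so that standardization happens inside each cross-validation fold (avoiding data leakage):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42, stratify=wine.target
)

# Scaling lives inside the pipeline, so each CV fold is scaled independently
pipe = Pipeline([('scaler', StandardScaler()), ('svm', SVC(kernel='rbf'))])

# The C and gamma grids follow the rules of thumb printed above
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': ['scale', 0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.4f}")
print(f"Test accuracy: {search.score(X_test, y_test):.4f}")
```

Comparing `best_score_` (cross-validation) against the held-out test accuracy is a quick check for the overfitting symptom mentioned above.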

8.9 Exercises

Exercise 1: Basic SVM

  1. Train a linear SVM classifier on the Iris dataset
  2. Visualize the decision boundary and support vectors
  3. Analyze the impact of different C values on the model
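A starter sketch for this exercise (using two petal features so the boundary can be drawn in 2D; the choice of features and C values is illustrative, not prescribed):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two features (petal length/width) keep the problem visualizable
iris = load_iris()
X = StandardScaler().fit_transform(iris.data[:, [2, 3]])
y = iris.target

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)
print("Support vectors per class:", clf.n_support_)

# Decision regions on a mesh grid, with support vectors circled
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='red', label='support vectors')
plt.legend()
plt.show()

# Part 3: smaller C widens the margin and typically keeps more support vectors
for C in [0.01, 1, 100]:
    model = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C}: {model.n_support_.sum()} support vectors")
```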

Exercise 2: Kernel Function Comparison

  1. Create a complex nonlinear dataset
  2. Compare the performance of different kernel functions (linear, polynomial, RBF, sigmoid)
  3. Analyze the applicable scenarios for each kernel function

Exercise 3: SVM Regression

  1. Train an SVR model on the California housing dataset (the Boston dataset has been removed from scikit-learn)
  2. Compare the regression performance of linear and RBF kernels
  3. Analyze the impact of the epsilon parameter on the model
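A starter sketch for the epsilon analysis, using the bundled diabetes dataset as a small stand-in (the housing data requires a download; the C and epsilon values are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)  # target spans roughly 25-346
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# epsilon sets the width of the tube within which errors are ignored;
# a wider tube generally leaves fewer support vectors
support_counts = {}
for eps in [0.1, 1, 10, 50]:
    svr = SVR(kernel='rbf', C=100, epsilon=eps).fit(X_train, y_train)
    support_counts[eps] = len(svr.support_)
    print(f"epsilon={eps:>5}: R^2={svr.score(X_test, y_test):.3f}, "
          f"support vectors={support_counts[eps]}")
```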

Exercise 4: High-dimensional Data Processing

  1. Create a high-dimensional dataset (features > 1000)
  2. Compare the performance of SVM with other algorithms on high-dimensional data
  3. Analyze the impact of feature selection on SVM performance

8.10 Summary

In this chapter, we have explored Support Vector Machines in depth:

Core Concepts

  • SVM Principles: Maximum margin classifier, support vectors, hyperplanes
  • Kernel Trick: Map data to high-dimensional space to handle nonlinear problems
  • Parameter Tuning: Roles and selection of C parameter, gamma parameter

Main Techniques

  • Linear SVM: Handles linearly separable and approximately separable problems
  • Nonlinear SVM: Handles complex nonlinear problems through kernel functions
  • SVM Regression: Principles and applications of support vector regression
  • Hyperparameter Optimization: Grid search and cross-validation

Practical Skills

  • Data Preprocessing: Importance of feature standardization
  • Kernel Function Selection: Choose appropriate kernel function based on data characteristics
  • Performance Evaluation: Comprehensively evaluate SVM model performance
  • Real Applications: Text classification, high-dimensional data processing

Key Points

  • SVM excels on high-dimensional data and small-to-medium datasets
  • Feature standardization is essential; performance is sensitive to parameter settings
  • Kernel function choice has a major impact on performance
  • Well suited to classification and regression tasks requiring stable performance

8.11 Next Steps

Now you have mastered the powerful Support Vector Machine algorithm! In the next chapter, Naive Bayes, we will study probability-based classification methods and see how Bayes' theorem is applied in machine learning.


Chapter Points Review:

  • ✅ Understood the mathematical principles and geometric intuition of SVM
  • ✅ Mastered the implementation of linear and nonlinear SVM
  • ✅ Learned the selection and application of different kernel functions
  • ✅ Understood the principles and practice of SVM regression
  • ✅ Mastered SVM hyperparameter tuning methods
  • ✅ Able to apply SVM sensibly to real-world problems

Content is for learning and research only.