Chapter 8: Support Vector Machines
The Support Vector Machine (SVM) is one of the most powerful and elegant algorithms in machine learning. It solves classification and regression problems by finding an optimal separating hyperplane, and it excels at high-dimensional data and nonlinear problems.
8.1 What is a Support Vector Machine?
The core idea of Support Vector Machine is to find an optimal decision boundary (hyperplane) that maximizes the margin between different classes. This decision boundary is determined by a few key data points (support vectors).
8.1.1 Core Concepts
- Hyperplane: An (n-1)-dimensional subspace in n-dimensional space that separates data
- Support Vectors: Data points closest to the decision boundary
- Margin: The distance between the decision boundary and the nearest data points (the support vectors); SVM maximizes it
- Kernel Function: A function that implicitly maps data into a higher-dimensional space
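These concepts can be made concrete with a small sketch. The snippet below (illustrative toy data, not from this chapter's later examples) fits a nearly hard-margin linear SVC and recovers the hyperplane normal vector w, from which the margin width 2/‖w‖ follows:

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, clearly separated clusters (hypothetical toy data)
X = np.array([[2.0, 2.0], [2.5, 2.2], [3.0, 2.8],
              [-2.0, -2.0], [-2.5, -2.2], [-3.0, -2.8]])
y = np.array([1, 1, 1, 0, 0, 0])

clf = SVC(kernel='linear', C=1e6)  # a very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]           # normal vector of the hyperplane w·x + b = 0
b = clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin hyperplanes

print(f"Support vectors:\n{clf.support_vectors_}")
print(f"Margin width: {margin_width:.3f}")
```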
8.1.2 Advantages of SVM
- Effective for High-dimensional Data: Remains effective when number of features is large
- Memory Efficient: Only uses support vectors for prediction
- Flexible: Handles nonlinear problems through different kernel functions
- Strong Generalization: Based on structural risk minimization principle
8.1.3 Disadvantages of SVM
- Slow Training on Large Datasets: High time complexity
- Sensitive to Noise: Outliers may affect decision boundary
- Requires Feature Scaling: Sensitive to feature scale
- No Probability Output: Does not directly provide prediction probabilities
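The last point deserves a caveat: scikit-learn's `SVC` can still estimate class probabilities via internal cross-validated Platt scaling when constructed with `probability=True` (this slows training and the estimates are approximate). A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# probability=True fits an extra calibration model on top of the SVM;
# predict_proba then returns approximate class probabilities
clf = SVC(kernel='rbf', probability=True, random_state=42)
clf.fit(X, y)

proba = clf.predict_proba(X[:3])
print(proba)  # one row per sample; each row sums to 1
```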
8.2 Environment and Data Preparation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_circles, make_moons, load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, validation_curve
from sklearn.svm import SVC, SVR, LinearSVC, LinearSVR
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
accuracy_score, classification_report, confusion_matrix,
mean_squared_error, r2_score, roc_curve, auc
)
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')
# Set random seed
np.random.seed(42)
# Set figure style
plt.style.use('seaborn-v0_8')
plt.rcParams['font.sans-serif'] = ['SimHei']  # CJK-capable font; optional for English-only labels
plt.rcParams['axes.unicode_minus'] = False

8.3 Linear SVM
8.3.1 Linearly Separable Data
# Create linearly separable data
def create_linearly_separable_data():
    """Create linearly separable binary classification data"""
    np.random.seed(42)
    # Class 1
    class1_x = np.random.normal(2, 0.5, 50)
    class1_y = np.random.normal(2, 0.5, 50)
    # Class 2
    class2_x = np.random.normal(-2, 0.5, 50)
    class2_y = np.random.normal(-2, 0.5, 50)
    X = np.vstack([np.column_stack([class1_x, class1_y]),
                   np.column_stack([class2_x, class2_y])])
    y = np.hstack([np.ones(50), np.zeros(50)])
    return X, y
X_linear, y_linear = create_linearly_separable_data()
# Visualize data
plt.figure(figsize=(10, 8))
colors = ['red', 'blue']
for i, color in enumerate(colors):
    mask = y_linear == i
    plt.scatter(X_linear[mask, 0], X_linear[mask, 1],
                c=color, label=f'Class {i}', alpha=0.7, s=50)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Linearly Separable Data')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Data shape: {X_linear.shape}")
print(f"Class distribution: {np.bincount(y_linear.astype(int))}")

8.3.2 Train Linear SVM
# Split data
X_train_linear, X_test_linear, y_train_linear, y_test_linear = train_test_split(
X_linear, y_linear, test_size=0.2, random_state=42, stratify=y_linear
)
# Feature standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_linear)
X_test_scaled = scaler.transform(X_test_linear)
# Create linear SVM classifier
linear_svm = SVC(kernel='linear', C=1.0, random_state=42)
linear_svm.fit(X_train_scaled, y_train_linear)
# Predict
y_pred_linear = linear_svm.predict(X_test_scaled)
# Evaluate
accuracy_linear = accuracy_score(y_test_linear, y_pred_linear)
print(f"Linear SVM accuracy: {accuracy_linear:.4f}")
print("\nDetailed classification report:")
print(classification_report(y_test_linear, y_pred_linear))
# Get support vector information
print(f"\nNumber of support vectors: {linear_svm.n_support_}")
print(f"Total support vectors: {len(linear_svm.support_)}")
print(f"Support vector ratio: {len(linear_svm.support_) / len(X_train_scaled) * 100:.2f}%")

8.3.3 Visualize Decision Boundary and Support Vectors
def plot_svm_decision_boundary(X, y, model, scaler=None, title="SVM Decision Boundary"):
    """Plot SVM decision boundary and support vectors in the original feature space"""
    plt.figure(figsize=(12, 8))
    # Create grid in the original (unscaled) feature space
    h = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    if scaler is not None:
        # Model was trained on scaled features: scale the grid before predicting,
        # and map support vectors back to the original space for plotting
        grid_for_model = scaler.transform(grid_points)
        support_vectors = scaler.inverse_transform(model.support_vectors_)
    else:
        grid_for_model = grid_points
        support_vectors = model.support_vectors_
    Z = model.predict(grid_for_model).reshape(xx.shape)
    # Decision function values (for plotting the margin lines)
    decision_values = model.decision_function(grid_for_model).reshape(xx.shape)
    # Plot decision regions
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    # Decision boundary (solid) and margins (dashed)
    plt.contour(xx, yy, decision_values, levels=[-1, 0, 1],
                colors=['red', 'black', 'red'], linestyles=['--', '-', '--'],
                linewidths=[2, 3, 2])
    # Plot data points
    for i, color in enumerate(['red', 'blue']):
        mask = y == i
        plt.scatter(X[mask, 0], X[mask, 1],
                    c=color, label=f'Class {i}', alpha=0.7, s=50)
    # Highlight support vectors
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1],
                s=200, facecolors='none', edgecolors='black',
                linewidths=2, label='Support Vectors')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
# Plot linear SVM decision boundary
plot_svm_decision_boundary(X_train_linear, y_train_linear, linear_svm, scaler,
                           "Linear SVM Decision Boundary and Support Vectors")

8.3.4 Impact of C Parameter
# Compare impact of different C values
C_values = [0.1, 1, 10, 100]
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Impact of Different C Values on Linear SVM', fontsize=16)
for i, C in enumerate(C_values):
    row, col = i // 2, i % 2
    # Train SVM
    svm_c = SVC(kernel='linear', C=C, random_state=42)
    svm_c.fit(X_train_scaled, y_train_linear)
    # Predict
    y_pred_c = svm_c.predict(X_test_scaled)
    accuracy_c = accuracy_score(y_test_linear, y_pred_c)
    # Create grid for visualization
    h = 0.02
    x_min, x_max = X_train_linear[:, 0].min() - 1, X_train_linear[:, 0].max() + 1
    y_min, y_max = X_train_linear[:, 1].min() - 1, X_train_linear[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    grid_points_scaled = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
    Z = svm_c.predict(grid_points_scaled).reshape(xx.shape)
    decision_values = svm_c.decision_function(grid_points_scaled).reshape(xx.shape)
    # Plot
    axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    axes[row, col].contour(xx, yy, decision_values, levels=[-1, 0, 1],
                           colors=['red', 'black', 'red'],
                           linestyles=['--', '-', '--'], linewidths=[1, 2, 1])
    # Plot data points
    for j, color in enumerate(['red', 'blue']):
        mask = y_train_linear == j
        axes[row, col].scatter(X_train_linear[mask, 0], X_train_linear[mask, 1],
                               c=color, alpha=0.7, s=30)
    # Support vectors (mapped back to the original feature space)
    support_vectors = scaler.inverse_transform(svm_c.support_vectors_)
    axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                           s=100, facecolors='none', edgecolors='black', linewidths=1)
    axes[row, col].set_title(f'C={C}, Accuracy={accuracy_c:.3f}, Support Vectors={len(svm_c.support_)}')
    axes[row, col].set_xlabel('Feature 1')
    axes[row, col].set_ylabel('Feature 2')
    axes[row, col].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Analyze impact of C value on performance
print("Impact of C value on model:")
print("C\tAccuracy\tSupport Vectors")
print("-" * 30)
for C in C_values:
    svm_c = SVC(kernel='linear', C=C, random_state=42)
    svm_c.fit(X_train_scaled, y_train_linear)
    y_pred_c = svm_c.predict(X_test_scaled)
    accuracy_c = accuracy_score(y_test_linear, y_pred_c)
    print(f"{C}\t{accuracy_c:.4f}\t{len(svm_c.support_)}")

8.4 Nonlinear SVM and Kernel Functions
8.4.1 Nonlinear Data
# Create nonlinear datasets
X_circles, y_circles = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=42)
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=42)
# Visualize nonlinear data
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
# Concentric circles data
for i, color in enumerate(['red', 'blue']):
    mask = y_circles == i
    axes[0].scatter(X_circles[mask, 0], X_circles[mask, 1],
                    c=color, label=f'Class {i}', alpha=0.7)
axes[0].set_title('Concentric Circles Data')
axes[0].set_xlabel('Feature 1')
axes[0].set_ylabel('Feature 2')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Crescent moon data
for i, color in enumerate(['red', 'blue']):
    mask = y_moons == i
    axes[1].scatter(X_moons[mask, 0], X_moons[mask, 1],
                    c=color, label=f'Class {i}', alpha=0.7)
axes[1].set_title('Crescent Moon Data')
axes[1].set_xlabel('Feature 1')
axes[1].set_ylabel('Feature 2')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

8.4.2 Comparison of Different Kernel Functions
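Before comparing the kernels empirically, it helps to see what a kernel actually computes: a similarity between two points in an implicit feature space. A minimal sketch (using scikit-learn's pairwise kernel helpers) verifies that the RBF kernel value matches its closed form exp(-γ‖x − z‖²):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.5]])
gamma = 0.5

# Library value vs. the closed form exp(-gamma * ||x - z||^2)
k_lib = rbf_kernel(x, z, gamma=gamma)[0, 0]
k_manual = np.exp(-gamma * np.sum((x - z) ** 2))
print(k_lib, k_manual)  # the two values agree
```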
# Compare performance of different kernel functions on nonlinear data
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
datasets = [('Concentric Circles', X_circles, y_circles), ('Crescent Moon', X_moons, y_moons)]
for dataset_name, X_data, y_data in datasets:
    print(f"\nPerformance of different kernel functions on the {dataset_name} dataset:")
    print("Kernel\t\tAccuracy\t\tSupport Vectors")
    print("-" * 40)
    # Data preprocessing
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, y_data, test_size=0.2, random_state=42, stratify=y_data
    )
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    # Test different kernel functions
    kernel_results = {}
    for kernel in kernels:
        if kernel == 'poly':
            svm = SVC(kernel=kernel, degree=3, C=1.0, random_state=42)
        else:
            svm = SVC(kernel=kernel, C=1.0, random_state=42)
        svm.fit(X_train_scaled, y_train)
        y_pred = svm.predict(X_test_scaled)
        accuracy = accuracy_score(y_test, y_pred)
        kernel_results[kernel] = {
            'accuracy': accuracy,
            'n_support': len(svm.support_),
            'model': svm
        }
        print(f"{kernel}\t\t{accuracy:.4f}\t\t{len(svm.support_)}")
    # Visualize decision boundaries of the different kernels
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle(f'{dataset_name} Dataset - Decision Boundaries of Different Kernel Functions', fontsize=16)
    for i, kernel in enumerate(kernels):
        row, col = i // 2, i % 2
        model = kernel_results[kernel]['model']
        accuracy = kernel_results[kernel]['accuracy']
        n_support = kernel_results[kernel]['n_support']
        # Create grid
        h = 0.02
        x_min, x_max = X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5
        y_min, y_max = X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
        # Predict grid points
        grid_points = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
        Z = model.predict(grid_points).reshape(xx.shape)
        # Plot decision boundary
        axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
        # Plot data points
        for j, color in enumerate(['red', 'blue']):
            mask = y_data == j
            axes[row, col].scatter(X_data[mask, 0], X_data[mask, 1],
                                   c=color, alpha=0.7, s=30)
        # Support vectors (mapped back to the original feature space)
        support_vectors = scaler.inverse_transform(model.support_vectors_)
        axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                               s=100, facecolors='none', edgecolors='black', linewidths=1)
        axes[row, col].set_title(f'{kernel} Kernel (Accuracy={accuracy:.3f}, Support Vectors={n_support})')
        axes[row, col].set_xlabel('Feature 1')
        axes[row, col].set_ylabel('Feature 2')
        axes[row, col].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

8.4.3 RBF Kernel Parameter Tuning
# Impact of the gamma parameter for the RBF kernel
def analyze_rbf_parameters():
    """Analyze the impact of the RBF kernel's gamma parameter"""
    # Use concentric circles data
    X_train, X_test, y_train, y_test = train_test_split(
        X_circles, y_circles, test_size=0.2, random_state=42, stratify=y_circles
    )
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    # Different gamma values
    gamma_values = [0.01, 0.1, 1, 10]
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Impact of Different Gamma Values for RBF Kernel', fontsize=16)
    print("Impact of the gamma parameter for the RBF kernel:")
    print("gamma\tAccuracy\tSupport Vectors")
    print("-" * 30)
    for i, gamma in enumerate(gamma_values):
        row, col = i // 2, i % 2
        # Train SVM
        svm_rbf = SVC(kernel='rbf', gamma=gamma, C=1.0, random_state=42)
        svm_rbf.fit(X_train_scaled, y_train)
        # Predict
        y_pred = svm_rbf.predict(X_test_scaled)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"{gamma}\t{accuracy:.4f}\t{len(svm_rbf.support_)}")
        # Visualization
        h = 0.02
        x_min, x_max = X_circles[:, 0].min() - 0.5, X_circles[:, 0].max() + 0.5
        y_min, y_max = X_circles[:, 1].min() - 0.5, X_circles[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
        grid_points = scaler.transform(np.c_[xx.ravel(), yy.ravel()])
        Z = svm_rbf.predict(grid_points).reshape(xx.shape)
        axes[row, col].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
        # Plot data points
        for j, color in enumerate(['red', 'blue']):
            mask = y_circles == j
            axes[row, col].scatter(X_circles[mask, 0], X_circles[mask, 1],
                                   c=color, alpha=0.7, s=30)
        # Support vectors (mapped back to the original feature space)
        support_vectors = scaler.inverse_transform(svm_rbf.support_vectors_)
        axes[row, col].scatter(support_vectors[:, 0], support_vectors[:, 1],
                               s=100, facecolors='none', edgecolors='black', linewidths=1)
        axes[row, col].set_title(f'gamma={gamma} (Accuracy={accuracy:.3f}, Support Vectors={len(svm_rbf.support_)})')
        axes[row, col].set_xlabel('Feature 1')
        axes[row, col].set_ylabel('Feature 2')
        axes[row, col].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

analyze_rbf_parameters()

8.5 SVM Regression
8.5.1 Linear Regression
# Create regression data
np.random.seed(42)
X_reg = np.linspace(0, 10, 100).reshape(-1, 1)
y_reg = 2 * X_reg.ravel() + 1 + 0.5 * np.random.randn(100)
# Split data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
X_reg, y_reg, test_size=0.2, random_state=42
)
# Train linear SVR
linear_svr = SVR(kernel='linear', C=1.0, epsilon=0.1)
linear_svr.fit(X_train_reg, y_train_reg)
# Predict
y_pred_reg = linear_svr.predict(X_test_reg)
# Evaluate
r2 = r2_score(y_test_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))
print(f"Linear SVR performance:")
print(f"R² score: {r2:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"Number of support vectors: {len(linear_svr.support_)}")
# Visualize regression results
plt.figure(figsize=(12, 8))
# Plot data points
plt.scatter(X_train_reg, y_train_reg, alpha=0.6, label='Training Data', color='blue')
plt.scatter(X_test_reg, y_test_reg, alpha=0.6, label='Testing Data', color='green')
# Plot regression line
X_plot = np.linspace(0, 10, 100).reshape(-1, 1)
y_plot = linear_svr.predict(X_plot)
plt.plot(X_plot, y_plot, color='red', linewidth=2, label='SVR Prediction')
# Plot epsilon tube
epsilon = 0.1
plt.fill_between(X_plot.ravel(), y_plot - epsilon, y_plot + epsilon,
alpha=0.2, color='red', label=f'ε-Tube (ε={epsilon})')
# Highlight support vectors
support_vectors_x = X_train_reg[linear_svr.support_]
support_vectors_y = y_train_reg[linear_svr.support_]
plt.scatter(support_vectors_x, support_vectors_y, s=200,
facecolors='none', edgecolors='black', linewidths=2, label='Support Vectors')
plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Linear SVR (R²={r2:.3f}, Support Vectors={len(linear_svr.support_)})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

8.5.2 Nonlinear Regression
# Create nonlinear regression data
np.random.seed(42)
X_nonlinear = np.linspace(0, 4*np.pi, 100).reshape(-1, 1)
y_nonlinear = np.sin(X_nonlinear.ravel()) + 0.1 * np.random.randn(100)
X_train_nl, X_test_nl, y_train_nl, y_test_nl = train_test_split(
X_nonlinear, y_nonlinear, test_size=0.2, random_state=42
)
# Compare SVR with different kernel functions
svr_kernels = ['linear', 'poly', 'rbf']
svr_results = {}
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
for i, kernel in enumerate(svr_kernels):
    if kernel == 'poly':
        svr = SVR(kernel=kernel, degree=3, C=1.0, epsilon=0.1)
    else:
        svr = SVR(kernel=kernel, C=1.0, epsilon=0.1)
    svr.fit(X_train_nl, y_train_nl)
    y_pred_nl = svr.predict(X_test_nl)
    r2_nl = r2_score(y_test_nl, y_pred_nl)
    rmse_nl = np.sqrt(mean_squared_error(y_test_nl, y_pred_nl))
    svr_results[kernel] = {'r2': r2_nl, 'rmse': rmse_nl, 'n_support': len(svr.support_)}
    # Visualization
    axes[i].scatter(X_train_nl, y_train_nl, alpha=0.6, label='Training Data', s=20)
    axes[i].scatter(X_test_nl, y_test_nl, alpha=0.6, label='Testing Data', color='green', s=20)
    # Prediction curve
    X_plot_nl = np.linspace(0, 4*np.pi, 200).reshape(-1, 1)
    y_plot_nl = svr.predict(X_plot_nl)
    axes[i].plot(X_plot_nl, y_plot_nl, color='red', linewidth=2, label='SVR Prediction')
    # Support vectors
    support_vectors_x = X_train_nl[svr.support_]
    support_vectors_y = y_train_nl[svr.support_]
    axes[i].scatter(support_vectors_x, support_vectors_y, s=100,
                    facecolors='none', edgecolors='black', linewidths=1, label='Support Vectors')
    axes[i].set_xlabel('X')
    axes[i].set_ylabel('y')
    axes[i].set_title(f'{kernel} Kernel SVR\nR²={r2_nl:.3f}, Support Vectors={len(svr.support_)}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
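One SVR parameter worth previewing here is ε: a wider ε-tube tolerates larger residuals without penalty, so fewer training points end up outside the tube as support vectors. A short sketch on the same kind of linear data as in 8.5.1 (regenerated locally so the block is self-contained):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2 * X.ravel() + 1 + 0.5 * rng.randn(100)  # linear trend plus noise

# Larger epsilon -> wider tube -> fewer support vectors
for eps in [0.01, 0.1, 0.5, 1.0]:
    svr = SVR(kernel='linear', C=1.0, epsilon=eps)
    svr.fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```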
# Performance comparison
print("SVR performance comparison with different kernel functions:")
print("Kernel\t\tR²\t\tRMSE\t\tSupport Vectors")
print("-" * 50)
for kernel, results in svr_results.items():
    print(f"{kernel}\t\t{results['r2']:.4f}\t\t{results['rmse']:.4f}\t\t{results['n_support']}")

8.6 Hyperparameter Tuning
8.6.1 Grid Search
# Use breast cancer dataset for hyperparameter tuning
cancer = load_breast_cancer()
X_cancer, y_cancer = cancer.data, cancer.target
X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(
X_cancer, y_cancer, test_size=0.2, random_state=42, stratify=y_cancer
)
# Create pipeline (includes standardization and SVM)
svm_pipeline = Pipeline([
('scaler', StandardScaler()),
('svm', SVC(random_state=42))
])
# Define parameter grid
param_grid = [
{
'svm__kernel': ['linear'],
'svm__C': [0.1, 1, 10, 100]
},
{
'svm__kernel': ['rbf'],
'svm__C': [0.1, 1, 10, 100],
'svm__gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1]
},
{
'svm__kernel': ['poly'],
'svm__C': [0.1, 1, 10],
'svm__degree': [2, 3, 4],
'svm__gamma': ['scale', 'auto']
}
]
# Grid search
print("Performing SVM hyperparameter grid search...")
grid_search = GridSearchCV(
svm_pipeline,
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=1
)
grid_search.fit(X_train_cancer, y_train_cancer)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")
# Test best model
best_svm = grid_search.best_estimator_
y_pred_best = best_svm.predict(X_test_cancer)
test_accuracy = accuracy_score(y_test_cancer, y_pred_best)
print(f"Test set accuracy: {test_accuracy:.4f}")
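When the parameter grid grows, grid search becomes expensive because it evaluates every combination. `RandomizedSearchCV` instead samples a fixed number of candidates from parameter distributions. A sketch on the same breast cancer data (reloaded here so the block stands alone; the `loguniform` ranges are illustrative choices, not tuned values):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('svm', SVC(kernel='rbf', random_state=42))])

# Sample C and gamma log-uniformly instead of enumerating a full grid
param_dist = {'svm__C': loguniform(1e-2, 1e3),
              'svm__gamma': loguniform(1e-4, 1e0)}
search = RandomizedSearchCV(pipe, param_dist, n_iter=20, cv=5,
                            random_state=42, n_jobs=-1)
search.fit(X_train, y_train)
print(f"Best parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.4f}")
```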
# Detailed evaluation
print("\nBest SVM model detailed evaluation:")
print(classification_report(y_test_cancer, y_pred_best,
                            target_names=['Malignant', 'Benign']))

8.6.2 Validation Curve Analysis
# Plot validation curves for the C and gamma parameters
def plot_validation_curve_svm():
    """Plot SVM validation curves"""
    # Use standardized data
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_cancer)
    # C parameter validation curve
    C_range = np.logspace(-3, 3, 10)
    train_scores, val_scores = validation_curve(
        SVC(kernel='rbf', gamma='scale', random_state=42),
        X_train_scaled, y_train_cancer,
        param_name='C', param_range=C_range,
        cv=5, scoring='accuracy', n_jobs=-1
    )
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)
    plt.figure(figsize=(12, 5))
    # C parameter validation curve
    plt.subplot(1, 2, 1)
    plt.semilogx(C_range, train_mean, 'o-', color='blue', label='Training Score')
    plt.fill_between(C_range, train_mean - train_std, train_mean + train_std,
                     alpha=0.1, color='blue')
    plt.semilogx(C_range, val_mean, 'o-', color='red', label='Validation Score')
    plt.fill_between(C_range, val_mean - val_std, val_mean + val_std,
                     alpha=0.1, color='red')
    plt.xlabel('C Parameter')
    plt.ylabel('Accuracy')
    plt.title('SVM C Parameter Validation Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)
    # gamma parameter validation curve
    gamma_range = np.logspace(-4, 1, 10)
    train_scores_gamma, val_scores_gamma = validation_curve(
        SVC(kernel='rbf', C=1.0, random_state=42),
        X_train_scaled, y_train_cancer,
        param_name='gamma', param_range=gamma_range,
        cv=5, scoring='accuracy', n_jobs=-1
    )
    train_mean_gamma = np.mean(train_scores_gamma, axis=1)
    train_std_gamma = np.std(train_scores_gamma, axis=1)
    val_mean_gamma = np.mean(val_scores_gamma, axis=1)
    val_std_gamma = np.std(val_scores_gamma, axis=1)
    plt.subplot(1, 2, 2)
    plt.semilogx(gamma_range, train_mean_gamma, 'o-', color='blue', label='Training Score')
    plt.fill_between(gamma_range, train_mean_gamma - train_std_gamma,
                     train_mean_gamma + train_std_gamma, alpha=0.1, color='blue')
    plt.semilogx(gamma_range, val_mean_gamma, 'o-', color='red', label='Validation Score')
    plt.fill_between(gamma_range, val_mean_gamma - val_std_gamma,
                     val_mean_gamma + val_std_gamma, alpha=0.1, color='red')
    plt.xlabel('gamma Parameter')
    plt.ylabel('Accuracy')
    plt.title('SVM gamma Parameter Validation Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

plot_validation_curve_svm()

8.7 Real-world Application Cases
8.7.1 Text Classification Example
# Create simple text classification data
from sklearn.feature_extraction.text import TfidfVectorizer
# Simulate text data
texts = [
"Machine learning is an important branch of artificial intelligence",
"Deep learning performs well in image recognition",
"Support Vector Machine is a classic classification algorithm",
"The weather is nice today, suitable for a walk",
"The plot of this movie is very exciting",
"The restaurant's dishes taste good and the service is also very good",
"Neural networks can handle complex nonlinear problems",
"Data preprocessing is an important step in machine learning",
"Natural language processing technology is developing rapidly",
"On a sunny afternoon, I feel particularly refreshed",
"The content of this book is interesting and worth recommending",
"There are many people in the shopping mall and a wide variety of goods"
]
# Labels: 0-Tech, 1-Life
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
# Text vectorization
vectorizer = TfidfVectorizer(max_features=100, stop_words=None)
X_text = vectorizer.fit_transform(texts).toarray()
print(f"Text feature dimension: {X_text.shape}")
print(f"Class distribution: {np.bincount(labels)}")
# Split data
X_train_text, X_test_text, y_train_text, y_test_text = train_test_split(
X_text, labels, test_size=0.3, random_state=42, stratify=labels
)
# Train SVM text classifier
text_svm = SVC(kernel='linear', C=1.0, random_state=42)
text_svm.fit(X_train_text, y_train_text)
# Predict
y_pred_text = text_svm.predict(X_test_text)
accuracy_text = accuracy_score(y_test_text, y_pred_text)
print(f"\nText classification SVM accuracy: {accuracy_text:.4f}")
print(f"Number of support vectors: {len(text_svm.support_)}")
# Get most important feature words
feature_names = vectorizer.get_feature_names_out()
coef = text_svm.coef_[0]
# In scikit-learn's binary decision function, positive values predict class 1 (Life),
# so positive coefficients indicate life-class words and negative ones tech-class words
top_life = np.argsort(coef)[-10:]
top_tech = np.argsort(coef)[:10]
print("\nMost important life-class feature words:")
for idx in reversed(top_life):
    print(f"{feature_names[idx]}: {coef[idx]:.4f}")
print("\nMost important tech-class feature words:")
for idx in top_tech:
    print(f"{feature_names[idx]}: {coef[idx]:.4f}")

8.7.2 High-dimensional Data Processing
# Create high-dimensional dataset
X_high_dim, y_high_dim = make_classification(
n_samples=1000,
n_features=1000,
n_informative=100,
n_redundant=50,
n_clusters_per_class=1,
random_state=42
)
print(f"High-dimensional data shape: {X_high_dim.shape}")
# Split data
X_train_hd, X_test_hd, y_train_hd, y_test_hd = train_test_split(
X_high_dim, y_high_dim, test_size=0.2, random_state=42, stratify=y_high_dim
)
# Compare performance of linear SVM and RBF SVM on high-dimensional data
import time
models = {
'Linear SVM': SVC(kernel='linear', C=1.0, random_state=42),
'RBF SVM': SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42),
'LinearSVC': LinearSVC(C=1.0, random_state=42, max_iter=1000) # Specialized for linear SVM
}
print("\nPerformance comparison of different SVMs on high-dimensional data:")
print("Model\t\tTraining Time\tPrediction Time\tAccuracy\t\tSupport Vectors")
print("-" * 70)
# Standardize data
scaler = StandardScaler()
X_train_hd_scaled = scaler.fit_transform(X_train_hd)
X_test_hd_scaled = scaler.transform(X_test_hd)
for name, model in models.items():
    # Training time
    start_time = time.time()
    model.fit(X_train_hd_scaled, y_train_hd)
    train_time = time.time() - start_time
    # Prediction time
    start_time = time.time()
    y_pred_hd = model.predict(X_test_hd_scaled)
    pred_time = time.time() - start_time
    # Accuracy
    accuracy_hd = accuracy_score(y_test_hd, y_pred_hd)
    # Number of support vectors (LinearSVC does not expose this attribute)
    if hasattr(model, 'support_'):
        n_support = len(model.support_)
    else:
        n_support = "N/A"
    print(f"{name}\t\t{train_time:.3f}s\t\t{pred_time:.3f}s\t\t{accuracy_hd:.4f}\t\t{n_support}")
# Visualize relationship between training time and accuracy
model_names = list(models.keys())
train_times = []
accuracies = []
for name, model in models.items():
    start_time = time.time()
    model.fit(X_train_hd_scaled, y_train_hd)
    train_time = time.time() - start_time
    y_pred = model.predict(X_test_hd_scaled)
    accuracy = accuracy_score(y_test_hd, y_pred)
    train_times.append(train_time)
    accuracies.append(accuracy)
plt.figure(figsize=(10, 6))
colors = ['blue', 'red', 'green']
for i, (name, train_time, accuracy) in enumerate(zip(model_names, train_times, accuracies)):
    plt.scatter(train_time, accuracy, s=200, c=colors[i], alpha=0.7, label=name)
    plt.annotate(name, (train_time, accuracy), xytext=(5, 5),
                 textcoords='offset points', fontsize=10)
plt.xlabel('Training Time (seconds)')
plt.ylabel('Accuracy')
plt.title('High-dimensional Data: Training Time vs Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

8.8 SVM Pros and Cons Summary
8.8.1 Performance Comparison
def comprehensive_svm_comparison():
    """Comprehensively compare SVM with other algorithms"""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    # Use the wine dataset
    wine = load_wine()
    X_wine, y_wine = wine.data, wine.target
    X_train_wine, X_test_wine, y_train_wine, y_test_wine = train_test_split(
        X_wine, y_wine, test_size=0.2, random_state=42, stratify=y_wine
    )
    # Standardize
    scaler = StandardScaler()
    X_train_wine_scaled = scaler.fit_transform(X_train_wine)
    X_test_wine_scaled = scaler.transform(X_test_wine)
    # Define algorithms
    algorithms = {
        'SVM (RBF)': SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42),
        'SVM (Linear)': SVC(kernel='linear', C=1.0, random_state=42),
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
        'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5)
    }
    results = {}
    print("Comprehensive algorithm performance comparison:")
    print("Algorithm\t\tTraining Time\tAccuracy\t\tCross-Validation Score")
    print("-" * 60)
    for name, algorithm in algorithms.items():
        # Scale-sensitive algorithms use the standardized features
        start_time = time.time()
        if 'SVM' in name or name == 'Logistic Regression' or name == 'K-Nearest Neighbors':
            algorithm.fit(X_train_wine_scaled, y_train_wine)
            y_pred = algorithm.predict(X_test_wine_scaled)
            cv_scores = cross_val_score(algorithm, X_train_wine_scaled, y_train_wine, cv=5)
        else:
            algorithm.fit(X_train_wine, y_train_wine)
            y_pred = algorithm.predict(X_test_wine)
            cv_scores = cross_val_score(algorithm, X_train_wine, y_train_wine, cv=5)
        train_time = time.time() - start_time
        # Performance metrics
        accuracy = accuracy_score(y_test_wine, y_pred)
        cv_mean = np.mean(cv_scores)
        results[name] = {
            'train_time': train_time,
            'accuracy': accuracy,
            'cv_score': cv_mean
        }
        print(f"{name}\t{train_time:.4f}s\t\t{accuracy:.4f}\t\t{cv_mean:.4f}")
    # Visualize comparison
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    names = list(results.keys())
    train_times = [results[name]['train_time'] for name in names]
    accuracies = [results[name]['accuracy'] for name in names]
    cv_scores = [results[name]['cv_score'] for name in names]
    # Training time
    axes[0].bar(names, train_times, color='skyblue', alpha=0.7)
    axes[0].set_title('Training Time Comparison')
    axes[0].set_ylabel('Time (seconds)')
    axes[0].tick_params(axis='x', rotation=45)
    # Accuracy
    axes[1].bar(names, accuracies, color='lightgreen', alpha=0.7)
    axes[1].set_title('Test Accuracy Comparison')
    axes[1].set_ylabel('Accuracy')
    axes[1].tick_params(axis='x', rotation=45)
    axes[1].set_ylim(0.8, 1.0)
    # Cross-validation scores
    axes[2].bar(names, cv_scores, color='lightcoral', alpha=0.7)
    axes[2].set_title('Cross-Validation Score Comparison')
    axes[2].set_ylabel('CV Score')
    axes[2].tick_params(axis='x', rotation=45)
    axes[2].set_ylim(0.8, 1.0)
    plt.tight_layout()
    plt.show()
    return results

comparison_results = comprehensive_svm_comparison()

8.8.2 Usage Recommendations
def svm_usage_guide():
    """SVM usage guide"""
    print("SVM usage guide and best practices:")
    print("=" * 50)
    guidelines = {
        "Data Preprocessing": [
            "Always standardize or normalize features",
            "Handle missing values and outliers",
            "Consider feature selection to reduce dimensionality"
        ],
        "Kernel Function Selection": [
            "Linear kernel: linearly separable or high-dimensional sparse data",
            "RBF kernel: general-purpose choice, suitable for most nonlinear problems",
            "Polynomial kernel: specific nonlinear relationships",
            "Custom kernel: specialized domain problems"
        ],
        "Parameter Tuning": [
            "C parameter: controls regularization strength; select via cross-validation",
            "gamma parameter: key RBF kernel parameter; controls decision boundary complexity",
            "Use grid search or random search for tuning"
        ],
        "Suitable Scenarios": [
            "High-dimensional data (e.g., text classification, gene expression data)",
            "Small to medium-sized datasets",
            "Applications requiring stable, reliable performance",
            "Binary classification problems (natively supported)"
        ],
        "Unsuitable Scenarios": [
            "Large-scale datasets (>100k samples)",
            "Applications requiring probability output",
            "Extremely latency-sensitive real-time prediction",
            "Very noisy data"
        ]
    }
    for category, items in guidelines.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")
    print("\n" + "=" * 50)
    print("Rules of thumb for parameter selection:")
    print("• C: start with 0.1, 1, 10, 100")
    print("• gamma: start with 'scale', 0.001, 0.01, 0.1, 1")
    print("• Use cross-validation to evaluate different parameter combinations")
    print("• Watch for overfitting: training accuracy much higher than validation accuracy")

svm_usage_guide()

8.9 Exercises
Exercise 1: Basic SVM
- Train a linear SVM classifier on the Iris dataset
- Visualize decision boundary and support vectors
- Analyze the impact of different C values on the model
Exercise 2: Kernel Function Comparison
- Create a complex nonlinear dataset
- Compare the performance of different kernel functions (linear, polynomial, RBF, sigmoid)
- Analyze the applicable scenarios for each kernel function
Exercise 3: SVM Regression
- Train an SVR model on the California housing dataset (the Boston housing dataset was removed from scikit-learn in version 1.2)
- Compare the regression performance of linear and RBF kernels
- Analyze the impact of epsilon parameter on the model
Exercise 4: High-dimensional Data Processing
- Create a high-dimensional dataset (features > 1000)
- Compare the performance of SVM with other algorithms on high-dimensional data
- Analyze the impact of feature selection on SVM performance
8.10 Summary
In this chapter, we have learned various aspects of Support Vector Machines in depth:
Core Concepts
- SVM Principles: Maximum margin classifier, support vectors, hyperplanes
- Kernel Trick: Map data to high-dimensional space to handle nonlinear problems
- Parameter Tuning: Roles and selection of C parameter, gamma parameter
Main Techniques
- Linear SVM: Handle linearly separable and approximately linearly separable problems
- Nonlinear SVM: Handle complex nonlinear problems through kernel functions
- SVM Regression: Principles and applications of support vector regression
- Hyperparameter Optimization: Grid search, cross-validation
Practical Skills
- Data Preprocessing: Importance of feature standardization
- Kernel Function Selection: Choose appropriate kernel function based on data characteristics
- Performance Evaluation: Comprehensively evaluate SVM model performance
- Real Applications: Text classification, high-dimensional data processing
Key Points
- SVM performs excellently on high-dimensional data and medium-small scale datasets
- Must perform feature standardization, sensitive to parameters
- Kernel function selection has important impact on performance
- Suitable for classification and regression tasks requiring stable performance
8.11 Next Steps
You have now mastered the powerful Support Vector Machine algorithm! In the next chapter, Naive Bayes, we will learn probability-based classification methods and see how Bayes' theorem is applied in machine learning.
Chapter Points Review:
- ✅ Understood mathematical principles and geometric intuition of SVM
- ✅ Mastered implementation of linear and nonlinear SVM
- ✅ Learned selection and application of different kernel functions
- ✅ Understood principles and practice of SVM regression
- ✅ Mastered SVM hyperparameter tuning methods
- ✅ Able to reasonably use SVM in real-world problems