Convolutional Neural Networks (CNN)

Convolutional Neural Networks, or CNNs, are a standard architecture for images, video frames, spectrograms, and other grid-like data. A CNN learns local patterns first, such as edges and textures, then combines them into higher-level shapes and semantic features.

This chapter builds a practical TensorFlow/Keras CNN and explains the purpose of each layer, how to train it, and how to diagnose common problems.

When To Use CNNs

CNNs are a strong baseline for:

  • Image classification
  • Object detection pipelines
  • Image segmentation backbones
  • Medical image analysis
  • Spectrogram classification
  • Local pattern extraction from one-dimensional or two-dimensional signals

If neighboring positions in your input carry related meaning, a CNN is usually worth trying.

Core Layers

Conv2D

Conv2D slides learnable filters over the input and extracts local features.

from tensorflow import keras
from tensorflow.keras import layers

layers.Conv2D(
    filters=32,
    kernel_size=(3, 3),
    activation="relu",
    padding="same",
)
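
To make the sliding-filter idea concrete, here is a minimal NumPy sketch of what a single 3x3 filter computes with stride 1 and no padding. The function name cross_correlate2d and the hand-picked vertical-edge kernel are illustrative assumptions. In a real Conv2D layer the kernel values are learned, and Keras uses an optimized implementation rather than Python loops.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 5x5 toy image: dark (0) in columns 0-2, bright (1) in columns 3-4.
image = np.zeros((5, 5), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-picked (not learned) kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

response = cross_correlate2d(image, kernel)
print(response)
```

The response is zero where the window covers only the flat dark region and positive where it overlaps the edge, which is the kind of local pattern an early convolutional layer tends to pick up.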

Important parameters:

  • filters: number of output channels
  • kernel_size: filter size, commonly 3x3
  • padding="same": pads the input so that, at the default stride of 1, the output keeps the input's height and width
  • activation="relu": adds nonlinear modeling capacity
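
The effect of these parameters on output size and parameter count can be worked out with a little arithmetic. The helper names below are hypothetical and the sketch assumes square inputs and square kernels.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Output height or width of a convolution along one spatial axis."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters in a Conv2D layer with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

print(conv_output_size(32, 3, padding="same"))    # 32
print(conv_output_size(32, 3, padding="valid"))   # 30
print(conv2d_param_count(3, in_channels=3, filters=32))  # 896
```

With padding="valid" each 3x3 convolution trims two pixels per axis, which is why padding="same" is convenient when stacking several layers on small images.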

MaxPooling2D

Pooling reduces spatial size, lowers compute cost, and makes features more tolerant to small shifts.

layers.MaxPooling2D(pool_size=(2, 2))
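
As a rough illustration of what 2x2 max pooling does to a single feature map, the NumPy sketch below keeps the largest value in each non-overlapping 2x2 block. The function max_pool_2x2 is a hypothetical helper that assumes even height and width. The Keras layer also handles batches and channels.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max inside each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 map shrinks to 2x2, so later layers process a quarter as many positions.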

Dropout

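Dropout randomly sets a fraction of its inputs to zero during training (30% in the example below), which helps reduce overfitting. At inference time the layer passes its input through unchanged.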

layers.Dropout(0.3)
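
The sketch below shows the idea in NumPy, assuming the common "inverted dropout" formulation in which surviving values are scaled by 1 / (1 - rate) so the expected activation stays the same. The dropout function here is a hypothetical stand-in, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random fraction of `x` and rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: input passes through unchanged
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000,), dtype=np.float32)

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)

print("fraction zeroed:", np.mean(train_out == 0))
print("surviving value:", train_out.max())
print("inference unchanged:", np.array_equal(infer_out, activations))
```

Roughly 30% of the values are zeroed during training, and the survivors are scaled up to about 1.43 to compensate.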

Dense

After convolutional layers extract features, dense layers usually perform the final classification.

layers.Dense(10, activation="softmax")
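
A dense softmax layer computes a weighted sum per class and normalizes the results into probabilities. The NumPy sketch below uses random, untrained weights purely for illustration. The 128-dimensional feature vector is an assumed input size.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def dense_softmax(features, weights, bias):
    """What Dense(units, activation='softmax') computes: softmax(x @ W + b)."""
    return softmax(features @ weights + bias)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 128))         # batch of 4 feature vectors
weights = rng.normal(size=(128, 10)) * 0.1   # hypothetical untrained weights
bias = np.zeros(10)

probs = dense_softmax(features, weights, bias)
print(probs.shape)             # (4, 10)
print(probs.sum(axis=1))       # each row sums to 1 (up to rounding)
print(probs.argmax(axis=1))    # predicted class index per sample
```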

Build A Basic CNN

The example below uses CIFAR-10, a dataset of 60,000 32x32 color images in 10 classes, split into 50,000 training images and 10,000 test images.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),

    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),

    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),

    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.GlobalAveragePooling2D(),

    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()

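GlobalAveragePooling2D averages each feature map down to a single value, so the following Dense layer receives one input per channel instead of one per spatial position per channel. It is often preferable to Flatten because this reduces the number of parameters and lowers overfitting risk.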
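
To put numbers on that difference for the model above, the short calculation below traces the spatial size through the two pooling layers and compares the parameter count of the first Dense(128) layer under both options. The helper dense_param_count is a hypothetical name.

```python
def dense_param_count(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

# padding="same" convolutions keep the size; each MaxPooling2D() halves it.
height = width = 32
for _ in range(2):   # the model above has two pooling layers
    height //= 2
    width //= 2
channels = 128        # filters in the last Conv2D

flatten_features = height * width * channels   # what Flatten would produce
gap_features = channels                        # what GlobalAveragePooling2D produces

print((height, width, channels))                   # (8, 8, 128)
print(dense_param_count(flatten_features, 128))    # 1048704
print(dense_param_count(gap_features, 128))        # 16512
```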

Train The Model

callbacks = [
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=8,
        restore_best_weights=True,
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",
        factor=0.5,
        patience=3,
        min_lr=1e-5,
    ),
]

history = model.fit(
    x_train,
    y_train,
    validation_split=0.1,
    epochs=50,
    batch_size=64,
    callbacks=callbacks,
)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.4f}")

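EarlyStopping stops training when validation accuracy has not improved for 8 consecutive epochs and restores the weights from the best epoch. ReduceLROnPlateau halves the learning rate when validation loss has not improved for 3 epochs, down to a minimum of 1e-5.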
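
The patience logic behind both callbacks can be sketched in plain Python. This is a simplified model of the behavior, not the Keras source: it ignores options such as min_delta and cooldown, and the function names and metric values are made up for illustration.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch) for a higher-is-better metric.

    Epochs are 0-indexed. stop_epoch is None if training is never stopped.
    """
    best_score = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

def plateau_schedule(val_losses, lr, factor, patience, min_lr):
    """Return the learning rate in effect during each epoch (lower-is-better metric)."""
    best_loss = float("inf")
    wait = 0
    rates = []
    for loss in val_losses:
        rates.append(lr)
        if loss < best_loss:
            best_loss, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below min_lr
                wait = 0
    return rates

val_acc = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63]    # made-up values
print(early_stopping_epoch(val_acc, patience=3))  # (5, 2)

val_loss = [1.0, 0.8, 0.81, 0.82, 0.83, 0.79]     # made-up values
print(plateau_schedule(val_loss, lr=1e-3, factor=0.5, patience=3, min_lr=1e-5))
```

In the first example training stops at epoch 5 and the best epoch is 2, which is the epoch whose weights restore_best_weights=True would bring back. In the second, the learning rate is halved after three epochs without a lower validation loss.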

Add Data Augmentation

Image augmentation exposes the model to more variations and often improves generalization.

data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.08),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.08, 0.08),
])

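augmented_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    model,  # reuses the CNN defined above, including any weights already trained
])

# The wrapper is a new model, so compile it before calling fit().
augmented_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)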

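Keras random augmentation layers are active only during training. During evaluation and prediction they pass images through unchanged, so the same model can be used for both. Avoid transformations that change the label meaning.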
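
The training-only behavior can be illustrated with a small NumPy stand-in for a random horizontal flip. The function random_horizontal_flip and its training flag are hypothetical. They mimic the idea rather than the Keras layer itself.

```python
import numpy as np

def random_horizontal_flip(images, training, rng):
    """Flip each image left-right with probability 0.5, only when training."""
    if not training:
        return images  # inference: images are left untouched
    flip = rng.random(len(images)) < 0.5
    flipped = images[:, :, ::-1, :]  # reverse the width axis of (N, H, W, C)
    return np.where(flip[:, None, None, None], flipped, images)

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3)).astype("float32")

train_batch = random_horizontal_flip(batch, training=True, rng=rng)
infer_batch = random_horizontal_flip(batch, training=False, rng=rng)

print(np.array_equal(infer_batch, batch))  # True
print(train_batch.shape)                   # (8, 32, 32, 3)
```

A horizontal flip keeps a CIFAR-10 label valid, since a mirrored truck is still a truck. The same flip applied to digits or text could change what the image means.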

Diagnose Training Curves

After training, inspect training and validation curves.

import matplotlib.pyplot as plt

plt.plot(history.history["accuracy"], label="train acc")
plt.plot(history.history["val_accuracy"], label="val acc")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.grid(True)
plt.show()

Common signals:

  • High training accuracy but low validation accuracy: overfitting
  • Both curves are low: underfitting
  • Validation loss is unstable: learning rate may be too high
  • Training is slow: check GPU usage and input pipeline performance
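
The first two signals can be turned into a rough automated check on a Keras-style history dictionary. The thresholds below are arbitrary assumptions chosen for illustration and should be tuned per task. The function name diagnose_curves is hypothetical.

```python
def diagnose_curves(history, gap_threshold=0.10, low_threshold=0.60):
    """Return a rough label based on the final-epoch accuracies."""
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    if train_acc - val_acc > gap_threshold:
        return "possible overfitting"
    if train_acc < low_threshold and val_acc < low_threshold:
        return "possible underfitting"
    return "no obvious problem"

# Made-up curves: training accuracy keeps rising while validation stalls.
example = {
    "accuracy": [0.55, 0.80, 0.95],
    "val_accuracy": [0.50, 0.68, 0.70],
}
print(diagnose_curves(example))  # possible overfitting
```

With a trained model, the same function could be called as diagnose_curves(history.history). It is a quick screen, not a replacement for looking at the plots.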

Practical Improvements

  1. Use proven architectures such as ResNet, EfficientNet, or MobileNet.
  2. Apply transfer learning with a pretrained backbone.
  3. Add data augmentation, but keep labels semantically correct.
  4. Use BatchNormalization to stabilize deeper networks.
  5. Use class weights or resampling for imbalanced datasets.
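
For the last item, one common heuristic weights each class inversely to its frequency: n_samples / (n_classes * class_count). The sketch below computes such weights in plain Python. The function name and the toy labels are assumptions, and other weighting schemes are equally valid.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = [0] * 90 + [1] * 10   # made-up imbalanced labels
class_weight = balanced_class_weights(labels)
print(class_weight)
```

Here the rare class receives a weight of 5.0 and the common class about 0.56. The resulting dictionary can be passed to model.fit through its class_weight argument so that mistakes on rare classes contribute more to the loss.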

Summary

A CNN workflow usually follows this pattern: convolution layers extract local features, pooling lowers spatial resolution, regularization reduces overfitting, and the final classifier predicts class probabilities. After this chapter, continue with image classification, transfer learning, and deployment to apply CNNs to real image tasks.