Convolutional Neural Networks (CNN)
Convolutional Neural Networks, or CNNs, are a standard architecture for images, video frames, spectrograms, and other grid-like data. A CNN learns local patterns first, such as edges and textures, then combines them into higher-level shapes and semantic features.
This chapter builds a practical TensorFlow/Keras CNN and explains the purpose of each layer, how to train it, and how to diagnose common problems.
When To Use CNNs
CNNs are a strong baseline for:
- Image classification
- Object detection pipelines
- Image segmentation backbones
- Medical image analysis
- Spectrogram classification
- Local pattern extraction from one-dimensional or two-dimensional signals
If neighboring positions in your input carry related meaning, a CNN is usually worth trying.
Core Layers
Conv2D
Conv2D slides learnable filters over the input and extracts local features.
Important parameters:
- filters: number of output channels
- kernel_size: filter size, commonly 3x3
- padding="same": keeps spatial dimensions stable across layers
- activation="relu": adds nonlinear modeling capacity
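To make the effect of these parameters concrete, the following minimal sketch applies two Conv2D layers to a random batch and compares output shapes. The batch size, filter count, and variable names are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# A dummy batch of 4 RGB images, 32x32 pixels (random placeholder values).
images = np.random.rand(4, 32, 32, 3).astype("float32")

# 16 filters of size 3x3; padding="same" keeps height and width at 32.
conv_same = tf.keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu")
# The default padding="valid" shrinks each spatial dimension by kernel_size - 1.
conv_valid = tf.keras.layers.Conv2D(16, kernel_size=3, padding="valid", activation="relu")

out_same = conv_same(images)
out_valid = conv_valid(images)

print(out_same.shape)   # (4, 32, 32, 16)
print(out_valid.shape)  # (4, 30, 30, 16)

# Each filter has 3*3*3 weights plus one bias: (27 + 1) * 16 = 448 parameters.
print(conv_same.count_params())  # 448
```

The last dimension of the output equals the filters argument, regardless of how many channels the input has.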
MaxPooling2D
Pooling reduces spatial size, lowers compute cost, and makes features more tolerant to small shifts.
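As a small illustration, the sketch below runs a 2x2 max pool over a 4x4 input. The input values are made up so that the result is easy to check by hand.

```python
import numpy as np
import tensorflow as tf

# One 4x4 single-channel "image" with values 0..15,
# shaped (batch, height, width, channels).
x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

# pool_size=2 takes the maximum of each 2x2 window; strides default to pool_size.
pool = tf.keras.layers.MaxPooling2D(pool_size=2)
y = np.asarray(pool(x))

print(y.shape)         # (1, 2, 2, 1)
print(y[0, :, :, 0])   # rows: [5, 7] and [13, 15]
```

Height and width are halved, so every later layer processes a quarter as many positions.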
Dropout
Dropout randomly sets a fraction of a layer's activations to zero during training, which helps reduce overfitting. It is inactive at inference time.
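The difference between training and inference behavior can be seen in a short sketch. The rate of 0.5 and the all-ones input are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

x = np.ones((1, 1000), dtype="float32")
dropout = tf.keras.layers.Dropout(rate=0.5)

# Inference mode: the layer passes its input through unchanged.
y_infer = np.asarray(dropout(x, training=False))

# Training mode: roughly half the values are zeroed and the survivors
# are scaled by 1 / (1 - rate) so the expected sum stays the same.
y_train = np.asarray(dropout(x, training=True))

print(np.unique(y_infer))      # only 1.0
print(np.unique(y_train))      # 0.0 and 2.0
print((y_train == 0).mean())   # close to 0.5, varies per run
```

Keras sets the training flag automatically inside fit, evaluate, and predict, so it rarely needs to be passed by hand.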
Dense
After convolutional layers extract features, dense layers usually perform the final classification.
Build A Basic CNN
The example below uses CIFAR-10, a dataset of 10 classes of 32x32x3 color images.
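One way to write such a model is sketched here. The layer widths, the dropout rate, and the `build_cnn` helper name are illustrative assumptions, not a prescribed configuration. The model expects raw pixel values in the range 0 to 255 and integer class labels.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Small CNN for CIFAR-10-sized inputs (layer sizes are illustrative)."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),  # map 0..255 pixel values to 0..1
        # Block 1: low-level features such as edges and color blobs.
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),        # 32x32 -> 16x16
        # Block 2: more filters at lower resolution.
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),        # 16x16 -> 8x8
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        # Average each of the 128 feature maps down to one value.
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        # Final classifier: one probability per class.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels 0..9
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
model.summary()
```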
GlobalAveragePooling2D is often preferable to Flatten: it averages each feature map down to a single value, so the following Dense layer needs far fewer parameters, which lowers overfitting risk.
Train The Model
EarlyStopping stops training when validation performance no longer improves. ReduceLROnPlateau lowers the learning rate when validation loss stalls.
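A possible training setup with both callbacks is sketched below. The patience values, reduction factor, batch size, and the `make_callbacks` and `train` helper names are assumptions for illustration. The `train` function expects an already compiled Keras model that accepts 32x32x3 inputs with raw pixel values and integer labels.

```python
import tensorflow as tf


def make_callbacks():
    """Callbacks described above; all numeric settings are illustrative."""
    return [
        # Stop after 5 epochs without val_loss improvement and
        # roll back to the weights from the best epoch.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        # Halve the learning rate after 2 stalled epochs, down to min_lr.
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2, min_lr=1e-5
        ),
    ]


def train(model, epochs=30, batch_size=64):
    """Train on CIFAR-10 (the dataset is downloaded on first use)."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train = x_train.astype("float32")
    x_test = x_test.astype("float32")

    history = model.fit(
        x_train, y_train,
        validation_split=0.1,  # hold out the last 10% of training data
        epochs=epochs,
        batch_size=batch_size,
        callbacks=make_callbacks(),
    )
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return history, test_acc
```

Because EarlyStopping can end training before the epoch limit, `epochs` acts as an upper bound, not a fixed schedule.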
Add Data Augmentation
Image augmentation exposes the model to more variations and often improves generalization.
In production projects, keep augmentation active only during training and avoid transformations that change the label meaning.
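Keras preprocessing layers are one way to get this behavior, since they apply random transforms only when called in training mode. The specific transforms and their strengths below are assumptions chosen to suit small natural images. A vertical flip, for example, would be a poor choice for digits or text, where it can change the label.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline; each layer is random only in training mode.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),  # factor is a fraction of a full turn: about +/- 18 degrees
    layers.RandomZoom(0.1),       # zoom in or out by up to 10%
])

# Random placeholder batch standing in for real images.
images = np.random.rand(8, 32, 32, 3).astype("float32")

augmented = np.asarray(augment(images, training=True))   # random transforms applied
unchanged = np.asarray(augment(images, training=False))  # inputs passed through

print(augmented.shape)                 # (8, 32, 32, 3)
print(np.allclose(unchanged, images))  # True
```

Placing these layers at the start of the model means fit applies them during training and predict skips them, with no separate code path to maintain.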
Diagnose Training Curves
After training, inspect training and validation curves.
Common signals:
- High training accuracy but low validation accuracy: overfitting
- Both training and validation accuracy are low: underfitting
- Validation loss is unstable: learning rate may be too high
- Training is slow: check GPU usage and input pipeline performance
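The first two signals can be checked programmatically from the dictionary that Keras stores in `history.history`. The helper below is a rough sketch: the `summarize_history` name and both thresholds are hypothetical, and the example numbers are invented to show an overfitting pattern.

```python
def summarize_history(history, gap_threshold=0.10, low_threshold=0.60):
    """Rough reading of a Keras-style history dict.

    The thresholds are arbitrary illustrative defaults, not standard values.
    """
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    val_loss = history["val_loss"]
    # Epoch (1-based) with the lowest validation loss.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

    if train_acc - val_acc > gap_threshold:
        verdict = "possible overfitting"
    elif train_acc < low_threshold and val_acc < low_threshold:
        verdict = "possible underfitting"
    else:
        verdict = "no obvious problem"
    return {"gap": train_acc - val_acc, "best_epoch": best_epoch, "verdict": verdict}


# Invented curves: training accuracy keeps rising while validation stalls.
example = {
    "accuracy": [0.55, 0.75, 0.90, 0.97],
    "val_accuracy": [0.52, 0.66, 0.70, 0.69],
    "val_loss": [1.30, 1.00, 0.95, 1.10],
}
summary = summarize_history(example)
print(summary["verdict"], "- best epoch:", summary["best_epoch"])
```

A check like this complements plotting the curves but does not replace it, since an unstable validation loss is easier to spot visually.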
Practical Improvements
- Use proven architectures such as ResNet, EfficientNet, or MobileNet.
- Apply transfer learning with a pretrained backbone.
- Add data augmentation, but keep labels semantically correct.
- Use BatchNormalization to stabilize deeper networks.
- Use class weights or resampling for imbalanced datasets.
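For the class-weight suggestion, a common heuristic weights each class inversely to its frequency. The sketch below computes such weights with NumPy. The `balanced_class_weights` helper and the 90/10 label split are made up for illustration.

```python
import numpy as np


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels).ravel()
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}


# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
class_weight = balanced_class_weights(labels)

# Class 1 gets weight 5.0, class 0 about 0.56.
print(class_weight)
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=class_weight)`, which scales each sample's loss by the weight of its class.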
Summary
A CNN workflow usually follows this pattern: convolution layers extract local features, pooling lowers spatial resolution, regularization reduces overfitting, and the final classifier predicts class probabilities. After this chapter, continue with image classification, transfer learning, and deployment to apply CNNs to real image tasks.