
Chapter 1: Environment Setup and Installation

Before we start learning Scikit-learn, we need to set up a development environment. This chapter explains in detail how to install and configure Scikit-learn and its dependencies.

1.1 System Requirements

Scikit-learn supports the following operating systems:

  • Windows 7 and above
  • macOS 10.9 and above
  • Linux (most distributions)

Python Version Requirements:

  • Python 3.8 or higher
  • Python 3.9 or 3.10 recommended
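A quick way to confirm that the interpreter you are about to use meets the minimum listed above is to check `sys.version_info`. This is a minimal sketch that assumes the 3.8 minimum stated here. `MIN_PYTHON` and `python_is_supported` are names introduced for this example, so adjust the minimum if you target a different Scikit-learn release.

```python
import sys

# Minimum version from the requirements above (assumed; adjust for your release)
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    # Compare only (major, minor); tuples compare element by element
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the minimum {MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
    else:
        print(f"Python {current} is too old; please upgrade")
```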

1.2 Installation Methods

Method 1: Installing with pip

This is the simplest installation method:

bash
# Install the latest version of scikit-learn
pip install scikit-learn

# Or specify a version
pip install scikit-learn==1.3.0

Method 2: Installing with conda

If you use Anaconda or Miniconda:

bash
# Install from conda-forge channel
conda install -c conda-forge scikit-learn

# Or install from the default channel
conda install scikit-learn

Method 3: Installing from Source

Suitable for users who need the latest development version:

bash
# Clone the repository
git clone https://github.com/scikit-learn/scikit-learn.git
cd scikit-learn

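# Build and install scikit-learn in editable (development) mode
# Note: building from source requires a working C/C++ compiler
pip install -e .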
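After an editable install (`pip install -e .`), Python should typically import `sklearn` from your cloned repository rather than from `site-packages`. One way to check without importing the package is `importlib.util.find_spec`. This is a sketch; `module_location` is a hypothetical helper name.

```python
import importlib.util


def module_location(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a package, origin is the path of its __init__.py
    return spec.origin


if __name__ == "__main__":
    location = module_location("sklearn")
    if location is None:
        print("sklearn is not installed in this environment")
    else:
        # For an editable install this path should point into your git clone
        print(f"sklearn would be imported from: {location}")
```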

1.3 Core Dependency Packages

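Scikit-learn depends on the following core packages, which pip and conda install automatically along with it. The minimum versions shown are those required by Scikit-learn 1.3:

bash
# Quote each specifier: unquoted, the shell treats ">" as output redirection
# Core numerical computing library
pip install "numpy>=1.17.3"

# Scientific computing library
pip install "scipy>=1.5.0"

# Parallel computing and caching library
pip install "joblib>=1.1.1"

# Controls the thread pools of native libraries (BLAS, OpenMP)
pip install "threadpoolctl>=2.0.0"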
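To check whether the packages already in your environment satisfy these minimums, you can read installed versions with `importlib.metadata` (available from Python 3.8) and compare them. This is a simplified sketch: `release_tuple`, `meets_minimum` and `check_requirements` are hypothetical helpers that compare only the leading numeric part of a version, so pre-release tags such as `rc1` are ignored. The `packaging` library implements the full version rules.

```python
import re
from importlib import metadata

# Minimum versions listed above (distribution name -> minimum version)
CORE_MINIMUMS = {
    "numpy": "1.17.3",
    "scipy": "1.5.0",
    "joblib": "1.1.1",
}


def release_tuple(version):
    """Extract the leading numeric release, e.g. '1.24.3' -> (1, 24, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums):
    """Map each package to (installed version or None, requirement satisfied)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements(CORE_MINIMUMS).items():
        status = "OK" if ok else "missing or too old"
        print(f"{name:<8} {installed or '-':<10} {status}")
```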

1.4 Recommended Additional Packages

For a better learning experience, we also recommend installing the following packages:

bash
# Data processing and analysis
pip install "pandas>=1.0.5"

# Data visualization
pip install "matplotlib>=3.1.3"
pip install "seaborn>=0.11.0"

# Interactive development environment
pip install "jupyter>=1.0.0"
pip install "ipython>=7.15.0"

# Install all recommended packages at once
pip install pandas matplotlib seaborn jupyter ipython
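The name you pass to `pip install` is not always the name you `import`: `scikit-learn` is imported as `sklearn`, and `ipython` as `IPython`. (`jupyter` is a metapackage without a single import name, so it is left out here.) The sketch below uses `importlib.util.find_spec` to report which packages are importable without actually importing them; the mapping and the `importable` helper are illustrative assumptions.

```python
import importlib.util

# pip distribution name -> top-level import name (illustrative subset)
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "ipython": "IPython",
}


def importable(import_name):
    """Return True if the module can be found on the current sys.path."""
    return importlib.util.find_spec(import_name) is not None


if __name__ == "__main__":
    for pip_name, import_name in IMPORT_NAMES.items():
        status = "available" if importable(import_name) else "not installed"
        print(f"pip install {pip_name:<13} -> import {import_name:<11} {status}")
```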

1.5 Verify Installation

Create a short Python script to check that the installation works:

python
# test_installation.py
import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print("Installation Verification")
print(f"Scikit-learn Version: {sklearn.__version__}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")

# Test basic functions
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

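# Train model (max_iter raised from the default of 100 to avoid a possible ConvergenceWarning)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Evaluate: score() predicts on X_test and returns the mean accuracy
accuracy = model.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")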
print("All components working normally!")

Run the verification script:

bash
python test_installation.py

The output should look similar to this (exact version numbers will vary):

Installation Verification
Scikit-learn Version: 1.3.0
NumPy Version: 1.24.3
Pandas Version: 2.0.3
Test Accuracy: 1.00
All components working normally!
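For a fuller picture of your setup, for example when filing a bug report, Scikit-learn provides `sklearn.show_versions()`, which prints system information and the versions of its dependencies (recent releases also list the BLAS/OpenMP libraries in use). The wrapper below is a hypothetical convenience that captures that printout as a string and degrades gracefully when Scikit-learn is missing.

```python
import contextlib
import io


def environment_report():
    """Return the output of sklearn.show_versions() as a string."""
    try:
        import sklearn
    except ImportError:
        return "scikit-learn is not installed"
    buffer = io.StringIO()
    # show_versions() prints to stdout, so redirect it into the buffer
    with contextlib.redirect_stdout(buffer):
        sklearn.show_versions()
    return buffer.getvalue()


if __name__ == "__main__":
    print(environment_report())
```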

1.6 Choosing a Development Environment

Jupyter Notebook

Jupyter Notebook is well suited to learning and experimentation:

bash
# Start Jupyter Notebook
jupyter notebook
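A frequent source of confusion is a notebook kernel that runs a different Python interpreter from the one you installed packages into; it shows up as `ModuleNotFoundError: No module named 'sklearn'` even though `pip install` succeeded. Running the sketch below in a notebook cell shows which interpreter the kernel uses, which you can compare with `which python` (macOS/Linux) or `where python` (Windows) in your terminal. Inside a notebook, the `%pip install scikit-learn` magic installs into the kernel's own environment. `interpreter_info` is a name introduced for this example.

```python
import sys


def interpreter_info():
    """Describe the interpreter executing this code (e.g. the notebook kernel)."""
    return {
        "executable": sys.executable,       # path of the running python binary
        "prefix": sys.prefix,               # root of the active environment
        "version": sys.version.split()[0],  # e.g. "3.10.12"
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key:<10} {value}")
```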

VS Code

Visual Studio Code is a free, lightweight code editor with good Python support through extensions:

  1. Install VS Code
  2. Install the Python extension
  3. Install the Jupyter extension

PyCharm

PyCharm is a full-featured Python IDE:

  1. Download PyCharm Community Edition (free)
  2. Configure the Python interpreter (for example, point it at a virtual environment from section 1.7)
  3. Install any plugins you need

1.7 Virtual Environment Management

It is strongly recommended to use a separate virtual environment for each project, so that its packages and versions are isolated from the system Python and from other projects:

Using venv

bash
# Create virtual environment
python -m venv sklearn_env

# Activate virtual environment
# Windows
sklearn_env\Scripts\activate
# macOS/Linux
source sklearn_env/bin/activate

# Install packages
pip install scikit-learn pandas matplotlib jupyter

# Deactivate virtual environment
deactivate
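To confirm from inside Python that a `venv` environment is active, compare `sys.prefix` with `sys.base_prefix`: in an environment created by `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. This check applies to `venv`; conda environments are full installations, so the two values are equal there. `in_venv` is a name introduced for this sketch.

```python
import sys


def in_venv():
    """Return True when running inside a venv-style virtual environment."""
    # venv sets sys.prefix to the environment directory but leaves
    # sys.base_prefix pointing at the Python installation it was created from
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_venv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("Not running inside a venv")
```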

Using conda

bash
# Create environment
conda create -n sklearn_env python=3.10

# Activate environment
conda activate sklearn_env

# Install packages
conda install scikit-learn pandas matplotlib jupyter

# Deactivate environment
conda deactivate
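`conda activate` records the active environment in environment variables: `CONDA_DEFAULT_ENV` holds its name and `CONDA_PREFIX` its path. The hypothetical helper below reads them; it accepts the mapping as a parameter so it can be tested without activating anything.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    env = os.environ if environ is None else environ
    name = env.get("CONDA_DEFAULT_ENV")
    if not name:
        return None
    return name, env.get("CONDA_PREFIX")


if __name__ == "__main__":
    active = active_conda_env()
    if active is None:
        print("No conda environment is active")
    else:
        print(f"Active conda environment: {active[0]} ({active[1]})")
```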

1.8 Common Installation Issues

Issue 1: Permission Errors

bash
# Solution: install into your user directory (no administrator rights needed)
pip install --user scikit-learn
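`--user` installs into a per-user `site-packages` directory instead of the system-wide one. The standard `site` module shows where that directory is and whether it is enabled. Inside an active virtual environment pip refuses `--user`, because the user directory is not visible there; installing normally into the environment is the better fix in that case. `user_site_info` is a name introduced for this sketch.

```python
import site


def user_site_info():
    """Report the per-user site-packages directory used by `pip install --user`."""
    return {
        # Directory where --user installs end up
        "user_site": site.getusersitepackages(),
        # False when user site-packages are disabled (e.g. inside a venv)
        "enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    info = user_site_info()
    print(f"User site-packages: {info['user_site']}")
    print(f"Enabled for this interpreter: {info['enabled']}")
```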

Issue 2: Network Issues

bash
# If PyPI is slow or unreachable, use a nearby mirror (example: Tsinghua University mirror in China)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scikit-learn

Issue 3: Version Conflicts

bash
# Upgrade pip
pip install --upgrade pip

# Force reinstall
pip install --force-reinstall scikit-learn
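Before or after force-reinstalling, `pip check` lists installed packages whose declared dependencies are missing or have incompatible versions, which usually points straight at the conflict. Invoking pip as `python -m pip` through `sys.executable` ensures it acts on the same interpreter you use for Scikit-learn. The `pip_check` wrapper is a hypothetical convenience.

```python
import subprocess
import sys


def pip_check():
    """Run `pip check` for the current interpreter; return (exit code, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # Exit code 0 means no broken requirements were found
    return result.returncode, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    code, output = pip_check()
    print(output)
    print("No conflicts found" if code == 0 else "Problems reported (see above)")
```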

Issue 4: Compilation Errors

bash
# Only install pre-built wheels; never build from source
pip install --only-binary=:all: scikit-learn
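Compilation errors usually mean pip found no pre-built wheel matching your interpreter and platform (for example, a very new Python release for which wheels have not been published yet) and fell back to building from source. The details that wheel filenames are matched against can be printed with the standard library. This sketch only reports them; it does not query PyPI, and `wheel_relevant_info` is a name introduced here.

```python
import platform
import sys
import sysconfig


def wheel_relevant_info():
    """Collect the interpreter/platform details that wheel filenames encode."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "implementation": platform.python_implementation(),  # e.g. CPython
        "platform": sysconfig.get_platform(),                # e.g. linux-x86_64
        "machine": platform.machine(),                       # e.g. x86_64, arm64
        "pointer_bits": 64 if sys.maxsize > 2**32 else 32,
    }


if __name__ == "__main__":
    for key, value in wheel_relevant_info().items():
        print(f"{key:<15} {value}")
```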

1.9 Performance Optimization Suggestions

Using Optimized BLAS Libraries

bash
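# conda-forge: switch NumPy/SciPy to Intel MKL (well optimized for Intel CPUs)
conda install -c conda-forge "libblas=*=*mkl"

# Or switch to OpenBLAS
conda install -c conda-forge "libblas=*=*openblas"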
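To see which BLAS implementation (and how many threads) NumPy actually loaded, you can query `threadpoolctl`, a library Scikit-learn itself uses. `threadpool_info()` returns one dictionary per native thread pool already loaded in the process, so NumPy is imported first. The keys read below (`internal_api`, `version`, `num_threads`) are those documented by threadpoolctl; `blas_report` is a hypothetical helper.

```python
def blas_report():
    """List the native thread pools (BLAS, OpenMP) loaded in this process."""
    try:
        import numpy  # noqa: F401  (importing loads NumPy's BLAS library)
        from threadpoolctl import threadpool_info
    except ImportError as exc:
        return [f"unavailable: {exc}"]
    lines = []
    # Only libraries that are already loaded are reported, hence the numpy import
    for pool in threadpool_info():
        lines.append(
            f"{pool.get('internal_api')} {pool.get('version')} "
            f"threads={pool.get('num_threads')}"
        )
    return lines


if __name__ == "__main__":
    for line in blas_report():
        print(line)
```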

Multi-threading Configuration

python
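# Limit threads used by native libraries (OpenMP, MKL, OpenBLAS).
# Set these BEFORE importing numpy, scipy or sklearn; they are read at load time.
import os
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'
os.environ['OPENBLAS_NUM_THREADS'] = '4'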
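To change the limit at runtime, or only for one block of code, threadpoolctl offers the `threadpool_limits` context manager. The sketch below caps BLAS at one thread for a single matrix product; the matrix size and the function name `matmul_with_thread_limit` are arbitrary choices for illustration. For Scikit-learn estimators that parallelize through joblib, the separate `n_jobs` parameter controls the number of workers.

```python
def matmul_with_thread_limit(n=500, threads=1):
    """Multiply two random n x n matrices while capping BLAS threads."""
    try:
        import numpy as np
        from threadpoolctl import threadpool_limits
    except ImportError:
        return None
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    # Inside this block, BLAS libraries may use at most `threads` threads
    with threadpool_limits(limits=threads, user_api="blas"):
        product = a @ b
    # The previous limits are restored when the with-block exits
    return product.shape


if __name__ == "__main__":
    print(matmul_with_thread_limit())
```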

1.10 Next Steps

Congratulations! You have successfully set up your Scikit-learn development environment. You can now:

  1. Continue with the Quick Start Guide
  2. Get familiar with using Jupyter Notebook
  3. Browse the official Scikit-learn documentation

Exercises

  1. Run the verification script in your environment to confirm that all components work
  2. Create a new virtual environment and install Scikit-learn in it
  3. Try running a simple machine learning example in Jupyter Notebook

Summary

In this chapter, we learned:

  • Scikit-learn system requirements and installation methods
  • Installation of core dependency packages and recommended tools
  • Selection and configuration of development environment
  • Usage of virtual environments
  • Solutions to common problems

With these basics in place, we are ready to begin our machine learning journey!


Next Chapter Preview: In the Quick Start Guide, we will build our first machine learning model and explore Scikit-learn's core features.
