Chapter 1: Environment Setup and Installation
Before we start learning Scikit-learn, we need to set up a development environment. This chapter explains how to install and configure Scikit-learn and its dependencies.
1.1 System Requirements
Scikit-learn supports the following operating systems:
- Windows 7 and above (note that Python 3.9 and later no longer support Windows 7)
- macOS 10.9 and above
- Linux (most distributions)
Python Version Requirements:
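- Python 3.8 or higher for scikit-learn 1.3 (later scikit-learn releases require newer Python versions; check the official installation page)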
- Python 3.9 or 3.10 recommended
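To confirm that your interpreter meets these requirements before installing anything, you can run a short check that uses only the standard library. This is a minimal sketch: `check_python` and `MIN_PYTHON` are names chosen for this example, and the 3.8 minimum is taken from the list above.

```python
import platform
import sys

# Minimum Python version from the requirements above (scikit-learn 1.3)
MIN_PYTHON = (3, 8)


def check_python(min_version=MIN_PYTHON):
    """Return True if the running interpreter is at least min_version."""
    # sys.version_info[:2] is a plain (major, minor) tuple, so tuple comparison works
    return sys.version_info[:2] >= min_version


if __name__ == "__main__":
    print(f"Operating system: {platform.system()} {platform.release()}")
    print(f"Python version: {platform.python_version()}")
    if check_python():
        print("Python version OK")
    else:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher is required")
```

Save it under any name (for example `check_python.py`) and run it with the same `python` command you plan to use for installation.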
1.2 Installation Methods
Method 1: Installing with pip (Recommended)
This is the simplest installation method:
# Install the latest version of scikit-learn
pip install scikit-learn
# Or specify a version
pip install scikit-learn==1.3.0
Method 2: Installing with conda
If you use Anaconda or Miniconda:
# Install from conda-forge channel
conda install -c conda-forge scikit-learn
# Or install from the default (Anaconda) channel
conda install scikit-learn
Method 3: Installing from Source
Suitable for users who need the latest development version:
# Clone the repository
git clone https://github.com/scikit-learn/scikit-learn.git
cd scikit-learn
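# Build and install scikit-learn in editable (development) mode
# (building from source requires a C/C++ compiler and the build tools listed in the official developer guide)
pip install -e .
1.3 Core Dependency Packages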
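Scikit-learn depends on the following core packages. pip and conda install them automatically along with scikit-learn; the commands below are only needed if you want to install or upgrade them manually: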
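# Quote each requirement: an unquoted > is treated by the shell as output redirection
# Core numerical computing library
pip install "numpy>=1.17.3"
# Scientific computing library
pip install "scipy>=1.5.0"
# Parallel computing and caching library
pip install "joblib>=1.1.1"
# Controls the thread pools of native libraries such as BLAS and OpenMP
pip install "threadpoolctl>=2.0.0"
1.4 Recommended Additional Packages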
Scikit-learn does not require the following packages, but we recommend installing them for data handling, visualization, and interactive work:
# Data processing and analysis
pip install "pandas>=1.0.5"
# Data visualization
pip install "matplotlib>=3.1.3"
pip install "seaborn>=0.11.0"
# Interactive development environment
pip install "jupyter>=1.0.0"
pip install "ipython>=7.15.0"
# Install all recommended packages at once
pip install pandas matplotlib seaborn jupyter ipython
1.5 Verify Installation
Create a simple Python script to verify that the installation succeeded:
# test_installation.py
import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print("Installation Verification")
print(f"Scikit-learn Version: {sklearn.__version__}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")
# Test basic functions
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
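# max_iter is raised from the default of 100 to avoid a possible ConvergenceWarning on the unscaled iris features
model = LogisticRegression(max_iter=200)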
model.fit(X_train, y_train)
# Predict
accuracy = model.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")
print("All components working normally!")
Run the verification script:
python test_installation.py
Expected output, similar to the following (your version numbers depend on what you installed):
Installation Verification
Scikit-learn Version: 1.3.0
NumPy Version: 1.24.3
Pandas Version: 2.0.3
Test Accuracy: 1.00
All components working normally!
1.6 Development Environment Selection
Jupyter Notebook (Recommended for Beginners)
Jupyter Notebook is well suited to learning and experimentation, because you can run code one cell at a time and see the results immediately:
# Start Jupyter Notebook
jupyter notebook
VS Code
Visual Studio Code is a free code editor with good Python support. To set it up:
- Install VS Code
- Install Python extension
- Install Jupyter extension
PyCharm
A full-featured Python IDE:
- Download PyCharm Community Edition (free)
- Configure Python interpreter
- Install necessary plugins
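Whichever editor you choose, make sure it runs the same interpreter you installed scikit-learn into. The sketch below uses only the standard library to report the active interpreter and whether it belongs to a venv or conda environment (both are covered in section 1.7). The function name `describe_interpreter` is our own, not part of any library, and the conda check assumes the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def describe_interpreter():
    """Report which interpreter is running and whether it is in an environment."""
    # In a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    # `conda activate` sets CONDA_DEFAULT_ENV to the active environment's name;
    # None means no conda environment was activated in this shell.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "executable": sys.executable,
        "in_venv": in_venv,
        "conda_env": conda_env,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it from the editor's own Run button or notebook cell rather than from a terminal, so it reports the interpreter the editor is actually configured to use.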
1.7 Virtual Environment Management
We strongly recommend using a separate virtual environment for each project, so that its dependencies do not conflict with those of other projects or of the system Python:
Using venv
# Create virtual environment
python -m venv sklearn_env
# Activate virtual environment
# Windows
sklearn_env\Scripts\activate
# macOS/Linux
source sklearn_env/bin/activate
# Install packages
pip install scikit-learn pandas matplotlib jupyter
# Deactivate virtual environment
deactivate
Using conda
# Create environment
conda create -n sklearn_env python=3.10
# Activate environment
conda activate sklearn_env
# Install packages
conda install scikit-learn pandas matplotlib jupyter
# Deactivate environment
conda deactivate
1.8 Common Installation Issues
Issue 1: Permission Errors
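# Solution: install into your user directory instead of the system site-packages
# (not needed, and rejected by pip, inside an active virtual environment)
pip install --user scikit-learn
Issue 2: Network Issues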
# Use a nearby PyPI mirror (this example is the Tsinghua University mirror in China)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scikit-learn
Issue 3: Version Conflicts
# Upgrade pip (on Windows, pip refuses to upgrade itself unless run through python -m)
python -m pip install --upgrade pip
# Force reinstall
pip install --force-reinstall scikit-learn
Issue 4: Compilation Errors
# Install only a pre-compiled wheel and never build from source
pip install --only-binary=:all: scikit-learn
1.9 Performance Optimization Suggestions
Using Optimized BLAS Libraries
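# On the default Anaconda channel, NumPy and SciPy are already linked against Intel MKL.
# On conda-forge, select the BLAS implementation through the libblas package:
# Intel MKL (x86-64 CPUs only)
conda install -c conda-forge "libblas=*=*mkl"
# Or OpenBLAS
conda install -c conda-forge "libblas=*=*openblas"
Multi-threading Configuration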
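# Limit the number of threads used by native numerical libraries.
# These variables must be set before NumPy, SciPy, or scikit-learn is imported.
import os
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'
os.environ['OPENBLAS_NUM_THREADS'] = '4'
1.10 Next Steps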
Congratulations! You have successfully set up the Scikit-learn development environment. Now you can:
- Continue with the Quick Start Guide
- Get familiar with Jupyter Notebook
- Browse the official Scikit-learn documentation
Exercises
- Run the verification script in your environment to confirm that all components work correctly
- Create a new virtual environment and install Scikit-learn in it
- Try running a simple machine learning example in Jupyter Notebook
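For the first exercise, you may also want to check the core dependencies against the minimum versions listed in section 1.3. The sketch below uses only the standard library (`importlib.metadata`, available since Python 3.8). The helper names are our own, and `parse_version` is a deliberate simplification that keeps only the leading digits of each version component; use the `packaging` library if you need full version-specifier handling.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions listed in section 1.3
MIN_VERSIONS = {
    "numpy": "1.17.3",
    "scipy": "1.5.0",
    "joblib": "1.1.1",
}


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); '2.0.0rc1' becomes (2, 0, 0) (simplified)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at components such as 'dev0' with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.0" and "2.0.0" compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(requirements=None):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    if requirements is None:
        requirements = MIN_VERSIONS
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

A package reported as "missing or too old" can be fixed with the corresponding command from section 1.3.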
Summary
In this chapter, we learned:
- Scikit-learn system requirements and installation methods
- Installation of core dependency packages and recommended tools
- Selection and configuration of development environment
- Usage of virtual environments
- Solutions to common problems
With these basics in place, we are ready to start on machine learning itself!
Next Chapter Preview: In the Quick Start Guide, we will create our first machine learning model and explore Scikit-learn's core features.