
Python Scientific Computing

Python has become a mainstream language for scientific computing, data analysis, and machine learning, thanks to its concise syntax and its large, active community. Much of that success comes from a set of open-source libraries that provide efficient tools for processing large datasets and performing complex mathematical operations.

This chapter briefly introduces the three core libraries of the scientific computing ecosystem: NumPy, Pandas, and Matplotlib.

Before using these libraries, install them:

bash
pip install numpy pandas matplotlib

NumPy: Foundation of Numerical Computing

NumPy (Numerical Python) is the cornerstone of Python's scientific computing ecosystem. Its core data structure is the N-dimensional array object (ndarray).

Main features of NumPy:

  • Efficient Array Operations: NumPy arrays are fixed-size arrays whose elements all share the same type. Because the underlying loops are implemented in C, mathematical operations on whole arrays are typically much faster than the equivalent loops over Python's native lists.
  • Broadcasting: Allows NumPy to perform arithmetic between arrays of different but compatible shapes by stretching the smaller array across the larger one, which greatly simplifies code.
  • Rich Mathematical Functions: Provides many functions for linear algebra, Fourier transforms, and random number generation.

Example:

python
import numpy as np

# Create a NumPy array from a Python list
a = np.array([1, 2, 3, 4, 5])

# Perform a vectorized operation on the entire array
b = a * 2
print(b) # Output: [ 2  4  6  8 10]

# Create a 2x3 two-dimensional array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape) # Output: (2, 3)

# Calculate the mean of the array
print(np.mean(a)) # Output: 3.0
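
The broadcasting and mathematical-function features listed earlier are not covered by the example above, so here is a minimal sketch of both. The arrays and the small linear system are made up for illustration; only the NumPy calls themselves (`np.linalg.solve`, `np.random.default_rng`) come from the library.

```python
import numpy as np

# Broadcasting: a (2, 3) matrix plus a (3,) row vector.
# The row is applied to every row of the matrix without an explicit loop.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Broadcasting a (2, 1) column: each row gets its own offset.
col = np.array([[100], [200]])
offset = matrix + col
print(offset)
# [[101 102 103]
#  [204 205 206]]

# Linear algebra: solve the system A @ x = b
# (3x + y = 9 and x + 2y = 8, an illustrative system).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True

# Random numbers: a seeded generator gives reproducible samples.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.shape)  # (1000,)
```

Two shapes are compatible for broadcasting when, compared from the last dimension backwards, each pair of dimensions is either equal or one of them is 1.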

Pandas: Data Analysis and Processing

Pandas is a library built on top of NumPy that provides high-level data structures and analysis tools for structured data, such as tabular and time-series data.

Core Data Structures of Pandas:

  1. Series: A one-dimensional labeled array, similar to a column of data. It can store any data type.
  2. DataFrame: A two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It has both a row index and column labels, and is the most commonly used object in Pandas.
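
As a rough sketch of how the two structures relate, the snippet below builds a Series with explicit labels and then a DataFrame whose columns are Series sharing one row index. The names and values are illustrative only.

```python
import pandas as pd

# A Series: one-dimensional values, each addressed by an index label.
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
print(ages['Bob'])  # 30

# A DataFrame: several columns aligned on the same row index.
cities = pd.Series(['New York', 'Los Angeles', 'Chicago'], index=ages.index)
people = pd.DataFrame({'Age': ages, 'City': cities})
print(people.index.tolist())    # ['Alice', 'Bob', 'Charlie']
print(people.columns.tolist())  # ['Age', 'City']

# Selecting a single column returns a Series again.
print(type(people['Age']).__name__)  # Series

# .loc selects by row label and column label.
print(people.loc['Alice', 'City'])  # New York
```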

Main Features of Pandas:

  • Easily read data from and write data to multiple formats (such as CSV, Excel, and SQL databases).
  • Powerful data filtering, grouping, merging, and reshaping capabilities.
  • Built-in handling of missing data (detecting, dropping, and filling missing values).
  • Built-in time-series functionality.

Example:

python
import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)

# Display the first rows of the DataFrame (head() returns up to 5 by default)
print(df.head())

# Select rows where 'Age' is greater than 30
print(df[df['Age'] > 30])

# Group by 'City' and calculate the average age
# (each city appears once here, so every group contains a single row)
print(df.groupby('City')['Age'].mean())
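
The file I/O, missing-data, and time-series features listed above can be sketched in a few lines. This example assumes a small, made-up CSV held in memory with `io.StringIO` instead of a real file, and the column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content; the empty temp field on the second row becomes NaN.
csv_text = """date,city,temp
2024-01-01,Chicago,1.5
2024-01-02,Chicago,
2024-01-03,Chicago,3.5
2024-01-04,Chicago,4.0
"""
readings = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# Detect missing values.
print(readings['temp'].isna().sum())  # 1

# Option 1: fill the gap with the column mean (NaN is skipped when averaging).
filled = readings['temp'].fillna(readings['temp'].mean())
print(filled.tolist())  # [1.5, 3.0, 3.5, 4.0]

# Option 2: drop rows that contain any missing value.
dropped = readings.dropna()
print(len(dropped))  # 3

# Time series: index by date and average over two-day windows.
two_day = readings.set_index('date')['temp'].resample('2D').mean()
print(two_day.tolist())  # [1.5, 3.75]

# Write the cleaned table back out as CSV text.
out = io.StringIO()
dropped.to_csv(out, index=False)
```

With a real file, a path string passed to `pd.read_csv` and `to_csv` replaces the `io.StringIO` objects. Whether to fill or drop missing values depends on the analysis; the mean fill here is only one possible choice.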

Matplotlib: Data Visualization

Matplotlib is one of the most widely used data visualization libraries in Python. It provides a flexible platform for creating static, animated, and interactive charts.

Basic Usage:

Charts are typically created through its pyplot submodule, which is usually imported under the alias plt.

Example: Plotting a Simple Line Chart

python
import matplotlib.pyplot as plt
import numpy as np

# Prepare data
x = np.linspace(0, 10, 100) # Generate 100 evenly spaced points from 0 to 10
y = np.sin(x)

# Create chart
plt.figure(figsize=(8, 4)) # Set chart size
plt.plot(x, y, label='sin(x)') # Plot line chart

# Add title and labels
plt.title('Simple Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend() # Display legend
plt.grid(True) # Display grid

# Show chart
plt.show()
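
The example above relies on pyplot's implicit "current figure". Matplotlib also offers an object-oriented style in which the figure and axes are handled explicitly, which tends to be easier to manage once a figure has more than one plot. The following is a minimal sketch of that style; the two-panel layout and the labels are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One figure with two stacked axes that share the x-axis.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)

# Each Axes object has its own plot, title, legend, and grid.
ax_top.plot(x, np.sin(x), label='sin(x)')
ax_top.set_title('Sine')
ax_top.legend()
ax_top.grid(True)

ax_bottom.plot(x, np.cos(x), color='tab:orange', label='cos(x)')
ax_bottom.set_title('Cosine')
ax_bottom.set_xlabel('x (radians)')
ax_bottom.legend()
ax_bottom.grid(True)

# Adjust spacing so titles and labels do not overlap.
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure, and fig.savefig with a file name of your choice writes it to disk.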

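Together, these three libraries form the core of a powerful ecosystem: NumPy for numerical arrays, Pandas for structured data, and Matplotlib for visualization. This combination is a large part of why Python is a preferred tool for many data scientists and researchers.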
