Pandas DataFrame Data Structure
DataFrame is the most important two-dimensional data structure in Pandas. It can be thought of as a labeled two-dimensional array or table. This chapter covers how to create DataFrames, how to operate on them, and how to apply them to practical problems.
📚 DataFrame Overview
What is DataFrame
DataFrame is a two-dimensional labeled data structure with the following characteristics:
- Both rows and columns have labels: Each row and column has a corresponding index
- Columns can have different data types: Integers, floats, strings, etc.
- Variable size: Rows and columns can be inserted and deleted
- Like a spreadsheet: Comparable to an Excel sheet or a SQL table, but manipulated from Python
Components of DataFrame
- Data (values): The actual stored data
- Row index (index): Labels for rows
- Column index (columns): Labels for columns
- Data types (dtypes): Data type of each column
🔨 Creating DataFrame
Creating from Dictionary
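A minimal sketch of building a DataFrame from dictionaries. The column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: each dictionary key becomes a column label.
data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
    "city": ["Paris", "Berlin", "Madrid"],
}
df = pd.DataFrame(data)

# A list of dictionaries also works: each dictionary becomes one row.
records = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 32}]
df_records = pd.DataFrame(records)

print(df)
print(df_records)
```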
Creating from List
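One common pattern, sketched with made-up values, is to pass a list of rows and name the columns explicitly.

```python
import pandas as pd

# Illustrative data: each inner list is one row.
rows = [["Alice", 25], ["Bob", 32], ["Carol", 41]]

# Without `columns`, pandas would label the columns 0 and 1.
df = pd.DataFrame(rows, columns=["name", "age"])

print(df)
```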
Creating from Series
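The sketch below combines two Series into a DataFrame. The labels and values are illustrative. Note that the Series are aligned on their index, so a label missing from one Series yields NaN.

```python
import pandas as pd

# Illustrative Series with partially overlapping indexes.
ages = pd.Series([25, 32], index=["alice", "bob"], name="age")
scores = pd.Series([88.5, 92.0, 79.0], index=["alice", "bob", "carol"], name="score")

# Rows are the union of both indexes; "carol" has no age, so it becomes NaN.
df = pd.DataFrame({"age": ages, "score": scores})

# A single Series can be promoted to a one-column DataFrame.
df_single = ages.to_frame()

print(df)
```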
Creating from NumPy Array
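A short sketch, assuming a small 2-D array; the row and column labels are arbitrary.

```python
import numpy as np
import pandas as pd

# A 3x2 array: [[0, 1], [2, 3], [4, 5]]
arr = np.arange(6).reshape(3, 2)

# Labels are optional; without them pandas uses 0..n-1 for both axes.
df = pd.DataFrame(arr, columns=["a", "b"], index=["x", "y", "z"])

print(df)
```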
Creating Empty DataFrame
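Two possible ways to create an empty DataFrame are sketched here: a completely empty one, and one with a predefined set of typed columns. The column names are hypothetical.

```python
import pandas as pd

# No rows, no columns.
empty = pd.DataFrame()

# No rows, but a fixed schema (column names and dtypes chosen for illustration).
typed = pd.DataFrame({
    "name": pd.Series(dtype="object"),
    "age": pd.Series(dtype="int64"),
})

print(empty.empty, typed.empty)
print(typed.dtypes)
```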
Creating from Files (Preview)
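As a preview, the sketch below reads CSV text with `pd.read_csv`. To keep it self-contained it reads from an in-memory string rather than a file on disk; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Illustrative CSV content standing in for a file on disk.
csv_text = "name,age\nAlice,25\nBob,32\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```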
🔍 DataFrame Attributes
Basic Attributes
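The attributes most often inspected can be sketched as follows, using a small made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [25, 32, 41]})

shape = df.shape            # (number of rows, number of columns)
n_cells = df.size           # rows * columns
n_dims = df.ndim            # always 2 for a DataFrame
columns = list(df.columns)  # column labels
index = list(df.index)      # row labels
dtypes = df.dtypes          # one dtype per column
values = df.to_numpy()      # underlying data as a NumPy array

print(shape, n_cells, n_dims)
print(dtypes)
```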
Viewing Data Overview
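A quick sketch of the usual overview methods, on illustrative data.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": [v * 1.5 for v in range(10)]})

first = df.head(3)       # first 3 rows (default is 5)
last = df.tail(2)        # last 2 rows
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column

df.info()                # prints index range, column dtypes, non-null counts
print(summary)
```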
Memory Usage
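Memory consumption can be inspected with `memory_usage`. The sketch below uses made-up data; `deep=True` asks pandas to also measure the contents of object (for example string) columns.

```python
import pandas as pd

df = pd.DataFrame({"id": range(1000), "label": ["a"] * 1000})

# Bytes used per column; the row index is reported under the key "Index".
per_column = df.memory_usage(deep=True)
total_bytes = per_column.sum()

print(per_column)
print(f"total: {total_bytes} bytes")
```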
🎯 Indexing and Selection
Column Selection
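A minimal sketch on illustrative data: single brackets with one label return a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
    "city": ["Paris", "Berlin", "Madrid"],
})

ages = df["age"]               # one column -> Series
subset = df[["name", "city"]]  # list of columns -> DataFrame

print(type(ages).__name__, type(subset).__name__)
```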
Row Selection
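The sketch below contrasts label-based selection (`loc`) with position-based selection (`iloc`) on made-up data. Label slices include the end label; position slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 41]}, index=["alice", "bob", "carol"])

by_label = df.loc["bob"]             # row with label "bob"
by_position = df.iloc[0]             # first row
label_slice = df.loc["alice":"bob"]  # inclusive: alice and bob
pos_slice = df.iloc[0:1]             # exclusive: only the first row

print(label_slice)
print(pos_slice)
```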
Conditional Selection
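Boolean indexing can be sketched as follows, with illustrative data. When combining conditions, use `&` and `|` and wrap each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
    "city": ["Paris", "Berlin", "Paris"],
})

over_30 = df[df["age"] > 30]
paris_over_30 = df[(df["age"] > 30) & (df["city"] == "Paris")]
in_cities = df[df["city"].isin(["Berlin", "Madrid"])]

print(paris_over_30)
```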
Advanced Indexing
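Which techniques count as "advanced" is a matter of taste; the sketch below assumes this covers combined row/column selection with `loc`, scalar access with `at` and `iat`, string-based filtering with `query`, and selection after `set_index`. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
    "city": ["Paris", "Berlin", "Paris"],
})

# Rows by condition and columns by label in a single step.
names_over_30 = df.loc[df["age"] > 30, ["name"]]

# Fast scalar access by label (at) and by position (iat).
cell = df.at[1, "city"]
cell_pos = df.iat[0, 1]

# Filtering with an expression string.
young_parisians = df.query("age < 40 and city == 'Paris'")

# Using a column as the row index, then selecting by its values.
indexed = df.set_index("name")
carol_age = indexed.loc["Carol", "age"]

print(names_over_30)
print(young_parisians)
```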
🔧 DataFrame Operations
Adding Columns
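Three common ways to add a column, sketched on illustrative data: direct assignment, `assign` (which returns a new DataFrame), and `insert` (which places the column at a given position).

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 5]})

# Direct assignment appends the column at the end.
df["total"] = df["price"] * df["qty"]

# assign returns a new DataFrame; the lambda receives the frame itself.
df = df.assign(discounted=lambda d: d["total"] * 0.9)

# insert modifies df in place and puts the column at position 0.
df.insert(0, "id", [101, 102])

print(df)
```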
Deleting Columns
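A brief sketch with made-up columns: `drop` returns a new DataFrame and leaves the original untouched, while `pop` removes the column in place and returns it.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

without_b = df.drop(columns=["b"])  # df itself still has a, b, c
popped = df.pop("c")                # df now has a, b; popped is a Series

print(without_b)
print(df)
```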
Adding Rows
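Rows can be added with `pd.concat` or by assigning to a new label through `loc`; the sketch uses illustrative data. Note that `DataFrame.append` was removed in pandas 2.0, so it is not used here.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice"], "age": [25]})

# Append several rows at once; ignore_index renumbers the rows 0..n-1.
new_rows = pd.DataFrame({"name": ["Bob", "Carol"], "age": [32, 41]})
df = pd.concat([df, new_rows], ignore_index=True)

# Append a single row by assigning to the next integer label.
df.loc[len(df)] = ["Dave", 29]

print(df)
```

Adding rows one at a time in a loop is slow, because each step copies data; collecting the rows first and creating the DataFrame once is usually preferable.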
Deleting Rows
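A sketch on made-up data showing deletion by label with `drop` and deletion by condition with a boolean filter.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 41]}, index=["alice", "bob", "carol"])

# Remove a row by label; returns a new DataFrame.
no_bob = df.drop(index="bob")

# Remove rows by condition: keep only the rows you want.
under_40 = df[df["age"] < 40]

# Renumber the rows after deletion, discarding the old labels.
renumbered = no_bob.reset_index(drop=True)

print(no_bob)
print(renumbered)
```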
📊 Data Operations and Calculations
Mathematical Operations
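Arithmetic on DataFrames is element-wise and aligned on labels. The sketch below, with illustrative values, also shows `add` with `fill_value` to treat labels missing on one side as 0.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

scaled = df * 2              # scalar applied to every cell
summed = df["a"] + df["b"]   # column-wise, row by row

# `other` has no row 2; fill_value=0 treats the missing value as 0.
other = pd.DataFrame({"a": [1, 1]}, index=[0, 1])
aligned = df[["a"]].add(other, fill_value=0)

print(scaled)
print(aligned)
```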
Statistical Functions
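A few of the built-in statistics, sketched on made-up scores. By default they operate per column; `axis=1` switches to per row.

```python
import pandas as pd

df = pd.DataFrame({"math": [80, 90, 70], "physics": [60, 75, 90]})

col_means = df.mean()       # mean of each column
row_sums = df.sum(axis=1)   # sum across columns for each row
highest = df.max()          # maximum of each column
correlation = df.corr()     # pairwise correlation between columns

print(col_means)
print(correlation)
```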
Sorting
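Sorting by values and by index can be sketched as follows; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob", "Alice", "Carol"], "age": [32, 25, 32]})

# Sort by one column, largest first.
by_age = df.sort_values("age", ascending=False)

# Sort by several columns: age descending, then name ascending to break ties.
by_age_name = df.sort_values(["age", "name"], ascending=[False, True])

# Restore the original row order by sorting on the index.
by_index = by_age.sort_index()

print(by_age_name)
```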
🔄 Data Processing
Handling Missing Values
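A minimal sketch of detecting, dropping, and filling missing values. The data and the fill strategy (column mean for numbers, a placeholder string for text) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

missing_counts = df.isna().sum()  # number of missing values per column
dropped = df.dropna()             # drop rows containing any missing value

# Fill each column with its own replacement value.
filled = df.fillna({"a": df["a"].mean(), "b": "unknown"})

print(missing_counts)
print(filled)
```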
Handling Duplicate Values
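Duplicates can be flagged with `duplicated` and removed with `drop_duplicates`, as in this sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice"],
    "city": ["Paris", "Berlin", "Paris"],
})

# True for every row that repeats an earlier row.
dup_mask = df.duplicated()

# Keep the first occurrence of each fully identical row.
unique_rows = df.drop_duplicates()

# Judge duplicates on selected columns only, keeping the last occurrence.
last_per_name = df.drop_duplicates(subset="name", keep="last")

print(dup_mask)
print(last_per_name)
```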
Data Type Conversion
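The sketch below converts text to numbers and dates and integers to booleans. The column names and values are hypothetical; `errors="coerce"` turns unparseable entries into NaN instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    "n": ["1", "2", "x"],
    "date": ["2024-01-05", "2024-02-10", "2024-03-15"],
    "flag": [1, 0, 1],
})

df["n"] = pd.to_numeric(df["n"], errors="coerce")  # "x" becomes NaN
df["date"] = pd.to_datetime(df["date"])            # strings to datetime64
df["flag"] = df["flag"].astype(bool)               # 1/0 to True/False

print(df.dtypes)
```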
🔗 DataFrame Merging and Joining
Concatenating DataFrames
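`pd.concat` stacks DataFrames vertically by default and side by side with `axis=1`. A sketch with illustrative frames:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [3], "x": [30]})
c = pd.DataFrame({"y": ["p", "q"]})

# Vertical: rows of b follow rows of a; ignore_index renumbers the result.
stacked = pd.concat([a, b], ignore_index=True)

# Horizontal: columns of c are placed next to columns of a, aligned on the index.
side_by_side = pd.concat([a, c], axis=1)

print(stacked)
print(side_by_side)
```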
Merging DataFrames
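`merge` joins DataFrames on key columns, similar to a SQL join. The tables below are made up; the `how` parameter selects the join type and `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [5000, 6000, 7000]})

# Inner join (default): only keys present in both tables.
inner = pd.merge(employees, salaries, on="emp_id")

# Left join: every employee, with NaN where no salary exists.
left = employees.merge(salaries, on="emp_id", how="left")

# Outer join: all keys from both tables, with the source of each row.
outer = employees.merge(salaries, on="emp_id", how="outer", indicator=True)

print(left)
print(outer)
```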
🎨 Practical Application Examples
Example 1: Sales Data Analysis
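What such an analysis could look like is sketched below. The sales records, column names, and the choice of metrics (revenue per product, best-selling product) are all invented for illustration.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "product": ["A", "B", "A", "C", "B"],
    "units": [3, 5, 2, 4, 1],
    "unit_price": [10.0, 20.0, 10.0, 15.0, 20.0],
})

# Derived column: revenue per record.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Total revenue per product, highest first.
revenue_by_product = (
    sales.groupby("product")["revenue"].sum().sort_values(ascending=False)
)
top_product = revenue_by_product.idxmax()

print(revenue_by_product)
print("top product:", top_product)
```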
Example 2: Student Grade Management System
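A small sketch of grade handling with a DataFrame. The students, subjects, scores, and the pass mark of 60 are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical grade table.
grades = pd.DataFrame({
    "student": ["Ann", "Ben", "Cid"],
    "math": [90, 55, 72],
    "english": [85, 65, 40],
})

PASS_MARK = 60  # assumed threshold

# Average across subjects for each student.
grades["average"] = grades[["math", "english"]].mean(axis=1)
grades["passed"] = grades["average"] >= PASS_MARK

# Ranking by average, best first.
ranked = grades.sort_values("average", ascending=False).reset_index(drop=True)

print(ranked)
```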
Example 3: Financial Data Analysis
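As one possible illustration, the sketch below computes daily returns, a 3-day moving average, and the cumulative return from a closing-price series. The prices and dates are invented.

```python
import pandas as pd

# Hypothetical daily closing prices.
prices = pd.DataFrame(
    {"close": [100.0, 102.0, 101.0, 105.0, 110.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Percentage change from the previous day (first value is NaN).
prices["daily_return"] = prices["close"].pct_change()

# 3-day moving average (NaN until three values are available).
prices["ma_3"] = prices["close"].rolling(window=3).mean()

# Return over the whole period.
cumulative_return = prices["close"].iloc[-1] / prices["close"].iloc[0] - 1

print(prices)
print(f"cumulative return: {cumulative_return:.2%}")
```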
📈 Performance Optimization Tips
Vectorized Operations
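The sketch below contrasts a row-by-row loop with the equivalent vectorized expression on made-up data. Both give the same result, but the vectorized form runs in compiled code and is typically much faster on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.arange(1, 1001, dtype="float64"), "qty": 2})

# Slow: a Python-level loop over rows.
totals_loop = [row.price * row.qty for row in df.itertuples(index=False)]

# Fast: one vectorized expression over whole columns.
df["total"] = df["price"] * df["qty"]

# Vectorized conditional instead of an if/else inside a loop.
df["tier"] = np.where(df["total"] > 1000, "high", "low")

print(df.tail(3))
```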
Memory Optimization
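Two common memory-saving steps are sketched here on illustrative data: downcasting integers to the smallest type that fits, and storing repetitive strings as `category`. How much is saved depends on the data.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(10000),
    "status": ["active", "inactive"] * 5000,
})
before = df.memory_usage(deep=True).sum()

# Values 0..9999 fit into an unsigned 16-bit integer.
df["id"] = pd.to_numeric(df["id"], downcast="unsigned")

# Only two distinct strings, so a categorical stores them once plus small codes.
df["status"] = df["status"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```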
📝 Chapter Summary
Through this chapter, you should have mastered:
✅ DataFrame Basic Concepts: Understanding the structure and characteristics of DataFrame
✅ Creating DataFrame: Mastering various methods for creating DataFrame
✅ Indexing and Selection: Proficiently using various indexing and selection methods
✅ Data Operations: Adding, selecting, modifying, and deleting rows and columns
✅ Data Processing: Handling missing values, duplicates, and other data quality issues
✅ Data Merging: Joining and merging multiple DataFrames
✅ Practical Applications: Solving real data analysis problems
✅ Performance Optimization: Improving code execution efficiency and memory usage
Key Points
- DataFrame is the Core of Pandas: Mastering DataFrame is fundamental to data analysis
- Flexibility of Indexing: loc, iloc, boolean indexing provide powerful data selection capabilities
- Importance of Vectorized Operations: Avoid explicit Python loops and prefer Pandas built-in methods
- Data Quality Management: Handle missing values and duplicates promptly
- Memory Optimization: Choosing appropriate data types can significantly save memory
Next Steps
Now that you've mastered the core Pandas data structure, the next chapter covers how to handle CSV and Excel files.
Next Chapter: Pandas CSV and Excel Handling