
Pandas Data Visualization

Data visualization is an important part of data analysis: it helps us see distributions, trends, and patterns at a glance. Pandas plotting is built on matplotlib and works well alongside seaborn, so a DataFrame or Series can be turned into most common chart types in a line or two of code.
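
As a minimal sketch of that integration, the snippet below plots a tiny Series and then customizes the result with an ordinary matplotlib call. The data values and the `demo` name are made up for illustration, and the `Agg` backend is only an assumption so the code also runs in a script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Tiny illustrative series (hypothetical values)
demo = pd.Series([3, 5, 4, 6], index=["Q1", "Q2", "Q3", "Q4"], name="units")

# .plot() draws with matplotlib and returns the Axes it drew on
ax = demo.plot(kind="bar", title="Units per quarter")

# The returned object is a regular matplotlib Axes, so matplotlib methods apply
ax.set_ylabel("Units")

print(type(ax).__module__)
```

Because the return value is a plain Axes, anything matplotlib can do (labels, grids, annotations, saving with `ax.figure.savefig(...)`) remains available after a pandas plotting call.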

1. Basic Visualization Setup

1.1 Environment Configuration

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set default figure size and font size
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

# Set chart style (the 'seaborn-v0_8' name requires matplotlib 3.6 or newer)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Visualization environment configured")
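
Style names differ between matplotlib releases, so a hard-coded name can raise an error on an older install. One possible way to guard against that is sketched below; `use_first_available_style` is a hypothetical helper written for this example, not part of matplotlib.

```python
import matplotlib.pyplot as plt

def use_first_available_style(candidates):
    """Apply the first style in `candidates` that this matplotlib install provides.

    Hypothetical helper for illustration; returns the chosen name or None.
    """
    for name in candidates:
        # 'default' is always accepted by plt.style.use even though it is
        # not listed in plt.style.available
        if name in plt.style.available or name == "default":
            plt.style.use(name)
            return name
    return None

chosen = use_first_available_style(["seaborn-v0_8", "seaborn", "ggplot", "default"])
print(f"Using style: {chosen}")
```

Listing `plt.style.available` directly is also a quick way to see which names a given environment supports.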

1.2 Creating Sample Datasets

python
# Set random seed
np.random.seed(42)

# Create comprehensive dataset
n_samples = 1000

# Sales data
sales_data = {
    'date': pd.date_range('2022-01-01', periods=365, freq='D'),
    'product': np.random.choice(['Product A', 'Product B', 'Product C', 'Product D', 'Product E'], 365),
    'region': np.random.choice(['North', 'South', 'East', 'West'], 365),
    'sales_amount': np.random.normal(1000, 300, 365),
    'quantity': np.random.randint(10, 100, 365),
    'customer_satisfaction': np.random.uniform(3.0, 5.0, 365)
}

# Ensure sales amount is positive
sales_data['sales_amount'] = np.maximum(sales_data['sales_amount'], 100)

sales_df = pd.DataFrame(sales_data)
sales_df['month'] = sales_df['date'].dt.month
sales_df['quarter'] = sales_df['date'].dt.quarter
sales_df['weekday'] = sales_df['date'].dt.day_name()

# Employee data
employee_data = {
    'employee_id': range(1, n_samples + 1),
    'department': np.random.choice(['IT', 'HR', 'Finance', 'Marketing', 'Operations'], n_samples),
    'position': np.random.choice(['Junior', 'Mid-level', 'Senior', 'Expert', 'Manager'], n_samples),
    'salary': np.random.normal(8000, 2000, n_samples),
    'age': np.random.randint(22, 60, n_samples),
    'experience_years': np.random.randint(0, 20, n_samples),
    'performance_score': np.random.uniform(60, 100, n_samples)
}

employee_data['salary'] = np.maximum(employee_data['salary'], 3000)
employee_df = pd.DataFrame(employee_data)

print("Sample datasets created")
print(f"Sales data: {sales_df.shape}")
print(f"Employee data: {employee_df.shape}")

2. Basic Chart Types

2.1 Line Plot

python
print("\n=== Line Plot Examples ===")

# Time series line plot
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Basic time series line plot
daily_sales = sales_df.groupby('date')['sales_amount'].sum()
daily_sales.plot(ax=axes[0,0], color='blue', linewidth=2)
axes[0,0].set_title('Daily Sales Trend')
axes[0,0].set_ylabel('Sales Amount')
axes[0,0].grid(True, alpha=0.3)

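# 2. Multi-line plot
# Each date holds a single product in this dataset, so a daily unstack is mostly NaN
# and would draw disconnected markers; aggregate by week to get continuous lines
product_weekly_sales = (
    sales_df.groupby([pd.Grouper(key='date', freq='W'), 'product'])['sales_amount']
    .sum()
    .unstack(fill_value=0)
)
product_weekly_sales.plot(ax=axes[0,1], marker='o', markersize=3)
axes[0,1].set_title('Weekly Sales by Product')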
axes[0,1].set_ylabel('Sales Amount')
axes[0,1].legend(title='Product', bbox_to_anchor=(1.05, 1), loc='upper left')
axes[0,1].grid(True, alpha=0.3)

# 3. Rolling average line plot
daily_sales_ma = daily_sales.rolling(window=7).mean()
axes[1,0].plot(daily_sales.index, daily_sales.values, alpha=0.3, label='Original Data')
axes[1,0].plot(daily_sales_ma.index, daily_sales_ma.values, color='red', linewidth=2, label='7-Day Moving Average')
axes[1,0].set_title('Sales with Moving Average')
axes[1,0].set_ylabel('Sales Amount')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# 4. Area plot
region_daily_sales = sales_df.groupby(['date', 'region'])['sales_amount'].sum().unstack()
region_daily_sales.plot.area(ax=axes[1,1], alpha=0.7)
axes[1,1].set_title('Regional Sales Stacked Area Chart')
axes[1,1].set_ylabel('Sales Amount')
axes[1,1].legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
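
The moving-average panel starts a few days late because a 7-day window has no value until seven observations exist. The sketch below, on a small made-up series, shows that behavior and the `min_periods` option that relaxes it.

```python
import numpy as np
import pandas as pd

# Ten days of illustrative values 1.0 .. 10.0 (hypothetical data)
s = pd.Series(
    np.arange(1.0, 11.0),
    index=pd.date_range("2022-01-01", periods=10, freq="D"),
)

# Default: the result is NaN until the window holds 7 observations
ma_strict = s.rolling(window=7).mean()

# min_periods=1: average whatever is available, so the line starts on day one
ma_partial = s.rolling(window=7, min_periods=1).mean()

print("NaN count (strict):", ma_strict.isna().sum())
print("First and last (partial):", ma_partial.iloc[0], ma_partial.iloc[-1])
```

Whether to use `min_periods` is a judgment call: it fills the gap at the start of the chart, but the first few points are averages over fewer days and are therefore noisier.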

2.2 Bar Plot

python
print("\n=== Bar Plot Examples ===")

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Basic bar chart
product_sales = sales_df.groupby('product')['sales_amount'].sum().sort_values(ascending=False)
product_sales.plot(kind='bar', ax=axes[0,0], color='skyblue')
axes[0,0].set_title('Total Sales by Product')
axes[0,0].set_ylabel('Sales Amount')
axes[0,0].tick_params(axis='x', rotation=45)

# 2. Horizontal bar chart
region_sales = sales_df.groupby('region')['sales_amount'].sum().sort_values()
region_sales.plot(kind='barh', ax=axes[0,1], color='lightcoral')
axes[0,1].set_title('Total Sales by Region')
axes[0,1].set_xlabel('Sales Amount')

# 3. Grouped bar chart
product_region_sales = sales_df.groupby(['product', 'region'])['sales_amount'].sum().unstack()
product_region_sales.plot(kind='bar', ax=axes[1,0])
axes[1,0].set_title('Product-Region Sales Grouped Bar Chart')
axes[1,0].set_ylabel('Sales Amount')
axes[1,0].tick_params(axis='x', rotation=45)
axes[1,0].legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')

# 4. Stacked bar chart
product_region_sales.plot(kind='bar', stacked=True, ax=axes[1,1])
axes[1,1].set_title('Product-Region Sales Stacked Bar Chart')
axes[1,1].set_ylabel('Sales Amount')
axes[1,1].tick_params(axis='x', rotation=45)
axes[1,1].legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()
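
A stacked bar chart compares totals; when the question is about shares instead, a common variation is to normalize each row to 100% before plotting. This is a small sketch with made-up numbers, and the `totals` and `shares` names are illustrative.

```python
import pandas as pd

# Small illustrative product-by-region table (hypothetical numbers)
totals = pd.DataFrame(
    {"North": [100, 300], "South": [300, 100]},
    index=["Product A", "Product B"],
)

# Divide each row by its own sum (axis=0 aligns the divisor on the row index),
# then scale to percentages
shares = totals.div(totals.sum(axis=1), axis=0) * 100
print(shares)
```

Calling `shares.plot(kind='bar', stacked=True)` on the result would draw bars that all reach 100, which makes the regional mix of each product directly comparable.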

2.3 Scatter Plot

python
print("\n=== Scatter Plot Examples ===")

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Basic scatter plot
employee_df.plot.scatter(x='age', y='salary', ax=axes[0,0], alpha=0.6)
axes[0,0].set_title('Age vs Salary')
axes[0,0].set_xlabel('Age')
axes[0,0].set_ylabel('Salary')

# 2. Scatter plot with color grouping
for dept in employee_df['department'].unique():
    dept_data = employee_df[employee_df['department'] == dept]
    axes[0,1].scatter(dept_data['experience_years'], dept_data['salary'], 
                     label=dept, alpha=0.6)
axes[0,1].set_title('Experience vs Salary (by Department)')
axes[0,1].set_xlabel('Experience (Years)')
axes[0,1].set_ylabel('Salary')
axes[0,1].legend()

# 3. Bubble chart (scatter size represents third variable)
scatter = axes[1,0].scatter(employee_df['age'], employee_df['salary'], 
                           s=employee_df['performance_score']*2, 
                           c=employee_df['experience_years'], 
                           alpha=0.6, cmap='viridis')
axes[1,0].set_title('Age-Salary-Performance-Experience Scatter')
axes[1,0].set_xlabel('Age')
axes[1,0].set_ylabel('Salary')
colorbar = plt.colorbar(scatter, ax=axes[1,0])
colorbar.set_label('Experience (Years)')

# 4. Sales data scatter plot
sales_df.plot.scatter(x='quantity', y='sales_amount', 
                     c='customer_satisfaction', 
                     colormap='RdYlBu', ax=axes[1,1], alpha=0.6)
axes[1,1].set_title('Quantity vs Sales (color = Satisfaction)')
axes[1,1].set_xlabel('Quantity')
axes[1,1].set_ylabel('Sales Amount')

plt.tight_layout()
plt.show()

2.4 Histogram and Density Plot

python
print("\n=== Histogram and Density Plot Examples ===")

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Basic histogram
employee_df['salary'].plot.hist(bins=30, ax=axes[0,0], alpha=0.7, color='skyblue')
axes[0,0].set_title('Salary Distribution Histogram')
axes[0,0].set_xlabel('Salary')
axes[0,0].set_ylabel('Frequency')

# 2. Density plot (kernel density estimate; requires SciPy)
employee_df['salary'].plot.density(ax=axes[0,1], color='red', linewidth=2)
axes[0,1].set_title('Salary Distribution Density Plot')
axes[0,1].set_xlabel('Salary')
axes[0,1].set_ylabel('Density')

# 3. Grouped histogram
for dept in employee_df['department'].unique():
    dept_salary = employee_df[employee_df['department'] == dept]['salary']
    axes[1,0].hist(dept_salary, alpha=0.5, label=dept, bins=20)
axes[1,0].set_title('Salary Distribution by Department')
axes[1,0].set_xlabel('Salary')
axes[1,0].set_ylabel('Frequency')
axes[1,0].legend()

# 4. Multi-variable density plot
for dept in employee_df['department'].unique():
    dept_salary = employee_df[employee_df['department'] == dept]['salary']
    dept_salary.plot.density(ax=axes[1,1], alpha=0.7, label=dept)
axes[1,1].set_title('Salary Density by Department')
axes[1,1].set_xlabel('Salary')
axes[1,1].set_ylabel('Density')
axes[1,1].legend()

plt.tight_layout()
plt.show()
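
The y-axes of the histogram and density panels are on different scales: a histogram shows counts per bin, while a density is scaled so the total area is 1. The sketch below illustrates the difference with NumPy on made-up data; it is not how pandas computes `plot.density`, which uses a kernel density estimate rather than bins.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(8000, 2000, 1000)  # illustrative salary-like data

# Raw counts per bin
counts, edges = np.histogram(values, bins=30)

# Same bins, but scaled so that sum(density * bin_width) equals 1
density, _ = np.histogram(values, bins=30, density=True)
widths = np.diff(edges)

print("Total count:", counts.sum())
print("Area under density histogram:", (density * widths).sum())
```

To overlay a histogram and a density curve on one axis, passing `density=True` to `plot.hist` puts both on the same scale.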

3. Advanced Chart Types

3.1 Box Plot

python
print("\n=== Box Plot Examples ===")

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Basic box plot
employee_df.boxplot(column='salary', ax=axes[0,0])
axes[0,0].set_title('Salary Distribution Box Plot')
axes[0,0].set_ylabel('Salary')

# 2. Grouped box plot
employee_df.boxplot(column='salary', by='department', ax=axes[0,1])
fig.suptitle('')  # clear the automatic "Boxplot grouped by department" figure title
axes[0,1].set_title('Salary Distribution by Department')
axes[0,1].set_xlabel('Department')
axes[0,1].set_ylabel('Salary')

# 3. Multi-variable box plot
employee_df[['salary', 'age', 'performance_score']].boxplot(ax=axes[1,0])
axes[1,0].set_title('Multi-variable Box Plot')
axes[1,0].tick_params(axis='x', rotation=45)

# 4. Seaborn box plot
sns.boxplot(data=employee_df, x='position', y='salary', ax=axes[1,1])
axes[1,1].set_title('Salary Distribution by Position')
axes[1,1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
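
To read the box plots above, it helps to know what the box and the individual points mean. With matplotlib's default settings the box spans the first to third quartile, and points beyond 1.5 times the interquartile range from the box are drawn as outliers. The sketch below reproduces that rule by hand on a small made-up series.

```python
import pandas as pd

# Illustrative salaries with one extreme value (hypothetical data)
salaries = pd.Series([3000, 4000, 5000, 6000, 7000, 8000, 20000])

q1, q3 = salaries.quantile([0.25, 0.75])
iqr = q3 - q1

# Default whisker rule: points outside these fences are drawn individually
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

The whiskers themselves end at the most extreme data points that still fall inside the fences, so they rarely sit exactly on the fence values.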

3.2 Heatmap

python
print("\n=== Heatmap Examples ===")

fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Correlation heatmap
corr_matrix = employee_df[['age', 'salary', 'experience_years', 'performance_score']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, ax=axes[0,0])
axes[0,0].set_title('Employee Data Correlation Heatmap')

# 2. Pivot table heatmap
product_region_pivot = sales_df.pivot_table(
    values='sales_amount', 
    index='product', 
    columns='region', 
    aggfunc='mean'
)
sns.heatmap(product_region_pivot, annot=True, fmt='.0f', cmap='YlOrRd', ax=axes[0,1])
axes[0,1].set_title('Product-Region Average Sales Heatmap')

# 3. Time series heatmap
sales_df['day'] = sales_df['date'].dt.day
month_day_sales = sales_df.pivot_table(
    values='sales_amount',
    index='month',
    columns='day',
    aggfunc='mean'
)
sns.heatmap(month_day_sales, cmap='viridis', ax=axes[1,0], cbar_kws={'label': 'Average Sales'})
axes[1,0].set_title('Month-Day Sales Heatmap')
axes[1,0].set_xlabel('Day')
axes[1,0].set_ylabel('Month')

# 4. Department-position salary heatmap
dept_position_salary = employee_df.pivot_table(
    values='salary',
    index='department',
    columns='position',
    aggfunc='mean'
)
sns.heatmap(dept_position_salary, annot=True, fmt='.0f', cmap='plasma', ax=axes[1,1])
axes[1,1].set_title('Department-Position Average Salary Heatmap')

plt.tight_layout()
plt.show()
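
A correlation matrix is symmetric, so half of a correlation heatmap repeats the other half. One common refinement is to hide the upper triangle with a boolean mask. The sketch below builds such a mask on made-up data; the column names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
corr = df.corr()

# True marks cells to hide; k=1 starts above the diagonal,
# so the diagonal itself stays visible
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
print(mask)
```

Passing the result as `sns.heatmap(corr, mask=mask, annot=True, cmap='coolwarm', center=0)` leaves only the lower triangle and the diagonal on the chart.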

3.3 Multi-subplot Layout

python
print("\n=== Multi-subplot Layout Examples ===")

# Create complex multi-subplot layout
fig = plt.figure(figsize=(20, 15))

# Use GridSpec for custom layout
from matplotlib.gridspec import GridSpec
gs = GridSpec(3, 4, figure=fig, hspace=0.3, wspace=0.3)

# First row: Time series analysis
ax1 = fig.add_subplot(gs[0, :2])
daily_sales = sales_df.groupby('date')['sales_amount'].sum()
daily_sales.plot(ax=ax1, color='blue', linewidth=2)
ax1.set_title('Daily Sales Trend', fontsize=14)
ax1.grid(True, alpha=0.3)

ax2 = fig.add_subplot(gs[0, 2:])
monthly_sales = sales_df.groupby('month')['sales_amount'].sum()
monthly_sales.plot(kind='bar', ax=ax2, color='orange')
ax2.set_title('Monthly Sales Distribution', fontsize=14)
ax2.tick_params(axis='x', rotation=0)

# Second row: Product and region analysis
ax3 = fig.add_subplot(gs[1, 0])
product_sales = sales_df.groupby('product')['sales_amount'].sum()
product_sales.plot(kind='pie', ax=ax3, autopct='%1.1f%%')
ax3.set_title('Product Sales Share', fontsize=14)
ax3.set_ylabel('')

ax4 = fig.add_subplot(gs[1, 1])
region_sales = sales_df.groupby('region')['sales_amount'].sum()
region_sales.plot(kind='bar', ax=ax4, color='lightcoral')
ax4.set_title('Regional Sales Comparison', fontsize=14)
ax4.tick_params(axis='x', rotation=45)

ax5 = fig.add_subplot(gs[1, 2:])
sns.boxplot(data=sales_df, x='product', y='sales_amount', ax=ax5)
ax5.set_title('Sales Distribution by Product', fontsize=14)
ax5.tick_params(axis='x', rotation=45)

# Third row: Employee data analysis
ax6 = fig.add_subplot(gs[2, 0])
employee_df.plot.scatter(x='age', y='salary', alpha=0.6, ax=ax6)
ax6.set_title('Age vs Salary', fontsize=14)

ax7 = fig.add_subplot(gs[2, 1])
dept_count = employee_df['department'].value_counts()
dept_count.plot(kind='bar', ax=ax7, color='lightgreen')
ax7.set_title('Department Distribution', fontsize=14)
ax7.tick_params(axis='x', rotation=45)

ax8 = fig.add_subplot(gs[2, 2:])
corr_matrix = employee_df[['age', 'salary', 'experience_years', 'performance_score']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, ax=ax8)
ax8.set_title('Employee Data Correlation', fontsize=14)

plt.suptitle('Comprehensive Data Analysis Dashboard', fontsize=16, y=0.98)
plt.show()
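
The dashboard layout relies on slicing a `GridSpec`: each slice selects the rows and columns that one subplot should span. A stripped-down sketch of the same idea, with a smaller hypothetical 2x4 grid and the `Agg` backend assumed so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 4))
gs = GridSpec(2, 4, figure=fig)

wide = fig.add_subplot(gs[0, :])    # first row, all four columns
left = fig.add_subplot(gs[1, :2])   # second row, columns 0-1
right = fig.add_subplot(gs[1, 2:])  # second row, columns 2-3

for name, ax in [("wide", wide), ("left", left), ("right", right)]:
    ax.set_title(name)

print(len(fig.axes), "axes created")
```

The slices follow ordinary Python indexing, so `gs[0, :2]` in the dashboard above means "row 0, the first two columns".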

4. Interactive Visualization

4.1 Creating Interactive Charts with Plotly

python
print("\n=== Interactive Visualization Examples ===")

try:
    import plotly.express as px
    
    print("Creating interactive charts...")
    
    # 1. Interactive scatter plot
    fig_scatter = px.scatter(
        employee_df, 
        x='age', 
        y='salary',
        color='department',
        size='performance_score',
        hover_data=['experience_years'],
        title='Employee Age-Salary Interactive Scatter Plot'
    )
    fig_scatter.show()
    
    # 2. Interactive time series
    daily_sales_df = sales_df.groupby('date')['sales_amount'].sum().reset_index()
    fig_line = px.line(
        daily_sales_df,
        x='date',
        y='sales_amount',
        title='Daily Sales Interactive Trend'
    )
    fig_line.show()
    
    # 3. Interactive bar chart
    product_sales_df = sales_df.groupby('product')['sales_amount'].sum().reset_index()
    fig_bar = px.bar(
        product_sales_df,
        x='product',
        y='sales_amount',
        title='Product Sales Interactive Bar Chart'
    )
    fig_bar.show()
    
except ImportError:
    print("Plotly not installed, skipping interactive visualization demo")
    print("Install with: pip install plotly")

5. Professional Chart Customization

5.1 Chart Style Customization

python
print("\n=== Chart Style Customization ===")

# Create professional chart style
def create_professional_chart():
    """Create professionally styled charts"""
    
    # Professional color scheme
    colors = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#592E83']
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Professional bar chart
    product_sales = sales_df.groupby('product')['sales_amount'].sum().sort_values(ascending=False)
    bars = axes[0,0].bar(product_sales.index, product_sales.values, color=colors)
    axes[0,0].set_title('Product Sales Ranking', fontsize=16, fontweight='bold', pad=20)
    axes[0,0].set_ylabel('Sales Amount', fontsize=12)
    axes[0,0].tick_params(axis='x', rotation=45)
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        axes[0,0].text(bar.get_x() + bar.get_width()/2., height,
                      f'{height:.0f}',
                      ha='center', va='bottom', fontsize=10)
    
    axes[0,0].grid(True, alpha=0.3, linestyle='--')
    axes[0,0].set_axisbelow(True)
    
    # 2. Professional line chart
    monthly_sales = sales_df.groupby('month')['sales_amount'].sum()
    axes[0,1].plot(monthly_sales.index, monthly_sales.values, 
                  marker='o', linewidth=3, markersize=8, color=colors[0])
    axes[0,1].fill_between(monthly_sales.index, monthly_sales.values, alpha=0.3, color=colors[0])
    axes[0,1].set_title('Monthly Sales Trend', fontsize=16, fontweight='bold', pad=20)
    axes[0,1].set_ylabel('Sales Amount', fontsize=12)
    axes[0,1].set_xlabel('Month', fontsize=12)
    axes[0,1].grid(True, alpha=0.3, linestyle='--')
    
    # 3. Professional scatter plot
    for i, dept in enumerate(employee_df['department'].unique()):
        dept_data = employee_df[employee_df['department'] == dept]
        axes[1,0].scatter(dept_data['age'], dept_data['salary'], 
                         label=dept, alpha=0.7, s=60, color=colors[i % len(colors)])
    
    axes[1,0].set_title('Age-Salary Relationship Analysis', fontsize=16, fontweight='bold', pad=20)
    axes[1,0].set_xlabel('Age', fontsize=12)
    axes[1,0].set_ylabel('Salary', fontsize=12)
    axes[1,0].legend(frameon=True, fancybox=True, shadow=True)
    axes[1,0].grid(True, alpha=0.3, linestyle='--')
    
    # 4. Professional pie chart
    region_sales = sales_df.groupby('region')['sales_amount'].sum()
    wedges, texts, autotexts = axes[1,1].pie(region_sales.values, 
                                            labels=region_sales.index,
                                            autopct='%1.1f%%',
                                            colors=colors[:len(region_sales)],
                                            explode=[0.05] * len(region_sales),  # one offset per wedge
                                            shadow=True,
                                            startangle=90)
    
    axes[1,1].set_title('Regional Sales Distribution', fontsize=16, fontweight='bold', pad=20)
    
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
    
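    # Add the figure title first, then reserve space at the top so it does not overlap the subplot titles
    fig.suptitle('Sales Data Analysis Report', fontsize=20, fontweight='bold')
    fig.tight_layout(rect=[0, 0, 1, 0.96])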
    plt.show()

create_professional_chart()
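
Repeating `fontsize`, `fontweight`, and grid arguments on every call works, but the same look can also be applied once through matplotlib's rcParams. The sketch below uses `plt.rc_context` so the settings apply only inside the `with` block; the `report_style` dictionary is a hypothetical house style, and the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt

# Hypothetical house style: the keys are real rcParams, the values are illustrative
report_style = {
    "axes.titlesize": 16,
    "axes.titleweight": "bold",
    "axes.grid": True,
    "grid.alpha": 0.3,
    "grid.linestyle": "--",
}

size_before = plt.rcParams["axes.titlesize"]

with plt.rc_context(report_style):
    size_inside = plt.rcParams["axes.titlesize"]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(["A", "B", "C"], [3, 5, 4], color="#2E86AB")
    ax.set_title("Styled inside the context")

# Outside the block the previous settings are restored
size_after = plt.rcParams["axes.titlesize"]
print(size_before, size_inside, size_after)
```

Scoping the style this way keeps a report's look from leaking into unrelated charts created later in the same session.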

6. Visualization Best Practices

6.1 Design Principles

python
print("\n=== Visualization Design Principles ===")

def visualization_principles():
    """Visualization design principles examples"""
    
    principles = {
        "Simplicity Principle": "Avoid unnecessary decoration, highlight the data itself",
        "Accuracy Principle": "Ensure charts accurately reflect the data and do not mislead readers",
        "Consistency Principle": "Maintain consistency in colors, fonts, and styles",
        "Readability Principle": "Ensure labels and legends are clear and readable",
        "Purpose Principle": "Choose appropriate chart types based on analysis purpose"
    }
    
    print("Five Principles of Visualization Design:")
    for principle, description in principles.items():
        print(f"• {principle}: {description}")
    
    # Good vs bad design comparison
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Good design
    monthly_sales = sales_df.groupby('month')['sales_amount'].sum()
    axes[0].plot(monthly_sales.index, monthly_sales.values, 
                linewidth=2, marker='o', markersize=6, color='#2E86AB')
    axes[0].set_title('Monthly Sales Trend', fontsize=14, fontweight='bold', pad=15)
    axes[0].set_xlabel('Month', fontsize=12)
    axes[0].set_ylabel('Sales Amount', fontsize=12)
    axes[0].grid(True, alpha=0.3, linestyle='--')
    axes[0].set_axisbelow(True)
    
    axes[0].text(0.02, 0.98, '✓ Good Design', transform=axes[0].transAxes, 
                fontsize=12, fontweight='bold', color='green',
                verticalalignment='top')
    
    # Bad design example
    axes[1].plot(monthly_sales.index, monthly_sales.values, 
                linewidth=5, marker='s', markersize=12, color='red')
    axes[1].set_title('MONTHLY SALES TREND!!!', fontsize=16, color='red')
    axes[1].set_facecolor('yellow')
    axes[1].grid(True, alpha=0.8, linewidth=2, color='blue')
    
    axes[1].text(0.02, 0.98, '✗ Bad Design', transform=axes[1].transAxes, 
                fontsize=12, fontweight='bold', color='red',
                verticalalignment='top')
    
    plt.tight_layout()
    plt.show()

visualization_principles()

6.2 Chart Selection Guide

python
print("\n=== Chart Selection Guide ===")

def chart_selection_guide():
    """Chart selection guide and examples"""
    
    guide = {
        "Comparing Values": {
            "Recommended Charts": ["Bar chart", "Column chart", "Radar chart"],
            "Example": "Product sales comparison"
        },
        "Showing Trends": {
            "Recommended Charts": ["Line chart", "Area chart"],
            "Example": "Monthly sales trend"
        },
        "Showing Distribution": {
            "Recommended Charts": ["Histogram", "Box plot", "Violin plot"],
            "Example": "Employee salary distribution"
        },
        "Showing Relationships": {
            "Recommended Charts": ["Scatter plot", "Bubble chart", "Heatmap"],
            "Example": "Age vs salary relationship"
        },
        "Showing Composition": {
            "Recommended Charts": ["Pie chart", "Stacked bar chart", "Treemap"],
            "Example": "Regional sales share"
        }
    }
    
    print("Chart Selection Guide:")
    for purpose, info in guide.items():
        print(f"\n{purpose}:")
        print(f"  Recommended: {', '.join(info['Recommended Charts'])}")
        print(f"  Example: {info['Example']}")

chart_selection_guide()

Chapter Summary

This chapter covered the main Pandas data visualization capabilities:

Core Content Review

  1. Basic Chart Types: Line plots, bar charts, scatter plots, histograms
  2. Advanced Charts: Box plots, heatmaps, multi-subplot layouts
  3. Interactive Visualization: Creating dynamic charts with Plotly
  4. Professional Customization: Style design, theme colors, chart templates
  5. Data Storytelling: Creating complete analysis reports and business insights

Key Skills

  • Choose appropriate charts based on data types
  • Use Pandas built-in plotting for quick visualization
  • Create professional charts with matplotlib and seaborn
  • Optimize large data visualization performance
  • Follow visualization design principles

Best Practices

  • Keep charts simple and clear
  • Choose appropriate colors and styles
  • Add necessary labels and descriptions
  • Consider target audience needs
  • Tell meaningful stories with data

Practical Applications

  • Business reports and dashboards
  • Data exploration and analysis
  • Academic research and papers
  • Product analysis and user insights
  • Market research and competitive analysis

Mastering these visualization skills will help you:

  • Quickly discover patterns and trends in data
  • Effectively communicate analysis results and insights
  • Create professional data reports
  • Support data-driven decision making

Exercises

  1. Create a comprehensive dashboard with multiple chart types
  2. Design an interactive sales analysis tool
  3. Create a complete data analysis report
  4. Optimize visualization performance for large datasets
  5. Develop a standardized chart template library

In the next chapter, we will learn about Pandas advanced features, exploring more complex data processing and analysis techniques.
