📓 Getting Started with Jupyter Notebooks
Introduction
Jupyter Notebooks provide an interactive environment for Python development, data analysis, and prototyping. They combine code, documentation, and output in a single document, making them perfect for exploratory programming, data science, and sharing your work.
1. Installation
Install Jupyter
# Activate your virtual environment first
source venv/bin/activate
# Install Jupyter
pip install jupyter
# Or install the full data science stack
pip install jupyter numpy pandas matplotlib seaborn
Alternative: JupyterLab
# Install JupyterLab (modern interface)
pip install jupyterlab
2. Starting Jupyter
Launch Jupyter Notebook
# Start Jupyter Notebook
jupyter notebook
# Or start JupyterLab
jupyter lab
This opens your browser at http://localhost:8888 (or a nearby port if 8888 is already in use).
3. Jupyter Interface Overview
Main Components
- File Browser: Navigate and manage files
- Notebook Editor: Write and execute code
- Kernel: Python interpreter running your code
- Output Area: Display results, plots, and errors
Creating a New Notebook
- Click "New" → "Python 3" (or your kernel)
- Or use the + button in the file browser
- Rename your notebook by clicking on "Untitled"
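Under the hood, an .ipynb file is just JSON, so a notebook can even be generated outside Jupyter. A minimal sketch using only the standard library (the filename and cell contents here are illustrative, not part of any real project):

```python
import json

# A .ipynb file is plain JSON following the nbformat 4 schema.
# This builds a minimal, valid notebook by hand with the stdlib only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My First Notebook\n", "Created programmatically."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello from a generated cell')"],
        },
    ],
}

# Write it out; Jupyter's file browser will open this like any other notebook
with open("generated_notebook.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Opening the file in Jupyter shows one markdown cell and one (unexecuted) code cell.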
4. Cell Types and Usage
Code Cells
Execute Python code:
# This is a code cell
import numpy as np
import matplotlib.pyplot as plt
# Create some data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
Markdown Cells
Write documentation and explanations:
# Data Analysis Project
## Overview
This notebook analyzes sales data from Q1 2024.
### Key Findings
- Sales increased by 15% compared to Q4 2023
- Top performing product: Widget A
- Peak sales day: Friday
## Methodology
We used the following approach:
1. Data cleaning and preprocessing
2. Exploratory data analysis
3. Statistical modeling
4. Visualization
Raw Cells
For unformatted text that is passed through conversion untouched (rarely used).
5. Essential Keyboard Shortcuts
Command Mode (press Esc)
- A: Insert cell above
- B: Insert cell below
- D, D: Delete cell
- Z: Undo cell deletion
- M: Convert to Markdown
- Y: Convert to Code
- R: Convert to Raw
- Shift + M: Merge cells
- C: Copy cell
- V: Paste cell
- X: Cut cell
Edit Mode (press Enter)
- Shift + Enter: Run cell and move to next
- Ctrl + Enter: Run cell and stay
- Alt + Enter: Run cell and insert below
- Ctrl + /: Toggle comment
- Tab: Auto-complete
- Shift + Tab: Show function signature
6. Magic Commands
Magic commands provide additional functionality:
Line Magic (single %)
# Timing code execution
%time sum(range(1000000))
# Memory usage (requires the memory_profiler package:
# pip install memory_profiler, then run %load_ext memory_profiler)
%memit sum(range(1000000))
# List variables
%whos
# Change directory
%cd /path/to/directory
# Run external command
%ls
# Load external script
%load script.py
Cell Magic (double %%)
Cell magics apply to the entire cell and must appear on the cell's first line:
# Run code in different language
%%bash
ls -la
pwd
# Time entire cell
%%time
import time
time.sleep(2)
print("Done!")
# Write to file
%%writefile hello.py
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("World"))
7. Data Analysis Workflow
Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
Step 2: Load Data
# Load CSV data
df = pd.read_csv('data/sales_data.csv')
# Load Excel data
df = pd.read_excel('data/sales_data.xlsx', sheet_name='Q1')
# Load JSON data
df = pd.read_json('data/sales_data.json')
# Display basic info
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
df.head()
Step 3: Data Exploration
# Basic statistics
df.describe()
# Data types and missing values
df.info()
# Check for missing values
df.isnull().sum()
# Unique values in categorical columns
df['category'].value_counts()
Step 4: Data Visualization
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Histogram
df['sales'].hist(ax=axes[0,0], bins=30)
axes[0,0].set_title('Sales Distribution')
# Box plot
df.boxplot(column='sales', by='category', ax=axes[0,1])
axes[0,1].set_title('Sales by Category')
# Scatter plot
df.plot.scatter(x='marketing_spend', y='sales', ax=axes[1,0])
axes[1,0].set_title('Marketing Spend vs Sales')
# Correlation heatmap
correlation_matrix = df.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix, annot=True, ax=axes[1,1])
axes[1,1].set_title('Correlation Matrix')
plt.tight_layout()
plt.show()
8. Best Practices
Notebook Organization
- Start with a title and description
- Import all libraries at the top
- Load data early
- Use markdown cells for explanations
- Keep cells focused and small
- Clear outputs before committing to version control
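The last point can be automated. nbstripout (covered under Version Control below) is the standard tool, but as a rough stdlib-only sketch, clearing outputs is just a matter of editing the notebook's JSON:

```python
import json

def strip_outputs(path):
    """Blank the fields that churn in version control.

    A rough stand-in for nbstripout: removes outputs and execution
    counts from every code cell, then writes the notebook back.
    """
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)
```

Running strip_outputs("analysis.ipynb") before each commit keeps diffs readable.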
Code Quality
# Good: Clear variable names
customer_sales_data = df[df['customer_type'] == 'premium']

# Good: Add comments for complex logic
# Calculate rolling 7-day average sales
df['sales_7day_avg'] = df['sales'].rolling(window=7).mean()

# Good: Use functions for reusable code
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100
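A quick sanity check of the growth-rate helper above (the sales figures are illustrative):

```python
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100

# Q1 sales of 115 against Q4 sales of 100 is 15% growth
q1_growth = calculate_growth_rate(115, 100)
print(f"Q1 growth: {q1_growth:.1f}%")  # prints "Q1 growth: 15.0%"

# A decline comes out negative
print(f"Change: {calculate_growth_rate(90, 100):.1f}%")  # prints "Change: -10.0%"
```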
Error Handling
try:
    # Load data with error handling
    df = pd.read_csv('data/sales_data.csv')
    print("Data loaded successfully!")
except FileNotFoundError:
    print("Data file not found. Please check the path.")
except Exception as e:
    print(f"Error loading data: {e}")
9. Advanced Features
Widgets for Interactive Notebooks
from ipywidgets import interact
import ipywidgets as widgets

def plot_sales(month, category):
    filtered_df = df[(df['month'] == month) & (df['category'] == category)]
    plt.figure(figsize=(10, 6))
    plt.plot(filtered_df['day'], filtered_df['sales'])
    plt.title(f'Sales for {category} in {month}')
    plt.show()

# Create interactive dropdowns bound to the function's arguments
interact(plot_sales,
         month=widgets.Dropdown(options=df['month'].unique()),
         category=widgets.Dropdown(options=df['category'].unique()))
Exporting Notebooks
# Convert to HTML
!jupyter nbconvert --to html my_notebook.ipynb
# Convert to PDF (requires a LaTeX installation)
!jupyter nbconvert --to pdf my_notebook.ipynb
# Convert to Python script
!jupyter nbconvert --to script my_notebook.ipynb
10. Troubleshooting Common Issues
Kernel Issues
# Restart kernel
# Kernel → Restart
# Check kernel status
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
Memory Issues
# Clear variables
%reset -f
# Monitor memory usage
import psutil
print(f"Memory usage: {psutil.virtual_memory().percent}%")
Slow Performance
# Use %%time to identify slow cells
%%time
# Your code here
# Optimize with vectorized operations
# Instead of loops, use pandas operations
df['new_column'] = df['column1'] * df['column2'] # Fast
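Outside a notebook (or to compare two implementations head to head), the stdlib timeit module gives the same kind of measurement as %%time. A sketch contrasting an explicit loop with a comprehension-style rewrite (absolute timings will vary by machine; pandas/NumPy vectorization takes the same idea much further):

```python
import timeit

def multiply_loop(a, b):
    # Element-wise product with an explicit Python loop
    out = []
    for x, y in zip(a, b):
        out.append(x * y)
    return out

def multiply_comprehension(a, b):
    # Same computation, expressed as a single comprehension
    return [x * y for x, y in zip(a, b)]

a = list(range(10_000))
b = list(range(10_000))

# Run each version 100 times and report total wall time
t_loop = timeit.timeit(lambda: multiply_loop(a, b), number=100)
t_comp = timeit.timeit(lambda: multiply_comprehension(a, b), number=100)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")
```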
11. Sharing and Collaboration
Version Control
# Ignore checkpoint directories in version control
echo ".ipynb_checkpoints/" >> .gitignore
# Use nbstripout to clean notebooks
pip install nbstripout
nbstripout --install
Publishing Notebooks
- GitHub: Notebooks render automatically
- nbviewer: Share public notebooks
- Binder: Interactive notebooks in the cloud
- Google Colab: Free cloud notebooks
Conclusion
Jupyter Notebooks are powerful tools for:
- Interactive development and prototyping
- Data analysis and visualization
- Documentation and sharing
- Education and tutorials
- Research and experimentation
Master these concepts and you'll be able to create professional, interactive Python notebooks for any project.
Next Steps
- Explore JupyterLab for advanced features
- Learn about Jupyter extensions and widgets
- Practice with real datasets
- Try Google Colab for cloud-based notebooks