📓 Getting Started with Jupyter Notebooks
Introduction
Jupyter Notebooks provide an interactive environment for Python development, data analysis, and prototyping. They combine code, documentation, and output in a single document, making them perfect for exploratory programming, data science, and sharing your work.
1. Installation
Install Jupyter
# Activate your virtual environment first
source venv/bin/activate
# Install Jupyter
pip install jupyter
# Or install the full data science stack
pip install jupyter numpy pandas matplotlib seaborn
Alternative: JupyterLab
# Install JupyterLab (modern interface)
pip install jupyterlab
2. Starting Jupyter
Launch Jupyter Notebook
# Start Jupyter Notebook
jupyter notebook
# Or start JupyterLab
jupyter lab
This opens your browser at http://localhost:8888 (or a nearby port if 8888 is already in use).
3. Jupyter Interface Overview
Main Components
- File Browser: Navigate and manage files
- Notebook Editor: Write and execute code
- Kernel: Python interpreter running your code
- Output Area: Display results, plots, and errors
Creating a New Notebook
- Click "New" → "Python 3" (or your kernel)
- Or use the + button in the file browser
- Rename your notebook by clicking on "Untitled"
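Under the hood, an .ipynb file is just JSON, so a notebook can even be generated outside Jupyter. A minimal sketch using only the standard library (the filename and cell contents here are illustrative, not part of any real project):

```python
import json

# A .ipynb file is plain JSON following the nbformat 4 schema.
# This builds a minimal, valid notebook by hand with the stdlib only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My First Notebook\n", "Created programmatically."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello from a generated cell')"],
        },
    ],
}

# Write it out; Jupyter's file browser will open this like any other notebook
with open("generated_notebook.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Opening the file in Jupyter shows one markdown cell and one (unexecuted) code cell.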
4. Cell Types and Usage
Code Cells
Execute Python code:
# This is a code cell
import numpy as np
import matplotlib.pyplot as plt
# Create some data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
Markdown Cells
Write documentation and explanations:
# Data Analysis Project
## Overview
This notebook analyzes sales data from Q1 2024.
### Key Findings
- Sales increased by 15% compared to Q4 2023
- Top performing product: Widget A
- Peak sales day: Friday
## Methodology
We used the following approach:
1. Data cleaning and preprocessing
2. Exploratory data analysis
3. Statistical modeling
4. Visualization
Raw Cells
For unformatted text that is passed through conversion untouched (rarely used).
5. Essential Keyboard Shortcuts
Command Mode (press Esc)
- A: Insert cell above
- B: Insert cell below
- D, D: Delete cell
- Z: Undo cell deletion
- M: Convert to Markdown
- Y: Convert to Code
- R: Convert to Raw
- Shift + M: Merge cells
- C: Copy cell
- V: Paste cell
- X: Cut cell
Edit Mode (press Enter)
- Shift + Enter: Run cell and move to next
- Ctrl + Enter: Run cell and stay
- Alt + Enter: Run cell and insert below
- Ctrl + /: Toggle comment
- Tab: Auto-complete
- Shift + Tab: Show function signature
6. Magic Commands
Magic commands provide additional functionality:
Line Magic (single %)
# Timing code execution
%time sum(range(1000000))
# Memory usage (requires the memory_profiler package:
# pip install memory_profiler, then run %load_ext memory_profiler)
%memit sum(range(1000000))
# List variables
%whos
# Change directory
%cd /path/to/directory
# Run external command
%ls
# Load external script
%load script.py
Cell Magic (double %%)
Cell magics apply to the entire cell and must appear on the cell's first line:
# Run code in different language
%%bash
ls -la
pwd
# Time entire cell
%%time
import time
time.sleep(2)
print("Done!")
# Write to file
%%writefile hello.py
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("World"))
7. Data Analysis Workflow
Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
Step 2: Load Data
# Load CSV data
df = pd.read_csv('data/sales_data.csv')
# Load Excel data
df = pd.read_excel('data/sales_data.xlsx', sheet_name='Q1')
# Load JSON data
df = pd.read_json('data/sales_data.json')
# Display basic info
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
df.head()
Step 3: Data Exploration
# Basic statistics
df.describe()
# Data types and missing values
df.info()
# Check for missing values
df.isnull().sum()
# Unique values in categorical columns
df['category'].value_counts()
Step 4: Data Visualization
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Histogram
df['sales'].hist(ax=axes[0,0], bins=30)
axes[0,0].set_title('Sales Distribution')
# Box plot
df.boxplot(column='sales', by='category', ax=axes[0,1])
axes[0,1].set_title('Sales by Category')
# Scatter plot
df.plot.scatter(x='marketing_spend', y='sales', ax=axes[1,0])
axes[1,0].set_title('Marketing Spend vs Sales')
# Correlation heatmap
correlation_matrix = df.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix, annot=True, ax=axes[1,1])
axes[1,1].set_title('Correlation Matrix')
plt.tight_layout()
plt.show()
8. Best Practices
Notebook Organization
- Start with a title and description
- Import all libraries at the top
- Load data early
- Use markdown cells for explanations
- Keep cells focused and small
- Clear outputs before committing to version control
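The last point can be automated. nbstripout (covered under Version Control below) is the standard tool, but as a rough stdlib-only sketch, clearing outputs is just a matter of editing the notebook's JSON:

```python
import json

def strip_outputs(path):
    """Blank the fields that churn in version control.

    A rough stand-in for nbstripout: removes outputs and execution
    counts from every code cell, then writes the notebook back.
    """
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)
```

Running strip_outputs("analysis.ipynb") before each commit keeps diffs readable.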
Code Quality
# Good: Clear variable names
customer_sales_data = df[df['customer_type'] == 'premium']

# Good: Add comments for complex logic
# Calculate rolling 7-day average sales
df['sales_7day_avg'] = df['sales'].rolling(window=7).mean()

# Good: Use functions for reusable code
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100
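A quick sanity check of the growth-rate helper above (the sales figures are illustrative):

```python
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100

# Q1 sales of 115 against Q4 sales of 100 is 15% growth
q1_growth = calculate_growth_rate(115, 100)
print(f"Q1 growth: {q1_growth:.1f}%")  # prints "Q1 growth: 15.0%"

# A decline comes out negative
print(f"Change: {calculate_growth_rate(90, 100):.1f}%")  # prints "Change: -10.0%"
```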
Error Handling
try:
    # Load data with error handling
    df = pd.read_csv('data/sales_data.csv')
    print("Data loaded successfully!")
except FileNotFoundError:
    print("Data file not found. Please check the path.")
except Exception as e:
    print(f"Error loading data: {e}")
9. Advanced Features
Widgets for Interactive Notebooks
from ipywidgets import interact
import ipywidgets as widgets

def plot_sales(month, category):
    filtered_df = df[(df['month'] == month) & (df['category'] == category)]
    plt.figure(figsize=(10, 6))
    plt.plot(filtered_df['day'], filtered_df['sales'])
    plt.title(f'Sales for {category} in {month}')
    plt.show()

# Create interactive dropdowns bound to the function's arguments
interact(plot_sales,
         month=widgets.Dropdown(options=df['month'].unique()),
         category=widgets.Dropdown(options=df['category'].unique()))
Exporting Notebooks
# Convert to HTML
!jupyter nbconvert --to html my_notebook.ipynb
# Convert to PDF (requires a LaTeX installation)
!jupyter nbconvert --to pdf my_notebook.ipynb
# Convert to Python script
!jupyter nbconvert --to script my_notebook.ipynb
10. Troubleshooting Common Issues
Kernel Issues
# Restart kernel
# Kernel → Restart
# Check kernel status
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
Memory Issues
# Clear variables
%reset -f
# Monitor memory usage
import psutil
print(f"Memory usage: {psutil.virtual_memory().percent}%")
Slow Performance
# Use %%time to identify slow cells
%%time
# Your code here
# Optimize with vectorized operations
# Instead of loops, use pandas operations
df['new_column'] = df['column1'] * df['column2'] # Fast
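Outside a notebook (or to compare two implementations head to head), the stdlib timeit module gives the same kind of measurement as %%time. A sketch contrasting an explicit loop with a comprehension-style rewrite (absolute timings will vary by machine; pandas/NumPy vectorization takes the same idea much further):

```python
import timeit

def multiply_loop(a, b):
    # Element-wise product with an explicit Python loop
    out = []
    for x, y in zip(a, b):
        out.append(x * y)
    return out

def multiply_comprehension(a, b):
    # Same computation, expressed as a single comprehension
    return [x * y for x, y in zip(a, b)]

a = list(range(10_000))
b = list(range(10_000))

# Run each version 100 times and report total wall time
t_loop = timeit.timeit(lambda: multiply_loop(a, b), number=100)
t_comp = timeit.timeit(lambda: multiply_comprehension(a, b), number=100)
print(f"loop: {t_loop:.3f}s  comprehension: {t_comp:.3f}s")
```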
11. Sharing and Collaboration
Version Control
# Ignore checkpoint directories in version control
echo ".ipynb_checkpoints/" >> .gitignore
# Use nbstripout to clean notebooks
pip install nbstripout
nbstripout --install
Publishing Notebooks
- GitHub: Notebooks render automatically
- nbviewer: Share public notebooks
- Binder: Interactive notebooks in the cloud
- Google Colab: Free cloud notebooks
Conclusion
Jupyter Notebooks are powerful tools for:
- Interactive development and prototyping
- Data analysis and visualization
- Documentation and sharing
- Education and tutorials
- Research and experimentation
Master these concepts and you'll be able to create professional, interactive Python notebooks for any project.
Next Steps
- Explore JupyterLab for advanced features
- Learn about Jupyter extensions and widgets
- Practice with real datasets
- Try Google Colab for cloud-based notebooks