
📓 Getting Started with Jupyter Notebooks

Introduction

Jupyter Notebooks provide an interactive environment for Python development, data analysis, and prototyping. They combine code, documentation, and output in a single document, making them perfect for exploratory programming, data science, and sharing your work.

1. Installation

Install Jupyter

# Activate your virtual environment first
source venv/bin/activate

# Install Jupyter
pip install jupyter

# Or install the full data science stack
pip install jupyter numpy pandas matplotlib seaborn

Alternative: JupyterLab

# Install JupyterLab (modern interface)
pip install jupyterlab

2. Starting Jupyter

Launch Jupyter Notebook

# Start Jupyter Notebook
jupyter notebook

# Or start JupyterLab
jupyter lab

This opens your default browser at http://localhost:8888 (or the next free port).

3. Jupyter Interface Overview

Main Components

  • File Browser: Navigate and manage files
  • Notebook Editor: Write and execute code
  • Kernel: Python interpreter running your code
  • Output Area: Display results, plots, and errors

Creating a New Notebook

  1. Click "New" → "Python 3" (or your kernel)
  2. Or use the + button in the file browser
  3. Rename your notebook by clicking on "Untitled"

4. Cell Types and Usage

Code Cells

Execute Python code:

# This is a code cell
import numpy as np
import matplotlib.pyplot as plt

# Create some data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()

Markdown Cells

Write documentation and explanations:

# Data Analysis Project

## Overview
This notebook analyzes sales data from Q1 2024.

### Key Findings
- Sales increased by 15% compared to Q4 2023
- Top performing product: Widget A
- Peak sales day: Friday

## Methodology
We used the following approach:
1. Data cleaning and preprocessing
2. Exploratory data analysis
3. Statistical modeling
4. Visualization

Raw Cells

For unformatted text (rarely used).
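
Under the hood, all three cell types live in the same .ipynb file, which is plain JSON (nbformat 4). A minimal sketch of that layout (field values are illustrative, not the full schema):

```python
import json

# Minimal shape of an .ipynb file: one JSON object with a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('hi')"]},
        {"cell_type": "raw", "metadata": {}, "source": ["plain text"]},
    ],
}

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)  # → ['markdown', 'code', 'raw']
```

This is why notebooks diff poorly in version control: outputs and execution counts are stored right next to your source.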

5. Essential Keyboard Shortcuts

Command Mode (press Esc)

  • A: Insert cell above
  • B: Insert cell below
  • DD: Delete cell
  • Z: Undo cell deletion
  • M: Convert to Markdown
  • Y: Convert to Code
  • R: Convert to Raw
  • Shift + M: Merge cells
  • C: Copy cell
  • V: Paste cell
  • X: Cut cell

Edit Mode (press Enter)

  • Shift + Enter: Run cell and move to next
  • Ctrl + Enter: Run cell and stay
  • Alt + Enter: Run cell and insert below
  • Ctrl + /: Toggle comment
  • Tab: Auto-complete
  • Shift + Tab: Show function signature

6. Magic Commands

Magic commands provide additional functionality:

Line Magic (single %)

# Timing code execution
%time sum(range(1000000))

# Memory usage (needs: pip install memory_profiler, then %load_ext memory_profiler)
%memit sum(range(1000000))

# List variables
%whos

# Change directory
%cd /path/to/directory

# Run external command
%ls

# Load external script
%load script.py
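
For context on what %time reports, the standard-library timeit module gives a comparable wall-clock measurement in plain Python (the exact number varies by machine, so none is shown):

```python
import timeit

# One timed run of the same expression measured with %time above
elapsed = timeit.timeit(lambda: sum(range(1_000_000)), number=1)
print(f"Wall time: {elapsed:.4f} s")
```

This is handy when you move notebook code into a script and lose access to magics.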

Cell Magic (double %%)

A cell magic must be the first line of its cell, so each example below is a separate cell.

Run shell commands:

%%bash
ls -la
pwd

Time an entire cell:

%%time
import time
time.sleep(2)
print("Done!")

Write the cell's contents to a file:

%%writefile hello.py
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("World"))

7. Data Analysis Workflow

Step 1: Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')  # hide noisy warnings; use sparingly

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

Step 2: Load Data

Pick the loader that matches your format; each returns a DataFrame.

# CSV
df = pd.read_csv('data/sales_data.csv')

# Excel (one sheet)
df = pd.read_excel('data/sales_data.xlsx', sheet_name='Q1')

# JSON
df = pd.read_json('data/sales_data.json')

# Display basic info
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
df.head()
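
One way to consolidate those loaders is to dispatch on the file extension. A sketch (load_table is a hypothetical helper, not a pandas API):

```python
from pathlib import Path

import pandas as pd

# Map file extensions to the matching pandas readers
LOADERS = {".csv": pd.read_csv, ".xlsx": pd.read_excel, ".json": pd.read_json}

def load_table(path):
    """Load a tabular file with the reader that matches its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported format: {suffix}")
    return LOADERS[suffix](path)
```

After that, df = load_table('data/sales_data.csv') behaves like the direct calls above, and adding a new format is one dictionary entry.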

Step 3: Data Exploration

# Basic statistics
df.describe()

# Data types and missing values
df.info()

# Check for missing values
df.isnull().sum()

# Unique values in categorical columns
df['category'].value_counts()
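
The missing-value checks above can be rolled into a single summary. A sketch, using a hypothetical toy frame in place of the real sales data:

```python
import numpy as np
import pandas as pd

# Toy frame with deliberate gaps (stand-in for sales_data.csv)
df = pd.DataFrame({
    "sales": [100.0, 120.0, np.nan, 90.0],
    "category": ["A", "B", "A", None],
    "day": [1, 2, 3, 4],
})

# Count nulls per column, then keep only the columns that have any
missing = df.isnull().sum()
incomplete_cols = missing[missing > 0].index.tolist()
print(incomplete_cols)  # → ['sales', 'category']
```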

Step 4: Data Visualization

# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Histogram
df['sales'].hist(ax=axes[0,0], bins=30)
axes[0,0].set_title('Sales Distribution')

# Box plot
df.boxplot(column='sales', by='category', ax=axes[0,1])
axes[0,1].set_title('Sales by Category')

# Scatter plot
df.plot.scatter(x='marketing_spend', y='sales', ax=axes[1,0])
axes[1,0].set_title('Marketing Spend vs Sales')

# Correlation heatmap
correlation_matrix = df.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix, annot=True, ax=axes[1,1])
axes[1,1].set_title('Correlation Matrix')

plt.tight_layout()
plt.show()

8. Best Practices

Notebook Organization

  1. Start with a title and description
  2. Import all libraries at the top
  3. Load data early
  4. Use markdown cells for explanations
  5. Keep cells focused and small
  6. Clear outputs before committing to version control

Code Quality

# Good: Clear variable names
customer_sales_data = df[df['customer_type'] == 'premium']

# Good: Add comments for complex logic
# Calculate rolling 7-day average sales
df['sales_7day_avg'] = df['sales'].rolling(window=7).mean()

# Good: Use functions for reusable code
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100
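
A quick sanity check of the growth-rate helper (redefined here so the snippet runs on its own):

```python
def calculate_growth_rate(current, previous):
    """Calculate percentage growth rate."""
    return ((current - previous) / previous) * 100

print(calculate_growth_rate(115, 100))  # → 15.0
print(calculate_growth_rate(90, 100))   # → -10.0
```

Small spot checks like this, kept in a cell near the function, double as lightweight tests for notebook code.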

Error Handling

try:
    # Load data with error handling
    df = pd.read_csv('data/sales_data.csv')
    print("Data loaded successfully!")
except FileNotFoundError:
    print("Data file not found. Please check the path.")
except Exception as e:
    print(f"Error loading data: {e}")

9. Advanced Features

Widgets for Interactive Notebooks

from ipywidgets import interact
import ipywidgets as widgets

def plot_sales(month, category):
    filtered_df = df[(df['month'] == month) & (df['category'] == category)]
    plt.figure(figsize=(10, 6))
    plt.plot(filtered_df['day'], filtered_df['sales'])
    plt.title(f'Sales for {category} in {month}')
    plt.show()

# Create interactive dropdowns for month and category
interact(plot_sales,
         month=widgets.Dropdown(options=df['month'].unique()),
         category=widgets.Dropdown(options=df['category'].unique()))

Exporting Notebooks

# Convert to HTML
!jupyter nbconvert --to html my_notebook.ipynb

# Convert to PDF
!jupyter nbconvert --to pdf my_notebook.ipynb

# Convert to Python script
!jupyter nbconvert --to script my_notebook.ipynb

10. Troubleshooting Common Issues

Kernel Issues

# Restart kernel
# Kernel → Restart

# Check kernel status
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

Memory Issues

# Clear variables
%reset -f

# Monitor memory usage (needs: pip install psutil)
import psutil
print(f"Memory usage: {psutil.virtual_memory().percent}%")
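
psutil is a third-party package; if it is not installed, the standard-library tracemalloc can track allocations made by your own code:

```python
import tracemalloc

tracemalloc.start()
data = [i * i for i in range(100_000)]  # allocate something measurable
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

Note the difference in scope: psutil reports system-wide memory, while tracemalloc only sees Python allocations made after start().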

Slow Performance

Use %%time as the first line of a cell to find slow cells:

%%time
# Your code here

Prefer vectorized operations over Python loops:

df['new_column'] = df['column1'] * df['column2']  # fast, vectorized
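
A rough illustration of the vectorization payoff with numpy (exact timings vary by machine, so none are shown, but the vectorized form should win comfortably):

```python
import timeit

import numpy as np

a = np.arange(100_000)
b = np.arange(100_000)

# Element-wise multiply: Python loop vs. a single vectorized expression
loop_time = timeit.timeit(lambda: [x * y for x, y in zip(a, b)], number=10)
vec_time = timeit.timeit(lambda: a * b, number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

The same principle applies to pandas: column arithmetic like df['a'] * df['b'] dispatches to numpy under the hood.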

11. Sharing and Collaboration

Version Control

# Add checkpoint directories to .gitignore
echo ".ipynb_checkpoints/" >> .gitignore

# Use nbstripout to clean notebooks
pip install nbstripout
nbstripout --install
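
nbstripout is the robust option; as a dependency-free sketch of the same idea, outputs can be cleared straight from the notebook JSON (this simplified version skips edge cases nbstripout handles):

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from a notebook dict (nbformat 4)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Tiny notebook dict with one executed code cell
nb = {"nbformat": 4, "cells": [
    {"cell_type": "code", "execution_count": 3, "metadata": {},
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi"}],
     "source": ["print('hi')"]},
]}
cleaned = strip_outputs(nb)
print(cleaned["cells"][0]["outputs"])  # → []
```

Stripping outputs before committing keeps diffs readable and avoids leaking data embedded in results.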

Publishing Notebooks

  • GitHub: Notebooks render automatically
  • nbviewer: Share public notebooks
  • Binder: Interactive notebooks in the cloud
  • Google Colab: Free cloud notebooks

Conclusion

Jupyter Notebooks are powerful tools for:

  • Interactive development and prototyping
  • Data analysis and visualization
  • Documentation and sharing
  • Education and tutorials
  • Research and experimentation

Master these concepts and you'll be able to create professional, interactive Python notebooks for any project.

Next Steps

  • Explore JupyterLab for advanced features
  • Learn about Jupyter extensions and widgets
  • Practice with real datasets
  • Try Google Colab for cloud-based notebooks