Lesson 18.1: Profiling
Learning Objectives
By the end of this lesson, you will be able to:
- Understand what profiling is
- Use the cProfile module
- Identify performance bottlenecks
- Analyze profiling results
- Use profiling tools effectively
- Understand profiling output
- Apply profiling to real code
- Interpret profiling statistics
- Use line_profiler for line-by-line profiling
- Make informed optimization decisions
Introduction to Profiling
Profiling is the process of measuring where your program spends time and which functions are called most often. It helps identify performance bottlenecks.
Why Profile?
- Find bottlenecks: Identify slow code
- Measure performance: Get actual timing data
- Optimize effectively: Focus on what matters
- Verify improvements: Measure before and after
- Data-driven decisions: Make optimization decisions based on data
What is Profiling?
Profiling measures:
- Execution time: How long functions take
- Call counts: How many times functions are called
- Call hierarchy: Which functions call which
- Memory usage: How much memory is used (with other tools such as the standard-library tracemalloc; see the sketch below)
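cProfile measures time, not memory. For a rough view of memory usage, the standard-library tracemalloc module can report which lines allocate the most; a minimal sketch:

import tracemalloc

tracemalloc.start()

data = [i ** 2 for i in range(100_000)]  # allocate something noticeable

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:5]:
    print(stat)  # top five allocation sites by size

tracemalloc.stop()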
cProfile Module
Basic Profiling
The cProfile module provides deterministic profiling:
import cProfile
def slow_function():
total = 0
for i in range(1000000):
total += i * i
return total
# Profile the function
cProfile.run('slow_function()')
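cProfile.run() also accepts a sort argument, so the printed report can be ordered without going through pstats:

import cProfile

# Print the report sorted by cumulative time
cProfile.run('slow_function()', sort='cumulative')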
Profiling with runctx()
import cProfile
def add(x, y):
return x + y
# Profile with context
cProfile.runctx('result = add(2, 3)', globals(), locals())
Saving Profile Results
import cProfile
def my_function():
# Your code here
pass
# Save profile to file
cProfile.run('my_function()', 'profile_output.prof')
Analyzing Profile Results
import cProfile
import pstats
# Run profiling
cProfile.run('my_function()', 'profile_output.prof')
# Analyze results
stats = pstats.Stats('profile_output.prof')
stats.sort_stats('cumulative')
stats.print_stats(10) # Print top 10
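pstats.Stats methods return the Stats object itself, so calls can be chained; strip_dirs() removes leading path information to keep the report readable:

import pstats

stats = pstats.Stats('profile_output.prof')
stats.strip_dirs().sort_stats('cumulative').print_stats(10)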
Understanding Profile Output
Profile Statistics
Profile output shows:
- ncalls: Number of calls (shown as total/primitive when recursion is involved)
- tottime: Time spent in the function itself, excluding subcalls
- percall (first column): tottime divided by ncalls
- cumtime: Cumulative time, including subcalls
- percall (second column): cumtime divided by the number of primitive calls
- filename:lineno(function): Where the function is defined
Example Output
         4 function calls in 2.345 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.345    2.345 {built-in method builtins.exec}
        1    0.000    0.000    2.345    2.345 <string>:1(<module>)
        1    2.345    2.345    2.345    2.345 example.py:5(slow_function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Sorting Options
stats.sort_stats('cumulative') # By cumulative time
stats.sort_stats('tottime') # By total time
stats.sort_stats('calls') # By call count
stats.sort_stats('name') # By function name
Filtering Results
stats.print_stats('my_module') # Only functions in my_module
stats.print_stats(0.1) # Only the first 10% of lines (after sorting)
Identifying Bottlenecks
What to Look For
- High cumulative time: Functions that take a lot of time
- High call count: Functions called many times
- High per-call time: Slow individual function calls
- Recursive calls: Functions calling themselves many times
Example: Finding Bottlenecks
import cProfile
import pstats
def process_data(data):
result = []
for item in data:
# Expensive operation
processed = expensive_operation(item)
result.append(processed)
return result
def expensive_operation(item):
total = 0
for i in range(1000):
total += i * item
return total
# Profile
data = list(range(100))
cProfile.run('process_data(data)', 'profile.prof')
# Analyze
stats = pstats.Stats('profile.prof')
stats.sort_stats('cumulative')
stats.print_stats(20)
Common Bottlenecks
- Loops: Nested loops, large iterations
- I/O operations: File I/O, network I/O
- String operations: Concatenation in loops, formatting (see the timeit sketch after this list)
- List operations: Appending, searching
- Function calls: Too many function calls
- Recursion: Deep recursion
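To illustrate the string-operations point: repeated concatenation builds a new string on every iteration, while str.join builds the result once. A quick comparison with timeit (exact numbers will vary by machine):

import timeit

concat = timeit.timeit(
    's = ""\nfor i in range(1000):\n    s += str(i)',
    number=1000,
)
join = timeit.timeit(
    's = "".join(str(i) for i in range(1000))',
    number=1000,
)
print(f"Concatenation: {concat:.4f} s")
print(f"str.join:      {join:.4f} s")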
Profiling Techniques
Technique 1: Profile Entire Script
# profile_script.py
import cProfile
import pstats
if __name__ == '__main__':
cProfile.run('main()', 'profile.prof')
stats = pstats.Stats('profile.prof')
stats.sort_stats('cumulative')
stats.print_stats()
Technique 2: Profile Specific Function
import cProfile
def my_function():
# Code to profile
pass
# Profile just this function
cProfile.run('my_function()')
Technique 3: Profile with Decorator
import cProfile
import pstats
from functools import wraps
def profile_decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
return result
return wrapper
@profile_decorator
def my_function():
# Code to profile
pass
Technique 4: Profile Context Manager
import cProfile
import pstats
from contextlib import contextmanager
@contextmanager
def profile_context():
profiler = cProfile.Profile()
profiler.enable()
try:
yield profiler
finally:
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
# Use
with profile_context():
my_function()
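On Python 3.8 and later, cProfile.Profile is itself a context manager, so the wrapper above can be shortened to a plain with statement (my_function stands in for whatever code you want to profile):

import cProfile
import pstats

with cProfile.Profile() as profiler:
    my_function()

stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)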
Advanced Profiling
Call Graph Analysis
import cProfile
import pstats
def analyze_call_graph(profile_file):
stats = pstats.Stats(profile_file)
stats.print_callers() # Show who calls each function
stats.print_callees() # Show what each function calls
Time Distribution
import cProfile
import pstats
def analyze_time_distribution(profile_file):
stats = pstats.Stats(profile_file)
stats.sort_stats('cumulative')
# Print functions taking most time
stats.print_stats(0.1) # Top 10%
Module-Level Analysis
import cProfile
import pstats
def analyze_by_module(profile_file):
stats = pstats.Stats(profile_file)
# Group by module
stats.print_stats('module_name')
Practical Examples
Example 1: Profiling a Slow Function
import cProfile
import pstats
def slow_function():
result = 0
for i in range(1000000):
result += i ** 2
return result
# Profile
cProfile.run('slow_function()', 'slow.prof')
# Analyze
stats = pstats.Stats('slow.prof')
stats.sort_stats('cumulative')
stats.print_stats()
Example 2: Profiling Data Processing
import cProfile
import pstats
def process_large_dataset(data):
results = []
for item in data:
# Multiple operations
processed = transform(item)
filtered = filter_data(processed)
results.append(filtered)
return results
def transform(item):
return item * 2
def filter_data(item):
return item if item > 10 else None
# Profile
data = list(range(10000))
cProfile.run('process_large_dataset(data)', 'process.prof')
# Analyze
stats = pstats.Stats('process.prof')
stats.sort_stats('tottime')
stats.print_stats(20)
Example 3: Comparing Implementations
import cProfile
import pstats
def implementation1(data):
result = []
for item in data:
result.append(item * 2)
return result
def implementation2(data):
return [item * 2 for item in data]
# Profile both
data = list(range(100000))
cProfile.run('implementation1(data)', 'impl1.prof')
cProfile.run('implementation2(data)', 'impl2.prof')
# Compare
stats1 = pstats.Stats('impl1.prof')
stats2 = pstats.Stats('impl2.prof')
stats1.sort_stats('cumulative')
stats2.sort_stats('cumulative')
print("Implementation 1:")
stats1.print_stats(5)
print("\nImplementation 2:")
stats2.print_stats(5)
Using timeit for Micro-benchmarks
timeit Module
For small code snippets, use timeit:
import timeit
# Time a single expression
time = timeit.timeit('sum(range(100))', number=1000)
print(f"Time: {time} seconds")
# Time with setup
time = timeit.timeit(
'result = sum(data)',
setup='data = list(range(100))',
number=1000
)
print(f"Time: {time} seconds")
Comparing Approaches
import timeit
# Compare two approaches
approach1 = timeit.timeit(
'result = [x * 2 for x in range(1000)]',
number=1000
)
approach2 = timeit.timeit(
'result = []\nfor x in range(1000):\n result.append(x * 2)',
number=1000
)
print(f"List comprehension: {approach1}")
print(f"For loop: {approach2}")
Common Profiling Patterns
Pattern 1: Profile Before Optimization
import cProfile
import pstats
# Profile original
cProfile.run('original_function()', 'before.prof')
# Make changes
# ...
# Profile optimized
cProfile.run('optimized_function()', 'after.prof')
# Compare
stats_before = pstats.Stats('before.prof')
stats_after = pstats.Stats('after.prof')
Pattern 2: Profile Hot Paths
import cProfile
def main():
# Profile only the hot path
with cProfile.Profile() as profiler:
hot_path_function()
profiler.print_stats()
Pattern 3: Profile Specific Operations
import cProfile
def process_data(data):
profiler = cProfile.Profile()
# Profile expensive operation
profiler.enable()
result = expensive_operation(data)
profiler.disable()
profiler.print_stats()
return result
Interpreting Results
Key Metrics
- cumulative time: Total time including subcalls
- total time: Time in function excluding subcalls
- call count: How many times function was called
- per-call time: Average time per call
What to Optimize
Optimize functions that have:
- High cumulative time: Taking a lot of total time
- High call count: Called very frequently
- High per-call time: Each call is slow
Example Analysis
ncalls tottime percall cumtime percall filename:lineno(function)
100000 1.234 0.000 1.234 0.000 example.py:10(expensive_op)
1 0.001 0.001 1.235 1.235 example.py:5(process_data)
Analysis:
- expensive_op is called 100,000 times
- It accounts for 1.234 seconds in total
- Each individual call is very fast (rounds to 0.000 seconds)
- But the total time is high because of the call count
- Optimization: Reduce the call count or optimize the function (see the caching sketch below)
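One common way to reduce the call count is caching. If expensive_op is frequently called with the same arguments, functools.lru_cache can reuse earlier results; this is only a sketch under that assumption, and it does not help when every call sees a new argument:

from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_op(item):
    # stand-in for the expensive computation from the example above
    return sum(i * item for i in range(1000))

def process_data(data):
    # repeated items now hit the cache instead of being recomputed
    return [expensive_op(item) for item in data]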
Common Mistakes and Pitfalls
1. Profiling Wrong Code
# WRONG: Profiling setup code
import cProfile
cProfile.run('data = list(range(1000000))') # Setup, not actual work
# CORRECT: Profile actual work
cProfile.run('process_data(data)')
2. Not Sorting Results
# WRONG: Unsorted results
stats.print_stats()
# CORRECT: Sort by time
stats.sort_stats('cumulative')
stats.print_stats()
3. Ignoring Call Count
# WRONG: Only looking at time
# Function takes 0.001s but called 1,000,000 times = 1000s total!
# CORRECT: Consider both time and call count
stats.sort_stats('cumulative')
stats.print_stats()
4. Profiling in Wrong Environment
# WRONG: Profiling in development mode
# (Debug mode, different data, etc.)
# CORRECT: Profile in production-like environment
Best Practices
1. Profile Realistic Data
# Use production-like data sizes
data = load_production_data()
cProfile.run('process_data(data)')
2. Profile Multiple Times
# Profile multiple runs for consistency
for i in range(5):
cProfile.run('my_function()', f'run_{i}.prof')
3. Focus on Hot Paths
# Profile the code paths that matter
cProfile.run('hot_path_function()')
4. Compare Before and After
# Always compare optimizations
stats_before = pstats.Stats('before.prof')
stats_after = pstats.Stats('after.prof')
5. Use Appropriate Tools
# Use cProfile for function-level
# Use timeit for micro-benchmarks
# Use line_profiler for line-by-line
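cProfile reports at the function level. For per-line timings, the third-party line_profiler package (pip install line_profiler) is the usual choice: decorate the functions of interest with @profile and run the script through kernprof. A sketch, assuming the package is installed:

# Run with: kernprof -l -v my_script.py
# kernprof injects the 'profile' decorator, so no import is needed

@profile
def process_data(data):
    results = []
    for item in data:
        results.append(item * 2)  # per-line timings show up in the report
    return results

if __name__ == '__main__':
    process_data(list(range(100_000)))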
Practice Exercise
Exercise: Profiling
Objective: Create a Python program that demonstrates profiling.
Instructions:
1. Create files: profile_practice.py and test_code.py
2. Write a program that:
- Uses cProfile to profile code
- Identifies bottlenecks
- Analyzes profiling results
- Compares different implementations
- Demonstrates profiling techniques
3. Your program should include:
- Basic profiling
- Saving and loading profiles
- Analyzing results
- Identifying bottlenecks
- Comparing implementations
- Real-world examples
Example Solution:
"""
Profiling Practice
This program demonstrates profiling techniques.
"""
import cProfile
import pstats
import timeit
# Code to profile
def slow_sum(n):
"""Slow sum implementation."""
total = 0
for i in range(n):
total += i
return total
def fast_sum(n):
"""Fast sum implementation."""
return sum(range(n))
def process_data(data):
"""Process data with multiple operations."""
results = []
for item in data:
# Expensive operations
squared = item ** 2
cubed = item ** 3
results.append((squared, cubed))
return results
def nested_loops(n):
"""Function with nested loops."""
result = []
for i in range(n):
for j in range(n):
result.append(i * j)
return result
def recursive_function(n):
"""Recursive function."""
if n <= 1:
return 1
return n * recursive_function(n - 1)
print("=" * 60)
print("PROFILING PRACTICE")
print("=" * 60)
print()
# 1. Basic profiling
print("1. BASIC PROFILING")
print("-" * 60)
cProfile.run('slow_sum(1000000)')
print()
# 2. Save profile to file
print("2. SAVE PROFILE TO FILE")
print("-" * 60)
cProfile.run('slow_sum(1000000)', 'slow_sum.prof')
print("Profile saved to slow_sum.prof")
print()
# 3. Load and analyze profile
print("3. LOAD AND ANALYZE PROFILE")
print("-" * 60)
stats = pstats.Stats('slow_sum.prof')
stats.sort_stats('cumulative')
stats.print_stats(5)
print()
# 4. Sort by different criteria
print("4. SORT BY DIFFERENT CRITERIA")
print("-" * 60)
stats = pstats.Stats('slow_sum.prof')
print("Sorted by cumulative time:")
stats.sort_stats('cumulative')
stats.print_stats(3)
print()
print("Sorted by total time:")
stats.sort_stats('tottime')
stats.print_stats(3)
print()
print("Sorted by call count:")
stats.sort_stats('calls')
stats.print_stats(3)
print()
# 5. Profile data processing
print("5. PROFILE DATA PROCESSING")
print("-" * 60)
data = list(range(1000))
cProfile.run('process_data(data)', 'process.prof')
stats = pstats.Stats('process.prof')
stats.sort_stats('cumulative')
stats.print_stats(10)
print()
# 6. Compare implementations
print("6. COMPARE IMPLEMENTATIONS")
print("-" * 60)
cProfile.run('slow_sum(100000)', 'slow.prof')
cProfile.run('fast_sum(100000)', 'fast.prof')
stats_slow = pstats.Stats('slow.prof')
stats_fast = pstats.Stats('fast.prof')
print("Slow implementation:")
stats_slow.sort_stats('cumulative')
stats_slow.print_stats(3)
print()
print("Fast implementation:")
stats_fast.sort_stats('cumulative')
stats_fast.print_stats(3)
print()
# 7. Profile nested loops
print("7. PROFILE NESTED LOOPS")
print("-" * 60)
cProfile.run('nested_loops(100)', 'nested.prof')
stats = pstats.Stats('nested.prof')
stats.sort_stats('cumulative')
stats.print_stats(5)
print()
# 8. Profile recursive function
print("8. PROFILE RECURSIVE FUNCTION")
print("-" * 60)
cProfile.run('recursive_function(20)', 'recursive.prof')
stats = pstats.Stats('recursive.prof')
stats.sort_stats('cumulative')
stats.print_stats(10)
print()
# 9. Using timeit for micro-benchmarks
print("9. USING timeit FOR MICRO-BENCHMARKS")
print("-" * 60)
time1 = timeit.timeit('sum(range(1000))', number=10000)
time2 = timeit.timeit(
'total = 0\nfor i in range(1000):\n total += i',
number=10000
)
print(f"Built-in sum: {time1:.6f} seconds")
print(f"For loop: {time2:.6f} seconds")
print()
# 10. Profile with context manager
print("10. PROFILE WITH CONTEXT MANAGER")
print("-" * 60)
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(5)
print()
# 11. Call graph analysis
print("11. CALL GRAPH ANALYSIS")
print("-" * 60)
stats = pstats.Stats('process.prof')
print("Callers (who calls each function):")
stats.print_callers(5)
print()
# 12. Filter by module
print("12. FILTER BY MODULE")
print("-" * 60)
stats = pstats.Stats('slow_sum.prof')
print("Functions in current module:")
stats.print_stats('__main__')
print()
# 13. Identify bottlenecks
print("13. IDENTIFY BOTTLENECKS")
print("-" * 60)
stats = pstats.Stats('nested.prof')
stats.sort_stats('cumulative')
print("Top time consumers:")
stats.print_stats(0.1) # Top 10%
print()
# 14. Profile decorator
print("14. PROFILE DECORATOR")
print("-" * 60)
from functools import wraps
def profile_decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(5)
return result
return wrapper
@profile_decorator
def test_function():
return slow_sum(100000)
result = test_function()
print()
# 15. Real-world: Profile complete workflow
print("15. REAL-WORLD: PROFILE COMPLETE WORKFLOW")
print("-" * 60)
def complete_workflow():
data = list(range(1000))
processed = process_data(data)
total = sum(x[0] for x in processed)
return total
cProfile.run('complete_workflow()', 'workflow.prof')
stats = pstats.Stats('workflow.prof')
stats.sort_stats('cumulative')
stats.print_stats(10)
print()
print("=" * 60)
print("PRACTICE COMPLETE!")
print("=" * 60)
Expected Output (truncated):
============================================================
PROFILING PRACTICE
============================================================
1. BASIC PROFILING
------------------------------------------------------------
         4 function calls in 0.123 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.123    0.123 <string>:1(<module>)
        1    0.123    0.123    0.123    0.123 profile_practice.py:8(slow_sum)
        1    0.000    0.000    0.123    0.123 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
[... rest of output ...]
Challenge (Optional):
- Profile a real application
- Identify and fix performance bottlenecks
- Compare multiple optimization strategies
- Create a profiling report generator
- Build a performance monitoring system
Key Takeaways
- Profiling - measuring where code spends time
- cProfile module - Python's profiling tool
- Profile output - ncalls, tottime, cumtime, etc.
- Identifying bottlenecks - high time, high call count
- pstats module - analyze profile results
- Sorting results - by time, calls, name
- Saving profiles - save to file for later analysis
- Comparing implementations - profile before/after
- timeit module - for micro-benchmarks
- Call graph - understand call hierarchy
- Best practices - profile realistic data, compare results
- Common bottlenecks - loops, I/O, string ops, recursion
- Interpretation - understand what metrics mean
- Optimization focus - optimize what matters
- Data-driven - make decisions based on profiling data
Quiz: Profiling
Test your understanding with these questions:
1. What is profiling?
- A) Writing code
- B) Measuring where code spends time
- C) Debugging code
- D) Testing code
2. What module provides profiling in Python?
- A) profile
- B) cProfile
- C) time
- D) debug
3. What does tottime mean?
- A) Total time including subcalls
- B) Time in function excluding subcalls
- C) Time per call
- D) Cumulative time
4. What does cumtime mean?
- A) Total time including subcalls
- B) Time in function excluding subcalls
- C) Time per call
- D) Call count
5. What should you optimize?
- A) Everything
- B) Functions with high cumulative time
- C) Nothing
- D) Only slow functions
6. What is ncalls?
- A) Number of calls
- B) Time per call
- C) Total time
- D) Function name
7. How do you save profile results?
- A) cProfile.save()
- B) cProfile.run('code', 'filename.prof')
- C) cProfile.write()
- D) Can't save
8. What module analyzes profile results?
- A) analyze
- B) pstats
- C) stats
- D) profile
9. What is timeit used for?
- A) Profiling
- B) Micro-benchmarks
- C) Timing
- D) All of the above
10. What should you profile?
- A) Everything
- B) Hot paths and bottlenecks
- C) Nothing
- D) Only slow code
Answers:
1. B) Measuring where code spends time (profiling definition)
2. B) cProfile (Python's profiling module)
3. B) Time in function excluding subcalls (tottime definition)
4. A) Total time including subcalls (cumtime definition)
5. B) Functions with high cumulative time (what to optimize)
6. A) Number of calls (ncalls definition)
7. B) cProfile.run('code', 'filename.prof') (saving profiles)
8. B) pstats (profile analysis module)
9. B) Micro-benchmarks (timeit purpose)
10. B) Hot paths and bottlenecks (what to profile)
Next Steps
Excellent work! You've mastered profiling. You now understand:
- The cProfile module
- How to identify bottlenecks
- How to analyze profiling results
- Profiling techniques
What's Next?
- Lesson 18.2: Optimization Techniques
- Learn algorithm optimization
- Understand caching
- Explore optimization strategies
Additional Resources
- cProfile: docs.python.org/3/library/profile.html
- pstats: docs.python.org/3/library/profile.html#module-pstats
- timeit: docs.python.org/3/library/timeit.html
Lesson completed! You're ready to move on to the next lesson.