Code Review

Learning Objectives

By the end of this lesson, you will be able to:
- Understand the code review process
- Identify common code issues
- Provide constructive feedback
- Review code for quality, security, and performance
- Check for PEP 8 compliance
- Review documentation and tests
- Apply code review best practices
- Use code review tools
- Conduct effective code reviews
- Respond to code review feedback
- Improve code through reviews
- Debug code review issues

Lesson 24.3: Code Review

Learning Objectives

By the end of this lesson, you will be able to:

Understand the code review process
Identify common code issues
Provide constructive feedback
Review code for quality, security, and performance
Check for PEP 8 compliance
Review documentation and tests
Apply code review best practices
Use code review tools
Conduct effective code reviews
Respond to code review feedback
Improve code through reviews
Debug code review issues

Introduction to Code Review

Code Review is the systematic examination of code by one or more developers to identify bugs, improve code quality, and share knowledge.

Benefits:

Quality: Catch bugs before production
Knowledge Sharing: Team members learn from each other
Consistency: Maintain coding standards
Security: Identify security vulnerabilities
Performance: Find performance issues
Best Practices: Ensure code follows best practices

Types of Code Review:

Formal: Structured review with checklist
Informal: Quick review for small changes
Pair Programming: Real-time review
Automated: Using tools and linters

Review Process

Code Review Workflow

# Typical workflow:
# 1. Developer creates pull request (PR)
# 2. Automated checks run (tests, linters)
# 3. Reviewers assigned
# 4. Reviewers examine code
# 5. Feedback provided
# 6. Developer addresses feedback
# 7. Re-review if needed
# 8. Approval and merge

What to Review

Functionality:

Does the code work correctly?
Are edge cases handled?
Are error cases handled?
Does it meet requirements?

Code Quality:

Is code readable and maintainable?
Does it follow PEP 8?
Are there code smells?
Is code well-organized?

Performance:

Are there performance issues?
Are algorithms efficient?
Are there unnecessary operations?
Is memory usage reasonable?

Security:

Are there security vulnerabilities?
Is input validated?
Are secrets handled properly?
Is SQL injection prevented?

Testing:

Are there tests?
Do tests cover edge cases?
Are tests maintainable?
Is test coverage adequate?

Documentation:

Is code documented?
Are docstrings present?
Are comments helpful?
Is README updated?

Common Issues

Code Quality Issues

# Issue 1: Magic numbers
# BAD
if user.age > 65:
    discount = 0.15

# GOOD
SENIOR_CITIZEN_AGE = 65
SENIOR_DISCOUNT = 0.15
if user.age > SENIOR_CITIZEN_AGE:
    discount = SENIOR_DISCOUNT

# Issue 2: Long functions
# BAD: Function does too much
def process_order(order):
    # Validate order
    # Calculate total
    # Apply discounts
    # Process payment
    # Send confirmation
    # Update inventory
    # ... 100 lines of code
    pass

# GOOD: Break into smaller functions
def process_order(order):
    validate_order(order)
    total = calculate_total(order)
    apply_discounts(order, total)
    process_payment(order, total)
    send_confirmation(order)
    update_inventory(order)

# Issue 3: Duplicate code
# BAD: Code duplication
def calculate_user_total(user):
    total = 0
    for item in user.items:
        total += item.price * item.quantity
    return total

def calculate_order_total(order):
    total = 0
    for item in order.items:
        total += item.price * item.quantity
    return total

# GOOD: Extract common logic
def calculate_total(items):
    return sum(item.price * item.quantity for item in items)

def calculate_user_total(user):
    return calculate_total(user.items)

def calculate_order_total(order):
    return calculate_total(order.items)

Security Issues

# Issue 1: SQL Injection
# BAD: Vulnerable to SQL injection
def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    # Vulnerable!

# GOOD: Use parameterized queries
def get_user(username):
    query = "SELECT * FROM users WHERE username = ?"
    cursor.execute(query, (username,))

# Issue 2: Hardcoded secrets
# BAD: Secrets in code
API_KEY = "sk-1234567890abcdef"
PASSWORD = "admin123"

# GOOD: Use environment variables
import os
API_KEY = os.environ.get('API_KEY')
PASSWORD = os.environ.get('PASSWORD')

# Issue 3: No input validation
# BAD: No validation
def process_payment(amount):
    charge_card(amount)  # What if amount is negative?

# GOOD: Validate input
def process_payment(amount):
    if amount <= 0:
        raise ValueError('Amount must be positive')
    if amount > MAX_PAYMENT:
        raise ValueError(f'Amount exceeds maximum: {MAX_PAYMENT}')
    charge_card(amount)

Performance Issues

# Issue 1: Inefficient algorithms
# BAD: O(n²) complexity
def find_duplicates(items):
    duplicates = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                duplicates.append(items[i])
    return duplicates

# GOOD: O(n) complexity
def find_duplicates(items):
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)
        seen.add(item)
    return duplicates

# Issue 2: Unnecessary operations
# BAD: Multiple database queries in loop
def get_user_posts(user_ids):
    posts = []
    for user_id in user_ids:
        user = get_user(user_id)  # N queries!
        user_posts = get_posts(user_id)  # N queries!
        posts.extend(user_posts)
    return posts

# GOOD: Batch queries
def get_user_posts(user_ids):
    users = get_users(user_ids)  # 1 query
    posts = get_posts_by_users(user_ids)  # 1 query
    return posts

# Issue 3: Memory inefficiency
# BAD: Loading all data into memory
def process_large_file(filename):
    with open(filename) as f:
        all_lines = f.readlines()  # Loads entire file!
    for line in all_lines:
        process(line)

# GOOD: Process line by line
def process_large_file(filename):
    with open(filename) as f:
        for line in f:
            process(line)

Error Handling Issues

# Issue 1: Silent failures
# BAD: Errors are ignored
def process_data(data):
    try:
        result = complex_operation(data)
    except:
        pass  # Error silently ignored!

# GOOD: Proper error handling
def process_data(data):
    try:
        result = complex_operation(data)
        return result
    except ValueError as e:
        logger.error(f'Invalid data: {e}')
        raise
    except Exception as e:
        logger.error(f'Unexpected error: {e}')
        raise

# Issue 2: Too broad exception handling
# BAD: Catching all exceptions
try:
    risky_operation()
except Exception:  # Too broad!
    handle_error()

# GOOD: Catch specific exceptions
try:
    risky_operation()
except ValueError:
    handle_value_error()
except ConnectionError:
    handle_connection_error()
except Exception as e:
    logger.error(f'Unexpected error: {e}')
    raise

# Issue 3: No error handling
# BAD: No error handling
def divide(a, b):
    return a / b  # What if b is 0?

# GOOD: Handle errors
def divide(a, b):
    if b == 0:
        raise ValueError('Cannot divide by zero')
    return a / b

Documentation Issues

# Issue 1: Missing docstrings
# BAD: No documentation
def calculate(x, y):
    return x * y + 10

# GOOD: Documented
def calculate(x, y):
    """Calculate value using formula.

    Args:
        x: First value
        y: Second value

    Returns:
        Calculated result: x * y + 10
    """
    return x * y + 10

# Issue 2: Outdated comments
# BAD: Comment doesn't match code
# This function calculates the sum
def calculate(x, y):
    return x * y  # Actually multiplies!

# GOOD: Accurate comments
# This function multiplies two numbers
def calculate(x, y):
    return x * y

# Issue 3: No type hints
# BAD: No type information
def process(data, count):
    # What types are data and count?
    pass

# GOOD: Type hints
def process(data: List[Dict], count: int) -> bool:
    """Process data."""
    pass

Best Practices

For Reviewers

Be Constructive:

# BAD: Negative feedback
# This code is terrible!

# GOOD: Constructive feedback
# Consider extracting this logic into a separate function
# for better reusability and testability.

Be Specific:

# BAD: Vague feedback
# This could be better

# GOOD: Specific feedback
# This function is 50 lines long. Consider breaking it into
# smaller functions: validate_input(), process_data(), format_output()

Ask Questions:

# GOOD: Ask clarifying questions
# Why was this approach chosen over using the existing utility function?
# Is there a specific reason for this implementation?

Suggest Improvements:

# GOOD: Provide suggestions
# Consider using a dictionary lookup instead of multiple if-else statements:
# STATUS_MAP = {'pending': 1, 'active': 2, 'completed': 3}
# status_code = STATUS_MAP.get(status, 0)

Check the Big Picture:

Does this fit with the overall architecture?
Are there similar patterns elsewhere in the codebase?
Does this introduce technical debt?

For Authors

Keep PRs Small:

Easier to review
Faster feedback
Less context switching

Provide Context:

Clear PR description
Link to related issues
Explain design decisions

Respond to Feedback:

Acknowledge all comments
Ask for clarification if needed
Make requested changes or explain why not

Test Your Code:

Write tests
Test edge cases
Verify manually

Code Review Checklist

Functionality

[ ] Code works as intended
[ ] Edge cases are handled
[ ] Error cases are handled
[ ] Requirements are met
[ ] No breaking changes (if applicable)

Code Quality

[ ] Follows PEP 8
[ ] Code is readable
[ ] No code duplication
[ ] Functions are focused
[ ] Proper naming conventions

Security

[ ] No hardcoded secrets
[ ] Input validation present
[ ] SQL injection prevented
[ ] XSS prevention (if web)
[ ] Authentication/authorization correct

Performance

[ ] No obvious performance issues
[ ] Efficient algorithms used
[ ] No unnecessary operations
[ ] Memory usage reasonable

Testing

[ ] Tests are present
[ ] Tests cover edge cases
[ ] Tests are maintainable
[ ] Test coverage adequate

Documentation

[ ] Docstrings present
[ ] Comments are helpful
[ ] README updated (if needed)
[ ] Type hints used (if applicable)

Code Review Tools

Linters

# flake8 - Style guide enforcement
pip install flake8
flake8 myfile.py

# pylint - Comprehensive linter
pip install pylint
pylint myfile.py

# black - Code formatter
pip install black
black myfile.py

# mypy - Type checker
pip install mypy
mypy myfile.py

Automated Review

# Pre-commit hooks
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 4.0.1
    hooks:
      - id: flake8

Practical Examples

Example 1: Reviewing a Function

# CODE TO REVIEW:
def process_users(users):
    result = []
    for u in users:
        if u.age > 18:
            if u.email:
                if '@' in u.email:
                    result.append(u.email.upper())
    return result

# REVIEW FEEDBACK:
"""
Issues found:
1. Function name is vague - what does "process" mean?
2. Nested if statements make code hard to read
3. Email validation is incomplete (just checks for '@')
4. No error handling if users is None or empty
5. No docstring
6. No type hints
7. Magic number (18) should be a constant

Suggested improvements:
- Rename function to be more specific
- Extract email validation to separate function
- Use early returns or list comprehension
- Add proper email validation
- Add docstring and type hints
- Define constant for minimum age
"""

# IMPROVED CODE:
from typing import List, Optional
import re

MIN_AGE = 18

def get_valid_user_emails(users: List[dict]) -> List[str]:
    """Get uppercase emails for users over minimum age.

    Args:
        users: List of user dictionaries with 'age' and 'email' keys

    Returns:
        List of uppercase email addresses for valid users

    Example:
        >>> users = [{'age': 25, 'email': 'user@example.com'}]
        >>> get_valid_user_emails(users)
        ['USER@EXAMPLE.COM']
    """
    if not users:
        return []

    valid_emails = []
    for user in users:
        if user.get('age', 0) > MIN_AGE:
            email = user.get('email')
            if email and is_valid_email(email):
                valid_emails.append(email.upper())

    return valid_emails

def is_valid_email(email: str) -> bool:
    """Validate email address format.

    Args:
        email: Email address to validate

    Returns:
        True if email is valid, False otherwise
    """
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

Example 2: Reviewing a Class

# CODE TO REVIEW:
class DataProcessor:
    def __init__(self):
        self.data = []

    def add(self, x):
        self.data.append(x)

    def process(self):
        total = 0
        for x in self.data:
            total = total + x
        return total / len(self.data)

# REVIEW FEEDBACK:
"""
Issues found:
1. No error handling for division by zero
2. No type hints
3. No docstrings
4. Variable name 'x' is not descriptive
5. Could use sum() instead of manual loop
6. No validation of input data

Suggested improvements:
- Add error handling
- Add type hints and docstrings
- Use more descriptive variable names
- Use built-in functions where possible
- Add input validation
"""

# IMPROVED CODE:
from typing import List
from statistics import mean

class DataProcessor:
    """Processes numeric data and calculates statistics."""

    def __init__(self):
        """Initialize DataProcessor with empty data list."""
        self.data: List[float] = []

    def add(self, value: float) -> None:
        """Add a value to the data set.

        Args:
            value: Numeric value to add
        """
        if not isinstance(value, (int, float)):
            raise TypeError('Value must be numeric')
        self.data.append(float(value))

    def process(self) -> float:
        """Calculate mean of all values.

        Returns:
            Mean of all values in data set

        Raises:
            ValueError: If data set is empty
        """
        if not self.data:
            raise ValueError('Cannot process empty data set')

        return mean(self.data)

Practice Exercise

Exercise: Code Review

Objective: Review provided code and identify issues.

Instructions:

Review the provided code
Identify issues (quality, security, performance, documentation)
Provide constructive feedback
Suggest improvements
Refactor the code

Example Solution:

"""
Code Review Exercise
Review and improve this code
"""

# ORIGINAL CODE (with issues):
def process_orders(orders):
    results = []
    for o in orders:
        total = 0
        for item in o.items:
            total = total + item.price * item.quantity
        if total > 100:
            total = total * 0.9
        results.append({'order_id': o.id, 'total': total})
    return results

# REVIEW FEEDBACK:
"""
Issues identified:
1. No type hints
2. No docstring
3. Magic number (100, 0.9) should be constants
4. Variable names are too short (o, item)
5. Manual sum calculation instead of using sum()
6. No error handling
7. No input validation
8. Hardcoded discount logic
9. Could use list comprehension or generator
10. No handling of edge cases (empty orders, None values)

Improvements:
- Add type hints and docstrings
- Extract constants
- Use descriptive variable names
- Use built-in functions
- Add error handling
- Extract discount logic
- Add input validation
"""

# IMPROVED CODE:
from typing import List, Dict, Optional
from dataclasses import dataclass

# Constants
DISCOUNT_THRESHOLD = 100.0
DISCOUNT_RATE = 0.9

@dataclass
class Item:
    """Represents an order item."""
    price: float
    quantity: int

@dataclass
class Order:
    """Represents an order."""
    id: int
    items: List[Item]

def calculate_item_total(item: Item) -> float:
    """Calculate total for a single item.

    Args:
        item: Order item with price and quantity

    Returns:
        Total price for the item
    """
    return item.price * item.quantity

def calculate_order_total(order: Order) -> float:
    """Calculate total for an order.

    Args:
        order: Order with items

    Returns:
        Total price for the order
    """
    if not order.items:
        return 0.0

    subtotal = sum(calculate_item_total(item) for item in order.items)
    return subtotal

def apply_discount(total: float) -> float:
    """Apply discount if threshold is met.

    Args:
        total: Order total

    Returns:
        Total with discount applied if applicable
    """
    if total > DISCOUNT_THRESHOLD:
        return total * DISCOUNT_RATE
    return total

def process_orders(orders: List[Order]) -> List[Dict[str, float]]:
    """Process orders and calculate totals with discounts.

    Calculates the total for each order and applies a discount
    if the total exceeds the threshold.

    Args:
        orders: List of Order objects to process

    Returns:
        List of dictionaries with 'order_id' and 'total' keys

    Raises:
        ValueError: If orders is None or contains invalid data

    Example:
        >>> items = [Item(price=50.0, quantity=2)]
        >>> order = Order(id=1, items=items)
        >>> results = process_orders([order])
        >>> results[0]['total']
        90.0
    """
    if orders is None:
        raise ValueError('Orders cannot be None')

    if not orders:
        return []

    results = []
    for order in orders:
        if order is None:
            continue

        try:
            total = calculate_order_total(order)
            discounted_total = apply_discount(total)

            results.append({
                'order_id': order.id,
                'total': round(discounted_total, 2)
            })
        except (AttributeError, TypeError) as e:
            # Log error and skip invalid order
            print(f'Error processing order {order.id if order else "unknown"}: {e}')
            continue

    return results

# Additional improvements: Add tests
def test_process_orders():
    """Test process_orders function."""
    # Test with empty list
    assert process_orders([]) == []

    # Test with single order
    items = [Item(price=50.0, quantity=2)]
    order = Order(id=1, items=items)
    results = process_orders([order])
    assert results[0]['total'] == 90.0  # 100 * 0.9

    # Test with order below threshold
    items = [Item(price=10.0, quantity=5)]
    order = Order(id=2, items=items)
    results = process_orders([order])
    assert results[0]['total'] == 50.0  # No discount

    print('All tests passed!')

if __name__ == '__main__':
    test_process_orders()

Expected Output: Improved code with all issues addressed.

Challenge (Optional):

Review a larger codebase
Create a code review checklist
Set up automated code review tools
Practice pair programming
Review open-source projects

Key Takeaways

Code Review - Systematic code examination
Review Process - Structured workflow
Common Issues - Quality, security, performance, errors, documentation
Constructive Feedback - Be helpful and specific
Checklist - Use checklist for thorough review
Tools - Use linters and automated tools
Best Practices - For reviewers and authors
Security - Check for vulnerabilities
Performance - Identify performance issues
Documentation - Ensure code is documented
Testing - Verify tests are present and adequate
Code Quality - Check readability and maintainability
Consistency - Ensure code follows standards
Learning - Code review is a learning opportunity
Collaboration - Improves team collaboration

Quiz: Code Review

Test your understanding with these questions:

What is code review?
- A) Testing code
- B) Systematic code examination
- C) Writing code
- D) Debugging code
What should reviewers be?
- A) Critical
- B) Constructive
- C) Negative
- D) Silent
What is a code smell?
- A) Bad code pattern
- B) Code that works
- C) Commented code
- D) Test code
What should be reviewed?
- A) Functionality
- B) Code quality
- C) Security
- D) All of the above
What prevents SQL injection?
- A) Input validation
- B) Parameterized queries
- C) Escaping
- D) All of the above
What is a magic number?
- A) Hardcoded numeric value
- B) Random number
- C) Large number
- D) Prime number
What checks code style?
- A) Linter
- B) Formatter
- C) Type checker
- D) All of the above
What should PRs be?
- A) Large
- B) Small
- C) Complex
- D) Simple
What should comments explain?
- A) What code does
- B) Why code does it
- C) How code works
- D) All of the above
What improves through code review?
- A) Code quality
- B) Team knowledge
- C) Code consistency
- D) All of the above

Answers:

B) Systematic code examination (code review definition)
B) Constructive (reviewer approach)
A) Bad code pattern (code smell)
D) All of the above (review scope)
D) All of the above (SQL injection prevention)
A) Hardcoded numeric value (magic number)
D) All of the above (code checking tools)
B) Small (PR size)
B) Why code does it (comment best practice)
D) All of the above (code review benefits)

Next Steps

Excellent work! You've mastered code review. You now understand:

Review process
Common issues
Best practices
How to conduct effective code reviews

What's Next?

Module 25: Design Patterns
Lesson 25.1: Creational Patterns
Learn design patterns
Apply patterns to code

Additional Resources

Code Review Best Practices: Industry best practices
Google Code Review Guide: google.github.io/eng-practices/review/
Code Review Checklist: Create your own checklist
Linting Tools: flake8, pylint, black documentation

Lesson completed! You're ready to move on to the next module.

Previous: Code Documentation Next: Creational Patterns

Course Navigation

Code Quality

Design Patterns