I’m getting started on some projects that will make use of extensive Python performance profiling, unfortunately Python doesn’t focus on performance and so doesn’t have benchmark tools like I might find in Go. I’ve noticed that the two most important usages I’m looking at when profiling are speed and memory usage. For the latter, I simply use memory_profiler from the command line - which is pretty straight forward. However for speed usage, I did find a snippet that I thought would be useful to include and update depending on how my usage changes.

import cProfile

from pstats import Stats
from functools import wraps


def sprofile(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()

        result = func(*args, **kwargs)

        pr.disable()
        Stats(pr).strip_dirs().sort_stats('cumulative').print_stats(20)
        return result

    return wrapper

This decorator allows you to profile the speed performance of functions on the stack below the function being decorated. It uses standard library dependencies, which is great, and you can change the way the stats are printed out to suit your needs (e.g. this is formatted well for my analysis style).

The report it prints out is as follows:

        7636523 function calls (7636479 primitive calls) in 14.669 seconds

   Ordered by: cumulative time
   List reduced from 306 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.669   14.669 sequential.py:107(run)
      150    2.584    0.017   14.633    0.098 sequential.py:75(step)
   843750   10.228    0.000   10.335    0.000 grid.py:72(neighborhood_sum)
   843750    0.765    0.000    0.988    0.000 grid.py:129(__setitem__)
   843750    0.529    0.000    0.726    0.000 grid.py:124(__getitem__)
  2531454    0.275    0.000    0.275    0.000 {built-in method builtins.isinstance}
  1687783    0.145    0.000    0.145    0.000 {built-in method builtins.len}
   843750    0.107    0.000    0.107    0.000 grid.py:57(adjacency)
      151    0.001    0.000    0.020    0.000 std.py:1099(__iter__)
       82    0.000    0.000    0.019    0.000 std.py:1317(refresh)
       83    0.000    0.000    0.017    0.000 std.py:1447(display)
       83    0.000    0.000    0.015    0.000 std.py:1089(__repr__)
      9/4    0.000    0.000    0.014    0.004 <frozen importlib._bootstrap>:978(_find_and_load)
      9/4    0.000    0.000    0.014    0.004 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
       83    0.002    0.000    0.014    0.000 std.py:310(format_meter)
      9/4    0.000    0.000    0.013    0.003 <frozen importlib._bootstrap>:663(_load_unlocked)
     17/6    0.000    0.000    0.011    0.002 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.000    0.000    0.010    0.010 std.py:511(__new__)
        1    0.000    0.000    0.009    0.009 std.py:623(get_lock)
        1    0.000    0.000    0.009    0.009 std.py:79(__init__)

You can see in this report that the majority of time is being spent in the neighborhood_sum function from line 3 and that the step function calls it nearly 5,625 times!