When benchmarking Python programs, it is very common for me to use memory_profiler
from the command line - e.g. mprof run python myscript.py
. This creates a .dat file in the current working directory which you can view with mprof show
. More often than not, though I want to compare two different runs for their memory profiles or do things like annotate the graphs with different timing benchmarks. This requires generating my own figures, which requires loading the memory profiler data myself.
I’m sure that the memory_profiler
library probably has some utility functions to do this, but the simplest for me is to load things into a Pandas series. The mprof
command keeps track of real timestamps, so in order to do comparisons, I have to reindex the time series based on the starting timestamp reference. The code snippet is as follows:
import pandas as pd
def load_mprofile(path, name=None):
ref = None
times, values = [], []
with open(path, 'r') as f:
for line in f:
if line.startswith("CMDLINE"):
if name is None:
name = line.rstrip("CMDLINE").strip()
if line.startswith("MEM"):
parts = line.split()
val, ts = float(parts[1]), float(parts[2])
if ref is None:
ref = ts
times.append(ts-ref)
values.append(val)
return pd.Series(values, index=times, name=name)
Using this loader, I can compare two memory profiling sequences as follows:
import os
import matplotlib.pyplot as plt
def plot_mprofiles(directory=".", ax=None):
if ax is None:
_, ax = plt.subplots(figsize=(9,6))
for name in os.listdir(directory):
s = load_mprofile(os.path.join(directory, name))
s.plot(ax=ax)
ax.legend()
return ax
Pretty straight forward, but a useful snippet to be able to lookup at a glance.