Read mprofile Output into Pandas

When benchmarking Python programs, I very often use memory_profiler from the command line, e.g. mprof run python myscript.py. This creates a .dat file in the current working directory, which you can view with mprof show. More often than not, though, I want to compare the memory profiles of two different runs or annotate the graphs with different timing benchmarks. This requires generating my own figures, which in turn requires loading the memory profiler data myself. ...
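As a rough illustration of what loading that data can look like, here is a minimal sketch rather than the code from the post. It assumes the .dat file holds one sample per line in the form `MEM <memory in MiB> <unix timestamp>` alongside other record types such as `CMDLINE`, which it skips. The function name `read_mprof` and the column names are hypothetical.

```python
import io

import pandas as pd


def read_mprof(source):
    """Load MEM samples from an mprof .dat file into a DataFrame.

    `source` is either a path or an open text stream. The assumed line
    format is "MEM <memory in MiB> <unix timestamp>"; every other record
    type (e.g. CMDLINE) is ignored.
    """
    fobj = open(source, "r") if isinstance(source, str) else source
    rows = []
    try:
        for line in fobj:
            parts = line.split()
            if len(parts) == 3 and parts[0] == "MEM":
                rows.append((float(parts[1]), float(parts[2])))
    finally:
        # Only close the file if this function opened it
        if isinstance(source, str):
            fobj.close()

    df = pd.DataFrame(rows, columns=["mem_mib", "timestamp"])
    if not df.empty:
        # Seconds since the first sample, handy for overlaying two runs
        df["elapsed"] = df["timestamp"] - df["timestamp"].iloc[0]
    return df


# Made-up sample data standing in for a real mprofile .dat file
sample = io.StringIO(
    "CMDLINE python myscript.py\n"
    "MEM 10.5 100.0\n"
    "MEM 12.0 100.1\n"
    "MEM 11.25 100.3\n"
)
df = read_mprof(sample)
print(df)
```

Indexing on the elapsed column rather than the raw timestamp is one way to plot two runs on the same axes.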

July 27, 2020 · 2 min · 253 words · Benjamin Bengfort

Basic Python Profiling

I’m getting started on some projects that will make extensive use of Python performance profiling. Unfortunately, Python doesn’t focus on performance and so doesn’t have benchmark tools like the ones I might find in Go. I’ve noticed that the two most important things I look at when profiling are speed and memory usage. For the latter, I simply use memory_profiler from the command line, which is pretty straightforward. For speed, however, I did find a snippet that I thought would be useful to include and update as my usage changes. ...
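The snippet the post refers to is not shown in this excerpt. For orientation only, a generic timing helper might look like the sketch below, which wraps a function and reports wall-clock time using the standard library's `time.perf_counter`. The names `timeit` and `sum_squares` are hypothetical and chosen for illustration.

```python
import time
from functools import wraps


def timeit(func):
    """Decorator that returns (result, elapsed_seconds) for each call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timeit
def sum_squares(n):
    # Toy workload used only to have something to time
    return sum(i * i for i in range(n))


result, elapsed = sum_squares(1000)
print(f"sum_squares(1000) = {result} in {elapsed:.6f} seconds")
```

A single wall-clock measurement is noisy, so repeating the call and averaging is usually worthwhile for anything short-running.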

July 14, 2020 · 2 min · 370 words · Benjamin Bengfort

Launching a JupyterHub Instance

In this post I walk through the steps of creating a multi-user JupyterHub server running on an AWS Ubuntu 18.04 instance. There are many ways of setting up JupyterHub, including using Docker and Kubernetes, but this is a pretty straightforward mechanism that doesn’t have too many moving parts such as TLS termination proxies. I think of this as the baseline setup. Note that this setup has a few pros or cons depending on how you look at them. ...

October 9, 2019 · 8 min · 1644 words · Benjamin Bengfort

Mount an EBS volume

Once the EBS volume has been created and attached to the instance, ssh into the instance and list the available disks:

    $ lsblk
    NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    loop0         7:0    0 86.9M  1 loop /snap/core/4917
    loop1         7:1    0 12.6M  1 loop /snap/amazon-ssm-agent/295
    loop2         7:2    0   91M  1 loop /snap/core/6350
    loop3         7:3    0   18M  1 loop /snap/amazon-ssm-agent/930
    nvme0n1     259:0    0  300G  0 disk
    nvme1n1     259:1    0    8G  0 disk
    └─nvme1n1p1 259:2    0    8G  0 part /

In the above case we want to mount nvme0n1, a 300GB gp2 EBS volume. Check if the volume already has data in it (e.g. created from a snapshot or being attached to a new instance): ...

February 5, 2019 · 2 min · 323 words · Benjamin Bengfort

Visual Diagnostics for More Effective Machine Learning

Modeling is often treated as a search activity: find some combination of features, algorithm, and hyperparameters that yields the best score after cross-validation. In this talk, we will explore how to steer the model selection process with visual diagnostics and the Yellowbrick library, leading to more effective and more interpretable results and faster experimental workflows. ...

January 10, 2019 · 1 min · 63 words · Benjamin Bengfort

Blast Throughput

Blast throughput is what we call a throughput measurement in which N requests are sent to the server simultaneously and the time taken to receive responses for all N requests is recorded. The throughput is computed as N/duration, where duration is in seconds. This is the typical and potentially correct way to measure throughput from a client to a server. However, issues do arise in distributed systems land: the requests must all originate from a single client; high-latency response outliers can skew results; you must be confident that N is big enough to max out the server; and N mustn’t be so big as to create non-server-related bottlenecks. In this post I’ll discuss my implementation of the blast workload as well as an issue that came up with many concurrent connections in gRPC. This led me down the path of using one connection for blast throughput testing, which led to other issues that I’ll discuss later. ...
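To make the N/duration computation concrete, here is a minimal sketch of the measurement. It is not the implementation discussed in the post and does not use gRPC. It assumes a thread pool is an acceptable stand-in for sending requests "simultaneously", and `blast` and `fake_request` are hypothetical names, with `fake_request` simulating a server round trip by sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blast(send, n):
    """Issue n requests concurrently and return (throughput, duration).

    `send` is any callable taking the request index. Throughput is
    n / duration, with duration in seconds.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        start = time.perf_counter()
        futures = [pool.submit(send, i) for i in range(n)]
        # Block until every response has been received
        results = [f.result() for f in futures]
        duration = time.perf_counter() - start
    return len(results) / duration, duration


def fake_request(i):
    # Stand-in for a real client call; sleeps to mimic network latency
    time.sleep(0.01)
    return i


throughput, duration = blast(fake_request, 50)
print(f"{throughput:.1f} requests/second over {duration:.4f} seconds")
```

Because the timer stops only when the slowest response arrives, a single high-latency outlier lowers the whole figure, which is one of the caveats listed above.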

September 26, 2018 · 3 min · 569 words · Benjamin Bengfort

Go Testing Notes

In this post I’m just going to maintain a list of notes on Go testing that I commonly need to reference. It will also serve as an index for the testing-related posts that I commonly have to look up. Here is a quick listing of the table of contents: Basics, Table Driven Tests, Fixtures, Golden Files, Frameworks, No Framework, Ginkgo & Gomega, Helpers, Temporary Directories, Sources and References. Basics: Just a quick reminder of how to write tests, benchmarks, and examples. A test is written as follows: ...

September 22, 2018 · 6 min · 1150 words · Benjamin Bengfort