Read mprofile Output into Pandas

When benchmarking Python programs, it is very common for me to use memory_profiler from the command line - e.g. mprof run python myscript.py. This creates a .dat file in the current working directory, which you can view with mprof plot. More often than not, though, I want to compare two different runs for their memory profiles, or do things like annotate the graphs with different timing benchmarks. This requires generating my own figures, which in turn requires loading the memory profiler data myself. ...
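
The .dat file is plain text: each sample is a MEM line carrying the memory usage in MiB and a Unix timestamp. A minimal sketch of loading it into a DataFrame might look like this (the read_mprofile name and column labels are mine, not the post's):

import pandas as pd

def read_mprofile(path):
    # Parse an mprofile .dat file, keeping only the MEM sample lines,
    # which look like: "MEM <memory_in_mib> <unix_timestamp>".
    records = []
    with open(path, "r") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "MEM":
                records.append((float(fields[1]), float(fields[2])))

    df = pd.DataFrame(records, columns=["mem_mib", "timestamp"])
    # Express time relative to the start of the run for easy comparison.
    df["seconds"] = df["timestamp"] - df["timestamp"].min()
    return df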

July 27, 2020 · 2 min · 253 words · Benjamin Bengfort

Basic Python Profiling

I’m getting started on some projects that will make use of extensive Python performance profiling; unfortunately, Python doesn’t focus on performance and so doesn’t have benchmarking tools like those I might find in Go. The two measurements I care most about when profiling are speed and memory usage. For the latter, I simply use memory_profiler from the command line - which is pretty straightforward. For speed, however, I did find a snippet that I thought would be useful to include here and update as my usage changes. ...
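
For reference, a timing snippet along these lines is usually a small decorator built on time.perf_counter; this is a sketch of the general pattern, not necessarily the snippet from the post:

import time
import functools

def timeit(func):
    # Report the wall-clock duration of each call to the wrapped function.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result
    return wrapper

@timeit
def build_squares(n):
    return [i ** 2 for i in range(n)]

build_squares(1_000_000)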

July 14, 2020 · 2 min · 370 words · Benjamin Bengfort

Mount an EBS volume

Once the EBS volume has been created and attached to the instance, ssh into the instance and list the available disks:

$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0 86.9M  1 loop /snap/core/4917
loop1           7:1    0 12.6M  1 loop /snap/amazon-ssm-agent/295
loop2           7:2    0   91M  1 loop /snap/core/6350
loop3           7:3    0   18M  1 loop /snap/amazon-ssm-agent/930
nvme0n1       259:0    0  300G  0 disk
nvme1n1       259:1    0    8G  0 disk
└─nvme1n1p1   259:2    0    8G  0 part /

In the above case we want to mount nvme0n1 - a 300GB gp2 EBS volume. Check if the volume already has data in it (e.g. it was created from a snapshot or is being attached to a new instance): ...
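
A sketch of the steps that typically follow, based on the standard AWS procedure (the /data mount point is just an example, not from the post):

$ sudo file -s /dev/nvme0n1
/dev/nvme0n1: data

# "data" with no further detail means the volume is empty; create a
# filesystem before mounting (skip this if the volume already has data!):
$ sudo mkfs -t ext4 /dev/nvme0n1

# create a mount point and mount the volume:
$ sudo mkdir /data
$ sudo mount /dev/nvme0n1 /data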

February 5, 2019 · 2 min · 323 words · Benjamin Bengfort

Blast Throughput

Blast throughput is what we call a throughput measurement in which N requests are sent to the server simultaneously and the duration to receive responses for all N requests is recorded. The throughput is computed as N/duration, where duration is in seconds. This is the typical, and potentially correct, way to measure throughput from a client to a server; however, issues do arise in distributed systems land: the requests must all originate from a single client; high-latency response outliers can skew results; you must be confident that N is big enough to max out the server; and N mustn’t be so big as to create non-server-related bottlenecks. In this post I’ll discuss my implementation of the blast workload, as well as an issue that came up with many concurrent connections in gRPC. This led me down the path to use one connection to do blast throughput testing, which led to other issues, which I’ll discuss later. ...
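
As a sketch of the measurement itself (my own minimal illustration, not the post's workload implementation):

package main

import (
	"fmt"
	"sync"
	"time"
)

// blast fires n requests concurrently and times how long it takes for
// all of them to complete, returning requests per second.
func blast(n int, request func() error) float64 {
	var wg sync.WaitGroup
	wg.Add(n)

	start := time.Now()
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			request() // error handling omitted for brevity
		}()
	}
	wg.Wait()

	// throughput = N / duration
	return float64(n) / time.Since(start).Seconds()
}

func main() {
	// a stand-in request that sleeps to simulate server latency
	rps := blast(1000, func() error {
		time.Sleep(10 * time.Millisecond)
		return nil
	})
	fmt.Printf("%.2f requests/second\n", rps)
}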

September 26, 2018 · 3 min · 569 words · Benjamin Bengfort

Future Date Script

This is kind of a dumb post, but it’s something I’m sure I’ll look up in the future. I write a lot of emails where I have to send a date that’s sometime in the future, e.g. six weeks from the end of a class to specify a deadline … I’ve just been opening a Python terminal and importing datetime and timedelta, but I figured this quick script on the command line would make my life a bit easier: ...
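
A sketch of what such a script might look like (my reconstruction from the description, not necessarily the post's version):

#!/usr/bin/env python
# Print the date a given number of weeks from today (defaults to 6).

import sys
from datetime import date, timedelta

if __name__ == "__main__":
    weeks = int(sys.argv[1]) if len(sys.argv) > 1 else 6
    print(date.today() + timedelta(weeks=weeks))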

September 5, 2018 · 1 min · 104 words · Benjamin Bengfort

Aggregating Reads from a Go Channel

Here’s the scenario: we have a buffered channel that’s being read by a single goroutine and written to by multiple goroutines. For simplicity, we’ll say that the channel accepts events and that the other routines generate events of specific types: A, B, and C. If there are more generators of one type of event (or some producers are faster than others), we may end up in a situation where a series of the same event type is waiting on the buffered channel. What we would like to do is read all of the same type of event off the buffered channel at once, handling them all simultaneously; i.e. aggregating the read of our events. ...
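
A minimal sketch of one way to do this, using a blocking read followed by non-blocking reads via select/default (the Event type and aggregate function are my own illustration, not the post's code):

type Event struct {
	Type string // "A", "B", or "C"
}

// aggregate blocks until one event arrives, then drains any buffered
// events of the same type without blocking. It returns the batch plus
// the first event of a different type, if one was encountered, so the
// caller can process it next.
func aggregate(events <-chan Event) (batch []Event, next *Event) {
	first := <-events
	batch = append(batch, first)
	for {
		select {
		case e := <-events:
			if e.Type != first.Type {
				return batch, &e
			}
			batch = append(batch, e)
		default:
			// channel is empty: stop aggregating
			return batch, nil
		}
	}
}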

August 25, 2018 · 3 min · 482 words · Benjamin Bengfort

The Actor Model

Building correct concurrent programs in a distributed system with multiple threads and processes can quickly become very complex to reason about. For performance, we want each thread in a single process to operate as independently as possible; however, any time the shared state of the system is modified, synchronization is required. Primitives like mutexes can ensure structs are thread-safe; in Go, however, the strong preference is to synchronize by communicating. In either case, Go programs can quickly become locks upon locks or morasses of channels, incurring performance penalties at each synchronization point. ...
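
To make the idea concrete, here is a minimal actor sketch in Go (my own illustration of the pattern, not code from the post): the actor owns its state exclusively, and all interaction happens by sending messages to its mailbox channel.

type Counter struct {
	mailbox chan int
}

func NewCounter() *Counter {
	c := &Counter{mailbox: make(chan int, 64)}
	go c.run() // the actor's single goroutine
	return c
}

func (c *Counter) run() {
	total := 0 // state owned exclusively by this goroutine; no locks needed
	for delta := range c.mailbox {
		total += delta
	}
}

// Add delivers a message to the actor; callers never touch the state.
func (c *Counter) Add(n int) { c.mailbox <- n }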

August 3, 2018 · 9 min · 1784 words · Benjamin Bengfort