In-Process Caching

I have had some recent discussions regarding caching to improve application performance that I wanted to share. Most of the time those conversations go something like this: “have you heard of Redis?” I’m fascinated by the fact that an independent, distributed key-value store has won the market to this degree. However, as I’ve pointed out in these conversations, caching is a hierarchy (heck, even the processor has varying levels of caching). Especially when considering microservice architectures that require extremely low-latency responses, caching should be a critical part of the design, not just a bolt-on afterthought! ...

May 17, 2017 · 5 min · 877 words · Benjamin Bengfort
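
As a rough illustration of the in-process layer of that caching hierarchy, here is a minimal sketch using Python's standard `functools.lru_cache`. The `load_profile` function and the `calls` counter are hypothetical stand-ins for a slow remote lookup; they are not taken from the post.

```python
from functools import lru_cache

# Hypothetical counter so we can see how often the slow path actually runs.
calls = {"count": 0}

@lru_cache(maxsize=128)
def load_profile(user_id):
    """Hypothetical slow lookup standing in for a database or Redis call."""
    calls["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}

first = load_profile(42)   # cache miss: the function body runs
second = load_profile(42)  # cache hit: served from process memory
```

The second call never leaves the process, which is the kind of latency saving a remote key-value store cannot offer on its own.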

Unique Values in Python: A Benchmark

An interesting question came up in the development of Yellowbrick: given a vector of values, what is the quickest way to get the unique values? OK, so maybe this isn’t a terribly interesting question; however, the results surprised us and may surprise you as well. First we’ll do a little background, then I’ll give the results, and then I’ll discuss the benchmarking method. The problem arises in Yellowbrick when we want to get the discrete values of a target vector, y, as is common in classification tasks. By getting the unique set of values we know the number of classes as well as the class names. This information is necessary during visualization because it determines how colors are assigned to individual classes. Therefore in a Visualizer we might have a method as follows: ...

May 2, 2017 · 5 min · 963 words · Benjamin Bengfort
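
For illustration only (this is not the Visualizer method from the post, and not necessarily one of the approaches it benchmarks), two common standard-library ways to get the unique values of a target vector might look like this. The function names and the sample `y` are hypothetical.

```python
def unique_sorted(y):
    """Unique values via a set, sorted for a stable class order."""
    return sorted(set(y))

def unique_in_order(y):
    """Unique values in first-seen order; dict keys preserve insertion order."""
    return list(dict.fromkeys(y))

# Hypothetical target vector for a three-class problem.
y = ["spam", "ham", "spam", "eggs", "ham"]

classes = unique_sorted(y)   # class names
n_classes = len(classes)     # number of classes
```

Which variant is faster depends on the size and type of the input, so it is worth timing both on representative data.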

Measuring Throughput

Part of my research is taking me down a path where I want to measure the number of reads and writes from a client to a storage server. A key metric that we’re looking for is throughput — the number of accesses per second that a system supports. As I discovered in a very simple test to get some baseline metrics, even this simple metric can have some interesting complications. ...

April 28, 2017 · 5 min · 965 words · Benjamin Bengfort
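
A minimal sketch of the throughput metric described above, accesses per second, assuming a single client and an in-memory dictionary as a stand-in for the storage server. The names `measure_throughput`, `write`, and `store` are hypothetical.

```python
import time

def measure_throughput(operation, n_ops):
    """Run operation(i) n_ops times and return accesses per second."""
    start = time.perf_counter()
    for i in range(n_ops):
        operation(i)
    elapsed = time.perf_counter() - start
    return n_ops / elapsed

# Hypothetical stand-in for a storage server: a plain dictionary.
store = {}

def write(i):
    store[i] = i

ops_per_second = measure_throughput(write, 100_000)
```

Even this simple version hides choices that affect the number, such as whether setup time is counted and whether a single client can saturate the server.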

OAuth Tokens on the Command Line

This week I discovered I had a problem with my Google Calendar — events accidentally got duplicated or deleted and I needed a way to verify that my primary calendar was correct. Rather than painstakingly go through the web interface and spot check every event, I instead wrote a Go console program using the Google Calendar API to retrieve events and save them in a CSV so I could inspect them all at once. This was great, and very easy using Google’s Go libraries for their APIs, and the quick start was very handy. ...

April 20, 2017 · 4 min · 719 words · Benjamin Bengfort

Gmail Notifications with Python

I routinely have long-running scripts (e.g. for a data processing task), and I want to know when they’re complete. It seems like it should be simple for me to add in a little snippet of code that will send an email using Gmail to notify me, right? Unfortunately, it isn’t quite that simple for a lot of reasons, including security, attachment handling, configuration, etc. In this snippet, I’ve attached the notify() function that I constantly copy and paste, written into a command line script for easy sending on the command line. ...

April 17, 2017 · 2 min · 398 words · Benjamin Bengfort

A Benchmark of Grumpy Transpiling

On Tuesday evening I attended a Django District meetup on Grumpy, a transpiler from Python to Go. Because it was a Python meetup, the talk naturally focused on introducing Go to a Python audience, and because it was a Django meetup, we also focused on web services. The premise for Grumpy, as discussed in the Google blog post announcing it, is also a web-focused one: to take YouTube’s API that’s primarily written in Python and transpile it to Go to improve the overall performance and stability of YouTube’s front-end services. ...

March 23, 2017 · 7 min · 1489 words · Benjamin Bengfort

Sanely gRPC Dial a Remote

In my systems I need to handle failure; so unlike in a typical client-server relationship, I’m prepared for the remote I’m dialing to not be available. Unfortunately when you do this with gRPC-Go there are a couple of annoyances you have to address. They are (in order of solutions): (1) verbose connection logging; (2) background and back-off for reconnection attempts; (3) errors are not returned on demand; (4) there is no ability to keep track of statistics. So first, the logging. When you dial an unavailable remote as follows: ...

March 21, 2017 · 3 min · 442 words · Benjamin Bengfort