Posts

A Benchmark of Grumpy Transpiling

On Tuesday evening I attended a Django District meetup on Grumpy, a transpiler from Python to Go. Because it was a Python meetup, the talk naturally focused on introducing Go to a Python audience, and because it was a Django meetup, we also focused on web services. The premise for Grumpy, as discussed in the Google blog post announcing it, is also a web-focused one: take YouTube’s API, which is primarily written in Python, and transpile it to Go to improve the overall performance and stability of YouTube’s front-end services. ...

Sanely gRPC Dial a Remote

In my systems I need to handle failure, so unlike in a typical client-server relationship, I’m prepared for the remote I’m dialing to be unavailable. Unfortunately, when you do this with gRPC-Go there are a few annoyances you have to address. They are (in the order I address them): verbose connection logging; reconnection attempts that happen in the background with back-off; errors that are not returned on demand; and no ability to keep track of statistics. So, first the logging. When you dial an unavailable remote as follows: ...

Contributing a Multiprocess Memory Profiler

In this post I want to catalog the process of an open source contribution I was part of, which added a feature to the memory profiler Python library by Fabian Pedregosa and Philippe Gervais. It’s a quick story to tell but took over a year to complete, and I learned a lot from the process. I hope the story is revealing, particularly to first-time contributors, and that it shows that even folks who have been doing this for a long time still have to find ways to approach collaboration positively in an open source environment. I also think it’s a fairly standard example of how contributions work in practice, and perhaps this story will help us all think about how to better approach the pull request process. ...

Pseudo Merkle Tree

A Merkle tree is a data structure in which every non-leaf node is labeled with the hash of its child nodes’ labels. This makes Merkle trees particularly useful for comparing large data structures quickly and efficiently. Given trees a and b, if their root hashes differ, then some part of the trees below differs (if the root hashes are identical, the trees are, with overwhelming probability, identical as well). You can then proceed in a breadth-first fashion, pruning nodes with identical hashes to directly identify the differences. ...

Using Select in Go

Ask a Go programmer what makes Go special and they will immediately say “concurrency is baked into the language”. Go’s concurrency model is one of communication (as opposed to locks), and so concurrency primitives are implemented using channels. To synchronize across multiple channels, Go provides the select statement. A common pattern for me has become to use a select to manage broadcast work (either in a publisher/subscriber model or a fan-out model) by starting goroutines and passing them directional channels for synchronization and communication. In the example below, I create a buffered channel for output (so that the workers don’t block waiting for the receiver to collect data), a channel for errors (the first error kills the program) and a timer to update the state of my process on a regular basis. The select blocks until one of its channels has a message ready (picking one at random if several are ready) and then continues processing. By keeping the select in a for loop, I can continually read off the channels until I’m done. ...

Benchmarking Secure gRPC

A natural question to ask after the previous post is “how much overhead does security add?” So I’ve benchmarked the three methods discussed: mutual TLS, server-side TLS, and no encryption. The results are below; here are the numeric results for one of the runs:

BenchmarkMutualTLS-8     200    9331850 ns/op
BenchmarkServerTLS-8     300    5004505 ns/op
BenchmarkInsecure-8     2000    1179252 ns/op
PASS
ok      github.com/bbengfort/sping    7.364s

Here is the code for the benchmarking for reference: ...

Secure gRPC with TLS/SSL

One of the primary requirements for the systems we build is something we call the “minimum security requirement”. Although our systems are not designed specifically for high-security applications, they must meet minimum standards of encryption and authentication. For example, it seems obvious to me that a web application that stores passwords or credit card information should store the passwords as salted hashes and encrypt sensitive data such as card numbers on disk on a per-record basis. In the same way, a distributed system must be able to handle encrypted blobs, encrypt all inter-node communication, and authenticate and sign all messages. This adds some overhead to the system, but the cost of that overhead is far smaller than the cost of a breach, and if minimum security is the baseline then the overhead is simply an accepted part of doing business. ...