Programmer

Creating a Microservice in Go

Yesterday I built my first microservice (a RESTful API) using Go, and I wanted to collect a few of my thoughts on the experience here before I forget them. The project, Scribo, is intended to aid in my research by collecting data about a specific network that I’m looking to build distributed systems for. I do have something running, which will need to evolve a lot, and it could be helpful to know where it started. ...

Scikit-Learn Data Management: Bunches

One large issue that I encounter in development with machine learning is the need to structure our data on disk in a way that we can load into Scikit-Learn in a repeatable fashion for continued analysis. My proposal is to use the sklearn.datasets.base.Bunch object to load the data into data and target attributes, similar to how Scikit-Learn’s toy datasets are structured. Using this object to manage our data will mirror the native API and allow us to easily copy and paste code that demonstrates classifiers and techniques with the built-in datasets. Importantly, this API will also allow us to communicate to other developers and our future selves exactly how to use the data. ...
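As a rough illustration of the proposal above, here is a minimal sketch of a loader that returns a Bunch with data and target attributes. The load_dataset function, the CSV layout (last column is the label), and the feature_names key are assumptions made for this example, not the layout the full post describes. Recent scikit-learn releases expose the class as sklearn.utils.Bunch, so the sketch imports it from there and falls back to a small stand-in if scikit-learn is not installed.

```python
import csv

try:
    # Import path in recent scikit-learn releases.
    from sklearn.utils import Bunch
except ImportError:
    class Bunch(dict):
        """Stand-in for illustration: a dict whose keys are also attributes."""

        def __getattr__(self, key):
            try:
                return self[key]
            except KeyError:
                raise AttributeError(key)


def load_dataset(path):
    """Read a CSV with a header row whose last column is the label.

    Hypothetical helper: returns a Bunch shaped like the toy datasets,
    with numeric features in `data` and labels in `target`.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    data = [[float(value) for value in row[:-1]] for row in rows]
    target = [row[-1] for row in rows]
    return Bunch(data=data, target=target, feature_names=header[:-1])
```

Calling load_dataset on such a file gives an object whose data and target attributes can be passed to an estimator's fit method in the same way as the built-in datasets.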

Simple Password Generation

I was talking with @looselycoupled the other day about how we generate passwords for use on websites. We both agree that every single domain should have its own password (to prevent one crack ruling all your Internets). However, we’ve both evolved on the method over time, and I’ve written a simple script that allows me to generate passwords using methodologies discussed in this post. In particular I use the generator to create passwords for pwSafe, the tool I currently use for password management (due to its use of the open source database format created by Bruce Schneier). It is my hope that this script can be embedded directly into pwSafe, or at least allow me to write directly to the database; but for now I just copy and paste with the pbcopy utility. ...
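The excerpt does not show the script itself, so the following is only a sketch of one common approach, not the methodology from the post: drawing characters uniformly at random with Python's standard secrets module. The generate_password name, the default length, and the default alphabet are all assumptions for illustration.

```python
import secrets
import string

# Assumed default alphabet: letters, digits, and punctuation.
DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=16, alphabet=DEFAULT_ALPHABET):
    """Return a random password of `length` characters drawn from `alphabet`.

    Uses secrets.choice, which draws from the operating system's
    cryptographically secure random source.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not alphabet:
        raise ValueError("alphabet must not be empty")
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Print one password per run; on macOS the output could be piped to pbcopy.
    print(generate_password())
```

Generating a fresh password per domain this way keeps each one independent, which is the property the paragraph above argues for.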

The Bengfort Toolkit

Programming life has finally caused me to give in to something that I’ve resisted for a while: the creation of a Bengfort Toolkit and specifically a benlib. This post is mostly a reminder that this toolkit now exists and that I spent valuable time creating it against my better judgement. And as a result, I should probably use it and update it. I’ve already written (whoops, I almost said “you’ve already read” but I know no one reads this) posts about tools that I use frequently, including clock.py and requires. These things have been simply Python scripts that I’ve put in ~/bin, which is part of my $PATH. These are too small or simple to require full-blown repositories and PyPI listings on their own merit. Plus, I honestly believe that I’m the only one who uses them. ...

Anonymizing User Profile Data with Faker

This post is an early draft of expanded work that will eventually appear on the District Data Labs Blog. Your feedback is welcome, and you can submit your comments on the draft GitHub issue. In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to show examples and how the code works. However, nothing can replace an actual, non-trivial dataset for a tutorial or lesson because it provides for deep and meaningful further exploration. Non-trivial datasets can provide surprise and intuition in a way that toy datasets just cannot. Unfortunately, non-trivial datasets can be hard to find for a few reasons, but one common reason is that the dataset contains personally identifying information (PII). ...

Running on Schedule

Automation with Python is a lovely thing, particularly for very repetitive or long-running tasks; but unfortunately someone still has to press the button to make it go. It feels like there should be an easy way to set up a program such that it runs routinely, in the background, without much human intervention. Daemonized services are the route to go in server land; but how do you routinely schedule a process to run on your local computer, which may or may not be turned off? Moreover, long-running daemon processes seem expensive when you just want a quick job to execute routinely. ...

Iterators and Generators

This post is an attempt to explain what iterators and generators are in Python, defend the yield statement, and reveal why a library like SimPy is possible. But first some terminology (that specifically targets my friends who Java). Iteration is a syntactic construct that implements a loop over an iterable object. The for statement provides iteration, the while statement may provide iteration. An iterable object is something that implements the iteration protocol (Java folks, read interface). A generator is a function that produces a sequence of results instead of a single value and is designed to make writing iterable objects easier. ...
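To make the terminology above concrete, here is a small sketch that writes the same countdown twice: once as a class implementing the iteration protocol by hand, and once as a generator function. The Countdown and countdown names are made up for this example.

```python
class Countdown:
    """Iterable object written by hand against the iteration protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of the sequence by raising StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: yield produces each value and pauses in between."""
    while start > 0:
        yield start
        start -= 1


if __name__ == "__main__":
    # Both forms work with the for statement.
    for n in Countdown(3):
        print(n)
    for n in countdown(3):
        print(n)
```

The generator version keeps its loop state in local variables that survive between yield calls, so the bookkeeping that Countdown does in `__next__` disappears.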