Posts

Adding a Git Commit to Header Comments

You may have seen the following type of header at the top of my source code: # main # short description # # Author: Benjamin Bengfort <benjamin@bengfort.com> # Created: Tue Mar 08 14:07:24 2016 -0500 # # Copyright (C) 2016 Bengfort.com # For license information, see LICENSE.txt # # ID: main.py [] benjamin@bengfort.com $ All of this is pretty self explanatory with the exception of the final line. This final line is a throw back to Subversion actually, when you could add a $Id$ tag to your code, and Subversion would automatically populate it with something that looks like: ...

The Bengfort Toolkit

Programming life has finally caused me to give into something that I’ve resisted for a while: the creation of a Bengfort Toolkit and specifically a benlib. This post is mostly a reminder that this toolkit now exists and that I spent valuable time creating it against my better judgement. And as a result, I should probably use it and update it. I’ve already written (whoops, I almost said “you’ve already read” but I know no one reads this) posts about tools that I use frequently including [clock.py]({% post_url 2016-01-12-codetime-and-clock %}) and [requires]({% post_url 2016-01-21-freezing-requirements %}). These things have been simply Python scripts that I’ve put in ~/bin, which is part of my $PATH. These are too small or simple to require full blown repositories and PyPI listings on their own merit. Plus, I honestly believe that I’m the only one that uses them. ...

Anonymizing User Profile Data with Faker

This post is an early draft of expanded work that will eventually appear on the District Data Labs Blog. Your feedback is welcome, and you can submit your comments on the draft GitHub issue. In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to show examples and how the code works. However, nothing can replace an actual, non-trivial dataset for a tutorial or lesson because it provides for deep and meaningful further exploration. Non-trivial datasets can provide surprise and intuition in a way that toy datasets just cannot. Unfortunately, non-trivial datasets can be hard to find for a few reasons, but one common reason is that the dataset contains personally identifying information (PII). ...

Implementing the Observer Pattern with an Event System

I was looking back through some old code (hoping to find a quick post before I got back to work) when I ran across a project I worked on called Mortar. Mortar was a simple daemon that ran in the background and watched a particular directory. When a file was added or removed from that directory, Mortar would notify other services or perform some other task (e.g. if it was integrated into a library). At the time, we used Mortar to keep an eye on FTP directories, and when a file was uploaded Mortar would move it to a staging directory based on who uploaded it, then do some work on the file. ...

Running on Schedule

Automation with Python is a lovely thing, particularly for very repetitive or long running tasks; but unfortunately someone still has to press the button to make it go. It feels like there should be an easy way to set up a program such that it runs routinely, in the background, without much human intervention. Daemonized services are the route to go in server land; but how do you routinely schedule a process to run on your local computer, which may or may not be turned off1? Moreover, long running daemon processes seem expensive when you just want a quick job to execute routinely. ...

Iterators and Generators

This post is an attempt to explain what iterators and generators are in Python, defend the yield statement, and reveal why a library like SimPy is possible. But first some terminology (that specifically targets my friends who Java). Iteration is a syntactic construct that implements a loop over an iterable object. The for statement provides iteration, the while statement may provide iteration. An iterable object is something that implements the iteration protocol (Java folks, read interface). A generator is a function that produces a sequence of results instead of a single value and is designed to make writing iterable objects easier. ...

On Interval Calls with Threading

Event driven programming can be a wonderful thing, particularly when the execution of your code is dependent on user input. It is for this reason that JavaScript and other user facing languages implement very strong event based semantics. Many times event driven semantics depends on elapsed time (e.g. wait then execute). Python, however, does not provide a native setTimeout or setInterval that will allow you to call a function after a specific amount of time, or to call a function again and again at a specific interval. ...