Lessons in Discrete Event Simulation

Part of my research involves the creation of large-scale distributed systems, and while we do build and deploy these systems, we find that simulating them for development and research gives us an advantage in trying new things out. To that end, I employ discrete event simulation (DES) using Python’s SimPy library to build very large simulations of distributed systems, such as the one I’ve built to inspect consistency patterns in variable-latency, heterogeneous, partition-prone networks: CloudScope. ...

April 15, 2016 · 4 min · 823 words · Benjamin Bengfort

NLTK Corpus Reader for Extracted Corpus

Yesterday I wrote a blog post about extracting a corpus from a directory containing Markdown, such as for a blog that is deployed with Silvrback or Jekyll. In this post, I’ll briefly show how to use the built-in CorpusReader objects in nltk to stream the data to the segmentation and tokenization preprocessing functions that are built into NLTK for performing analytics. The dataset that I’ll be working with is the District Data Labs Blog, in particular the state of the blog as of today. The dataset can be downloaded from the ddl corpus, which also has the code in this post for you to use to perform other analytics. ...

April 11, 2016 · 6 min · 1081 words · Benjamin Bengfort

Extracting the DDL Blog Corpus

We have some simple text analyses coming up and, as an example, I thought it might be nice to use the DDL blog corpus as a data set. There are relatively few DDL blogs, but they are all long, with a lot of significant text and discourse. It might be interesting to try to do some lightweight analysis on them. So, how to extract the corpus? The DDL blog is currently hosted on Silvrback, which is designed for text-forward, distraction-free blogging. As a result, there isn’t a lot of cruft on the page. I considered writing a scraper that pulled the web pages down or using the RSS feed to do the data ingestion. After all, I wouldn’t have to do a lot of HTML cleaning. ...

April 10, 2016 · 2 min · 347 words · Benjamin Bengfort

Dispatching Types to Handler Methods

A while ago, I discussed the observer pattern for dispatching events based on a series of registered callbacks. In this post, I take a look at a related but distinct methodology: dispatching based on type with pre-assigned handlers. For me, this is actually the more common pattern because the observer pattern is usually implemented as an API to outside code. On the other hand, this type of dispatcher is usually a programmer’s pattern, used for development and decoupling. ...

April 5, 2016 · 2 min · 426 words · Benjamin Bengfort

Class Variables

These snippets are just a short reminder of how class variables work in Python. I understand this topic a bit too well, I think; I always remember the gotchas and can’t remember which gotcha belongs to which important detail. I generally come up with the right answer then convince myself I’m wrong until I write a bit of code and experiment. Hopefully this snippet will shortcut that process. Consider the following class hierarchy: ...

April 4, 2016 · 3 min · 446 words · Benjamin Bengfort

Simple Password Generation

I was talking with @looselycoupled the other day about how we generate passwords for use on websites. We both agree that every single domain should have its own password (to prevent one crack ruling all your Internets). However, we’ve both evolved on the method over time, and I’ve written a simple script that allows me to generate passwords using methodologies discussed in this post. In particular I use the generator to create passwords for pwSafe, the tool I currently use for password management (due to its use of the open source database format created by Bruce Schneier). It is my hope that this script can be embedded directly into pwSafe, or at least allow me to write directly to the database; but for now I just copy and paste with the pbcopy utility. ...

March 30, 2016 · 5 min · 973 words · Benjamin Bengfort

Visualizing Pi with matplotlib

Happy Pi day! As is the tradition at the University of Maryland (and to a certain extent, in my family) we are celebrating March 14 with pie and Pi. A shoutout to @konstantinosx who, during last year’s Pi day, requested blueberry pie, which was the strangest pie request I’ve received for Pi day. Not that blueberry pie is strange, just that someone would want one so badly for Pi day (he got a mixed berry pie). ...

March 14, 2016 · 2 min · 409 words · Benjamin Bengfort