Syntax Parsing with CoreNLP and NLTK

Syntactic parsing is a technique by which segmented, tokenized, and part-of-speech tagged text is assigned a structure that reveals the relationships between tokens governed by syntax rules, e.g. by grammars. Consider the sentence: “The factory employs 12.8 percent of Bradford County.” A syntax parse produces a tree that might help us understand that the subject of the sentence is “the factory”, the predicate is “employs”, and the target is “12.8 percent”, which in turn is modified by “Bradford County”....
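To make the idea of a parse tree concrete, here is a minimal sketch in plain Python. The tree is written by hand for the example sentence and is an assumption, not the output of CoreNLP or NLTK; the helper names (`leaves`, `child`) are hypothetical and exist only for this illustration.

```python
# Hand-written constituency tree for the example sentence (an assumption,
# not real parser output). A node is (label, child, ...); a leaf is
# (part-of-speech tag, token).
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NN", "factory")),
    ("VP",
        ("VBZ", "employs"),
        ("NP",
            ("NP", ("CD", "12.8"), ("NN", "percent")),
            ("PP", ("IN", "of"),
                ("NP", ("NNP", "Bradford"), ("NNP", "County"))))),
)


def is_leaf(node):
    """A leaf is a (tag, token) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the tokens under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    tokens = []
    for sub in node[1:]:
        tokens.extend(leaves(sub))
    return tokens


def child(node, label):
    """Return the first direct child with the given label, or None."""
    for sub in node[1:]:
        if sub[0] == label:
            return sub
    return None


# Walk the tree to pull out the pieces named in the text.
subject = child(TREE, "NP")
verb_phrase = child(TREE, "VP")
predicate = child(verb_phrase, "VBZ")
obj = child(verb_phrase, "NP")
target = child(obj, "NP")
modifier = child(child(obj, "PP"), "NP")

print(" ".join(leaves(subject)))    # The factory
print(" ".join(leaves(predicate)))  # employs
print(" ".join(leaves(target)))     # 12.8 percent
print(" ".join(leaves(modifier)))   # Bradford County
```

A real parser would produce the tree automatically; the point here is only that once the structure exists, the grammatical roles fall out of a few tree lookups.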

June 22, 2018 · 4 min · 700 words · Benjamin Bengfort

Continuing Outer Loops with for/else

When you have an outer and an inner loop, how do you continue the outer loop from a condition inside the inner loop? Consider the following code:

    for i in range(10):
        for j in range(9):
            if i <= j:
                # break out of inner loop
                # continue outer loop
            print(i,j)
        # don't print unless inner loop completes,
        # e.g. outer loop is not continued
        print("inner complete!")

Here, we want to print, for all i ∈ [0,10), all numbers j ∈ [0,9) that are less than i, and we want to print “inner complete!” only once we’ve found an entire list of j that meets the criteria....
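One way to get this behavior, as the title suggests, is Python's `else` clause on a `for` loop, which runs only when the loop finishes without hitting `break`. The sketch below is an illustration of that language feature and may not match the exact solution in the full post; the `run` function and its parameters are hypothetical names, and it collects lines in a list rather than printing directly so the result is easy to inspect.

```python
def run(n_outer=10, n_inner=9):
    """Collect the lines the nested loops would print."""
    lines = []
    for i in range(n_outer):
        for j in range(n_inner):
            if i <= j:
                # Leave the inner loop early; its else clause is skipped,
                # so the outer loop simply moves on to the next i.
                break
            lines.append(f"{i} {j}")
        else:
            # Runs only when the inner loop completed without a break.
            lines.append("inner complete!")
    return lines


for line in run(3, 2):
    print(line)
# 1 0
# 2 0
# 2 1
# inner complete!
```

With the default sizes, only i = 9 lets the inner loop run through every j, so "inner complete!" appears exactly once, at the end.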

May 17, 2018 · 3 min · 500 words · Benjamin Bengfort

Predicted Class Balance

This is a follow-up to the prediction distribution visualization presented in the last post. This visualization shows a bar chart with the number of predicted and the number of actual values for each class, i.e. a class balance chart with the predicted balance as well. This visualization actually came before the prior one, but I was more excited about that one because it showed where error was occurring, similar to a classification report or confusion matrix....
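The numbers behind such a chart are just two counts per class. Here is a minimal sketch of computing them, leaving the plotting aside; the labels are made-up toy data and `class_balance` is a hypothetical helper name, not a function from the post.

```python
from collections import Counter

# Hypothetical labels, for illustration only.
y_true = ["cat", "dog", "dog", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "bird", "cat", "dog"]


def class_balance(y_true, y_pred):
    """Map each class to (number of actual, number of predicted) instances."""
    actual = Counter(y_true)
    predicted = Counter(y_pred)
    classes = sorted(set(actual) | set(predicted))
    # Counter returns 0 for classes that never appear on one side.
    return {c: (actual[c], predicted[c]) for c in classes}


for name, (n_actual, n_predicted) in class_balance(y_true, y_pred).items():
    print(f"{name}: actual={n_actual} predicted={n_predicted}")
# bird: actual=1 predicted=1
# cat: actual=2 predicted=3
# dog: actual=3 predicted=2
```

Each class then gets two side-by-side bars, one per count.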

March 8, 2018 · 2 min · 216 words · Benjamin Bengfort

Class Balance Prediction Distribution

In this quick snippet I present an alternative to the confusion matrix or classification report visualizations in order to judge the efficacy of multi-class classifiers. The base of the visualization is a class balance chart: the x-axis is the actual (or true) class, and the height of each bar is the number of instances that match that class in the dataset. The difference here is that each bar is a stacked chart representing the percentage of the predicted class given the actual value....
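The data for such a stacked chart can be sketched without any plotting code: for each actual class, count the instances (the bar height) and the share of each predicted class within it (the stack segments). The labels below are made-up toy data and `prediction_distribution` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical labels, for illustration only.
y_true = ["cat", "dog", "dog", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "bird", "cat", "dog"]


def prediction_distribution(y_true, y_pred):
    """For each actual class, return (support, {predicted class: share})."""
    counts = defaultdict(Counter)
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1

    result = {}
    for actual, predictions in counts.items():
        support = sum(predictions.values())  # height of the bar
        shares = {p: n / support for p, n in predictions.items()}
        result[actual] = (support, shares)
    return result


for actual, (support, shares) in sorted(prediction_distribution(y_true, y_pred).items()):
    print(actual, support, shares)
```

In this toy data every "cat" is predicted correctly, while one of the three "dog" instances is predicted as "cat", so the "dog" bar would show a one-third "cat" segment.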

February 28, 2018 · 2 min · 215 words · Benjamin Bengfort

Thread and Non-Thread Safe Go Set

I came across this now archived project that implements a set data structure in Go and was intrigued that it provides both thread-safe and non-thread-safe implementations of the same data structure. Recently I’ve been attempting to get rid of locks in my code in favor of one master data structure that does all of the synchronization, so having multiple options for thread safety is useful. Previously I did this by having a lower-case method name (a private method) that was non-thread-safe and an upper-case method name (public) that did implement thread-safety....
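The two-variant idea can be sketched as follows. Note that this is a Python analog rather than Go, and it does not reproduce the archived project's API; `UnsafeSet` and `SafeSet` are hypothetical names chosen for this illustration.

```python
import threading


class UnsafeSet:
    """No locking: suitable when a caller already handles synchronization."""

    def __init__(self):
        self._items = set()

    def add(self, item):
        """Add item; return True if it was not already present."""
        if item in self._items:
            return False
        self._items.add(item)
        return True

    def contains(self, item):
        return item in self._items

    def size(self):
        return len(self._items)


class SafeSet(UnsafeSet):
    """Same interface, but every operation holds a lock."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def add(self, item):
        # The lock makes the check-then-insert in UnsafeSet.add atomic.
        with self._lock:
            return super().add(item)

    def contains(self, item):
        with self._lock:
            return super().contains(item)

    def size(self):
        with self._lock:
            return super().size()
```

A structure that already serializes access could embed `UnsafeSet` and avoid paying for a second lock, while code that shares the set directly between threads would use `SafeSet`.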

January 26, 2018 · 2 min · 403 words · Benjamin Bengfort

Git-Style File Editing in CLI

A recent application I was working on required the management of several configuration and list files that needed to be validated. Rather than have the user find and edit these files directly, I wanted to create an editing workflow similar to crontab -e or git commit — the user would call the application, which would redirect to a text editor like vim, then when editing was complete, the application would take over again....
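The hand-off to an editor and back can be sketched like this. It is written in Python as an illustration of the workflow and is not the application's actual code; `edit_text` is a hypothetical helper, and falling back to `vim` when `$EDITOR` is unset is an assumption.

```python
import os
import shlex
import subprocess
import tempfile


def edit_text(initial="", editor=None, suffix=".txt"):
    """Open `initial` in an editor and return the text that was saved.

    `editor` is an argv list such as ["vim"]; when omitted, $EDITOR is
    used, falling back to vim (an assumption made for this sketch).
    """
    command = editor or shlex.split(os.environ.get("EDITOR", "vim"))

    # delete=False so the editor can open the file by name after we close it.
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as handle:
        handle.write(initial)
        path = handle.name

    try:
        # Blocks until the editor exits; then the application takes over again.
        subprocess.run([*command, path], check=True)
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)
```

Calling `edit_text("# one entry per line\n")` would drop the user into their editor with that starting text; the returned string is where validation would happen before anything is written to the real configuration file.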

January 6, 2018 · 3 min · 549 words · Benjamin Bengfort

Lock Diagnostics in Go

By now it’s pretty clear that I’ve just had a bear of a time with locks and synchronization inside of multi-threaded environments with Go. Probably most gophers would simply tell me that I should share memory by communicating rather than communicate by sharing memory, and frankly I’m in that camp too. The issue is that mutexes can be more expressive than channels, and channels are fairly heavyweight. So to be honest, there are situations where a mutex is a better choice than a channel....

September 28, 2017 · 4 min · 765 words · Benjamin Bengfort