Aggregating Reads from a Go Channel

Here’s the scenario: we have a buffered channel that is read by a single goroutine and written to by multiple goroutines. For simplicity, we’ll say that the channel accepts events and that the other goroutines generate events of specific types: A, B, and C. If there are more generators of one event type than of the others (or some producers are faster than others), we may end up with a series of events of the same type on the buffered channel. What we would like to do is read all of the events of the same type that are on the buffered channel at once and handle them together, i.e. aggregate the read of our events. ...

August 25, 2018 · 3 min · 482 words · Benjamin Bengfort

The Actor Model

Building correct concurrent programs in a distributed system with multiple threads and processes can quickly become very complex to reason about. For performance, we want each thread in a single process to operate as independently as possible; however, any time the shared state of the system is modified, synchronization is required. Primitives like mutexes can ensure structs are thread-safe, but in Go the strongly preferred means of synchronization is communication. In either case, Go programs can quickly become locks upon locks or morasses of channels, incurring performance penalties at each synchronization point. ...

August 3, 2018 · 9 min · 1784 words · Benjamin Bengfort

Syntax Parsing with CoreNLP and NLTK

Syntactic parsing is a technique by which segmented, tokenized, and part-of-speech tagged text is assigned a structure that reveals the relationships between tokens governed by syntax rules, e.g. by grammars. Consider the sentence: “The factory employs 12.8 percent of Bradford County.” A syntax parse produces a tree that might help us understand that the subject of the sentence is “the factory”, the predicate is “employs”, and the target is “12.8 percent”, which in turn is modified by “Bradford County”. Syntax parses are often a first step toward deep information extraction or semantic understanding of text. Note, however, that syntax parsing methods suffer from structural ambiguity, that is, the possibility that there exists more than one correct parse for a given sentence. Attempting to select the most likely parse for a sentence is incredibly difficult. ...

June 22, 2018 · 4 min · 700 words · Benjamin Bengfort

Understanding Machine Learning Through Visualizations with Benjamin Bengfort and Rebecca Bilbro - Episode 166

Machine learning models are often inscrutable and it can be difficult to know whether you are making progress. To improve feedback and speed up iteration ...

June 17, 2018 · 1 min · 40 words · Benjamin Bengfort

Continuing Outer Loops with for/else

When you have an outer and an inner loop, how do you continue the outer loop from a condition inside the inner loop? Consider the following code:

    for i in range(10):
        for j in range(9):
            if i <= j:
                # break out of inner loop
                # continue outer loop
            print(i, j)

        # don't print unless inner loop completes,
        # e.g. outer loop is not continued
        print("inner complete!")

Here, we want to print, for all i ∈ [0,10), all numbers j ∈ [0,9) that are less than i, and we want to print “inner complete!” only once we’ve found an entire list of j that meets the criteria. While this seems like a fairly contrived example, I’ve actually encountered this exact situation in several places in code this week, and I’ll provide a real example in a bit. ...
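As the post title suggests, Python's `for`/`else` is one way to get this behavior: the `else` clause of a loop runs only when the loop finishes without hitting `break`. The following is a minimal sketch under that assumption, not necessarily the code from the full post. The function name `collect_output` and its parameters are hypothetical, and the lines are collected in a list rather than printed so that the result is easy to inspect.

```python
def collect_output(outer=10, inner=9):
    """Collect what the excerpt's loop is meant to print (hypothetical helper)."""
    lines = []
    for i in range(outer):
        for j in range(inner):
            if i <= j:
                # Leaving the inner loop with break skips its else clause,
                # which effectively continues the outer loop.
                break
            lines.append((i, j))
        else:
            # Runs only when the inner loop was exhausted without a break.
            lines.append("inner complete!")
    return lines


if __name__ == "__main__":
    for line in collect_output():
        print(line)
```

With the default ranges, only i = 9 exhausts the inner loop, so the completion message appears once, at the very end.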

May 17, 2018 · 3 min · 500 words · Benjamin Bengfort

Predicted Class Balance

This is a follow-on to the prediction distribution visualization presented in the last post. This visualization shows a bar chart with the number of predicted and the number of actual values for each class, i.e. a class balance chart with the predicted balance as well. This visualization actually came before the prior one, but I was more excited about that one because it showed where error was occurring, similar to a classification report or confusion matrix. Recently, however, I’ve been using this chart more for initial spot checking, since it gives me a general feel for how balanced the classes and the classifier are with respect to each other. It has also helped diagnose what is being displayed in the heat map chart of the other post. ...

March 8, 2018 · 2 min · 216 words · Benjamin Bengfort

Class Balance Prediction Distribution

In this quick snippet I present an alternative to the confusion matrix or classification report visualizations for judging the efficacy of multi-class classifiers. The base of the visualization is a class balance chart: the x-axis is the actual (or true) class, and the height of each bar is the number of instances in the dataset that belong to that class. The difference here is that each bar is a stacked chart representing the percentage of each predicted class given the actual value. If the predicted color matches the actual color then the classifier was correct, otherwise it was wrong. ...
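The data behind such a chart can be sketched with the standard library alone. This is an illustrative assumption about how the counts might be gathered, not the code from the full post: the function names `prediction_distribution` and `as_percentages` and the sample labels are hypothetical, and the plotting itself is left out.

```python
from collections import Counter, defaultdict


def prediction_distribution(y_true, y_pred):
    """For each actual class, count how often each class was predicted."""
    dist = defaultdict(Counter)
    for actual, predicted in zip(y_true, y_pred):
        dist[actual][predicted] += 1
    return {actual: dict(counts) for actual, counts in dist.items()}


def as_percentages(dist):
    """Convert each bar's segment counts into fractions of that bar's height."""
    result = {}
    for actual, counts in dist.items():
        total = sum(counts.values())  # bar height: instances of the actual class
        result[actual] = {pred: n / total for pred, n in counts.items()}
    return result


if __name__ == "__main__":
    # Hypothetical labels, for illustration only
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "c", "c"]
    print(prediction_distribution(y_true, y_pred))
```

Each outer key corresponds to one bar on the x-axis, and each inner entry to one stacked segment; a segment whose predicted class equals the actual class represents correct predictions.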

February 28, 2018 · 2 min · 215 words · Benjamin Bengfort