Benchmarking Readline Iterators

I’m starting to get serious about programming in Go, trying to move from an intermediate level to an advanced/expert level as I start to build larger systems. Right now I’m working on a problem that involves on-demand iteration: I don’t want to pass around entire arrays, and would rather be more frugal about my memory usage. Yesterday, I discussed using [channels to yield iterators from functions]({% post_url 2016-12-22-yielding-functions-for-iteration-golang %}) and was a big fan of the API, but had some questions about memory usage. So today I created a package, iterfile, to benchmark and profile various iteration constructs in Go. ...

December 23, 2016 · 2 min · 355 words · Benjamin Bengfort

Yielding Functions for Iteration in Go

It is very common for me to design code that expects functions to return an iterable context, particularly because I have been developing in Python with the yield statement. The yield statement allows functions to “return” the execution context to the caller while still maintaining state such that the caller can return state to the function and continue to iterate. It does this by actually returning a generator, an iterable object constructed from the local state of the closure. ...

December 22, 2016 · 3 min · 573 words · Benjamin Bengfort

Data Product Architectures: O'Reilly Webinar

Data products derive their value from data and generate new data in return. As a result, machine-learning techniques must be applied to their architecture and development. Machine learning fits models to make predictions on unknown inputs and must be generalizable and adaptable. As such, fitted models cannot exist in isolation; they must be operationalized and user facing so that applications can benefit from the new data, respond to it, and feed it back into the data product. ...

December 7, 2016 · 1 min · 175 words · Benjamin Bengfort

Exception Handling

This short tutorial is intended to demonstrate the basics of exception handling and the use of context management in order to handle standard cases. These notes were originally created for a training I gave, and the notebook can be found at Exception Handling. I’m happy for any comments or pull requests on the notebook. Exceptions are a tool that programmers use to describe errors or faults that are fatal to the program; i.e. the program cannot or should not continue when an exception occurs. Exceptions can occur due to programming errors, user errors, or simply unexpected conditions like no internet access. Exceptions themselves are simply objects that contain information about what went wrong. Exceptions are usually defined by their type, which describes broadly the class of exception that occurred, and by a message that says specifically what happened. Here are a few common exception types: ...

November 21, 2016 · 10 min · 2105 words · Benjamin Bengfort

SVG Vertex with a Timer

In order to promote the use of graph data structures for data analysis, I’ve recently given talks on dynamic graphs: embedding time into graph structures to analyze change. In order to embed time into a graph there are two primary mechanisms: make time a graph element (a vertex or an edge) or have multiple subgraphs where each graph represents a discrete time step. By using either of these techniques, opportunities exist to perform a structural analysis using graph algorithms on time; for example, asking what time is most central to a particular set of relationships. ...

November 4, 2016 · 6 min · 1178 words · Benjamin Bengfort

Message Latency: Ping vs. gRPC

Building distributed systems means passing messages between devices over a network connection. My research specifically considers networks that have extremely variable latencies or that can be partition-prone. This led me to the natural question, “how variable are real-world networks?” In order to get real numbers, I built a simple echo protocol using Go and gRPC called Orca. I ran Orca for a few days and got some latency measurements as I traveled around with my laptop. Orca does a lot of work, including GeoIP lookups, IP address resolution, and database queries and storage. This post, however, is not about Orca. The latencies I was getting were very high relative to the round-trip latencies reported by the simple ping command that implements the ICMP protocol. ...

November 2, 2016 · 5 min · 987 words · Benjamin Bengfort

Computing Reading Speed

Ashley and I have been going over the District Data Labs Blog trying to figure out a method both to make it more accessible to readers (who are at various levels) and to encourage writers to contribute. To that end, she’s been exploring other blogs to see if we can put multiple forms of content up: long-form tutorials (the bulk of what’s there) and shorter idea articles, possibly even as short as the posts I put on my dev journal. One interesting suggestion she had was to mark the reading time of each post, something that the Longreads Blog does. This may give readers a better sense of the time commitment and help them engage more easily. ...

October 28, 2016 · 4 min · 659 words · Benjamin Bengfort