Public IP Address Discovery

When doing research on peer-to-peer networks, addressing can become pretty complex pretty quickly. Not everyone has the resources to allocate static, public-facing IP addresses to machines. A machine on a home network, for example, shares a single public-facing IP address, usually assigned to the router. The router then performs NAT (network address translation), forwarding requests to internal devices. To get a service running on an internal network, you can forward external requests on a specific port to a specific device. Requests are made to the router’s IP address, and the router passes them on. But how do you know the IP address of the device? Moreover, what happens if the router is assigned a new IP address? Static IP addresses generally cost more. ...
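To make the port-forwarding step concrete, here is a toy model of the lookup a router performs. The table contents, the addresses, and the `forward` function are illustrative assumptions, not taken from the post or from any real router's configuration.

```python
# Hypothetical forwarding table: external port -> (internal IP, internal port).
PORT_FORWARDS = {
    8080: ("192.168.1.20", 80),
    2222: ("192.168.1.30", 22),
}


def forward(external_port, table=PORT_FORWARDS):
    """Return the internal (ip, port) for a request arriving on external_port.

    Returns None when no rule matches, which models the router dropping
    the request instead of passing it to an internal device.
    """
    return table.get(external_port)
```

A request to the router on port 8080 would resolve to `("192.168.1.20", 80)` in this model; the router's own public address never appears in the table, which is why a change to that address has to be discovered some other way.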

July 9, 2017 · 2 min · 350 words · Benjamin Bengfort

On the Tracks with Rails

I’m preparing to move into a new job when I finish my dissertation, hopefully later this summer. The new role involves web application development with Rails, so I needed to get up to speed. I had a web application requirement for my research, so I figured I’d knock out two birds with one stone and build that app with Rails (a screenshot of the app is above, though of course this is just a front-end and doesn’t really tell you it was built with Rails). ...

July 6, 2017 · 9 min · 1899 words · Benjamin Bengfort

Visual Pipelines for Text Analysis

Employing machine learning in practice is half search, half expertise, and half blind luck. In this talk we will explore how to make the luck half less blind by using visual pipelines to steer model selection from raw input to operational prediction. We will look specifically at extending transformer pipelines with visualizers for sentiment analysis and topic modeling text corpora.

June 24, 2017 · 1 min · 66 words · Benjamin Bengfort

Concurrent Subprocesses and Fabric

I’ve been using Fabric to concurrently start multiple processes on several machines. These processes have to run at the same time (since they are experimental processes and are interacting with each other) and shut down at more or less the same time so that I can collect results and immediately execute the next sample in the experiment. However, I was having some difficulties directly using Fabric. Fabric can parallelize one task across multiple hosts according to roles. Fabric can be hacked to run multiple tasks on multiple hosts by setting env.dedupe_hosts = False. Fabric can only parallelize one type of task, not multiple types. Fabric can’t handle large numbers of SSH connections. In this post we’ll explore my approach with Fabric and my current solution. ...
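For comparison, the general pattern of starting several commands at once and then waiting for all of them can be sketched with the standard library alone. This is not Fabric and not necessarily the solution the post arrives at: it runs commands locally instead of over SSH, and `run_concurrently` is a hypothetical helper name.

```python
import subprocess


def run_concurrently(commands):
    """Start every command before waiting on any, then collect results.

    Returns a list of (returncode, output) tuples in the same order
    as the commands were given.
    """
    # Launch all processes first so they overlap in time.
    procs = [
        subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for cmd in commands
    ]

    # Only now block on each one and gather its combined output.
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        results.append((proc.returncode, out))
    return results
```

Each command is a list of arguments, for example `[sys.executable, "-c", "print('a')"]`. Because every process is launched before any is waited on, different kinds of commands can run side by side.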

June 14, 2017 · 6 min · 1198 words · Benjamin Bengfort

Appending Results to a File

In my current experimental setup, each process is a single instance of a sample, from start to finish. This means that I need to aggregate results across multiple process runs that are running concurrently. Moreover, I may need to aggregate those results between machines. The most compact format to store results in is CSV. This was my first approach and it had some benefits, including small file sizes, readability, and the fact that CSV files can just be concatenated together. The problems were: ...
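A minimal sketch of the CSV approach might look like the following, where each run appends one row and the header is written only when the file is new or empty. The column names in `FIELDS` and the helper `append_result` are assumptions for illustration; the post does not list its actual fields.

```python
import csv
import os

# Hypothetical result columns; the original post does not name its fields.
FIELDS = ["run", "host", "throughput"]


def append_result(path, result):
    """Append one result row to a CSV file, adding a header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(result)
```

This sketch does no file locking, so it assumes a single writer per file; results from separate processes or machines would be written to separate files and combined afterwards.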

June 12, 2017 · 2 min · 251 words · Benjamin Bengfort

Compression Benchmarks

One of the projects I’m currently working on is the ingestion of RSS feeds into a Mongo database. It’s been running for the past year, and as of this post has collected 1,575,987 posts for 373 feeds after 8,126 jobs. This equates to about 585GB of raw data, and a firm requirement for compression in order to exchange data. Recently, @ojedatony1616 downloaded the compressed zip file (53GB) onto a 1TB external hard disk and attempted to decompress it. After three days, he tried to cancel it and ended up restarting his computer because it wouldn’t cancel. His approach was simply to double click the file on OS X, but that got me to thinking – it shouldn’t have taken that long; why did it choke? Inspecting the export logs on the server, I noted that it took 137 minutes to compress the directory; shouldn’t it take that long to decompress as well? ...
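One way to probe the question of compression versus decompression time is to time both directions on the same data. The sketch below uses Python's standard `gzip` module on synthetic in-memory bytes, not the zip export or the tools from the post, and `time_roundtrip` is a hypothetical helper.

```python
import gzip
import time


def time_roundtrip(data, level=6):
    """Return (compress_seconds, decompress_seconds, compressed_size) for data."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_s = time.perf_counter() - start

    # Guard against silent corruption making the timings meaningless.
    assert restored == data
    return compress_s, decompress_s, len(compressed)


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; real feed data will compress differently.
    sample = b"<item><title>example post</title></item>\n" * 200_000
    c, d, size = time_roundtrip(sample)
    print(f"compress {c:.3f}s, decompress {d:.3f}s, {len(sample)} -> {size} bytes")
```

Timings from in-memory data say nothing about disk throughput, which may matter a great deal for a 53GB archive on an external drive.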

June 7, 2017 · 4 min · 841 words · Benjamin Bengfort

Decorating Nose Tests

Was introduced to an interesting problem today when decorating tests that need to be discovered by the nose runner. By default, nose explores a directory looking for things named test or tests and then executes those functions, classes, modules, etc. as tests. A standard test suite for me looks something like:

    import unittest

    class MyTests(unittest.TestCase):

        def test_undecorated(self):
            """
            assert undecorated works
            """
            self.assertEqual(2+2, 4)

The problem came up when we wanted to decorate a test with some extra functionality, for example loading a fixture: ...
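As a hedged sketch of what a fixture-loading decorator might look like, the example below uses `functools.wraps` so the wrapper keeps the original `test_` name and docstring, which matters for any runner that selects tests by name. The decorator `with_fixture` and its dictionary stand-in for a fixture are hypothetical and not taken from the post.

```python
import functools
import unittest


def with_fixture(name):
    """Hypothetical decorator that passes a loaded fixture to the test."""
    def decorator(func):
        @functools.wraps(func)  # keep func.__name__ and docstring on the wrapper
        def wrapper(self, *args, **kwargs):
            fixture = {"name": name}  # stand-in for real fixture loading
            return func(self, fixture, *args, **kwargs)
        return wrapper
    return decorator


class MyTests(unittest.TestCase):

    @with_fixture("corpus")
    def test_decorated(self, fixture):
        """
        assert decorated test receives its fixture
        """
        self.assertEqual(fixture["name"], "corpus")
```

The decorator itself is deliberately given a name that does not contain "test", so a name-based runner has no reason to collect it as a test.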

May 22, 2017 · 1 min · 184 words · Benjamin Bengfort