Posts

Simple ZMQ Message Passing

There are many ways to create RPCs and send messages between nodes in a distributed system. Typically when we think about messaging, we think about a transport layer (TCP, IP) and a protocol layer (HTTP) along with some message serialization. Perhaps best known are RESTful APIs which allow us to GET, POST, PUT, and DELETE JSON data to a server. Other methods include gRPC which uses HTTP and protocol buffers for interprocess communication. ...

PID File Management

In this discussion, I want to propose some code to perform PID file management in a Go program. When a program is backgrounded or daemonized we need some way to communicate with it in order to stop it. All active processes are assigned a unique process id by the operating system and that ID can be used to send signals to the program. Therefore a PID file: The pid files contains the process id (a number) of a given program. For example, Apache HTTPD may write it’s main process number to a pid file - which is a regular text file, nothing more than that - and later use the information there contained to stop itself. You can also use that information (just do a cat filename.pid) to kill the process yourself, using echo filename.pid | xargs kill. ...

Public IP Address Discovery

When doing research on peer-to-peer networks, addressing can become pretty complex pretty quickly. Not everyone has the resources to allocate static, public facing IP addresses to machines. A machine that is in a home network for example only has a single public-facing IP address, usually assigned to the router. The router then performs NAT (network address translation) forwarding requests to internal devices. In order to get a service running on an internal network, you can port forward external requests to a specific port to a specific device. Requests are made to the router’s IP address, and the router passes it on. But how do you know the IP address of the device? Moreover, what happens if the router is assigned a new IP address? Static IP addresses generally cost more. ...

On the Tracks with Rails

I’m preparing to move into a new job when I finish my dissertation hopefully later this summer. The new role involves web application development with Rails and so I needed to get up to speed. I had a web application requirement for my research so I figured I’d knock out two birds with one stone and build that app with Rails (a screenshot of the app is above, though of course this is just a front-end and doesn’t really tell you it was built with Rails). ...

Visual Pipelines for Text Analysis

Visual Pipelines for Text Analysis Description Employing machine learning in practice is half search, half expertise, and half blind luck. In this talk we will explore how to make the luck half less blind by using visual pipelines to steer model selection from raw input to operational prediction. We will look specifically at extending transformer pipelines with visualizers for sentiment analysis and topic modeling text corpora.

Concurrent Subprocesses and Fabric

I’ve ben using Fabric to concurrently start multiple processes on several machines. These processes have to run at the same time (since they are experimental processes and are interacting with each other) and shut down at more or less the same time so that I can collect results and immediately execute the next sample in the experiment. However, I was having a some difficulties directly using Fabric: Fabric can parallelize one task across multiple hosts accordint to roles. Fabric can be hacked to run multiple tasks on multiple hosts by setting env.dedupe_hosts = False Fabric can only parallelize one type of task, not multiple types Fabric can’t handle large numbers of SSH connections In this post we’ll explore my approach with Fabric and my current solution. ...

Appending Results to a File

In my current experimental setup, each process is a single instance of sample, from start to finish. This means that I need to aggregate results across multiple process runs that are running concurrently. Moreover, I may need to aggregate those results between machines. The most compact format to store results in is CSV. This was my first approach and it had some benefits including: small file sizes readability CSV files can just be concatenated together The problems were: ...