Posts

Extracting a TOC from Markup

In today’s edition of “really simple things that come in handy all the time” I present a simple script to extract the table of contents from Markdown or AsciiDoc files: This is pretty simple. Just use regular expressions to look for lines that start with one or more "#" or "=" characters (for Markdown and AsciiDoc, respectively) and print them out with an indent that matches their depth (e.g. indent "## heading 2" by one block). Because the script reads from top to bottom, you get a quick view of the document structure without building a nested data structure under the hood. I’ve also implemented some simple type detection that uses common file extensions to decide which regex to use. ...

In-Memory File System with FUSE

The Filesystem in Userspace (FUSE) software interface allows developers to create file systems without editing kernel code. This is especially useful when creating replicated file systems, file protocols, backup systems, or other computer systems that require intervention for FS operations but not an entire operating system. FUSE works by running the file system code as a user process while FUSE bridges that process to the kernel through a request/response protocol. In Go, the FUSE library is implemented by bazil.org/fuse. It is a from-scratch implementation of the kernel-userspace communication protocol and does not use the C library. The library has been excellent for research implementations, particularly because Go is such an excellent language (it was named TIOBE's programming language of the year for 2016). However, it does lead to some questions (particularly because of the questions in the Go documentation): ...

FUSE Calls on Go Writes

For close-to-open consistency, we need to be able to implement a file system that can detect atomic changes to a single file. Most programming languages implement open() and close() methods for files, but what they really modify is access to a handle for an open file that the operating system provides. Writes are buffered asynchronously so that the operating system and the user program don’t have to wait for the spinning disk to figure itself out before carrying on. Additional file calls such as sync() and flush() let the user hint to the OS about what should happen to the state of the data relative to the disk, but the OS provides no guarantee that it will happen. ...

Error Descriptions for System Calls

Working with FUSE to build file systems means you inevitably have to deal with (or return) system call errors. The Go FUSE implementation includes helpers and constants for returning these errors, but they simply wrap the syscall error numbers. I needed descriptions to better understand what was doing what. Pete saved the day by pointing me towards the errno.h header file on my MacBook. Some Python later and we had the descriptions: ...

Run Until Error with Go Channels

Writing systems means heavy use of goroutines to support concurrent operations. My current architecture employs several goroutines to run a server for a simple web interface as well as a command line app, file system servers, replica servers, consensus coordination, etc. Using multiple goroutines (lightweight threads) instead of processes allows for easier development and shared resources, such as a database that can support transactions. However, managing all of these threads can be tricky. ...

Generic JSON Serialization with Go

This post is just a reminder as I work through handling JSON data with Go. Go provides first-class JSON support through its standard library encoding/json package. The interface is simple, primarily the json.Marshal and json.Unmarshal functions, which are analogous to typed versions of Python's json.dumps and json.loads. Type safety is the trick, however, and generally speaking you define a struct to serialize and deserialize as follows: type Person struct { Name string `json:"name,omitempty"`; Age int `json:"age,omitempty"`; Salary int `json:"-"` } op := &Person{Name: "John Doe", Age: 42}; data, _ := json.Marshal(op); var np Person; json.Unmarshal(data, &np) So this is all well and good, until you want to send around arbitrary data. Luckily the json package lets you do that by using reflection to load data into a map[string]interface{}, i.e. a map whose keys are strings and whose values can be of any type (anything that implements the empty interface, which has zero methods, so every Go type satisfies it). So you might see code like this: ...
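Both styles described above can be shown in one runnable sketch: a typed round trip through the Person struct, and a generic decode into a map[string]interface{}. The helper names `roundTrip` and `decodeGeneric` and the sample JSON are assumptions for illustration, not the code elided from the post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person mirrors the struct from the text.
type Person struct {
	Name   string `json:"name,omitempty"`
	Age    int    `json:"age,omitempty"`
	Salary int    `json:"-"` // never serialized
}

// roundTrip marshals p to JSON and unmarshals the bytes into a new Person.
func roundTrip(p Person) (Person, []byte, error) {
	data, err := json.Marshal(p)
	if err != nil {
		return Person{}, nil, err
	}
	var out Person
	err = json.Unmarshal(data, &out)
	return out, data, err
}

// decodeGeneric loads an arbitrary JSON object into a map.
func decodeGeneric(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	err := json.Unmarshal(data, &out)
	return out, err
}

func main() {
	np, data, err := roundTrip(Person{Name: "John Doe", Age: 42, Salary: 100000})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data)) // {"name":"John Doe","age":42}
	fmt.Println(np.Salary)    // 0, because Salary is excluded from the JSON

	generic, err := decodeGeneric([]byte(`{"name":"Jane","tags":["a","b"],"age":30}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Values come back as interface{}, so a type assertion is needed.
	// JSON numbers decode to float64 by default.
	age, ok := generic["age"].(float64)
	fmt.Println(age, ok) // 30 true
}
```

The type assertions are the cost of the generic approach: the compiler can no longer check field types, so each access has to handle the case where the value is missing or of a different type.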

Resolving Matplotlib Colors

One of the challenges we’ve been dealing with in the Yellowbrick library is the proper resolution of colors, a problem that seems to have parallels in matplotlib as well. The issue is that colors can be described by the user in a variety of ways, then that description has to be parsed and rendered as specific colors. To name a few color specifications that exist in matplotlib: None, which chooses a reasonable default color; the name of the color, e.g. "b" or "blue"; the hex code of the color, e.g. "#377eb8"; the RGB or RGBA tuple of the color, e.g. (0.0078, 0.4470, 0.6353); and a greyscale intensity string, e.g. "0.76". The pyplot API documentation sums it up as follows: ...