In my current experimental setup, each process runs a single sample from start to finish. This means I need to aggregate results across multiple processes running concurrently, and possibly across machines as well.

The most compact format for storing results is CSV. This was my first approach, and it had some benefits:

  1. small file sizes
  2. readability
  3. CSV files can just be concatenated together

The problems were:

  1. headers become very difficult to manage, especially once files are concatenated or the schema changes between runs
  2. everything is a string; there are no int or float types without parsing
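
A hypothetical illustration of the headers problem: if two runs produce slightly different columns, naive concatenation leaves a stray header row mixed into the data, and rows with differing column counts:

```
id,accuracy
1,0.91
id,accuracy,loss
2,0.88,0.34
```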

The headers problem is really the biggest one, since I need future me to be able to read the results files and understand what's going on in them. I therefore opted instead for the .jsonl format, where each object is newline-delimited JSON. Though a far more verbose format than CSV, it avoids the headers problem and lets me aggregate different results versions with ease. Again, I can just concatenate the results from different files together.
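
By contrast, JSONL lines from different schema versions can be mixed freely, because each line carries its own field names (the fields below are made up for illustration):

```
{"run": 1, "accuracy": 0.91}
{"run": 2, "accuracy": 0.88, "loss": 0.34}
```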

This pattern is becoming so common in my Go code that I wrote a simple function for it: it takes the path to append to and the JSON value (as an interface{}), and appends the marshaled data to disk:

Now my current worry is atomicity of appends from multiple processes (is this even possible?). I was hoping that the file system would lock the file between writes, but I'm not sure it does: see Is file append atomic in UNIX?. Anyway, more on that later.