Source Control for Prototyping and Analysis
When I’m doing exploratory work I want to run many analyses. I’m usually optimizing for getting something quick, but I want to document what I’m doing enough that if there are questions about my analysis or I later want to draw on it I can reconstruct what I did. I’ve taken a few approaches to this over the years, but here’s how I work these days:
For each analysis I make a local directory, ~/work/YYYY-MM-DD--topic/. These contain large files I’m copying locally to work with, temporary files, and outputs. When these get too big I delete them; they’re not backed up, and I can rebuild them from things that are backed up.
Code goes in a git repo, in files named like YYYY-MM-DD--topic.py. Most of my work lately has been going into an internal repo, but if there’s nothing sensitive I’ll use a public one. I don’t bother with meaningful commit messages; the goal is just to get the deltas backed up. If I later want to run an analysis similar to an old one I duplicate the code and make a new work directory.
Code is run from the command line in the work directory, which means that in my permanent shell history every command I ran related to topic will be tagged with ~/work/YYYY-MM-DD--topic/.
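The setup described above is simple enough to script. Here is a minimal sketch, assuming the work directories live under one root and the code files sit at the top level of an existing repo checkout. The function name start_analysis and its parameters are hypothetical, not something the workflow above prescribes.

```python
import datetime
import shutil
from pathlib import Path


def start_analysis(topic, work_root, repo_dir, day=None, copy_from=None):
    """Create the work directory and code file for a new analysis.

    Returns (work_dir, code_file). All names here are hypothetical.
    """
    day = day or datetime.date.today()
    stem = f"{day.isoformat()}--{topic}"

    # Scratch space for large inputs, temporary files, and outputs.
    work_dir = Path(work_root).expanduser() / stem
    work_dir.mkdir(parents=True, exist_ok=True)

    # The code lives in the git repo (assumed to exist already),
    # not in the work directory.
    code_file = Path(repo_dir).expanduser() / f"{stem}.py"
    if not code_file.exists():
        if copy_from is not None:
            # Start from a duplicate of an earlier analysis.
            shutil.copyfile(copy_from, code_file)
        else:
            code_file.touch()
    return work_dir, code_file
```

Passing copy_from covers the "duplicate the code and make a new work directory" case; without it the new code file starts empty. Committing the file is left out on purpose, since that step is just a plain git commit.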
For example, the code for the figures in my recent NAO blog post on flu is in 2024-09-05--flu-chart.py and 2024-09-12--rai1pct-violins.py.
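Names in this format carry both a date and a topic, so they are easy to parse and filter. The following is a rough sketch of one way to do that, not a tool the post describes: parse_name and find_analyses are hypothetical helpers, and the regular expression assumes every code file ends in .py.

```python
import datetime
import re
from pathlib import Path

# Matches names such as 2024-09-05--flu-chart.py
NAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})--(.+)\.py$")


def parse_name(filename):
    """Return (date, topic) for a dated analysis file, or None if no match."""
    match = NAME_RE.match(Path(filename).name)
    if match is None:
        return None
    try:
        day = datetime.date.fromisoformat(match.group(1))
    except ValueError:
        return None  # digits in the right shape, but not a real date
    return day, match.group(2)


def find_analyses(filenames, keyword="", since=None, until=None):
    """Return matching (date, topic, filename) tuples, oldest first."""
    hits = []
    for filename in filenames:
        parsed = parse_name(filename)
        if parsed is None:
            continue
        day, topic = parsed
        if keyword not in topic:
            continue
        if since is not None and day < since:
            continue
        if until is not None and day > until:
            continue
        hits.append((day, topic, filename))
    return sorted(hits)
```

With the two file names above as input, find_analyses(names, keyword="flu") keeps only the flu-chart entry, and calling it with no filters returns both files in date order.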
This approach optimizes for writing over reading, while maintaining enough context that I can figure out what I was doing if I need to. I’ll usually link the code from documents that depend on it, but even if I forget to, it’s pretty fast to figure out which code it would have been from names and dates. Running git grep and histgrep gets me a lot of what other people seem to get from LLM-autocomplete, and someday I’d like to try priming an LLM with my personal history.
Often something I’m doing moves from “playing around trying to understand” to “something real that my team will continue to rely on”. I try to pay attention to whether I’m getting to that point and then start taking care of the code properly, in an appropriate repo with meaningful commit messages etc.