[Exploratory] What does it mean that an experiment is high bit?

Disclaimer: This is an exploratory writing post. No checking for typos or other editing was done.


John Wentworth noticed that many people perform 1-bit, or low-bit, experiments. Instead, we want to do experiments that give us as much information as possible. In this document I try to better understand what is going on.

An experiment consists of two things: a concrete set of steps that you perform, and a set of observations that you record. Normally we start out with a set of questions (often just one) that we want the experiment to answer.

We can start out with a binary yes/no question and then run the experiment. We would normally call this a one-bit experiment, but the experiment itself is not necessarily 1-bit. We can collect many more observations than are necessary for answering the question, and normally we do.
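To make "one-bit" concrete, here is a minimal sketch that computes how much information a yes/no answer carries, using the standard Shannon entropy of a binary outcome. The function name `binary_entropy` and the prior probabilities are illustrative assumptions, not something from Wentworth's framing.

```python
import math


def binary_entropy(p_yes):
    """Shannon entropy, in bits, of a yes/no answer with prior P(yes) = p_yes."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be a probability between 0 and 1")
    if p_yes in (0.0, 1.0):
        # A certain outcome carries no information.
        return 0.0
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))


# A yes/no question we are maximally unsure about yields one full bit.
print(binary_entropy(0.5))

# If we were already nearly sure of the answer, we learn much less than a bit.
print(binary_entropy(0.99))
```

So a binary question yields at most one bit, and often less, no matter how much the experiment itself could have told us.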

For example, when we train a neural network, we get a set of weights out after training. These weights are our observations. To answer any specific question, we normally need to process our observations in specific ways. For example, we might want to know whether a specific training setup produced sparse weights (most weights are zero). In that case we can process the weights to detect whether they are sparse, given the training setup.
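As a rough sketch of what "processing the weights" could look like, the snippet below reduces a list of weights to a single yes/no answer. The helper names, the tolerance `tol`, and the 50% threshold for "most" are my own assumptions for illustration; the weights are made-up toy values, not results from a real training run.

```python
def sparsity_fraction(weights, tol=1e-8):
    """Fraction of weights whose magnitude is at most `tol` (treated as zero)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    near_zero = sum(1 for w in weights if abs(w) <= tol)
    return near_zero / len(weights)


def is_sparse(weights, threshold=0.5, tol=1e-8):
    """Answer the yes/no question: are more than `threshold` of the weights zero?"""
    return sparsity_fraction(weights, tol) > threshold


# Toy stand-in for the weights we would get out of training (hypothetical values).
weights = [0.0, 0.0, 0.31, 0.0, -1.2, 0.0, 0.0, 0.0]

print(sparsity_fraction(weights))
print(is_sparse(weights))
```

Note how much is thrown away here: the full list of weights goes in, and a single boolean comes out.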

So normally we would call this a low-bit experiment that just tells us whether the weights are sparse. However, the observations from the experiment, i.e. the weights, can possibly be used to answer other questions as well.

So normally, when we say that an experiment is low-bit, we are talking about the question we are trying to answer.

We can think of an experiment as a tuple (Experimental Setup, Observations, Questions). Note that any experimental setup will have many possible observations that we could make, but usually we only record a tiny subset of them. The second entry in the tuple contains only these recorded observations.
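One way to write this tuple down is as a small data structure. This is only a sketch: the class name `Experiment`, its field names, and the example questions are hypothetical choices of mine, and the observations are toy values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Experiment:
    # Description of the concrete steps we perform.
    setup: str
    # Only the observations we actually recorded, not everything observable.
    observations: Dict[str, Any]
    # Each question is a function that processes the recorded observations.
    questions: Dict[str, Callable[[Dict[str, Any]], Any]] = field(default_factory=dict)

    def answer_all(self):
        """Answer every question from the same recorded observations."""
        return {name: ask(self.observations) for name, ask in self.questions.items()}


exp = Experiment(
    setup="train a small network (hypothetical setup)",
    observations={"weights": [0.0, 0.0, 0.31, 0.0, -1.2, 0.0]},
    questions={
        # The original one-bit question.
        "are_weights_sparse": lambda obs: (
            sum(w == 0.0 for w in obs["weights"]) / len(obs["weights"]) > 0.5
        ),
        # A second question answered from the very same observations.
        "largest_magnitude": lambda obs: max(abs(w) for w in obs["weights"]),
    },
)

answers = exp.answer_all()
print(answers)
```

Adding an entry to `questions` costs nothing extra experimentally, as long as the recorded observations already contain what that question needs.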

An experiment can still be low-bit in the sense that the experimental setup does not allow us to make many useful observations, or that we do not record many useful observations, where "useful" refers to how many important questions we can answer. Also, we normally want to ask questions that an experiment can answer and whose answers give us a lot of useful information. E.g. if we could answer the question of how goals are structured in a neural network, that would actually be really useful. To answer such a question, we probably want to split it into different subquestions (though finding a clean partitioning of the question will probably almost never work), then extract different pieces of information from one or more experiments to answer them, and build up a model that answers the original question.

So it seems that saying an experiment is low-bit is saying that we can’t answer many useful questions with it. This might be because we don’t know the right questions, because we have not recorded the right observations, or because we do not have an experimental setup that even produces the right observations.
