Yeah, the LW community should find lots of things in DI that seem remarkably familiar once understood...
Owen_Richardson
Thank you, although it’s not so much the writing per se as the analysis of the precise structure of the inferential gaps that needed to be bridged.
And you’ll see lots more of me in the future. I honestly think a big part of the reason I got over my fear of it not being perfect and posted it already was because I’m very lonely, and the case study of the NYC rationalist chapter was the biggest carrot ever.
It’s not that I’m socially inept. Quite the opposite, when I apply myself. It’s just that I get so damn tired of… well, you know, the second paragraph sums it up perfectly already, doesn’t it?
Being rational in an irrational world is incredibly lonely. Every interaction reveals that our thought processes differ widely from those around us, and I had accepted that such a divide would always exist. For the first time in my life I have dozens of people with whom I can act freely and revel in the joy of rationality without any social concern—hell, it’s actively rewarded! Until the NYC Less Wrong community formed, I didn’t realize that I was a forager lost without a tribe...
As to having a thick skin, I was actually pretty depressed the first day I got up and saw the first batch of comments, which seemed very negative, like Alicorn’s.
“Pretty depressed” as in not able to keep myself from wondering whether my failure to just commit a nice painless suicide already due to my self-preservation instinct was essentially a form of akrasia. (Obviously, past issues exist, and I’ve been using my informal understanding of REBT to keep myself together, although I think I am “naturally” a very optimistic person.)
But I forced myself to confront the question and admit, as I always do, that I do care, and am going to keep on fighting no matter how impossible success seems or how much it seems that I always just end up getting hurt over and over again, so I may as well stop whining to myself and get back to work! So I cheered myself up.
And then I got home and saw that the situation was actually pretty damn good (had like 20 upvotes, and a couple very positive messages from a few individuals), so...
I don’t think I’m going to have a crisis of faith in “the light in the world” ever again.
Or the 2-4-6 game ‘reversed’, yes. Before Misha’s post, I was actually going to try and straighten out the confusion over “logically faultless communication” by showing how it would apply to the 2-4-6 game ‘reversed’ as an example. I might still, depending.
Although the thing is that I’d say the best way to communicate something like ‘2-4-6’ isn’t as one of the simpler concepts in the hierarchy, but as a ‘cognitive routine’, which is made of a chain (possibly branching) of various simpler concepts that have already been taught.
As Misha said:
For instance, teaching integration by substitution might first involve a simple sequence of examples about identifying when the method is appropriate, then a sequence about choosing the correct substitution, before actually teaching students to solve an integration problem using the method.
(Which doesn’t even get into the hidden complexity of all the parts ‘black boxed’ together in that sentence as assumed to be already taught.)
Of course it is easier to explain “logically faultless communication” by showing how it applies to more basic concepts, than to complex concepts that are made up of many of those basic concepts connected together.
The problem is that when you just show the very basic concepts as the AthabascaU module on DI does, people say stuff like this:
And, more generally, all examples given may be used for teaching categorization of objects. How do you teach algorithms (such as multiplication)? How do you teach history and geography? How do you teach calculus? How do you teach scientific method? Not every knowledge can be reduced to questions of form “does X have property Y” taught by presenting series of objects which either are or aren’t Y. In the whole presentation there was not a single practically applicable example. Children don’t need to go to school to learn what “is longer than” or “not horizontally aligned” means.
(Prase, in comment on first DI post.)
I anticipated this, and had tried to avoid it by injecting a little excitement, like: ‘Hey y’all, here’s something extremely valuable but complex and non-obvious. It will seem confusing and/or trivial at first, but it really is valuable!’
And in actual fact, looking back, that probably did help, because I got plenty of people going, ‘What the hell are you talking about? Give us some meat!’ rather than just, ‘Huh. Whatever.’
I’m sure there was a much better way possible of achieving the same goal, but what were the chances of me ever finding it without any feedback on actual attempts?
That makes lots of sense. I know that the original publisher folded and that’s why they had to switch to publishing it through the Association for Direct Instruction directly, but I didn’t know whether they updated it at all beyond the preface at the same time.
The book is honestly full of little typos, so I doubted they’d edited it again anyway. I’ve been taking notes of things I think belong on an errata sheet myself.
An interesting aspect of Direct Instruction that I don’t think has been pointed out yet (well, the book, written in 1982, might not be a likely place to find such a thought): this method of teaching seems ideally suited for teaching an Artificial Intelligence. Part of the gimmick of Direct Instruction is that it tries, as much as possible, not to make assumptions about what sort of things will be obvious to the learner. Granted, a lot of the internal structure still relies on experimental data gathered from human learners, but if we’re creating an AI, it’s a lot easier to program in a set of fundamental responses describing the way it should learn inductively, than to program in the concept of “red” or “faster than” by hand.
Yeah, I wasn’t gonna mention this for ages, but the book Inferred Functions of Performance and Learning by Siegfried Engelmann and Donald Steely might contain some useful original ideas relevant to Artificial Intelligence. I haven’t read it myself, though, and really have no idea beyond “sounds plausible”.
(That is, I know I’ve been communicating very high certainty that DI is a very big deal when it comes to education, and I’m afraid some may have decided I have a general ‘having very high certainty that things are big deals’ trait, and thus misinterpret this recommendation as far stronger than it’s intended.)
But Zig himself has a short description of what the book’s supposed to be about here, so you might be able to come to a better conclusion yourself just by reading that.
Misha, you are spectacularly awesome. =D
I mean, it’s aggravating to see things you wrote and go, “But I SAID that! Was everyone just skimming over that part or what?”, but as the aphorism runs in the DI world, “If the learner hasn’t learned, the teacher hasn’t taught”, eh? :P
[And until one sees that aphorism as perfectly consistent with “logically faultless communication”, one must know that one still hasn’t understood the meaning of the technical term.]
I knew I’d make terribly stupid mistakes in miscommunicating this stuff when I started, so I figured it was time to let go of my fear of not having it be perfect in the first place and just start trying.
I should also make sure, when you say it was 1982, do you mean original publication, or that of the copy you got? The second (and most recent) edition is 1991.
Dunno offhand what’s different. Never saw the older one myself.
So, if I were to make you a bet that #1-and-#2 is true that you should rationally take if you believe that 99% estimate, it must be set up so that gjmgain*0.01>owengain*0.99...
So unless I’m making some embarrassingly simple math mistake here, if I put up say $2000 (Canadian) for “gjmgain” (wish I had more to play with, but unpaid intern, no work visa here, etc), you should be willing to put up anything less than $20.20...
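As a quick check of that arithmetic, here is a minimal Python sketch. It only restates the inequality above (gjmgain*0.01 > owengain*0.99); the function name `max_stake` and its parameter names are made up for illustration.

```python
from fractions import Fraction


def max_stake(payout, p_win):
    """Break-even stake for the inequality payout * p_win > stake * (1 - p_win).

    Any stake strictly below the returned value satisfies the inequality.
    Both names are hypothetical labels for "gjmgain" and the 0.01 factor.
    """
    return payout * p_win / (1 - p_win)


# $2000 on one side, with the 0.01 / 0.99 split from the inequality above.
limit = max_stake(Fraction(2000), Fraction(1, 100))
print(f"{float(limit):.2f}")  # prints 20.20
```

So the break-even point is 2000/99, and anything under roughly $20.20 satisfies the inequality as written.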
Ah! But what if rather than money you had to put up that you would read the entire Theory of Instruction and the entire Research on Direct Instruction?
My reply has so far gotten 3 upvotes, which somewhat suggests that your certainty of points (1) and (2) has risen from “say 98%”, without actually saying so. Just for the sake of explicitness, what is your current estimate? :P
Sorry I promised I’d type that section out yesterday, but didn’t. Honestly, I’ve been juggling so many things I’d need to kage bunshin myself with my computer to handle them all.
(Yes that’s right, I just made a “Naruto” reference. :P
Can you imagine what a “Naruto” equivalent of HPMOR would be like?
...I can’t. Other than “awesome”.)
Anyway, rather than typing out the section, I found a scanner and signed up at photobucket.
Here’s the page. The section I was referring to starts at Prescriptive Applications of Programs [“programs” meaning the task analysis], and ends at the Summary.
In fact you are correct: Negative examples, as in examples outside the higher-order class, are not used in the teaching of sub-classes of a “higher-order noun”. However, in discriminating between sub-classes, examples of other sub-classes serve as negatives for the sub-class you are currently teaching.
Please see chapter 11 in TOI, “Hierarchical Class Programs”, p 123.
We do care that they learn the terminology. When I said they are not accessible through ‘simple’ verbal rules like: “Listen: a bird is a small feathered flapping winged thing,” I mean not that they are deaf or completely without language that you can expand, but that they are very young children (or older children from disadvantaged backgrounds, and I can give you a real horror story from my school demonstrating how little some of these families interact with their children, see end of comment) who would mostly not process and retain even such ‘simple’-seeming rules like that.
These are learners who are not yet familiar with the generalized concept of ‘higher-order nouns’. They must be shown that the shared verbal structure of the statements “this truck is red” and “this truck is a vehicle” (and of the false statement “this truck is a dog”) does not mean that you could find a truck that was not a vehicle in the same way that you could find a truck that’s not red, or that you could respond to the statement “this truck is a vehicle” with “no it’s not! It’s a truck”.
A sequence for teaching the higher-order classes would start with examples of vehicles (sub-classes you are later going to teach) and negative examples that are as minimally different as possible (avoiding boundaries that are unclear even in the language of knowledgeable adults).
The wording of the first, modeled examples in the sequence could be like, “This is a vehicle/this is not a vehicle”. The test example wording could be, “Tell me, vehicle or not-vehicle?”
Once firm, you move on to the sequence for teaching the first sub-class.
These are vehicles:
+This vehicle is a truck
[large difference to show sameness, a general principle derived directly from the basic axioms of the stimulus-locus analysis]
+This vehicle is a truck
[minimum difference to show difference (again, a general principle), using one of the sub-classes you will teach later as a negative]
-This vehicle is not a truck
Model perhaps a couple more, then
Your turn. Tell me if each vehicle is a truck or not a truck.
+Tell me about this vehicle “A truck”
-Tell me about this vehicle “Not a truck”
etc
Then in the next sequence you introduce a very different sub-class, like “boat”
+My turn. What kind of vehicle is this? A boat.
[big difference to the next positive to show sameness]
+My turn. What kind of vehicle is this? A boat.
+Your turn. What kind of vehicle is this? “A boat”
-Your turn. What kind of vehicle is this? “A truck”
etc
Continue introducing sub-classes as each becomes firm, juxtaposing members of the previously taught sub-classes with the new addition for discrimination.
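A rough sketch, in Python, of how the modeled wording above falls out of the labels. The `Example` class, the `model_statement` function, the item names, and the “van” sub-class are all hypothetical. The only thing taken from the sequence above is that positives are members of the sub-class being taught, and negatives are members of other sub-classes of the same higher-order class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    item: str      # hypothetical name for the picture or object shown
    subclass: str  # the sub-class it actually belongs to, e.g. "truck"


def model_statement(example, target, higher_order="vehicle"):
    """Return the teacher's modeled wording for one example.

    A member of the target sub-class is a positive (+); a member of any
    other sub-class of the same higher-order class is a negative (-).
    """
    if example.subclass == target:
        return f"+This {higher_order} is a {target}"
    return f"-This {higher_order} is not a {target}"


# Two quite different trucks (to show sameness), then a near neighbour
# from another sub-class (to show difference). Item choices are made up.
examples = [
    Example("pickup truck", "truck"),
    Example("dump truck", "truck"),
    Example("delivery van", "van"),
]

lines = [model_statement(e, "truck") for e in examples]
for line in lines:
    print(line)
```

This deliberately leaves out the hard part, which is choosing and ordering the examples; it only shows that the signal attached to each example is determined by the sub-class currently being taught.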
Horror story: Once someone brought home the wrong kid, X, told them to watch TV, and didn’t notice that it was the wrong kid until hours later when the school finally found the free time around their frantic search for the assumed kidnapper of X (actually just the person who had come to pick up Y), and phoned Y’s family to ask why they hadn’t picked up Y yet.
Yes, hence why I suggest Theory of Instruction for review.
Although the comment I just made today on this post is currently sandwiched between two comments made yesterday (I thought comment threads just ended up in order from most recent to least?)
Wanting that it be noticed as much as possible, I’ll just ctrl-v here if that’s not a breach of etiquette:
I would like to link to this comment on Eliezer’s post “the cluster structure of thingspace” in which I quickly note how TOI relates.
I would like to link to this comment on Eliezer’s post “the cluster structure of thingspace”.
On Eliezer’s post “The Cluster Structure of Thingspace”
You could give relatively simple verbal intensional definitions to try and lead someone to the bird cluster, yes. But if you had someone who wasn’t practically accessible through those verbal communications, how would you do it?
You’d have to show extensional examples, positives and negatives, and indicate the value of each example by some clear and consistent signal.
You couldn’t give all possible extensional examples, so you would have to select some. And you couldn’t give them all at once, so you’d have to present them in a particular order.
What is the theory for finding optimized selections and orderings of examples for leading the learner to the cluster? How does that theory extend to the more complicated case where you have to communicate the subtypes within the “bird” cluster?
This is one of the many things that the Theory of Direct Instruction presented in Engelmann and Carnine’s text Theory of Instruction: Principles and Applications addresses. [They call it a “multi-dimensional non-comparative concept” (“non-comparative” meaning the value of any example is absolute rather than relative to the last), or “noun” for short.]
And of course, if you had to select and order the presentation of simple verbal definitions/descriptions as examples themselves, the theory would also have application.
Please see here for a clarification of what “someone who wasn’t practically accessible through those verbal communications” means, and a more concrete example of teaching the higher-order class ‘vehicles’ and sub-classes.
Well, one of the things I intend to do once I master the application of DI theory myself is create a DI course covering the material in Theory of Instruction itself.
But I don’t think I’m going to blog the entire contents of a 376-page textbook in the next year, what with the huge amount of studying and practice I have to do myself on a more advanced level, and my full-time internship as an elementary teacher.
My intention here is simply to interest some LWers in joining me in my studies, so that they have the opportunity to catch up to me as soon as possible (and given the intelligence distribution on LW, hopefully have some of them surpass me!).
Do I still get to keep your upvote?
PS: Just occurred to me that there’s a terminology mismatch I may have to remember to explain at some point. In the context of DI theory, “sequence” is usually used to mean one of the shortest units of instruction, like, directed at a single basic-form or joining-form concept (a simple joining form concept! Not a transformation concept with many sub-types, which would also be split into multiple sequences).
A ‘program’ refers to a relatively short series of sequences directed at bringing the learner to mastery of a “task” for which those concepts are logically necessary.
How an entire course like “Reading Mastery” for the entire grade-level of kindergarten is unambiguously referred to, I’m honestly not sure offhand. I think it’s usually clear from context.
About valid downvote uses, please see the note added to the beginning of this post.
Cool, I look forward to discussing the French lessons with you (although honestly I’ve lately been practicing Spanish a lot more).
Remember to ask me at some point for some small charts I made that will help you immensely in properly producing all the standard French phonemes that differ from the standard (American) English set.
DI could probably be somewhat adapted to help in self-teaching, since it would at least give you a useful classification system for what possible logical structure the ideas you’re looking for might have… but that’s the possible structures of the most basic components, which for an advanced subject are arranged in relationships which make the whole thing exponentially more complex. Although different complex ideas can often differ in small ways, but...
Yeah, I dunno. It would have at least a little use I’m sure, but I would bet it couldn’t produce anywhere near the level of “magic”-seeming results that a good DI program designed by someone who already understands the material can.
But really, the basic answer is, “No. DI is simply the application of practical epistemology to teaching. When you are learning something on your own, you have to apply practical epistemology in the normal direction.”
Oh, and if you’re interested in Japanese but aren’t yet at a very high level, look for the book “Japanese Verbs & Essentials of Grammar” by Rita L. Lampkin. I took Japanese in high school, and the school only had a grade 11 class. I came into it below the level of the other students, but the teacher bumped me up to being the only grade 12 Japanese student in the school during that year. I believe that book was the single most significant factor in that, and any enabling traits I had on the side, I’m sure you already have too.
Actually, even though I still achieved a very weak grasp of the language at my peak and quickly lost most of that when I stopped practicing after grade 12, I think I would be able to apply DI theory to use that reference work to produce some great instruction on using the language expressively in a conversational context once I master the use of DI theory more.
(Essentially I would be using the book as a ‘prosthetic’ understanding of the material while I designed the instruction. I haven’t given it much thought, but I think this may not be possible in the same way with mathematical proofs [cuz that’s more cognitive routines than just transformation concepts, and cognitive routines are higher in the hierarchy of the knowledge-system analysis because they incorporate transformations as components.])
And that’s actually a project I’ve been thinking for a while about how I might eventually do.
You sound like you could make an awesome collaborator on something like that, if you got your hands on a copy of Theory of Instruction.
[Edit: note that language instruction will generally be much easier to design (for any student who already has a good adult grasp of their native language).
This is because it’s largely just subtype analysis of single- and double-transformations that you need to understand, which means you really don’t need the sections of Theory of Instruction that deal with cognitive routines, diagnosis and corrections, the response-locus analysis, and philosophical and research issues, and that’s the entire last half of the book!
Well, a bit of the response-locus analysis would be useful for teaching the production of new phonemes, but not that much theoretical detail.
The major practical difficulty would be tracking the schedule of the integrated review, simply because of the sheer number of distinct entries (vocabulary words and grammatical patterns), but it would be relatively easy to design a Computer Assisted Design program to help with that.
Oh, and the subtype analysis, which is the most complicated theoretical part, needed for ordering the introduction of the transformation concepts, is only needed for teaching all the basic grammatical structures of the language.
Once you’re done with that, all that’s left is just vocabulary and idioms, which pretty much just follow the same logical template over and over again.
(The only difference is that you’d start to be able to provide more and more of the definitions and directions of the instruction entirely within the target language.)]
Ha, upvoted cuz this one made me laugh! =]
NO, I am not a member of the Church of Scientology, or in any way sympathetic to their views (although I do feel sorry for those poor messed up people as I do for every other person who’s living their life drowned deep beneath the sanity water line).
The only common idea on LW that I can think of offhand that I don’t think is really part of the correct contrarian cluster is the 3^^^3 dustspecks thing, and I’m damn sure many LWers would agree with me on that one.
Thank you very much for the IM convo you had with me resolving all those confusions I had about how I had miscommunicated various points. I just thought I should post this comment before I go and use what you gave me to write a much better clarification post for discussion, since the comment was almost done when you IM’d me.
I hope that report doesn’t read too similarly to my posts! I only did the quickest skim of the “what is DI” section, but it seems to be like an abridgement of “Rubric for Identifying Authentic Direct Instruction Programs” (prologue here) that’s even more abridged than my treatment of Theory of Instruction.
I wouldn’t even try to communicate DI theory across a large inferential distance (unless the audience already had a positive impression of it from personal experiences teaching from DI programs after failing with low performing kids, to make them patient).
But with LW, I figured that after showing the experimental evidence of effectiveness (in the way teachers who had previously been failures became successful after they started using DI programs), to convince some LW people to study the theory themselves, they would find it much easier to understand because of their prior understanding of concepts like extensional/intensional definitions, ‘looking into the dark’, and thingspace.
For instance, when studying the theory behind the design of templates for teaching ‘noun’ concepts (or ‘multi-dimensional non-comparatives’), any LWer should go, “Ohhh, this is taking the basics of what Eliezer was talking about with the example of the ‘bird-cluster’ in ‘thingspace’, and applying it to make sure you’re giving enough information to someone else who doesn’t already know what the label ‘bird’ refers to that they can find the cluster!”
And of course, as we discussed by IM, there’s the way that “logically faultless communication” relates to the “2-4-6 game”: it is “logically faultless” in the same way Bayes is the logically faultless way of doing induction yourself, but that doesn’t guarantee that a particular user won’t misapply it. But if they do misapply it, someone who knows how to apply it correctly can figure out exactly how the mistake-maker must change to become correct.
But yeah, I’ll go get on that clarification post for discussion now.
Thanks again!
Actually, it just occurred to me, when you said:
Were you one of the people I explicitly pointed that connection out to, or did you have the opportunity to notice it for yourself?