Recent trends in my field of research, syntactic parsing
We’ve been trying for a long time to make computers speak and listen. Here is what has been happening with the part I work on for the last few years, or at least the part I’m excited about.
What makes understanding hard is that what you are trying to understand can mean so many different things. SO many different things. More than you think!! In fact the number grows way out of line with the number of words.
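To give a feel for how fast that number grows, here is a small sketch. It assumes every tree is built by putting together exactly two bits at a time and carries no names on its parts; real trees have more to them, so the true number is bigger still. The name `count_trees` is made up for this example.

```python
from math import comb


def count_trees(num_words: int) -> int:
    """Count the ways to group num_words words into a tree, two bits at a time."""
    if num_words < 1:
        raise ValueError("need at least one word")
    k = num_words - 1
    # The k-th Catalan number counts the two-at-a-time groupings of k + 1 words.
    return comb(2 * k, k) // (k + 1)


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, "words:", count_trees(n), "trees")
```

Under these assumptions, 5 words already have 14 possible trees, and 10 words have 4862.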
Until a few years ago, the number one idea we had was to figure out how to put together just a few words at a time. The key was not to think about too many words at once. If you do that, you can make lots of little groups, and put them together. You start at the words. You put together a few words, and get a longer bit back. Then you put together two longer bits. You work your way up, making a tree. I’ll draw you a little tree.
( ( The old ) ( voice ( their troubles ) ) )
If the computer has that tree, you can ask it “Who were the troubles voiced by?”, and it can tell you “the old”. Of course, it doesn’t know what “the old” are. That’s just some marks to it. But it gets how to put the words together, and it can give back other marks that are right.
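Here is a rough sketch of what asking the tree a question could look like. It assumes the tree is stored as pairs inside pairs, and that the top of the tree is always (who did it, (what they did, what they did it to)). Both of those are picked just for this example, and the names `words_in` and `who_did_it` are made up.

```python
# The little tree from above, written as pairs inside pairs.
tree = (("The", "old"), ("voice", ("their", "troubles")))


def words_in(bit):
    """Return the words under one bit of the tree, left to right."""
    if isinstance(bit, str):
        return [bit]
    left, right = bit
    return words_in(left) + words_in(right)


def who_did_it(whole_tree):
    """Answer 'who did it?' by reading off the left half of the top pair."""
    doer, (_action, _thing) = whole_tree
    return " ".join(words_in(doer)).lower()


if __name__ == "__main__":
    print(who_did_it(tree))
```

The computer never needs to know what the words mean here. It only follows the shape of the tree.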
Until the last few years, we thought it was a big deal to see every whole tree before you cut any of them out for sure.
Now another way’s shown up. And I think the facts are almost in. I’m definitely calling it early here, and probably most people don’t agree!
The other way is to work your way along, from left to right. The funny thing is, that’s what we do! But it’s taken a while for us to get our heads around how to make it work for a computer. Now, I think we’ve made it better than the other way. The computer is right just as often when it guesses, but it’s much, much faster.
The problem was, if you work your way along, left to right, it’s hard to be sure your guess is the best guess for the words all put together, if you can’t go back and change your mind. And the computer gets lots wrong. If you let it just run, it gets something wrong, but has to move forward, and the idea it’s trying to build doesn’t make sense. You get to “the old voice”, and your tree is the one for “the voice that is old”, not “voicing is what the old are doing”. And then you’re stuck.
People really thought that was it for that approach. If you couldn’t sort your guesses about the whole thing, how could you know you didn’t just close your mind too early? People found nice ways to promise that you would see every total idea at the end, so you could pick one then.
The problem is, you see every idea, but you can only ask questions about the way small groups of words were put together, when that idea was put together. You know how I said you’d build a tree? You could only ask questions about bits next to each other.
The other way, you’re building this tree as you go along, left to right. As every word comes in, you add it to your tree. So yeah okay, we do have to make our guesses, and live with them. We don’t see all the different possible trees at the end. We do get locked in. But all you have to do is not get locked in totally. Keep some other trees around. It turns out we need to keep thinking about 30-60 trees. Less if we’re a bit bright about it.
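Here is a small sketch of the "keep some other trees around" idea. It is not my real code. It assumes only two moves (take the next word, or put the top two bits together), and the numbers in `PAIR_SCORES` are made up by hand so that "old voice" looks good at first and only turns out wrong later. A real one learns its numbers and keeps far more trees around than this one does.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
# One guess in progress: (score so far, bits built so far, index of the next word).
State = Tuple[float, Tuple[Tree, ...], int]

# Made-up scores for putting two bits together. Not learned from anything.
PAIR_SCORES = {
    ("The", "old"): -0.5,             # looks a bit odd on its own
    ("old", "voice"): 2.0,            # looks great at first, but is wrong here
    ("The", ("old", "voice")): 1.0,
    ("their", "troubles"): 2.0,
    ("voice", ("their", "troubles")): 1.0,
    (("The", "old"), ("voice", ("their", "troubles"))): 3.0,
}
BAD_PAIR = -3.0  # score for any pair not listed above


def next_states(state: State, words: List[str]) -> List[State]:
    """Return every guess reachable from this one with a single move."""
    score, stack, i = state
    out = []
    if i < len(words):   # move 1: take the next word
        out.append((score, stack + (words[i],), i + 1))
    if len(stack) >= 2:  # move 2: put the top two bits together
        left, right = stack[-2], stack[-1]
        gain = PAIR_SCORES.get((left, right), BAD_PAIR)
        out.append((score + gain, stack[:-2] + ((left, right),), i))
    return out


def parse(words: List[str], keep: int) -> Tuple[Tree, float]:
    """Work left to right, keeping only the `keep` best guesses after each move.

    Assumes at least one word. n words always take 2n - 1 moves.
    """
    beam: List[State] = [(0.0, (), 0)]
    for _ in range(2 * len(words) - 1):
        candidates = [s for state in beam for s in next_states(state, words)]
        candidates.sort(key=lambda s: s[0], reverse=True)
        beam = candidates[:keep]
    score, stack, _ = beam[0]
    return stack[0], score


if __name__ == "__main__":
    sentence = ["The", "old", "voice", "their", "troubles"]
    print(parse(sentence, keep=1))  # one guess only: gets locked in
    print(parse(sentence, keep=4))  # a few guesses: can get back out
```

With these made-up scores, keeping one guess ends at the "voice that is old" tree, and keeping four ends at the tree drawn earlier. The right guess is never the top one until the very end, which is why throwing the others away too early hurts.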
We’ve been doing good stuff this way. I write this kind of thing. Mine lets the computer guess the right place in the tree for over 9 words in 10, and it can do ten hundred words in a blink. It’s pretty cool. That’s over a hundred times faster than we could do 3 to 5 years ago.