I haven’t studied the field in awhile, but back when I did, I got a lot from Advanced Compiler Design & Implementation by Steven Muchnick.
There is an unfortunate tendency, when teaching about compilers, to teach about components in the order they would be built in a project course, which puts a lot of undue emphasis on the boring solved problem of parsing. The layman’s view of a compiler is that it translates from high-level languages to machine code; the reality is that it translates from a high-level language through a series of intermediate representations, each of which brings out some property that makes it amenable to optimizations; and these optimizations and representations are what make up the bulk of a compiler. A good litmus-test for understanding, is ability to translate a function into static single assignment (SSA) form.
Muchnick’s book is excellent, and we wrote the open source GLSL compiler using it. I wish it was a little more opinionated on how to do things right and avoid wasting your time (Go SSA! Right away! Even if you feel your problem space is special and SSA might not help you!) as opposed to just reporting on all the variations that exist in the wild, but it’s hard to fault it for that when I wish software theory was more grounded in reality in general.
And, yeah, I’m proud to say I still don’t know how to write a lexer or parser. I’ve got flex/bison for that.
For what it’s worth, my compilers professor shared this complaint, and was vocal about it on the first day. He structured his class so that he gave us a (nearly) working compiler for a simple arithmetic language (with lex/yacc). Throughout the semester, we gradually added features to the language it compiled until it looked more like C. By giving us the BNF for the grammar in each assignment, expanding the given parser was trivial. It did mean that we were guided towards maintaining his pipeline stages instead of devising our own, but they changed more and more throughout the various projects. This also meant that poor design early on lead to a great deal of refactoring in later projects.
I would love to read more about this. Examples? Links?
I haven’t studied the field in awhile, but back when I did, I got a lot from Advanced Compiler Design & Implementation by Steven Muchnick.
There is an unfortunate tendency, when teaching about compilers, to teach about components in the order they would be built in a project course, which puts a lot of undue emphasis on the boring solved problem of parsing. The layman’s view of a compiler is that it translates from high-level languages to machine code; the reality is that it translates from a high-level language through a series of intermediate representations, each of which brings out some property that makes it amenable to optimizations; and these optimizations and representations are what make up the bulk of a compiler. A good litmus-test for understanding, is ability to translate a function into static single assignment (SSA) form.
Muchnick’s book is excellent, and we wrote the open source GLSL compiler using it. I wish it was a little more opinionated on how to do things right and avoid wasting your time (Go SSA! Right away! Even if you feel your problem space is special and SSA might not help you!) as opposed to just reporting on all the variations that exist in the wild, but it’s hard to fault it for that when I wish software theory was more grounded in reality in general.
And, yeah, I’m proud to say I still don’t know how to write a lexer or parser. I’ve got flex/bison for that.
For what it’s worth, my compilers professor shared this complaint, and was vocal about it on the first day. He structured his class so that he gave us a (nearly) working compiler for a simple arithmetic language (with lex/yacc). Throughout the semester, we gradually added features to the language it compiled until it looked more like C. By giving us the BNF for the grammar in each assignment, expanding the given parser was trivial. It did mean that we were guided towards maintaining his pipeline stages instead of devising our own, but they changed more and more throughout the various projects. This also meant that poor design early on lead to a great deal of refactoring in later projects.