I’m only going to respond to the last few paragraphs you wrote. I did read the rest. But I think most of the relevant issues are easier to talk about in the concrete context that the shell analogy supplies.
Your UNIX shell programming example is well placed. It roughly matches my proposal of connected DSLs, but it is not a panacea (perhaps far from it). I will point out that the languages you mention (awk, sed, and perl) are all general-purpose (Turing-complete), text-based languages, which is far from the type of DSL I am proposing. Also, the shell limits interaction between DSLs to character streams via pipes. This representation of communication rarely maps cleanly to the problem being solved, forcing the implementations to compensate. This generates a great deal of overhead in terms of cognitive effort, complexity, cost ($, development time, run-time), and in some sense a reduction of beauty in the Universe.
Yes. It’s clunky. But it’s not clunky by happenstance. It’s clunky because standardized IPC is really hard.
To highlight the difference between shell programming and the system I’m proposing, start with the shell programming model, but in addition to character streams add support for the communication of structured data, and in addition to pipes add new communication models, such as a directed-graph model. Add DSLs that perform transformations on structured data, and DSLs for interactive interfaces. Now you can create sophisticated applications such as syntax-sensitive editors while programming at a level that feels like scripting, or perhaps like painting; and given the composability of my DSLs, the parts of this program could be optimized and specialized (to the hardware) together to run like a single, purpose-built program.
It’s a standard observation in the programming language community that a library is sort of a miniature domain-specific language. Every language worth talking about can be “extended” in this way. But there’s nothing novel about saying “we can extend the core Java language by defining additional classes.” Languages like C++ and Scala go to some trouble to let user classes resemble the core language syntactically (with features like operator overloading).
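To make that concrete, here’s a minimal Scala sketch (the Vec type and its methods are invented purely for illustration): a plain library class whose operators make user code read like core syntax, even though it all desugars to ordinary method calls.

```scala
// A toy "embedded DSL": ordinary library code whose operators make user
// code read like built-in arithmetic. Nothing here extends the compiler.
case class Vec(x: Double, y: Double) {
  def +(that: Vec): Vec = Vec(x + that.x, y + that.y)
  def *(k: Double): Vec = Vec(x * k, y * k)
  def dot(that: Vec): Double = x * that.x + y * that.y
}

object VecDemo {
  def main(args: Array[String]): Unit = {
    // Reads like arithmetic, but desugars to Vec(1, 2).+(Vec(3, 4).*(0.5))
    val v = Vec(1, 2) + Vec(3, 4) * 0.5
    println(v.dot(Vec(1, 0)))   // 2.5
  }
}
```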
I assume you want to do something different from that, since if you wanted C++, you know where to find it.
In particular, I assume you want to be able to write and compose DSLs, where those DSLs cannot be implemented as libraries in some base general-purpose language. But that’s a self-contradictory desire. If DSL A and DSL B don’t share common abstractions, they won’t compose cleanly.
Think about types for a minute. Suppose DSL A has some type system t, and DSL B has some other set of types t’. If t and t’ aren’t identical, then you’ll have trouble sharing data between those DSLs, since there won’t be a way to represent the data from A in B (or vice versa).
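Here’s a hedged Scala sketch of that mismatch (Employee, toDslB, and fromDslB are hypothetical names, not anyone’s actual system): say DSL A’s records are typed case classes, while DSL B only understands string-keyed maps of strings. Crossing the boundary means flattening in one direction and re-parsing, possibly failing, in the other.

```scala
import scala.util.Try

object Boundary {
  // DSL A's view of a record: real types.
  case class Employee(name: String, salary: BigDecimal, active: Boolean)

  // A -> B: everything collapses to strings; the type information is gone.
  def toDslB(e: Employee): Map[String, String] =
    Map(
      "name"   -> e.name,
      "salary" -> e.salary.toString,
      "active" -> e.active.toString
    )

  // B -> A: we have to re-parse and hope the strings are well formed.
  def fromDslB(r: Map[String, String]): Option[Employee] =
    for {
      name   <- r.get("name")
      salary <- r.get("salary").flatMap(s => Try(BigDecimal(s)).toOption)
      active <- r.get("active").flatMap(s => Try(s.toBoolean).toOption)
    } yield Employee(name, salary, active)
}
```

Every pair of DSLs that disagrees on types needs some version of this shim, and each shim is a place where data can be silently mangled.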
Alternatively, ask about implementation. I have a chunk of code written in A and a chunk written in B. I’d like my compiler/translator to optimize across the boundary. I also want to be able to handle memory management, synchronization, etc. across the boundary. That’s what composability means, I think.
Today, we often achieve it by having a shared representation that we compile down to. For instance, there are a bunch of languages that all compile down to JVM bytecode, to the .NET CLR, or to GCC’s intermediate representation. (This also sidesteps the type problem I mentioned above.)
But the price is that if you have to compile to reasonably clean JVM bytecode (or the like), that really limits you. To give an example of an un-embeddable language, I don’t believe you could compile C to JVM bytecode and have it efficiently share objects with Java code. Look at the contortions Scala has gone through to implement closures and higher-order functions efficiently.
If two DSLs A and B share a common internal representation, they aren’t all that separate as languages. Alternatively, if A and B are really different (say, C and Haskell), then you would have an awfully hard time writing a clean implementation of the joint language.
Shell is a concrete example of this. I agree that a major reason why shell is clunky is that you can’t communicate structured data. Everything has to be serialized, in practice mostly into newline-delimited lists of records, which is very limiting. But that’s not simply because of bad design. It’s because most of the languages we glue together with shell don’t have any other data type in common. Awk and sed don’t have powerful type systems. If they did, they would be much more complicated, and that would make them much less useful.
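To illustrate the limitation, here’s a Scala sketch (with made-up data) of the wire format that flows through such a pipe: flat text whose only framing is one record per line, with fields split on a delimiter. Any value that contains the delimiter breaks the format, and nested structure has no representation at all.

```scala
object LineRecords {
  def main(args: Array[String]): Unit = {
    // What typically flows through a pipe: tab-separated fields, one record per line.
    val wire = "alice\t42\nbob\t7\n"

    val records: Array[List[String]] =
      wire.split("\n").map(_.split("\t").toList)

    records.foreach(println)   // List(alice, 42) / List(bob, 7)

    // There is no way to say "this newline is data, not a record boundary",
    // and no way to send nested structure (a list of lists, a record with
    // optional fields) without inventing an ad hoc encoding that every
    // stage of the pipeline must agree on.
  }
}
```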
Another reason shell programming is hard is that there aren’t good constructs in the shell for error handling, concurrency, and so forth. But there couldn’t be, in some sense—you would have to carry the same mechanisms into each of the embedded languages. And that’s intolerably confining.
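For the error-handling point specifically, here’s a small sketch using Scala’s scala.sys.process to drive a shell-style pipeline (the file name app.log is made up): once the stages are glued together with a pipe, all that comes back across the boundary is an exit status plus whatever was printed to stderr, not a typed error the caller can inspect.

```scala
import scala.sys.process._

object PipelineStatus {
  def main(args: Array[String]): Unit = {
    // Roughly: grep ERROR app.log | sort | uniq -c
    val status = ("grep ERROR app.log" #| "sort" #| "uniq -c").!

    // All that crosses the boundary is an integer status plus whatever the
    // stages printed to stderr; there is no shared exception or error value
    // to propagate, so richer handling means parsing text by convention.
    if (status != 0) Console.err.println(s"pipeline failed with exit status $status")
  }
}
```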