I spent a lot of time thinking about this, and it now seems to me that this is the wrong question. The right question is: “how do we make the most legible language?” Answering it may require some changes to the concept of “statement”.
Why does one statement plus one statement make two statements, but one expression plus one expression make one expression? Why is “x=1; y=1;” two units, but “(x == 1) && (y == 1)” one unit? What happens if a statement is a part of an expression, in an inline anonymous function? Where should we place semicolons or line breaks then?
Sorry, I don’t have a good answer. As a half-good answer, I would go with the early VB syntax: the rule is unambiguous (unlike some JavaScript rules), and it requires a special symbol in a special situation (as opposed to requiring a special symbol in a non-special situation).
Another half-good answer: use four-space tabs for “this is the next statement” and a half-tab (two spaces) for “here continues the previous line”. (If the statement has more than two lines, all the lines except the first one are aligned the same; the half-tabs don’t accumulate.)
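To make the half-tab idea concrete, here is a minimal sketch of how a tool might group physical lines into statements under that rule. The function name `join_statements` and the exact rule are my assumptions: indentation uses spaces only, and a line continues the current statement when it is indented exactly two spaces deeper than that statement's first line.

```python
def join_statements(source: str) -> list[tuple[int, str]]:
    """Group physical lines into (indent, text) logical statements.

    Assumed rule: a line indented exactly two spaces deeper than the
    first line of the current statement continues it; any other
    indentation starts a new statement.
    """
    statements: list[tuple[int, str]] = []
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no information in this sketch
        indent = len(raw) - len(raw.lstrip(" "))
        if statements and indent == statements[-1][0] + 2:
            # Compare against the statement's FIRST line, so the
            # half-tabs do not accumulate across continuation lines.
            first_indent, text = statements[-1]
            statements[-1] = (first_indent, text + " " + raw.strip())
        else:
            statements.append((indent, raw.strip()))
    return statements


example = (
    "total = price\n"
    "  + tax\n"
    "  + shipping\n"
    "if total > 100:\n"
    "    discount = total\n"
    "      * rate\n"
    "    total = total - discount\n"
)

for indent, text in join_statements(example):
    print(" " * indent + text)
```

With this input the seven physical lines collapse into four statements, and the two continuation lines under `total = price` are both attached to it even though they share the same indentation.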
Why does one statement plus one statement make two statements, but one expression plus one expression make one expression? Why is “x=1; y=1;” two units, but “(x == 1) && (y == 1)” one unit?
Because a statement is the fundamental unit of an imperative language. If “x=1; y=1;” were one unit, it would be one statement. Technically, on another level, multiple statements enclosed in braces are a single statement. Your objection does suggest another solution I forgot to put in: ban arbitrarily complex expressions. Then statements are of bounded length and have no need to span multiple lines. The obvious example of a language that makes this choice is assembly.
What happens if a statement is a part of an expression, in an inline anonymous function? Where should we place semicolons or line breaks then?
You could ban inline anonymous functions, or require them to be a single expression. You could implement half of Lisp as named functions that are building blocks for your “single expression” anonymous functions, so this doesn’t necessarily lose expressive power.
As a half-good answer, I would go with the early VB syntax
That Microsoft changed it is weak evidence against it: it suggests that people really don’t like having to add that extra symbol. There is that ambiguity problem, though. (JavaScript’s rule* technically requires an arbitrarily large amount of lookahead. I think the modern VB rule is saner from a compiler perspective, but it can still have annoying consequences.)
Your “other half-good answer” isn’t really very distinct from the first: the half-tab takes the role of the special symbol, and putting it at the beginning of the line just changes how you specify the grammar. (Vim scripting is an example of an existing language that uses a symbol at the beginning of a line for continuations.) It also creates an extra burden, even compared to current whitespace-sensitive languages like Python, to maintain the indentation correctly. In particular, it forbids you from adding lots of extra indentation to, for example, line up the second part of a statement with a similar element on the first line. Think of making a C-style function call and then indenting subsequent lines to the point where the opening bracket of the argument list was or, more generally, indenting to the opening bracket of the innermost still-open group.
*Technical note: JavaScript’s rule is “put in a semicolon if leaving it out leads to a syntax error”. VB’s rule is, more or less, “continue the statement if ending it at the line break leads to a syntax error”. In general, this will lead to JavaScript continuing statements in unexpected places, and will lead to VB terminating statements in unexpected places.
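The difference between the two rules in the footnote can be sketched with a toy splitter. This is only a rough illustration: it uses Python's own parser (`ast.parse`) as a stand-in grammar, the function names are made up, and the "continue eagerly" version is a crude approximation of the rule as phrased above, not of what a real JavaScript engine does.

```python
import ast


def parses(text: str) -> bool:
    """Return True if Python's parser accepts `text` as complete code."""
    try:
        ast.parse(text)
        return True
    except SyntaxError:
        return False


def split_terminate_eagerly(lines: list[str]) -> list[str]:
    """VB-like rule: end the statement at the line break unless that is an error."""
    statements: list[str] = []
    current = ""
    for line in lines:
        current = f"{current} {line}".strip()
        if parses(current):
            statements.append(current)
            current = ""
    if current:
        statements.append(current)  # unfinished fragment, kept as-is
    return statements


def split_continue_eagerly(lines: list[str]) -> list[str]:
    """JavaScript-like rule: keep going unless joining the next line is an error."""
    statements: list[str] = []
    current = ""
    for line in lines:
        if not current:
            current = line
            continue
        joined = f"{current} {line}"
        # Continue if the joined text is fine, or if we cannot stop here anyway.
        if parses(joined) or not parses(current):
            current = joined
        else:
            statements.append(current)
            current = line
    if current:
        statements.append(current)
    return statements


call_or_not = ["a = b", "(c)"]
sum_or_not = ["y = a", "+ b"]

print(split_terminate_eagerly(call_or_not))  # ['a = b', '(c)']
print(split_continue_eagerly(call_or_not))   # ['a = b (c)']
print(split_terminate_eagerly(sum_or_not))   # ['y = a', '+ b']
print(split_continue_eagerly(sum_or_not))    # ['y = a + b']
```

Both inputs are ambiguous in the way the footnote describes: the eager-continue rule silently turns the second line into a call argument list, while the eager-terminate rule silently cuts a sum in half and leaves `+ b` as a statement of its own.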
Because a statement is the fundamental unit of an imperative language.
I don’t believe this is true, at least not for the usual sense of “statement”, which is “code with side effects which, unlike an expression, has no type (not even unit/void) and does not evaluate to a value”.
You can easily make a language with no statements, just expressions. As an example, start with C. Remove the semicolon and replace all uses of it with the comma operator. You may need to adjust the semantics very slightly to compensate (I can’t say where offhand).
Presto, you have a statement-less language that looks quite functional: everything (other than definitions) is an expression (i.e. has a type and yields a value), and every program corresponds to the evaluation of a nested tree of expressions (rather than the execution of a sequence of statements).
Yet, the expressions have side effects upon evaluation, there is global shared mutable state, there are variables, there is a strict and well-defined eager order of evaluation—all the semantics of C are intact. Calling this a non-imperative language would be a matter of definition, I guess, but there’s no substantial difference between real C and this subset of it.
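The thought experiment is about C, but a rough analog can be sketched in Python, which is my substitution and not the C transformation itself. Here a parenthesized tuple stands in for the comma operator: its elements are evaluated left to right, assignment expressions (`:=`) provide the side effects, and the last element is taken as the value. The helper `emit` and the variable names are made up for the illustration.

```python
log: list[str] = []  # shared mutable state, as in the thought experiment


def emit(message: str) -> int:
    """A side-effecting 'expression': records the message, yields its length."""
    log.append(message)
    return len(message)


# Statement style: three statements executed in sequence.
x = 1
y = x + 1
emit(f"x={x} y={y}")

# Expression style: one expression whose parts are evaluated left to
# right; taking the last element mimics the value of a comma expression.
result = (a := 1, b := a + 1, emit(f"a={a} b={b}"))[-1]

print(result)  # 7
print(log)     # ['x=1 y=2', 'a=1 b=2']
```

Both versions mutate the same state in the same order; the second one just happens to be a single expression with a value.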
Because a statement is the fundamental unit of an imperative language.
So the question “what kind of language are we trying to make?” must be answered before “what syntax would make it most legible?”.
Assuming an imperative language, the simplest solution would be one command per line, no exceptions. There is a scrollbar at the bottom; or you can split a long line into more lines by using temporary variables.
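As a small illustration of the temporary-variable option (Python is used only as a convenient notation here, and the variable names and inputs are invented), the same computation can be written as one long command or as several short ones:

```python
import math

x1, y1, x2, y2 = 0.0, 0.0, 3.0, 4.0  # hypothetical inputs

# One long command on a single line:
distance_long = math.sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1))

# The same computation, one short command per line:
dx = x2 - x1
dy = y2 - y1
squared = dx * dx + dy * dy
distance_short = math.sqrt(squared)

print(distance_long, distance_short)  # 5.0 5.0
```

The split version costs three extra names, but no command ever needs to span more than one line.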
No syntax can make all programs legible. A good syntax is without exceptions and without unnecessary clutter. But if the user decides to write programs horribly, nothing can stop them.
An important choice is whether you make formatting significant (Python-style) or not. Making formatting significant has the advantage that you would probably format your code anyway, so the formatting can carry some information that then does not have to be written explicitly, e.g. with curly brackets. But people will complain that in some situations the freedom to use their own formatting would be better. You probably can’t make everyone happy.