gjm comments on The Limits of My Rationality

gjm 9 Dec 2014 22:54 UTC
1 point

anglophone

English-speaking.

ISO-8859-1

An 8-bit character set (i.e., representing 256 different characters) suitable for many Western European languages.

Windows codepage 1252

Something very much like ISO-8859-1 but slightly different, used on computers running Microsoft Windows. It’s slightly different because for some reason (there are more and less cynical explanations) Microsoft seem unable to use anything standardized without modifying it a little.
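
For illustration, a minimal Python sketch of where the two differ, assuming Python's bundled `latin-1` and `cp1252` codecs (the sample bytes are arbitrary picks from the range where the encodings disagree):

```python
# Bytes 0x80-0x9F are where the two encodings disagree.
raw = bytes([0x80, 0x93, 0x94])

# ISO-8859-1 maps each byte to the code point with the same number,
# so these bytes come out as invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

# Windows-1252 puts printable characters (euro sign, curly quotes) there.
as_cp1252 = raw.decode("cp1252")

print([hex(ord(c)) for c in as_latin1])
print([hex(ord(c)) for c in as_cp1252])

# The range 0xA0-0xFF decodes identically under both encodings.
upper = bytes(range(0xA0, 0x100))
same_upper = upper.decode("latin-1") == upper.decode("cp1252")
print(same_upper)
```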

ANSI

Microsoft-Windows-ese for “an 8-bit character set whose first half is the same as ASCII”. Specifying the second half is the job of a “code page”, such as the “code page 1252” mentioned above.

not universally machine readable

Not machine-readable without knowledge of which “code page” (see above) it uses. If you know that, or can guess it, you’re OK.
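
A small Python sketch of why the code page matters, assuming the text was saved as Windows-1252 and is then read back under a few other Windows code pages (the word "café" is just a made-up sample):

```python
# Encode a sample word the way a Western European Windows program might.
data = "café".encode("cp1252")  # the last byte is 0xE9

# The same bytes decoded under different code pages: the ASCII half
# ("caf") always survives, the second half depends on the guess.
decoded = {}
for codepage in ("cp1252", "cp1251", "cp1253"):
    decoded[codepage] = data.decode(codepage)
    print(codepage, decoded[codepage])
```

Only the reader who picks the same code page as the writer gets the original text back.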

encryption

Not actually encryption, despite the term “encoding”. A character encoding is a way of representing characters as smallish numbers suitable for storing in a computer. Strictly speaking, every time I said “character set” above I should have said “encoding”. Every time you have any text on a computer, it’s represented internally via some encoding. Common encodings include ASCII (7 bits so 128 characters, but actually some of those 128 slots are reserved for things that aren’t really characters), ISO-8859-1 (8 bits, suitable for much Western European text, though actually nowadays the slightly different ISO-8859-15 is preferred because it includes the Euro currency symbol), and UTF-8 (variable length, from 8 to 32 bits per character, represents the whole, very large, Unicode character repertoire). For most purposes UTF-8 is a good bet.
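
To make “representing characters as numbers” concrete, here is a rough Python sketch. The three sample characters and the helper `can_encode` are illustrative choices, not anything from the discussion above:

```python
samples = ["A", "é", "€"]

# A character is stored as a number (its code point); an encoding decides
# how that number is laid out as bytes.
utf8_bits = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_bits[ch] = len(encoded) * 8
    print(repr(ch), "code point", ord(ch), "->", encoded, f"({utf8_bits[ch]} bits)")


def can_encode(ch, encoding):
    """Return True if the named encoding has a slot for this character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Which of the sample characters each encoding can represent.
for enc in ("ascii", "iso-8859-1", "iso-8859-15", "utf-8"):
    print(enc, [can_encode(ch, enc) for ch in samples])
```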

irregardless

Regardless. (Sorry.)

[EDITED to answer the question about “not universally machine readable”.]
- JoshuaMyer 9 Dec 2014 23:15 UTC
  0 points
  It has nothing to do with my article but you’ve made me very happy by explaining this to me. I think I understand better what is meant by “encoding”. Also the bit about regardless I found quite witty and even laughed out loud (xkcd.com kept me informed about the OED’s decision on that word).
  
  So the encoding was probably not the problem then because most programs default to ANSI and it was not the unanimous first suggestion from everyone to switch to 7-bit encoding … although I do understand why ASCII is more universal now. Open questions in my mind now include: does the GUI read ASCII and ANSI? And what encoding is used for copy and pasting text?
  - gjm 10 Dec 2014 1:12 UTC
    0 points
    
    So the encoding was probably not the problem
    
    The main problem was most likely that your text was full of nonbreaking spaces. A conversion to actual ASCII would have got rid of those because the (rather limited) ASCII character repertoire doesn’t include nonbreaking spaces. I doubt that using an “ANSI” character set did that, though, so yes, the encoding was probably a red herring.
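
    As a hedged illustration in Python (the sample string is made up, and `cp1252` stands in for whichever “ANSI” code page was actually in use):

    ```python
    # Hypothetical pasted text containing nonbreaking spaces (U+00A0).
    text = "some\u00a0pasted\u00a0text"

    # ASCII has no slot for U+00A0, so a strict conversion fails outright.
    try:
        text.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False

    # A Windows-1252 ("ANSI") conversion keeps them, as the byte 0xA0.
    ansi_bytes = text.encode("cp1252")

    # Replacing them with ordinary spaces is what actually removes them.
    cleaned = text.replace("\u00a0", " ")

    print(ascii_ok, ansi_bytes, cleaned)
    ```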
    
    does the GUI read ASCII and ANSI?
    
    What GUI?
    
    what encoding is used for copy and pasting text?
    
    That would be an implementation detail of your operating system; if it’s competently implemented (which I think pretty much everything is these days) you should think of what’s copied and pasted as being made up of characters, not of the numbers used to encode them.
    
    However, at least on some systems, if you copy from one application that supports (not just plain text but) formatted text into another, the formatting will be (at least roughly) preserved. This will happen, e.g., if you copy and paste from a web browser into Microsoft Word. I find that this is scarcely ever what I want. There’s usually a way to paste in just the text (sometimes categorized as “Paste Special”, which may offer other less-common options for pasting stuff too).
    - JoshuaMyer 10 Dec 2014 14:37 UTC
      0 points
      cool :-)