Lumifer comments on The Limits of My Rationality

Lumifer 9 Dec 2014 21:28 UTC
1 point
0

this was .txt which is fairly universally readable

What you want right now is a plain-ASCII text file. No Unicode, no HTML, no nothing.
- JoshuaMyer 9 Dec 2014 21:40 UTC
  0 points
  0
  Parent
  Thank you. I will try this and see if it helps with the paragraph double spacing problem.
  - JoshuaMyer 9 Dec 2014 21:43 UTC
    0 points
    0
    Parent
    Wow. My encoding options are limited to two Unicode variants, ANSI and UTF-8. Will any of those work for these purposes?
    - Lumifer 9 Dec 2014 22:00 UTC
      0 points
      0
      Parent
      For future reference, ANSI is not Unicode. You can google up the gory details if interested, but basically ASCII is a seven-bit character set with 128 symbols. The so-called ANSI (it’s a misnomer) extends ASCII to 8 bits and so adds another 128 symbols, but without specifying what these symbols should be. On most Anglophone computers these will correspond to ISO 8859-1 (or the very similar Windows codepage 1252), but in other parts of the world they will correspond to whatever the local codepage is, and it can be anything it wants to be.
      
      UTF-8, on the other hand, is proper Unicode. So it seems the closest you can get to plain ASCII is to use ANSI.
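A minimal Python sketch of the point above, assuming the codec names that ship with Python's standard library (`latin-1`, `cp1252`, `cp1251`, `cp437`). The helper names `decode_under` and `is_plain_ascii` are made up for illustration. It shows that a byte below 128 means the same thing everywhere, while a byte above 127 depends on which code page the reader assumes.

```python
# The same single byte, read under several 8-bit "ANSI"-style code pages.
# Helper names here are hypothetical, chosen only for this illustration.

def decode_under(raw: bytes, codecs=("latin-1", "cp1252", "cp1251", "cp437")):
    """Return {codec name: decoded text} for the given raw bytes."""
    return {codec: raw.decode(codec) for codec in codecs}


def is_plain_ascii(raw: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(b < 128 for b in raw)


if __name__ == "__main__":
    # 0x41 is 'A' in every one of these code pages.
    print(decode_under(b"\x41"))
    # 0xE9 is above 127, so its meaning depends on the code page.
    print(decode_under(b"\xe9"))
    # ISO 8859-1 and Windows-1252 differ in the 0x80-0x9F range.
    print(repr(b"\x80".decode("latin-1")), repr(b"\x80".decode("cp1252")))
    print(is_plain_ascii(b"plain text"), is_plain_ascii(b"caf\xe9"))
```

If a file passes `is_plain_ascii`, it reads the same under any of these code pages, which is why plain ASCII is the safe common denominator.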
      - JoshuaMyer 9 Dec 2014 22:18 UTC
        0 points
        0
        Parent
        So, if I understand the implication, anything encoded in ANSI is not universally machine readable (there are several unfamiliar terms for me here: “anglophone”, “ISO 8859-1”, and “Windows codepage 1252”)? I probably won’t look up all the details, because I rarely need to know how many bits a method of encryption involves (I’m probably betraying my naivety here) irregardless of the character set used, but I appreciate how solid a handle you seem to have on the subject.
        gjm 9 Dec 2014 22:54 UTC
        1 point
        0
        Parent
        
        anglophone
        
        English-speaking.
        
        ISO-8859-1
        
        An 8-bit character set (i.e., representing 256 different characters) suitable for many Western European languages.
        
        Windows codepage 1252
        
        Something very much like ISO-8859-1 but slightly different, used on computers running Microsoft Windows. It’s slightly different because for some reason (there are more and less cynical explanations) Microsoft seem unable to use anything standardized without modifying it a little.
        
        ANSI
        
        Microsoft-Windows-ese for “an 8-bit character set whose first half is the same as ASCII”. Specifying the second half is the job of a “code page”, such as the “code page 1252” mentioned above.
        
        not universally machine readable
        
        Not machine-readable without knowledge of which “code page” (see above) it uses. If you know that, or can guess it, you’re OK.
        
        encryption
        
        Not actually encryption, despite the term “encoding”. A character encoding is a way of representing characters as smallish numbers suitable for storing in a computer. Strictly speaking, every time I said “character set” above I should have said “encoding”. Every time you have any text on a computer, it’s represented internally via some encoding. Common encodings include ASCII (7 bits so 128 characters, but actually some of those 128 slots are reserved for things that aren’t really characters), ISO-8859-1 (8 bits, suitable for much Western European text, though actually nowadays the slightly different ISO-8859-15 is preferred because it includes the Euro currency symbol), and UTF-8 (variable length, from 8 to 32 bits per character, represents the whole, very large, Unicode character repertoire). For most purposes UTF-8 is a good bet.
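To make "a way of representing characters as numbers" concrete, here is a small Python sketch. It assumes only the standard-library codecs `ascii`, `latin-1`, `iso-8859-15` and `utf-8`; the function name `encoded_sizes` is hypothetical. For each encoding it reports how many bytes a piece of text takes, or `None` if that encoding cannot represent it at all.

```python
# How many bytes does the same text take in different encodings?
# `encoded_sizes` is a made-up helper name for this illustration.

def encoded_sizes(text, codecs=("ascii", "latin-1", "iso-8859-15", "utf-8")):
    """Map each codec to the encoded length in bytes, or None if unencodable."""
    sizes = {}
    for codec in codecs:
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # The character is outside this encoding's repertoire.
            sizes[codec] = None
    return sizes


if __name__ == "__main__":
    for sample in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        # Samples: Latin capital A, e with acute accent, Euro sign, an emoji.
        print(repr(sample), encoded_sizes(sample))
```

The fixed-width 8-bit encodings either spend exactly one byte per character or fail outright, whereas UTF-8 never fails but spends more bytes on characters further from ASCII.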
        
        irregardless
        
        Regardless. (Sorry.)
        
        [EDITED to answer the question about “not universally machine readable”.]
        JoshuaMyer 9 Dec 2014 23:15 UTC
        0 points
        0
        Parent
        It has nothing to do with my article, but you’ve made me very happy by explaining this to me. I think I understand better what is meant by “encoding”. Also, I found the bit about “regardless” quite witty and even laughed out loud (xkcd.com kept me informed about the OED’s decision on that word).
        
        So the encoding was probably not the problem then, because most programs default to ANSI and switching to a 7-bit encoding was not the unanimous first suggestion from everyone … although I do understand why ASCII is more universal now. Open questions in my mind now include: does the GUI read ASCII and ANSI? And what encoding is used for copy and pasting text?
        gjm 10 Dec 2014 1:12 UTC
        0 points
        0
        Parent
        
        So the encoding was probably not the problem
        
        The main problem was most likely that your text was full of nonbreaking spaces. A conversion to actual ASCII would have got rid of those because the (rather limited) ASCII character repertoire doesn’t include nonbreaking spaces. I doubt that using an “ANSI” character set did that, though, so yes, the encoding was probably a red herring.
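A rough Python sketch of why an "ANSI" save would keep the nonbreaking spaces while a true ASCII conversion would not. The function name `to_plain_ascii` is hypothetical, and the choice to turn each nonbreaking space into an ordinary space (rather than dropping it) is an assumption about what a sensible converter would do. Any other non-ASCII character is replaced with `?`.

```python
# Nonbreaking space (U+00A0) survives in Windows-1252 but not in ASCII.
# `to_plain_ascii` is a made-up helper; mapping NBSP to a normal space
# is an assumption, not something the thread specifies.

NBSP = "\u00a0"


def to_plain_ascii(text: str) -> str:
    """Replace nonbreaking spaces with ordinary ones, then force 7-bit ASCII."""
    text = text.replace(NBSP, " ")
    # errors="replace" substitutes '?' for anything ASCII cannot represent.
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    sample = "First" + NBSP + "paragraph"
    # Windows-1252 has a slot (0xA0) for NBSP, so it is kept as-is.
    print(sample.encode("cp1252"))
    # Strict ASCII has no such slot, so encoding fails without a conversion.
    try:
        sample.encode("ascii")
    except UnicodeEncodeError as err:
        print("not ASCII:", err.reason)
    print(to_plain_ascii(sample))
```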
        
        does the GUI read ASCII and ANSI?
        
        What GUI?
        
        what encoding is used for copy and pasting text?
        
        That would be an implementation detail of your operating system; if it’s competently implemented (which I think pretty much everything is these days) you should think of what’s copied and pasted as being made up of characters, not of the numbers used to encode them.
        
        However, at least on some systems, if you copy from one application that supports (not just plain text but) formatted text into another, the formatting will be (at least roughly) preserved. This will happen, e.g., if you copy and paste from a web browser into Microsoft Word. I find that this is scarcely ever what I want. There’s usually a way to paste in just the text (sometimes categorized as “Paste Special”, which may offer other less-common options for pasting stuff too).
        JoshuaMyer 10 Dec 2014 14:37 UTC
        0 points
        0
        Parent
        cool :-)
        JoshuaMyer 9 Dec 2014 22:18 UTC
        0 points
        0
        Parent
        Either way, I owe you.
    - JoshuaMyer 9 Dec 2014 21:56 UTC
      0 points
      0
      Parent
      ANSI works if I turn off word wrap and put the space between paragraphs, as you suggested. Thanks again Lumifer.