Talk:UTF-8


Table should not use only color to encode information (but also formatting like bold and underline)

As in a previous comment https://en.wikipedia.org/wiki/Talk:UTF-8/Archive_1#Colour_in_example_table? this has been done before, and is *better* so that everyone can clearly see the different parts of the code. Relying on color alone is not good, because of color vision deficiencies and varying color rendition on devices.

Microsoft script dead link

   and Microsoft has a script for Windows 10, to enable it by default for its program Microsoft Notepad
   "Script How to set default encoding to UTF-8 for notepad by PowerShell". gallery.technet.microsoft.com. Retrieved 2018-01-30.
   https://gallery.technet.microsoft.com/scriptcenter/How-to-set-default-2d9669ae?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w&tduid=(1f29517b2ebdfe80772bf649d4c144b1)(256380)(2459594)(TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w)()

This link is dead. How to fix it? -- Preceding unsigned comment added by Un1Gfn (talk • contribs) 02:58, 5 April 2021 (UTC)

That text, and that link, appear to have been removed, so there's no longer anything to fix. Guy Harris (talk) 23:43, 21 December 2023 (UTC)

utf8 octal conversion

I think this section should be rewritten. It makes no sense to talk about bytes if you have triplets of octal numbers which make 9 bits in total, not 8. The grouping shown in the section is ambiguous (and wrong). --84.167.187.209 (talk) 02:24, 29 May 2021 (UTC)

The table is correct: the results are 1 to 4 bytes, each displayed as 3 octal digits, and the left-most digit of each byte cannot be greater than 3. If the bytes were somehow appended into a single octal number, you would first have an endianness question and, more importantly, you would lose the alignment between the output octal digits and the input octal digits.
I have made this modification of the octal table. Do you understand it? x, y, z and w are octal digits. --BIL (talk) 22:11, 29 May 2021 (UTC)
           
Octal code point <-> Octal UTF-8 conversion

First code point  Last code point  Code point  Byte 1  Byte 2  Byte 3  Byte 4
000               177              xxx         xxx
0200              3777             xxyy        3xx     2yy
04000             77777            xyyzz       34x     2yy     2zz
100000            177777           1xyyzz      35x     2yy     2zz
0200000           4177777          xyyzzww     36x     2yy     2zz     2ww
Yes, that is a lot clearer. Spitzak (talk) 23:44, 29 May 2021 (UTC)
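
For anyone who wants to check the rows of the octal table, here is a minimal Python sketch. It relies on Python's built-in UTF-8 encoder and prints each resulting byte as three octal digits. The helper name utf8_octal is made up for this illustration and does not come from the article or from any standard. Surrogate code points (0xD800 to 0xDFFF) are not handled: the built-in encoder raises an error for them.

```python
def utf8_octal(code_point):
    """Return the UTF-8 bytes of a code point, each as a 3-digit octal string.

    Hypothetical helper for illustration only; it delegates the actual
    encoding to Python's built-in UTF-8 codec.
    """
    return [format(byte, "03o") for byte in chr(code_point).encode("utf-8")]


# One sample per table row, plus the euro sign U+20AC (octal 20254).
samples = [0o177, 0o200, 0o3777, 0o4000, 0o20254, 0o177777, 0o4177777]
for cp in samples:
    print(format(cp, "o"), "->", " ".join(utf8_octal(cp)))
```

For the euro sign, the code point digits 2 02 54 reappear in the bytes 342 202 254, which is the digit alignment the table describes. Each byte is still 8 bits: its leading octal digit never exceeds 3, so three octal digits per byte do not amount to 9 bits of data.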

I do agree there is a huge amount of bloat in this article. Conversion from/to UTF-8 is actually really simple, and I would love to see the majority of this text spew deleted. Spitzak (talk) 20:24, 29 May 2021 (UTC)

Suggest/recommend throwing out the whole UTF-8#Octal section. I’m sure the intellectual exercise must have been “neat” or “kind of cool” to whoever took the time and effort to type it up and add it to the article, but IMHO it’s cruft like this that explains how this article got to be so long and bloated. I haven’t seen this appear in _any_ of the Unicode standards documents, and even the single reference cited admits that the API library just compares the binary, even if it might conceivably, theoretically be more convenient for a human with a scientific calculator converting hexadecimal to octal to compare bits manually. This article would IMHO be much more concise and more “encyclopedic” if the half of it comprising personal commentary/observations such as this section (which might be more appropriate, say, as a post on a personal blog, for example) were trimmed. --PowerPCG5 (talk) 08:35, 10 November 2021 (UTC)
Excellent idea. This section does not add useful information. −Woodstone (talk) 13:52, 10 November 2021 (UTC)
Absolutely agree. About 3/4 of this article is bloated with trivial observations and/or redundant rewording of the same information over and over again. I did edit this table last, not because I liked it, but because it was even larger and more intrusive before (they put it in as more columns in the other tables), and attempts to just remove it got reverted... Spitzak (talk) 15:10, 10 November 2021 (UTC)
Such is the unfortunate nature of a community-built wiki - editors contribute to their own niche and hobbies. Criticism of Wikipedia#Systemic bias in coverage
Criticism of Wikipedia#Quality of writing is funny too. Wqwt (talk) 07:21, 4 September 2022 (UTC)

Unfortunately there is an error in it. If the first code point is 0200 then the last code point cannot be 3777, for example. Please consider the description in the main article of how the encoding process works; you will find that the first Unicode code point is always 0. Apparently that is how it is really done. -- Preceding unsigned comment added by SiwardDeGroot (talk • contribs) 14:48, 21 July 2023 (UTC)

You are talking about "overlong encodings". The code point 0 should be encoded using the one-byte entry in the first line of the table. Encoding code point 0 using the second line of the table is an error. Spitzak (talk) 15:45, 21 July 2023 (UTC)
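
The overlong case can be reproduced with a short Python sketch. It builds the two-byte pattern 3xx 2yy from the second table row with every x and y digit set to 0, and checks how Python's built-in strict UTF-8 decoder treats it. The variable names are chosen for this illustration only.

```python
# Overlong form of code point 0: second table row (3xx 2yy) with xxyy = 0000.
overlong_nul = bytes([0o300, 0o200])

# Shortest form of code point 0: first table row, a single byte 000.
shortest_nul = chr(0).encode("utf-8")

try:
    overlong_nul.decode("utf-8")  # strict error handling is the default
    rejected = False
except UnicodeDecodeError:
    rejected = True

print("overlong form rejected:", rejected)
print("shortest form:", [format(b, "03o") for b in shortest_nul])
```

Python's decoder raises UnicodeDecodeError for the two-byte form, while the one-byte form round-trips. This is why the second row starts at 0200 and not at 0: smaller code points belong to the first row.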

US-ASCII

@Comp.arch: With respect to Special:Diff/1105781113, it's better to use just "ASCII" unless it could be misinterpreted as some other variant of ISO 646 instead of ANSI X3.4-1986. It is not the case here, but I think current usage in the article is okay. IANA preference for "US-ASCII" only matters for use in the charset parameter or similar where "ASCII" is not even a valid label at all. Please don't link it halfway as US-ASCII because that makes absolutely no sense in any context and looks like a formatting mistake. Link the whole US-ASCII, piping it if you don't like the redirect. – MwGamera (talk) 12:49, 22 August 2022 (UTC)

The article contains "{{efn", which looks like a mistake.

I would've fixed it myself but I don't know how to transform the remaining sentence to make sense. 2A01:C23:8D8D:BF00:C070:85C1:B1B8:4094 (talk) 16:17, 2 April 2024 (UTC)

I fixed it, I think. I'm not 100% sure it's how the previous editors intended. I invite them to review and confirm. Indefatigable (talk) 19:03, 2 April 2024 (UTC)