This link has been bookmarked by 227 people. It was first bookmarked on 09 Jun 2008, by Ken Wei.
-
A character encoding tells the computer how to interpret raw zeroes and ones into real characters. It usually does this by pairing numbers with characters.
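To make the "pairing numbers with characters" idea concrete, here is a minimal Python sketch. The sample string and variable names are illustrative only and not taken from the article; it shows one string turned into bytes under two encodings, and what happens when the bytes are read with the wrong table.

```python
# Each character has a number (its code point) ...
text = "A\u00e9"  # "A" followed by "e with acute accent"
code_points = [ord(ch) for ch in text]

# ... and an encoding decides which bytes stand for that number.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("iso-8859-1")

# Reading UTF-8 bytes with the ISO-8859-1 table yields garbled text.
misread = as_utf8.decode("iso-8859-1")

print(code_points, as_utf8, as_latin1, misread)
```

The same two characters take three bytes in UTF-8 and two in ISO-8859-1, which is why the reader of the bytes has to be told which table was used.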
-
The first step of our journey is to find out what the encoding of your website is. The most reliable way is to ask your browser:
-
- Mozilla Firefox
  - Tools > Page Info: Encoding
- Internet Explorer
  - View > Encoding: bulleted item is unofficial name
-
For all those skeptics out there, there is a very good reason why the character encoding should be explicitly stated. When the browser isn't told what the character encoding of a text is, it has to guess: and sometimes the guess is wrong. Hackers can manipulate this guess in order to slip XSS past filters and then fool the browser into executing it as active code. A great example of this is the Google UTF-7 exploit.
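As a rough illustration of why a wrong guess is dangerous, the sketch below uses Python's built-in `utf-7` codec. The payload and the `naive_filter_allows` function are hypothetical and are not the actual Google exploit; they only show that bytes containing no angle brackets can still decode to markup when a reader guesses UTF-7.

```python
# Bytes that contain no literal "<" or ">" ...
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def naive_filter_allows(data: bytes) -> bool:
    """Hypothetical filter: reject input that contains an angle bracket."""
    return b"<" not in data and b">" not in data


# ... pass the filter, yet become a script tag if interpreted as UTF-7.
passes_filter = naive_filter_allows(payload)
seen_as_utf7 = payload.decode("utf-7")

print(passes_filter, seen_as_utf7)
```

Stating the encoding explicitly removes the guess, so the filter and the browser agree on what the bytes mean.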
-
In fact, the META tag is designed as a substitute for the HTTP header for contexts where sending headers is impossible (such as locally stored files without a webserver). Thus the name http-equiv (HTTP equivalent).
-
If your website:
- ...only uses ASCII characters,
  - Either way is fine, but I recommend switching both to UTF-8 (more on this later).
- ...uses special characters, and they display properly,
  - Change the embedded encoding to the server encoding.
- ...uses special characters, but users often complain that they come out garbled,
  - Change the server encoding to the embedded encoding.
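One way to check whether the server encoding and the embedded encoding agree is sketched below, using only the Python standard library. The function and class names (`server_charset`, `MetaCharsetFinder`, `embedded_charset`, `encodings_agree`) are made up for this example, and it assumes you already have the Content-Type header value and the page source as strings.

```python
from email.message import Message
from html.parser import HTMLParser


def server_charset(content_type_header):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type_header:
        return None
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_charset()  # lowercased, or None if absent


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first http-equiv Content-Type META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and self.charset is None:
            if (attrs.get("http-equiv") or "").lower() == "content-type":
                # The content attribute uses the same syntax as the header.
                self.charset = server_charset(attrs.get("content") or "")


def embedded_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


def encodings_agree(content_type_header, html_text):
    server = server_charset(content_type_header)
    return server is not None and server == embedded_charset(html_text)


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>'
print(encodings_agree("text/html; charset=UTF-8", page))
```

A `False` result corresponds to the mismatched cases in the list above; which side to change depends on whether the page currently displays properly.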
-
Thus, a chicken-egg problem: a character encoding is necessary to interpret the text of a document. A META tag is in the text of a document. The META tag gives the character encoding. How can we determine the contents of a META tag, inside the text, if we don't know its character encoding? And how do we figure out the character encoding, if we don't know the contents of the META tag?
-
Fortunately for us, the characters we need to write the META tag are in ASCII, which is pretty much universal over every character encoding that is in common use today. So, all the web browser has to do is parse all the way down until it gets to the Content-Type tag, extract the character encoding from it, then re-parse the document according to this new information.
-
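The scan-then-re-parse idea can be sketched in a few lines of Python. This is a simplification under stated assumptions, not how any particular browser implements it: the regular expression, the 1024-byte window, and the `decode_with_meta` name are choices made for this example.

```python
import re

# The declaration is ASCII, so it can be found in the raw bytes
# before the real encoding is known.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def decode_with_meta(raw, fallback="utf-8"):
    """Scan the start of the bytes for a charset, then decode everything."""
    match = META_CHARSET.search(raw[:1024])
    name = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(name, errors="replace"), name
    except LookupError:  # the declared name is not a codec Python knows
        return raw.decode(fallback, errors="replace"), fallback


raw_page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    "<p>caf\u00e9</p>"
).encode("iso-8859-1")
text, used = decode_with_meta(raw_page)
print(used, text)
```

The first pass only needs to recognize ASCII bytes; the second pass decodes the whole document with the encoding that was found.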
Obviously this is complicated, so browsers prefer the simpler and more efficient solution: get the character encoding from somewhere other than the document itself, i.e. the HTTP headers, much to the chagrin of HTML authors who can't set these headers.
-
Many software projects, at one point or another, suddenly realize that they should be supporting more than one language. Even regular usage in one language sometimes requires the occasional special character that, without surprise, is not available in your character set. Sometimes developers get around this by adding support for multiple encodings: when using Chinese, use Big5, when using Japanese, use Shift-JIS, when using Greek, etc. Other times, they use character references with great zeal.
UTF-8, however, obviates the need for any of these complicated measures. After getting the system to use UTF-8 and adjusting for sources that are outside the hand of the browser (more on this later), UTF-8 just works. You can use it for any language, even many languages at once; you don't have to worry about managing multiple encodings, and you don't have to use those user-unfriendly entities.
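A small Python sketch of the difference, with a sample string and helper name (`can_encode`) invented for this example: text that mixes several scripts fits in UTF-8, while each legacy encoding tried here rejects at least part of it, and falling back to ASCII forces numeric character references.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Greek, Japanese and Thai in one string.
sample = "Ελληνικά 日本語 ไทย"

legacy = {name: can_encode(sample, name)
          for name in ("iso-8859-1", "shift_jis", "big5")}
round_trip = sample.encode("utf-8").decode("utf-8")

# The character-reference workaround: readable for browsers, not for people.
as_references = "caf\u00e9".encode("ascii", errors="xmlcharrefreplace")

print(legacy, round_trip == sample, as_references)
```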
-
Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages.
-
application/x-www-form-urlencoded, which is used for GET and by default for POST, and multipart/form-data, which may be used by POST and is required when you want to upload files.
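For the urlencoded form type, the character encoding decides which bytes get percent-escaped. The sketch below uses Python's `urllib.parse`; the field name and value are made up, and it is only meant to show that sender and receiver must agree on the encoding.

```python
from urllib.parse import parse_qs, urlencode

form = {"name": "Zo\u00eb"}  # "Zoe" with a diaeresis on the e

# The same field, serialized under two different encodings.
as_utf8 = urlencode(form)                            # UTF-8 is the default
as_latin1 = urlencode(form, encoding="iso-8859-1")

# Decoding with the matching encoding recovers the text ...
ok = parse_qs(as_latin1, encoding="iso-8859-1")
# ... decoding with the wrong one (UTF-8 by default) does not.
garbled = parse_qs(as_latin1)

print(as_utf8, as_latin1, ok, garbled)
```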
-
-
10 Mar 16
makeller63: "Character encoding and character sets are not that difficult to understand, but so many people blithely stumble through the worlds of programming without knowing what to actually do about it, or say "Ah, it's a job for those internationalization experts." No, it is not! This document will walk you through determining the encoding of your system and how you should handle this information. It will stay away from excessive discussion on the internals of character encoding."
-
27 Nov 15
Jochen Fromm: If you don't hate character encodings, you're not a real programmer.
-
UTF-8: The Secret of Character Encoding
-
25 Nov 12
Saul Haro: A short explanation of how encoding is handled and how it affects our web projects. Recommended because it covers security aspects that we probably had not considered before.
-
04 Mar 10
Sean O'Halpin: Handy overview of character encoding
-
14 Nov 08
Bradley Dilger: Good explanations, HTML, PHP, server stuff too
-
05 Nov 08
-
Pedro Trindade: Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.