
The UTF-8 BOM encoding problem explained in detail

Posted on 10/30/2014 5:38:44 PM
UTF-8 BOM issues to watch out for in WordPress
I ran into this problem very early on: after installing certain plugins, clicking Activate would produce a white screen. I never figured out the cause. My earlier workaround, for files that contained no Chinese characters, was to convert the file to ASCII, which generally solved it. Today, while setting up a blog for my brother, the problem appeared again. After a long search I finally found the answer.

The Unicode specification has a concept called the BOM (Byte Order Mark). Here is a note about the BOM that I found:

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in real data. The UCS specification recommends sending the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then reads FE FF, the byte stream is Big-Endian; if it reads FF FE, the byte stream is Little-Endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
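The byte-order check described above can be sketched in a few lines of Python. This is only an illustration and not part of the original note: the function name `guess_utf16_byte_order` is made up here, and the sketch looks at 16-bit encodings only (it ignores UTF-32, whose little-endian BOM also begins with FF FE).

```python
import codecs

BOM_CHAR = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE


def guess_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of a 16-bit stream from its first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    return "unknown"


# The same two characters (BOM + "A") written in both byte orders.
big = (BOM_CHAR + "A").encode("utf-16-be")     # FE FF 00 41
little = (BOM_CHAR + "A").encode("utf-16-le")  # FF FE 41 00

print(big.hex(" "), "->", guess_utf16_byte_order(big))
print(little.hex(" "), "->", guess_utf16_byte_order(little))
```

The codecs `utf-16-be` and `utf-16-le` are used on purpose: they never add a BOM on their own, so the only BOM in the output is the one written explicitly.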

UTF-8 does not need a BOM to indicate byte order, but it can use one to indicate the encoding. The UTF-8 encoding of the character "ZERO WIDTH NO-BREAK SPACE" is EF BB BF. So if the receiver gets a byte stream that starts with EF BB BF, it knows the stream is UTF-8 encoded.
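As a quick check of the EF BB BF claim, here is a small Python sketch (not from the original post). It also shows the difference between Python's `utf-8` codec, which keeps the BOM as an ordinary character, and `utf-8-sig`, which drops it when decoding. The sample payload `<?php` is just an assumed example.

```python
import codecs

# Encode U+FEFF (ZERO WIDTH NO-BREAK SPACE) as UTF-8.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" "))

# A byte stream that starts with the BOM, followed by some text.
data = bom_bytes + "<?php".encode("utf-8")

plain = data.decode("utf-8")         # the BOM survives as the first character
stripped = data.decode("utf-8-sig")  # the BOM is removed during decoding

print(repr(plain))
print(repr(stripped))
```

A decoder that is unaware of the BOM behaves like the first case: the extra character stays in front of the real content.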

Windows uses the BOM to mark how text files are encoded.

In addition, the BOM FAQ on the Unicode website explains the BOM in detail. It is the official and therefore authoritative source, but it is in English, so it may take more effort to read.

In a UTF-8 encoded file, the BOM occupies three bytes. If you use Notepad to save a text file as UTF-8, the file on disk starts with the bytes EF BB BF. (If you open the file with UE and switch to hex editing mode, you may see FF FE at the beginning instead, apparently because UE converts the file to UTF-16 internally.) This is a good way to identify UTF-8 encoded files: software uses the BOM to tell whether a file is UTF-8 encoded, and some programs even require imported files to have a BOM. However, a lot of software still does not recognize the BOM. When I was researching Firefox, I learned that in early versions of Firefox extensions could not contain a BOM, but Firefox 1.5 and later versions support it. Now I have found out that PHP does not support the BOM either.
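If no hex editor is at hand, the first three bytes can be checked with a short script. The following Python sketch is an illustration only: `has_utf8_bom` is a hypothetical helper, and the `utf-8-sig` codec is used merely to imitate an editor that writes a BOM.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Demo: write the same text with and without a BOM and compare the files.
tmp_dir = tempfile.mkdtemp()
with_bom = os.path.join(tmp_dir, "with_bom.txt")
without_bom = os.path.join(tmp_dir, "without_bom.txt")

with open(with_bom, "w", encoding="utf-8-sig") as f:  # writes EF BB BF first
    f.write("hello")
with open(without_bom, "w", encoding="utf-8") as f:   # plain UTF-8, no BOM
    f.write("hello")

size_difference = os.path.getsize(with_bom) - os.path.getsize(without_bom)
print(has_utf8_bom(with_bom), has_utf8_bom(without_bom), size_difference)
```

The file is opened in binary mode so that the check sees the raw bytes on disk, not text that a decoder may already have cleaned up.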

PHP was designed without the BOM in mind, which means it does not skip the three BOM bytes at the beginning of a UTF-8 encoded file. Since it must be in
As I saw in the Bo-Blog wiki, Bo-Blog, which is also written in PHP, is troubled by the BOM as well. The wiki mentions another problem: "Due to the limitations of the cookie sending mechanism, files that already have a BOM at the beginning cannot send cookies (because PHP has already sent the headers before the cookie is sent), so the login and logout functions fail. All functions that rely on cookies and sessions fail." This is probably the reason for the blank page in the WordPress admin back end: if any executed file contains a BOM, its three bytes are sent out as output, and everything that depends on cookies and sessions breaks.
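Since a single executed file with a BOM is enough to cause this, it can help to scan a plugin or theme folder for affected files. Below is a minimal Python sketch, assuming the files of interest end in `.php`. The helper name `find_bom_files` and the demo file names are invented for illustration.

```python
import codecs
import tempfile
from pathlib import Path


def find_bom_files(root, suffix=".php"):
    """Return sorted paths under `root` with `suffix` that start with a UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob("*" + suffix):
        if path.is_file():
            with path.open("rb") as f:
                if f.read(3) == codecs.BOM_UTF8:
                    hits.append(path)
    return sorted(hits)


# Demo on a throwaway directory with one clean file and one file with a BOM.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "clean.php").write_bytes(b"<?php echo 1;")
(demo_root / "plugin").mkdir()
(demo_root / "plugin" / "broken.php").write_bytes(codecs.BOM_UTF8 + b"<?php echo 2;")

found = find_bom_files(demo_root)
for path in found:
    print(path.relative_to(demo_root))
```

On a real site, `demo_root` would be replaced by the local copy of the plugin or theme directory.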

The solution is to save the file as ASCII if it contains only English characters (that is, characters within the ASCII range). If you are using an editor such as UE, click File->Convert->UTF-8 to ASCII, or select ASCII encoding in Save As. If the file uses DOS-format line endings, you can also open it with Notepad, click Save As, and select ASCII encoding. If the file contains Chinese characters, use UE's Save As function and select "UTF-8 no BOM". Please refer to the image below:
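Saving as "UTF-8 no BOM" amounts to dropping the three leading bytes and leaving the rest of the file untouched, so Chinese characters are preserved. A minimal Python sketch of that idea follows. It is not the method of any particular editor, `strip_utf8_bom` is a hypothetical name, and it is wise to keep a backup before rewriting real files.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM in place. Return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, the file is left untouched
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


# Demo: a PHP file with a BOM and a Chinese comment.
fd, demo_path = tempfile.mkstemp(suffix=".php")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + "<?php // 中文注释".encode("utf-8"))

size_before = os.path.getsize(demo_path)
removed = strip_utf8_bom(demo_path)
size_after = os.path.getsize(demo_path)

print(removed, size_before - size_after)
```

Because the function works on raw bytes, it never re-encodes the content, which avoids the character loss that conversions through other encodings can cause.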

According to the Bo-Blog wiki, in EditPlus the file needs to be saved as GB first and then as UTF-8. Be careful when doing this, though: all characters that are not included in the GBK encoding will be lost. If the file contains some non-Chinese characters, do not use this method. (In this small respect, UE, that is UltraEdit-32, is indeed much better than EditPlus; EditPlus is too lightweight.)
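The character loss can be checked before converting. The Python sketch below is only an illustration: it uses Python's `gbk` codec as a stand-in for the editor's GB encoding, the helper names are hypothetical, and the Korean character is an assumed example of text outside GBK.

```python
def gbk_round_trip(text: str) -> str:
    """Encode to GBK and back; characters outside GBK become '?'."""
    return text.encode("gbk", errors="replace").decode("gbk")


def chars_lost_in_gbk(text: str) -> list:
    """List the characters in `text` that GBK cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode("gbk")
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


sample = "中文한"  # two Chinese characters followed by one Korean character
print(gbk_round_trip(sample))
print(chars_lost_in_gbk(sample))
```

If `chars_lost_in_gbk` returns an empty list for the whole file content, the detour through GB should be lossless; otherwise removing the BOM directly is the safer route.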

Another way I found is to use the file editor provided by WordPress. This method has no such limitation, and there is no need to download a special editor; after all, everyone here is using WordPress. First, make the file you want to edit writable via FTP, then go to the WordPress admin back end -> Management -> File Editor, enter the path of the file, and click Edit File. You will not be able to see the three BOM bytes in the editing screen, but that is fine: place the cursor before the first character of the file and press Backspace. Then click Update File and refresh the listing in FTP. You will see that the file is 3 bytes smaller, and you are done.

Finally, this is a serious problem. Anyone who wants to write their own plugins, adapt other people's plugins for their own use, or modify templates (which probably means everyone) would do well to understand the points above, so as not to be at a loss when the problem appears.




