

BOM prefix issues when reading Unicode files (UTF-8, etc.) in Java

Posted on 1/14/2019 4:26:17 PM
The BOM character at the start of Unicode files (UTF-8, etc.) read in Java, and how to deal with it

Many Windows text editors add a BOM (byte order mark) at the start of a file, as its first character, when you save the file in a Unicode encoding such as UTF-8.

Java does not remove this marker when it reads the file, and String.trim() does not remove it either. If you read the first line with readLine() into a String, the String is one character longer than what you see, and its first character is the BOM.

This causes trouble in practice. For example, when reading an ini file, a check of whether the first line starts with "[" gives the wrong answer.
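A minimal sketch of the symptom, assuming the file was saved as "UTF-8 with BOM" and is read through an InputStreamReader with UTF-8 specified explicitly. An in-memory byte array stands in for the file, and the class and helper names (BomProblemDemo, withUtf8Bom, firstLine) are illustrative only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomProblemDemo {
    // Prepends the UTF-8 BOM bytes (EF BB BF) to the UTF-8 encoding of text,
    // imitating a file saved as "UTF-8 with BOM".
    static byte[] withUtf8Bom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[body.length + 3];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(body, 0, data, 3, body.length);
        return data;
    }

    // Reads the first line of data, decoded as UTF-8.
    static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLine(withUtf8Bom("[section]\nkey=value\n"));
        System.out.println(line.length());               // 10: "[section]" is 9 chars, plus the BOM
        System.out.println(line.charAt(0) == '\uFEFF');  // true
        System.out.println(line.trim().startsWith("[")); // false: trim() leaves the BOM in place
    }
}
```

String.trim() only strips characters up to U+0020, so U+FEFF survives it, which is why the "[" check fails.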

Fortunately, when Java decodes a UTF-8 file, the BOM bytes become the single character U+FEFF ("\uFEFF") at the start of the text, so you can handle it manually: check whether the first character is "\uFEFF" and, if it is, remove it with substring() or replace().
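A minimal sketch of this manual check, assuming the line was already read with UTF-8 decoding; stripBom is a hypothetical helper name, not part of any library.

```java
class BomStrip {
    // The character a UTF-8 BOM (EF BB BF) decodes to.
    private static final char BOM = '\uFEFF';

    // Removes a single leading BOM if present; otherwise returns s unchanged.
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // As returned by readLine() for the first line of a UTF-8-with-BOM file.
        String firstLine = "\uFEFF[section]";
        String cleaned = stripBom(firstLine);
        System.out.println(cleaned.startsWith("[")); // true
        System.out.println(cleaned.length());        // 9
    }
}
```

This sketch uses substring() on the first character only, rather than replace("\uFEFF", "") on the whole line, because U+FEFF appearing later in text is a legitimate (if deprecated) zero-width no-break space and should arguably be left alone.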

However, this approach is not perfect: if the generated jar file is run on Windows, the problem can still occur. A more reliable workaround is to use the BOMInputStream class provided by Apache Commons IO.
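The post does not say why the character check fails for the jar on Windows. One plausible cause (an assumption, not confirmed by the post) is that the file is read with the platform default charset, which before Java 18 was often not UTF-8 on Windows. The BOM bytes then decode to something other than U+FEFF and the check misses them. Removing the BOM at the byte level, before any decoding, avoids that dependency. With Commons IO on the classpath, wrapping the raw stream as `new org.apache.commons.io.input.BOMInputStream(in)` skips a UTF-8 BOM by default, and recent releases also offer `BOMInputStream.builder()`. The stdlib-only sketch below shows the same idea with java.io.PushbackInputStream. It handles only the UTF-8 BOM, and skipBom is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SkipUtf8Bom {
    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if present.
    // Bytes that turn out not to be a BOM are pushed back, so no data is lost.
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) break;
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    // Reads the first line, with any BOM removed before decoding.
    // UTF-8 is still specified explicitly instead of relying on the default charset.
    static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipBom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', 'a', ']'};
        System.out.println(firstLine(withBom));                                 // [a]
        System.out.println(firstLine("[a]".getBytes(StandardCharsets.UTF_8))); // [a]
    }
}
```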

What is a BOM?


BOM = Byte Order Mark
The BOM is the mechanism the Unicode standard defines for marking byte order. For example, in UTF-16, if the receiver sees the BOM as the bytes FE FF, the byte stream is big-endian; if it sees FF FE, the byte stream is little-endian.
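To see where the two byte orders come from, this small sketch encodes the BOM character U+FEFF with Java's UTF-16BE and UTF-16LE charsets and prints the resulting bytes in hex. The class and helper names are illustrative only. It deliberately avoids the plain "UTF-16" charset, whose encoder writes its own BOM first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Encodes U+FEFF in the given charset and renders the bytes as hex, e.g. "FE FF".
    static String bomHex(Charset cs) {
        byte[] bytes = "\uFEFF".getBytes(cs);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16BE: " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println("UTF-16LE: " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
    }
}
```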
UTF-8 has no byte-order ambiguity, so it does not need a BOM, but one can be used as a signature meaning "this text is UTF-8 encoded". The BOM encoded in UTF-8 is the byte sequence EF BB BF (you can see it by opening the file in UltraEdit and switching to hex view). So if a receiver gets a byte stream that starts with EF BB BF, it can conclude that the stream is UTF-8 encoded.
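The receiver-side logic described above can be sketched as a check of the first bytes. This is a minimal illustration, not a complete detector. It recognises only the three BOMs discussed here (UTF-8, UTF-16BE, UTF-16LE) and ignores UTF-32, whose little-endian BOM FF FE 00 00 would be misreported as UTF-16LE. detectBom is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomDetect {
    // Returns the charset implied by a leading BOM, or null if none of the
    // three supported BOMs is present.
    static Charset detectBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectBom(utf8WithBom));                                 // UTF-8
        System.out.println(detectBom("hello".getBytes(StandardCharsets.UTF_8))); // null
    }
}
```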




