What is a BOM?
The byte-order mark (BOM) is a special marker, the character U+FEFF, placed at the very beginning of a Unicode file encoded as UTF-8, UTF-16, or UTF-32 to identify the file's encoding. Its main purpose is to mark the byte order (big-endian or little-endian) of encodings built from multi-byte code units, which is why BOMs exist for UTF-16 and UTF-32. UTF-8 does not need one: it is a byte-oriented encoding in which the first byte of each character indicates how many bytes that character occupies, so there is no distinction between big-endian and little-endian. The Unicode standard permits a BOM in UTF-8 but does not require it, so UTF-8 without a BOM is the standard form.
Placing a BOM in UTF-8 files is mainly a Microsoft habit (incidentally, Microsoft also habitually calls UTF-16 with a BOM simply "Unicode", without going into details). Microsoft uses a BOM in UTF-8 because it clearly distinguishes UTF-8 from ASCII encoding; without it, a CSV file opened in Excel may appear garbled. Such files, however, can cause problems on operating systems other than Windows.
The only difference between "UTF-8" and "UTF-8 with BOM" is whether the file begins with U+FEFF, which UTF-8 encodes as the three bytes EF BB BF. Web code in UTF-8 should not use a BOM, because it commonly causes errors. CSV export is the exception: when a CSV file is written to an HTTP response, no BOM is emitted by default even when the encoding is set to UTF-8, but Excel on Windows relies on the BOM to recognize UTF-8, so you need to write the BOM at the beginning of the file yourself.
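To make the idea of a BOM "identifying the encoding" concrete, here is a minimal Java sketch that inspects the first bytes of a file's content and names the encoding implied by the BOM, if there is one. The class name `BomSniffer` and its method are hypothetical, chosen only for illustration; the byte signatures are the encoded forms of U+FEFF in each encoding.

```java
/** Hypothetical helper: names the Unicode encoding implied by a leading BOM, if any. */
final class BomSniffer {

    // Encoded forms of U+FEFF. UTF-32 LE is checked before UTF-16 LE because
    // FF FE 00 00 also begins with the UTF-16 LE signature FF FE.
    private static final int[][] SIGNATURES = {
        {0x00, 0x00, 0xFE, 0xFF},   // UTF-32, big-endian
        {0xFF, 0xFE, 0x00, 0x00},   // UTF-32, little-endian
        {0xEF, 0xBB, 0xBF},         // UTF-8
        {0xFE, 0xFF},               // UTF-16, big-endian
        {0xFF, 0xFE},               // UTF-16, little-endian
    };
    private static final String[] NAMES = {
        "UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"
    };

    /** Returns the encoding named by the BOM at the start of head, or null if there is no BOM. */
    static String detect(byte[] head) {
        for (int i = 0; i < SIGNATURES.length; i++) {
            if (startsWith(head, SIGNATURES[i])) {
                return NAMES[i];
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned values; Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] plain = {'a', 'b', 'c'};
        System.out.println(detect(withBom)); // prints UTF-8
        System.out.println(detect(plain));   // prints null
    }
}
```

A `null` result only means that no BOM is present; it says nothing about the actual encoding, which is exactly the situation of standard BOM-less UTF-8.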
When you first develop a Java code generator, you might write the generated files directly as UTF-8 files that contain a BOM. This leads to errors at packaging time, as follows:
illegal character: '\ufeff'
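One way a generator can avoid this error is to drop a leading U+FEFF from the text before writing it. The sketch below assumes the generated source is held in a `String` (for example, after reading a template that was saved with a BOM); `SourceWriter`, `stripBom`, and `writeSource` are hypothetical names, not part of any real generator. It relies on the fact that Java's UTF-8 encoder does not add a BOM on its own, so the output is BOM-free once the character is removed from the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for a code generator: writes source files as UTF-8 without a BOM. */
final class SourceWriter {

    /** Removes a single leading U+FEFF, which is how a decoded BOM appears in a String. */
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    /** Writes the source as plain UTF-8; the encoder itself never emits a BOM. */
    static void writeSource(Path target, String source) throws IOException {
        Files.write(target, stripBom(source).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A template read from a BOM-prefixed file starts with U+FEFF after decoding.
        String template = "\uFEFFclass A {}";
        System.out.println(stripBom(template)); // prints class A {}
    }
}
```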
How do you use .NET / C# to determine whether a file contains a BOM? The code is as follows:
For colleagues, attached is code that converts a file from UTF-8 with BOM to UTF-8 without BOM. The full code is as follows:
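As a minimal sketch of such a conversion, written in Java to match the code generator scenario above: the class below checks whether a file starts with the bytes EF BB BF and, if so, rewrites the file without them. The name `Utf8BomRemover` is hypothetical, and the approach assumes the file is small enough to read into memory in one go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical utility: converts "UTF-8 with BOM" files to plain UTF-8 in place. */
final class Utf8BomRemover {

    /** Rewrites the file without its leading EF BB BF; returns true if a BOM was removed. */
    static boolean removeBom(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        boolean hasBom = bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
        if (hasBom) {
            // Keep everything after the three BOM bytes and overwrite the file.
            Files.write(file, Arrays.copyOfRange(bytes, 3, bytes.length));
        }
        return hasBom;
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as a path to convert.
        for (String name : args) {
            Path file = Paths.get(name);
            System.out.println(name + (removeBom(file) ? ": BOM removed" : ": no BOM found"));
        }
    }
}
```

Running the conversion a second time on the same file is harmless, since a file without a BOM is left untouched.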
(End)