

[Source] .NET/C#: determining whether a file contains a BOM

Posted on 7/16/2021 1:22:54 PM
What is a BOM?

The byte-order mark (BOM) is a special marker placed at the start of a Unicode file encoded as UTF-8, UTF-16, or UTF-32 to identify the file's encoding. Its main purpose is to mark the byte order (big-endian or little-endian) of a multi-byte encoding, so UTF-8 does not need one: in UTF-8 the number of bytes used to encode each character is indicated by the first byte, and there is no distinction between big-endian and little-endian.
UTF-8 does not require a BOM, although the Unicode standard allows one. UTF-8 without a BOM is therefore the usual form, and placing a BOM in a UTF-8 file is mainly a Microsoft habit (incidentally, so is calling UTF-16 with a BOM simply "Unicode").
The BOM exists so that UTF-16 and UTF-32 can mark byte order. Microsoft uses a BOM in UTF-8 because it clearly distinguishes UTF-8 from ASCII or a legacy code page; without it, a CSV file opened in Excel may appear garbled. However, such files can cause problems on operating systems other than Windows.
The only difference between "UTF-8" and "UTF-8 with BOM" is whether the file begins with U+FEFF, which UTF-8 encodes as the three bytes EF BB BF.
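To see what the BOM looks like at the byte level, the following minimal sketch prints the preamble that .NET's built-in encodings report through Encoding.GetPreamble(). The class name BomBytes and the helper PreambleHex are made up for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomBytes
{
    // Returns the BOM (preamble) bytes of an encoding as a hex string.
    public static string PreambleHex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());

    public static void Main()
    {
        // UTF-8 with BOM: U+FEFF encoded as three bytes.
        Console.WriteLine(PreambleHex(new UTF8Encoding(true)));          // EF-BB-BF
        // UTF-16 little-endian and big-endian.
        Console.WriteLine(PreambleHex(new UnicodeEncoding(false, true))); // FF-FE
        Console.WriteLine(PreambleHex(new UnicodeEncoding(true, true)));  // FE-FF
        // UTF-32 little-endian.
        Console.WriteLine(PreambleHex(new UTF32Encoding(false, true)));   // FF-FE-00-00
        // UTF-8 configured without a BOM reports an empty preamble.
        Console.WriteLine(PreambleHex(new UTF8Encoding(false)).Length);   // 0
    }
}
```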
Web content encoded as UTF-8 should not include a BOM, because it frequently causes errors. When a CSV file is output from an HTTP response with the encoding set to UTF-8, no BOM is included by default. Windows Excel, however, relies on the BOM to recognize UTF-8, so in that case the BOM has to be written at the beginning of the file.
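For the CSV case, one possible approach is to prepend the three BOM bytes to the encoded CSV text before sending it. This is only a sketch: the class CsvBomWriter and the method ToCsvBytesWithBom are hypothetical names, and how the resulting bytes are written to the HTTP response depends on the web framework in use.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class CsvBomWriter
{
    // Encodes CSV text as UTF-8 and puts the UTF-8 BOM (EF BB BF) in front of it.
    public static byte[] ToCsvBytesWithBom(string csvText)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();     // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(csvText);     // GetBytes itself never adds a BOM
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        byte[] bytes = ToCsvBytesWithBom("id,name\n1,a\n");
        // Show the first three bytes, which are the BOM.
        Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // EF-BB-BF
    }
}
```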



When first developing a Java code generator, writing the generated files directly as UTF-8 with a BOM leads to packaging errors such as the following:

Illegal characters: '\ufeff'


How can .NET/C# be used to determine whether a file contains a BOM? The code is as follows:
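A minimal sketch of one way to perform this check, which is not necessarily the code originally posted: read the first three bytes of the file and compare them with EF BB BF. The names BomDetector, StartsWithUtf8Bom, and FileHasUtf8Bom are hypothetical, and the sketch only looks for the UTF-8 BOM, not the UTF-16 or UTF-32 ones.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class BomDetector
{
    // True if the buffer begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes) =>
        bytes != null && bytes.Length >= 3 &&
        bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;

    // Reads at most the first three bytes of the file and checks them.
    public static bool FileHasUtf8Bom(string path)
    {
        var head = new byte[3];
        using (FileStream stream = File.OpenRead(path))
        {
            int total = 0;
            while (total < head.Length)
            {
                int read = stream.Read(head, total, head.Length - total);
                if (read == 0) break; // file is shorter than three bytes
                total += read;
            }
            return total == head.Length && StartsWithUtf8Bom(head);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'a' });
            Console.WriteLine(FileHasUtf8Bom(path)); // True
            File.WriteAllBytes(path, new byte[] { (byte)'a' });
            Console.WriteLine(FileHasUtf8Bom(path)); // False
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```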





In addition, here is code that converts a file from UTF-8 with a BOM to UTF-8 without a BOM. The full code is as follows:
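A minimal sketch of such a conversion, which is not necessarily the code originally posted: if the file starts with EF BB BF, rewrite it without those three bytes. The names BomRemover, StripUtf8Bom, and RemoveUtf8BomFromFile are hypothetical, and the sketch assumes the file is small enough to load into memory in one piece.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class BomRemover
{
    // Returns the input without a leading UTF-8 BOM; returns it unchanged if there is none.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        bool hasBom = bytes.Length >= 3 &&
                      bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        if (!hasBom) return bytes;

        var stripped = new byte[bytes.Length - 3];
        Buffer.BlockCopy(bytes, 3, stripped, 0, stripped.Length);
        return stripped;
    }

    // Rewrites the file in place without the BOM. Returns true if a BOM was removed.
    public static bool RemoveUtf8BomFromFile(string path)
    {
        byte[] original = File.ReadAllBytes(path); // assumption: file fits in memory
        byte[] stripped = StripUtf8Bom(original);
        if (stripped.Length == original.Length) return false;
        File.WriteAllBytes(path, stripped);
        return true;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'a', (byte)'b' });
            Console.WriteLine(RemoveUtf8BomFromFile(path));                    // True
            Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // 61-62
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```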

(End)

Posted on 7/16/2021 10:41:40 PM |
Original poster | Posted on 11/1/2024 3:00:47 PM
Removing the BOM from a UTF-8 file

Original poster | Posted on 6/25/2025 4:13:03 PM
Writing UTF-8 without a BOM
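As a rough sketch of writing UTF-8 without a BOM in .NET: pass a UTF8Encoding constructed with false to the file API. Passing Encoding.UTF8 to File.WriteAllText would emit the BOM instead. The class name Utf8NoBomWriter and the method WriteWithoutBom are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8NoBomWriter
{
    // Writes text as UTF-8 with no BOM at the start of the file.
    public static void WriteWithoutBom(string path, string text)
    {
        // false = do not emit the UTF-8 identifier (BOM).
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            WriteWithoutBom(path, "abc");
            Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // 61-62-63
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```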