IT:AD:BOM
Summary
- Endianness:
- Most mainframes and network protocols are big-endian.
- Most microprocessors (e.g., Intel x86) are little-endian.
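As a quick illustration of the two byte orders, the following minimal sketch packs the same 32-bit value both ways using Python's standard struct module. The sample value 0x0A0B0C0D is an arbitrary choice for illustration. The last lines show how the BOM character U+FEFF serializes under each UTF-16 byte order, which is what makes it usable as a byte order marker.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary sample value, chosen so each byte is distinct

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print(big.hex())      # 0a0b0c0d
print(little.hex())   # 0d0c0b0a
print(sys.byteorder)  # byte order of the machine running this script

# The BOM is the character U+FEFF; its byte sequence reveals the byte order.
bom_be = "\ufeff".encode("utf-16-be")  # FE FF
bom_le = "\ufeff".encode("utf-16-le")  # FF FE
print(bom_be.hex(), bom_le.hex())
```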
Notes
XML / Unicode
- A BOM is not guaranteed to be present in every Unicode file.
- A BOM is not needed to identify the encoding of an XML file, because the encoding can be determined from how the leading less-than sign (0x3C) is stored.
- The first two bytes of an XML file indicate the encoding:
- 3C 00 UCS-2LE or UTF-16LE
- 00 3C UCS-2BE or UTF-16BE
- 3C XX UTF-8 (where XX is non-zero)
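The byte patterns above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name sniff_xml_encoding is hypothetical, the input is assumed to start directly with the less-than sign (no BOM, no leading whitespace), and the "utf-8" result really means "some ASCII-compatible encoding", since the XML declaration may name a different one.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding from the first two bytes of an XML document.

    Hypothetical helper: assumes the document begins with '<' (0x3C),
    with no BOM and no leading whitespace.
    """
    if len(head) < 2:
        return "unknown"
    first, second = head[0], head[1]
    if first == 0x3C and second == 0x00:
        return "utf-16-le"  # 3C 00: UCS-2LE or UTF-16LE
    if first == 0x00 and second == 0x3C:
        return "utf-16-be"  # 00 3C: UCS-2BE or UTF-16BE
    if first == 0x3C:
        return "utf-8"      # 3C XX with XX non-zero
    return "unknown"


print(sniff_xml_encoding("<?xml".encode("utf-16-le")))
print(sniff_xml_encoding("<?xml".encode("utf-16-be")))
print(sniff_xml_encoding(b"<?xml"))
```

Anything that does not match one of the three patterns is reported as "unknown" rather than guessed.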
Without contextual information, a BOM, or a file type standard that defines a header (such as XML or HTML), a file should be assumed to use the system's default ANSI code page. On Windows, this is governed by the "Language for non-Unicode programs" setting in the Regional Settings of the computer on which the file is found.
Detecting Encoding
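A minimal sketch of the approach described above: check for a BOM first, and fall back to the system default encoding when none is found. The function name guess_encoding is hypothetical. The fallback uses locale.getpreferredencoding from the standard library, which on Windows normally reports the ANSI code page but may report something else (for example UTF-8) depending on platform and interpreter settings, so treat that part as an approximation.

```python
import codecs
import locale

# UTF-32 BOMs are listed before UTF-16 because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(head: bytes) -> str:
    """Return a Python codec name for data starting with `head`.

    Hypothetical helper: only a BOM is examined, never the content.
    The returned codecs consume the BOM when decoding.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM: assume the system default encoding.
    return locale.getpreferredencoding(False)


data = "hello".encode("utf-16")  # Python writes a BOM for this codec
print(guess_encoding(data))
print(data.decode(guess_encoding(data)))
```

A file without a BOM can still be UTF-8, so the fallback is a default assumption and not a detection result.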
More
This is interesting: http://www.codeproject.com/Tips/672470/Simple-Character-Encoding-Detection
Resources
- Great explanation:
- Lib to help:
- See encoding: