IT:AD:BOM

Summary

  • Endian:
    • Most mainframes and network protocols are big-endian.
    • Most microprocessors (e.g., Intel x86) are little-endian.

  • BOM Examples:
    • FF FE: UCS-2LE or UTF-16LE
    • FE FF: UCS-2BE or UTF-16BE
    • EF BB BF: UTF-8
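
The BOM signatures listed above can be checked with a simple prefix match. The following is a minimal sketch in Python, not a complete detector: the function name detect_bom is hypothetical, and it covers only the three signatures in the list. Note that a UTF-32LE BOM (FF FE 00 00) also begins with FF FE, so this sketch would report such a file as UTF-16LE.

```python
import codecs

# Signatures from the list above; the codecs module ships them as constants.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no listed BOM is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would typically strip the BOM before decoding, for example: encoding, n = detect_bom(raw), then raw[n:].decode(encoding) when encoding is not None.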

XML / Unicode

  • A BOM is not guaranteed to be present in every Unicode file.
  • A BOM is not required in XML files, because the encoding family can be inferred from the leading less-than sign (<, 0x3C).
    • The first two bytes of the XML are enough:
      • 3C 00: UCS-2LE or UTF-16LE
      • 00 3C: UCS-2BE or UTF-16BE
      • 3C XX: UTF-8 (where XX is non-zero)
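
The first-byte rules above can be sketched as a small Python function. This is an illustration under assumptions rather than a full XML encoding detector: the name sniff_xml_encoding is hypothetical, the input is assumed to start directly with the less-than sign (no BOM, no leading whitespace), and the 3C XX case is reported as UTF-8 even though other ASCII-compatible encodings produce the same two bytes; an encoding named in the XML declaration, if present, would be needed to tell them apart.

```python
def sniff_xml_encoding(data: bytes):
    """Guess the encoding family of BOM-less XML from its first two bytes.

    Hypothetical helper; returns None when the bytes match none of the
    three patterns or when fewer than two bytes are available.
    """
    head = data[:2]
    if len(head) < 2:
        return None
    if head == b"\x3c\x00":
        return "utf-16-le"  # 3C 00
    if head == b"\x00\x3c":
        return "utf-16-be"  # 00 3C
    if head[0] == 0x3C:
        return "utf-8"      # 3C XX, XX is non-zero at this point
    return None
```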

Without contextual information, a BOM, or a file format that declares its own encoding in a header (such as XML or HTML), a file should be assumed to use the system's default ANSI code page. On Windows this is governed by the "Language for non-Unicode programs" setting in the Regional Settings of the computer on which the file is found.
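
The fallback described above might look like the following in Python. This is a sketch under assumptions: decode_with_fallback is a hypothetical name, and locale.getpreferredencoding(False) is used as a stand-in for the system default code page. On Windows it normally returns the ANSI code page (for example cp1252), while on other platforms it usually returns UTF-8.

```python
import locale
from typing import Optional


def decode_with_fallback(data: bytes, fallback: Optional[str] = None) -> str:
    """Decode bytes that carry no BOM or header using a default encoding.

    Hypothetical helper. If no fallback is given, the locale's preferred
    encoding is used as an approximation of the system code page.
    """
    encoding = fallback or locale.getpreferredencoding(False)
    # Undecodable bytes are replaced rather than raising, since the
    # chosen encoding is only a best guess.
    return data.decode(encoding, errors="replace")
```

Passing the code page explicitly, as in decode_with_fallback(raw, "cp1252"), keeps the result reproducible when the file came from a machine with different regional settings.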

Detecting Encoding
