# IT:AD:BOM #

* [[../|(UP)]]

{{indexmenu>.#2|nsort tsort}}

* Endian:
  * Most mainframes *and networking protocols* are big-endian.
  * Most microprocessors (e.g., Intel) are little-endian.

## Notes ##

* BOM Examples:
  * FF FE: UCS-2LE or UTF-16LE ((http://codesnipers.com/?q=node/68))
  * FE FF: UCS-2BE or UTF-16BE ((http://codesnipers.com/?q=node/68))
  * EF BB BF: UTF-8 ((http://codesnipers.com/?q=node/68))

### XML / Unicode

* The BOM is not guaranteed to be present in every Unicode file.
* A BOM is not needed to identify the encoding of an XML file, because the Unicode encoding can be determined from the leading less-than sign (`<`, byte 3C).
* The encoding can be determined from the first two bytes of the XML:
  * 3C 00: UCS-2LE or UTF-16LE
  * 00 3C: UCS-2BE or UTF-16BE
  * 3C XX: UTF-8 (where XX is non-zero)

Without contextual information, a BOM, or a file type standard that carries a header (such as XML or HTML), a file should be assumed to be in the default system locale ANSI code page. That code page is governed by the "Language for non-Unicode Programs" option in the Regional Settings of the computer on which the file is found.

### Detecting Encoding

Refer to:

* http://www.codeproject.com/Articles/17201/Detect-Encoding-for-In-and-Outgoing-Text

### More ###

This is interesting:

* http://www.codeproject.com/Tips/672470/Simple-Character-Encoding-Detection

## Resources ##

* Great explanation:
  * http://codesnipers.com/?q=node/68
* Lib to help:
  * http://www.codeproject.com/Articles/17201/Detect-Encoding-for-In-and-Outgoing-Text
* See encoding:
  * http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
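
## Example ##

The BOM examples and the XML leading-byte rules from the Notes can be combined into one small check. The following is a minimal Python sketch, not a definitive implementation: the function name `sniff_encoding` and the `fallback` parameter are hypothetical, and `cp1252` is only an assumed stand-in for the system's ANSI code page.

```python
import codecs

# BOM signatures listed in the notes, mapped to Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding from a BOM, else from a leading XML '<' (0x3C).

    `fallback` is an assumption standing in for the ANSI code page;
    pass whatever is appropriate for the machine the file came from.
    """
    # 1. A BOM, when present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. No BOM: look at how the leading '<' is laid out in two bytes.
    if len(data) >= 2:
        first, second = data[0], data[1]
        if first == 0x3C and second == 0x00:
            return "utf-16-le"   # 3C 00
        if first == 0x00 and second == 0x3C:
            return "utf-16-be"   # 00 3C
        if first == 0x3C:
            return "utf-8"       # 3C XX, XX non-zero

    # 3. Nothing recognisable: fall back to the assumed ANSI code page.
    return fallback
```

The function only reports a codec name and does not strip the BOM, so the caller still has to skip those bytes before decoding. It also covers only the signatures listed on this page; UTF-32, for example, is not distinguished.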