The GEDCOM 5.5.1 standard allows a GEDCOM file to be encoded in UNICODE, by which it means UTF-16. Now who wants or needs to encode a GEDCOM file in this way? Nobody I know. For GEDCOM purposes the only practical difference between UTF-8 and UTF-16 encoding is the size of the resulting file, and for pretty well all western languages the UTF-8 file will be smaller than the UTF-16 one: about half the size for English-only text. However, UNICODE is in the standard, and GED-inline will read these files.
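The size difference is easy to check. The following is a minimal sketch in Python (not part of GED-inline; the sample line is made up for illustration) that encodes one English-only GEDCOM line both ways and compares the byte counts:

```python
# A made-up, ASCII-only GEDCOM line used purely for illustration.
line = "1 NAME John /Smith/\n"

# UTF-8 uses one byte for each ASCII character.
utf8_size = len(line.encode("utf-8"))

# UTF-16 uses two bytes for each of these characters.
# The "utf-16-be" codec does not add a BOM, so only the text is counted.
utf16_size = len(line.encode("utf-16-be"))

print("UTF-8 bytes: ", utf8_size)
print("UTF-16 bytes:", utf16_size)
```

For text like this the UTF-16 figure is exactly twice the UTF-8 one. Accented western characters take two bytes in either encoding, so they narrow the gap but never reverse it.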

Now although the GEDCOM standard doesn't mention the fact, UTF-16 files come in two mutually incompatible flavours, big-endian and little-endian. When the context doesn't give a clue as to which flavour is in use, the Unicode standard specifies that the file should start with a so-called byte order mark (BOM), which identifies the flavour. The Unicode standard also recommends that files without a BOM should be regarded as big-endian.

GED-inline therefore supports 3 possible UTF-16 formats:

  • UTF-16 with big-endian BOM
  • UTF-16 with little-endian BOM
  • UTF-16 without BOM
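
The three cases can be told apart by looking at the first two bytes of the file. The following is a minimal sketch in Python, not GED-inline's actual code; the function names `detect_utf16_flavour` and `decode_utf16` are hypothetical and chosen only for this illustration:

```python
import codecs


def detect_utf16_flavour(data: bytes) -> str:
    """Return the Python codec name for a UTF-16 byte string (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    # No BOM: fall back to big-endian, as recommended for BOM-less UTF-16.
    return "utf-16-be"


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes and drop a leading BOM character if there is one."""
    text = data.decode(detect_utf16_flavour(data))
    # The explicit -be/-le codecs keep the BOM as U+FEFF, so remove it here.
    return text.lstrip("\ufeff")


# All three supported formats decode to the same header line.
with_be_bom = codecs.BOM_UTF16_BE + "0 HEAD".encode("utf-16-be")
with_le_bom = codecs.BOM_UTF16_LE + "0 HEAD".encode("utf-16-le")
without_bom = "0 HEAD".encode("utf-16-be")

for sample in (with_be_bom, with_le_bom, without_bom):
    print(decode_utf16(sample))
```

The fallback branch is the important design choice: a file with no BOM is always treated as big-endian, so a BOM-less little-endian file is not covered.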

A UTF-16 file without a BOM but having little-endian flavour will give rise to the GED-inline message: 'File not recognised as valid GEDCOM file'.
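
To see why such a file cannot be recognised, consider what happens when little-endian bytes are read as big-endian. This is a small Python illustration of the general effect, under the assumption that a reader looks for the usual "0 HEAD" opening line; it does not claim to show how GED-inline itself performs the check:

```python
# A GEDCOM file normally begins with the line "0 HEAD".
# Encode it as little-endian UTF-16 with no BOM.
le_bytes = "0 HEAD".encode("utf-16-le")

# With no BOM to go on, the bytes are decoded as big-endian.
misread = le_bytes.decode("utf-16-be")

# Each pair of bytes is swapped, so every character comes out wrong.
print(misread == "0 HEAD")
print([hex(ord(ch)) for ch in misread])
```

The digit "0" (bytes 30 00 in little-endian) is read as U+3000, an ideographic space, and the rest of the line is equally unrecognisable, so nothing that looks like a GEDCOM header is found.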

There is a well-known file floating around the Internet called ULHL.GED which gives this error message.