IT:AD:Distributed Identities

Summary

With the advent of the web, most developers have had the freedom to ignore the problem of distribution.

With often-disconnected mobile applications, the design patterns of the past are necessarily coming back.

The key problem is how to create Ids in a distributed scenario.

The long and short of it is that, much as we are used to being spoon-fed by SQL Server et al. and have grown dependent on auto-incremented identities, they are simply not suitable for a distributed architecture, whether they are int or Guid based.

In such cases, the only viable solution is a bigint or Guid built from date + random (and optionally a server indicator).

Notes

The Pros/Cons of various solutions

There are various proposals for distributed identities. For examples, start with those presented here:

But there are other options beyond that as well.

Notes

  • Integer Ids on their own are generally faster than Guids.
    • But auto-incremented identities are not appropriate for distributed scenarios (you can't easily import the remote machine's auto-incremented key).
    • In replicated scenarios, where two servers have copies of the exact same database with the same identity offset, collisions will occur.
  • Date-based Ids (i.e. Ticks) are not precise enough across all machines (two machines can produce the same tick value).
  • BigInt can be interesting:
    • Date-based + random (cf. SimpleFlake) is a suitable solution.
  • MachineId + Id is a viable solution, but leads to error-prone SQL in the data layer (regular devs are not used to crafting SQL that uses two keys for every query).
    • Note that the Id can't be an auto-increment anyway (can't insert the remote machine's Id into an auto-increment column…)
  • Random Guids are non-linear and therefore cause havoc with inserts and reindexing.
  • SQL Server's NewSequentialId is not easy to replicate on Android or any platform other than SQL Server (it is a bit-diddled, NIC-based, time-based solution… see Guid).
  • BigInts or COMBs are the only viable solutions beyond SQL Server.
    • See XAct.IDistributedIdService in XActLib.
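
As a rough illustration of the date-based + random bigint idea, here is a minimal Python sketch loosely modelled on SimpleFlake. The custom epoch, the 41/22 bit split and the name new_bigint_id are assumptions made for this example, not taken from SimpleFlake or XActLib. The split is chosen so the result stays positive in a signed 64-bit bigint column.

```python
import secrets
import time

# Assumed custom epoch: 2000-01-01T00:00:00Z, expressed in Unix milliseconds.
EPOCH_MS = 946_684_800_000
TIMESTAMP_BITS = 41  # assumed width; about 69 years of milliseconds
RANDOM_BITS = 22     # assumed width; 41 + 22 = 63 bits, so the sign bit stays clear


def new_bigint_id(now_ms=None):
    """Return a roughly time-ordered 63-bit integer: timestamp high, random low."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the range this layout can encode")
    # Ids created in a later millisecond always sort after earlier ones;
    # ids created within the same millisecond are ordered randomly.
    return (elapsed << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)
```

Two machines generating in the same millisecond collide only if they also draw the same 22 random bits, which is unlikely but not impossible; a server indicator in some of those bits would remove that case at the cost of coordination.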

Issues with COMB and other random based solutions

Ticks + Random + sequence (per tick) is a good basis for a solution, but note that the solution has to handle the following:

  • Timespan ticks make the Ids sequential (which leads to less thrashing of the db table).
  • Use a sequence number within the same time increment, resetting it every time increment.
  • The random component adds spread within the timespan.
  • NICs are often used to disambiguate one box from another.
  • Note that one might consider adding the process id in order to disambiguate between multiple processes on the same box.

The output can be a bigint (see SimpleFlake) or a Guid.
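
The points above can be sketched as a small generator that emits Guids. This is only an illustration under stated assumptions: the class name CombGenerator, the byte layout (8 random bytes, a 2-byte sequence, a 6-byte millisecond timestamp) and the helper sql_server_sort_key are all hypothetical. The timestamp goes in the final six bytes because SQL Server weighs that group most heavily when ordering uniqueidentifier values; the helper only approximates that ordering.

```python
import secrets
import threading
import time
import uuid


class CombGenerator:
    """Sketch of a COMB-style generator: tick + per-tick sequence + random."""

    def __init__(self, clock_ms=None):
        # clock_ms is injectable so the behaviour can be tested deterministically.
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now > self._last_ms:
                # New time increment: reset the sequence.
                self._last_ms = now
                self._sequence = 0
            else:
                # Same increment (or the clock went backwards): bump the sequence.
                self._sequence += 1
                if self._sequence > 0xFFFF:
                    # Sequence exhausted: borrow the next millisecond.
                    self._last_ms += 1
                    self._sequence = 0
            tick, seq = self._last_ms, self._sequence
        # Random bytes first, then sequence, then timestamp in the last 6 bytes.
        raw = secrets.token_bytes(8) + seq.to_bytes(2, "big") + tick.to_bytes(6, "big")
        return uuid.UUID(bytes=raw)


def sql_server_sort_key(value):
    """Approximate SQL Server ordering: last 6 bytes first, then bytes 8-9."""
    return value.bytes[10:16] + value.bytes[8:10]
```

A NIC address or process id, if wanted, could take the place of some of the random bytes; that part is left out here to keep the sketch short.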

Again, in all cases, I would not use an auto-incremented int/bigint, nor a Guid generated with NEWSEQUENTIALID(), as I don't think that method can be safely replicated on other platforms.
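
To make the objection to auto-incremented keys concrete, here is a toy Python simulation of two disconnected nodes that each start their identity column at the same seed. The function name offline_inserts and the sample rows are invented for the example.

```python
from itertools import count


def offline_inserts(seed, rows):
    """Simulate an identity column: assign consecutive keys starting at seed."""
    ids = count(seed)
    return {next(ids): row for row in rows}


# Both nodes start from the same identity seed while disconnected.
server = offline_inserts(1, ["a", "b", "c"])
mobile = offline_inserts(1, ["x", "y"])

# On sync, every key present on both sides is a collision.
collisions = sorted(server.keys() & mobile.keys())
```

Here every key the mobile node generated already exists on the server, so none of its rows can be imported without rewriting their keys.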

Why GUID and not BigInt

BigInt would save storage space otherwise lost in indexes (8 bytes per key rather than 16).

But I'll be going with code-generated Guids from here on in anyway, because:

  • Guids more descriptively demonstrate the reason for their choice (“Global….”)
  • Being sequential, they don't thrash the db, nor are they significantly slower than BigInts to insert or search against.
  • Finally, and this is just me… EF's conventions will make int properties DatabaseGeneratedOption.Identity unless specifically addressed via Fluent notation, whereas Guid-based Id columns are marked DatabaseGeneratedOption.None by default.

Recipes

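-- Temporarily allow an explicit value in the identity column
SET IDENTITY_INSERT ReseedDemo ON;
INSERT INTO ReseedDemo (Id, X) VALUES (1000022, 'A');
SET IDENTITY_INSERT ReseedDemo OFF;
-- Reset the current identity value to 21
DBCC CHECKIDENT ('ReseedDemo', RESEED, 21);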

Resources