IT:AD:Distributed Identities

See also:
- http://push.cx/2014/distributed-id-generation-and-bit-packing-chibrary
- Speed / Size Summary...great!
- Regarding separating PK and CL Index when talking about Guids
- Guid
- IT:AD:SQLite-Sync
- IT:AD:SQLAnywhere
- http://msdn.microsoft.com/en-us/library/bb902854.aspx
- SimpleFlake ← bigint based.
- Twitter SnowFlake ← centralized
- Flake 128 bit, no coordination required.
- RustFlakes
  - C#
  - Based on Flake
  - Order of bits is not suitable for SQL Server indexing (was needed at the end)
  - Like that it can be fed an identity (e.g. MachineId, etc.)
  - Doesn't have process Id separation built into it.
- CodeProject
  - C#
  - SQL Server ready
  - Doesn't have machine id differentiator
  - Doesn't have process id differentiator
  - That said, due to order of bytes, am using that in XActLib for now.
- SnowMaker
  - Centralized
  - C#
  - Long based
  - Requires Azure Blob Storage

Summary

With the advent of the web, most developers have had the freedom to ignore the problem of distribution.

With mobile applications that are often disconnected, the design patterns of the past are necessarily coming back.

The key problem is how to create Ids in a distributed scenario.

The long and short of it is that, much as we are used to being spoon-fed by SQL Server et al. and have grown dependent on auto-incremented identities, these (whether int or Guid based) are simply not suitable solutions for a distributed architecture.

In such cases, the only viable solution is a bigint or Guid built from date + random (and optionally a server indicator).

There are various proposals for distributed identities. For examples, start with those presented here:

* http://blogs.msdn.com/b/sqlazure/archive/2010/07/15/10038656.aspx

But there are other options beyond that as well.

Notes

* Integer Ids on their own are generally faster than Guids.

But auto-incremented identities are not appropriate for distributed scenarios (you can't easily import the remote machine's auto-incremented key).
In replicated scenarios, where two servers have copies of the exact same database with the same identity offset, collisions will occur.
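
The collision described above can be reproduced in a few lines. This is only a sketch: it uses two in-memory SQLite databases as stand-ins for the two replicated servers, and the `Orders` table and the `merge_into` helper are hypothetical names made up for the illustration.

```python
import sqlite3


def make_replica():
    """Create an in-memory database with an auto-incremented primary key."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Orders (Id INTEGER PRIMARY KEY AUTOINCREMENT, Note TEXT)"
    )
    return db


def merge_into(target, source):
    """Copy rows from source into target, keeping the original Ids.

    Returns the Ids that could not be inserted because they already exist.
    """
    collisions = []
    rows = source.execute("SELECT Id, Note FROM Orders ORDER BY Id").fetchall()
    for row_id, note in rows:
        try:
            target.execute(
                "INSERT INTO Orders (Id, Note) VALUES (?, ?)", (row_id, note)
            )
        except sqlite3.IntegrityError:
            collisions.append(row_id)
    return collisions


# Both replicas start from the same identity seed, so each hands out Id 1.
server_a = make_replica()
server_b = make_replica()
server_a.execute("INSERT INTO Orders (Note) VALUES ('created on A')")
server_b.execute("INSERT INTO Orders (Note) VALUES ('created on B')")

collisions = merge_into(server_a, server_b)
print(collisions)  # the Id generated on B already exists on A
```

Both servers independently generate the same first Id, so the row from B cannot be imported into A without renumbering it.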

* Date-based values alone (i.e. Ticks) are not precise enough to be unique across all machines.
* BigInt can be interesting:

Date-based + random (cf. SimpleFlake) is a suitable solution.
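
A minimal sketch of a date + random bigint, in the spirit of SimpleFlake. The bit split (41 timestamp bits, 22 random bits), the epoch and the function names are assumptions chosen for this illustration so that the result stays positive in a signed 64-bit column; they are not taken from the SimpleFlake source.

```python
import secrets
import time

# Assumed custom epoch: 2000-01-01T00:00:00Z, in milliseconds since Unix epoch.
EPOCH_MS = 946_684_800_000
TIMESTAMP_BITS = 41  # roughly 69 years of milliseconds
RANDOM_BITS = 22     # 41 + 22 = 63 bits, so the value fits a signed bigint


def new_bigint_id(now_ms=None):
    """Return a roughly time-ordered 63-bit integer: timestamp high, random low."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = (now_ms - EPOCH_MS) & ((1 << TIMESTAMP_BITS) - 1)
    return (timestamp << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def timestamp_ms_of(id_value):
    """Recover the creation time (ms since Unix epoch) from an id."""
    return (id_value >> RANDOM_BITS) + EPOCH_MS


print(new_bigint_id())
```

Ids created in different milliseconds sort by creation time. Ids created within the same millisecond are distinguished only by the random bits, so uniqueness is probabilistic, not guaranteed.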

* MachineId + Id is a viable solution, but causes error-prone data-layer SQL issues (regular devs are not used to crafting SQL that uses two keys in every query).

Note that the Id can't be an auto-increment anyway (you can't insert the remote machine's Id into an auto-increment column…)
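
To illustrate why the two-key approach is error prone, here is a small sketch using an in-memory SQLite database. The table and column names (`Orders`, `OrderLines`, `OrderMachineId`, etc.) are hypothetical. Forgetting the MachineId in a single join silently returns rows from the wrong machine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE Orders (
        MachineId INTEGER NOT NULL,
        Id        INTEGER NOT NULL,
        Note      TEXT,
        PRIMARY KEY (MachineId, Id)
    )
    """
)
db.execute(
    """
    CREATE TABLE OrderLines (
        OrderMachineId INTEGER NOT NULL,
        OrderId        INTEGER NOT NULL,
        Sku            TEXT
    )
    """
)

# Two machines each created an order with Id 1; the line belongs to machine 2.
db.execute("INSERT INTO Orders VALUES (1, 1, 'from machine 1')")
db.execute("INSERT INTO Orders VALUES (2, 1, 'from machine 2')")
db.execute("INSERT INTO OrderLines VALUES (2, 1, 'ABC')")

# Correct: every join has to mention both halves of the key.
correct = db.execute(
    """
    SELECT o.Note FROM OrderLines l
    JOIN Orders o ON o.MachineId = l.OrderMachineId AND o.Id = l.OrderId
    """
).fetchall()

# Easy mistake: joining on Id alone also matches machine 1's order.
buggy = db.execute(
    """
    SELECT o.Note FROM OrderLines l
    JOIN Orders o ON o.Id = l.OrderId
    """
).fetchall()

print(correct, len(buggy))
```

The buggy query raises no error; it just returns one row too many, which is what makes this kind of mistake hard to spot.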

* Random Guids are non-linear and therefore cause havoc with inserts and reindexing.
* SQL Server's NEWSEQUENTIALID() is not easy to replicate on Android or any platform other than SQL Server (due to it being a bit-diddled, NIC-based, time-based solution…see Guid).
* Either BigInts or COMBs are the only viable solution beyond SQL Server.
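
A COMB can be generated in code on any platform. The sketch below takes a random Guid and overwrites its last 6 bytes with a millisecond timestamp, on the understanding that SQL Server compares the trailing 6 bytes of a uniqueidentifier first. The function names and the choice of Unix-epoch milliseconds for the time part are assumptions for this illustration; other COMB variants encode the time differently.

```python
import time
import uuid


def new_comb_guid(now_ms=None):
    """Random Guid whose last 6 bytes are a big-endian millisecond timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = uuid.uuid4().bytes[:10]  # keeps the version/variant bits
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


def sql_server_sort_key(guid):
    """Most significant group in SQL Server's ordering (an approximation:
    only the trailing 6 bytes are considered here)."""
    return guid.bytes[10:16]


first = new_comb_guid(now_ms=1_700_000_000_000)
second = new_comb_guid(now_ms=1_700_000_000_001)
print(first, second)
print(sql_server_sort_key(first) < sql_server_sort_key(second))
```

Because the timestamp sits in the bytes that are compared first, Guids created later land at the end of the index instead of at a random position.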

See XAct.IDistributedIdService in XActLib.

Issues with COMB and other random based solutions

Ticks + Random + sequence (per tick) is a good basis for a solution, but note that the solution has to handle the following:

* Timespan ticks make the Ids sequential (which leads to less thrashing of the db table).
* Use a sequence number within the same time increment, resetting at every new time increment.
* Random bits add spread within the same timespan.
* NIC addresses are often used to disambiguate one box from another.
* Note that one might consider adding the process id in order to disambiguate between multiple processes on the same box.
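
The points above can be combined into one generator. The following is a sketch under stated assumptions, not the XActLib implementation: the class name and the 128-bit layout (48-bit millisecond timestamp, 16-bit per-tick sequence, 48-bit node id, 16-bit process id) are made up for the illustration. It uses the NIC-derived node id plus the process id for disambiguation in place of random bits.

```python
import os
import threading
import time
import uuid


class DistributedIdGenerator:
    """Hypothetical layout, high to low bits:
    48-bit ms timestamp | 16-bit sequence | 48-bit node id | 16-bit process id
    """

    def __init__(self, node_id=None, process_id=None, clock=None):
        # uuid.getnode() returns the hardware (MAC) address when available.
        node = uuid.getnode() if node_id is None else node_id
        pid = os.getpid() if process_id is None else process_id
        self._node = node & 0xFFFFFFFFFFFF
        self._pid = pid & 0xFFFF
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now_ms = self._clock()
            if now_ms < self._last_ms:
                # Clock moved backwards: keep using the last tick we saw.
                now_ms = self._last_ms
            if now_ms == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFFF
                if self._sequence == 0:
                    # Sequence exhausted for this tick: wait for the next one.
                    while now_ms <= self._last_ms:
                        now_ms = self._clock()
            else:
                # New time increment: the sequence resets.
                self._sequence = 0
            self._last_ms = now_ms
            return (
                ((now_ms & 0xFFFFFFFFFFFF) << 80)
                | (self._sequence << 64)
                | (self._node << 16)
                | self._pid
            )


generator = DistributedIdGenerator()
new_id = generator.next_id()
print(uuid.UUID(int=new_id))
```

The integer sorts by time, then by sequence. If the value is stored as a Guid in SQL Server, the byte order would have to be rearranged so that the timestamp ends up in the trailing bytes, for the indexing reason noted for RustFlakes in the list at the top of this page.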

The output can be a bigint (see SimpleFlake) or a Guid.

Again, in all cases, I would not use an auto-incremented int/bigint, nor a Guid generated with NEWSEQUENTIALID(), as I don't think that method can be safely replicated on other platforms.

BigInt would save storage space in indexes (8 bytes per key instead of a Guid's 16).

But I'll be going with code-generated Guids from here on in anyway, because:

* Guids more descriptively demonstrate the reason for their choice (“Global…”).
* Being sequential, they don't thrash the db, nor are they significantly slower than BigInts to insert or search against.
* Finally, and this is just me…EF's conventions will make int properties DatabaseGeneratedOption.Identity unless specifically addressed via Fluent notation, whereas Guid-based Id columns are by default marked as DatabaseGeneratedOption.None.

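-- Temporarily allow an explicit value in the identity column
SET IDENTITY_INSERT ReseedDemo ON;
INSERT INTO ReseedDemo (Id, X) VALUES (1000022, 'A');
SET IDENTITY_INSERT ReseedDemo OFF;
-- Reset the identity so local inserts continue after 21 (next value 22 with an increment of 1)
DBCC CHECKIDENT ('ReseedDemo', RESEED, 21);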
