Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider our default GUID generation strategy #30753

Open
roji opened this issue Apr 24, 2023 · 8 comments
Open

Reconsider our default GUID generation strategy #30753

roji opened this issue Apr 24, 2023 · 8 comments

Comments

@roji
Copy link
Member

roji commented Apr 24, 2023

We currently generate GUIDs on the client by default; on SQL Server, we use SequentialGuidValueGenerator, which works similarly to the SQL Server NEWSEQUENTIALID. Sequential GUIDs reduce index fragmentation (important for good performance), and generating them on the client reduces network roundtrips by allowing principal with GUID keys to be inserted in the same batch with dependents pointing to them.

However, there's a pretty reliable-looking report that in parallel environments, our cilent-generated GUIDs do lead to high index fragmentation due to various jitter factors, causing values not to be inserted in order (see previous perf investigation but which was serial only). We should try reproducing this; if this is the case, we should consider switching to server-generated GUIDs by default (with NEWSEQUENTIALID) - the importance of good, non-fragmented indexes is likely to far overweigh the extra roundtrips at insertion with dependents.

@roji roji changed the title Reconsider out default GUID generation strategy Reconsider our default GUID generation strategy Apr 24, 2023
@ajcvickers
Copy link
Member

Keep in mind that one of the main benefits of client-side generation is that the key value is known immediately after the entity is tracked, rather than only later after it has been inserted into the database. Some applications use this for tracking entities throughout their lifecycle, so changing this by default would be quite a significant breaking change.

@roji
Copy link
Member Author

roji commented Apr 25, 2023

Let's discuss, I'd like to understand this better..

@roji
Copy link
Member Author

roji commented Sep 11, 2023

Here's one example where having client-generated value known immediately after Add/Attach is in fact a disadvantage: #13575 (comment)

@ConnerPhillis
Copy link

ConnerPhillis commented Sep 11, 2023

Keep in mind that one of the main benefits of client-side generation is that the key value is known immediately after the entity is tracked, rather than only later after it has been inserted into the database.

I wrote the report, and I just want to add that IMO, I think having GUID keys be immediately available is a great feature. However, I think that there are probably a lot of people who are using sequential GUIDs that may not even know that EF core is generating IDs on the client side unless they're adding HasDefaultValueSql to their code.

#13592

@JackTrapper
Copy link

The issue you're experiencing is that even though a type-1 UUID is based on a timestamp, SQL Server will first sort by the nodeid. So when you have multiple clients each generating their own UUIDs client side, they will have different nodeIDs. This then causes SQL Server to essentially sort rows from one client together, and rows from another client together.

When in reality they should be sorted together based on the 100ns resolution timetsamp; and in the unlikely event that two users generated a UUID at the exact same 100ns tick, then the tie is broken in the usual way (with the collision counter, and then the nodeid).

Entity Framework currently swaps bytes around to make it more sortable based on the big-endian that SQL Server uses. Which is good.

But you have to take it a step further, acknowledge that SQL Server sorts by the "NodeID" bytes first, and move the timestamp to there.

Here's a C# class we use to generate SQL Server "sortable" UUIDs:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Web;

/// <summary>
/// Summary description for SortableGuid
/// </summary>
public class SortableGuid
{
	//UUID timestamps are 100ns ticks starting from October 15, 1582 (date we switched to Gregorian calendar)
	//Windows FILETIME is 100ns ticks starting from January 1, 1601  (the date of the start of the first 400-year cycle of the Gregorian calendar)
	private static readonly Int64 guidEpochOffset = -5748192000000000; //6653 days --> 159672 hours --> 9580320 minutes --> 574819200 seconds -> 574819200000 ms -> 574819200000000 us --> 5748192000000000 ticks

	private static Int64 _lastTick = DateTime.UtcNow.Ticks + guidEpochOffset;

	/// <summary>
	/// 
	/// </summary>
	/// <param name="location">The destination, whose value is compared with comparand and possibly replaced.</param>
	/// <param name="comparison">The value that is compared to the value at location1.</param>
	/// <param name="newValue">The value that replaces the destination value if the comparison results in equality.</param>
	/// <returns>true if comparand was greater than location, and location was updated to newValue. 
	/// Otherwise false.</returns>
	public static Boolean InterlockedExchangeIfGreaterThan(ref Int64 location, Int64 newValue, Int64 comparand)
	{
		//Thank you Raymond Chen: https://stackoverflow.com/a/13056904/12597
		Int64 currentValue;
		do
		{
			currentValue = Interlocked.Read(ref location); //a read of a 64-bit location is not atomic on x86. If location was 32-bit you could get away with just "currentValue = location;"
			if (currentValue >= comparand) return false;
		}
		while (System.Threading.Interlocked.CompareExchange(ref location, newValue, currentValue) != currentValue);
		return true;
	}

	/// <summary>
	/// Returns a new sortable guid. These guid's are suitable as clustering keys in SQL Server.
	/// </summary>
	/// <returns></returns>
	public static Guid NewGuid()
	{
		/*
			Blatently borrowed from Entity Framework.
				https://github.com/dotnet/efcore/blob/master/src/EFCore/ValueGeneration/SequentialGuidValueGenerator.cs


			Except with two differences:
				- they start with an initial timetime, generated statically once - and keep increasing that. 
					That causes the "timestamp" to drift further and further from reality. 
					We generate a timestamp each time, and only rely on incrementing if we're not greater than the last timestamp. 
					A CPU is capable of generating ~200k GUIDs per 100ns tick - so it's not like we can ignore the duplciate ticks problem.
				- UUID timestamp ticks start at October 15, 1582 (date of gregorian calendar). 
					Windows, and DateTime.Ticks returns number of ticks since January 1, 1601 (the date of the first 400 year Gregorian cycle). 
					We'll offset the timestamp to match the UUID spec.
				- We place the version number type-7: Sortable by SQL Server with a timestamp.

			SQL Server sorts first by the NodeID (i.e. the final 6-bytes):

			16-byte guid                                            Microsoft clsid string representation
			===========--=====--=====--=====--=================     ======================================
			33 22 11 00  55 44  77 66  99 88  ff ee dd cc bb aa ==> {00112233-4455-6677-9988-ffeeddccbbaa}
			\_______________________/  \___/  \_______________/
			Timestamp and Version  ^    Clk        Node ID

			The goal is to create a sequential uuid (e.g. UuidCreateSequential), but without having to rely on a MAC address. 
			(i.e. Does an Azure web-site even **have** a MAC address? 
			We certainly can't P/Invoke out to UuidCreateSequental when we're not running on Windows)

			So we conceptually move the 8-byte timestamp to it's new location in NodeID+ClockSequence

			And what used to be Timestamp+Version becomes random.

			And, like type-4 Uuids being called Type-4 to help reduce the chances of collisions between types,
			we call this new version type-7 (sortable by SQL Server with a timestamp).
		*/

		//Start from a new random (type 4) uuid. 
		Byte[] guidBytes = Guid.NewGuid().ToByteArray();

		//Generate 8-byte timestamp
		Int64 currentTicks = DateTime.UtcNow.Ticks + guidEpochOffset;
		//Make sure that currentTicks is greater than the last tick count (in case they're generating guids faster than 100ns apart)
		Int64 last;
		do
		{
			last = Interlocked.Read(ref _lastTick); //a read of a 64-bit value isn't atomic; so it has to be interlocked.
			if (currentTicks <= last)
				currentTicks = last + 1;
		} while (Interlocked.CompareExchange(ref _lastTick, currentTicks, last) != last);


		Byte[] counterBytes = BitConverter.GetBytes(currentTicks);

		if (!BitConverter.IsLittleEndian)
		{
			Array.Reverse(counterBytes);
		}

		//This Guid started it's life as a Type 4 (Random) guid, but the final 8 bytes were changed to contain a sequential counter.
		//I want to tag this as a different kind of Uuid algorithm. In Delphi we already called this a Type 7 uuid.
		guidBytes[07] = (Byte)(0x70 | (guidBytes[07] & 0x0f));

		//Put the timestamp in place - in the proper SQL Server sorting order.
		guidBytes[08] = counterBytes[1];
		guidBytes[09] = counterBytes[0];
		guidBytes[10] = counterBytes[7];
		guidBytes[11] = counterBytes[6];
		guidBytes[12] = counterBytes[5];
		guidBytes[13] = counterBytes[4];
		guidBytes[14] = counterBytes[3];
		guidBytes[15] = counterBytes[2];

		Guid result = new Guid(guidBytes);
		return result;
	}
}

@onionhammer
Copy link

How does Postgres sort uuids? would it also benefit from this work?

@roji
Copy link
Member Author

roji commented Apr 20, 2024

@onionhammer not quite. PG can indeed benefit from switching to UUIDv7, and this is something I hope we'll get to do for 9.0. But that's not the same as the SQL Server's "sequential" GUID - see npgsql/efcore.pg#2909).

@roji
Copy link
Member Author

roji commented Apr 20, 2024

Splitting out @JackTrapper's suggestion above to #33579, whereas this issue can continue tracking switching to server-side generation (which would be an inferior solution).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants