Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: interleaved tables support #618

Open
vi3k6i5 opened this issue May 5, 2021 · 5 comments
Open

feat: interleaved tables support #618

vi3k6i5 opened this issue May 5, 2021 · 5 comments
Assignees
Labels
api: spanner Issues related to the googleapis/python-spanner-django API. priority: p2 Moderately-important priority. Fix may not be included in next release.
Milestone

Comments

@vi3k6i5
Copy link
Contributor

vi3k6i5 commented May 5, 2021

(@c24t edit)

Spanner recommends using interleaved tables for parent-child relationships. Interleaving is a special case foreign key relationship that guarantees child and parent rows are co-located in Spanner. Django has no concept of interleaved tables.

Some investigation using the example from the cloud spanner docs:

CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  FirstName  STRING(1024),
  LastName   STRING(1024),
  SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId);

CREATE TABLE Albums (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  AlbumTitle   STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE;

CREATE TABLE Songs (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  TrackId      INT64 NOT NULL,
  SongName     STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId, TrackId),
  INTERLEAVE IN PARENT Albums ON DELETE CASCADE;

inspectdb gives us reasonable models, but not immediately usable:

$ python manage.py inspectdb
# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#   * Rearrange models' order
#   * Make sure each model has one field with primary_key=True
#   * Make sure each ForeignKey has `on_delete` set to the desired behavior.
#   * Remove `managed = False` lines if you wish to allow Django to create, modify, and delete the table
# Feel free to rename the models, but don't rename db_table values or field names.
from django.db import models


class Albums(models.Model):
    singerid = models.IntegerField(db_column='SingerId', primary_key=True)  # Field name made lowercase.
    albumid = models.IntegerField(db_column='AlbumId')  # Field name made lowercase.
    albumtitle = models.TextField(db_column='AlbumTitle', blank=True, null=True)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'Albums'
        unique_together = (('singerid', 'albumid'), ('singerid', 'albumid'),)


class Singers(models.Model):
    singerid = models.IntegerField(db_column='SingerId', primary_key=True)  # Field name made lowercase.
    firstname = models.CharField(db_column='FirstName', max_length=1024, blank=True, null=True)  # Field name made lowercase.
    lastname = models.CharField(db_column='LastName', max_length=1024, blank=True, null=True)  # Field name made lowercase.
    singerinfo = models.BinaryField(db_column='SingerInfo', blank=True, null=True)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'Singers'


class Songs(models.Model):
    singerid = models.IntegerField(db_column='SingerId', primary_key=True)  # Field name made lowercase.
    albumid = models.IntegerField(db_column='AlbumId')  # Field name made lowercase.
    trackid = models.IntegerField(db_column='TrackId')  # Field name made lowercase.
    songname = models.TextField(db_column='SongName', blank=True, null=True)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'Songs'
        unique_together = (('singerid', 'albumid', 'trackid'), ('singerid', 'albumid', 'trackid'),)

Some cleanup: keep the unique_together constraints, lose Songs.singerid, add FKs from Albums to Singers and Songs to Albums:

from django.db import models


class Singers(models.Model):
    singerid = models.IntegerField(db_column='SingerId', primary_key=True)
    firstname = models.CharField(db_column='FirstName', max_length=1024, blank=True, null=True)
    lastname = models.CharField(db_column='LastName', max_length=1024, blank=True, null=True)
    singerinfo = models.BinaryField(db_column='SingerInfo', blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'Singers'


class Albums(models.Model):
    albumid = models.IntegerField(db_column='AlbumId', primary_key=True)
    singer = models.ForeignKey(Singers, db_column='SingerId', on_delete=models.CASCADE)
    albumtitle = models.TextField(db_column='AlbumTitle', blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'Albums'
        unique_together = (('singer', 'albumid'), ('singer', 'albumid'),)


class Songs(models.Model):
    trackid = models.IntegerField(db_column='TrackId', primary_key=True)
    album = models.ForeignKey(Albums, db_column='AlbumId', on_delete=models.CASCADE)
    songname = models.TextField(db_column='SongName', blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'Songs'
        unique_together = (('album', 'trackid'), ('album', 'trackid'),)

Using these models, writing a row to the interleaved child table fails:

from myapp import models

s1 = models.Singers(singerid=1, firstname="Pink", lastname="Floyd")
a11 = models.Albums(albumid=11, singer=s1, albumtitle="Wish You Were Here")
t111 = models.Songs(album=a11, trackid=111, songname="Have a Cigar")

s1.save()  # Works: <Singers: Singers object (1)>
a11.save()  # Fails: ProgrammingError: 400 Key column SingerId cannot be updated.

The error we get back from Spanner is INVALID_ARGUMENT, which gets ​surfaced as a ProgrammingError in the spanner python client. The offending gRPC call to google.spanner.v1.Spanner/ExecuteSql has args:

sql: "UPDATE Albums SET SingerId = @a0, AlbumTitle = @a1 WHERE Albums.AlbumId = @a2"
params: {
 ​fields: [
   ​{key: "a0" value {string_value: "1"}},
   ​{key: "a1" value {string_value: "Wish You Were Here"}},
   ​{key: "a2" value {string_value: "11"}},
]}
param_types: [
 ​{key: "a0" value {code: INT64}},
 ​{key: "a1" value {code: STRING}},
 ​{key: "a2" value {code: INT64}},
]

This makes sense: it shouldn't be possible to change SingerId after creating the album since this would break the parent-child relationship. But why are we doing an update in the first place?

When we save the model, django tries doing an update first, and then an insert if there was no existing row to update.

If we skip the call to _do_update in django, the following insert actually works as expected. The request is the same as above, but with a different sql arg:

sql: "INSERT INTO Albums (AlbumId, SingerId, AlbumTitle) VALUES (@a0, @a1, @a2)"`

This suggests we might be able to make interleaved tables behave by adding some special handling in SQLUpdateCompiler and SQLInsertCompiler, and adding something like an InterleavedForeignKey field so we can treat parent-child relationships differently than regular FKs in the compiler.

There may be a simpler approach, and there are almost definitely other complications that I haven't considered here, but I think it's good news that we can treat interleaved parents essentially as FKs and still use the basic django model machinery.

@michi88 let me know what you think, if you've got another use case in mind I'd be interested to hear it.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner-django API. label May 5, 2021
@vi3k6i5 vi3k6i5 changed the title "feat: interleaved tables support" feat: interleaved tables support May 5, 2021
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels May 8, 2021
@c24t c24t added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label May 14, 2021
@c24t c24t added this to the v2.2.2b0 milestone May 14, 2021
@c24t c24t removed 🚨 This issue needs some love. triage me I really want to be triaged. labels May 14, 2021
@michi88
Copy link

michi88 commented May 17, 2021

Hi all!

Is there a plan on how we plan to support this?

@c24t
Copy link
Contributor

c24t commented May 20, 2021

Hey @michi88, thanks for taking a look. I updated the description with more info, the next step is to decide how we want to expose interleaved tables in this package's API and write some test cases. I suggested adding an InterleavedForeignKey field above but there may be better alternatives.

@c24t
Copy link
Contributor

c24t commented May 20, 2021

From a conversation with @michi88:

Another approach is to add a Meta.interleave_in_parent option, e.g. in the Songs model:

interleave_in_parent = (Albums, ('singerid', 'albumid', 'trackid'))

We could use this with or without a new InterleavedForeignKey field.

We need to decide whether to (a) have one parent ID field (as in the example above) or (b) separate fields for each ancestor. (a) makes for smaller models but requires some introspection for operations like inserts that require ancestors' IDs, (b) is more explicit but may mean doing some extra integrity checks.

@ansh0l
Copy link
Member

ansh0l commented May 18, 2022

Spanner Limitations added via PR lists some workaround for folks who are blocked on this. Keeping this issue open for tracking reasons.

@ansh0l ansh0l added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels May 18, 2022
@ansh0l
Copy link
Member

ansh0l commented May 18, 2022

Bumping the priority down to p2 since workaround exists. @asthamohta : Please take it forward and feel free to revert in case that's not the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner-django API. priority: p2 Moderately-important priority. Fix may not be included in next release.
Projects
None yet
Development

No branches or pull requests

6 participants