Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ibis_cloud_spanner folder added #186

Merged
merged 15 commits into from Mar 9, 2021
Merged

ibis_cloud_spanner folder added #186

merged 15 commits into from Mar 9, 2021

Conversation

dollylipare
Copy link
Contributor

"ibis_cloud_spanner" folder has been added under the branch named "ibis-cloud-spanner". This folder doesn't include "tests" folder. Also the "compiler.py" file contains only the basic code, the remaining functionalities are yet to be added.


for item in schema_list:
field_name = item.name
if(item.type_.code == 8):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_id = self.dataset_id
database = instance.database(db_id)
with database.snapshot() as snapshot:
query="select * from {} limit 1".format(self.table_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this would still give you the schema with LIMIT 0.

# return '{}.`{}`'.format(arg_formatted, field)


def _array_concat(translator, expr):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are copy-pasted from BigQuery, correct? I'd much rather see a much smaller compiler module that inherits from the BigQuery connector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have taken these methods from BigQuery as they also work for cloud spanner. But there are two methods namely "_array_index" and "_regex_extract" who have slightly differnt logic in cloud spanner.

So I have updated this "compiler.py" file which now inherits from the BigQuery connector and contains the above two mentioned methods.




__all__ = ('compile', 'connect', 'verify', ''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the trailing empty string(s) '' about? Seems like these should be removed.

from third_party.ibis.ibis_cloud_spanner import table

from google.cloud import spanner
from google.cloud import spanner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see at least three from google.cloud import spanner imports. These can be consolidated.

from functools import partial

import numpy as np
import regex as re
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused as to why we aren't using the built-in re module. If we are using a third-party library, then we really shouldn't be renaming it to something that conflicts with the built-in.

import google.cloud.spanner as cs
from google.cloud import spanner
import pandas as pd
import regex as re
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we using the re module built-in? https://docs.python.org/3/library/re.html

@@ -0,0 +1,354 @@
# Copyright 2021 Google Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you copy the interface from the Table class in googleapis/python-spanner#219 instead? That way we can switch more easily to it once it has merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more descriptive on this? According to our understanding, are you asking us to include functions like "_get_schema", "_exists" ,etc. which are present in Table class in "googleapis/python-spanner#219" and absent in our Table class version?

"""
return table.TableReference(self, table_id)

class DatasetReference(object):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me why this class is necessary? Can't we just use Database objects?

https://googleapis.dev/python/spanner/latest/database-api.html#google.cloud.spanner_v1.database.Database

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class required as we are using "TableReference" class which takes the object of "DatasetReference" class as an argument.
As representation of table object is like:- Table(TableReference(DatasetReference))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tim and the enitre Google Cloud Spanner team for constant commitment and support throughout the affiliation.

# limitations under the License.


"""CloudScanner public API."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""CloudScanner public API."""
"""Cloud Spanner public API."""

self.database_name = self.instance.database(database_id)
(
self.data_instance,
self.billing_instance,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spanner doesn't support cross-instance queries in the way that BigQuery does. There isn't a separation between "data" and "billing" instance.

We do not need the same "parse" logic that we have in BigQuery.



"""
self.spanner_client = spanner.Client()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want a project parameter passed in here. It can be an optional argument to connect.

See: https://googleapis.dev/python/spanner/latest/client-api.html#google.cloud.spanner_v1.client.Client


def _fully_qualified_name(self, name, database):
instance, dataset = self._parse_instance_and_dataset(database)
return "{}".format(name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

Wouldn't this whole function be simplified to return name?

third_party/ibis/ibis_cloud_spanner/client.py Show resolved Hide resolved
third_party/ibis/ibis_cloud_spanner/client.py Show resolved Hide resolved
third_party/ibis/ibis_cloud_spanner/client.py Show resolved Hide resolved
@tswast tswast merged commit 00158af into develop Mar 9, 2021
@tswast tswast deleted the ibis-cloud-spanner branch March 9, 2021 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants