crates are more than tables #274

Open
steveoh opened this issue Jan 10, 2019 · 7 comments
@steveoh
Member

steveoh commented Jan 10, 2019

In the future, crates will have data sources that do not resemble the tables that we expect. They will be web API requests to map servers or custom APIs.

We need to come up with a way to take structured data and turn it into a table so that the hashing works, or use a different approach.

We should brainstorm some solutions and maybe prototype a few things.

@steveoh changed the title from "crates as more than tables" to "crates are more than tables" on Jan 10, 2019
@stdavis
Member

stdavis commented Jun 24, 2020

How much do we want to change (if at all) the Crate API? Could we do something like this for feature service sources?

Crate(source_name='https://url-to-feature-service', source_workspace=None...)

And then either do a URL regex on source_name or check for source_workspace == None? Or maybe source_workspace should be some sort of constant like FEATURE_SERVICE?
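A rough sketch of how the detection options in this comment could be combined. Everything here is hypothetical: `FEATURE_SERVICE`, `URL_PATTERN`, and `is_feature_service` are illustrative names and not part of forklift, and the pattern only assumes that a feature service source starts with `http://` or `https://`.

```python
import re

#: hypothetical sentinel value, not an existing forklift constant
FEATURE_SERVICE = 'FEATURE_SERVICE'

#: assumption: any http(s) URL in source_name points at a feature service
URL_PATTERN = re.compile(r'^https?://', re.IGNORECASE)


def is_feature_service(source_name, source_workspace=None):
    """Returns True if the source looks like a feature service rather than a table."""
    #: option 1: an explicit constant passed as the workspace
    if source_workspace == FEATURE_SERVICE:
        return True

    #: a real workspace (or a missing name) means a regular table source
    if source_workspace is not None or source_name is None:
        return False

    #: option 2: no workspace and a source_name that matches the URL regex
    return URL_PATTERN.match(source_name) is not None
```

The explicit constant is easier to read in a pallet, while the regex keeps the existing call signature unchanged; the sketch accepts either.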

Or maybe source_name should be the name of the feature service and source_workspace should be some sort of base URL? This seems like a bit of a pain.

What could this look like for custom API requests? Do we need to pass some sort of optional get_data parameter?

def get_data():
	#: request data
	#: build a table and then return the path to it

Crate(source_name=None, source_workspace=None, get_data=get_data...)
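To make the `get_data` idea concrete, here is a minimal sketch under a few assumptions. `CrateSketch` is a stand-in class, not forklift's real `Crate`. The "table" is written as a CSV file in a temporary folder, and the rows are hard-coded where a real implementation would make a request.

```python
import csv
import os
import tempfile


def get_data():
    """Builds a table from structured data and returns the path to it."""
    #: stand-in for the response of a custom API request
    rows = [{'id': 1, 'name': 'first'}, {'id': 2, 'name': 'second'}]

    #: build a table (a CSV file here) and then return the path to it
    path = os.path.join(tempfile.mkdtemp(), 'table.csv')
    with open(path, 'w', newline='') as table:
        writer = csv.DictWriter(table, fieldnames=['id', 'name'])
        writer.writeheader()
        writer.writerows(rows)

    return path


class CrateSketch:
    """Hypothetical stand-in for Crate that accepts an optional get_data callable."""

    def __init__(self, source_name=None, source_workspace=None, get_data=None):
        self.source_name = source_name
        self.source_workspace = source_workspace
        self.get_data = get_data

    def resolve_source(self):
        """Returns the path of the table that the hashing would read."""
        if self.get_data is not None:
            return self.get_data()

        return os.path.join(self.source_workspace, self.source_name)
```

With this shape, the hashing code only ever sees a table path and does not need to know whether it came from a workspace or from a request.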

@steveoh
Member Author

steveoh commented Jun 24, 2020

What about creating some basic data readers, storing the common ones in forklift, and then passing the data reader function to the crate at creation time?

@stdavis
Member

stdavis commented Jun 24, 2020

Something like this?

from forklift.readers import feature_service

Crate(source_name='https://url-to-feature-service', source_workspace=None, get_data=feature_service...)

@steveoh
Member Author

steveoh commented Jun 24, 2020

What about:

from forklift.readers import feature_service, json_api, csv_api

data_reader = lambda: feature_service(url)  #: and other options, like credentials
json_data_reader = lambda: json_api(url)  #: and maybe another function for filtering and transforming?

Crate(source_name='My data set', source_workspace=None, get_data=data_reader)
Crate(source_name='Sure sites', source_workspace=None, get_data=json_data_reader)

The reader may also need another function for filtering fields, etc.
If you like this solution, we should brainstorm a data reader model that works for the scenarios we can think of.
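One possible shape for such a reader model, as a starting point for the brainstorm. This is only a sketch: `forklift.readers` does not exist yet, so `json_api`, `fetch_json`, and `only_fields` are hypothetical names, the URL is a placeholder, and the `fetch` parameter is an assumption added so the reader can be exercised without a network call. `functools.partial` plays the role of the lambdas above.

```python
import json
from functools import partial
from urllib.request import urlopen


def fetch_json(url):
    """Downloads and parses a JSON document."""
    with urlopen(url) as response:
        return json.load(response)


def json_api(url, transform=None, fetch=fetch_json):
    """Hypothetical reader: requests records and optionally filters/transforms them."""
    records = fetch(url)

    if transform is not None:
        records = transform(records)

    return records


def only_fields(field_names, records):
    """Example transform: keeps only the named fields of each record."""
    return [{name: record[name] for name in field_names} for record in records]


#: a zero-argument reader that could be handed to a crate as get_data
json_data_reader = partial(
    json_api,
    'https://example.com/sites.json',  #: placeholder URL
    transform=partial(only_fields, ['id', 'name']),
)
```

Nothing is requested until `json_data_reader()` is called, so the crate decides when the data is actually read.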

@stdavis
Member

stdavis commented Jun 24, 2020

I like it! I wasn't thinking about the issue with source_name not being a valid feature class name since we use it for destination_name by default. I wonder if we could make source_name and source_workspace optional. So that it could look something like this:

Crate(destination_name='MyFeatureClass', destination_workspace='database.gdb', get_data=data_reader)

@steveoh
Member Author

steveoh commented Jun 24, 2020

Do you think a breaking change to the crate constructor/name lookup would make this nicer? Do you think it would create a lot of work for our pallets? Do you think it would be worth the improvement to the API surface to simplify the crate constructor?

@stdavis
Member

stdavis commented Jun 24, 2020

I'm not sure. It would definitely be a lot of work. But the best that I could come up with for required init params is:

  • source_name
  • source_workspace
  • destination_workspace
    or
  • destination_name
  • destination_workspace
  • get_data

Maybe that will become a mess in the future? Forklift is already a big, hairy beast so I'm open to simplification even if it requires some up-front work.
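For what it's worth, the two sets of required params listed above could be checked in one place, roughly like this. `resolve_crate_args` and its return value are hypothetical and not forklift's current constructor logic. The sketch assumes that `get_data` takes precedence when it is supplied and that `destination_name` keeps defaulting to `source_name` for table sources.

```python
def resolve_crate_args(source_name=None, source_workspace=None, destination_name=None,
                       destination_workspace=None, get_data=None):
    """Validates the two allowed parameter sets and returns the resolved values."""
    #: destination_workspace is required in both sets
    if destination_workspace is None:
        raise ValueError('destination_workspace is required')

    #: set 2: destination_name + destination_workspace + get_data
    if get_data is not None:
        if destination_name is None:
            raise ValueError('destination_name is required when get_data is supplied')

        return {'mode': 'reader', 'destination_name': destination_name}

    #: set 1: source_name + source_workspace + destination_workspace
    if source_name is None or source_workspace is None:
        raise ValueError('source_name and source_workspace are required without get_data')

    #: destination_name defaults to source_name, as it does today
    return {'mode': 'table', 'destination_name': destination_name or source_name}
```

Keeping the branching in one function like this might keep the constructor from turning into a mess as more source types show up.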
