-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating Schema reading procedures and refactoring #78
base: dev
Are you sure you want to change the base?
Conversation
Hsankesara
commented
Jan 16, 2024
•
edited
edited
- Updated data reading procedure
- refactor to make all modules more spark dependent
- Added and updated tests + upgraded spark version to 3.5.0
ca53ac0
to
f2d8faa
Compare
Added custom data reading module that can be used independently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -4,7 +4,7 @@ project: | |||
version: mock_version | |||
|
|||
input: | |||
data_type: local # couldbe mock, local, sftp, s3 | |||
data_type: mock # couldbe mock, local, sftp, s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be better as data_source
or source_type
data type is more specific to the data, this I think relates more to the source of the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While SFTP (though rsync might be useful for restart function) and S3 (implemented?) probably cover a lot of cases, I don't want to really support every method here as we can't support the long tail of the distribution. It should probably be the user's responsibility to provide a way to expose the remote data with network mounts, local copies, etc.
logger = logging.getLogger(__name__) | ||
|
||
|
||
class CustomDataReader(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the distinction here between ingestion and reader?