[FEATURE] Simple data iterator for deeplake.Dataset #2016

Open
1 of 2 tasks
elda27 opened this issue Nov 20, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@elda27

elda27 commented Nov 20, 2022

🚨🚨 Feature Request

  • Related to an existing Issue
  • A new implementation (Improvement, Extension)

Is your feature request related to a problem?

On Windows, the current implementation requires TensorFlow or PyTorch to create an iterator.
Of course, I could use deeplake.Dataset.dataloader to accomplish something like what this question describes.
I would like a simple method that works identically in all environments.

For example, I envision using this feature to preprocess all data sequentially on the CPU.

Producing data in the same form as the current deeplake would require some conversion.
I assume that all array-like data is returned as NumPy arrays, and that all other data is returned with appropriate types such as str, int, or list.

Description of the possible solution

deeplake.Dataset.tensorflow() already includes a generator function that yields a dictionary for each record.
I think this could be achieved by customizing that implementation.

An alternative solution to the problem can look like

import deeplake

ds = deeplake.empty("./example")
ds.create_tensor("image", htype="image.rgb")
ds.create_tensor("tags", htype="list")
ds.create_tensor("caption", htype="text")

# Proposed API: iterate over the dataset one record at a time.
for dict_of_tensor in ds.numpy():
    print(dict_of_tensor)  # {"image": np.ndarray, "tags": list of str, "caption": str}
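The record-wise iteration requested here can be sketched without TensorFlow or PyTorch. This is only a minimal illustration under stated assumptions, not deeplake's implementation: `iter_records` is a hypothetical helper, and plain Python lists stand in for the dataset's tensors. With a real dataset, each value would presumably be converted per sample (for example to a NumPy array) before being yielded.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence


def iter_records(columns: Mapping[str, Sequence[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one {tensor_name: value} dict per sample index (hypothetical helper)."""
    if not columns:
        return
    # Stop at the shortest column so every record contains every key.
    length = min(len(col) for col in columns.values())
    for i in range(length):
        yield {name: col[i] for name, col in columns.items()}


# Stand-in data: plain lists replace the dataset's tensors in this sketch.
columns = {
    "image": [[[0, 0, 0]], [[255, 255, 255]]],
    "tags": [["cat", "pet"], ["dog"]],
    "caption": ["a cat", "a dog"],
}
records = list(iter_records(columns))
for record in records:
    print(record)
```

Because the helper is a generator, records are produced lazily, one at a time, which suits sequential CPU preprocessing.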
elda27 added the enhancement (New feature or request) label on Nov 20, 2022
@pyther-hub

Hey, I have solved this issue. Can I open a pull request?

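    def dict_record(self):
        # Local import: resolved only when this method is called.
        from deeplake.enterprise import dataloader
        # Convert the first element of each item the dataloader yields into a dict.
        return iter(map(lambda row: dict(row[0]), dataloader(self).numpy()))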

This is the code I have added.
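For readers unfamiliar with the pattern, the snippet above takes the first element of each item the loader yields and converts it to a dict. The stand-in below illustrates that unwrapping step only. It assumes, without confirmation from this thread, that the loader yields one-element batches of key/value mappings; `fake_loader` is hypothetical and replaces `dataloader(self).numpy()`.

```python
from typing import Any, Dict, Iterable, Iterator, List


def fake_loader() -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical stand-in for the real loader: yields one-element batches."""
    yield [{"caption": "a cat", "tags": ["cat"]}]
    yield [{"caption": "a dog", "tags": ["dog"]}]


def dict_record(loader: Iterable[List[Dict[str, Any]]]) -> Iterator[Dict[str, Any]]:
    # Generator equivalent of iter(map(lambda row: dict(row[0]), loader)).
    for row in loader:
        yield dict(row[0])


rows = list(dict_record(fake_loader()))
print(rows)
```

Writing the method as a generator function gives the same lazy behavior as `iter(map(...))` and may be easier to extend, for instance if batches could ever hold more than one sample.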

@tatevikh
Collaborator

Hi @pyther-hub, absolutely! Go for it.

@pyther-hub

Hi @pyther-hub, absolutely! Go for it.

Sir, I have opened a pull request. Please review it.

@gulatisukaran

Is something still left to be done?

@pyther-hub

Can I work on this again? @tatevikh


4 participants