Import/Export Assistant #161

Open
FelixKirsch opened this issue Nov 3, 2022 · 0 comments
Labels
enhancement New feature or request

FelixKirsch commented Nov 3, 2022

Is your feature request related to a problem? Please describe.
Currently, refinery lets the user provide upload options that define how the included data should be imported (e.g. column separator, line terminator). However, the import often does not work as expected, e.g. because the specified options do not exactly match the format of the uploaded file.
Furthermore, users cannot specify that only part of the included data should be imported into refinery, nor can they map the data to existing data in refinery (e.g. user data).

Describe the solution you'd like

The import and export should be guided by an assistant that previews how the uploaded data would be imported into refinery and offers additional options.

Preview
The assistant should include a view that displays how one or a few records would look in the import or export.
For an import, it would show which attributes would be created and the values of the sample records. For an export, it would show the exported record, e.g. the resulting JSON string.
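A minimal sketch of such a preview, assuming a delimited text upload. It uses Python's standard `csv` and `json` modules as a stand-in for the pandas-based import refinery actually performs (with pandas, `pd.read_csv(..., sep=..., nrows=n)` would play the same role); `preview_import` and `preview_export` are hypothetical names, not refinery APIs.

```python
import csv
import io
import json
from itertools import islice
from typing import Any, Dict, List, Tuple


def preview_import(
    raw: str, delimiter: str = ",", n: int = 2
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse only the first n records and report the attributes they would create."""
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    records = [dict(row) for row in islice(reader, n)]
    # the header row becomes the list of attributes (None for an empty upload)
    attributes = list(reader.fieldnames or [])
    return attributes, records


def preview_export(records: List[Dict[str, Any]], n: int = 1) -> List[str]:
    """Show the JSON string each of the first n records would be exported as."""
    return [json.dumps(record, ensure_ascii=False) for record in records[:n]]


if __name__ == "__main__":
    raw = "text;label\nGreat product;positive\nToo slow;negative\nOkay;neutral\n"
    attributes, sample = preview_import(raw, delimiter=";")
    print(attributes)              # ['text', 'label']
    print(preview_export(sample))  # ['{"text": "Great product", "label": "positive"}']
    # a wrong separator collapses every column into a single attribute
    print(preview_import(raw, delimiter=",")[0])  # ['text;label']
```

Because only the first few records are parsed, the preview stays cheap even for large uploads, and a mismatched separator becomes visible immediately as a single oversized attribute.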

Provide more options

  • Pandas import options
    As is already possible today, the user should be able to specify pandas import options such as the column separator and line terminator.
  • Mappings
    Users should be able to create mappings for the imported data. For example, a mapping between users in the import and users in refinery.
  • Extraction data
    In refinery, extraction data is labeled on token level (tokens are defined by spaCy). Other labeling tools follow different approaches; e.g. Label Studio lets the user label arbitrary character spans. Such character spans must therefore be matched to tokens when the data is imported into refinery. Different matching strategies are possible, e.g. expanding a character span to the boundaries of the tokens it overlaps. The user should be able to choose between these strategies.
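For the pandas import options, the assistant could pre-fill a suggested separator so that the options are less likely to mismatch the file. A sketch, assuming the standard-library `csv.Sniffer` as the detection heuristic (refinery does not necessarily do this); `suggest_delimiter` and its fallback to `","` are hypothetical choices.

```python
import csv


def suggest_delimiter(sample: str, candidates: str = ",;\t|", fallback: str = ",") -> str:
    """Guess the column separator from the first lines of an upload."""
    try:
        # Sniffer picks the candidate whose per-line count is most consistent
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # no candidate occurs consistently; the user has to choose explicitly
        return fallback
```

The suggestion would then be passed on as the pandas `sep` option and shown in the preview, where the user can still override it.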
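For the mappings item, a sketch of applying a user-chosen mapping from user identifiers in the imported file to existing refinery users. The field name `annotator`, the IDs, and `apply_user_mapping` are hypothetical; values without a mapping are kept unchanged and reported back, so the assistant can ask the user about them.

```python
from typing import Any, Dict, List, Set, Tuple

Record = Dict[str, Any]


def apply_user_mapping(
    records: List[Record], field: str, mapping: Dict[str, str]
) -> Tuple[List[Record], Set[Any]]:
    """Replace record[field] via mapping; return the new records and unmapped values."""
    mapped: List[Record] = []
    unmapped: Set[Any] = set()
    for record in records:
        value = record.get(field)
        if value in mapping:
            # copy the record so the original upload stays untouched
            mapped.append({**record, field: mapping[value]})
        else:
            unmapped.add(value)
            mapped.append(dict(record))
    return mapped, unmapped


if __name__ == "__main__":
    upload = [{"text": "a", "annotator": "alice"}, {"text": "b", "annotator": "bob"}]
    new, missing = apply_user_mapping(upload, "annotator", {"alice": "refinery-user-1"})
    print(missing)  # {'bob'}
```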
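For extraction data, the matching step can be sketched as a function that maps a character span onto a range of token indices. It assumes tokens are given as `(start, end)` character offsets, as a tokenizer such as spaCy provides; the three strategy names mirror the `alignment_mode` values (`"strict"`, `"contract"`, `"expand"`) of spaCy's `Doc.char_span`, which could be used directly when a spaCy `Doc` is available. The regex tokenizer and `align_char_span` are illustrative stand-ins, not refinery code.

```python
import re
from typing import List, Optional, Tuple

Offsets = List[Tuple[int, int]]  # (start, end) per token, end exclusive


def tokenize(text: str) -> Offsets:
    """Stand-in tokenizer: words and single punctuation marks."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def align_char_span(
    start: int, end: int, tokens: Offsets, mode: str = "expand"
) -> Optional[Tuple[int, int]]:
    """Map the character span [start, end) to an inclusive token index range."""
    if mode == "expand":
        # take every token that overlaps the span
        hits = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    elif mode in ("contract", "strict"):
        # take only tokens lying completely inside the span
        hits = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
        if mode == "strict" and hits and (
            tokens[hits[0]][0] != start or tokens[hits[-1]][1] != end
        ):
            # strict: the span borders must coincide with token borders
            return None
    else:
        raise ValueError(f"unknown alignment mode: {mode!r}")
    return (hits[0], hits[-1]) if hits else None


if __name__ == "__main__":
    text = "I love New York!"
    tokens = tokenize(text)
    # a span labeled as "ew Yor" (characters 8-14) in another tool
    print(align_char_span(8, 14, tokens, "expand"))    # (2, 3), i.e. "New York"
    print(align_char_span(8, 14, tokens, "contract"))  # None
    print(align_char_span(7, 15, tokens, "strict"))    # (2, 3)
```

Expanding is the most forgiving strategy (it never drops a labeled character but may add some), while contracting and strict matching avoid over-labeling at the cost of dropping spans that do not align with token boundaries.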

Additional context
Found while testing v1.5.0.

@FelixKirsch FelixKirsch added the enhancement New feature or request label Nov 3, 2022