Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: improve docs and example data for excel2xml (DEV-1370) #233

Merged
merged 35 commits into from Oct 6, 2022

Conversation

jnussbaum
Copy link
Collaborator

resolves DEV-1370

@jnussbaum jnussbaum self-assigned this Sep 30, 2022
@@ -58,7 +57,7 @@ def make_xsd_id_compatible(string: str) -> str:
return res


def find_date_in_string(string: str, calling_resource: str = "") -> Optional[str]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 changes in this method:

  • param calling_resource was a relict from earlier times, which was not used any more.
  • param string can actually be None, this case needs to be covered

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is good. If the method does what it says it does, namely "find a date in a string", it should not be possible to hand None for the string...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I changed the signature back to str, but I still sanitize the input. That means that the code demands string to be a string, but it is still ready for the case that string is not a string.

Copy link
Collaborator

@BalduinLandolt BalduinLandolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some things that struck me during review.

For me, two main questions remain open after looking over it:

  1. what is the target audience of this example repo? I have the feeling that parts of it address a user with very little knowledge, whereas other parts are too difficult for such a user.
  2. wouldn't it make more sense to have this example repository live in its separate repository? and if so, wouldn't it serve the same purpose, if there was a sample python script in the rosetta repo? and in any case, is it necessary to have automated tests for such a script?

docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/import-script.py Outdated Show resolved Hide resolved
@@ -355,7 +356,7 @@ def _upload_resources(
"""

# If there are multimedia files: calculate their total size
bitstream_all_sizes_mb = [os.path.getsize(res.bitstream.value) / 1000000 for res in resources if res.bitstream]
bitstream_all_sizes_mb = [os.path.getsize(os.path.join(imgdir, res.bitstream.value)) / 1000000 for res in resources if res.bitstream]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pathlib.Path tends to offer a nice, higher-level API for most use cases of the os.path module.

Comment on lines 21 to 23
shutil.rmtree("docs/assets/0123-import-scripts-extracted", ignore_errors=True)
if os.path.isfile("docs/assets/0123-import-scripts/data-processed.xml"):
os.remove("docs/assets/0123-import-scripts/data-processed.xml")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be done much nicer using Path

test/e2e/test_0123_import_scripts.py Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
The potentials for the everyday work of the Client Services at DaSCH are twofold:
1. Data cleaning (recommended): For this purpose, you can think of OpenRefine as a much better version of Excel. You
can perform operations which would be very tiresome in Excel.
2. Conversion to our dps-customised xml format for bulk upload (not recommended)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does dps-customised mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, that's a weird wording. In my tutorial on the new homepage, I have "DSP-specific XML format". That's better, I guess.

docs/assets/0123-import-scripts/README.md Outdated Show resolved Hide resolved
knora/excel2xml.py Show resolved Hide resolved
@@ -58,7 +57,7 @@ def make_xsd_id_compatible(string: str) -> str:
return res


def find_date_in_string(string: str, calling_resource: str = "") -> Optional[str]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is good. If the method does what it says it does, namely "find a date in a string", it should not be possible to hand None for the string...

Copy link

@irinaschubert irinaschubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have to admit that I lost the overview of the PR. Maye you could give it to someone who would have to work with it in the end to check if everything makes sense to a potential user..? Just an idea though, maybe this doesn't make sense at this stage.

docs/dsp-tools-excel2xml.md Show resolved Hide resolved
the DSP server.
"""
# pull the latest state of the git submodule
os.system("git submodule update --init --recursive")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it maybe possible to move this into a beforeAll method instead of inside the test case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but then I have no place where I can put my import statement (from knora.dsplib.import_scripts import import_script). Python doesn't accept it inside a method, but I have to make the import after the OS call. So I think I just leave it as it is.

@jnussbaum jnussbaum merged commit 9c6827e into main Oct 6, 2022
@jnussbaum jnussbaum deleted the wip/dev-1370-improve-docs-of-excel2xml branch October 6, 2022 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants