Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: excel to json list (DEV-431) #155

Merged
merged 18 commits into from Feb 17, 2022

Conversation

jnussbaum
Copy link
Collaborator

resolves DEV-431
resolves also DEV-137

@jnussbaum jnussbaum self-assigned this Feb 11, 2022
@jnussbaum
Copy link
Collaborator Author

Mir schien die Architektur von excel_to_json_lists.py nicht gut genug dokumentiert zu sein, und an einigen Stellen verbesserungswürdig.

Deshalb habe ich versucht, mehr Ordnung hineinzubringen, was mir auch einigermassen gut gelungen ist. Bloss eine Sache konnte ich nicht zufriedenstellend lösen: Der rekursive Aufruf von get_values_from_excel() dient (oberflächlich betrachtet) nur dazu, den Maximalwert von row zu ermitteln. Der zweite Rückgabewert parentnode wird ja verworfen. Doch in Tat und Wahrheit gibt es hier einen Seiteneffekt, den ich gerne eliminieren würde:

In der Funktion get_values_from_excel() wird der mitgegebene Parameter parentnode verändert und dann zurückgegeben. Im rekursiven Funktionsaufruf wird also currentnode verändert, was aber verschleiert wird dadurch, dass der zweite Rückgabewert verworfen wird.

Um diese Funktionsweise expliziter zu machen, ohne effektiv eine Änderung im Programm hervorzurufen, möchte ich eigentlich folgende Änderung vornehmen:

  • Von parentnode wird eine Kopie verändert+zurückgegeben, und der Rückgabewert wird dann currentnode zugewiesen statt verworfen.

Sollte semantisch dasselbe sein wie zuvor. Leider funktioniert es dann nicht mehr. Im letzten Commit (693eb11) ist sichtbar, was ich diese Verschlimmbesserung rückgängig machen musste. Was habe ich falsch gemacht? Wie könnte man es besser machen?

@jnussbaum jnussbaum marked this pull request as ready for review February 15, 2022 15:28
Copy link

@irinaschubert irinaschubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell what exactly went wrong in your attempt to remove the side effect as there is quite a lot of change in the whole code. I would need to invest more time to investigate the issue. It might have something to do with your method returning two values now instead of just row. This probably changes the whole behaviour of the method.


if not lists:
if 'project' not in data_model or 'lists' not in data_model['project']:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to check if project ist in data_model?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I cannot be sure that data_model is a valid JSON data model, I want to do all checks beforehand. Afterwards, I can go on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I don't really see why this is needed here. Shouldn't this be checked somewhere else as it has nothing to do with the lists?

"""
Gets all list definitions from a data model and expands them to JSON if they are only referenced via an Excel file
Get all list definitions from a data model and expand them to JSON if they are only referenced via an Excel file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"gets" and "expands" was correct, the docstring should be descriptive not directive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, changed it back

return []

lists = data_model['project']['lists']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the construction with .get() much better because it either returns None or something. If someone ever removes line 19f., there would be a possibility of line 22 to fail if lists is not present. This would cause an uncaught exception and the program to crash.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I like to have a clear separation between all tests (the extensive testing on line 19 you wondered about) and the straight access (line 22).

If I use the construction with .get() thoroughly, it would need to be lists = data_model.get('project').get('lists'). But then, the first get could return None.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, you would have to take it apart into two operations.

# check if the folder parameter is used
if rootnode.get("nodes") and isinstance(nodes, dict) and nodes.get("folder"):
if nodes and isinstance(nodes, dict) and nodes.get("folder"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to check if nodes... you would have to use rootnode.get('nodes') on line 26. Because rootnode['nodes'] never returns None. Instead it throws an exception if nodes is not present. In the actual version it's done wrong, becuase if nodes would ever be None, it wouldn't get to that line...

A detail, but it would be nice to use 'folder' instead of "folder" here as well :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the hint, it was really inconsistent. I improved it.

rootnode, excel_files = prepare_list_creation(excel_folder, rootnode.get("name"), rootnode.get("comments"))

prepared_rootnode, excel_files = prepare_list_creation(
nodes['folder'], str(rootnode['name']), rootnode['comments']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure but we should probably also check if rootnode['name'] and rootnode['comments'] is available. Why do we need the explicit str() for rootnode['name']?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I expanded the checks.

I had used str() to make Mypy happy. But I didn't do it in a good way. I think it's better now, after I implemented your feedback.

knora/dsplib/utils/excel_to_json_lists.py Outdated Show resolved Hide resolved
knora/dsplib/utils/excel_to_json_lists.py Outdated Show resolved Hide resolved
knora/dsplib/utils/excel_to_json_lists.py Outdated Show resolved Hide resolved
knora/dsplib/utils/excel_to_json_lists.py Outdated Show resolved Hide resolved
knora/dsplib/utils/excel_to_json_lists.py Show resolved Hide resolved
jnussbaum and others added 2 commits February 17, 2022 09:24
Co-authored-by: irinaschubert <irina.schubert@dasch.swiss>
jnussbaum and others added 2 commits February 17, 2022 15:06
Co-authored-by: irinaschubert <irina.schubert@dasch.swiss>
@sonarcloud
Copy link

sonarcloud bot commented Feb 17, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link

@irinaschubert irinaschubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jnussbaum jnussbaum merged commit 8a8c9d0 into main Feb 17, 2022
@jnussbaum jnussbaum deleted the wip/dev-431-refactor-excel-to-json-list branch February 17, 2022 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants