Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line numbers #647

Open
wants to merge 53 commits into
base: main
Choose a base branch
from

Conversation

acoleman2000
Copy link
Contributor

@acoleman2000 acoleman2000 commented Jan 13, 2023

I made changes to python_codegen.py, python_codegen_support.py, and introduced a test file test_line_numbers.py that intergrates with the test suite.

I identified several blockers within the current code preventing line numbers from being associated with keys during the saving process.

During the loading process, the cwl is read in and saved as a CommentedMap, which has associated line numbers. However in the _document_load method in python_codegen_support.py the CommentedMap was replaced with a dictionary

       doc = {
            k: v
            for k, v in doc.items()
            if k not in ("$namespaces", "$schemas", "$base")
           } 

I replaced this code with

        if "$namespaces" in doc:
            doc.pop("$namespaces")
        if "$schemas" in doc:
            doc.pop("$schemas")
        if "$base" in doc:
            doc.pop("$base")

to keep doc in CommentedMap form.

Additionally, I noticed in the fromDoc method doc was being set to None or overriden to be something else, so I saved the original passed in doc as self._doc, following the naming conventions.

I wanted to use the lc info from the original YAML passed in, so I modified the save method for each class to take in line_numbers, a CommentedMap. If line_numbers isn't null, it replaces the self._doc field. This is done to save the original CommentedMap and propagate it downwards.

python_codegen_support.py

I added several methods.

I added a method that extracts the max_line (+ 1) number from a CommentedMap. This iterates through the child with the highest line number until it reaches the end). This is used to insert the line column info for new fields in the returned doc.

I added a method that adds a the kv lc info into the returned doc. This is the real meat of the change. This takes a CommentedMap to insert into, an old CommentedMap, a dictionary of line numbers, and a dictionary of line numbers to maximum col used in the line, and a max_len variable. First the method checks if the key is in the line numbers, and then inserts the old lc info directly info the new Commented Map. Then, if the key isn't in the line numbers, it checks if the value is in the line numbers and inserts it using that line number with an adjusted column number (based on the length of the key and the maximum col for that line). It then checks if the value is in the old_doc, and inserts with that lc information. Finally, if neither the key or the val is the line numbers, it inserts it to max_len, and increases max_len by 1. It has appropriate logic for DSL expansion:

elif isinstance(val, str):  # Logic for DSL expansion with "?"
            if val + "?" in line_numbers:
                line = line_numbers[val + "?"]["line"] + shift
                if line in inserted_line_info:
                    line = max_line
                col = line_numbers[val + "?"]["col"]
                new_doc.lc.add_kv_line_col(key, [line, col, line, col + len(key) + 2])
                inserted_line_info[line] = col + len(key) + 2

I added a method that pulls out the lc info for all kv pairs in a Commented doc. For example, if a CommentedMap was like orderddict("key, "value") with lc info ["key": [1, 0, 1, 6]] it would return {"key": {"line":1, "col": 0}, "value':{"line":1, "col":6}}

I also modified the save method. It changes the return type from list/dict to CommentedSeq/CommentedMap, takes in a doc field, and if the k/v pair is in the doc, it adds the lc info to the return type.

I added a method, iterate_through_doc, that has no type check and takes a list of keys, and iterates through the global doc to the appropriate place. It has no type check since it goes from CommentedMap -> CommentedSeq before eventually ending up at a CommentedMap (or None)

python_codegen.py

I modified several things in python_codegen.py

First, I modified the fromDoc attribute to save the self._doc attribute to the class.

I modified the save method. I changed the return type r from dict to CommentedMap. I added the code to override the self._doc, calculate max_len, line_numbers, and set an empty dictionary to store col info. I also updated max_len after inserting each class attribute to r by calling add_kv, which also adds the lc value to r.

To prevent issues of something like the outputs key being before an inputs key and overexpanding, causing inconsistency with line numbers, I iterate through all keys in the line number doc and add the line numbers, before going through all attributes like normal.

                if isinstance(key, str):
                    if hasattr(self, key):
                        if getattr(self, key) is not None:
                            #add lc info

Additionally, due to array expansion and DSL expansion, sometimes there is a shift down. To appropriately make sure everything ends up on the same line, I added shift counter that says how many lines to shift down for a value.

test_line_numbers

I added 3 tests.

  • One test is outputs field being before inputs.
  • One test checks secondary files DSL expansion.
  • One test checks type DSL expansion.

@mr-c
Copy link
Member

mr-c commented Jan 15, 2023

Thank you @acoleman2000 for this! Can you run make cleanup?

@tetron
Copy link
Member

tetron commented Jan 17, 2023

To re-create "metaschema.py" do

schema-salad-tool --codegen=python schema_salad/metaschema/metaschema.yml > schema_salad/metaschema.py

r = CommentedMap()
if line_info is not None:
self._doc = line_info
if (type(self._doc) == CommentedMap):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally use isinstance. The difference is that a direct comparison on type will not match subclasses, whereas isinstance is true both for the same class and for subclasses.
Practically that may not matter if CommentedMap doesn't have any subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 149b1ba.

schema_salad/python_codegen_support.py Outdated Show resolved Hide resolved
for key in val:
if doc:
if isinstance(key, (int, float, bool, str)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key shouldn't ever be anything other than a string, so you can either assume it is a string, or keep the type check but throw an error on a non-string subscript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in commit 0522ce8

@@ -170,6 +170,8 @@ def begin_class(
self.out.write(" pass\n\n\n")
return

field_names.append("_doc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the fields of the model (which are closely related to, but not exactly the same, as the fields of the class being generated). Why are you adding this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the original CommentedMap that had line column information to be associated with the top level class (e.g., Workflow) in order to update the attributes that are being saved. I wasn't sure how to save this Commented Map besides saving it as a class attribute. The save method called by the user goes directly to the save method within the class, so I didn't see a way of passing the Commented Map to the first call to save. If you have any ideas of doing this an alternate way I would be happy to reformat it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually changed it to have a global doc variable and have the save methods take a list of strings representing what level in the doc that object is. This is useful for the second issue as well, since there are places that error messages need line numbers appended where no version of the doc is present.

schema_salad/tests/test_line_numbers.py Outdated Show resolved Hide resolved
schema_salad/python_codegen_support.py Outdated Show resolved Hide resolved
schema_salad/python_codegen_support.py Outdated Show resolved Hide resolved
Comment on lines 334 to 344
for key in keys:
if isinstance(doc, CommentedMap):
doc = doc.get(key)
elif isinstance(doc, (CommentedSeq, list)) and isinstance(key, int):
if key < len(doc):
doc = doc[key]
else:
doc = None
else:
doc = None
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some discussion about what's going on here. It looks like you're using "keys" to find a path through the original document, to find the leaf node that has the line number info we want.

What happens when you have a field with mapSubject and it's been converted (internally) from a dict to a list? In this case it is intentional for save() to emit the normalized form, which is the list form, but that may or may not correspond to the original document, depending on the original document used the list form or the dict form.

if doc:
if i in doc:
r.lc.data[i] = doc.lc.data[i]
new_keys.append(i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append is a destructive modification, so appending to new_keys is also modifying the contents of keys which is probably not what you intended.

@@ -43,6 +43,9 @@
IdxType = MutableMapping[str, Tuple[Any, "LoadingOptions"]]


doc_line_info = CommentedMap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doc_line_info being a global variable, how about having save() take LoadingOptions and using original_doc ?

saved_val = saved_val[0]
r["{fieldname}"] = saved_val

max_len = add_kv(old_doc = doc, new_doc = r, line_numbers = line_numbers, key = "{fieldname}", val = r.get("{fieldname}"), max_len = max_len, cols = cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have r, key and val together already, I would consider moving the assignment of r["{fieldname}"] = saved_val into add_kv() so the add_kv() method is responsible for setting both the value and metadata of the entry at the same time.

@codecov
Copy link

codecov bot commented Jun 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (138e249) 83.68% compared to head (be53207) 83.63%.

❗ Current head be53207 differs from pull request most recent head 3afd4b0. Consider uploading reports for the commit 3afd4b0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
- Coverage   83.68%   83.63%   -0.06%     
==========================================
  Files          22       22              
  Lines        4580     4497      -83     
  Branches     1239     1242       +3     
==========================================
- Hits         3833     3761      -72     
+ Misses        483      470      -13     
- Partials      264      266       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tetron tetron marked this pull request as ready for review June 22, 2023 21:12
Copy link
Member

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting! I left some comments. Would be nice if this PR doesn't add 100,000+ lines

@@ -46,7 +47,7 @@ help: Makefile
## cleanup : shortcut for "make sort_imports format flake8 diff_pydocstyle_report"
cleanup: sort_imports format flake8 diff_pydocstyle_report

## install-dep : install most of the development dependencies via pip
## install-dep : inshttps://github.com/common-workflow-language/cwltool/issues?q=is%3Aissue+is%3Aopen+author%3Atom-tantall most of the development dependencies via pip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment on lines +529 to +536
# names = []
# for name in field_names:
# names.append("('%s', 0)"%name)

# self.serializer.write(
# fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4)
# )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# names = []
# for name in field_names:
# names.append("('%s', 0)"%name)
# self.serializer.write(
# fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4)
# )

return max_len + 1, inserted_line_info


@no_type_check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and schema_salad/cwl_v1_2.py ?


from ruamel.yaml.comments import CommentedMap

import schema_salad.tests.cwl_v1_2 as cwl_v1_2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we keeping schema_salad/tests/cwl_v1_2.py up to date? Maybe it would be better to generate that when the tests are run..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants