Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add delete wrongly encoded files #285

Merged
merged 9 commits into from Apr 21, 2024

Conversation

padjal
Copy link
Contributor

@padjal padjal commented Apr 18, 2024

Added a script to check for wrongly encoded files as per issue #169

@yegor256
Copy link
Owner

@padjal how do you know that it works? We can't have a script in the repo without a test that covers it. See how other filters are implemented, together with their tests. Also, run make test before making a pull request

@yegor256
Copy link
Owner

@padjal also, see this: #173

@padjal
Copy link
Contributor Author

padjal commented Apr 19, 2024

@yegor256 I added tests and made some small changes to the filter. Please check.

@padjal
Copy link
Contributor Author

padjal commented Apr 19, 2024

@yegor256 After I noticed that I have a lint error, I corrected the code so that it confirms the repo standards. Could you run the other workflows?

@yegor256
Copy link
Owner

@padjal still issues with the build. Try make test locally

@padjal
Copy link
Contributor Author

padjal commented Apr 19, 2024

@yegor256 I fixed the linting errors.

@yegor256
Copy link
Owner

@padjal see how other scripts are designed, for example this one: https://github.com/yegor256/cam/blob/master/filters/070-delete-invalid-files.sh It calls delete-invalid-files.py for each file

@padjal
Copy link
Contributor Author

padjal commented Apr 19, 2024

@yegor256 You are right. I added an executions script for the filter to go through all files.

@yegor256
Copy link
Owner

@padjal now, it may be reasonable to add a test for this .sh file. Your PR only tests the .py file.

@padjal
Copy link
Contributor Author

padjal commented Apr 19, 2024

@yegor256 I have added a test for the execution script as well.

@padjal
Copy link
Contributor Author

padjal commented Apr 20, 2024

@yegor256 As advised by the lint check tool:

************* Module delete-wrong-encoded
filters/delete-wrong-encoded.py:44:17: R1714: Consider merging these comparisons with 'in' by using 'encoding in ('ascii', 'UTF-8')'. Use a set instead if elements are hashable. (consider-using-in)

, I have changed my code to conform to these requirements and used encoding in ('ascii', 'UTF-8) instead of (encoding == 'ascii' or encoding == 'UTF-8')

@yegor256
Copy link
Owner

@padjal I believe, the problem now is that you delete not only .java files, but also files inside .git directories (but I'm not sure)

@padjal
Copy link
Contributor Author

padjal commented Apr 20, 2024

@yegor256 Why do you think that files in the .git directories will also be deleted? What suggestion do you have to mitigate this?

confidence = result['confidence']

if not (encoding in ('ascii', 'UTF-8') and confidence == 1.0):
print('found file', java)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@padjal BTW, this debug line looks redundant

yegor256 added a commit that referenced this pull request Apr 21, 2024
@yegor256
Copy link
Owner

@padjal the problem is this: you add a new filter, which deletes the unparseable file that is expected to be deleted by another filter, in test-filter.sh. See my comment: 9411f82

@padjal
Copy link
Contributor Author

padjal commented Apr 21, 2024

@yegor256 How do you think that we can work around this? If the file has a wrong encoding, then it should be deleted. After all, this is why we created this filter. Another option which I see is to move the filter further back after the move-gits-back filter. What would you suggest?

@yegor256
Copy link
Owner

@padjal you can just add one more Java file with a broken syntax to the test-filter.sh test. Now, the file with a broken syntax is at the same time a file with a non-unicode encoding (I believe). That's why your filter kills it, and the 040-delete-unparseable.sh doesn't anything to delete - this is what test-filter.sh complains about.

@padjal
Copy link
Contributor Author

padjal commented Apr 21, 2024

@yegor256 That's a good idea. I just added one more file with wrong encoding to the test-filter.sh script.

Now one of the other tests doesn't complete. Strange thing is that this same test runs successfully locally.

yegor256 added a commit that referenced this pull request Apr 21, 2024
@yegor256 yegor256 merged commit e020f08 into yegor256:master Apr 21, 2024
8 of 9 checks passed
@yegor256
Copy link
Owner

@padjal looks good now, thanks!

@padjal padjal deleted the delete-wrong-encoding branch April 21, 2024 08:49
padjal pushed a commit to padjal/cam that referenced this pull request Apr 22, 2024
padjal pushed a commit to padjal/cam that referenced this pull request Apr 22, 2024
padjal pushed a commit to padjal/cam that referenced this pull request May 6, 2024
padjal pushed a commit to padjal/cam that referenced this pull request May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants