ENH : improvement csv processing #517

kalounis · 2023-12-19T20:47:28Z

Pull request type

Code changes (bugfix, features)

Checklist

Tests for the changes have been added (if needed)
Docs have been reviewed and added / updated
Lint (black rocketpy/ tests/) has passed locally
All tests (pytest --runslow) have passed locally
CHANGELOG.md has been updated (if relevant)

Current behavior

The current code isn't able to deal with CSV files that contain a header.

New behavior

Now, when we create an instance of the Function class, we can pass a CSV file with a header as the argument: the code processes the file (dropping NaN values) and handles the header.

Breaking change

No

Additional information

Enter text here...

Gui-FernandesBR

Good work! I liked the way you separated the file handler function into different functions within the tools.py module.

This is my first review, I will later check documentation and tests, but you can already work on the suggestions.

Btw I think this function can be used in other parts of the code as well.

Gui-FernandesBR · 2023-12-19T20:53:55Z

cleaned_data.csv

did you commit this file by mistake?

Yes I forgot to delete it, sorry!

rocketpy/mathutils/function.py

rocketpy/tools.py

phmbressan · 2023-12-19T21:38:55Z

Could you elaborate a bit more on why this implementation is an advantage over the current one? Forgive me if I am mistaken, but my interpretation is that:

The current way the CSVs are read relies on numpy.loadtxt in a try-except clause: numpy tries to convert the CSV to an array of floats and, if there is an error, we skip the first (header) line. If there is still an error, the input is likely badly formatted.
This new implementation creates additional functions to process the data: the headers are checked by trying a float() cast.

This seems to be the same thing to me, but in the first case we can delegate the trouble of reading to numpy and do not have to maintain these new functions.

We recently had PRs (#485 and #495) to handle CSV headers, and there is even a test for it: test_func_from_csv_with_header.
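
The try-except fallback described in this comment can be sketched roughly as follows. This is only an illustration of the idea, not RocketPy's actual code: the helper name `load_csv_array` is hypothetical, and a comma delimiter is assumed.

```python
import numpy as np


def load_csv_array(path):
    """Hypothetical helper: read a numeric CSV, tolerating one header line."""
    try:
        # First attempt: assume every line of the file is numeric.
        return np.loadtxt(path, delimiter=",", dtype=float)
    except ValueError:
        # A non-numeric first line (a header) makes the float conversion
        # fail, so retry while skipping it. If this second attempt also
        # raises, the file is likely badly formatted and the error
        # propagates to the caller.
        return np.loadtxt(path, delimiter=",", dtype=float, skiprows=1)
```

Note that a sketch like this still fails on files with empty or otherwise unparsable data cells, which is the case discussed in the replies below.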

kalounis · 2023-12-19T21:58:28Z

Hello @phmbressan, this code is able not only to skip the first row but also to drop NaN values, which is not the case in the current code. Moreover, from my perspective, the current code does not handle these cases very efficiently. Try it with the following CSV file that @Gui-FernandesBR gave me: the current code throws an error, whereas mine deals with it without problems. Let me know if you want a complete report on this code!
test_dataset_with_header.csv

phmbressan

Thanks to your clarifications, @kalounis, I now understand the reasoning behind this PR. In fact, processing NaN values can be quite useful.

As I commented, we just need to make sure to inform the user if there were any problems reading the source. After that, we are likely good to go.

phmbressan · 2023-12-21T19:52:09Z

rocketpy/tools.py

+    # Save the processed data to a new CSV file
+    with open(output_path, "w", encoding="utf-8") as output_file:
+        writer = csv.writer(output_file, delimiter=",")
+        writer.writerows(data_no_headers)


Personally, I don't think writing a new file with the processed data is really needed here.

More important, I think, is that we must make it clear to the user that not all of their file was used (because there were NaN values). So, could you trigger a warning that informs the user if any lines were skipped while processing the source?

The implementation of this warning is up to you, but I believe a simple boolean that is set to True in the for loop above whenever a line is skipped is enough. Then, if this boolean is True, raise the warning.

There are other places in rocketpy where we raise warnings, if you want to base the implementation on those. Of course, should you have any doubts, don't hesitate to comment.
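
A minimal sketch of the boolean-flag idea, assuming a plain-Python reader built on the standard csv module. The function name `read_numeric_rows` and the warning text are hypothetical and are not taken from the PR; the header check mirrors the float() cast mentioned earlier in the thread.

```python
import csv
import math
import warnings


def read_numeric_rows(path):
    """Hypothetical helper: return rows of floats, warning if any were skipped."""
    rows = []
    skipped = False  # becomes True as soon as one data row is dropped
    with open(path, newline="", encoding="utf-8") as source:
        for index, row in enumerate(csv.reader(source)):
            if not row:
                continue  # ignore completely blank lines
            try:
                values = [float(cell) for cell in row]
            except ValueError:
                # A first line that cannot be cast to float is treated as a
                # header; any later unparsable line counts as skipped data.
                if index != 0:
                    skipped = True
                continue
            if any(math.isnan(value) for value in values):
                skipped = True  # row contains an explicit NaN
                continue
            rows.append(values)
    if skipped:
        warnings.warn(
            f"Some rows of {path} were skipped because they contained "
            "NaN or non-numeric values.",
            UserWarning,
        )
    return rows
```

With this shape nothing is written back to disk: the caller gets the cleaned rows directly, and the warning is emitted once per file rather than once per skipped line.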

Gui-FernandesBR · 2023-12-22T01:13:54Z

The tests not passing worries me a bit, but I couldn't find enough time to investigate it.

Gui-FernandesBR · 2024-01-25T04:03:11Z

If the problem we are trying to solve is reading .csv files that have NaN values, I think we should consider changing np.loadtxt to np.genfromtxt (see https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).

This function is a bit more complex and can

Relying on numpy may be more beneficial than a pure-Python approach, both in terms of maintenance and speed. But this is something that can be discussed.

https://stackoverflow.com/questions/20245593/difference-between-numpy-genfromtxt-and-numpy-loadtxt-and-unpack
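
To illustrate the suggestion, here is a small sketch of how np.genfromtxt could cover the NaN case. The inline sample data is made up for the example, and the sketch assumes the file is known to have exactly one header line; it is not a proposal for the final implementation.

```python
import io

import numpy as np

# Made-up sample: one header line, one empty cell and one explicit nan.
raw = "time,thrust\n0.0,10.0\n0.1,\n0.2,nan\n0.3,12.5\n"

# Unlike loadtxt, genfromtxt fills missing or unparsable fields with nan
# instead of raising an error.
data = np.genfromtxt(io.StringIO(raw), delimiter=",", skip_header=1)

# Keep only the rows in which every column holds a valid number.
clean = data[~np.isnan(data).any(axis=1)]

# Comparing the row counts tells us whether a warning should be issued.
rows_skipped = len(data) - len(clean)
```

Comparing `len(data)` with `len(clean)` gives the same information as the skipped-lines boolean requested earlier in the review, so the user-facing warning could be kept with this approach as well.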

kalounis added 4 commits December 19, 2023 11:50

ENH : added a method to the Function Class named "data_preprocessing"…

603182e

… to clear the CSV file and save a CSV without header

update in the data_preprocessing method

2e1d213

ENH : the code is now able to deal with CSV with headers. It cleans t…

9c71dd4

…he dataset by droping NaN values. I added my functions in tools.py.

updated the code for the csv processing

36fb82b

kalounis requested review from Gui-FernandesBR, ClementMonaco, Adelite25 and styris00 December 19, 2023 20:47

kalounis requested a review from a team as a code owner December 19, 2023 20:47

github-actions bot requested review from giovaniceotto, MateusStano and phmbressan December 19, 2023 20:47

github-actions bot assigned kalounis Dec 19, 2023

Fix code style issues with Black

44fd8b7

kalounis changed the title ~~Enh/improvement csv processing~~ ENH : improvement csv processing Dec 19, 2023

Gui-FernandesBR reviewed Dec 19, 2023

kalounis and others added 5 commits December 19, 2023 22:14

delete print in data_preprocessing function

c76dccb

Co-authored-by: Gui-FernandesBR <guilherme_fernandes@usp.br>

delete a useless "if" statement

c729bc6

Co-authored-by: Gui-FernandesBR <guilherme_fernandes@usp.br>

delet CSV file "cleaned_data.csv"

a985667

Merge branch 'enh/improvement-csv-processing' of https://github.com/R…

3eafa6c

…ocketPy-Team/RocketPy into enh/improvement-csv-processing

In method "return_first_data" in tools.py, changed the open by a "wit…

6fc12dc

…h open"

phmbressan reviewed Dec 21, 2023

Gui-FernandesBR marked this pull request as draft January 25, 2024 04:07
