Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom lines PR two, electric boogaloo - Allows the use of a customized data for lines instead of OSM data #842

Merged
merged 18 commits into from Nov 2, 2023

Conversation

carlosfv92
Copy link
Contributor

@carlosfv92 carlosfv92 commented Aug 23, 2023

Changes made:
*The gitignore file added the custom_line path in the data folder to be recognized
*the config.default file has been modified (lines 104) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True)
*the clean_osm_script has been adapted (line 855) to include a conditional (if), linked to the flag in the config file in order to use either the default osm data or the custom data file to create the base dataframe of lines (df_lines)
*the config.tutorial file has been modified (lines 118) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) and run the tests for the PR
*To run the model with custom data, a file has to be created and added in the data folder named "custom_lines.geojson"

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • [ x] I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • [ x] I tested my contribution locally and it seems to work fine.
  • [ x] Code and workflow changes are sufficiently documented.
  • [ x] Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • [ x] Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

carlosfv and others added 2 commits August 23, 2023 13:52
*The gitignore file added the custom_line path in the data folder to be included recognized (custom _lines.geojson has some predefined values as an example)
*the config.default file has been modified (lines 104) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True)
*the clean_osm_script has been adapted (line 855) to include a conditional (if), linked to the flag in the config file in order to use either the default osm data or the custom data file to create the base dataframe of lines (df_lines)
*the config.tutorial file has been modified (lines 118) to include a flag named "use_custom_lines" to define if the model will use the default osm data (False) or the custom file (True) and run the tests for the PR
*To run the model with custom data, a file has to be created and added in the data folder named "custom_lines.geojson"
@davide-f
Copy link
Member

Thanks @carlosfv92 :D
Thanks for the revision and I'll add a review tomorrow :)

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carlosfv92 :D
Great proposal, I've added some comments that may help the experience and hopefully may also be a learning experience.
You are becoming familiar with github, no more barriers ahead, the hours of sleep are our only limitation ahahah

.gitignore Outdated Show resolved Hide resolved
Comment on lines 855 to 865
if snakemake.config["clean_osm_data_options"]["use_custom_lines"] == False:
df_lines = gpd.read_file(input_files["lines"])

elif snakemake.config["clean_osm_data_options"]["use_custom_lines"] == True:
current_directory = os.getcwd()
custom_lines_path = (
os.path.dirname(current_directory) + "\\data\\custom_lines.geojson"
)
df_lines = gpd.read_file(custom_lines_path)
if df_lines.empty:
df_lines = gpd.read_file(input_files["lines"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great proposal and preliminary implementation :D
In the following, I add some comments that may help clean and generalize the already good work :) Hope that this comments may also support the learning :)

As a general approach, since we may want to generalize this code to be applicable to lines/cables/generators/substations, it will be useful to create a function to handle that.
In the following, I will describe the core functionality of that function that basically match and revise the code highlighted here.
The implementation of that with a function may be a second step of revision, if you agree.

I'd propose to revise this block with the following:

Suggested change
if snakemake.config["clean_osm_data_options"]["use_custom_lines"] == False:
df_lines = gpd.read_file(input_files["lines"])
elif snakemake.config["clean_osm_data_options"]["use_custom_lines"] == True:
current_directory = os.getcwd()
custom_lines_path = (
os.path.dirname(current_directory) + "\\data\\custom_lines.geojson"
)
df_lines = gpd.read_file(custom_lines_path)
if df_lines.empty:
df_lines = gpd.read_file(input_files["lines"])
custom_data_option = snakemake.config["clean_osm_data_options"].get("use_custom_data", False)
if custom_data_option == "custom_only":
...
df_lines = gpd.read_file(custom_lines_path)
elif custom_data_option == "add_custom"
....
df_lines = pd.concat([df_osm, df_custom], ...)
else:
df_lines = gpd.read_file(input_files["lines"])

The above is partial code and partial pseudocode, but I hope it gives you a hint.
The idea is also to use the option use_custom_data (maybe better to generalize already) as discriminant to optionally add data, not only as a binary technique.

Regarding the path of the custom line, I'd recommend to have it in the config itself.
We can add a new "custom_data:" block in the config near the beginning of the config where all custom data are added.
for example, we can add:

custom_data:
   custom_lines: {path}

With this, the path could be loaded in the script as: os.path.abspath(config["custom_data"]["custom_lines"])
Note that I think it is best to leave it like this and parse the file with the abspath as:

  • the user may potentially specify an absolute path
  • under windows or linux the paths may be handled in different ways, so may be good to standardize them before its use
  • the code with abspath already implicitly assumes the current directory as the parent of the relative path, so it may be redundant and potentially lead to issues.

After this step, and the code being clear for lines, we could extend it to all entries, using an auxiliary function.
Great proposal! :D

carlosfv and others added 5 commits August 29, 2023 16:49
*A function has been added in the clean_osm_data script to allow the use of custom information for lines, substations and cables, replacing the conditional previously introduced (line 838)
*The dataframe created for lines in clean_osm_data is now using said function (line 853).
*The use of custom lines has been tested to work with the new function proposed using only OSM data (OSM_only), only custom data (Custom_only) and using both files (Add_custom)
*The config.default file has now options for lines, substations and cables to use custom data, osm data or both, and now requires to provide paths for accessing the custom files in the case of existing, otherwise the default value is False (line 104)
*The config.tutorial file has now options to use custom data, osm data or both and now requires to provide paths for accessing the custom files (line 118)
…lines

# Conflicts:
#	config.default.yaml
#	config.tutorial.yaml
#	scripts/clean_osm_data.py
*a repeated custom_path_cables was changed to custom_path_substations in both the config.tutorial and config.default
@carlosfv92
Copy link
Contributor Author

Hi @davide-f ! I now have updated the PR to use a function that could be used for other componentes instead of just a conditional (subtations, cables and lines are considered... generators are left out since they have a separate section linked to the add_custom_powerplant function in the build_powerplants script).
I struggled a bit with the update you recently did since it created conflicts when trying to push the changes in this PR (the clean_osm_data script had some parallel changes). However, I think now everything should be compatible and I can say that the code has been applied and tested for lines (using only osm data, only custom data and adding both sets of data).
I'll test substations and cables in the future, once I have real data to actually test them properly

@ekatef
Copy link
Collaborator

ekatef commented Sep 8, 2023

Hey @carlosfv92! Great work and thanks for the discussion yesterday :)

Could you please try to make pre-commit happy revising format of your comments? They are great and should be kept, but would be nice to have a consistent formatting throughout the code.

Could you also add the new configuration parameters into a configtable? That will help to fetch your parameters into the docs automatically. Probably it can also make easier revising the comments as you can move the details into the configtable keeping in the code the most essential hints only. What do you think?

I also agree with Davide's comment that it absolutely makes sense to git-ignore all geojson files, not only custom_lines.geojson. Would you agree?

carlosfv and others added 3 commits September 26, 2023 12:33
*git-ignore file has been updated to avoid all .geojson files
*details of new parameters have been added to the configtables (clean_osm_data_options.csv)
*the config.default and config.tutorial files have been renamed and comments have been formated to fit PEP 8 recommendations
*clean_osm_data script has been renamed and now the description of the added function follows PEP8 format and matches other functions
*the proposed function in clean_osm_data has now been tested for substations also and is now being used in substations and cables
*single apostrophes have been changed for double apostrophes in the clean_osm_data script
@carlosfv92
Copy link
Contributor Author

Hey @ekatef! sorry for taking so long on replying, I had a couple of things going on lately.
Regarding your suggestions!
-I have changed the git-ignore file and now it should avoid all .geojson files
-details of new parameters have been added to the configtables (clean_osm_data_options.csv)
-both config files (default and tutorial) have been reformated to fit PEP 8 format (trying to follow the guidelines and copy the format you already use)
-format changes have been included also in the clean_osm_data script (description, names, apostrophes and spaces) and the proposed function has been tested for substations now

@ekatef
Copy link
Collaborator

ekatef commented Sep 27, 2023

Hey @ekatef! sorry for taking so long on replying, I had a couple of things going on lately. Regarding your suggestions! -I have changed the git-ignore file and now it should avoid all .geojson files -details of new parameters have been added to the configtables (clean_osm_data_options.csv) -both config files (default and tutorial) have been reformated to fit PEP 8 format (trying to follow the guidelines and copy the format you already use) -format changes have been included also in the clean_osm_data script (description, names, apostrophes and spaces) and the proposed function has been tested for substations now

Hey @carlosfv92!
No worries and thanks a lot for the revision. Looks great, and you also managed to make CI happy. Nice work!

There is a couple of additional (rather minor) comments.

  1. Would you mind use only small letters in the parameters name, that is custom_only instead of Custom_only? I think it would be more consistent with other parameters in the config: e.g. we are using transmission or distribution above (although, there is also capitalised OSM parameter ;) ) There is no any deep meaning in that, just some currently assumed conventions.

  2. Not sure what was a reason to specify a path example in a lengthly format, like "C:\...\...\pypsa-earth\data\custom_substations.geojson"? Wouldn't something like data\custom_substations.geojson work?

Could you please let me know what is your feeling about that? 🙂

Otherwise, I don't have any questions, and my feeling is that your PR is close to be ready to merge.
@davide-f would you like to review?

@carlosfv92
Copy link
Contributor Author

Hey @ekatef, thanks for the comments! here are some responses:
1 Not at all! I'm updating the PR with parameter names without the initial capital letters (only kept for OSM)
2 I guess so (I'm changing it now). To be honest I didn't have a particular reason for the path example other than thinking it looked nice and clear... but! I get that it might be too obvious and therefore unnecessary.

*Capital letters from Add_custom and Custom_only parameters have been removed
*Path examples in the tutorial and default config files have been changed to be more concise
Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @carlosfv92 :D

Great revision.
I see that you adopted for not having the general custom_data option for all rules in the description, but we could go for a first implementation and revise it later.
I've added some comments, but the functionality is there and soon ready to merge :D

This is great!!! :D

@@ -835,6 +835,65 @@ def set_name_by_closestcity(df_all_generators, colname="name"):
return df_all_generators


def use_custom_data_files(custom_component_type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great proposal :)

To improve the modularity, we could revise this function input by defining as load_network_data(network_asset, clean_options, osm_paths), where:

  • network_asset may be lines/substations/cables/generators
  • clean_options: is the dictionary that will coincide with snakemake.config["clean_osm_data_options"]
  • osm_paths is a dictionary that stores the osm paths (this is basically input_files from main)

In the following, comments will follow to adapt the code accordingly

Comment on lines 846 to 874
# checks the type of data/components to be reviewed
if custom_component_type == "lines":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_lines"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_lines", False
)

elif custom_component_type == "substations":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_substations"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_substations", False
)

elif custom_component_type == "cables":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_cables"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_cables", False
)

else:
raise ValueError(
"Invalid custom_component_type. Expected 'lines', 'substations', or 'cables'."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be revised to drop the if-elif case, such as

Suggested change
# checks the type of data/components to be reviewed
if custom_component_type == "lines":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_lines"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_lines", False
)
elif custom_component_type == "substations":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_substations"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_substations", False
)
elif custom_component_type == "cables":
custom_conditional = snakemake.config["clean_osm_data_options"][
"use_custom_cables"
]
custom_path = snakemake.config["clean_osm_data_options"].get(
"path_custom_cables", False
)
else:
raise ValueError(
"Invalid custom_component_type. Expected 'lines', 'substations', or 'cables'."
)
try:
custom_opt = clean_options[f"use_custom_{network_asset}"]
custom_path = clean_options[f"path_custom_{network_asset}"]
catch:
logger.error(f"Missing use_custom_{network_asset} or path_custom_{network_asset} options in the config file")

May you test if this code works but keep the functionality?

loaded_df = pd.concat([loaded_df1, loaded_df2], ignore_index=True)

else:
raise ValueError("Invalid custom_conditional value.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, avoid raising errors, but a warning may be preferable.

With missing entries, we could adopt the OSM_only functionality.
To do so in a clean way, you could reorder the ifs like in the following (pseudocode).

if custom_only:
  do custom only
elif add_custom:
  do add_custom:
else:
  if custom_conditional != "OSM_only":
      logger.warning(f"Unrecognized option {custom_conditional} for handling custom data of {network_asset}." +
                                 "Default OSM_only option used. ")
  do OSM_only

@@ -852,7 +911,7 @@ def clean_data(

if os.path.getsize(input_files["lines"]) > 0:
# Load raw data lines
df_lines = gpd.read_file(input_files["lines"])
df_lines = use_custom_data_files("lines")
Copy link
Member

@davide-f davide-f Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like the proposal above with the definition like load_network_data(network_asset, clean_options, osm_paths), then we need to these function calls can be revised as:
load_network_data("lines", clean_options, input_files)

where clean_option shall be added as argument to the function clean_data and called with value snakemake.params.clean_osm_data_options.
Note the params key: after the PR #823 , the input parameters of the script shall be first passed to snakemake using the key params and then loaded in the script with snamake.params, see

pypsa-earth/Snakefile

Lines 182 to 184 in 1e19ef1

params:
crs=config["crs"],
clean_osm_data_options=config["clean_osm_data_options"],

By passing the entire clean_options dictionary, we could clean the arguments of the function clean_data, but this may be optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @davide-f ! Thanks for all the comments, they were super useful to make the PR more concise. Some comments on this update:

  • I adopted the "try" command to simplify the ifs and elifs in the code (which did cut down a bunch of lines and made everything nicer to read)
  • I also used the dynamic calling of the "f-string" which simplified the number of variables needed to be defined so, at the end, the network_asset was the only argument needed (aside from the inputs from the config file)

Let me know how it looks and if it needs any additional changes!

carlosfv and others added 2 commits October 30, 2023 17:32
*In the cleaon_osm_data script the function use_custom_data has been renamed (load_network_data) and adapted to use a try block instead of conditionals to make the code more concise
*Lines used to raise errors have been replaced to raise warnings and, if missing inputs for using custom data are found, OSM data is used
*details of parameters in the configtables (clean_osm_data_options.csv) have been changed since variables have been reduced with the new code and now only consider a description of the the data_options that can be used in the input information from the config file
Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic @carlosfv92 :D almost ready to go :) significant improvements!
Don't forget also the line in release_notes

@@ -835,6 +835,53 @@ def set_name_by_closestcity(df_all_generators, colname="name"):
return df_all_generators


def load_network_data(network_asset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic!!! :D
This function is what we need :)
As little comments: please, add an argument to the dunction like "clean_osm_data_options".
We try to avoid using global values within functions such as "snakemake" as this creates nested dependencies that are hard to maintain.
Moreover, please reduce the number of empty lines: no empty line between a comment and the code for example.
Moreover, no empty line after the docstring ending with """

@@ -852,7 +899,7 @@ def clean_data(

if os.path.getsize(input_files["lines"]) > 0:
# Load raw data lines
df_lines = gpd.read_file(input_files["lines"])
df_lines = load_network_data("lines")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since each load_network_data shall include a parameter "clean_osm_data_options", this needs to be added.
This argument needs to be added to the function clean_data as well.
Note also that parameters are now passed using "snakemake.params" instead of "snakemake.config", so
clean_osm_data_options=snakemake.params["clean_osm_data_options"]

Cool! :D
It would be nice to externalize the custom_data_paths, but in this PR we can keep this structure :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @davide-f ! thanks for the comments, I think these have been taken care of with some considerations:
*For spaces in the code, extra empty lines or separations were deleted (I tried to match the format of other functions)
*Regarding the additional parameter in the load_network_data function, this has been added as "data_options" (I used that name given that when I used "clean_osm_data_options" I got some naming convention errors)
*All sections where the load_network_data function is used have been updated to include the new argument (data options, which is defined at the end of the code with the rest of the variables)
*The data_options parameter has been added in the clean_data function as well, and the argument was passed as suggested (data_options=snakemake.params["clean_osm_data_options"]).
*A line has been added in the release_notes explaining the function but it seems I don't have writing access to change it (to keep it consistent with the latest version I updated the file before adding the note)

I hope that all of this makes sense and that I understood everything correctly, but again, let me know if something else needs to be done.

carlosfv added 3 commits November 1, 2023 14:36
*load_network_data has an additional argument (data_options) considered to avoid calling the dictionary with snakemake in the function
*line explaining the PR added to the release_notes file
*relase_notes have been updated
*release_notes added the description of the function created for the PR
@davide-f davide-f merged commit b95311e into pypsa-meets-earth:main Nov 2, 2023
4 checks passed
@davide-f
Copy link
Member

davide-f commented Nov 2, 2023

Congratulations @carlosfv92 for the merged PR :D
In terms of commits, this PR may have been cleaner, but we get better with more contributions :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants