[WIP] Add minimal Excel loader and supporting tests for Excel & CSV #46

Open
wants to merge 2 commits into
base: master
from

@neiljp neiljp commented May 11, 2018

While the wrapper around pandas is minimal, I thought it would be useful to add tests for the CSV support, with fixture files, and then extend this to Excel. I used LibreOffice to load the CSV and save it in two different Excel (.xlsx) formats. I'm not sure there's a material or time file available in the current repo, so I've explicitly marked those as unsupported until there's more detail.

It would be great to get some feedback on this before I proceed further. Of course, there are many possible extensions, such as loading each component of the data from different sheets, including within the same Excel file.

@ricklupton
Owner

This looks good to me! I like the tests for loading from files.

I think most people who want to load from Excel would want to load different components from different sheets within the same workbook, as you say. But it would be nice to be flexible about using different files too.

Perhaps an API like this? But definitely open to suggestions if you have a better idea.

Dataset.from_excel(flows_filename, process_filename=None, material_filename=None, 
                   time_filename=None, flows_sheet=None, process_sheet=None, 
                   material_sheet=None, time_sheet=None)
  • from_excel('workbook.xlsx') would load just the flows from the first sheet
  • from_excel('workbook.xlsx', flows_sheet='Flows') would load just the flows from the sheet called 'Flows'
  • from_excel('workbook.xlsx', flows_sheet=2) would load just the flows from the 3rd sheet (this isn't essential but it would just work with pandas I think)
  • from_excel('workbook.xlsx', flows_sheet='Flows', process_sheet='Processes') would load flows and processes from the same workbook
  • from_excel('workbook.xlsx', process_filename='processes.xlsx', process_sheet='Processes') would load flows from the first sheet in 'workbook.xlsx' and processes from the 'Processes' sheet of 'processes.xlsx'
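The defaulting rules in the examples above could be sketched roughly as follows. This is only an illustration, not the project's implementation: `resolve_excel_sources` is a hypothetical helper name, and two behaviours are assumptions not stated in the thread (a table other than flows is loaded only if its filename or sheet is given, and a table with a filename but no sheet falls back to the first sheet).

```python
from typing import Dict, Optional, Tuple, Union

Sheet = Union[str, int]      # sheet name, or zero-based sheet position
Source = Tuple[str, Sheet]   # (filename, sheet)

TABLES = ("flows", "process", "material", "time")


def resolve_excel_sources(
    flows_filename: str,
    process_filename: Optional[str] = None,
    material_filename: Optional[str] = None,
    time_filename: Optional[str] = None,
    flows_sheet: Optional[Sheet] = None,
    process_sheet: Optional[Sheet] = None,
    material_sheet: Optional[Sheet] = None,
    time_sheet: Optional[Sheet] = None,
) -> Dict[str, Optional[Source]]:
    """Hypothetical helper: decide which (filename, sheet) each table comes from."""
    filenames = {
        "flows": flows_filename,
        "process": process_filename,
        "material": material_filename,
        "time": time_filename,
    }
    sheets = {
        "flows": flows_sheet,
        "process": process_sheet,
        "material": material_sheet,
        "time": time_sheet,
    }
    sources: Dict[str, Optional[Source]] = {}
    for table in TABLES:
        filename, sheet = filenames[table], sheets[table]
        # Assumption: only the flows table is loaded unconditionally.
        if table != "flows" and filename is None and sheet is None:
            sources[table] = None
            continue
        sources[table] = (
            # A missing filename means "same workbook as the flows".
            filename if filename is not None else flows_filename,
            # A missing sheet means "first sheet" (position 0).
            sheet if sheet is not None else 0,
        )
    return sources


print(resolve_excel_sources("workbook.xlsx", flows_sheet="Flows",
                            process_sheet="Processes"))
```

Each non-None pair could then be handed to `pandas.read_excel(filename, sheet_name=sheet)`, which accepts either a sheet name or a zero-based position, so `flows_sheet=2` would indeed select the third sheet.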

@ricklupton
Owner

For the material and time tables, you're right, we don't actually have an example of using this at the moment. It could look like this, if you want to add one.

Materials:

| id      | edible_skin |
|---------|-------------|
| bananas | no          |
| oranges | no          |
| apples  | yes         |

Time:

| id         | day_of_week |
|------------|-------------|
| 2011-08-01 | Monday      |
| 2011-08-02 | Tuesday     |
| ...        | ...         |
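As a rough idea of how such tables might be turned into test fixtures, here is a small sketch that stores the rows shown above as CSV text and indexes them by the `id` column. The loader in this pull request wraps pandas; this sketch deliberately swaps in the standard-library `csv` module so that it runs without extra dependencies. The name `read_dimension_table` is hypothetical, and only the rows given above are included.

```python
import csv
import io

# Fixture contents copied from the example tables; the elided rows are left out.
MATERIALS_CSV = """\
id,edible_skin
bananas,no
oranges,no
apples,yes
"""

TIME_CSV = """\
id,day_of_week
2011-08-01,Monday
2011-08-02,Tuesday
"""


def read_dimension_table(text):
    """Hypothetical helper: parse CSV text into {id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "id" not in reader.fieldnames:
        raise ValueError("dimension table needs an 'id' column")
    table = {}
    for row in reader:
        key = row.pop("id")
        if key in table:
            raise ValueError(f"duplicate id: {key!r}")
        table[key] = dict(row)
    return table


materials = read_dimension_table(MATERIALS_CSV)
time_table = read_dimension_table(TIME_CSV)

print(materials["apples"])
print(time_table["2011-08-01"])
```

Rejecting duplicate ids is a design choice made here for illustration, on the assumption that each material or time step should map to exactly one row of attributes.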

@ricklupton
Owner

Also, while you're at it, could you please add yourself to the list of contributors at the end of the README (assuming you're happy to be listed there)?

@ricklupton
Owner

Hi @neiljp, just wondering if you are still planning to do any more on this? No hurry if so, just let me know at some point.
