Add a Tabular data block that can handle simple CSVs and text files #592
base: main
Passing run #1835
Awesome!! This is going to be super useful. It already works for me on a basic CSV file, but here are some thoughts on possible future developments:
I agree that a UI to set the parameters for read_csv would be great. I've been thinking of something with live feedback (i.e. you can see the file and see how changing the parameters splits it up differently, as some spreadsheet applications do). That's probably a longer-term goal, but for now, just having options for [...]
Adding a UI for naming/renaming columns and potentially dropping spurious columns would be the other useful feature, but would take more work.
Add a button to open in Google Drive??
Incidentally, by adding [...]
A few more immediate comments:
some thoughts
Should be merged after #590.
bac0e6e to 1c33b25
… and plots its columns
for more information, see https://pre-commit.ci
Co-authored-by: Josh Bocarsly <32345545+jdbocarsly@users.noreply.github.com>
1c33b25 to 73ec9f8
I think this is good enough for a first hash, and it would be good to get it in before 0.4.0.
Codecov Report
Attention: Patch coverage is [...]
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #592      +/-   ##
==========================================
- Coverage   67.27%   67.18%   -0.09%
==========================================
  Files          62       62
  Lines        3746     3782      +36
==========================================
+ Hits         2520     2541      +21
- Misses       1226     1241      +15
I've fixed these warnings, and also tweaked the pandas settings so that we can at least read the awkward Raman text files in our repo and a simple CSV (with tests). Obviously there's no global solution for this unless we build out a whole UI, but I think this should at least fail nicely...
Also, I couldn't repro this, so hopefully I fixed it through other changes.
This PR adds a simple block that passes a data file through pd.read_csv() and allows you to make a selectable scatter plot of its columns. It also adds a base component, BokehBlock, that can probably be used by several other blocks with minor tweaking of the supported extensions (which should be dynamic soon anyway).
We could make this increasingly useful by being more robust with read_csv args and maybe wrapping our own file reader. E.g., I took some random SECM data and realised that the header is formatted in such a way that pandas fails to read it. I think one nice idea could be to write a wrapper to read_csv that either a) does a binary search to detect the header 'properly', or b) reads a CSV file in reverse, populates the values until the first "broken" line, then treats that line as the column headers (this could actually be very useful generally, e.g., within pandas itself).
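For illustration, option b) could be sketched roughly as below. This is not part of the PR: it uses only the standard-library csv module rather than pandas, the function name split_header_from_data is hypothetical, and it assumes that a "good" data line is one where every cell parses as a float and the field count matches the last line of the file.

```python
import csv
import io


def _is_number(token: str) -> bool:
    """Return True if the token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def split_header_from_data(text: str, delimiter: str = ","):
    """Walk the file bottom-up and return (header, rows).

    Hypothetical helper: rows are collected from the end of the file until
    the first "broken" line, i.e. one that has a different field count or a
    non-numeric cell. If that line has the same field count as the data, it
    is taken as the column headers; otherwise header is None.
    """
    # Parse everything once and drop completely blank lines.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return None, []

    width = len(rows[-1])  # assume the last line is representative of the data
    data = []
    header = None
    for row in reversed(rows):
        cells = [cell.strip() for cell in row]
        if len(cells) == width and all(_is_number(cell) for cell in cells):
            data.append([float(cell) for cell in cells])
            continue
        # First "broken" line: keep it as the header only if the width matches.
        if len(cells) == width:
            header = cells
        break

    data.reverse()  # restore original file order
    return header, data
```

The returned header and rows could then be handed to pd.DataFrame(rows, columns=header). Note that under these assumptions an empty cell counts as "broken", so files with missing values would need a looser rule.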