Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load from CSV #128

Open
kwinkunks opened this issue Mar 25, 2021 · 2 comments
Open

Load from CSV #128

kwinkunks opened this issue Mar 25, 2021 · 2 comments
Assignees
Projects

Comments

@kwinkunks
Copy link
Member

Needs improving. For example:

  • Less strict about column names etc. Allow to use_cols like NumPy/Pandas
  • Allow to load from URL
  • Encodings
  • Handling bad intervals (ignore or error), nulls for missing zones
@kwinkunks kwinkunks added this to To do in T21 Mar 30, 2021
@mtb-za mtb-za moved this from To do to In progress in T21 Apr 16, 2021
@mtb-za mtb-za self-assigned this Apr 17, 2021
@mtb-za
Copy link
Contributor

mtb-za commented Apr 18, 2021

There are a number of cases that should probably be handled:

  1. Only tops are given - bases are inferred to be the next top.
  2. Only bases are given - tops are inferred to be the next base.
  3. Both bases and tops are given
  4. Either bases or tops are given along with a thickness - the missing value is calculated using the thickness.

Currently we can handle the first and third cases. The second should not be too difficult, and if that is working both of the fourth case become essentially analogous. This probably should happen when we build a list of intervals, rather than being something that the from_csv method handles specially. This will let other from_* methods to do the same.

One major change that I am making is to explicitly require a top, base and/or thickness column to be specified, unless names=True is passed, in which case it should find them automatically. We are still assuming that there is a component column or similar exists, which can be used to define those for the interval.

We still need to think about the possible things in an Interval object, to decide what we are going to give to the Interval constructor:

  • Top - self-evident
  • Base - self-evident -- if one of these are missing, we can infer from the next one, or possibly a thickness and one of these.
  • Component - A list of Component objects.
  • Description - Plain-text description that gets parsed into a list of Components. Probably this is what we are going to mostly use, when reading a CSV? Do we need to get a number of Components somewhere? That feels like a tricky problem though.
  • Data - Anything else gets added to this dictionary.

mtb-za added a commit that referenced this issue Apr 18, 2021
Simplifying and extending `from_csv`.

Initial test uses `np.genfromtext` internally. Explicitly expects to be told which columns the `top`, `base` and/or `thickness` are in. Other columns will be added to a dictionary from which the intervals will be created.

Addresses, but does not entirely close #128
@kwinkunks kwinkunks moved this from In progress to Testing in T21 Apr 19, 2021
This was referenced Apr 19, 2021
@mtb-za
Copy link
Contributor

mtb-za commented Apr 20, 2021

https://gist.github.com/mtb-za/3f94ffc426e804e7b2c778c2f0c6f051 has a couple of approaches, one using np.genfromtxt and one using csv.DictReader. Not sure if one appeals to you more than another one.

If we want to get input as something other than strings using either base approach, we need to cast them. We can start with the most specific type: int, then try float, and finally leave them as str. We might be able to handle other things, but that might be tricky to decide what that needs to be cleanly. genfromtxt allows for a sequence of dtypes, which is probably possible, but more difficult with csv.DictReader.

kwinkunks pushed a commit that referenced this issue Apr 23, 2022
* Update striplog.py

Simplifying and extending `from_csv`.

Initial test uses `np.genfromtext` internally. Explicitly expects to be told which columns the `top`, `base` and/or `thickness` are in. Other columns will be added to a dictionary from which the intervals will be created.

Addresses, but does not entirely close #128

* Update `from_csv`

Expanded docs. The will need to be restyled to match the rest of the library, but this will work for now.
Added a series of FutureWarnings for args that we want to remove going forward.
Added some new args.

* Updated `from_csv`

Updated docs and args to be the same.
Changed version for deprecation to 0.9.1.

* Updated test suite

Changed tests to use new `from_csv` method.

The tests were passing in strings. These no need to be explicitly converted to `io.StringIO` objects, along with the `names=True`.

* Updated `from_csv()`

Removed extra args. These are not currently being used, but we might want them later. No need for them for now.

* Remove old `from_csv`

Removed old `from_csv` function - it is no longer being used anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
T21
Testing
Development

No branches or pull requests

2 participants