
Support reading and writing all LAS 3.0 features #5

Open · 3 of 9 tasks
kinverarity1 opened this issue Jul 8, 2015 · 22 comments
Labels: las3 (stuff relating to LAS 3.0)

@kinverarity1 (Owner) commented Jul 8, 2015

LAS 3 specification: https://github.com/kinverarity1/lasio/blob/main/standards/LAS_3_File_Structure.pdf

Tasks:

Update May 2020: I will start to sketch out a roadmap for how to achieve this. I think once this is reasonably well tested we can do a version 1 release.

Goals:

  • Assume all LAS files are version 3 for parsing, even if they have a VERS code of 2.0. Only avoid this if we have to for performance.
  • Aim to improve reading performance; it's really bad at the moment.

Because I expect this work might require a broken branch for a while, let's merge into the las3-develop branch if we need to.

@kinverarity1 (Owner, Author)

Current error is not very graceful:

D:\study\2015\las_kgs_test\2014\1044761865.las
Traceback (most recent call last):
  File ".\basic_read_test.py", line 20, in <module>
    l = las_reader.read(fn)
  File "d:\work\dewnr\logging_software\las_reader\las_reader\__init__.py", line 5, in read
    return las.LASFile(file)
  File "d:\work\dewnr\logging_software\las_reader\las_reader\las.py", line 132, in __init__
    self.read(file, **kwargs)
  File "d:\work\dewnr\logging_software\las_reader\las_reader\las.py", line 149, in read
    self.well = reader.read_section('~W')
  File "d:\work\dewnr\logging_software\las_reader\las_reader\las.py", line 403, in read_section
    parser = SectionParser(section_name, version=self.version)
  File "d:\work\dewnr\logging_software\las_reader\las_reader\las.py", line 487, in __init__
    section_orders = ORDER_DEFINITIONS[self.version][self.section_name2]
KeyError: 3.0

@VelizarVESSELINOV (Contributor)

For me the three key interesting parts of LAS 3 are:

  1. management of arrays (multi-dimensional curves)
  2. management of string curves (like geological markers)
  3. management of date-time references

It will be very interesting to see this functionality available.

@kinverarity1 kinverarity1 added this to the v1 milestone Feb 1, 2016
@kinverarity1 (Owner, Author)

This will be addressed in PR #106 and ultimately #61

Version number will be bumped to 1 when LAS 3 files can be read in.

@roliveira (Contributor)

👍

I've been looking at the overall content of the package, specifically the parser, and I would like to hear what you think about the following suggestions:

  • Extend HeaderItem to accept extra items to comply with the LAS 3.0 standard.
    Your current design is this:
 HeaderItem = namedlist("HeaderItem", ["mnemonic", "unit", "value", "descr"])

What I'm proposing is to include two extra items, "format" and "association":

 HeaderItem = namedlist("HeaderItem", ["mnemonic", "unit", "value", "description", "format", "association"])

Those items would be empty when "VERS" is 1.2 or 2.0. I also think this would ease the support of datetime objects as suggested in #1.

  • Drop "curves" in favor of "definition" to keep consistency with the new 3.0 naming convention:

~Section_Parameter, ~Section_Definition, and ~Section_Data

It would be a sort of "always treat as 3.0 first" approach...

Anyway, just my two cents to align my thoughts before doing some actual work.
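
For what it's worth, here's a minimal sketch of suggestion 1, using the namedlist package's (name, default) field syntax so the two new fields stay empty for 1.2/2.0 files (the defaults and field names here are illustrative, not a final design):

from namedlist import namedlist

# "format" and "association" default to empty strings, so nothing changes
# for LAS 1.2/2.0 files, which never set them.
HeaderItem = namedlist("HeaderItem", [
    ("mnemonic", ""),
    ("unit", ""),
    ("value", ""),
    ("descr", ""),
    ("format", ""),       # e.g. {F}, {S}, {D} formats from the LAS 3.0 spec
    ("association", ""),  # text after the | delimiter, e.g. "Run[1]"
])

item = HeaderItem(mnemonic="NEUT", unit="V/V", value="",
                  descr="Neutron Porosity", association="MATR[1], MATR[2]")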

@kinverarity1 (Owner, Author)

Hey, thanks for the help!

1 - Extend HeaderItem

Yes, sounds excellent.

2 - Drop the "curves" in favor of "definition"

Yep! Sounds good.

I had another read of the v3 spec. How does the below sound?

  • We provide access to each section via a dictionary (maybe hybrid class) in LASFile.sections
  • We provide a LASFile.data_sets dictionary (maybe hybrid class) which then (always) has keys 'Parameter', 'Definition', 'Data', the values of which are references to the relevant section in LASFile.sections.

e.g. for the example file in the LAS v3 spec:

LASFile.sections = {                Type                Example ID
    'Version':                  SectionItem()               1
    'Well':                     SectionItems()              2
    'Parameter':                SectionItems()              3
    'Curve':                    SectionItems()              4
    'Drilling_Parameter':       SectionItems()              5
    'Drilling_Definition':      SectionItems()              6
    'Drilling_Data':            np.ndarray / pd.DataFrame   7
    'Core_Parameter[1]':        SectionItems()              8
    'Core_Definition[1]':       SectionItems()              9
    'Core_Data[1]':             np.ndarray / pd.DataFrame   10
    'Core_Parameter[2]':        SectionItems()              11
    'Core_Definition[2]':       SectionItems()              12
    'Core_Data[2]':             np.ndarray / pd.DataFrame   13
    'Inclinometry_Parameter':   SectionItems()              14
    'Inclinometry_Definition':  SectionItems()              15
    'Inclinometry_Data':        np.ndarray / pd.DataFrame   16
    'Test_Parameter':           SectionItems()              17
    'Test_Definition':          SectionItems()              18
    'Test_Data':                np.ndarray / pd.DataFrame   19
    'Tops_Parameter':           SectionItems()              20
    'Tops_Definition':          SectionItems()              21
    'Tops_Data':                np.ndarray / pd.DataFrame   22
    'Perforation_Parameter':    SectionItems()              23
    'Perforation_Definition':   SectionItems()              24
    'Perforation_Data':         np.ndarray / pd.DataFrame   25
    'Ascii':                    np.ndarray / pd.DataFrame   26
}

and the data_sets variable for this would be:

LASFile.data_sets = {
    'ASCII': {
        'Parameter':            SectionItems()              3
        'Definition':           SectionItems()              4
        'Data':                 np.ndarray / pd.DataFrame   26
    },
    'Drilling': {
        'Parameter':            SectionItems()              5
        'Definition':           SectionItems()              6
        'Data':                 np.ndarray / pd.DataFrame   7
    },
    'Core': [
        {
            'Parameter':        SectionItems()              8
            'Definition':       SectionItems()              9
            'Data':             np.ndarray / pd.DataFrame   10
        },
        {
            'Parameter':        SectionItems()              11
            'Definition':       SectionItems()              12
            'Data':             np.ndarray / pd.DataFrame   13
        }
    ],
    'Inclinometry': {
        'Parameter':            SectionItems()              14
        'Definition':           SectionItems()              15
        'Data':                 np.ndarray / pd.DataFrame   16
    },
    'Test': {
        'Parameter':            SectionItems()              17
        'Definition':           SectionItems()              18
        'Data':                 np.ndarray / pd.DataFrame   19
    },
    'Tops': {
        'Parameter':            SectionItems()              20
        'Definition':           SectionItems()              21
        'Data':                 np.ndarray / pd.DataFrame   22
    },
    'Perforation': {
        'Parameter':            SectionItems()              23
        'Definition':           SectionItems()              24
        'Data':                 np.ndarray / pd.DataFrame   25
    }
}

One problem I can foresee with the above is the "Core" dataset. Maybe they should be retained as "Core[1]" and "Core[2]". I don't know... ?

This would be backwards-compatible with LAS v2 because the ~Parameter, ~Curve, and ~ASCII sections could always be considered part of the "ASCII" data set:

e.g. v2 pseudocode:

LASFile.sections = {
    'Version':                  SectionItem()               1
    'Well':                     SectionItem()               2
    'Parameter':                SectionItem()               3
    'Curves':                   SectionItem()               4
    'ASCII':                    np.ndarray / pd.DataFrame   5
}

LASFile.data_sets = {
    'ASCII': {
        'Parameter':            SectionItem()               3
        'Definition':           SectionItem()               4
        'Data':                 np.ndarray / pd.DataFrame   5
    }
}

I think we should avoid calling the ~Parameter, ~Curve, ~ASCII set of sections the "Log" data set as the v3 spec sometimes does, because we might encounter a file with both the ~Parameter, ~Curve, ~ASCII AND the ~Log_Parameter, ~Log_Definition, and ~Log_Data sections.

(SectionItems() is a new class defined in PR #106)
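
To make the proposal concrete, here is a hypothetical sketch of how the mapping might be consumed (the attribute and key names are illustrative only, not a settled API):

las = lasio.read("sample_las3.0_spec.las")  # hypothetical: full LAS 3 reading isn't supported yet

# Each data set bundles references to its three sections:
drilling = las.data_sets["Drilling"]
drilling["Parameter"]   # SectionItems parsed from ~Drilling_Parameter
drilling["Definition"]  # SectionItems parsed from ~Drilling_Definition
drilling["Data"]        # np.ndarray / pd.DataFrame from ~Drilling_Data

# Repeated data sets such as Core could then be a list of these dicts:
first_core_run = las.data_sets["Core"][0]["Data"]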

At least two other major changes needed are:

3 - Add data definition types

Not all data is a CurveItem - we would also need an ArrayItem and a "StringCurveItem". The latter could perhaps best be achieved with a wrapper around a striplog.

4 - Datetime reference curves

This could be best managed by requiring pandas, which has excellent support for using date/time stamps as indexes.
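
As a quick illustration of why pandas helps here (plain pandas, nothing lasio-specific), a date-time curve parsed with pd.to_datetime can be used directly as the index:

import pandas as pd

# Hypothetical time-indexed log data: a TIME reference curve plus one measurement.
df = pd.DataFrame({
    "TIME": ["2002-03-23 15:30:00", "2002-03-23 15:30:01"],
    "TENS": [1021.3, 1022.8],
})
df["TIME"] = pd.to_datetime(df["TIME"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("TIME")

# Date/time-based selection then works out of the box:
print(df.loc["2002-03-23"])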

@roliveira (Contributor)

No problem, I'm glad you found it useful! Continuing the discussion...

2 - Drop the "curves" in favor of "definition"

I agree with what you said.

Actually, at first I envisaged a dataset comprising a data entry and a metadata entry. But reading this, I think what you propose is closer to the file contents and more likely to be understood. I also agree with your reading of the v3 spec. The concept of LASFile.data_sets fits just right with the standard!

I don't expect to find a lot of 'Core[1]' out there, and if we do I don't see much of a problem in keeping the name like that. To me it looks exactly as if you were accessing index 1 of a Core array, but I force myself to ignore that and read it as merely a name chosen by whoever created this particular file.

You actually grouped the 'Core[1]' and 'Core[2]' into a list of dicts:

LASFile.data_sets = {
#...
    'Core': [
        {
            'Parameter':        SectionItems()              8
            'Definition':       SectionItems()              9
            'Data':             np.ndarray / pd.DataFrame   10
        },
        {
            'Parameter':        SectionItems()              11
            'Definition':       SectionItems()              12
            'Data':             np.ndarray / pd.DataFrame   13
        }
    ],
#...
}

But if we keep the 'Core[1]' and 'Core[2]' as two different datasets (that actually share the same metadata) it would be:

LASFile.data_sets = {
#...
    'Core[1]': {
        'Parameter':        SectionItems()              8
        'Definition':       SectionItems()              9
        'Data':             np.ndarray / pd.DataFrame   10
    },
    'Core[2]': {
        'Parameter':        SectionItems()              8
        'Definition':       SectionItems()              9
        'Data':             np.ndarray / pd.DataFrame   13
    },
#...
}

If I understood correctly that could also happen with any other section.

How are you planning to approach the different runs that can be present in the files, as in the sample_las3.0_spec.las file?

#...
#Run 1 Parameters
RUN_DEPTH.M 0, 1500 : Run 1 Depth Interval {F} | Run[1]
#...
#Run 2 Parameters
RUN_DEPTH.M 1500,2513 : Run 2 Depth Interval {F} | Run[2]
#...
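
For what it's worth, here is a hypothetical sketch (the regex and names are mine, not lasio's) of pulling the zone boundaries and the Run association out of such a line:

import re

# mnemonic.unit value : description {format} | association
ZONED_LINE = re.compile(
    r"^(?P<mnemonic>[^.]+)\.(?P<unit>\S*)\s+(?P<value>[^:]*?)\s*:"
    r"\s*(?P<descr>[^{|]*?)\s*(?:\{(?P<format>[^}]*)\})?\s*(?:\|\s*(?P<assoc>.*))?$"
)

m = ZONED_LINE.match("RUN_DEPTH.M 0, 1500 : Run 1 Depth Interval {F} | Run[1]")
# Zoned values are comma-separated; the declared {F} format says they are floats:
top, base = (float(v) for v in m["value"].split(","))
print(m["mnemonic"], (top, base), m["assoc"])  # RUN_DEPTH (0.0, 1500.0) Run[1]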

3 - Add data definition types

You are right. However, I'm not sure we need a particular type for it. For example, the ArrayItem could hold its dimensions via the ndarray shape, and the StringCurveItem would be merely a CurveItem with the {S} format. This would avoid wrappers, for simplicity, and give us a solid base to start with. I'm not entirely familiar with striplog but it sounds promising.

4 - Datetime reference curves

Yes! If pandas eventually gets in as a requirement, it's worth trying to leverage the other features that come along, such as support for spreadsheets and so forth.

@kinverarity1 kinverarity1 modified the milestone: v1 Jul 4, 2019
@kinverarity1 kinverarity1 added the las3 stuff relating to LAS 3.0 label May 3, 2020
kinverarity1 added a commit that referenced this issue May 3, 2020
Now all header sections are parsed fully before returning
to read data sections.

Broken - this is a work in progress.
@kinverarity1 kinverarity1 pinned this issue May 3, 2020
@kinverarity1 (Owner, Author)

FYI I'm happy to bring pandas in as a lasio requirement as part of this work. It's probably time.

dcslagel added a commit to dcslagel/lasio that referenced this issue Jul 8, 2020
    Remove python 2.7 from Travis-CI
    Rearrange-Reader: Enable Unknown section tests to pass
    Use TextIOWrapper.tell() to get section start pos
    Add initial LAS 3.0 test infrastructure
    - Add tests/examples/3.0 dir.
    - Add the CWLS's 3.0 example las file.
    - Copy the example file to sample_3.0.las to standardize with
      1.2 and 2.0 sample las files.
    - Create a tests/test_read_30.py with basic read test.
      However, the test is set to SKIP because it currently fails on the
      rearrange-reader branch
    First draft at isolated data section reader (kinverarity1#5)
    Now all header sections are parsed fully before returning
    to read data sections.
    Add find_sections_in_file()
    Rebase to master
dcslagel pushed a commit to dcslagel/lasio that referenced this issue Jul 8, 2020
Now all header sections are parsed fully before returning
to read data sections.

Broken - this is a work in progress.
@shakasaki

Hello, I am wondering how this is progressing? I just got hold of lasio and eagerly want to use it to read some LAS 3.0 files from optical televiewer data. I attach part of one file here, with only 2 lines of data to avoid overloading you!

When trying to read with the current version of lasio I get the following error:

data_in = ls.read(r'example_header.las', ignore_header_errors=True)
Header section Curves regexp=~C was not found.
Found nonstandard LAS section: ~LOG_PARAMETER
Found nonstandard LAS section: ~LOG_DEFINITION
No data section (regexp='~A') found

The data section (as can be seen in the file) starts with
~LOG_DATA | LOG_DEFINITION
so it is obviously not found. The other errors do not go away when I add the ignore_header_errors flag.

Should I try this out with some other branch (e.g. las3.0)?

Thanks for your help and for setting this up!

example_file.zip

@kinverarity1 (Owner, Author)

Hi @shakasaki! Thanks for trying out lasio - sorry that it doesn't work yet for LAS 3.0. We are progressing, slowly, having today merged #327 which was the first step in the list above. No, the las3.0 branch does not have any improvements for you yet - we will continue to merge improvements to master unless they are breaking changes.

I expect your file may work prior to full LAS 3 support. Thank you very much for providing an example file; I'll test it out shortly and see what stands in the way of reading at least the data section.

@shakasaki

Hello @kinverarity1, and thanks for the response. Please do let me know if you can extract at least the data - that would be great for now. Or perhaps you know of a resource for converting LAS 3.0 to older versions (LAS 2.0, for example)? Thank you for the help!

@kinverarity1 (Owner, Author) commented Sep 6, 2020

@shakasaki using lasio master you should be able to use this to read the data section from your file: https://gist.github.com/kinverarity1/92f00b781472512349a9312d75fd4c33

It is very hack-y and will certainly break in the future as we add proper support for this kind of functionality, but it should get you by in the meantime. I wasn't sure how to parse values like 73.56.5 or 0.0.0, but can easily tweak this if you let me know how they should be handled 😎

@shakasaki commented Sep 10, 2020

@kinverarity1 Thank you so much and apologies for the late response. I have been doing fieldwork for the past few days. I really appreciate the effort you put into helping me out. I hope I can give back to this community somehow!

The values (e.g. 73.56.5) are RGB triples; that is, they denote a colored pixel. These data are from an optical televiewer. The first value is a depth, followed by 360 RGB triples (one for each degree). I will try out the hack today and see how it works out. Thanks again!

@kinverarity1 (Owner, Author)

No problem at all. You already have helped out the community by posting your example! 😁 I have adapted the notebook to parse the data as RGB triples - see comments on the gist.

@shakasaki commented Sep 16, 2020

Hello @kinverarity1

I have been using the code snippet you wrote to read in LAS 3.0 files and it works. However, when a file is too large (I have files with 198441 rows x 361 columns) the approach crashes.

I coded another hacky example using pandas and it can handle the large files, so I was wondering why this is, since I thought lasio also uses pandas internally.

Here is my approach:

import numpy as np
import pandas as pd  # to read in dataframe
from subprocess import check_output  # to find where the data starts in the file

file_name = 'input las file that is too large for lasio hack-y example'

# following line uses grep to find the (1-based) line number where the data starts
first_dataline = int(str(check_output(["grep", "-n", "~LOG_DATA", file_name]), 'utf-8').split(':')[0])

# read in only data, otherwise pandas crashes due to unicode characters
file_in = pd.read_csv(file_name, skiprows=first_dataline, header=None)

# create a depth array and an array to store RGB values as unsigned integers
depth = file_in[0].to_numpy(dtype='float')
store_array = np.empty((file_in.shape[0], file_in.shape[1] - 1, 3), dtype='uint8')

# loop over each of the 360 ATV data columns (file column 0 is the depth)
for col in range(360):
    # each cell is a dot-separated RGB triple, e.g. "73.56.5"
    temp = file_in[col + 1].str.split('.', expand=True)
    store_array[:, col, :] = temp.apply(lambda x: pd.to_numeric(x, downcast='unsigned')).to_numpy(dtype='uint8')

The approach works well with the large files, but of course does not read in the header. Right now I'm fine with that. Hopefully this insight can help in making lasio able to handle larger files (if that is actually a problem!).

@kinverarity1 (Owner, Author)

Thanks for the code! Lasio doesn't use pandas as a reader yet (see #1 and this thread for a discussion of the reasons) but we plan to switch to it soon. I'm glad you found a solution!

@donald-keighley

I'm actually writing a LAS reader for R right now and have been struggling through LAS 3.0. I'll be interested to see how you handle it.

@kinverarity1 (Owner, Author) commented Apr 14, 2021

I completely forgot about this wiki page from ages ago:

Additions in LAS 3 to look out for:

  • parameter zoning - values in HeaderItems can now be arrays e.g.

    NMAT_Depth[1].M 500,1500 : Neutron Matrix Depth interval {F}

  • Extend HeaderItem to include format= and association=

    format should probably be left as is and then used when parsing data - special cases will be strings, datetimes, and DMS location data. Strings will need to be read as is, datetimes parsed to datetime objects, and DMS converted to decimal degrees.

    association should be a list of either the associated HeaderItem or, if that doesn't exist, strings. It has to be a list because there can be multiples:

    NEUT .V/V : Neutron Porosity | MATR[1], MATR[2]

  • Data section name. Ideally I think it's meant to be ~SECTNAME_Data | SECTDEF, where the | indicates that the data section SECTNAME uses the definitions in SECTDEF. But in practice I suspect this is not reliable, so it's probably best to assume SECTNAME == SECTDEF and just parse this pattern where it exists, using it to construct the LASFile.data_sets dictionary.

  • Automatically turn lots of curves e.g. TT[1], TT[2], ... TT[n] into a single TT CurveItem with a 2D data array (rough sketch below).
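
For the last point, a rough standalone sketch (the function and its input format are hypothetical) of collapsing indexed mnemonics into a single 2-D array:

import re
from collections import defaultdict

import numpy as np

def merge_indexed_curves(curves):
    """Collapse e.g. TT[1]..TT[n] 1-D arrays into one (n_samples, n) array."""
    pattern = re.compile(r"^(?P<base>\w+)\[(?P<index>\d+)\]$")
    groups = defaultdict(dict)
    merged = {}
    for mnemonic, data in curves.items():
        match = pattern.match(mnemonic)
        if match:
            groups[match["base"]][int(match["index"])] = data
        else:
            merged[mnemonic] = data
    for base, columns in groups.items():
        # Stack the columns in index order.
        merged[base] = np.column_stack([columns[i] for i in sorted(columns)])
    return merged

curves = {"DEPT": np.arange(3.0), "TT[1]": np.zeros(3), "TT[2]": np.ones(3)}
print(merge_indexed_curves(curves)["TT"].shape)  # (3, 2)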

@kinverarity1 (Owner, Author)

@dcslagel Regarding the Transform 21 hackathon - do you think perhaps we should convert (perhaps manually) this issue to a GitHub 'discussion'? I am getting lost in the discussion of the many different elements that add up to "LAS 3" support, spread across multiple issues. It might work better as a discussion, with separate threads for each element. I'm happy to do this tomorrow if you think it best.

@dcslagel (Collaborator)

Hi @kinverarity1, I thought about it a bit...

  1. It makes sense to move this to a discussion since it is more of a general discussion than a specific task. However, we would lose the issue number, which has been used to reference this in the past. If that reference is of value then keeping it as an issue is okay too.

  2. The really critical thing is that we break down LAS 3 into workable chunks in 'sub' issues. We should probably keep the list of sub-issues at the top of this issue/discussion as well as in the Hackathon project.

@kinverarity1 (Owner, Author) commented Apr 14, 2021

👍 I don't want to break links so I'll leave it here but clean up the body of the issue and make sure we have at least 'stub' issues for each individual part that needs doing.

Also, I'll change the title, since some version 3.0 files technically do now "work" in lasio, just not well.

@kinverarity1 kinverarity1 changed the title Support reading LAS 3.0 Support reading and writing all LAS 3.0 features Apr 14, 2021
@kinverarity1 (Owner, Author)

Update here: #452 was merged, so the next steps are making sure we can read comma-delimited data sections (#265) and then moving on to the other LAS 3 issues, which mostly relate to the header sections.

@dcslagel (Collaborator) commented Nov 5, 2021

Although there is still some work to be done on issue #265, the basic ability to read comma-delimited data sections has been merged to the main (master) branch via pull request #485.
