Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MITgcmutils.diagnostics.readstats does not read output with more than one region #818

Closed
mjlosch opened this issue Mar 28, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@mjlosch
Copy link
Member

mjlosch commented Mar 28, 2024

MITgcmutils.diagnostics.readstats cannot read output with more than one region, probably because there are no blank lines between records of different regions. For example, after running testreport, I change into global_ocean.cs32x15/tr_run.seaice and try:

In [89]: from MITgcmutils import diagnostics as diag

In [90]: d=diag.readstats('dynStDiag.0000072000.txt')          # contains only one region

In [88]: s=diag.readstats('seaiceStDiag.0000072000.txt')       # contains three regions
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [88], in <cell line: 1>()
----> 1 s=diag.readstats('seaiceStDiag.0000072000.txt')

File ~/MITgcm/MITgcm/utils/python/MITgcmutils/MITgcmutils/diagnostics.py:67, in readstats(fname)
     64         break
     66     cols = line.strip().split()
---> 67     k = int(cols[0])
     68     tmp[k] = [float(s) for s in cols[1:]]
     70 res[fld].append(tmp)

ValueError: invalid literal for int() with base 10: 'field'

I do not know how to fix this. Suggestions are welcome.

@mjlosch mjlosch added the bug Something isn't working label Mar 28, 2024
@jm-c
Copy link
Member

jm-c commented Mar 28, 2024

@mjlosch I think the matlab equivalent (MITgcm/utils/matlab/read_StD.m) can read-in multiple regions.

@jahn
Copy link
Member

jahn commented Mar 28, 2024

Shouldn't be too hard. Implementation choices: returned arrays will get a new dimension if multiple regions are present; it may be more predictable to always have this dimension (with extent 1 if only one region), but it would break existing code, so I am hesitant.
If you give me a bit of time, I might remember how I used to regression test this code.

@mjlosch
Copy link
Member Author

mjlosch commented Apr 2, 2024

@jahn, I have a messy solution:

    flds = []
    with open(fname) as f:
        for line in f:
            if line.startswith('# end of header'):
                break

            m = re.match(r'^# ([^:]*) *: *(.*)$', line.rstrip())
            if m:
                var,val = m.groups()
                if var.startswith('Fields'):
                    flds = val.split()
                if var.startswith('Regions'):
                    regs = val.split()

        if len(regs) > 1:
            res = dict((fld,dict((reg,[]) for reg in regs)) for fld in flds)
        else:
            res = dict((fld,[]) for fld in flds)
        itrs = dict((fld,[]) for fld in flds)

        line = f.readline()
        while not line.startswith('# records'):

            m = re.match(r' field : *([^ ]*) *; Iter = *([0-9]*) *; region # *([0-9]*) ; nb\.Lev = *([0-9]*)', line)
            if m:
                fld,itr,reg,nlev = m.groups()
                itrs[fld].append(int(itr))
                nlevs = int(nlev)
                if nlevs > 1: nlevs=nlevs+1
                tmp = np.zeros((nlevs,nstats))
                line = f.readline()
                while not (line.strip() == '' or line.startswith(' field')):
                    if not line.startswith(' k'):
                        cols = line.strip().split()
                        k = int(cols[0])
                        tmp[k] = [float(s) for s in cols[1:]]

                    line = f.readline()

                if len(regs) > 1: res[fld][reg].append(tmp)
                else: res[fld].append(tmp)

            # else:
            #     raise ValueError('readstats: parse error: ' + line)

            else:
                line = f.readline()

    if len(regs)>1:
        totals = dict((fld,np.squeeze(np.array(res[fld]['0']))) for fld in flds)
        locals = dict((fld,[]) for fld in flds)
        for fld in flds:
            for reg in regs:
                if reg!='0':
                    locals[fld].append(np.squeeze(np.array(res[fld][reg])))

        locals = dict((fld,np.array(locals[fld])) for fld in flds)
        return locals, totals, itrs
    else:
        try:
            all = np.rec.fromarrays(
                [np.array(res[fld]) for fld in flds], names=flds)
            return all[:,1:,...],all[:,0,...],itrs
        except:
            totals = dict((fld,np.array(res[fld])[:,0,...]) for fld in flds)
            locals = dict((fld,np.array(res[fld])[:,1:,]) for fld in flds)
            return locals,totals,itrs

I can turn this in a PR if you wnat, but I am all open for anything nicer (and more compact).

@jahn
Copy link
Member

jahn commented Apr 9, 2024

@mjlosch sorry I missed your post while at a meeting. Don't think mine is more compact but I've addressed a couple more issues and tested against a good selection of diagstat output from verification.

@mjlosch
Copy link
Member Author

mjlosch commented May 22, 2024

Has been fixed in PR #823

@mjlosch mjlosch closed this as completed May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants