MITgcmutils.diagnostics.readstats does not read output with more than one region #818

mjlosch · 2024-03-28T12:58:24Z

MITgcmutils.diagnostics.readstats cannot read output with more than one region, probably because there are no blank lines between records of different regions. For example, after running testreport, I change into global_ocean.cs32x15/tr_run.seaice and try:

In [89]: from MITgcmutils import diagnostics as diag

In [90]: d=diag.readstats('dynStDiag.0000072000.txt')          # contains only one region

In [88]: s=diag.readstats('seaiceStDiag.0000072000.txt')       # contains three regions
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [88], in <cell line: 1>()
----> 1 s=diag.readstats('seaiceStDiag.0000072000.txt')

File ~/MITgcm/MITgcm/utils/python/MITgcmutils/MITgcmutils/diagnostics.py:67, in readstats(fname)
     64         break
     66     cols = line.strip().split()
---> 67     k = int(cols[0])
     68     tmp[k] = [float(s) for s in cols[1:]]
     70 res[fld].append(tmp)

ValueError: invalid literal for int() with base 10: 'field'

I do not know how to fix this. Suggestions are welcome.

The text was updated successfully, but these errors were encountered:

jm-c · 2024-03-28T14:05:04Z

@mjlosch I think the matlab equivalent (MITgcm/utils/matlab/read_StD.m) can read-in multiple regions.

jahn · 2024-03-28T16:15:24Z

Shouldn't be too hard. Implementation choices: returned arrays will get a new dimension if multiple regions are present; it may be more predictable to always have this dimension (with extent 1 if only one region), but it would break existing code, so I am hesitant.
If you give me a bit of time, I might remember how I used to regression test this code.

mjlosch · 2024-04-02T14:46:10Z

@jahn, I have a messy solution:

    flds = []
    with open(fname) as f:
        for line in f:
            if line.startswith('# end of header'):
                break

            m = re.match(r'^# ([^:]*) *: *(.*)$', line.rstrip())
            if m:
                var,val = m.groups()
                if var.startswith('Fields'):
                    flds = val.split()
                if var.startswith('Regions'):
                    regs = val.split()

        if len(regs) > 1:
            res = dict((fld,dict((reg,[]) for reg in regs)) for fld in flds)
        else:
            res = dict((fld,[]) for fld in flds)
        itrs = dict((fld,[]) for fld in flds)

        line = f.readline()
        while not line.startswith('# records'):

            m = re.match(r' field : *([^ ]*) *; Iter = *([0-9]*) *; region # *([0-9]*) ; nb\.Lev = *([0-9]*)', line)
            if m:
                fld,itr,reg,nlev = m.groups()
                itrs[fld].append(int(itr))
                nlevs = int(nlev)
                if nlevs > 1: nlevs=nlevs+1
                tmp = np.zeros((nlevs,nstats))
                line = f.readline()
                while not (line.strip() == '' or line.startswith(' field')):
                    if not line.startswith(' k'):
                        cols = line.strip().split()
                        k = int(cols[0])
                        tmp[k] = [float(s) for s in cols[1:]]

                    line = f.readline()

                if len(regs) > 1: res[fld][reg].append(tmp)
                else: res[fld].append(tmp)

            # else:
            #     raise ValueError('readstats: parse error: ' + line)

            else:
                line = f.readline()

    if len(regs)>1:
        totals = dict((fld,np.squeeze(np.array(res[fld]['0']))) for fld in flds)
        locals = dict((fld,[]) for fld in flds)
        for fld in flds:
            for reg in regs:
                if reg!='0':
                    locals[fld].append(np.squeeze(np.array(res[fld][reg])))

        locals = dict((fld,np.array(locals[fld])) for fld in flds)
        return locals, totals, itrs
    else:
        try:
            all = np.rec.fromarrays(
                [np.array(res[fld]) for fld in flds], names=flds)
            return all[:,1:,...],all[:,0,...],itrs
        except:
            totals = dict((fld,np.array(res[fld])[:,0,...]) for fld in flds)
            locals = dict((fld,np.array(res[fld])[:,1:,]) for fld in flds)
            return locals,totals,itrs

I can turn this in a PR if you wnat, but I am all open for anything nicer (and more compact).

jahn · 2024-04-09T23:01:33Z

@mjlosch sorry I missed your post while at a meeting. Don't think mine is more compact but I've addressed a couple more issues and tested against a good selection of diagstat output from verification.

mjlosch · 2024-05-22T15:18:11Z

Has been fixed in PR #823

mjlosch added the bug Something isn't working label Mar 28, 2024

mjlosch assigned jahn and mjlosch Mar 28, 2024

jahn mentioned this issue Apr 9, 2024

Fix reading of diagstats output with more than one region in MITgcmutils #823

Merged

mjlosch closed this as completed May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MITgcmutils.diagnostics.readstats does not read output with more than one region #818

MITgcmutils.diagnostics.readstats does not read output with more than one region #818

mjlosch commented Mar 28, 2024

jm-c commented Mar 28, 2024

jahn commented Mar 28, 2024

mjlosch commented Apr 2, 2024

jahn commented Apr 9, 2024

mjlosch commented May 22, 2024

MITgcmutils.diagnostics.readstats does not read output with more than one region #818

MITgcmutils.diagnostics.readstats does not read output with more than one region #818

Comments

mjlosch commented Mar 28, 2024

jm-c commented Mar 28, 2024

jahn commented Mar 28, 2024

mjlosch commented Apr 2, 2024

jahn commented Apr 9, 2024

mjlosch commented May 22, 2024