ChemSage DAT support: DAT writing #413

Open
maxposchmann opened this issue May 12, 2022 · 20 comments · May be fixed by #422

@maxposchmann

Writing of ChemSage-style DAT files has not yet been implemented. Adding this would close the TDB/XML/DAT loop (at least for some subset of features) not just in pycalphad, but for the CALPHAD community in general.

I don't believe pycalphad is picky about the widths of values in a DAT file when reading, and Thermochimica certainly is not. However, for generality, it is probably best to output DAT files readable by FactSage (i.e. to fix the width of every written value). A DAT file validation routine would be useful for verifying correct implementation.

Also note that FactSage has updated the DAT format as of version 8.1, and databases written in the new format aren't readable by 8.0 and earlier, Thermochimica, or pycalphad (as far as I know). For now, the previous format has not been deprecated in FactSage. I suggest targeting the FactSage 8.0 and earlier format to start.
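
The fixed-width requirement can be met with plain string formatting. A minimal sketch, where the field width and precision are illustrative assumptions rather than the actual FactSage column specification:

```python
# Hedged sketch: right-justified, fixed-width numeric fields for DAT output.
# The width/precision values are assumptions, not the FactSage column spec.
def format_coefficient(value, width=15, precision=7):
    """Format a Gibbs energy coefficient in a fixed-width field."""
    return f"{value:>{width}.{precision}G}"

# Coefficients taken from a typical DAT Gibbs energy line
row = "".join(format_coefficient(v) for v in (-5179.159, 117.854, -22.096))
print(repr(row))
```

A validation routine could then check that every emitted line decomposes back into fields of exactly these widths.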

@maxposchmann
Author

I'd label this "enhancement", but don't appear to have permission.

@bocklund
Collaborator

You're correct that PyCalphad's DAT reader is not whitespace sensitive. That was my main deterrent to implementing a DAT writer, since I don't have FactSage to test against as a reference implementation.

FactSage 8.1 changes to the DAT were not on my radar, so I doubt that we are able to read the new databases currently. If that's true, we can open a new issue for reading FactSage 8.1 databases. I agree that targeting 8.0 and earlier databases makes sense as a default, and >=8.1 as an option.

A strict validation routine would be nice, but not necessary in my opinion. Since PyCalphad can read any semantically correct DAT (even if it is grammatically incorrect due to whitespace issues), a writer that produces grammatically correct DAT files can be used to convert a semantically correct DAT into a grammatically correct one. Something like: Database("my-invalid-db.dat").to_file("valid-db.dat").

When writing tests for the DAT writer, it may be that a lot of the functionality that would go into a strict validation routine will be implemented, but I think it would be valuable to avoid requiring two reader implementations (a non-strict reader and a strict reader/validator). This is essentially the approach we take with TDB: our reader is flexible, while our writer is strict.

@maxposchmann
Author

Alright, I've hit my first roadblock. When reading a DAT, it looks like the distinction between solution_phases and stoichiometric_phases gets discarded, as well as (in many cases) the model identifier (e.g. QKTO). Obviously, TDBs won't have these things to begin with anyway.

So the question is how do we get this information back (or create it from scratch) when writing our DAT? Is the model_hints attribute of the Phase the right place for this? At least for DATs, we could save it there when reading. For other file types, we'll need some routine to try to figure it out.

@richardotis
Collaborator

I think this is true:

# dbf is a Database object; phase_name is a string
is_stoichiometric_phase = all(len(subl) == 1 for subl in dbf.phases[phase_name].constituents)

@maxposchmann
Author

maxposchmann commented May 13, 2022

Unfortunately it looks like that only works for single-element stoichiometric phases. For example, len(dbf.phases['NIKF3_S1(S)'].constituents) is 3. I think the constituent count is equal to the number of elements in the phase for stoichiometric phases, but this doesn't yield a way to detect them.

Checking the number of Gibbs energy equations for the phase ought to work for finding stoichiometric phases, as there should only be one (though possibly broken across temperature intervals).

We'll still need to differentiate solution phase models, i.e. SUBG from SUBQ from QKTO. OpenCalphad can do this, so I can look there for hints.
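
The parameter-counting idea could be sketched like this; the dict records below are stand-ins for pycalphad's internal parameter entries (an assumption for illustration, not its API):

```python
# Hedged sketch: a stoichiometric phase should have exactly one Gibbs energy
# entry (its single endmember 'G' parameter). The dicts below only mimic the
# shape of pycalphad's internal parameter storage.
def count_gibbs_parameters(parameters, phase_name):
    """Count endmember Gibbs energy parameters belonging to one phase."""
    return sum(1 for p in parameters
               if p["phase_name"] == phase_name and p["parameter_type"] == "G")

params = [
    {"phase_name": "NIKF3_S1(S)", "parameter_type": "G"},
    {"phase_name": "LIQUID", "parameter_type": "G"},
    {"phase_name": "LIQUID", "parameter_type": "L"},  # an excess parameter
]
print(count_gibbs_parameters(params, "NIKF3_S1(S)"))  # 1
```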

@richardotis
Collaborator

richardotis commented May 13, 2022

Phase.constituents is a tuple of frozenset objects. If you do len(Phase.constituents), that tells you the number of sublattices (the length of the tuple), which, as you noted, can be greater than one in a stoichiometric phase.

In a stoichiometric phase, there should be exactly one component in each sublattice. If you iterate through Phase.constituents and check the length of each frozenset object, a stoichiometric phase should return a length of one for every sublattice.

Try doing print([len(subl) for subl in dbf.phases['NIKF3_S1(S)'].constituents]). If my understanding of thermodynamics and DAT files is correct, you should get back a list of all 1's.
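
Wrapped up as a helper, that check might look like the following, using plain tuples of frozensets to mirror the shape of Phase.constituents:

```python
def is_stoichiometric(constituents):
    """True if every sublattice holds exactly one constituent.

    `constituents` mirrors the shape of Phase.constituents:
    a tuple of frozensets, one per sublattice.
    """
    return all(len(subl) == 1 for subl in constituents)

# Three singly-occupied sublattices, as expected for NIKF3_S1(S)
print(is_stoichiometric((frozenset({"NI"}), frozenset({"K"}), frozenset({"F"}))))  # True
# Mixing on the first sublattice -> a solution phase
print(is_stoichiometric((frozenset({"NI", "FE"}), frozenset({"VA"}))))  # False
```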

Regarding the solution phase models, I think you should be able to detect which parameters have been entered into the database, since some parameter types are mutually exclusive for certain phase models. The recipe is something like

from tinydb import where

detect_query = (
    (where("phase_name") == phase_name) &
    (where("parameter_type") == "QKT")
)
params = list(dbf._parameters.search(detect_query))
if len(params) > 0:
    # this is a QKTO phase
    # TODO: should also check model_hints in case this is also magnetic
    ...

Another thing that might help is to look at the dataclasses defined in cs_dat.py, for example:

@dataclass
class SUBQPair(Endmember):
    stoichiometry_quadruplet: List[float]
    zeta: float

    def insert(self, dbf: Database, phase_name: str, constituent_array: List[str], gibbs_coefficient_idxs: List[int]):
        # Here the constituent array should be the pair name using the corrected
        # names, i.e. CU1.0CL1.0
        dbf.add_parameter(
            'MQMG', phase_name, constituent_array, param_order=None,
            param=self.expr(gibbs_coefficient_idxs), zeta=self.zeta,
            stoichiometry=self.stoichiometry_quadruplet,
            force_insert=False,
        )

The DAT parser uses dataclasses to bridge the DAT representation to pycalphad's internal representation (Database object). Each dataclass has an insert method defined that tells the DAT reader how to add the relevant information to the Database. Your writer needs to basically write the inverse of that function for all the dataclasses, so it may be a useful reference. You can also do a search for terms like SUBG and SUBQ to see where the reader logic is branching, which might also tell you where pycalphad is storing the information your function is trying to write back out.

@bocklund
Collaborator

dbf.phases[phase_name].constituents is the sublattice model, so len(dbf.phases[phase_name].constituents) should be giving the number of sublattices. If each sublattice is singly occupied (looping over the sublattices as in Richard’s code above), then the phase is stoichiometric.

MQMQA SUBG/SUBQ models have a model hint indicating that it’s MQMQA, so we can use that.

QKTO models are treated by pycalphad as CEF models, but they have special Kohler-Toop excess parameters rather than the default Redlich-Kister(-Muggianu) parameters. I don’t think there’s any metadata differentiating QKTO vs RKMP right now, so the writer might have to look at the parameters to see if there are any "QKT" type parameters, otherwise the model is RKMP

@maxposchmann
Author

Ah, sorry, this was a reading comprehension failure on my part. Thanks for the explanation, I think that should save me some time in figuring out how data is used here.

@maxposchmann
Author

maxposchmann commented May 19, 2022

Alright, things are moving along nicely with this.

I just implemented stoichiometric phase writing, and it appears that the # dummy phase indicator gets discarded here:

def parse_endmember(toks: TokenParser, num_pure_elements, num_gibbs_coeffs, is_stoichiometric=False):
    species_name = toks.parse(str)
    if toks[0] == '#':
        # special case for stoichiometric phases, this is a dummy species, skip it
        _ = toks.parse(str)

While this typically shouldn't cause any issues, unless I've misunderstood what is happening, this is actually incorrect DAT parsing. Dummy phases should be "suspended", i.e. not considered when doing equilibrium calculations, etc. It would actually be more correct to discard dummy phases completely.

As far as this pertains to DAT writing: if dummy phases are stored, I'll need a way to identify them. If you choose to omit them entirely, that's fine. I can write dummies automatically, and I think it is totally reasonable not to guarantee conservation of dummies for DAT round-trips.

@bocklund
Collaborator

What is the purpose of dummy phases? Why are they there if they should be discarded? Do they ever affect a calculation in Thermochimica or FactSage?

If dummy phases cannot influence correctness in other software tools (i.e. the result in all software is the same if a PyCalphad-written DAT file doesn't include them), it would be fine with me to special case the DAT reader to discard them if a dummy token is read.

If we want to keep dummies around so we can write them, I think it would be reasonable to add new model hint to flag that a phase is a dummy phase.
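
As a sketch of that idea, the reader could set a hint and the writer could check it. The 'is_dummy' key here is a hypothetical convention, not an existing pycalphad model hint:

```python
# Hypothetical sketch: 'is_dummy' is an invented model-hint key, not an
# existing pycalphad convention. model_hints stands in for the dict on a
# Phase object.
def mark_dummy_phase(model_hints):
    """Reader side: return a copy of a model_hints dict with the dummy flag set."""
    hints = dict(model_hints)
    hints["is_dummy"] = True
    return hints

def is_dummy_phase(model_hints):
    """Writer side: check the flag when deciding whether to emit a '#' token."""
    return model_hints.get("is_dummy", False)

hints = mark_dummy_phase({})
print(is_dummy_phase(hints), is_dummy_phase({}))  # True False
```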

@richardotis
Collaborator

richardotis commented May 19, 2022

It's probably similar to the TDB GES commands for entering/rejecting phases/components on database load. One reason to do this is if there are two models for the same phase in the database, and only one should be used at a time.

@maxposchmann
Author

In Thermochimica we use dummies for phase assemblage initialization. Our first phase assemblage always consists only of phases that are pure in a single element. Since such phases might not otherwise exist in a database, and FactSage always outputs a dummy for each component, it has proven convenient to make use of dummies when necessary.

I'm not certain of the use in FactSage. One reasonable guess is they use them similarly.

Note that there isn't really a "rejection" mechanism in DAT files: rejection is manipulated within the FactSage GUI after database load, and FactSage simply doesn't write rejected phases to an exported DAT.

@bocklund
Collaborator

bocklund commented May 19, 2022

I don't think it's quite the same, as only stoichiometric phases can be dummy phases AFAIK. And some of the phases seem like they could be reasonable (e.g. Ni_Solid_FCC(s)), while Li(s) obviously is not.

 Ni_Solid_FCC(s)         #
  16  2           1.00000    0.00000    0.00000    0.00000    0.00000
  1729.0000     -5179.1590      117.85400     -22.096000     -.48407000E-02
     0.00000000     0.00000000
 1     0.00000000   0.00
  1730.0000     -27024.088      278.70995     -43.100000         0.00000000
     0.00000000     0.00000000
 1     0.00000000   0.00
  633.000     0.520000     0.333333     0.280000
 Li(s)                   #
   4  1           0.00000    0.00000    0.00000    0.00000    1.00000
  6001.0000         0.00000000     0.00000000     0.00000000     0.00000000
     0.00000000     0.00000000
 1     0.00000000   0.00

(source: https://github.com/pycalphad/pycalphad/blob/develop/pycalphad/tests/databases/Ocadiz-Flores.dat#L284-L297)

@maxposchmann
Author

I think the usage of the dummy for that Ni_Solid_FCC(s) phase is a case of a user manually "commenting out" a phase. People seem to do this fairly frequently, which is a point in favor of storing and writing dummies.

I think I ought to leave the treatment of suspended phases in a PyCalphad database to one of you, if that's the route you want to go.

@bocklund
Collaborator

In Thermochimica we use dummies for phase assemblage initialization. Our first phase assemblage always consists only of phases that are pure in a single element. Since such phases might not otherwise exist in a database, and FactSage always outputs a dummy for each component, it has proven convenient to make use of dummies when necessary.

I see. I wonder if we could use something like this to help address starting-point and minimizer numerical stability in rank-deficient cases.

@bocklund
Collaborator

bocklund commented May 19, 2022

I think I ought to leave the treatment of suspended phases in a PyCalphad database to one of you, if that's the route you want to go.

Yes, that's fine with me. I am in favor of preserving dummy phases for now. Keeping them preserves the existing behavior at least, and we can decide how to best use or suspend dummy phases in a separate issue.

@richardotis
Collaborator

I support preserving dummy phase metadata in the Database object.

@richardotis
Collaborator

It sounds like the most correct thing to do would be (a) preserve dummy phase information in the Database object, and (b) 'suspend' such phases inside of filter_phases somewhere here:

phases = [phase for phase in candidate_phases if
          all_sublattices_active(comps, dbf.phases[phase]) and
          (phase not in disordered_phases or (phase in disordered_phases and
           dbf.phases[phase].model_hints.get('ordered_phase') not in candidate_phases))]

That would preserve the right behavior on DAT roundtrip. We'd still lose the information if we went DAT->TDB->DAT, but we cannot guarantee semantic equivalence of databases when we convert between formats.
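
A sketch of step (b), using a plain function in place of the filter_phases comprehension. The 'is_dummy' model-hint key is a hypothetical convention, and model_hints_by_phase is a plain dict standing in for dbf.phases[name].model_hints:

```python
# Hedged sketch of suspending dummy phases during phase filtering.
# 'is_dummy' is a hypothetical model-hint key, not an existing pycalphad one.
def suspend_dummies(candidate_phases, model_hints_by_phase):
    """Drop any candidate phase flagged as a dummy."""
    return [name for name in candidate_phases
            if not model_hints_by_phase.get(name, {}).get("is_dummy", False)]

print(suspend_dummies(["FCC_A1", "LI_DUMMY"], {"LI_DUMMY": {"is_dummy": True}}))
# ['FCC_A1']
```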

@maxposchmann
Author

@richardotis @bocklund
I'd like to get your input on the minimum requirements for a viable pull request to close this issue. My thinking is that it is reasonable to merge a DAT writer that handles some subset of DAT and TDB features and fails gracefully for the rest, rather than wait until it can handle absolutely everything (which may well never be achieved). Opening the pull request sooner rather than later is appealing to me because that way I can get your input on style etc. before continuing to add features (under other issues).

In my opinion, this would already be a relatively large pull request to review. You can see what I've done here. Right now, I can read a DAT and return an equivalent DAT (notwithstanding some bugs I've opened issues for). I have also converted a couple of non-trivial TDBs to DATs and run calculations in Thermochimica using them, with results that mostly (though not entirely) agree with pycalphad.

The code seems like a bit of a mess to me, and I certainly don't have your python experience. I'd like to get some feedback and/or help on organization if you have time.

@richardotis
Collaborator

Please proceed with the PR. We can document any limitations and work through style issues there. As you allude to in your comment, writing good tests for this feature is going to be key.

@maxposchmann maxposchmann linked a pull request Jun 17, 2022 that will close this issue