Reading in header information from itp file #486

csbrasnett · 2022-11-21T11:03:50Z

I realised I buried this within #483, and it might be quite ambitious, but it would be useful to read header information (e.g. the secondary structure determined by DSSP) back in from an itp file generated by martinize2. It would also be useful for maintaining citation information in a file editing pipeline.

The second advance in this regard would be maintaining the grouping of different intra-directive information, e.g. whether bonds are backbone-backbone, or angles BB-SC-SC, etc., so that these could be edited selectively, or again, maintained when subsequently writing out.
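To make the grouping idea concrete, here is a minimal sketch of how interactions could be labelled by the bead classes they connect. It assumes bead names follow the "BB" / "SC1", "SC2", ... pattern; the helpers bead_class and group_interactions are hypothetical names for illustration and are not part of vermouth.

```python
from collections import defaultdict


def bead_class(name):
    """Return 'BB' for backbone beads and 'SC' for anything else (SC1, SC2, ...)."""
    return "BB" if name == "BB" else "SC"


def group_interactions(interactions, atom_names):
    """Group interactions (tuples of atom indices) by a label such as 'BB-SC-SC'."""
    groups = defaultdict(list)
    for atoms in interactions:
        label = "-".join(bead_class(atom_names[idx]) for idx in atoms)
        groups[label].append(atoms)
    return dict(groups)


# Toy data: atom index -> bead name (assumed naming scheme).
atom_names = {1: "BB", 2: "SC1", 3: "SC2", 4: "BB"}
bonds = [(1, 4), (1, 2), (2, 3)]
angles = [(1, 2, 3)]

bond_groups = group_interactions(bonds, atom_names)
angle_groups = group_interactions(angles, atom_names)
```

The label depends on the order of the atoms in each interaction, so (2, 1) would be filed under "SC-BB" rather than "BB-SC". A real implementation would probably want to normalise that.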

csbrasnett · 2022-11-22T13:38:59Z

I have a hacky solution for this now, but maybe it'd be good to include it at some point.

Reading in an itp file first requires a list of all the lines in the itp file, which includes the header lines. Something like:

def header_parser(file_lines):
    """Return the header comments that precede the moleculetype directive."""

    # 1) Collect every line up to the first line mentioning moleculetype.
    header_lines = []
    for line in file_lines:
        if 'moleculetype' in line:
            break
        header_lines.append(line)

    # 2) Remove the '; ' from the start of each line and the '\n' from the end;
    #    they'll get written back in when writing out.
    lines_out = []
    for line in header_lines:
        line = line.rstrip('\n')
        if line.startswith('; '):
            line = line[2:]
        lines_out.append(line)

    return lines_out

will do the job, so that lines_out can be passed in some form to write_molecule_itp later on. As these lines contain the information about the secondary structure too, they can be used for that as well.

pckroon · 2022-11-22T14:33:25Z

In the current code structure for the parser this will be very hard to include, since 1) the parser is a mess, and 2) you intend to parse comments; these get stripped out at a very early stage of parsing.

This is something we could address when we (finally) redo the parser(s) (again). An option to preserve comments would be valuable.
Even better would be to write our itp header with specifically formatted "comments" describing this kind of metadata. For example, comments like ;METADATA SS=.... would facilitate parsing.
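As a rough sketch of what reading such tagged comments could look like: the ";METADATA KEY=VALUE" layout is only the format floated above, not something martinize2 currently writes, and parse_metadata_comments and the FORCEFIELD key are hypothetical names for illustration.

```python
def parse_metadata_comments(lines, tag="METADATA"):
    """Collect KEY=VALUE pairs from comment lines that start with ';<tag>'."""
    metadata = {}
    prefix = ";" + tag
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(prefix):
            continue  # ordinary comments and directives are left alone
        payload = stripped[len(prefix):].strip()
        key, sep, value = payload.partition("=")
        if sep:  # ignore tagged lines without a KEY=VALUE payload
            metadata[key.strip()] = value.strip()
    return metadata


# Toy input; the SS string and the FORCEFIELD key are made up for the example.
example = [
    "; ordinary comment, ignored\n",
    ";METADATA SS=CCHHHHCC\n",
    ";METADATA FORCEFIELD=martini3001\n",
    "[ moleculetype ]\n",
]
result = parse_metadata_comments(example)
```

Because the tagged lines are still valid comments, files written this way would stay readable by tools that know nothing about the tag.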

fgrunewald · 2022-11-23T10:21:19Z

@pckroon I already made a PR for a comment parsing utility, in case you want to see how it could be done: #460. I'm already using it in some production packages as a subclass of the current parser. It works pretty well.

The header with citations is trickier, but should be doable by simply dumping it into the meta of the molecule. Anything else, like the digestion part, I would argue should be done by another function, not the parser (i.e. as with interactions, we only parse, not interpret).
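A minimal sketch of that split, assuming the header is stored as a list of raw comment strings under a "header" key. The plain dict stands in for a molecule's meta, and parse_header and find_header_entries are hypothetical helpers, not existing vermouth functions.

```python
def parse_header(lines):
    """Parser step: keep the raw header comments, without interpreting them."""
    header = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("["):
            break  # the first directive ends the header
        if stripped.startswith(";"):
            header.append(stripped[1:].strip())
    return header


def find_header_entries(meta, keyword):
    """Digestion step: a separate function that interprets the stored header."""
    return [line for line in meta.get("header", [])
            if keyword.lower() in line.lower()]


# Toy itp content with a placeholder citation line.
itp_lines = [
    "; Generated by an example tool\n",
    "; Please cite: <placeholder reference>\n",
    "\n",
    "[ moleculetype ]\n",
    "molecule_0 1\n",
]

meta = {"header": parse_header(itp_lines)}  # stand-in for a molecule's meta dict
citations = find_header_entries(meta, "cite")
```

Keeping the raw lines in meta means they can be written back unchanged, while any interpretation (citations, secondary structure) stays optional and outside the parser.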

pckroon added the "parser" label (Issues concerning parser behaviour) on Nov 22, 2022
