Reading in header information from itp file #486

Open
csbrasnett opened this issue Nov 21, 2022 · 3 comments
Labels
parser Issues concerning parser behaviour

Comments

@csbrasnett
Collaborator

I realised I buried this within #483, and it might be quite ambitious, but it would be useful to read header information (e.g. the secondary structure determined by DSSP) back in from an itp file generated by martinize2. It would also be useful for maintaining citation information in a file-editing pipeline.

A second improvement along these lines would be to preserve the grouping of interactions within a directive, e.g. whether bonds are backbone-backbone or angles are BB-SC-SC, so that these groups could be edited selectively or, again, preserved when the file is subsequently written out.
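
A rough sketch of what such grouping could look like, assuming interactions are available as tuples of atom names plus parameters, and that bead names follow the BB / SC1, SC2, ... convention. The helper names `bead_class` and `group_interactions` are hypothetical, and the bond parameters are made-up placeholders:

```python
from collections import defaultdict


def bead_class(atom_name):
    """Map a bead name to 'BB', 'SC' or 'other' (assumes names like BB, SC1)."""
    if atom_name == "BB":
        return "BB"
    if atom_name.startswith("SC"):
        return "SC"
    return "other"


def group_interactions(interactions):
    """Group (atom_names, parameters) pairs under labels such as 'BB-SC-SC'."""
    groups = defaultdict(list)
    for atom_names, parameters in interactions:
        label = "-".join(bead_class(name) for name in atom_names)
        groups[label].append((atom_names, parameters))
    return dict(groups)


# Placeholder bonds, for illustration only.
bonds = [
    (("BB", "BB"), ("1", "0.350", "4000")),
    (("BB", "SC1"), ("1", "0.270", "8000")),
]
print(sorted(group_interactions(bonds)))
```

Each group could then be edited on its own, or written back out under its own comment line.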

@csbrasnett
Collaborator Author

csbrasnett commented Nov 22, 2022

I have a hacky solution for this now, but maybe it'd be good to include it at some point.

Reading in an itp file first requires a list of all the lines in the itp file, which includes the header lines. Something like:

def header_parser(file_lines):
    """Return the header lines found before the first moleculetype directive."""
    # 1) Collect the header lines: everything before the first line that
    #    mentions 'moleculetype'.
    header_lines = []
    for line in file_lines:
        if 'moleculetype' in line:
            break
        header_lines.append(line)

    # 2) Remove the '; ' from the start of each line and the '\n' from the
    #    end; they'll get written back in when writing out.
    lines_out = []
    for line in header_lines:
        line = line.rstrip('\n')
        if line.startswith('; '):
            line = line[2:]
        elif line.startswith(';'):
            line = line[1:]
        lines_out.append(line)

    return lines_out

will do the job, so that lines_out can be passed in some form to write_molecule_itp later on. As these lines contain the information about the secondary structure too, they can be used for that as well.
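
For the round trip, the inverse step could be sketched as below. `format_header` is a hypothetical helper, not part of any existing API; it only shows how the stripped lines could regain their '; ' marker and newline if the writer did not add them itself:

```python
def format_header(lines_out):
    """Re-add the ';' comment marker and trailing newline to stripped header lines."""
    formatted = []
    for line in lines_out:
        line = line.rstrip("\n")
        # Blank header lines become a bare ';' so they remain comments.
        formatted.append(f"; {line}\n" if line else ";\n")
    return formatted


stripped = ["Please cite the following papers:", "", "Some citation"]
print("".join(format_header(stripped)), end="")
```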

pckroon added the "parser" label (Issues concerning parser behaviour) Nov 22, 2022
@pckroon
Member

pckroon commented Nov 22, 2022

In the current code structure for the parser this will be very hard to include, since 1) the parser is a mess, and 2) you intend to parse comments; these get stripped out at a very early stage of parsing.

This is something we could address when we (finally) redo the parser(s) (again). An option to preserve comments would be valuable.
Even better would be to write our itp header with specifically formatted "comments" describing this kind of metadata. For example, comments like ;METADATA SS=.... would facilitate parsing.
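
A minimal sketch of how such tagged comments could be read back, assuming whitespace-separated KEY=VALUE pairs after the tag. The exact format is not settled anywhere in this thread, `parse_metadata_comments` is a hypothetical name, and the SS string in the example is made up:

```python
def parse_metadata_comments(file_lines, tag=";METADATA"):
    """Collect KEY=VALUE pairs from comment lines that start with `tag`."""
    metadata = {}
    for line in file_lines:
        line = line.strip()
        if not line.startswith(tag):
            continue
        for token in line[len(tag):].split():
            key, sep, value = token.partition("=")
            if sep:  # skip tokens that contain no '='
                metadata[key] = value
    return metadata


# Made-up example lines; the secondary structure string is a placeholder.
example = [
    ";METADATA SS=CCHHHHCC\n",
    "; an ordinary comment\n",
    "[ moleculetype ]\n",
]
print(parse_metadata_comments(example))
```

Ordinary comments are ignored, so a parser that strips comments early would only need to special-case lines carrying the tag.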

@fgrunewald
Member

@pckroon I already made a PR for a comment parsing utility (#460), in case you want to see how it could be done. I'm already using it in some production packages as a subclass of the current parser. It works pretty well.

The header with citations is trickier, but should be doable by simply dumping it into the meta of the molecule. Anything beyond that, like the digestion part, I would argue should be done by a separate function and not by the parser (i.e. as with interactions, we only parse, we don't interpret).
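
A sketch of that split, assuming the molecule exposes a `meta` dictionary. `MoleculeStub`, `store_header` and `find_citation_lines` are hypothetical stand-ins for illustration, not existing API:

```python
class MoleculeStub:
    """Stand-in for a molecule object carrying a `meta` dict (an assumption)."""

    def __init__(self):
        self.meta = {}


def store_header(molecule, header_lines):
    """Parsing step: keep the raw header lines, uninterpreted."""
    molecule.meta["header"] = list(header_lines)


def find_citation_lines(molecule, keyword="cite"):
    """Interpretation step, kept outside the parser."""
    header = molecule.meta.get("header", [])
    return [line for line in header if keyword in line.lower()]


mol = MoleculeStub()
store_header(mol, ["Please cite the following papers:", "Example et al."])
print(find_citation_lines(mol))
```

The parser only fills `meta`; deciding what a given header line means is left to functions like the second one.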
