Indexed codewriting #22219

ThePauliPrinciple · 2021-10-04T10:50:58Z

ThePauliPrinciple
Oct 4, 2021
Collaborator

For a few years now, I have been interested in using (and have used) SymPy's code writing for indexed objects. In particular, I have used machine learning to obtain continuous representations of (tensorial) properties of atoms, such as energies, forces and friction tensors, needed for simulations of the dynamics of atoms during chemical reactions.

Since the properties of such systems are bound by symmetries (permutations of identical atoms, as well as total rotational symmetry and, for periodic systems, translational symmetry), the neural networks do not use the cartesian coordinates as inputs directly. Instead they use "symmetry functions", which ensure that the NNs see the same input whenever symmetry-equivalent cartesian coordinates are given for the atoms in the system. One such choice can be found in equations S1-S4 of this supporting information: https://pubs.acs.org/doi/abs/10.1021/acs.jpclett.7b00784. Symmetry functions are a bit of a "flavour of the month" choice, so it is convenient if code can be generated automatically using SymPy, which is already possible, although not without modifications.

Moreover, for tensorial properties of these atoms, Jacobians or Hessians are required to obtain the correct symmetry of the output (e.g. if two atoms swap, the forces should swap too; that is, the output should be permutation equivariant, not invariant). Obtaining Jacobians and Hessians of these symmetry functions is a lot easier with SymPy than by hand.

For convenience, I want to be able to convert the expressions to tensorflow code for fitting, while c or fortran code is required for integration into the academic spaghetti stack. Currently, Indexed objects are not supported for tensorflow code writing at all, and for good reason: it is not trivial to correctly write an Indexed expression as a tensorflow expression, as a lot of axis swapping or new axis creation (for broadcasting) might be required. Fortunately, I have made an attempt to create this functionality.

In particular, I have written the class "ExplicitIndexAssignment", inheriting from the Assignment class in sympy.codegen.ast.
It is currently defined as "equate the lhs to the sum of the rhs over all indices not in the lhs". In particular, indexed objects are not considered tensors in this assignment; that is to say, if a multiplication contains the same index twice, it is NOT considered a contraction. Currently it works as follows:

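from sympy import IndexedBase, Idx, symbols
# ExplicitIndexAssignment is the new class described above; it is not part of sympy

X = IndexedBase('X')
D = IndexedBase('D')
N = symbols('N')
i, j = symbols('i j', cls=Idx, range=N)
c = symbols('c', cls=Idx, range=3)
# D[i,j] = sum over c (the index missing from the lhs) of (X[i,c] - X[j,c])**2
ExplicitIndexAssignment(D[i,j], (X[i,c] - X[j,c])**2)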

Note here in particular that it detected it needs to create a new axis for the subtraction to make sense in tensorflow:

D = tensorflow.math.reduce_sum(tensorflow.math.pow(X[:,None,:] - X[None,:,:], 2),axis=[2])

The expression yields a matrix of squared atom-atom distances; such distances are often used in symmetry functions.
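
As a rough illustration of the new-axis detection mentioned above, the following pure-Python sketch derives the subscripts and the reduction axis for the example. The helper names `broadcast_subscript` and `reduction_axes` are hypothetical and are not part of sympy or of the class described here. The sketch also assumes that each operand's indices already appear in the same relative order as in the full index list, so no axis swapping is needed.

```python
def broadcast_subscript(operand_indices, all_indices):
    """Build a subscript string that aligns an operand with all_indices.

    Axes the operand has are kept with ':'; axes it lacks get 'None'
    (a new axis), so the operand broadcasts against the others.
    """
    parts = [":" if idx in operand_indices else "None" for idx in all_indices]
    return "[" + ",".join(parts) + "]"


def reduction_axes(lhs_indices, all_indices):
    """Positions of the indices that are summed (those missing from the lhs)."""
    return [pos for pos, idx in enumerate(all_indices) if idx not in lhs_indices]


# D[i,j] = sum_c (X[i,c] - X[j,c])**2
all_indices = ["i", "j", "c"]
first = "X" + broadcast_subscript(["i", "c"], all_indices)
second = "X" + broadcast_subscript(["j", "c"], all_indices)
axes = reduction_axes(["i", "j"], all_indices)

print(first, second, axes)  # X[:,None,:] X[None,:,:] [2]
```

An operand written with its indices in a different order would additionally need a transpose before this alignment step, which this sketch deliberately leaves out.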

My question here is if people feel this could be a good addition to sympy, or if I should write this in my own library, since it does not add anything to the mathematics in sympy, only to the code writing.

There were some modifications I had to make to sympy:

Support Indexed printing in the tensorflow printer: printer._print(expr.base.label)
Lambdify only allows one-liners, which made it very inconvenient for testing. I added an if statement to check for an ast.CodeBlock, in which case it writes the code block line by line and expects the user to use an ast.Return statement, which allows running arbitrary code in lambdify with ast.

There are several features I would still like to implement:

Currently the pattern matching of indices sometimes fails (e.g. if X[c,j] is used instead of X[j,c] in the above example).
No optimization is attempted for the order of summation or for reusing intermediate results (as numpy's einsum can do). I realise this is a can of worms.
Allow an explicit sum in the expression, e.g. for tensor-like contractions. In the above example, this would allow writing the square root in the same line by summing over c explicitly. Currently, attempting that puts the sum over c outside of the sqrt, so you have to write two lines of code.
Support for "mapping". That is to say, allow for Indexed objects to be indices, so one could e.g. select atoms of a specific type. For c and fortran code this is trivial, but for tf this would require tf.gather and tf.scatter ops.
Support for the numpy printer. I would consider supporting other code writers if there is explicit interest.
Test suite: there is a lot of required functionality for a bunch of different programming languages. Small modifications can easily conflict with earlier assumptions.
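
To make the explicit-sum item above concrete, here is a small pure-Python sketch (with made-up coordinates) of why the position of the sum over c matters. It only illustrates the arithmetic and is not how the assignment class is implemented.

```python
import math

# Two atoms, 3 cartesian components each (made-up coordinates)
X = [[0.0, 0.0, 0.0],
     [3.0, 4.0, 0.0]]


def distance_sum_inside(i, j):
    """sqrt( sum_c (X[i,c] - X[j,c])**2 ): the intended distance."""
    return math.sqrt(sum((X[i][c] - X[j][c]) ** 2 for c in range(3)))


def distance_sum_outside(i, j):
    """sum_c sqrt( (X[i,c] - X[j,c])**2 ): what an implicit outer sum computes."""
    return sum(math.sqrt((X[i][c] - X[j][c]) ** 2) for c in range(3))


print(distance_sum_inside(0, 1))   # 5.0
print(distance_sum_outside(0, 1))  # 7.0
```

With the sum outside the square root, the result is the sum of absolute coordinate differences rather than the Euclidean distance, which is why the one-line form currently has to be split into two assignments.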

All in all, I'm looking to write sympy expressions which, with the LaTeX printer, result in expressions similar to those found in research papers, but which can then be immediately written as tensorflow, c or fortran code, even if that code might not be the most optimized.

Feel free to share any ideas or suggestions.

oscarbenjamin · 2021-10-04T11:08:57Z

oscarbenjamin
Oct 4, 2021
Maintainer

In general this sounds good. I think that maybe there should be a different version of lambdify for handling cases like this that has some understanding of shape, broadcasting, indexing, contractions and so on (cf #5642 (comment)).


ThePauliPrinciple Oct 4, 2021
Collaborator Author

With my current modification, I can run lambdify (at least with the tensorflow printer) as such:

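from sympy import lambdify, sqrt
from sympy.codegen.ast import CodeBlock, Return

code = CodeBlock(
    ExplicitIndexAssignment(D[i,j], (X[i,c] - X[j,c])**2),  # squared distances, summed over c
    ExplicitIndexAssignment(D[i,j], sqrt(D[i,j])),           # element-wise square root
    Return(D[i,j]),
)

distance = lambdify([X], code, 'tensorflow')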

The modification is

        funcbody.append('return ({})'.format(self._exprrepr(expr)))

to

        if expr.has(CodeBlock):
            for line in self._exprrepr(expr).split("\n"):
                funcbody.append(line)
        else: #to make it backwards compatible
            funcbody.append('return ({})'.format(self._exprrepr(expr)))
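
The effect of this change can be sketched with a toy, self-contained stand-in for the function-building step. The name `make_function` and the `is_code_block` flag are hypothetical and the strings stand in for printer output; this is not lambdify's actual internals, only the same branching idea.

```python
def make_function(name, args, printed, is_code_block):
    """Build a Python function from printed code, lambdify-style (toy version)."""
    funcbody = []
    if is_code_block:
        # Multi-line code: emit each line; the block must contain its own return
        for line in printed.split("\n"):
            funcbody.append(line)
    else:
        # Single expression: wrap it in a return, as before
        funcbody.append("return ({})".format(printed))

    # Indent the body under a def line and compile it
    source = "def {}({}):\n".format(name, ", ".join(args))
    source += "\n".join("    " + line for line in funcbody)

    namespace = {}
    exec(source, namespace)
    return namespace[name]


one_liner = make_function("f", ["x"], "x + 1", is_code_block=False)
block = make_function("g", ["x"], "y = x * 2\nreturn y + 1", is_code_block=True)

print(one_liner(2), block(2))  # 3 5
```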

I have not implemented the writing for numpy yet because I used _tensorflowcode in my new class. However, the numpy printer looks for _pythoncode rather than _numpycode, so defining it would override all python code writers, which is not my intention. I currently do not have a good setup for making modifications in sympy (I'm just writing my own subclasses and monkey patching anything that needs editing). I am working my way through the development workflow.

Edit:
I realise that you might mean broadcasting of the expression as written, as in, some arguments have additional indices not specified in the expression. This is currently not supported as is; however, it could easily be added:

Any [:,:,newaxis] etc. can simply have an ellipsis as the first entry (I am not sure if the ellipsis can also stand for no dimension; if not, then this might be an issue), and any swapaxes that need to be performed can simply be based on negative indices. At that point I think any expression that I have in mind will be covered for broadcasting, at least for those cases that could also be broadcast in numpy. (If you have differently sized additional dimensions in different arguments, there is no unique solution and numpy would also fail to broadcast, e.g. (3,5,5) with (5,5,5), but not (1,5,5) with (5,5,5).)
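
The shape examples in the parenthesis follow from numpy's broadcasting rule, which can be sketched in pure Python as below. The function name `broadcast_shape` is made up for illustration; it mirrors the documented rule (align from the right, sizes must be equal or 1) rather than calling numpy itself.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, following NumPy's rule.

    Shapes are aligned from the right; two sizes are compatible when
    they are equal or one of them is 1. Missing leading axes count as 1.
    """
    result = []
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError("cannot broadcast {} with {}".format(shape_a, shape_b))
    return tuple(result)


print(broadcast_shape((1, 5, 5), (5, 5, 5)))  # (5, 5, 5)
print(broadcast_shape((5, 5), (7, 5, 5)))     # (7, 5, 5)

try:
    broadcast_shape((3, 5, 5), (5, 5, 5))
except ValueError as err:
    print(err)  # cannot broadcast (3, 5, 5) with (5, 5, 5)
```

The second call shows the case relevant to a leading ellipsis: an argument with extra leading dimensions broadcasts against one without them, because the missing axes are treated as size 1.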

Edit edit:
In particular, if there are any examples you would like included, feel free to leave a sympy expression in combination with example inputs and outputs (for multiple broadcastings you'd like to see supported). I'd love to be proven wrong.

I'm debating if I should write what I have in a pull request and move the discussion (and feature requests) there, or develop this a bit further first.
