Proposal: Better strings #4

Open
jacobwilliams opened this issue Jul 6, 2017 · 7 comments

Comments

@jacobwilliams
Member

Clive Page (see reference below) has a great summary of why we need better strings. I don't agree that VARYING_STRING should be part of the language. The pace of the language standardization process is too slow. I would prefer it if the language enabled third-party libraries to completely fill the gaps.

StringiFor, for example, is already better than ISO_VARYING_STRING ever was, but it can't completely replace CHARACTER since it can't be used in all contexts. I haven't fully thought this through, but what if there was a way to tell the compiler to use a specified derived type as a string (and it would just automatically work in any context where a CHARACTER can currently be used)? For example:

  type, use_as_string(str) :: my_string
    character(len=:),allocatable,private :: str = ''
  end type my_string

So, the use_as_string attribute indicates that when a my_string variable is used in place of a character variable, the str component is what is really being used. So:

  type(my_string) :: var

  var = 'string assignment works'
  var = var // 'as does concatenation'
  ilen = len_trim(var) ! etc...

Now, of course, these sorts of things can already be done by defining operators and overloading functions, etc. But what about:

  type(my_string) :: var

  write(var,'(I5)') i
  write(output_unit,'(A)') var
  
  var = var(1:3)  ! substrings work too

These are not currently possible with any third-party library. (Note: I think UDDTIO was supposed to help with I/O but it seems unsatisfactory to me).
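For comparison, here is a minimal sketch of the overloading approach mentioned above, as it can be written today. The names my_string_mod, assign_from_char, concat_string_char and len_trim_string are made up for illustration and are not taken from StringiFor or any other library. It covers assignment, concatenation and len_trim, but not the internal write or the substring syntax.

```fortran
module my_string_mod
  implicit none
  private
  public :: my_string, assignment(=), operator(//), len_trim

  ! Hypothetical wrapper type; the component is hidden from users.
  type :: my_string
    private
    character(len=:), allocatable :: str
  end type my_string

  interface assignment(=)
    module procedure assign_from_char
  end interface

  interface operator(//)
    module procedure concat_string_char
  end interface

  ! Extends the intrinsic generic; character arguments still reach the intrinsic.
  interface len_trim
    module procedure len_trim_string
  end interface

contains

  ! Defined assignment: var = 'literal'
  subroutine assign_from_char(lhs, rhs)
    type(my_string), intent(out) :: lhs
    character(len=*), intent(in) :: rhs
    lhs%str = rhs
  end subroutine assign_from_char

  ! Defined operator: var // 'literal' (assumes lhs%str is already allocated)
  function concat_string_char(lhs, rhs) result(res)
    type(my_string), intent(in) :: lhs
    character(len=*), intent(in) :: rhs
    type(my_string) :: res
    res%str = lhs%str // rhs
  end function concat_string_char

  ! Overload of len_trim for the wrapper type
  function len_trim_string(s) result(n)
    type(my_string), intent(in) :: s
    integer :: n
    n = len_trim(s%str)
  end function len_trim_string

end module my_string_mod

program demo_overloads
  use my_string_mod
  implicit none
  type(my_string) :: var

  var = 'string assignment works'
  var = var // ' as does concatenation'
  print '(A,I0)', 'trimmed length: ', len_trim(var)
end program demo_overloads
```

Every operation has to be written out by hand like this, for each combination of argument types, which is the burden the use_as_string idea is meant to remove.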

To take this idea to its extreme limit, say, we had some function like this:

  subroutine input_string( c )
  character(len=*),intent(in) :: c
  end subroutine input_string

then maybe we could also use our new string type in this context as well:

  type(my_string) :: var
  call input_string( var )  ! works!

Note that this feature requires:

See also

@szaghi
Member

szaghi commented Jul 6, 2017

@jacobwilliams

Hi Jacob,

I agree with many of your ideas, but some of them sound somewhat complicated (especially from the compiler developers' point of view).

As I also use Python, I certainly agree that Fortran string handling is too limited. Some people complain about the lack of a bitstring, but I am still unsure what this really means. My main concerns about Fortran string handling were the difficulty of building arrays/lists of variable-size strings and the lack of many of the string manipulation builtins found in Python. With StringiFor I solved many of these issues, but I am not satisfied with StringiFor, mainly for the I/O part. Currently I abuse a simple trick: when I have to do I/O on a StringiFor string I do:

use stringifor
type(string) :: var
character(99) :: buffer

  write(buffer, '(I5)') i ; var = trim(buffer)
  write(output_unit,'(A)') var//'' ! on-the-fly conversion to character by means of concatenation
  
  var = var%slice(1, 3)  ! very bad, but this is the best I can do now

All three of these cases are as unsatisfactory for me as they are for you. I tried to depict the Fortran string-handling scenario in this paragraph, and this is a summary of the comparison of StringiFor with other approaches.

Now that GNU Fortran supports UDTIO, I am going to complete its implementation in StringiFor, in the hope that at least a small part of the I/O burden is reduced.
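As a rough illustration of what UDTIO offers here, the sketch below binds a formatted-output procedure to a string-like type. The module name string_io_mod, the type my_string and its public str component are assumptions made for the example; this is not how StringiFor implements it.

```fortran
module string_io_mod
  implicit none
  private
  public :: my_string

  ! Hypothetical string wrapper with a defined formatted-output binding.
  type :: my_string
    character(len=:), allocatable :: str
  contains
    procedure :: write_formatted
    generic :: write(formatted) => write_formatted
  end type my_string

contains

  ! The argument list is fixed by the standard for write(formatted) bindings.
  subroutine write_formatted(dtv, unit, iotype, v_list, iostat, iomsg)
    class(my_string), intent(in)    :: dtv
    integer,          intent(in)    :: unit
    character(len=*), intent(in)    :: iotype
    integer,          intent(in)    :: v_list(:)
    integer,          intent(out)   :: iostat
    character(len=*), intent(inout) :: iomsg

    iostat = 0
    ! Child data transfer: write the wrapped character value to the same unit.
    if (allocated(dtv%str)) then
      write(unit, '(A)', iostat=iostat, iomsg=iomsg) dtv%str
    end if
  end subroutine write_formatted

end module string_io_mod

program demo_udtio
  use, intrinsic :: iso_fortran_env, only: output_unit
  use string_io_mod
  implicit none
  type(my_string) :: var

  var%str = 'hello'
  ! The DT edit descriptor (or list-directed output) triggers the binding.
  write(output_unit, '(DT)') var
end program demo_udtio
```

As far as I understand the standard, the item must correspond to a DT edit descriptor (or appear in list-directed or namelist I/O) for the binding to be used, so write(output_unit,'(A)') var is still not accepted. An internal write with the derived type as the file, such as write(var,'(I5)') i, is not covered by UDTIO at all.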

Your suggestion about use_as_string sounds a bit unintuitive to my simple mind. I cannot remember a similar attribute for other types. It could be a great idea, but I have not fully understood all its implications; e.g., I have not figured out whether the aliasing between var and the str member could generate some unexpected results. Anyhow, this could really be a good idea, but I need to think much more about it. It would be helpful if you had a reference to other languages having something similar to the use_as_string attribute, even for different types of variables (I am trying to figure out whether Python has something similar, but Python is so flexible that it probably does not need that attribute...).

Anyhow, great idea!

@jacobwilliams
Member Author

I don't know of a reference for a similar feature in another language, because I just made it up! :) Probably it does have implications and the compiler writers will complain. I don't know. Maybe there's a better way? I hate having to use these "tricks".

The beauty of it is that it removes much of the burden from the programmer. All you have to do is add some attribute and suddenly all the problems are solved.

I will also think about it some more.

@cmacmackin

What you're describing is a sort of automatic forwarding. If there is no type component, type-bound procedure, or procedure taking this type as an argument then we want the compiler to check if a particular member of the type has such a component, type-bound procedure, or procedure accepting it as an argument (as the case may be). Alternatively, we could achieve this by inheriting from built-in types. I would suggest going with the forwarding approach, as it could potentially allow for trying to forward to multiple type-components. Perhaps the syntax could be something like this:

type :: my_string
  character(len=:), allocatable, private, forward :: str
end type

One issue is that, I suppose, you could argue that the str component is no longer truly private in this case.

We could take this further and define a precedence for forwarding using any type of object.

type :: field
  real, dimension(:,:), allocatable, private, forward(1) :: data
  character(len=:), allocatable, private, forward(2) :: label
end type

This would mean that field1 + field2 is interpreted as field(field1%data + field2%data, field1%label), because the field type does not define a + operator, but the first "forwardable" component does. On the other hand, field1 // field2 is interpreted as field(field1%data, field1%label // field2%label), because neither the field type nor the first forwardable component defines a // operator, but the second component does. Similarly, write(field1, '(I5)') i would be equivalent to write(field1%label, '(I5)') i, because neither the field type nor the first forwardable component could work in that context.
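To make that interpretation concrete, here is a sketch of what one would have to write by hand today to get the same results for + and //. Only the field type and its two components come from the example above; the module name field_mod and the helpers new_field, label_of and data_of are invented for this illustration, and the forward attribute itself does not exist in any compiler.

```fortran
module field_mod
  implicit none
  private
  public :: field, new_field, label_of, data_of, operator(+), operator(//)

  type :: field
    private
    real, dimension(:,:), allocatable :: data
    character(len=:), allocatable :: label
  end type field

  interface operator(+)
    module procedure add_fields
  end interface

  interface operator(//)
    module procedure concat_fields
  end interface

contains

  ! Hypothetical constructor, needed because the components are private.
  function new_field(values, label) result(f)
    real, intent(in) :: values(:,:)
    character(len=*), intent(in) :: label
    type(field) :: f
    f%data = values
    f%label = label
  end function new_field

  ! field1 + field2: "forwarded" to data, label taken from the first operand.
  ! Assumes both data arrays have the same shape.
  function add_fields(a, b) result(c)
    type(field), intent(in) :: a, b
    type(field) :: c
    c%data = a%data + b%data
    c%label = a%label
  end function add_fields

  ! field1 // field2: "forwarded" to label, data taken from the first operand.
  function concat_fields(a, b) result(c)
    type(field), intent(in) :: a, b
    type(field) :: c
    c%data = a%data
    c%label = a%label // b%label
  end function concat_fields

  function label_of(f) result(label)
    type(field), intent(in) :: f
    character(len=:), allocatable :: label
    label = f%label
  end function label_of

  function data_of(f) result(values)
    type(field), intent(in) :: f
    real, allocatable :: values(:,:)
    values = f%data
  end function data_of

end module field_mod

program demo_field
  use field_mod
  implicit none
  type(field) :: f1, f2, f3

  f1 = new_field(reshape([1.0, 2.0, 3.0, 4.0], [2, 2]), 'pressure')
  f2 = new_field(reshape([10.0, 20.0, 30.0, 40.0], [2, 2]), '_offset')

  f3 = f1 + f2            ! element-wise sum of data, label of f1
  print '(A)', label_of(f3)
  print '(4F6.1)', data_of(f3)

  f3 = f1 // f2           ! data of f1, concatenated labels
  print '(A)', label_of(f3)
end program demo_field
```

With the proposed forward(1) and forward(2) attributes, the two operator interfaces and their procedures would presumably be generated by the compiler instead of being written out.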

More details would probably be needed on the exact order of precedence. How this would interact with inheritance or locally defined procedures taking field as an argument would need to be defined. I'm pretty sure that would be doable, though.

@jacobwilliams
Member Author

I like it! It's more general (but that could also be a bad thing since it's more work for the compiler writers). Yeah, the multiple forwarding thing could be tricky (how do languages that support this handle that?). I'm going to study it some more.

@cmacmackin

This isn't a common feature. The only language I know of to support it is Go. I'm not familiar with Go, however, so I can't explain all of the subtleties.

I should say that I got inspiration from a big list of suggestions someone had linked to on Fortranwiki. Unfortunately, the link to it seems to be broken now (or else I can't find it). The idea to have multiple components which can be forwarded to is mine, however.

@jacobwilliams
Member Author

jacobwilliams commented Jul 6, 2017

Check out Section 16 in ftp://ftp.nag.co.uk/sc22wg5/N1951-N2000/N1972.pdf.

[edited broken link: no way, GitHub markdown is not ftp-friendly...]

@cmacmackin

That's the one I was thinking of. There are some other good suggestions in that document too.


3 participants