Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better sanitizeName #87

Open
arkanoid87 opened this issue Nov 23, 2023 · 2 comments
Open

Better sanitizeName #87

arkanoid87 opened this issue Nov 23, 2023 · 2 comments

Comments

@arkanoid87
Copy link

At the moment, Futhark applies renameCallback before other fixed renaming rules, including nimIdentNormalize pass, which nullify all the NEP1 transformations user supplied renameCallback apply.

I'd say nimIdentNormalize is not really NEP1 friendly

I've recently pushed PR to anycase package to support compile time string transformations https://github.com/epszaw/anycase/issues/6

What about replacing nimIdentNormalize in sanitizeName with something that is more renameCallback and NEP1 friendly? Or maybe provide an additional renameTailCallback to fix things after the fixed transformations?

@arkanoid87
Copy link
Author

arkanoid87 commented Nov 24, 2023

I tested my idea replacing nimIdentNormalize call in futhark.nim and GDAL library, and it seems to be working nicely

changes to futhark.nim sanitizeName proc

import anycase

...

proc sanitizeName(usedNames: var HashSet[string], origName: string, kind: string, renameCallback: RenameCallback, partof = ""): string {.compileTime.} =
  result = origName
  if not renameCallback.isNil:
    result = result.renameCallback(kind, partof)
  if result.startsWith("_"):
    if result.startsWith("__"):
      result = "compiler_" & result[2..^1]
    else:
      result = "internal_" & result[1..^1]

  # result = result.nimIdentNormalize()
  result = case kind:
    of "enum", "struct", "union", "const", "typedef": result.pascal # AnyCase
    of "proc", "arg", "enumval", "field": result.camel # anyCase
    else: result.nimIdentNormalize() # [A|a]nycase

  var renamed = false
  if usedNames.contains(result) or result in builtins:
    result.add kind
    renamed = true
  if usedNames.contains(result) or result in builtins:
    result.add hash(origName).toHex
    renamed = true
  if renamed:
    hint "Renaming \"" & origName & "\" to \"" & result & "\"" & (if partof.len != 0: " in " & partof else: "")
  usedNames.incl result

Results for GDAL library. The dirty part is still under user control via renameCallback

import std/[sugar, os, strutils]
import futhark


func renameCb(n, k: string, p = ""): string =
  result = n
  for prefix in ["GDAL_", "OGR_"]:
    if result.startsWith prefix:
      result = result.replace(prefix, "")
      break
  for prefix in ["GDAL", "OGR", "OCT", "OSR", "CPL", "VSI"]:
    if result.startsWith prefix:
      result = result.replace(prefix, prefix.toLower)
      break
  result = result.replace("_", "")


importc:
  outputPath currentSourcePath.parentDir / "gdal_c.nim"
  renameCallback renameCb
  "gdal/gdal.h"
  "gdal/ogr_api.h"
  "gdal/ogr_srs_api.h"
  "gdal/cpl_conv.h"
  "gdal/cpl_vsi.h"


proc main =
  var major, minor, patch: cint
  assert ogrGetGEOSVersion(major.addr, minor.addr, patch.addr)
  echo "GEOS version: ", major, ".", minor, ".", patch
  osrGetPROJVersion(major.addr, minor.addr, patch.addr)
  echo "PROJ version: ", major, ".", minor, ".", patch
  
  gdalAllRegister()
  let spatialReference = osrNewSpatialReference(nil)
  assert spatialReference.osrImportFromEPSG(4326) == 0

  var geometry: OgrGeometryH
  var wkt = "POINT(1 2)".cstring
  assert gCreateFromWkt(wkt.addr, spatialReference, geometry.addr) == 0

  dump geometry.gGetGeometryType()
  dump geometry.gGetGeometryName()
  let json = geometry.gExportToJson()
  dump json
  vsiFree(json)

  gDestroyGeometry(geometry)
  osrDestroySpatialReference(spatialReference)


main()

generated gdal_c.zip and rename hints

Hint: Renaming "__time_t" to "Timettypedef" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "is" to "isarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]

output

GEOS version: 3.10.2
PROJ version: 8.2.1
geometry.gGetGeometryType() = wkbPoint
geometry.gGetGeometryName() = POINT
json = { "type": "Point", "coordinates": [ 2.0, 1.0 ] }

@PMunch
Copy link
Owner

PMunch commented Nov 24, 2023

This code has a slight issue, it won't properly handle name collisions. For anyone who didn't follow the IRC discussion when this came up the reason why sanitizeName calls nimIdentNormalize is so that the hash table lookup of names is guaranteed to get identifiers which will collide in Nim (but which might not collide in C because it's case and underscore sensitive). The proper workaround for this is to still use the normalized name in these data-structures but change to storing the original (or anycased) names and normalizing them when doing lookups.

As a motivating example I wrote a test case in the normalize branch which shows off the kind of thing the current solution was meant to handle. It tries to wrap this perfectly valid (albeit a bit extreme) C header file:

#define my_var 1
#define myVar 2
#define myvar 3
#define MYVAR 4
#define MY_VAR 5
#define MyVar 6
#define My_Var 7

The current version wraps this fine and renames fields to avoid collisions as we can see in the output:

/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myVar" to "myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myvar" to "myvarconst00000000C690172C" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MY_VAR" to "Myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MyVar" to "Myvarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "My_Var" to "Myvarconst00000000038D4D97" [User]

The names get appended some information before falling back to a hash of the original identifier.

With your version of sanitizeName we instead see this output:

/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "myVar" to "MyVarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MY_VAR" to "MyVarconst00000000AEC77163" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MyVar" to "MyVarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "My_Var" to "MyVarconst00000000038D4D97" [User]
/home/peter/Projects/futhark/src/futhark.nim(908, 3) Hint: Caching Futhark output in /home/peter/.cache/nim/tnormalize_d/futhark_33DDD4433451E496.nim [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of Myvar already exists, not redeclaring [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of MYVAR already exists, not redeclaring [User]

As we can see it doesn't properly rename all the identifiers, and some of them are aliased together in the wrapper making #define myvar 3 and #define MYVAR 4 now point to #define my_var 1. This means that these values are not possible to get with the wrapper, and with --nodeclguards the code fails to compile at all because of the re-definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants