Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance ArchGDAL OGR objects with parametric composite types #266

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

mathieu17g
Copy link
Collaborator

@mathieu17g mathieu17g commented Nov 22, 2021

Here is a draft for review, demonstrating:

  1. the use of ogr_f_stealgeometry for a gain of an additionnal 7% gain on layer to table conversion when there is only one geometry field
  2. a way to dispatch on type in getfield (to fix issue Consider turning the enums into types for dispatch #246) which brings 5% performance gain to the latter

Performance results
Current master

julia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 131 samples with 1 evaluation.
 Range (min  max):  194.701 ms  341.328 ms  ┊ GC (min  max): 0.00%  14.57%
 Time  (median):     210.470 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   229.013 ms ±  38.876 ms  ┊ GC (mean ± σ):  3.38% ±  5.16%

    ▅▁█▅█▁▂ ▁                                                    
  ▄▆███████▃█▅▅▃▄▄▄▄▃▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃▅▃▄▁▅▃▁▆▁▅▃▁▄▁▁▃ ▃
  195 ms           Histogram: frequency by time          326 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373869.

With Shapefile.jl

julia> @benchmark Tables.columntable(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30
BenchmarkTools.Trial: 187 samples with 1 evaluation.
 Range (min  max):  148.703 ms  179.867 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     160.331 ms               ┊ GC (median):    5.03%
 Time  (mean ± σ):   160.476 ms ±   6.494 ms  ┊ GC (mean ± σ):  3.30% ± 3.16%

       ▂   ▅  ▃    ▅▂▆▂        ▃             █ ▂▃ ▂              
  ▄▁▄▅▄██▅██▇▄██▅████████▇▅▄▄▄▅█▅██▇██▇▅▁▅▇█▅█▇██▅███▄▄▇▄▁▅▄▁▁▄ ▄
  149 ms           Histogram: frequency by time          174 ms <

 Memory estimate: 35.89 MiB, allocs estimate: 243040.

Current master with ogr_f_stealgeometry used in getcolumn

ulia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 140 samples with 1 evaluation.
 Range (min  max):  180.622 ms  344.234 ms  ┊ GC (min  max): 0.00%  14.01%
 Time  (median):     196.758 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   215.515 ms ±  45.714 ms  ┊ GC (mean ± σ):  3.48% ±  5.50%

   ▁▁ ▅▃█                                                        
  ▃████████▆▁▆▃▃▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▁▃▃▄▄▃▄▃▃▃▁▁▃ ▃
  181 ms           Histogram: frequency by time          337 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373879.

With new OGR parametric types and with ogr_f_stealgeometry used in getcolumn

julia> @benchmark Tables.columntable(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 147 samples with 1 evaluation.
 Range (min  max):  171.544 ms    1.002 s  ┊ GC (min  max): 0.00%  18.02%
 Time  (median):     187.186 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   204.451 ms ± 72.860 ms  ┊ GC (mean ± σ):  2.00% ±  3.25%

    ▃█▂▄                                                        
  ▅▆████▇▅▄▃▁▃▃▃▄▁▁▁▁▃▃▃▄▃▃▅▄▃▃▃▁▁▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  172 ms          Histogram: frequency by time          359 ms <

 Memory estimate: 7.75 MiB, allocs estimate: 279381.

1. ArchGDAL OGR related types moved to parametric composite types

Trying to dispatch on types required:

1.1 Creation of two parametric singleton types for fields and geomfields, and of a featuredefn like type alias

  • FType for fields, which is paramaterized by a tuple of OGRFieldType and OGRFieldSubType
  • GType for geomfields, which is parameterized by an OGRwkbGeometryType
  • FDType for featuredefn, which is a 2-tuple of a tuple of GType and a tuple of FType

Note 1: I have added convert functions from FType and DataType which yields a more readable correspondance between Julia DataTypes and GDAL Field type ids, than the current separated convert functions from OGRFieldType and OGRFieldSubType to DataType

Note 2: I have also added a commented draft of new Geometry parametric singleon type (named Geom) based on GType but not used it since I have not identified any benefit

1.2 Duplication of most of OGR related composite types and parameterization with FType, GType and FDType

The duplicated types have the FDP prefix

  • FDP_GeomFieldDefn and FDP_FieldDefn types are parameterized by GType and FType
  • FDP_FeatureDefn, FDP_Feature and FDP_FeatureLayer types are parameterized by FDType

All of them have been set as subtypes of corresponding abstract supertypes DUAL_AbstractXXX, in order to limit the duplication of functions code when not necessary.

Note: A plain FeatureDefn parametric composite type could later be used as type parameter for Feature and FeatureLayer

1.3 Modification of a few functions, used to show a layer and convert it to a table

  • getFDPlayer to handle a feature layer data source with all the newly introduced parametric composite types
  • Some functions, which do not have any gain on using the FDType information, have just been extended to accept DUAL_AbstractXXX
  • Most of the functions benefiting for the FDType information, have been duplicated as generated functions

Note: In the event of the unlikely use case where there would be too many generated functions (a lot of different data sources with different featuredefn) for compilation to be performed in a reasonable time or even to succeed, the generated functions could be moved to optionally-generated functions. The fallbacks being the legacy functions

1.4 Side additions

  1. I have added a macro to export Enum type id for user convenience. Disabled in the PR because it is not convenient when developping using revise.jl because it redefines the related constants. I have not found a way to inhibit the latter

  2. I have generalized the ownedby field in feature layer related types which may serve as a basis to fix issue Model and handle relationships between GDAL objects systematically #215, but @yeesian is tackling this one. For all intents and purposes here are below the relationships which are proposed:
    Ownership relations graph
    OGR Julia objects ownership relations diagram V1

    destroy functions for FDP_XXX types have been created accordingly

2. Steal geometry instead of cloning for the first geometry in features

There appears that Tables.rows(layer) makes a copy of the layer.
Since ogr_f_stealgeometry is faster than the composition of ogr_f_getgeomfieldref and ogr_g_clone, I have modified Tables.getcolumn to use a stealgeom function instead of the current getgeom function.

Unfortunately, ogr_f_getgeomfieldref is implemented in GDAL C interface only for the default (first) geometry. Datasources with only one geometry per feature is the general case, but that could be an issue to raise in GDAL.

3. Further performance optimizations to investigate, regarding the layer to table conversion performance

  • Include field and geomfieldnames in feature definition type FDType to avoid calling ogr_f_getfieldindex
  • Count the number of calls to ogr_l_getnextfeature to ensure there is no unnecessary call to it
    EDIT: Done only one extra call to iterate compared to the number of features when converting a layer to table

@mathieu17g mathieu17g linked an issue Nov 22, 2021 that may be closed by this pull request
@mathieu17g
Copy link
Collaborator Author

mathieu17g commented Nov 26, 2021

  • Include field and geomfield names in feature definition type FDType to avoid calling ogr_f_getfieldindex

I have changed the FeatureDefn type parameter of ArchGDAL OGR objects from

FDType = Tuple{NTuple{NG,GType} where NG,NTuple{NF,FType} where NF}

which is the same as

FDType = Tuple{
    Tuple{Vararg{GType, NG}} where NG, 
    Tuple{Vararg{FType, NF}} where NF,
}

to (thanks to https://discourse.julialang.org/t/parametric-type-with-a-constrained-namedtuple/34511)

FDType = Tuple{
    NamedTuple{NG,<:Tuple{Vararg{GType}}} where NG,
    NamedTuple{NF,<:Tuple{Vararg{FType}}} where NF,
}
  1. a way to dispatch on type in getfield (to fix issue Consider turning the enums into types for dispatch #246) which brings 5% performance gain to the latter

The total performance gain with with the addition of avoiding most of the calls to ogr_f_getfieldindex is now 10% compared to 1.

Current master:

@benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 142 samples with 1 evaluation.
 Range (min  max):  182.404 ms  335.852 ms  ┊ GC (min  max): 0.00%  10.29%
 Time  (median):     196.571 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   213.022 ms ±  37.221 ms  ┊ GC (mean ± σ):  3.27% ±  4.88%

   ▃█▅▄▂▄  ▄                                                     
  ▆██████▇██▇▃▃▁▃▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▃▃▅▄▇▁▃▄▆▃▅▃▁▁▁▁▁▁▁▁▁▃ ▃
  182 ms           Histogram: frequency by time          318 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373869.

Shapefile.jl:

julia> @benchmark Tables.columntable(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30BenchmarkTools.Trial: 198 samples with 1 evaluation.
 Range (min  max):  142.824 ms  163.894 ms  ┊ GC (min  max): 0.00%  5.48%
 Time  (median):     152.970 ms               ┊ GC (median):    4.10%
 Time  (mean ± σ):   152.064 ms ±   4.070 ms  ┊ GC (mean ± σ):  2.95% ± 2.46%

                  ▃▂▂▄ ▄               ▇▂█▂▄▄▅                   
  ▃▁▁▃▁▃▃▅▆▅▆▆▅█▇█████▅██▃▃▅▃▃▅▁█▆██▆▇████████▆▃▆▇▇▆▅▅▁▁▁▃▃▁▁▁▃ ▃
  143 ms           Histogram: frequency by time          161 ms <

 Memory estimate: 35.89 MiB, allocs estimate: 243040.

Current master with geometry stealing:

julia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 144 samples with 1 evaluation.
 Range (min  max):  178.243 ms  312.567 ms  ┊ GC (min  max): 0.00%  13.64%
 Time  (median):     191.755 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   208.443 ms ±  40.801 ms  ┊ GC (mean ± σ):  3.39% ±  5.23%

    █▂▁▆▅▂▂                                                      
  ▃████████▅▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▃▃▃▄▃▅▁▃▄▃▃▃▃ ▃
  178 ms           Histogram: frequency by time          312 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373879.

New ArchGDAL OGR types with geometry stealing:

julia> @benchmark Tables.columntable(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 169 samples with 1 evaluation.
 Range (min  max):  158.547 ms  245.862 ms  ┊ GC (min  max): 0.00%  7.86%
 Time  (median):     169.471 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   178.152 ms ±  22.915 ms  ┊ GC (mean ± σ):  1.48% ± 2.67%

   ▁▃▃▆▄▇█▁▃▂▆▁                                                  
  ▇████████████▇▄▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▅▇▃▃▆▄▅▃▃▁▃▁▁▁▃ ▃
  159 ms           Histogram: frequency by time          243 ms <

 Memory estimate: 7.75 MiB, allocs estimate: 279394.

Beyond the call to ogr_l_getnextfeature, the last step taking some time is the IGeometry constructor

@mathieu17g
Copy link
Collaborator Author

@yeesian I'm not able to push the commit associated to my previous comment into branch Enhance_ArchGDAL_types. I got the following message:

remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: At least 1 approving review is required by reviewers with write access.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'

Do you know what I did wrong ?

@yeesian
Copy link
Owner

yeesian commented Nov 26, 2021

I've updated the branch protection settings now, can you try again? Thanks!

@mathieu17g
Copy link
Collaborator Author

It stills fails, but I have a different message:

remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: Changes must be made through a pull request.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'

@mathieu17g
Copy link
Collaborator Author

Shall I try a force push with lease ?

@yeesian
Copy link
Owner

yeesian commented Nov 28, 2021

It stills fails, but I have a different message

I have further relaxed the branch protection rules -- I'm so sorry for the inconvenience so far -- can you try yet again? I'll get to your PRs soon; have been on the road for the past few days.

@mathieu17g
Copy link
Collaborator Author

Great, it works. I was able to push commit a146c79 above

@mathieu17g
Copy link
Collaborator Author

Continuing to shave off time in layer to table conversion, I noticed that _infergeomtype could be optimized by modifying convert functions between GDAL CEnum.Cenum types to AG Enum types.

New convert functions

function convert(::Type{OGRwkbGeometryType}, gogtinst::GDAL.OGRwkbGeometryType)
    return OGRwkbGeometryType(Integer(gogtinst))
end
function convert(::Type{GDAL.OGRwkbGeometryType}, ogtinst::OGRwkbGeometryType)
    return GDAL.OGRwkbGeometryType(Integer(ogtinst))
end

The convert functions code looks simpler and faster like this, and could be extended to other similar conversions between ArchGDAL Enum and GDAL CEenum.Cenum

This has the prerequisite to align OGRwkbGeometryType Enum instances' assigned values to those of GDAL.OGRwkbGeometryType. It should not have any side effect.

Results

  • With current master:
julia> geom = AG.fromWKT("POINT (1 2)")
Geometry: POINT (1 2)

julia> @benchmark AG._infergeomtype($(geom.ptr))
BenchmarkTools.Trial: 10000 samples with 949 evaluations.
 Range (min  max):  83.511 ns  209.497 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     97.937 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   95.838 ns ±  13.089 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▆ ▂▂       ▄█▂▁▅ ▂▂  ▁▁▁▃▁▁          ▁     ▂               ▂
  █▂█▄██▆▆▅▄█▆██████████▇███████▇▇▇█▇▇▇▇▇██▆▆▇▇█▆▆▆▇▆▅▆▆▄▅▅▆▅▆ █
  83.5 ns       Histogram: log(frequency) by time       143 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
  • With new convert functions:
julia> geom = AG.fromWKT("POINT (1 2)")
Geometry: POINT (1 2)

julia> @benchmark AG._infergeomtype($(geom.ptr))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min  max):  18.790 ns  78.432 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     19.063 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.519 ns ±  2.273 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▄ ▅  ▁                      ▂ ▂               ▁           ▂
  ███▅██▄█▃▅▄▃▄▁▁▄▃▁▃▁▃▁▁▁▁▁▃▁▁▁█▆█▁▁▁▁▁▁▁▁▁▃▁▁▆█▅█▆▆▇▆▅▅▆▆▆▇ █
  18.8 ns      Histogram: log(frequency) by time      29.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Next

I will go saving time with a trial to implement a specialized version of Tables.eachcolumns to strip some time from calls to ArchGDAL.getfield

@mathieu17g
Copy link
Collaborator Author

Next

I will go saving time with a trial to implement a specialized version of Tables.eachcolumns to strip some time from calls to ArchGDAL.getfield

I have copied and specialized Tables.jl/src/fallback.jl and Tables.jl/src/utils.jl. I had to use GeneralizedGenerated.jl to solve a world age problem with generated functions
And now we are closing in on Shapefile.jl performance for layer to table conversion

julia> display(@benchmark DataFrame(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30)
       display(@benchmark DataFrame(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30)
BenchmarkTools.Trial: 181 samples with 1 evaluation.
 Range (min  max):  146.779 ms  251.964 ms  ┊ GC (min  max): 0.00%  6.67%
 Time  (median):     157.170 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   165.837 ms ±  26.339 ms  ┊ GC (mean ± σ):  0.99% ± 2.04%

   ▂▇▅█▅▁▁ ▆▁▁                                                   
  ▅███████▇███▅▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▁▃▁▄▁▄▃▃▃▃ ▃
  147 ms           Histogram: frequency by time          251 ms <

 Memory estimate: 6.32 MiB, allocs estimate: 134898.
BenchmarkTools.Trial: 194 samples with 1 evaluation.
 Range (min  max):  146.609 ms  164.070 ms  ┊ GC (min  max): 0.00%  7.49%
 Time  (median):     152.686 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   154.683 ms ±   5.141 ms  ┊ GC (mean ± σ):  2.93% ± 3.05%

      ▂  ▄▃▂▅▃█▄▂▅                            ▄▂  ▇ ▃            
  ▃▁▅▁█▆▁█████████▇▆▅▆▃▇▅▁▁▁▁▁▁▁▅▁▁▁▁▃▃▅▅▆██████▇▇█▆█▆▃█▅▅▃▃▃▁▃ ▃
  147 ms           Histogram: frequency by time          164 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243065.

@mathieu17g
Copy link
Collaborator Author

@visr you mentioned in comment #238 (comment) that you had a case where layer to table conversion took a longtime

@mathieu17g we can discuss copies and such in a different issue. I still wanted to create one, because I saw that converting a layer to DataFrame for a large vector file took minutes.

Could you hand me the vector file example, to test it with this PR ?

@visr
Copy link
Collaborator

visr commented Dec 8, 2021

I believe in that case I was reading the HydroATLAS / BasinATLAS data. This is a direct link to the download: https://figshare.com/articles/dataset/HydroATLAS_version_1_0/9890531?file=20087237

One nice property of this dataset is that is has detail levels 1 to 12, which gradually get larger. Especially the higher levels started to get really slow I noticed.

@mathieu17g
Copy link
Collaborator Author

@visr Ok thanks. I will test the PR with this file set

@mathieu17g
Copy link
Collaborator Author

@visr I have reimplemented Tables.columns to avoid the long compilation time for cases like with HydroATLAS data

Results

  1. Faster than with Shapefile uncompiled or compiled, starting from level 6
  2. Nevermore than x2 slower uncompiled compared to Shapefile. The worst case being level 1 : 4s vs 2s on my mac (8 years old), which is still reasonable compared to current ArchGDAL master which takes around 150s for level 1+

Performance with HydroATLAS basin data level 6

Shapefile.jl

Uncompiled Shapefile:    7.719764 seconds (10.36 M allocations: 792.434 MiB, 6.68% gc time)
Compiled Shapefile:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min  max):  1.790 s    2.263 s  ┊ GC (min  max): 1.03%  10.02%
 Time  (median):     1.979 s              ┊ GC (median):    0.92%
 Time  (mean ± σ):   1.999 s ± 94.965 ms  ┊ GC (mean ± σ):  1.11% ±  1.29%

                     ▁▃▁█          ▁ ▃ ▁                     
  ▇▁▁▄▁▁▄▁▁▄▄▄▄▁▁▁▄▄▁████▄▄▁▄▄▄▁▁▄▄█▇█▄█▄▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  1.79 s         Histogram: frequency by time        2.26 s <

 Memory estimate: 501.84 MiB, allocs estimate: 5446511.

ArchGDAL 0.7.4

Uncompiled AG 0.7.4 :  155.592979 seconds (69.32 M allocations: 3.783 GiB, 0.67% gc time, 93.75% compilation time)
Compiled AG 0.7.4 :    BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min  max):  9.593 s     9.887 s  ┊ GC (min  max): 2.33%  2.87%
 Time  (median):     9.750 s               ┊ GC (median):    2.73%
 Time  (mean ± σ):   9.739 s ± 124.606 ms  ┊ GC (mean ± σ):  2.68% ± 0.22%

  █       █                      █              █          █  
  █▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  9.59 s         Histogram: frequency by time         9.89 s <

 Memory estimate: 443.71 MiB, allocs estimate: 10380858.

FeatureLayer with stealgeom and new (Tables.)columns

Uncompiled AG normal:    3.652381 seconds (11.98 M allocations: 291.840 MiB, 1.87% gc time, 38.22% compilation time)
Compiled AG normal:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min  max):  2.418 s     7.823 s  ┊ GC (min  max): 7.82%  6.07%
 Time  (median):     2.496 s               ┊ GC (median):    7.93%
 Time  (mean ± σ):   2.690 s ± 812.183 ms  ┊ GC (mean ± σ):  7.66% ± 0.72%

  █▄▄                                                         
  ███▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  2.42 s       Histogram: log(frequency) by time      7.82 s <

 Memory estimate: 179.96 MiB, allocs estimate: 10161324.

FDP_FeatureLayer with (Tables.)columns using plain @generated read function

Uncompiled AG FDP pg:    5.266566 seconds (11.17 M allocations: 492.128 MiB, 6.47% gc time, 3.22% compilation time)
Compiled AG FDP pg:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min  max):  1.458 s     3.087 s  ┊ GC (min  max): 1.19%  0.74%
 Time  (median):     1.634 s               ┊ GC (median):    1.31%
 Time  (mean ± σ):   1.694 s ± 244.873 ms  ┊ GC (mean ± σ):  1.28% ± 0.13%

       █▄ ▂                                                   
  ▅▁▁▇███▅█▆▆▅▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  1.46 s         Histogram: frequency by time         3.09 s <

 Memory estimate: 148.99 MiB, allocs estimate: 5379084.

Making of

I have kept all the Table.columns reimplementation trials in the code of the commit below for archive in case we have to come back to it later

I have dropped the use of immutable Tuples and the line by line column types widening in favor the use of mutable Vectors with wide types and post parsing for type narrowing as sketched for the schema build in issue #223

For the feature reading, I have tried to reimplement eachcolumns from Tables.jl/src/utils.jl with:

  • GeneralizedGenerated.jl:
  • RuntimeGeneratedFunction.jl:

But using plain generated function and get column turned out to be faster both uncompiled and compiled. It is counter intuitive, and maybe my use of GeneralizedGenerated.jl and RuntimeGeneratedFunction.jl could be enhanced (reason why I left the code in the commit)

Results for bigger HydroATLAS file: level 12

Shapefile 1st run:  173.921930 seconds (347.88 M allocations: 20.671 GiB, 11.56% gc time)
Shapefile 2nd run:  190.139144 seconds (342.95 M allocations: 20.386 GiB, 17.76% gc time)

AG FDP pg 1st run:   98.029999 seconds (353.07 M allocations: 9.472 GiB, 8.89% gc time, 0.19% compilation time)
AG FDP pg 2nd run:   93.194404 seconds (348.00 M allocations: 9.178 GiB, 8.46% gc time, 0.02% compilation time)

Results for road.shp (used in previous commits/comments)

Uncompiled Shapefile:    2.908215 seconds (5.19 M allocations: 328.995 MiB, 11.32% gc time)
Compiled Shapefile:    BenchmarkTools.Trial: 193 samples with 1 evaluation.
 Range (min  max):  146.924 ms  183.801 ms  ┊ GC (min  max): 0.00%  5.49%
 Time  (median):     156.321 ms               ┊ GC (median):    4.59%
 Time  (mean ± σ):   155.529 ms ±   5.481 ms  ┊ GC (mean ± σ):  2.91% ± 2.76%

       ▄ ▅▂▄▅▂▂ ▂                  ▄▇ █▄▇                        
  ▅▃▅▅▅██████████▆▆▅▃▅▁▃▃▆▁▅▅▆▃██▆▅██▅███▅▆██▅▆▅▅▅▆▃▁▃▃▅▁▁█▁▃▃▃ ▃
  147 ms           Histogram: frequency by time          167 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243067.

Uncompiled AG FDP pg:    2.505719 seconds (5.13 M allocations: 294.994 MiB, 3.47% gc time, 7.39% compilation time)
Compiled AG FDP pg:    BenchmarkTools.Trial: 161 samples with 1 evaluation.
 Range (min  max):  166.438 ms  263.748 ms  ┊ GC (min  max): 0.00%  7.81%
 Time  (median):     178.569 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   187.196 ms ±  25.490 ms  ┊ GC (mean ± σ):  1.39% ± 2.57%

   █▃▅▂▇  ▄▃  ▂                                                  
  ▆██████████▆█▆▄▄▃▃▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▃▃▄▁▅▃▁▁▃▃▃▁▄▃▃▁▃▁▅ ▃
  166 ms           Histogram: frequency by time          260 ms <

 Memory estimate: 8.17 MiB, allocs estimate: 292907.

@visr
Copy link
Collaborator

visr commented Dec 16, 2021

Great that you were able to improve the performance so much! I don't really have experience with generated / RuntimeGeneratedFunctions / GeneralizedGenerated, so it's hard for me to comment on it, other than that it's probably nice if we don't have to add a dependency. Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures? What was the reason to look into those packages?

@mathieu17g
Copy link
Collaborator Author

mathieu17g commented Dec 16, 2021

Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures?
What was the reason to look into those packages?

I had the world age issue when trying to call directly asXXX functions.

But the overhead of calling getindex on a vector of asXXX functions at runtime appears to be mitigated somehow by the use of Base.@generated compared to using a FeatureDefn tailored vector function (field1, ..., fieldn) -> (astypeof field1(field1), ..., astypeoffieldn(fieldn)) built with @RuntimeGeneratedFunction.

So the use of Base.@generated functions is faster and simpler. I think it’s good enough.

I’ll make a PR on master for the new Tables.columns for AbstractFeatureLayer

- Reverted from `stealgeom` to `getgeom` in `getcolumn` to repair Tables unit tests
- `Tables.schema` back to return `nothing` and `Schema` extraction functions from FeatureDefn called `gdal_schema` since the result is not a reliable `Table.Schema` (see Shapefile driver handling of mixed Line and MultiLine handling)
@mathieu17g
Copy link
Collaborator Author

Repairing Tables methods unit testing, I had to revert from stealgeom to getgeom.

The impact on performance is limited for HydroATLAS basin test files and French main roads test file (see tests results below). FDP layer to table conversion is on par or faster (for big files) compared to Shapefile.jl

stealgeom could still be interesting:

  • for files with a lot of geometry columns
  • provided GDAL implements OGR_F_StealGeometry(int iGeomField) besides OGR_F_StealGeometry()
  • if we add a Table object to ArchGDAL (as in Shapefile.jl), built directly on a file name and layer number, to avoid leaving a layer with missing (stolen) geometries after a call to Tables.columns or Tables.rows

HydroATLAS Basin level 12

Shapefile 1st run:  177.759011 seconds (347.88 M allocations: 20.670 GiB, 12.77% gc time)
Shapefile 2nd run:  186.639426 seconds (342.95 M allocations: 20.386 GiB, 17.35% gc time)
AG FDP pg 1st run:  103.441506 seconds (353.07 M allocations: 9.472 GiB, 9.15% gc time, 0.20% compilation time)
AG FDP pg 2nd run:   97.793840 seconds (348.00 M allocations: 9.178 GiB, 8.61% gc time, 0.01% compilation time)

HydroATLAS Basin level 6

Shapefile 1st run:    4.370634 seconds (10.36 M allocations: 792.293 MiB, 11.53% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min  max):  1.782 s     2.311 s  ┊ GC (min  max): 0.00%  14.18%
 Time  (median):     1.922 s               ┊ GC (median):    4.01%
 Time  (mean ± σ):   1.944 s ± 113.983 ms  ┊ GC (mean ± σ):  5.11% ±  3.67%

               ▃ █                                            
  ▇▁▁▁▁▁▁▁▇▁▇▇▁█▇█▇▁▇▇▁▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  1.78 s         Histogram: frequency by time         2.31 s <

 Memory estimate: 501.84 MiB, allocs estimate: 5446511.

AG FDP pg 1st run:    4.993936 seconds (10.46 M allocations: 450.400 MiB, 2.49% gc time, 3.47% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min  max):  1.553 s     1.979 s  ┊ GC (min  max): 1.75%  1.98%
 Time  (median):     1.839 s               ┊ GC (median):    1.91%
 Time  (mean ± σ):   1.813 s ± 132.176 ms  ┊ GC (mean ± σ):  1.95% ± 0.71%

  █    ▁                     ▁  ▁     █▁▁ █   ▁ ▁  ▁ ▁     █  
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁███▁█▁▁▁█▁█▁▁█▁█▁▁▁▁▁█ ▁
  1.55 s         Histogram: frequency by time         1.98 s <

 Memory estimate: 148.99 MiB, allocs estimate: 5379083.

French main roads file

Shapefile 1st run:    2.446287 seconds (5.19 M allocations: 329.013 MiB, 5.75% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 192 samples with 1 evaluation.
 Range (min  max):  147.897 ms  170.355 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     157.005 ms               ┊ GC (median):    4.41%
 Time  (mean ± σ):   156.399 ms ±   4.845 ms  ┊ GC (mean ± σ):  2.94% ± 2.80%

        ▂ ▁ ▂  ▁▇█▁   ▂ ▁           ▁ ▂      ▄▄▂▁▂  ▅            
  ▃▃▃▃▃▃█▅█▆█████████▃███▁▅▁▁▅▁▁▃▁▅▃█▅█▆▅▃█▆██████▃██▁█▆▃▅▅▅▆▁▅ ▃
  148 ms           Histogram: frequency by time          165 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243067.

AG FDP pg 1st run:    2.479176 seconds (5.13 M allocations: 295.079 MiB, 4.28% gc time, 8.29% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 153 samples with 1 evaluation.
 Range (min  max):  176.988 ms  274.031 ms  ┊ GC (min  max): 0.00%  7.41%
 Time  (median):     187.198 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   196.996 ms ±  24.662 ms  ┊ GC (mean ± σ):  1.46% ± 2.73%

    ▁█▇ ▂  ▁                                                     
  ▃▄█████▇▇█▄▅▅▇▃▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃▃▃▃▃▃▃▃▃▃▁▃▃ ▃
  177 ms           Histogram: frequency by time          264 ms <

 Memory estimate: 8.17 MiB, allocs estimate: 292898.

- Fixed `NamedTuple` creation in `Tables.columns` for Julia >= 1.7
- Fixed `_gtnames` for Julia >= 1.7
@yeesian yeesian requested review from visr and removed request for yeesian December 20, 2021 09:41
@yeesian
Copy link
Owner

yeesian commented Dec 20, 2021

I'm really sorry, but I'm going through a period of bereavement due to a personal loss, and nominate @visr to review this PR in my absence. If he might be unavailable to do so, I'll review it when I'm in the appropriate state to do so.

(At a high level, performance matters to me: I'm partial to both this PR and #243 and will allow for them to introduce breaking/experimental changes since it's still pre-1.0 and might give some directional information/guidance on workflows and APIs that enable performance gains. I'd review with an eye towards complexity and maintainability, but my preferences are ultimately non-blocking and I'm willing to postpone them until after @mathieu17g has explored the space of generated functions and unexploited GDAL functions that are important for optimization and might have implications for the API.)

@visr
Copy link
Collaborator

visr commented Dec 20, 2021

Really sorry to hear that @yeesian. Of course take all the time you need, I wish you strength during a difficult time.

I can probably help out a bit here with reviewing. As it is right now I don't fully understand the PR, but that also has to do with my limited experience with generated functions and since I didn't properly dive into the code yet. @mathieu17g hopefully when you come out of the (impressive) rabbit hole here the result will be sufficiently simple.

@mathieu17g
Copy link
Collaborator Author

@yeesian, I'm so sorry for your loss. Thank you for having taken the time to share your views with us in those difficult times.

As of this PR and @generated functions, they are quite adapted to the GDAL transposition to Julia for fields manipulation. They fill the performance gap with what a C compiler would optimize on field manipulations when doing something like building a dataframe.
It remains to be seen, if there is any gain outside this use case. If not we could end up confining the new parameterized types and the associated generated functions to the use case of direct conversion from geofiles to table source.

@mathieu17g
Copy link
Collaborator Author

mathieu17g commented Jan 14, 2022

@visr I have:

  • confined all the new parametric types, related methods, Table object and Tables.jl interface methods in a single file (src/tables2.jl)
  • kept what methods I could in the original files by using abstract types regrouping current non parametric types and new parametric types
  • added tests on Tables.jl interface and complemented the coverage with unit testing on parametric types and methods, including methods left in original files

Could you please have a look to it, and test its performance on your own.
BenchmarkCI judgement gives results a bit different from the ones I get locally, without any order change nonetheless.
Below a copy of commit 2ca3a4c results:

                                                                          ID            time GC time         memory allocations
  –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––––––– ––––––– –––––––––––––– –––––––––––
                              ["shapefile_to_table", "frenchroads_ArchGDAL"] 285.500 ms (5%)          8.43 MiB (1%)      373899
         ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface"] 238.089 ms (5%)          7.19 MiB (1%)      279339
  ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface_vsizip"] 366.701 ms (5%)          7.20 MiB (1%)      279381
                       ["shapefile_to_table", "frenchroads_ArchGDAL_vsizip"] 412.876 ms (5%)          8.43 MiB (1%)      373941
                             ["shapefile_to_table", "frenchroads_Shapefile"] 187.883 ms (5%)         35.39 MiB (1%)      243030

@mathieu17g mathieu17g marked this pull request as ready for review January 14, 2022 18:50
@visr
Copy link
Collaborator

visr commented Jan 15, 2022

@mathieu17g I'm a little bit lost. It would be of great help to me if you could write a high level summary of (1) the issue that this PR is trying to fix, and (2) how the proposed changes help fix this issue. I have only a rough understanding about ArchGDAL Tables performance issues. If there is a simple change that solves the largest performance bottleneck, but leaves another 10% on the table, that may well be preferred.

I see you introduce new parametrized structs besides the original unparametrized ones. Is that for now to isolate the changes, or is there a reason we'd have to keep both around?

@mathieu17g
Copy link
Collaborator Author

@visr sure, I will make a summary

@mathieu17g
Copy link
Collaborator Author

@visr this PR tries to solve two issues and introduces some more limited enhancements:

  1. Consider turning the enums into types for dispatch #246 by dispatching getfield on types: x2.5 speedup but complex parametric types are introduced
  2. your unreferenced issue concerning the first run performance on ArchGDAL table interface for HydroATLAS basin shapefiles: x50 speedup
  3. other enhancements:
    1. Geometry stealing instead of cloning: from x1.5 to x500 speedup
    2. Draft of dependency between GDAL OGR objects mapping in ArchGDAL, sketched on new parametric types
    3. Simplification of OGRwkbGeometryType conversions between GDAL and ArchGDAL, used in ArchGDAL Geometry and IGeometry constructors: x30 speedup

The parametric types introduced to solve the 1st issue can be:

  1. extended to replace current ArchGDAL OGR related types
  2. restricted to the table interface implementation and hidden within a Table struct

Since getfield is mainly used in the ArchGDAL table interface where we read and convert all fields of all features of a feature layer, the performance enhancement is mainly felt on the table interface. So option 2 allows to limit the ArchGDAL overall code complexity. The replacement of ArchGDAL OGR related types could be considered later should a new benefit arise. But if both @visr and @yeesian are confortable with this parametric types, we can go for a replacement now.

Follow-ups besides the parametric types handling:

  • Update the documentation (merge prerequisite)
  • Propose a PR on GDAL to implement OGR_F_StealGeometryEx( OGRFeatureH, int iGeomField ) (see issue), and update stealgeom(::FDP_AbstractFeature{FD}, ::Integer)
  • Raise an issue on Tables.jl shoud the HydroATLAS basin shapefiles case be encountered by other tools implementing the Tables.jl interface
  • Transpose the modelization of inter GDAL OGR objects done on parametric types to non parametric types to address Model and handle relationships between GDAL objects systematically #215
  • Further investigate the usage of runtime generated functions, to see if the overhead can be reduced

1. getfield dispatch on field types

  • getfield is called on Feature. Feature type does not expose any information on the field types. I added this information to the Feature type through a feature defintion like parameter to the Feature type.
    This gives:

    • FDP_xxx types, "FDP" standing for Feature Definition Parameterized. See 1st comment §1.4.2 above for details

    • a generated getfield function, which brings a x2.5 speedup

      code and benchmark

      @generated function getfield(
          fdp_feature::FDP_AbstractFeature{FD},
          i::Integer,
      ) where {FD<:FDType}
          return quote
              return if !isfieldset(fdp_feature, i)
                  nothing
              elseif isfieldnull(fdp_feature, i)
                  missing
              else
                  $(_get_fields_asfuncs(FD))[i+1](fdp_feature, i)
              end
          end
      end
      julia> layer = AG.getlayer(AG.read("benchmark/data/road/road.shp")) # You may have to run BenchmarkCI.judge() locally, to get the datafile available in ArchGDAL/benchmark/
             feature = iterate(layer)[1]
             @btime AG.getfield($feature, 0); nothing
        181.913 ns (1 allocation: 16 bytes)
      
      julia> fdp_layer = AG.getFDPlayer(AG.read("benchmark/data/road/road.shp"))
             fdp_feature = iterate(fdp_layer)[1]
             @btime AG.getfield($fdp_feature, 0); nothing
        73.403 ns (0 allocations: 0 bytes)

      Note that I tried to further enhance performance by dispatching on Val{i}(), but to no avail since the time spent on the Val contructor is roughly the same as the time saved by avoiding the getindex in getfield

  • When used in the table interface by a client via Tables.getcolumn or Tables.eachcolumn (SQLite.jl) or Tables.eachcolumns (DataFrames.jl), the speedup remains the more or less the same depending of the cases: x2.5

    table interface row (feature) retrieval benchmark

    julia> file = "benchmark/data/road/road.shp" # You may have to run BenchmarkCI.judge() locally, to get the datafile available in ArchGDAL/benchmark/
    "benchmark/data/road/road.shp"
      
    julia> layer = AG.getlayer(AG.read(file))
           feature = iterate(layer)[1]
           (;names, types) = AG.gdal_schema(layer)
           cols = [Vector{types[i]}(undef, 1) for i in 1:(AG.ngeom(feature) + AG.nfield(feature))]
           fill_row = (val::Any, i::Int, nm::Symbol) -> cols[i][1] = val
           sch = Tables.Schema(names, nothing)
           @btime Tables.eachcolumn($fill_row, $sch, $feature)
      3.274 μs (18 allocations: 368 bytes)
    
    julia> fdp_layer = AG.getFDPlayer(AG.read(file))
           fdp_feature = iterate(fdp_layer)[1]
           types = AG.gdal_schema(fdp_layer).types
           cols = [Vector{types[i]}(undef, 1) for i in 1:(AG.ngeom(fdp_feature) + AG.nfield(fdp_feature))]
           @btime AG.FDPf2c($fdp_feature, 1, $cols)
      1.416 μs (9 allocations: 224 bytes)

    Note that I tried to spare the time spent in getindex in Tables.eachcolumn and Tables.eachcolumns this time via runtime generated function (RuntimeGeneratedFunction.jl and GeneralizedGenerated.jl see src/tables_columns.jl), but to no avail. These runtime generated functions have an overhead bigger than the call to getindex when looping getfield on all the fields of a feature.
    THIS SHOULD PROBABLY BE FURTHER INVESTIGATED

Conclusion:

2. Table interface performance enhancement

@visr you raised the issue that with ArchGDAL ≤ v0.8.0, which uses Tables.jl fallback function Tables.columns with nothing as Schema, the reading of files with many fields was taking "minutes".
Your case takes on my mac: ~150s at the first run vs ~3s at the second

ArchGDAL v0.8.0 table interface uncompiled performance on HydroATLAS basin shapefile level 1

julia> using Pkg; Pkg.precompile(); Pkg.status("ArchGDAL")
       import ArchGDAL as AG
       using DataFrames; using BenchmarkTools; using Printf
       level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"
       @time DataFrame(AG.getlayer(AG.read(file), 0))      
       Status `~/.../Project.toml`
  [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL`
156.487261 seconds (55.14 M allocations: 3.081 GiB, 0.52% gc time, 99.42% compilation time)
10×295 DataFrame
 Row │                            HYBAS_ID    NEXT_DOWN  NEXT_SINK   MAIN_BAS    DIST_SINK  DIST_MAIN  SUB_AREA   UP_AREA    PFAF_ID 
     │ IGeometr                  Int64       Int64      Int64       Int64       Float64    Float64    Float64    Float64    Int64   
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbMultiPolygon  1010000010          0  1010000010  1010000010        0.0        0.0  2.99531e7  2.99531e7        1 
   2 │ Geometry: wkbMultiPolygon  2010000010          0  2010000010  2010000010        0.0        0.0  1.78589e7  1.78589e7        2
   3 │ Geometry: wkbMultiPolygon  3010000010          0  3010000010  3010000010        0.0        0.0  1.29491e7  1.29491e7        3
   4 │ Geometry: wkbMultiPolygon  4010000010          0  4010000010  4010000010        0.0        0.0  2.08274e7  2.08274e7        4
   5 │ Geometry: wkbMultiPolygon  5010000010          0  5010000010  5010000010        0.0        0.0  1.08038e7  1.08038e7        5 
   6 │ Geometry: wkbMultiPolygon  6010000010          0  6010000010  6010000010        0.0        0.0  1.78535e7  1.78535e7        6
   7 │ Geometry: wkbMultiPolygon  7010000010          0  7010000010  7010000010        0.0        0.0  1.59146e7  1.59146e7        7
   8 │ Geometry: wkbMultiPolygon  8010000010          0  8010000010  8010000010        0.0        0.0  6.19709e6  6.19709e6        8
   9 │ Geometry: wkbMultiPolygon  8010020760          0  8010020760  8010020760        0.0        0.0  1.17002e5  1.17002e5        3 
  10 │ Geometry: wkbMultiPolygon  9010000010          0  9010000010  9010000010        0.0        0.0  2.14672e6  2.14672e6        9
                                                                                                                   285 columns omitted

Raising the SPECIALIZATION_THRESHOLD and related values in Tables.jl/src/utils.jl (in @nexprs) above the number of fields (400 tried for the basin shapefile level 1 which has around 300 fields), does not bring any performance enhancement

ArchGDAL v0.7.4 table interface uncompiled performance on HydroATLAS basin shapefile level 1, with raised to 400 SPECIALIZATION_THRESHOLD and related values in Tables.jl v1.6.1

julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL")
       import ArchGDAL as AG
       using DataFrames; using BenchmarkTools; using Printf
       level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"
       @time DataFrame(AG.getlayer(AG.read(file), 0))
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
  No Changes to `~/.../Project.toml`
  No Changes to `~/.../Manifest.toml`
Precompiling project...
  18 dependencies successfully precompiled in 49 seconds (242 already precompiled)
      Status `~/.../Project.toml`
  [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL`
159.786486 seconds (61.80 M allocations: 3.383 GiB, 0.56% gc time, 99.45% compilation time)
10×295 DataFrame
 Row │                            HYBAS_ID    NEXT_DOWN  NEXT_SINK   MAIN_BAS    DIST_SINK  DIST_MAIN  SUB_AREA   UP_AREA    P 
     │ IGeometr                  Int64       Int64      Int64       Int64       Float64    Float64    Float64    Float64    I 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbMultiPolygon  1010000010          0  1010000010  1010000010        0.0        0.0  2.99531e7  2.99531e7    
   2 │ Geometry: wkbMultiPolygon  2010000010          0  2010000010  2010000010        0.0        0.0  1.78589e7  1.78589e7
   3 │ Geometry: wkbMultiPolygon  3010000010          0  3010000010  3010000010        0.0        0.0  1.29491e7  1.29491e7
   4 │ Geometry: wkbMultiPolygon  4010000010          0  4010000010  4010000010        0.0        0.0  2.08274e7  2.08274e7
   5 │ Geometry: wkbMultiPolygon  5010000010          0  5010000010  5010000010        0.0        0.0  1.08038e7  1.08038e7    
   6 │ Geometry: wkbMultiPolygon  6010000010          0  6010000010  6010000010        0.0        0.0  1.78535e7  1.78535e7
   7 │ Geometry: wkbMultiPolygon  7010000010          0  7010000010  7010000010        0.0        0.0  1.59146e7  1.59146e7
   8 │ Geometry: wkbMultiPolygon  8010000010          0  8010000010  8010000010        0.0        0.0  6.19709e6  6.19709e6
   9 │ Geometry: wkbMultiPolygon  8010020760          0  8010020760  8010020760        0.0        0.0  1.17002e5  1.17002e5    
  10 │ Geometry: wkbMultiPolygon  9010000010          0  9010000010  9010000010        0.0        0.0  2.14672e6  2.14672e6
                                                                                                             286 columns omitted

The performance bottleneck at first run is located in Tables.add_or_widen! (seen when profiling). Probably around the columm allocation machinery based on Tuple
PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL
If we force the usage of Tables.add! instead of Tables.add_or_widen!, by giving the Tables.Schema to Tables.buildcolumns in Tables.columns, the performance issue disappear (see below): 3.22s intead of ~150s on my mac

ArchGDAL v0.7.4 table interface uncompiled performance on HydroATLAS basin shapefile level 1, with a detailed schema specified

julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL")       
       import ArchGDAL as AG
       using DataFrames; using BenchmarkTools; using Printf; using Tables
       level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"
       sch = Tables.Schema((Symbol(""), :HYBAS_ID, :NEXT_DOWN, :NEXT_SINK, :MAIN_BAS, :DIST_SINK, :DIST_MAIN, :SUB_AREA, :UP_AREA, :PFAF_ID, :ENDO, :COAST, :ORDER_, :SORT, :dis_m3_pyr, :dis_m3_pmn, :dis_m3_pmx, :run_mm_syr, :inu_pc_smn, :inu_pc_umn, :inu_pc_smx, :inu_pc_umx, :inu_pc_slt, :inu_pc_ult, :lka_pc_sse, :lka_pc_use, :lkv_mc_usu, :rev_mc_usu, :dor_pc_pva, :ria_ha_ssu, :ria_ha_usu, :riv_tc_ssu, :riv_tc_usu, :gwt_cm_sav, :ele_mt_sav, :ele_mt_uav, :ele_mt_smn, :ele_mt_smx, :slp_dg_sav, :slp_dg_uav, :sgr_dk_sav, :clz_cl_smj, :cls_cl_smj, :tmp_dc_syr, :tmp_dc_uyr, :tmp_dc_smn, :tmp_dc_smx, :tmp_dc_s01, :tmp_dc_s02, :tmp_dc_s03, :tmp_dc_s04, :tmp_dc_s05, :tmp_dc_s06, :tmp_dc_s07, :tmp_dc_s08, :tmp_dc_s09, :tmp_dc_s10, :tmp_dc_s11, :tmp_dc_s12, :pre_mm_syr, :pre_mm_uyr, :pre_mm_s01, :pre_mm_s02, :pre_mm_s03, :pre_mm_s04, :pre_mm_s05, :pre_mm_s06, :pre_mm_s07, :pre_mm_s08, :pre_mm_s09, :pre_mm_s10, :pre_mm_s11, :pre_mm_s12, :pet_mm_syr, :pet_mm_uyr, :pet_mm_s01, :pet_mm_s02, :pet_mm_s03, :pet_mm_s04, :pet_mm_s05, :pet_mm_s06, :pet_mm_s07, :pet_mm_s08, :pet_mm_s09, :pet_mm_s10, :pet_mm_s11, :pet_mm_s12, :aet_mm_syr, :aet_mm_uyr, :aet_mm_s01, :aet_mm_s02, :aet_mm_s03, :aet_mm_s04, :aet_mm_s05, :aet_mm_s06, :aet_mm_s07, :aet_mm_s08, :aet_mm_s09, :aet_mm_s10, :aet_mm_s11, :aet_mm_s12, :ari_ix_sav, :ari_ix_uav, :cmi_ix_syr, :cmi_ix_uyr, :cmi_ix_s01, :cmi_ix_s02, :cmi_ix_s03, :cmi_ix_s04, :cmi_ix_s05, :cmi_ix_s06, :cmi_ix_s07, :cmi_ix_s08, :cmi_ix_s09, :cmi_ix_s10, :cmi_ix_s11, :cmi_ix_s12, :snw_pc_syr, :snw_pc_uyr, :snw_pc_smx, :snw_pc_s01, :snw_pc_s02, :snw_pc_s03, :snw_pc_s04, :snw_pc_s05, :snw_pc_s06, :snw_pc_s07, :snw_pc_s08, :snw_pc_s09, :snw_pc_s10, :snw_pc_s11, :snw_pc_s12, :glc_cl_smj, :glc_pc_s01, :glc_pc_s02, :glc_pc_s03, :glc_pc_s04, :glc_pc_s05, :glc_pc_s06, :glc_pc_s07, :glc_pc_s08, :glc_pc_s09, :glc_pc_s10, :glc_pc_s11, :glc_pc_s12, :glc_pc_s13, :glc_pc_s14, :glc_pc_s15, :glc_pc_s16, :glc_pc_s17, :glc_pc_s18, :glc_pc_s19, :glc_pc_s20, :glc_pc_s21, :glc_pc_s22, :glc_pc_u01, :glc_pc_u02, :glc_pc_u03, :glc_pc_u04, :glc_pc_u05, :glc_pc_u06, :glc_pc_u07, :glc_pc_u08, :glc_pc_u09, :glc_pc_u10, :glc_pc_u11, :glc_pc_u12, :glc_pc_u13, :glc_pc_u14, :glc_pc_u15, :glc_pc_u16, :glc_pc_u17, :glc_pc_u18, :glc_pc_u19, :glc_pc_u20, :glc_pc_u21, :glc_pc_u22, :pnv_cl_smj, :pnv_pc_s01, :pnv_pc_s02, :pnv_pc_s03, :pnv_pc_s04, :pnv_pc_s05, :pnv_pc_s06, :pnv_pc_s07, :pnv_pc_s08, :pnv_pc_s09, :pnv_pc_s10, :pnv_pc_s11, :pnv_pc_s12, :pnv_pc_s13, :pnv_pc_s14, :pnv_pc_s15, :pnv_pc_u01, :pnv_pc_u02, :pnv_pc_u03, :pnv_pc_u04, :pnv_pc_u05, :pnv_pc_u06, :pnv_pc_u07, :pnv_pc_u08, :pnv_pc_u09, :pnv_pc_u10, :pnv_pc_u11, :pnv_pc_u12, :pnv_pc_u13, :pnv_pc_u14, :pnv_pc_u15, :wet_cl_smj, :wet_pc_sg1, :wet_pc_ug1, :wet_pc_sg2, :wet_pc_ug2, :wet_pc_s01, :wet_pc_s02, :wet_pc_s03, :wet_pc_s04, :wet_pc_s05, :wet_pc_s06, :wet_pc_s07, :wet_pc_s08, :wet_pc_s09, :wet_pc_u01, :wet_pc_u02, :wet_pc_u03, :wet_pc_u04, :wet_pc_u05, :wet_pc_u06, :wet_pc_u07, :wet_pc_u08, :wet_pc_u09, :for_pc_sse, :for_pc_use, :crp_pc_sse, :crp_pc_use, :pst_pc_sse, :pst_pc_use, :ire_pc_sse, :ire_pc_use, :gla_pc_sse, :gla_pc_use, :prm_pc_sse, :prm_pc_use, :pac_pc_sse, :pac_pc_use, :tbi_cl_smj, :tec_cl_smj, :fmh_cl_smj, :fec_cl_smj, :cly_pc_sav, :cly_pc_uav, :slt_pc_sav, :slt_pc_uav, :snd_pc_sav, :snd_pc_uav, :soc_th_sav, :soc_th_uav, :swc_pc_syr, :swc_pc_uyr, :swc_pc_s01, :swc_pc_s02, :swc_pc_s03, :swc_pc_s04, :swc_pc_s05, :swc_pc_s06, :swc_pc_s07, :swc_pc_s08, :swc_pc_s09, :swc_pc_s10, :swc_pc_s11, :swc_pc_s12, :lit_cl_smj, :kar_pc_sse, :kar_pc_use, :ero_kh_sav, :ero_kh_uav, :pop_ct_ssu, :pop_ct_usu, :ppd_pk_sav, :ppd_pk_uav, :urb_pc_sse, :urb_pc_use, :nli_ix_sav, :nli_ix_uav, :rdd_mk_sav, :rdd_mk_uav, :hft_ix_s93, :hft_ix_u93, :hft_ix_s09, :hft_ix_u09, :gad_id_smj, :gdp_ud_sav, :gdp_ud_ssu, :gdp_ud_usu, :hdi_ix_sav),(AG.IGeometry{AG.wkbMultiPolygon}, Int64, Int64, Int64, Int64, Float64, Float64, Float64, Float64, Int64, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int64, Int64, Int32))
       @time DataFrame(Tables.CopiedColumns(Tables.buildcolumns(sch, AG.getlayer(AG.read(file), 0))))
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
  No Changes to `~/.../Project.toml`
  No Changes to `~/.../Manifest.toml`
      Status `~/.../Project.toml`
  [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL`
  3.229805 seconds (4.18 M allocations: 216.228 MiB, 13.59% gc time, 75.90% compilation time)
10×295 DataFrame
 Row │                            HYBAS_ID    NEXT_DOWN  NEXT_SINK   MAIN_BAS    DIST_SINK  DIST_MAIN  SUB_AREA   
     │ IGeometr                  Int64       Int64      Int64       Int64       Float64    Float64    Float64    
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbMultiPolygon  1010000010          0  1010000010  1010000010        0.0        0.0  2.99531e7  
   2 │ Geometry: wkbMultiPolygon  2010000010          0  2010000010  2010000010        0.0        0.0  1.78589e7
   3 │ Geometry: wkbMultiPolygon  3010000010          0  3010000010  3010000010        0.0        0.0  1.29491e7
   4 │ Geometry: wkbMultiPolygon  4010000010          0  4010000010  4010000010        0.0        0.0  2.08274e7
   5 │ Geometry: wkbMultiPolygon  5010000010          0  5010000010  5010000010        0.0        0.0  1.08038e7  
   6 │ Geometry: wkbMultiPolygon  6010000010          0  6010000010  6010000010        0.0        0.0  1.78535e7
   7 │ Geometry: wkbMultiPolygon  7010000010          0  7010000010  7010000010        0.0        0.0  1.59146e7
   8 │ Geometry: wkbMultiPolygon  8010000010          0  8010000010  8010000010        0.0        0.0  6.19709e6
   9 │ Geometry: wkbMultiPolygon  8010020760          0  8010020760  8010020760        0.0        0.0  1.17002e5  
  10 │ Geometry: wkbMultiPolygon  9010000010          0  9010000010  9010000010        0.0        0.0  2.14672e6
                                                                                                287 columns omitted

Unfortunately, we cannot know the tightest schema for sure from a layer feature definition (see the case of Shapefile driver with LineString and MultiLineString). Therefore we would have to provide a generic looser schema using:

  • Union{Missing, IGeometry} for geometry fields
  • Union{Missing, Nothing, <field's DataType>} for fields

and tighten it after the layer has been read. But this feature is not provided by Tables.jl. Therefore we have to implement Tables.columns as mentionned in my fist issue on ArchGDAL :-)

This leads to implementing on ArchGDAL.jl side, at least to handle the missing geometry case, something faster than buildcolumns(::Nothing, rowitr::T) where {T} -> _buildcolumns -> eachcolumns / add_or_widen! -> __buildcolumns. I doubt that it could be done easily.

Tables.columns(::AbstractFeatureLayer) is implemented in this PR in src/tables.jl. Here is what it does:

  • Get the layer's length
    GDAL layer length function is not called during the layer reading to avoid the iterator reset pitfall encountered with some drivers and as faced in Tables.jl fallback functions see comment
  • Retrieve an uncertain schema from the layer feature definition and loosen it as describe above
  • Initialize a 2D-collection
    I used a mutable Vector{Vector} instead of Tuple{Tuple} as in Tables.jl, for the simplicity and for I have not identified any significant performance reduction
  • Fills the 2D-collection with getcolumn
  • Tighten the column types
  • Convert the 2D-collection to a NamedTuple which natively supports the Tables.jl interface
    Note that the use of NamedTuple prevents extremely wide tables with # of columns > 67K which is due to Julia compiler limitation. Should a need arise for larger tables, Table struct would have to be modified to hold something larger

With this function the uncompiled performance is back to an execution time similar to the one of Tables.jl fallback Tables.columns with a specified schema: 3.37s on my mac

ArchGDAL table interface uncompiled performance on HydroATLAS basin shapefile level 1, with its own Tables.columns function implementation

julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL")       
       import ArchGDAL as AG       
       using DataFrames; using BenchmarkTools; using Printf       
       level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"       
       @time DataFrame(AG.getlayer(AG.read(file), 0))    
    Updating registry at `~/.julia/registries/General`    
    Updating git-repo `https://github.com/JuliaRegistries/General.git`    
    Updating `~/.../Project.toml`  
  [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL`  v0.8.0 `dev/ArchGDAL`    
    Updating `~/.../Manifest.toml`  
  [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL`  v0.8.0 `dev/ArchGDAL`
Precompiling project...
  1 dependency successfully precompiled in 11 seconds (259 already precompiled)
      Status `~/.../Project.toml`
  [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL`
  3.370118 seconds (5.11 M allocations: 281.298 MiB, 3.18% gc time, 76.13% compilation time)
10×295 DataFrame
 Row │                            HYBAS_ID    NEXT_DOWN  NEXT_SINK   MAIN_BAS    DIST_SINK  DIST_MAIN  SUB_AREA   UP 
     │ IGeometr                  Int64       Int64      Int64       Int64       Float64    Float64    Float64    Fl 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbMultiPolygon  1010000010          0  1010000010  1010000010        0.0        0.0  2.99531e7  2. 
   2 │ Geometry: wkbMultiPolygon  2010000010          0  2010000010  2010000010        0.0        0.0  1.78589e7  1.
   3 │ Geometry: wkbMultiPolygon  3010000010          0  3010000010  3010000010        0.0        0.0  1.29491e7  1.
   4 │ Geometry: wkbMultiPolygon  4010000010          0  4010000010  4010000010        0.0        0.0  2.08274e7  2.
   5 │ Geometry: wkbMultiPolygon  5010000010          0  5010000010  5010000010        0.0        0.0  1.08038e7  1. 
   6 │ Geometry: wkbMultiPolygon  6010000010          0  6010000010  6010000010        0.0        0.0  1.78535e7  1.
   7 │ Geometry: wkbMultiPolygon  7010000010          0  7010000010  7010000010        0.0        0.0  1.59146e7  1.
   8 │ Geometry: wkbMultiPolygon  8010000010          0  8010000010  8010000010        0.0        0.0  6.19709e6  6.
   9 │ Geometry: wkbMultiPolygon  8010020760          0  8010020760  8010020760        0.0        0.0  1.17002e5  1. 
  10 │ Geometry: wkbMultiPolygon  9010000010          0  9010000010  9010000010        0.0        0.0  2.14672e6  2.
                                                                                                   287 columns omitted

Conclusion: Performance issue at first run solved for files of similar to the HydroATLAS basin shapefiles (lots of fields): time is now on par with the standard Tables.jl fallback functions with full Tables.Schema provided

3. Other enhancements

3.1. Geometry stealing instead of cloning

When the source does not have to be preserved, stealing geometry is faster. The speedup increases with the geometry size. From x1.5 (Point) up to x500 (big MultiPolygon as in HydroATLAS basin shapefile level 1)

Geometry stealing vs cloning benchmark

julia> import ArchGDAL as AG
       using GDAL, BenchmarkTools, Printf

julia> file = "dev/ArchGDAL/test/data/point.geojson"
       layer = AG.getlayer(AG.read(file))
       @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1])
       @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1])
       nothing
  350.620 ns (3 allocations: 48 bytes)
  211.821 ns (3 allocations: 48 bytes)

julia> level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"
       layer = AG.getlayer(AG.read(file))
       @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1])
       @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1])
       nothing
  676.252 μs (3 allocations: 48 bytes)
  1.159 μs (3 allocations: 48 bytes)

Since the geometry left when stealing is a NULL geometry, stealgeom can only used when the source layer is transient (see example in src/tables2.jl on Table)

Unfortunately, this function is available only for the first geometry on GDAL C interface via OGR_F_StealGeometry(). An issue has been raised and accepted on GDAL side, waiting for a PR to be proposed

3.2. Dependency between GDAL OGR objects modeled in new parametric types

Below a trial for issue #215 being tackled by @yeesian
OGR Julia objects ownership relations diagram V1

3.3. Simplification and speedup of convert for GDAL.OGRwkbGeometryType and ArchGDAL.OGRwkbGeometryType

I have aligned UInt32 values assigned in OGRwkbGeometryType enums between GDAL and ArchGDAL. This allows to avoid the @convert macro, and instead to call directly the enum's constructor on values for conversion
It brings a x30 speedup, propagated to _infergeomtype and the Geometry and IGeometry construtors

OGRwkbGeometryType conversion benchmark between GDAL and ArchGDAL

V0.8.0 master branch:

julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon)
       @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon)
       nothing
  63.165 ns (0 allocations: 0 bytes)
  63.103 ns (0 allocations: 0 bytes)

Enhance_ArchGDAL_types branch:

julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon)
       @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon)
       nothing
  2.744 ns (0 allocations: 0 bytes)
  0.025 ns (0 allocations: 0 bytes)

@visr
Copy link
Collaborator

visr commented Jan 19, 2022

Thanks a lot @mathieu17g for taking the time to write down all your findings! I'll get back to it when I have some more time.

@mathieu17g
Copy link
Collaborator Author

Hi @visr, did you had any time lately to look into this PR ?
I would prefer that we decide on this one before finishing PR #243.

In the mean time, I proposed a PR in GDAL to be able to extend the geometry stealing performance gain beyond the the first geometry. It is still under review. Once merged, I will see to propagate it all the way to ArchGDAL.

@visr
Copy link
Collaborator

visr commented Feb 9, 2022

@mathieu17g thanks for the bump, and sorry I did not yet sit down to get the proper look that this work deserves. I hope to get to this soon.

I see OSGeo/gdal#5196 landed, nice work there!

Regarding this bit of your writeup:

PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL

Perhaps it's a good idea to go ahead and raise that issue, with a reproducible example for them to test the performance? I've seen the same functions show up in my earlier profiles, but couldn't really figure out how to improve it.

@mathieu17g
Copy link
Collaborator Author

mathieu17g commented Feb 13, 2022

@visr I have proposed in JuliaData/Tables.jl#273 solution to overcome the performance issue at first run. But the current PR implementation of Tables.columns will still be faster by a factor of x5 at first run and > x5 for subsequent runs (tested on HydroBASIN level 7)

@visr
Copy link
Collaborator

visr commented Feb 22, 2022

Awesome, I tried your inlining change from JuliaData/Tables.jl#273 (comment) as well.

On level 4 BasinATLAS I see similar results, around 100x speedup of compilation, and no real effect on runtime, only some less allocations and less time in gc. Noted this was just a simple repeated at-time exercise.

# level 4, 1st run on columntable: 178.628898 seconds (55.54 M allocations: 3.079 GiB, 0.44% gc time, 98.50% compilation time)
# level 4, 2nd run on columntable:   2.119344 seconds (912.34 k allocations: 222.733 MiB, 3.29% gc time)
# level 4, 3rd run on columntable:    2.293723 seconds (912.34 k allocations: 222.733 MiB, 2.50% gc time)
# level 4, 4th run on columntable:    2.434573 seconds (917.14 k allocations: 223.011 MiB, 6.13% gc time, 0.31% compilation time)
# level 4, 5th run on columntable:    2.177187 seconds (912.34 k allocations: 222.733 MiB, 3.34% gc time)
# level 4, 1st run on inlined columntable: 4.830689 seconds (4.07 M allocations: 196.590 MiB, 1.73% gc time, 49.77% compilation time)
# level 4, 2nd run on inlined columntable: 2.478992 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 3rd run on inlined columntable: 2.294762 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 4th run on inlined columntable: 2.247424 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 5th run on inlined columntable: 2.424489 seconds (827.06 k allocations: 22.226 MiB)

Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!

Good to know that this PR would still improve the situation futher at least 5x. Though we'd have to weigh it against the added complexity of course, it's not a one line change that gives 100x. So let's start there?

@mathieu17g
Copy link
Collaborator Author

@visr

Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!

I have made the PR JuliaData/Tables.jl#275 to solve the compilation performance issue JuliaData/Tables.jl#273 and another one seen during the investigation on column ordering when using dictcolumntable JuliaData/Tables.jl#274

@visr
Copy link
Collaborator

visr commented Feb 23, 2022

Great, thanks! Though perhaps better to split it into two separate PR's since the two changes are unrelated. The inlining change is straightforward. For the dicts I guess the main question is do they want to take on OrderedCollections.jl as a new dependency.

@mathieu17g
Copy link
Collaborator Author

Ok let's wait for some feedback from the @quinnj

@quinnj
Copy link

quinnj commented Feb 24, 2022

Sorry for the slowness; I've been slammed w/ non-julia things lately. I'll try to dig in tonight.

@maxfreu
Copy link
Contributor

maxfreu commented Aug 8, 2022

It seems like a lot of work has gone into this PR. What's the current state? This shouldn't go to waste :)

@rafaqz
Copy link
Collaborator

rafaqz commented Aug 8, 2022

Some of this is superceded by #316

There the enum integers are used in the type directly. Its simple and works. But we could also swap them to real types, that would be a minor change (but no performance improvement).

I tried to use this PR but it contains a lot of unrelated ideas. It should be cleanly separated into separate PRs

@maxfreu
Copy link
Contributor

maxfreu commented Aug 9, 2022

Ok, to me it looks as if #316 will be easier to merge and comes first. @mathieu17g can you somehow separate the ideas you had into separate PRs, which are more digestible on their own and hence easier to merge? E.g. one for table interface improvement, one for geometry stealing, one for better convert functions (if they can be separated like that). This can maybe be done best after #316 has been merged. Or do you have any objections against merging #316 and arguments for going further with your version (which looks more complicated after a superficial read)?

@yeesian
Copy link
Owner

yeesian commented Aug 10, 2022

I think there's a lot of great ideas here (for not only improvements but also diagrams and benchmarking) that I'm happy to keep it as-is (without deleting or merging) and closing it only after we feel we've made progress on the items that we'd like to incorporate.

@visr visr removed their request for review December 19, 2022 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider turning the enums into types for dispatch
6 participants