Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check referential equality before comparing the content when checking String equality. #27790

Merged
merged 5 commits into from
Jan 18, 2019

Conversation

oxinabox
Copy link
Contributor

@oxinabox oxinabox commented Jun 26, 2018

This should work because of:
deaf668
for
#22193

I think we might be able to just delete the definition of == all together as isn't === the fallback?
Edit: No, it will fallback to ==(a::AbstractString, b::AbstractString) first

My microbenchmarks show a fairly solid improvement.
From what I understand of deaf668
this is basically just moving the code from julia into C.
but it is code that was already moved into C, it is just the julia code was left behind.

Here are my benchmarks:

julia> a1 = "a"^150
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

julia> a2 = lowercase("A")^150
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

julia> @btime a1 === a2
  12.950 ns (0 allocations: 0 bytes)
true

julia> @btime a1 == a2
  28.680 ns (0 allocations: 0 bytes)
true

###

julia> @btime a1 === a1
  4.466 ns (0 allocations: 0 bytes)
true

julia> @btime a1 == a1
  25.870 ns (0 allocations: 0 bytes)
true

##############

julia> b1 = "a"^5
"aaaaa"

julia> b2 = lowercase("A")^5
"aaaaa"

julia> @btime b1 === b2
  7.804 ns (0 allocations: 0 bytes)
true

julia> @btime b1 == b2
  22.511 ns (0 allocations: 0 bytes)
true
##

julia> @btime b1 === b1
  4.189 ns (0 allocations: 0 bytes)
true

julia> @btime b1 == b1
  20.868 ns (0 allocations: 0 bytes)
true

############

julia> @btime b1 === a1
  5.589 ns (0 allocations: 0 bytes)
false

julia> @btime b1 == a1
  19.065 ns (0 allocations: 0 bytes)
false

############

julia> z2 = lowercase("z")^150
"zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"

julia> @btime z2 === a1
  11.132 ns (0 allocations: 0 bytes)
false

julia> @btime z2 == a1
  23.736 ns (0 allocations: 0 bytes)
false

@KristofferC
Copy link
Sponsor Member

@nanosoldier runbenchmarks("string" || "shootout" || "problem", vs = ":master")

@oxinabox
Copy link
Contributor Author

Ah the speedup for a1 === a1 over a1 === a2 is because
jl_egal checks for pointer equality first.
Before the memcmp
deaf668#diff-757551c8929efe195c9bab04f9925d05R111

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

@nalimilan
Copy link
Member

Ah the speedup for a1 === a1 over a1 === a2 is because
jl_egal checks for pointer equality first.
Before the memcmp
deaf668#diff-757551c8929efe195c9bab04f9925d05R111

Maybe better add that pointer check to the Julia method then? Thanks to inlining and avoiding a few type-related checks, it could be even faster.

@oxinabox
Copy link
Contributor Author

oxinabox commented Jun 26, 2018

Maybe better add that pointer check to the Julia method then?

Could do, but that doesn't fully explain the performance improvement.
As a1===a2is also much faster.
Not entirely sure what is up with that.

Nanosoldier saw no significant changes. I am guessing because it does mostly substring comparisons.

@martinholters
Copy link
Member

There is a checked conversion from Int to UInt for the length in ==, but that is just a tiny part of the difference.

@Keno
Copy link
Member

Keno commented Jun 26, 2018

Since String has special egality, this is probably the right fix, with any performance improvements to be addressed by adding specialized codegen. Would be good to know what the difference is though.

@chethega
Copy link
Contributor

chethega commented Jun 27, 2018

For me this looks like a benchmarking artifact.

julia> using BenchmarkTools
julia> a="a"^150;
julia> str2=[a * "$(rand(1:99))" for i=1:10_000];
julia> str1=[a * "$(rand(1:99))" for i=1:10_000];
julia> res=Vector{Bool}(undef, 10_000);

julia> @btime broadcast!(==,$res,$str1,$str2);
  220.631 μs (0 allocations: 0 bytes)
julia> @btime broadcast!(===,$res,$str1,$str2);
  238.878 μs (0 allocations: 0 bytes)

edit: Also, shouldn't we expect jl_egal to be slower because the entry-point on the C side still needs to look at the type-tags (which is redundant on the julia side)? And I would expect weird microbenchmarks for egal, because its lowering is afaik special.

For a more extreme example, consider

julia> str1=["$(rand(10:99))" for i=1:10_000];
julia> str2=["$(rand(10:99))" for i=1:10_000];
julia> @btime broadcast!(===,$res,$str1,$str2);
  87.702 μs (0 allocations: 0 bytes)
julia> @btime broadcast!(==,$res,$str1,$str2);
  66.291 μs (0 allocations: 0 bytes)

edit2: So I'd advocate for

julia> function _streq(a::String, b::String)
       pointer_from_objref(a)==pointer_from_objref(b) && return true
       al=sizeof(a)
       al == sizeof(b) && 0 == ccall(:memcmp, Int32, (Ptr{UInt8}, Ptr{UInt8}, UInt), a, b, al)
       end

@ararslan ararslan added the domain:strings "Strings!" label Jun 28, 2018
@oxinabox
Copy link
Contributor Author

oxinabox commented Jan 2, 2019

Ok, I put in @chethega 's code.

As @nalimilan said

Maybe better add that pointer check to the Julia method then? Thanks to inlining and avoiding a few type-related checks, it could be even faster.

@oxinabox
Copy link
Contributor Author

oxinabox commented Jan 4, 2019

Can someone kick the CI?

@fredrikekre fredrikekre closed this Jan 4, 2019
@fredrikekre fredrikekre reopened this Jan 4, 2019
@oxinabox
Copy link
Contributor Author

oxinabox commented Jan 5, 2019

Do we want to rerun @nanosoldier ? or are we happy to just merge this?
I think@nanonsolider has no meaningful tests of string performance (only substring performance)

@oxinabox
Copy link
Contributor Author

Bump

StefanKarpinski and others added 2 commits January 18, 2019 16:53
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
@StefanKarpinski
Copy link
Sponsor Member

Please squash/merge whenever CI passes.

@ararslan ararslan merged commit eb18db5 into JuliaLang:master Jan 18, 2019
@StefanKarpinski StefanKarpinski added backport 1.1 status:triage This should be discussed on a triage call and removed backport 1.1 labels Jan 31, 2019
@thautwarm
Copy link
Member

thautwarm commented Aug 9, 2019

Hello, === seems not working in Julia 1.1.1(but okay with 1.1.0).

let z = Dict{Symbol, Any}(:class=>"Module"), Main855_1218 = z
                   function Main858_1221(Main856_1219::Dict)
                       Main857_1220 = (get)(Main856_1219, :class) do 
                           nothing
                       end
                       @info :bug Main857_1220 typeof(Main857_1220) Main857_1220 === "Module"
                       if Main857_1220 === "Module"
                           1
                       else
                           nothing
                       end
                   end
                   function Main858_1221(Main856_1219)
                       nothing
                   end
                   Main854_1217 = Main858_1221(Main855_1218)
                   if Main854_1217 === nothing
                       throw("failed")
                   else
                       Main854_1217
                   end
end
┌ Info: bug
│   Main857_1220 = "Module"typeof(Main857_1220) = String
└   Main857_1220 === "Module" = false
ERROR: "failed"
Stacktrace:
 [1] top-level scope at REPL[60]:19

This occurred in my packages and I managed to present the bug in codes without introducing my packages.

@thautwarm
Copy link
Member

Hence now Py2Jl.jl only works in 1.1.0...

@nalimilan
Copy link
Member

@thautwarm This PR isn't included in 1.1.1 AFAICT, and anyway it doesn't change the behavior of ===. Can you file a separate issue?

FWIW, your example code also fails on 1.1.0 here (it works on master, though). Also, why not use ==?

@StefanKarpinski
Copy link
Sponsor Member

That issue seems unrelated to this PR but is definitely a regression. Please open a separate issue.

@StefanKarpinski
Copy link
Sponsor Member

Issue has already been opened here: #32842.

@oxinabox oxinabox changed the title use === for String, String equality. Check referential equality before comparing the content when checking String equality. Aug 9, 2019
@Keno Keno removed the status:triage This should be discussed on a triage call label Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet