Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parse method #392

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KronosTheLate
Copy link

Motivation:

julia> using CategoricalArrays

julia> x = categorical(["2", "1"], ordered=true)
2-element CategoricalArray{String,1,UInt32}:
 "2"
 "1"

julia> parse(Int64, x[1])
ERROR: MethodError: no method matching parse(::Type{Int64}, ::CategoricalValue{String, UInt32})
Closest candidates are:
  parse(::Type{T}, ::AbstractChar; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:40
  parse(::Type{T}, ::AbstractString; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:240
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:379
Stacktrace:
 [1] top-level scope
   @ REPL[176]:1

julia> parse.(Int64, x)
ERROR: MethodError: no method matching parse(::Type{Int64}, ::CategoricalValue{String, UInt32})
Closest candidates are:
  parse(::Type{T}, ::AbstractChar; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:40
  parse(::Type{T}, ::AbstractString; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:240
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:379
Stacktrace:
 [1] _broadcast_getindex_evalf
   @ .\broadcast.jl:670 [inlined]
 [2] _broadcast_getindex
   @ .\broadcast.jl:653 [inlined]
 [3] getindex
   @ .\broadcast.jl:597 [inlined]
 [4] copy
   @ .\broadcast.jl:899 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Int64}}, CategoricalVector{String, UInt32, String, CategoricalValue{String, UInt32}, Union{}}}})
   @ Base.Broadcast .\broadcast.jl:860
 [6] top-level scope
   @ REPL[177]:1

julia> import Base.parse

julia> parse(T::DataType, catval::CategoricalValue) = parse(T, unwrap(catval))
parse (generic function with 17 methods)

julia> parse(Int64, x[1])
2

julia> parse.(Int64, x)
2-element Vector{Int64}:
 2
 1

I am not sure if I added the code in the correct place, or if it makes more sense somewhere else. I also don't know if Base.parse would need to be explicitly imported. I just tried to do what has been done is the surrounding lines.

Motivation:

```
julia> using CategoricalArrays

julia> x = categorical(["2", "1"], ordered=true)
2-element CategoricalArray{String,1,UInt32}:
 "2"
 "1"

julia> parse(Int64, x[1])
ERROR: MethodError: no method matching parse(::Type{Int64}, ::CategoricalValue{String, UInt32})
Closest candidates are:
  parse(::Type{T}, ::AbstractChar; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:40
  parse(::Type{T}, ::AbstractString; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:240
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:379
Stacktrace:
 [1] top-level scope
   @ REPL[176]:1

julia> parse.(Int64, x)
ERROR: MethodError: no method matching parse(::Type{Int64}, ::CategoricalValue{String, UInt32})
Closest candidates are:
  parse(::Type{T}, ::AbstractChar; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:40
  parse(::Type{T}, ::AbstractString; base) where T<:Integer at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:240
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\parse.jl:379
Stacktrace:
 [1] _broadcast_getindex_evalf
   @ .\broadcast.jl:670 [inlined]
 [2] _broadcast_getindex
   @ .\broadcast.jl:653 [inlined]
 [3] getindex
   @ .\broadcast.jl:597 [inlined]
 [4] copy
   @ .\broadcast.jl:899 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Int64}}, CategoricalVector{String, UInt32, String, CategoricalValue{String, UInt32}, Union{}}}})
   @ Base.Broadcast .\broadcast.jl:860
 [6] top-level scope
   @ REPL[177]:1

julia> import Base.parse

julia> parse(T::DataType, catval::CategoricalValue) = parse(T, unwrap(catval))
parse (generic function with 17 methods)

julia> parse(Int64, x[1])
2

julia> parse.(Int64, x)
2-element Vector{Int64}:
 2
 1
```

I am not sure if I added the code in the correct place, or if it makes more sense somewhere else. I also don't know if `Base.parse` would need to be explicitly imported. I just tried to do what has been done is the surrounding lines.
@bkamins
Copy link
Member

bkamins commented Apr 29, 2022

The only problem is that parse has the contract:

Parse a string as a number

and CategoricalValue is not a string.

However, maybe we can add this as a convenience function. Let us wait for @nalimilan to comment. (as you have noted in your implementation it is already quite easy to achieve what you want just by using unwrap so the question is if it is worth to add it)

@nalimilan
Copy link
Member

At some point CategoricalValue used to be an AbstractString and it implemented all of its interface, but we stopped doing this to support wrapping other types (see discussion at #77). So I don't think we should implement this, as there's no reason to implement parse but not other string functions. I agree it's a bit annoying to have to use unwrap manually, but there's no good alternative currently as we cannot have CategoricalValue{String} <: AbstractString without having CategoricalValue{T} <: AbstractString in general, which creates problems for non-string T.

@KronosTheLate
Copy link
Author

I mean, there is also the Meta.parse function, which does something completely different than Base.parse ("Parse a string as a number"). Why would it be worse to add a third interpretation of parse, which takes a different second-argument input of type CategoricalValue{String}, to do what makes intuitive sense? Which in this case would be what I have defined ( I presume most people would agree). It does not have to be the same parse function as the one in base - I would rather consider it a different function with similar functionality, which to me only add to the seamlessness made possible by multiple dispatch.

And because the functionality is still the same (namely to take a text representation of a number and parse it as a literal), I find it completely appropriate to extend Base.parse.

So to me, it is not a problem that CategoricalValue is not a subtype of String to make this work - rather, the fact that Base.parse can so effortlessly and seamlessly be extended to a type that is not an AbstractString is a key benefit of multiple dispatch. To me, this is great and worthy of celebration, and not at all a problem.

As for other types T in CategoricalValue{T}, I do not see the problem. They do to me not influence this issue. Extending functions that take strings to also take CategoricalValue{String} just makes sense, and is worth adding. It is reasonable to expect it to work, and there is not another interpretation of what should happen that makes more sense (please dispute this point if you disagree).

Adding a single sentence "Base functions that operate on strings are extended to categorical values of type string" somewhere in the docs would make the entire thing transparent to the user, and I can not think of any reason why anyone would have an issue with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants