Unique value classifier for categorical maps with distinct colors for large number of categories. #173

sjsrey · 2023-02-09T00:46:52Z


jGaboardi · 2023-02-09T01:11:56Z

Does distinctipy need to be added to the .ci/ environments?

jGaboardi · 2023-02-09T01:37:18Z

Looks like the env for 310 wasn't updated.
Once we get CI passing, we'll probably want to add a test (or tests) and add the notebook to the tutorials list.

codecov · 2023-02-09T01:55:38Z

Codecov Report

Merging #173 (f7476b1) into main (3c2bb92) will decrease coverage by 2.3%.
The diff coverage is 21.1%.

@@           Coverage Diff           @@
##            main    #173     +/-   ##
=======================================
- Coverage   88.5%   86.2%   -2.3%     
=======================================
  Files          8       8             
  Lines       1070    1108     +38     
=======================================
+ Hits         947     955      +8     
- Misses       123     153     +30

Impacted Files                Coverage Δ
mapclassify/__init__.py       `100.0% <ø> (ø)`
mapclassify/classifiers.py    `85.2% <21.1%> (-2.7%)`  ⬇️
mapclassify/greedy.py         `92.1% <0.0%> (ø)`

martinfleis · 2023-02-09T09:49:41Z

To be fair, I don't really see a need for it, especially in mapclassify. It doesn't do anything on the classification front and the only possible benefit over calling distinctipy directly in geopandas is a custom legend with counts. If you are interested only in a categorical plot with N distinct colours, this will do the trick.

gdf.plot('STATE_NAME', cmap=distinctipy.get_colormap(distinctipy.get_colors(gdf.STATE_NAME.nunique())))

If the main functionality you are interested in here is the plot method, then this should live in splot, not here.
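A note on why the one-liner above works: distinctipy.get_colors(n) returns n RGB tuples, and, as we understand it, distinctipy.get_colormap wraps them in a matplotlib ListedColormap. The sketch below uses ListedColormap directly; the hard-coded RGB tuples are a stand-in for distinctipy output, and the assumption that a categorical plot samples the colormap at N evenly spaced points in [0, 1] (one per category) is ours, not something stated in this thread.

```python
import numpy as np
from matplotlib.colors import ListedColormap

# Stand-in for distinctipy.get_colors(4): any list of N distinct RGB tuples works here
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.5, 1.0), (1.0, 0.5, 0.0)]
cmap = ListedColormap(colors)

# One evenly spaced sample per category (assumed to mirror how N classes are coloured)
n = len(colors)
rgba = cmap(np.linspace(0, 1, n))

# Each category receives its own colour, in order; alpha defaults to 1.0
print(rgba[:, :3])
```

Because the colormap has exactly as many entries as there are categories, no two categories share a colour, which is the property the default 10- or 20-colour qualitative maps cannot provide for hundreds of classes.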

knaaptime

This would be super useful for a lot of applications (e.g., our neighborhood delineation over in geosnap, where you can have a few hundred neighborhoods in a single metro, but the mpl colormaps don't give enough variation).

I see your point, Martin, but I'd just add that IMO the utility of geopandas using mapclassify under the hood is that I don't need to know or care about distinctipy as a user (and I definitely don't want to have to remember cmap=distinctipy.get_colormap(distinctipy.get_colors(gdf.STATE_NAME.nunique()))), even if that's sufficient. Since mapclassify has applications beyond geopandas, and this is a super useful classification method, it feels like an obvious enhancement to me.

martinfleis · 2023-02-09T17:38:14Z

But you will not use this under the hood from geopandas, and if the main point is exposing distinctipy, then it should be in splot. I just don't think this belongs in mapclassify, and it is not consistent with the rest of the package.

knaaptime · 2023-02-09T17:52:57Z

(Well, I definitely would use this under the hood in geopandas lots of times, 'cause I don't wanna type that long string.) That's like saying the Quantiles class shouldn't be available because you can always do df.assign(col=othercol.quantile()).plot(col).

So I think this is the rub:

IMO, the purpose of mapclassify is to create binning schemes that are appropriate for a wide variety of cartographic displays. In the case of unique values, it's true that you don't need to classify those values, but mapclassify still exists to provide an appropriate binning scheme for mapping those data (without requiring the cartographer to know additional libraries). And there are lots of cases (e.g., land-use classification) where such data, and maps of them, are prevalent.

The purpose of splot is spatial statistical visualization, which is why the esda plotting methods live there (and why that stuff isn't in mapclassify in the first place).

So I guess what I'm saying is it's far more natural for a 'unique binner' to live in the binning package rather than in our version of seaborn.

martinfleis · 2023-02-09T18:21:33Z

Alright, let me elaborate a bit as I think that my comments may have come across as a bit too harsh.

I think that this is a super useful feature to have when I need to plot categorical variables with more than the 20 classes supported natively by geopandas. And if it were exposed in geopandas, I would also use it myself. However, as implemented here, it is not compatible with being exposed in geopandas. That is what I meant. We use mapclassify under the hood via the scheme keyword, consuming primarily .bins and .yb to create a categorical variable that is then processed by the standard categorical plotting. mapclassify.UniqueValue would then only give us the same information we already have natively in the GeoDataFrame, and the main point, plotting with N distinct colours, would not get through. So passing scheme="unique_value", passing categorical=True, or passing nothing in the case of a non-numerical column would all result in the same plot. The mapclassify.UniqueValue .plot method would not be used there, and I can't think of any reasonable way of doing so.
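The consumption path described above can be sketched roughly as follows. This is not geopandas's actual code: quantile_bins is a hypothetical stand-in for a mapclassify classifier (here a plain NumPy quantile scheme) that exposes the two things geopandas reads, bins (class upper bounds) and yb (each observation's class index), and to_categorical shows how those become an ordinary pandas categorical that the standard categorical plotting then handles.

```python
import numpy as np
import pandas as pd


def quantile_bins(y, k=5):
    """Hypothetical stand-in for a mapclassify classifier.

    Returns (bins, yb): bins are the upper bounds of the k classes and yb is
    the class index of each observation, mirroring the .bins/.yb attributes.
    """
    y = np.asarray(y, dtype=float)
    bins = np.quantile(y, np.linspace(0, 1, k + 1)[1:])
    # Index of the first bin whose upper bound is >= the value (upper-inclusive classes)
    yb = np.searchsorted(bins, y, side="left")
    return bins, yb


def to_categorical(bins, yb):
    """Turn class indices into an ordered pandas Categorical labelled by upper bound.

    Assumes distinct bin edges; repeated quantiles would produce duplicate labels.
    """
    labels = [f"<= {b:g}" for b in bins]
    return pd.Categorical.from_codes(yb, categories=labels, ordered=True)


bins, yb = quantile_bins(np.arange(1, 11), k=5)
cat = to_categorical(bins, yb)
print(list(yb))
print(list(cat.categories))
```

For a column that is already categorical, a "unique value" scheme would hand back classes identical to the existing categories, which is the redundancy described above.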

If we want to use this from geopandas, then the reasonable thing would be to open an issue there resulting in a PR ensuring that you can pass something like cmap="distinct" to any categorical plot that would call distinctipy under the hood.

Now onto the second point. The point of mapclassify is to discretise a continuous variable into a set of classes. I am fine expanding that logic to categorical variables if we think it is useful in some way. But the output is always an array (bins, labels...), and that is consistent across the package. mapclassify.UniqueValue does that as well (though of questionable value), but on top of that it implements something the package does not have anywhere else: plotting. And it does that only to wrap distinctipy in a friendlier method. That is inconsistent; it has no precedent here and feels like an alien part of the mapclassify codebase.
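To make the "output is always an array" point concrete, here is a minimal sketch of what an array-only unique value classifier could look like. UniqueValueSketch is a hypothetical name, not the PR's class and not the mapclassify API; it returns only bins (the distinct labels), yb (each observation's class index) and counts, with no plotting. The last lines illustrate the "questionable value" remark: for a categorical column, yb carries the same information as the codes pandas already assigns.

```python
import numpy as np
import pandas as pd


class UniqueValueSketch:
    """Hypothetical array-only unique value 'classifier' (no plotting)."""

    def __init__(self, y):
        y = np.asarray(y)
        # bins: sorted distinct labels; yb: class index per observation; counts: class sizes
        self.bins, self.yb, self.counts = np.unique(
            y, return_inverse=True, return_counts=True
        )
        self.k = len(self.bins)


land_use = ["residential", "park", "residential", "industrial", "park", "residential"]
uv = UniqueValueSketch(land_use)
print(uv.bins, uv.yb, uv.counts)

# pandas already derives the same classes: categories are sorted and codes match yb
cat = pd.Categorical(land_use)
print((cat.codes == uv.yb).all())
```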

If there is any functionality in PySAL that is remotely close to this type of choropleth plotting it is splot.vba_choropleth. We can discuss if that belongs to splot given it is aimed at spatial statistical visualizations as you say but it is there, so there is a precedent.

We've been discussing the mess we have with plotting weights (one method in splot, other in libpysal) and that it should be consistently implemented in splot, so I don't want to create yet another place where we have some plotting code.

As a conclusion: if we, as a community, think that it would be useful to have direct access to an N-colour cmap when plotting a categorical variable from a GeoDataFrame, let's open an issue in geopandas and implement it there, where it would belong most naturally. If we also think that having the counts in the legend is important, it may also be included there. The same code can then be shared with the explore method to give it even better visibility.

If you all think that it is okay to implement it as is in mapclassify and that it is the best place for this functionality, I'll accept that. But at the moment I am just not convinced of that.

knaaptime · 2023-02-09T18:41:11Z

one of my fav parts of the dev process is having these discussions to make decisions by committee :D

sjsrey · 2023-02-10T16:52:11Z

This discussion is what I hoped the WIP label would stimulate, so I think this is very productive.

The original motivation for this came from a user of mapclassify who asked for this ability. My PR is intended to show how this might be done. I am uncertain myself where this actually should live, and I can see merits in all the options that have been suggested thus far. A couple of thoughts:

Geopandas consumption

@martinfleis is correct that the current implementation cannot be exposed in geopandas as Unique_Value is not a subclass of MapClassifier. This was done because the classes for UV do not have bounds/intervals, just labels.

We could refactor this to extend the legend handling in mapclassify to deal with the continuous and categorical variables in a more elegant fashion. If so, then exposing this in geopandas should be possible with the existing api.
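One way to picture that refactor is a single legend formatter that produces interval labels for continuous classes and plain value labels for categorical ones, optionally with counts. legend_labels and its parameters are hypothetical, and the interval format only approximates, rather than reproduces, mapclassify's own legend strings.

```python
def legend_labels(bins, counts=None, categorical=False, lowest=None, fmt="{:.2f}"):
    """Hypothetical legend formatter for both continuous and categorical classes.

    Continuous: bins are class upper bounds; `lowest` (e.g. the data minimum)
    closes the first interval, otherwise the first interval is open to -inf.
    Categorical: bins are the class labels themselves, with no bounds.
    """
    if categorical:
        labels = [str(b) for b in bins]
    else:
        edges = [float("-inf") if lowest is None else lowest] + list(bins)
        labels = []
        for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
            # A known lowest value closes the first class; every other class is (lo, hi]
            left = "[" if i == 0 and lowest is not None else "("
            labels.append(f"{left}{fmt.format(lo)}, {fmt.format(hi)}]")
    if counts is not None:
        labels = [f"{label} ({c})" for label, c in zip(labels, counts)]
    return labels


print(legend_labels([2.8, 4.6, 10], lowest=1))
print(legend_labels(["industrial", "park"], counts=[1, 2], categorical=True))
```

Keeping both cases behind one function is what would let a categorical classifier reuse the same legend path as the interval-based ones.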

Alternatively, we could do a PR into geopandas to add this functionality directly (i.e., it wouldn't be a classifier in mapclassify).

api inconsistency

Yes, since Unique_Value jettisons the inheritance in mapclassify it is not consistent with the other classifiers. The addition of the plot method also marks a departure. The latter was intended to flesh out the plotting issues/design more so than to suggest we add a plot method to all the classifiers in mapclassify (although this gets asked for from time to time).

For plotting code in pysal, I agree it is best to centralize that logic in splot. Other packages can consume that api, but the consistency should come through splot. I'm not against giving the different packages their own plot methods where it makes sense, as long as they are composed through splot to the extent possible to ensure consistency.

knaaptime · 2023-02-10T16:58:04Z

😁 @martinfleis I don't read you as harsh; we're both just direct writers with opinions.

But the output is always an array (bins, labels...), and that is consistent across the package. mapclassify.UniqueValue does that as well (though of questionable value), but on top of that it implements something the package does not have anywhere else: plotting. And it does that only to wrap distinctipy in a friendlier method. That is inconsistent; it has no precedent here and feels like an alien part of the mapclassify codebase.

My view is that this is a philosophical distinction. Packages evolve over time, and this function is designed to help make good-looking maps by putting data into bins, which is precisely the purpose of mapclassify.

I guess the question is how you view the categorization of functionality across the packages. I'd argue the conceptual difference between the packages is more important. The "precedent" here is providing simple tools for creating good-looking maps. It doesn't matter how the code works. Implicitly, mapclassify is about creating choropleths, regardless of whether we've done the actual plotting in the past. In this case, it makes sense to go ahead and implement the plotting because that's the best way to surface the functionality we're actually after with the package: making attractive maps easily.

Although the value-by-alpha stuff exists in splot, the package itself is not about choropleths. It's about wrapping tailored visualizations around spatial analyses. So, personally, I see no precedent for this function over there.

If this function is a crayon, then it makes more sense to me inside the box of pencils (mapclassify) than the box of protractors (splot).

martinfleis · 2023-02-10T19:06:27Z

We could refactor this to extend the legend handling in mapclassify to deal with the continuous and categorical variables in a more elegant fashion. If so, then exposing this in geopandas should be possible with the existing api.

That is only partially true. We would be able to pass labels to geopandas but not colours. And passing labels is a bit pointless given geopandas can do that itself and will call pd.Categorical on those labels anyway.

Alternatively, we could do a PR into geopandas to add this functionality directly (i.e., it wouldn't be a classifier in mapclassify).

Would any of you object to this option? It is imho the best one. I would add "distinct" as a special case for the cmap argument to expose the distinctipy colormap, and additional options to legend_kwds controlling whether to show counts in the legend (opt-in) and whether to sort them.
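A rough sketch of that proposal, under stated assumptions: categorical_style, show_counts and sort_counts are hypothetical names (not existing geopandas parameters; in geopandas the legend options would live in legend_kwds), and distinctipy.get_colors(n) is assumed to be the colour source for cmap="distinct". The colour source is injectable via get_colors so the sketch can run without distinctipy installed.

```python
from collections import Counter


def categorical_style(values, cmap="distinct", get_colors=None,
                      show_counts=False, sort_counts=False):
    """Hypothetical sketch: map each category to a colour and build legend labels.

    Only the cmap="distinct" special case is sketched; any other value would go
    through the existing matplotlib colormap path.
    """
    counts = Counter(values)
    categories = sorted(counts)  # default order: sorted, like pandas' inferred categories
    if sort_counts:
        # Optional: most frequent category first, ties broken alphabetically
        categories = sorted(categories, key=lambda c: (-counts[c], c))
    if cmap != "distinct":
        raise NotImplementedError("only cmap='distinct' is sketched here")
    if get_colors is None:
        import distinctipy  # assumed dependency; get_colors(n) returns n RGB tuples
        get_colors = distinctipy.get_colors
    colors = get_colors(len(categories))
    # Counts in the legend are opt-in
    labels = [f"{c} ({counts[c]})" if show_counts else str(c) for c in categories]
    return dict(zip(categories, colors)), labels


def fake_colors(n):
    """Stand-in palette so the example runs without distinctipy."""
    return [(i / n, 0.0, 1.0 - i / n) for i in range(n)]


values = ["park", "residential", "park", "industrial", "residential", "residential"]
color_map, labels = categorical_style(values, get_colors=fake_colors,
                                      show_counts=True, sort_counts=True)
print(labels)
```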

sjsrey · 2023-02-10T19:20:57Z

Would any of you object to this option? It is imho the best one. I would add "distinct" as a special case for the cmap argument to expose the distinctipy colormap, and additional options to legend_kwds controlling whether to show counts in the legend (opt-in) and whether to sort them.

I'm leaning this way, as I think it is the cleanest solution.

I can think of a couple of additional options that might be useful, but I could add them to the PR in GP if this is the way we go.

martinfleis · 2023-02-10T20:44:38Z

I would start with an issue outlining the idea in the geopandas repo to gather feedback from folks there. We may hit resistance (I don't think we will) and circle back here.

sjsrey added 5 commits January 30, 2023 14:28

97000a0 Exploring unique value classification
b3c3c43 generalize function for unique_value
137c661 add sorting option
6238c54 Give UniqueValue a plot method
592e1e7 ENH: Unique value classifier for categorical maps.

sjsrey added the WIP label ("Work in progress. For discussion and feedback. Do not merge.") Feb 9, 2023

sjsrey requested review from knaaptime and martinfleis February 9, 2023 00:46

3d6db1f [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

sjsrey requested a review from jGaboardi February 9, 2023 00:49

279ede0 Add distinctipy to ci
f7476b1 Add distinticpy to ci for 3.10

knaaptime approved these changes Feb 9, 2023


sjsrey mentioned this pull request Feb 26, 2023

ENH: Unique valued choropleths for greater than 10 classes. geopandas/geopandas#2804 (Open)
