[SPARK-36435][PYTHON] Implement MultiIndex.equal_levels #34113

itholic · 2021-09-27T07:48:13Z

What changes were proposed in this pull request?

This PR proposes implementing MultiIndex.equal_levels, which returns True if the levels of both MultiIndex objects are the same.

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
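Both examples return True even though the rows differ, because only the distinct values at each level are compared, not row order or value combinations. Below is a minimal pure-Python sketch of that comparison. It is not the pandas-on-Spark implementation, which works on Spark columns. It assumes each index is given as a list of equal-length tuples and that a level is the sorted set of distinct values at one tuple position, as `MultiIndex.from_tuples` builds it. `levels_from_tuples` and `equal_levels` are hypothetical helpers for illustration only.

```python
from typing import Hashable, List, Sequence, Tuple


def levels_from_tuples(tuples: Sequence[Tuple[Hashable, ...]]) -> List[List[Hashable]]:
    """Hypothetical helper: sorted distinct values at each tuple position.

    Assumption: this mirrors how MultiIndex.from_tuples derives its levels.
    """
    if not tuples:
        return []
    nlevels = len(tuples[0])
    return [sorted({t[i] for t in tuples}) for i in range(nlevels)]


def equal_levels(left: Sequence[Tuple[Hashable, ...]],
                 right: Sequence[Tuple[Hashable, ...]]) -> bool:
    """Hypothetical helper: True if both indexes have the same levels.

    Only the number of levels and the distinct values per level matter;
    row order and which values are paired together are ignored.
    """
    left_levels = levels_from_tuples(left)
    right_levels = levels_from_tuples(right)
    if len(left_levels) != len(right_levels):
        return False
    return all(lv == rv for lv, rv in zip(left_levels, right_levels))


# Same data as the second example above: both sides have levels
# [["a", "b", "c"], ["x", "y", "z"]].
idx1 = [("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")]
idx2 = [("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")]
print(equal_levels(idx1, idx2))  # True

# Different distinct values per level, so the levels differ.
idx3 = [("a", "x"), ("b", "w")]
print(equal_levels(idx1, idx3))  # False
```

Comparing sorted distinct values per position keeps the sketch independent of row order, which is why the reordered and recombined tuples in the examples still compare equal.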

This was originally proposed in databricks/koalas#1789, and all review comments on the original PR have been resolved.

Why are the changes needed?

We should support the pandas API as much as possible in the pandas-on-Spark module.

Does this PR introduce any user-facing change?

Yes, the MultiIndex.equal_levels API is now available.

How was this patch tested?

Unit tests

SparkQA · 2021-09-27T08:30:21Z

Test build #143642 has finished for PR 34113 at commit 0b79e4a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
public class NettyLogger
class IndexNameTypeHolder(object):
new_class = type(\"NameType\", (NameTypeHolder,),
new_class = param.type if isinstance(param, np.dtype) else param
public final class AlwaysFalse extends Filter
public final class AlwaysTrue extends Filter
public final class And extends BinaryFilter
abstract class BinaryComparison extends Filter
abstract class BinaryFilter extends Filter
public final class EqualNullSafe extends BinaryComparison
public final class EqualTo extends BinaryComparison
public abstract class Filter implements Expression, Serializable
public final class GreaterThan extends BinaryComparison
public final class GreaterThanOrEqual extends BinaryComparison
public final class In extends Filter
public final class IsNotNull extends Filter
public final class IsNull extends Filter
public final class LessThan extends BinaryComparison
public final class LessThanOrEqual extends BinaryComparison
public final class Not extends Filter
public final class Or extends BinaryFilter
public final class StringContains extends StringPredicate
public final class StringEndsWith extends StringPredicate
abstract class StringPredicate extends Filter
public final class StringStartsWith extends StringPredicate
public class ColumnarBatch implements AutoCloseable
case class Sec(child: Expression)
case class Csc(child: Expression)
trait OperationHelper extends AliasHelper with PredicateHelper
class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
case class OptimizeSkewedJoin(
case class SkewJoinChildWrapper(plan: SparkPlan) extends LeafExecNode
case class SimpleCostEvaluator(forceOptimizeSkewedJoin: Boolean) extends CostEvaluator
case class WriterBucketSpec(
case class EnsureRequirements(

SparkQA · 2021-09-27T09:13:55Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48154/

SparkQA · 2021-09-27T10:16:24Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48154/

itholic · 2021-09-27T10:46:59Z

cc @ueshin @HyukjinKwon @xinrong-databricks

python/pyspark/pandas/indexes/multi.py

SparkQA · 2021-09-29T05:14:10Z

Test build #143702 has finished for PR 34113 at commit 6306fb1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-09-29T05:20:11Z

Test build #143704 has finished for PR 34113 at commit 1f51541.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>

SparkQA · 2021-09-29T05:36:55Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48218/

SparkQA · 2021-09-29T05:58:32Z

Test build #143706 has finished for PR 34113 at commit f19e365.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-09-29T06:20:24Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48220/

SparkQA · 2021-09-29T06:21:58Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48218/

SparkQA · 2021-09-29T07:02:24Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48220/

SparkQA · 2021-10-01T03:09:18Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48290/

SparkQA · 2021-10-01T03:19:17Z

Test build #143778 has finished for PR 34113 at commit 188f9e7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-10-01T03:52:28Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48290/

HyukjinKwon · 2021-10-01T05:08:07Z

Merged to master.

itholic added 2 commits September 27, 2021 16:38

Implement MultiIndex.equal_levels

8eea4df

rebased to master

0b79e4a

github-actions bot added CORE PYTHON labels Sep 27, 2021

HyukjinKwon reviewed Sep 27, 2021

python/pyspark/pandas/indexes/multi.py (resolved)

HyukjinKwon reviewed Sep 27, 2021

python/pyspark/pandas/indexes/multi.py (resolved)

itholic added 2 commits September 29, 2021 13:42

Resolved comments

6306fb1

Remove options

1f51541

HyukjinKwon reviewed Sep 29, 2021

python/pyspark/pandas/indexes/multi.py (outdated, resolved)

python/pyspark/pandas/indexes/multi.py (resolved)

python/pyspark/pandas/indexes/multi.py (resolved)

Update python/pyspark/pandas/indexes/multi.py

f19e365

Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>

itholic added 2 commits October 1, 2021 11:18

add version

9f938e1

Merge branch 'SPARK-36435' of https://github.com/itholic/spark into SPARK-36435

188f9e7

HyukjinKwon approved these changes Oct 1, 2021

HyukjinKwon closed this in 13ddc91 Oct 1, 2021
