DOC: Add docstrings for MultiIndex.levels and MultiIndex.codes #55435

Closed
datapythonista opened this issue Oct 7, 2023 · 21 comments

Comments

@datapythonista
Member

xref #55148

Seems like those docstrings are empty, we should create them.

See the attributes section here: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html

The docstring for `MultiIndex.levels` should make clear that levels are preserved even if the DataFrame using the index doesn't contain all levels. See this page in the docs: https://pandas.pydata.org/docs/user_guide/advanced.html#defined-levels and this comment: #55433 (review)
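A minimal sketch of the behavior the docstring should describe, assuming a recent pandas release. The sample data and the names `letter`, `number`, and `value` are made up for illustration and do not come from the linked discussion.

```python
import pandas as pd

# Hypothetical sample data: a 3 x 2 MultiIndex with six rows.
index = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

# Drop every row whose first level is "c" using a boolean mask.
subset = df[df.index.get_level_values("letter") != "c"]

# The level still lists "c", even though no row uses it any more.
all_letters = list(subset.index.levels[0])

# remove_unused_levels() returns a new index with only the levels in use.
used_letters = list(subset.index.remove_unused_levels().levels[0])

print(all_letters)
print(used_letters)
```

Here `all_letters` keeps all three values while `used_letters` contains only the two that still appear in `subset`.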

@datapythonista datapythonista added Docs Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 7, 2023
@shiersansi
Contributor

This is only the second time I've opened a PR in an open source project, so I misunderstood what you meant. I'll work on the issue again.

@datapythonista
Member Author

That's normal. The original issue was difficult to follow, since it was a discussion, but I think the new issue explains better what needs to be done. If you have any questions or need help, we are here. Thank you!

@rhshadrach rhshadrach added MultiIndex and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 7, 2023
@datapythonista
Member Author

We now have the docstring for MultiIndex.levels, but the one for MultiIndex.codes is still missing. Labelling this as a good first issue in case anyone wants to help.

@mileslow

take

@AdventurousDataScientist

Hi mileslow, do you still need time for this task, or do you mind if I work on it?

@mileslow

@Rollingterminator1 go for it.

@devanshi-jain

Hi, is this issue already taken care of?

@sathyaanurag

Hi, does this issue still need to be worked on?

@wasimtikki120

1. Ensure Correct Data Types:

Make sure that your categorical columns are indeed of the "category" type. You can convert a column to a categorical type using astype:

df['categorical_column'] = df['categorical_column'].astype('category')

2. Check for Null Values:

Ensure that there are no null values in the categorical columns, as this can sometimes affect grouping.

df['categorical_column'].isnull().sum()

If there are null values, you might need to handle them appropriately before performing group operations.

3. Understand Grouping Requirements:

Make sure you understand the requirements of your grouping operation. For example, if you are trying to group by intervals, ensure that your categorical column is defined with the appropriate intervals.

pd.cut(df['numeric_column'], bins=[0, 10, 20, 30])

4. Use Groupby Correctly:

When using groupby, ensure you are providing the correct column name or a list of column names. For example:

grouped_data = df.groupby('categorical_column')['numeric_column'].sum()

Or, for multiple grouping columns:

grouped_data = df.groupby(['categorical_column1', 'categorical_column2'])['numeric_column'].sum()

5. Check Pandas Version:

Ensure that you are using a recent version of pandas. Bugs are often fixed in newer releases. You can check your pandas version with:

import pandas as pd
print(pd.__version__)

If you're using an older version, consider upgrading:

pip install --upgrade pandas

6. Minimal, Complete, and Verifiable Example:

If the issue persists, try to create a minimal, complete, and verifiable example that reproduces the problem. This makes it easier for others to help diagnose and fix the issue.

If you can provide more details or a sample of your code and data, I might be able to give more specific advice. Additionally, checking the pandas documentation or community forums can sometimes provide insights into common issues or bug reports.
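The checklist above can be sketched end to end as follows. This is only an illustration under assumptions: the DataFrame contents are invented, and `observed=True` is passed explicitly because the default for categorical groupers differs between pandas versions.

```python
import pandas as pd

# Hypothetical data; the column names follow the placeholders used above.
df = pd.DataFrame(
    {
        "categorical_column": ["x", "y", "x", None],
        "numeric_column": [5, 15, 25, 8],
    }
)

# Step 1: convert to the "category" dtype.
df["categorical_column"] = df["categorical_column"].astype("category")

# Step 2: count nulls, then drop them before grouping.
missing = int(df["categorical_column"].isnull().sum())
clean = df.dropna(subset=["categorical_column"])

# Step 3: bin the numeric column into intervals.
bins = pd.cut(clean["numeric_column"], bins=[0, 10, 20, 30])

# Step 4: group by the categorical column and sum.
totals = clean.groupby("categorical_column", observed=True)["numeric_column"].sum()

print(missing)
print(totals.to_dict())
```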

@wasimtikki120

class MultiIndex:
    """
    A multi-level, or hierarchical, index object for pandas DataFrame.

    ...

    Attributes
    ----------
    levels : list
        List of Index objects containing the unique values for each level of the MultiIndex.
    codes : list
        List of arrays containing the codes that indicate the position of each element in the levels.

    ...

    Examples
    --------
    >>> arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
    >>> tuples = list(zip(*arrays))
    >>> index = pd.MultiIndex.from_tuples(tuples, names=('first', 'second'))
    >>> index
    MultiIndex([('A', 1),
                ('A', 2),
                ('B', 1),
                ('B', 2)],
               names=['first', 'second'])

    >>> index.levels
    FrozenList([['A', 'B'], [1, 2]])

    >>> index.codes
    FrozenList([[0, 0, 1, 1], [0, 1, 0, 1]])
    """

    def __init__(self, levels, codes):
        """
        Parameters
        ----------
        levels : list
            List of Index objects containing the unique values for each level of the MultiIndex.
        codes : list
            List of arrays containing the codes that indicate the position of each element in the levels.
        """
        self.levels = levels
        self.codes = codes
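To make the relationship between `levels` and `codes` concrete, here is a small sketch that rebuilds each level's values from its codes. It assumes a recent pandas release and reuses the arrays from the example above. `Index.take` and `MultiIndex.get_level_values` are existing pandas methods; the variable names are illustrative.

```python
import pandas as pd

arrays = [["A", "A", "B", "B"], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=("first", "second"))

# For every level, look up each code in that level's unique values.
rebuilt = [
    level.take(codes)
    for level, codes in zip(index.levels, index.codes)
]

# The rebuilt values match what get_level_values() reports per level.
for position, values in enumerate(rebuilt):
    assert list(values) == list(index.get_level_values(position))

first_values = list(rebuilt[0])
second_values = list(rebuilt[1])
print(first_values, second_values)
```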

@Arpan3323

take

@chethanc1011

take

@dwk601

dwk601 commented Dec 15, 2023

Hi, I would like to contribute.

@sjalkote
Contributor

Hi, it looks like this has been inactive for a while, so I'd like to try it.

@sjalkote
Contributor

take

@sjalkote
Contributor

Ah it looks like there is already a docstring for MultiIndex.codes present in the main branch. Seems like this has already been fixed.

"""
Codes of the MultiIndex.

Codes are the position of the index value in the list of level values
for each level.

Returns
-------
tuple of numpy.ndarray
    The codes of the MultiIndex. Each array in the tuple corresponds
    to a level in the MultiIndex.

See Also
--------
MultiIndex.set_codes : Set new codes on MultiIndex.

Examples
--------
>>> arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
>>> mi = pd.MultiIndex.from_arrays(arrays, names=("number", "color"))
>>> mi.codes
(array([0, 0, 1, 1], dtype=int8), array([1, 0, 1, 0], dtype=int8))
"""
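As a small illustration of why "red" gets code 1 in the example above: `from_arrays` stores each level's unique values in sorted order, so "blue" sits at position 0 and "red" at position 1. The sketch below assumes a recent pandas release; the variable names are illustrative.

```python
import pandas as pd

arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
mi = pd.MultiIndex.from_arrays(arrays, names=("number", "color"))

# Unique values of the "color" level, in the order pandas stores them.
color_level = list(mi.levels[1])

# Integer positions into color_level, one per row of the index.
color_codes = mi.codes[1].tolist()

# Decoding the codes by hand gives back the original labels.
decoded = [color_level[code] for code in color_codes]

print(color_level, color_codes, decoded)
```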

@sjalkote sjalkote removed their assignment Apr 10, 2024
@sam-baumann
Contributor

take

@sam-baumann
Contributor

Looks like #57601 fixed this - can we close this?

@sam-baumann
Contributor

@datapythonista can we close this? Looks like it was solved by #57601.

@GAuravY19

is the issue still open ?

@Aloqeely
Member

is the issue still open ?

The docstrings have been added, but there are many more issues labeled 'Docs' that we would appreciate your help with.
