DEPS: Unpin docutils #58413

Merged: 5 commits, May 3, 2024
7 changes: 3 additions & 4 deletions doc/source/user_guide/basics.rst
Original file line number Diff line number Diff line change
@@ -160,11 +160,10 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``):
.. csv-table::
:header: "Operation", "0.11.0 (ms)", "Prior Version (ms)", "Ratio to Prior"
:widths: 25, 25, 25, 25
-   :delim: ;

-   ``df1 > df2``; 13.32; 125.35; 0.1063
-   ``df1 * df2``; 21.71; 36.63; 0.5928
-   ``df1 + df2``; 22.04; 36.50; 0.6039
+   ``df1 > df2``, 13.32, 125.35, 0.1063
+   ``df1 * df2``, 21.71, 36.63, 0.5928
+   ``df1 + df2``, 22.04, 36.50, 0.6039

You are highly encouraged to install both libraries. See the section
:ref:`Recommended Dependencies <install.recommended_dependencies>` for more installation info.
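For context on the benchmark table in this hunk, a minimal sketch of the elementwise operations being measured; the tiny frames here are illustrative stand-ins for the 100 column x 100,000 row benchmark frames, not part of the diff:

```python
import numpy as np
import pandas as pd

# Small stand-in frames; the benchmark in the docs used 100 x 100,000.
df1 = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=list("abc"))
df2 = pd.DataFrame(np.ones((2, 3)), columns=list("abc"))

gt = df1 > df2     # elementwise comparison (accelerated by numexpr when installed)
prod = df1 * df2   # elementwise multiplication
total = df1 + df2  # elementwise addition
```

With numexpr and bottleneck installed, these expressions take the accelerated code paths the table compares.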
15 changes: 2 additions & 13 deletions doc/source/user_guide/gotchas.rst
@@ -315,19 +315,8 @@ Why not make NumPy like R?

Many people have suggested that NumPy should simply emulate the ``NA`` support
present in the more domain-specific statistical programming language `R
-<https://www.r-project.org/>`__. Part of the reason is the NumPy type hierarchy:
-
-.. csv-table::
-   :header: "Typeclass","Dtypes"
-   :widths: 30,70
-   :delim: |
-
-   ``numpy.floating`` | ``float16, float32, float64, float128``
-   ``numpy.integer`` | ``int8, int16, int32, int64``
-   ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
-   ``numpy.object_`` | ``object_``
-   ``numpy.bool_`` | ``bool_``
-   ``numpy.character`` | ``bytes_, str_``
+<https://www.r-project.org/>`__. Part of the reason is the
+`NumPy type hierarchy <https://numpy.org/doc/stable/user/basics.types.html>`__.

The R language, by contrast, only has a handful of built-in data types:
``integer``, ``numeric`` (floating-point), ``character``, and
77 changes: 37 additions & 40 deletions doc/source/user_guide/groupby.rst
@@ -506,29 +506,28 @@ listed below, those with a ``*`` do *not* have an efficient, GroupBy-specific, implementation
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
-   :delim: ;

-   :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy
-   :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy
-   :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups
-   :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups
-   :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group
-   :meth:`~.DataFrameGroupBy.idxmax`;Compute the index of the maximum value in each group
-   :meth:`~.DataFrameGroupBy.idxmin`;Compute the index of the minimum value in each group
-   :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group
-   :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group
-   :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group
-   :meth:`~.DataFrameGroupBy.median`;Compute the median of each group
-   :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group
-   :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group
-   :meth:`~.DataFrameGroupBy.prod`;Compute the product of the values in each group
-   :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group
-   :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group
-   :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group
-   :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group
-   :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group
-   :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group
-   :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group
+   :meth:`~.DataFrameGroupBy.any`,Compute whether any of the values in the groups are truthy
+   :meth:`~.DataFrameGroupBy.all`,Compute whether all of the values in the groups are truthy
+   :meth:`~.DataFrameGroupBy.count`,Compute the number of non-NA values in the groups
+   :meth:`~.DataFrameGroupBy.cov` * ,Compute the covariance of the groups
+   :meth:`~.DataFrameGroupBy.first`,Compute the first occurring value in each group
+   :meth:`~.DataFrameGroupBy.idxmax`,Compute the index of the maximum value in each group
+   :meth:`~.DataFrameGroupBy.idxmin`,Compute the index of the minimum value in each group
+   :meth:`~.DataFrameGroupBy.last`,Compute the last occurring value in each group
+   :meth:`~.DataFrameGroupBy.max`,Compute the maximum value in each group
+   :meth:`~.DataFrameGroupBy.mean`,Compute the mean of each group
+   :meth:`~.DataFrameGroupBy.median`,Compute the median of each group
+   :meth:`~.DataFrameGroupBy.min`,Compute the minimum value in each group
+   :meth:`~.DataFrameGroupBy.nunique`,Compute the number of unique values in each group
+   :meth:`~.DataFrameGroupBy.prod`,Compute the product of the values in each group
+   :meth:`~.DataFrameGroupBy.quantile`,Compute a given quantile of the values in each group
+   :meth:`~.DataFrameGroupBy.sem`,Compute the standard error of the mean of the values in each group
+   :meth:`~.DataFrameGroupBy.size`,Compute the number of values in each group
+   :meth:`~.DataFrameGroupBy.skew` * ,Compute the skew of the values in each group
+   :meth:`~.DataFrameGroupBy.std`,Compute the standard deviation of the values in each group
+   :meth:`~.DataFrameGroupBy.sum`,Compute the sum of the values in each group
+   :meth:`~.DataFrameGroupBy.var`,Compute the variance of the values in each group
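Since the doc's worked examples are collapsed in this diff view, a minimal sketch of a few of the aggregation methods listed above (the frame and column names are illustrative, not from the diff):

```python
import pandas as pd

# Two groups, "a" and "b", with two values each.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
grouped = df.groupby("key")["val"]

sums = grouped.sum()     # sum of the values in each group
means = grouped.mean()   # mean of each group
counts = grouped.count() # number of non-NA values in each group
```

Each call returns a Series indexed by the group keys.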

Some examples:

@@ -832,19 +831,18 @@ The following methods on GroupBy act as transformations.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
-   :delim: ;

-   :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group
-   :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group
-   :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group
-   :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group
-   :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group
-   :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
-   :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
-   :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
-   :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group
-   :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group
-   :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group
+   :meth:`~.DataFrameGroupBy.bfill`,Back fill NA values within each group
+   :meth:`~.DataFrameGroupBy.cumcount`,Compute the cumulative count within each group
+   :meth:`~.DataFrameGroupBy.cummax`,Compute the cumulative max within each group
+   :meth:`~.DataFrameGroupBy.cummin`,Compute the cumulative min within each group
+   :meth:`~.DataFrameGroupBy.cumprod`,Compute the cumulative product within each group
+   :meth:`~.DataFrameGroupBy.cumsum`,Compute the cumulative sum within each group
+   :meth:`~.DataFrameGroupBy.diff`,Compute the difference between adjacent values within each group
+   :meth:`~.DataFrameGroupBy.ffill`,Forward fill NA values within each group
+   :meth:`~.DataFrameGroupBy.pct_change`,Compute the percent change between adjacent values within each group
+   :meth:`~.DataFrameGroupBy.rank`,Compute the rank of each value within each group
+   :meth:`~.DataFrameGroupBy.shift`,Shift values up or down within each group
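A minimal sketch of two of the transformations above, showing that they operate within each group and return output aligned to the original index (example data is illustrative, not from the diff):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, None, 2.0, 3.0]})
grouped = df.groupby("key")["val"]

running = grouped.cumsum()  # cumulative sum restarts for each group
filled = grouped.ffill()    # forward fill NA values within each group
```

Unlike aggregations, both results have the same length and index as ``df``.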

In addition, passing any built-in aggregation method as a string to
:meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result
@@ -1092,11 +1090,10 @@ efficient, GroupBy-specific, implementation.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
-   :delim: ;

-   :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group
-   :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group
-   :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group
+   :meth:`~.DataFrameGroupBy.head`,Select the top row(s) of each group
+   :meth:`~.DataFrameGroupBy.nth`,Select the nth row(s) of each group
+   :meth:`~.DataFrameGroupBy.tail`,Select the bottom row(s) of each group
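A small sketch of the three row-selection methods in the table above (data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"], "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("key")

top = grouped.head(1)     # first row of each group
bottom = grouped.tail(1)  # last row of each group
second = grouped.nth(1)   # row at position 1 within each group
```

All three return rows of the original frame, preserving the original row order.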

Users can also use transformations along with Boolean indexing to construct complex
filtrations within groups. For example, suppose we are given groups of products and
18 changes: 9 additions & 9 deletions doc/source/user_guide/indexing.rst
@@ -94,13 +94,14 @@ well). Any of the axes accessors may be the null slice ``:``. Axes left out of
the specification are assumed to be ``:``, e.g. ``p.loc['a']`` is equivalent to
``p.loc['a', :]``.

-.. csv-table::
-   :header: "Object Type", "Indexers"
-   :widths: 30, 50
-   :delim: ;
-
-   Series; ``s.loc[indexer]``
-   DataFrame; ``df.loc[row_indexer,column_indexer]``
+.. ipython:: python
+
+   ser = pd.Series(range(5), index=list("abcde"))
+   ser.loc[["a", "c", "e"]]
+
+   df = pd.DataFrame(np.arange(25).reshape(5, 5), index=list("abcde"), columns=list("abcde"))
+   df.loc[["a", "c", "e"], ["b", "d"]]

.. _indexing.basics:

@@ -116,10 +117,9 @@ indexing pandas objects with ``[]``:
.. csv-table::
:header: "Object Type", "Selection", "Return Value Type"
:widths: 30, 30, 60
-   :delim: ;

-   Series; ``series[label]``; scalar value
-   DataFrame; ``frame[colname]``; ``Series`` corresponding to colname
+   Series, ``series[label]``, scalar value
+   DataFrame, ``frame[colname]``, ``Series`` corresponding to colname
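A quick sketch of the two ``[]`` lookups in the table above (variable names are illustrative):

```python
import pandas as pd

ser = pd.Series([10, 20], index=["x", "y"])
frame = pd.DataFrame({"colA": [1, 2], "colB": [3, 4]})

scalar = ser["x"]       # Series[label] -> scalar value
column = frame["colA"]  # DataFrame[colname] -> Series for that column
```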

Here we construct a simple time series data set to use for illustrating the
indexing functionality:
67 changes: 32 additions & 35 deletions doc/source/user_guide/io.rst
@@ -16,26 +16,25 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
.. csv-table::
:header: "Format Type", "Data Description", "Reader", "Writer"
:widths: 30, 100, 60, 60
-   :delim: ;

-   text;`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__;:ref:`read_csv<io.read_csv_table>`;:ref:`to_csv<io.store_in_csv>`
-   text;Fixed-Width Text File;:ref:`read_fwf<io.fwf_reader>`
-   text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
-   text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
-   text;`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__;;:ref:`Styler.to_latex<io.latex>`
-   text;`XML <https://www.w3.org/standards/xml/core>`__;:ref:`read_xml<io.read_xml>`;:ref:`to_xml<io.xml>`
-   text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
-   binary;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
-   binary;`OpenDocument <http://opendocumentformat.org>`__;:ref:`read_excel<io.ods>`;
-   binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
-   binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
-   binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
-   binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;:ref:`to_orc<io.orc>`
-   binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
-   binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;
-   binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
-   binary;`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__;:ref:`read_pickle<io.pickle>`;:ref:`to_pickle<io.pickle>`
-   SQL;`SQL <https://en.wikipedia.org/wiki/SQL>`__;:ref:`read_sql<io.sql>`;:ref:`to_sql<io.sql>`
+   text,`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__, :ref:`read_csv<io.read_csv_table>`, :ref:`to_csv<io.store_in_csv>`
+   text,Fixed-Width Text File, :ref:`read_fwf<io.fwf_reader>` , NA
+   text,`JSON <https://www.json.org/>`__, :ref:`read_json<io.json_reader>`, :ref:`to_json<io.json_writer>`
+   text,`HTML <https://en.wikipedia.org/wiki/HTML>`__, :ref:`read_html<io.read_html>`, :ref:`to_html<io.html>`
+   text,`LaTeX <https://en.wikipedia.org/wiki/LaTeX>`__, :ref:`Styler.to_latex<io.latex>` , NA
+   text,`XML <https://www.w3.org/standards/xml/core>`__, :ref:`read_xml<io.read_xml>`, :ref:`to_xml<io.xml>`
+   text, Local clipboard, :ref:`read_clipboard<io.clipboard>`, :ref:`to_clipboard<io.clipboard>`
+   binary,`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__ , :ref:`read_excel<io.excel_reader>`, :ref:`to_excel<io.excel_writer>`
+   binary,`OpenDocument <http://opendocumentformat.org>`__, :ref:`read_excel<io.ods>`, NA
+   binary,`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__, :ref:`read_hdf<io.hdf5>`, :ref:`to_hdf<io.hdf5>`
+   binary,`Feather Format <https://github.com/wesm/feather>`__, :ref:`read_feather<io.feather>`, :ref:`to_feather<io.feather>`
+   binary,`Parquet Format <https://parquet.apache.org/>`__, :ref:`read_parquet<io.parquet>`, :ref:`to_parquet<io.parquet>`
+   binary,`ORC Format <https://orc.apache.org/>`__, :ref:`read_orc<io.orc>`, :ref:`to_orc<io.orc>`
+   binary,`Stata <https://en.wikipedia.org/wiki/Stata>`__, :ref:`read_stata<io.stata_reader>`, :ref:`to_stata<io.stata_writer>`
+   binary,`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__, :ref:`read_sas<io.sas_reader>` , NA
+   binary,`SPSS <https://en.wikipedia.org/wiki/SPSS>`__, :ref:`read_spss<io.spss_reader>` , NA
+   binary,`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__, :ref:`read_pickle<io.pickle>`, :ref:`to_pickle<io.pickle>`
+   SQL,`SQL <https://en.wikipedia.org/wiki/SQL>`__, :ref:`read_sql<io.sql>`,:ref:`to_sql<io.sql>`

:ref:`Here <io.perf>` is an informal performance comparison for some of these IO methods.
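As a quick illustration of the reader/writer pairing in the table above, a minimal sketch of a CSV round trip through an in-memory buffer (the frame contents are illustrative):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

buf = StringIO()
df.to_csv(buf, index=False)   # writer: to_csv
buf.seek(0)
roundtrip = pd.read_csv(buf)  # reader: read_csv
```

The same reader/writer symmetry holds for most of the formats listed, modulo dtype inference on the way back in.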

@@ -1837,14 +1836,13 @@ with optional parameters:

.. csv-table::
:widths: 20, 150
-   :delim: ;

-   ``split``; dict like {index -> [index], columns -> [columns], data -> [values]}
-   ``records``; list like [{column -> value}, ... , {column -> value}]
-   ``index``; dict like {index -> {column -> value}}
-   ``columns``; dict like {column -> {index -> value}}
-   ``values``; just the values array
-   ``table``; adhering to the JSON `Table Schema`_
+   ``split``, dict like {index -> [index]; columns -> [columns]; data -> [values]}
+   ``records``, list like [{column -> value}; ... ]
+   ``index``, dict like {index -> {column -> value}}
+   ``columns``, dict like {column -> {index -> value}}
+   ``values``, just the values array
+   ``table``, adhering to the JSON `Table Schema`_

* ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
* ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
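A small sketch of three of the ``orient`` shapes described above, decoded back to plain Python objects so the structure is visible (example frame is illustrative):

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

# Decode each orientation to show its shape.
as_split = json.loads(df.to_json(orient="split"))      # {index -> [...], columns -> [...], data -> [...]}
as_records = json.loads(df.to_json(orient="records"))  # [{column -> value}, ...]
as_columns = json.loads(df.to_json(orient="columns"))  # {column -> {index -> value}}
```

``records`` drops the index entirely, while ``split`` and ``columns`` preserve it.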
@@ -2025,14 +2023,13 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``

.. csv-table::
:widths: 20, 150
-   :delim: ;

-   ``split``; dict like {index -> [index], columns -> [columns], data -> [values]}
-   ``records``; list like [{column -> value}, ... , {column -> value}]
-   ``index``; dict like {index -> {column -> value}}
-   ``columns``; dict like {column -> {index -> value}}
-   ``values``; just the values array
-   ``table``; adhering to the JSON `Table Schema`_
+   ``split``, dict like {index -> [index]; columns -> [columns]; data -> [values]}
+   ``records``, list like [{column -> value} ...]
+   ``index``, dict like {index -> {column -> value}}
+   ``columns``, dict like {column -> {index -> value}}
+   ``values``, just the values array
+   ``table``, adhering to the JSON `Table Schema`_


* ``dtype`` : if True, infer dtypes, if a dict of column to dtype, then use those, if ``False``, then don't infer dtypes at all, default is True, apply only to the data.
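A brief sketch of the reading direction described above, passing an explicit ``orient`` and forcing ``Series`` parsing with ``typ`` (buffer contents are illustrative):

```python
from io import StringIO

import pandas as pd

# A JSON object of {label: value} pairs parsed as a Series.
ser = pd.read_json(StringIO('{"x": 1, "y": 2}'), typ="series")

# Record-oriented JSON parsed as a DataFrame (the default typ="frame").
df = pd.read_json(StringIO('[{"a": 1}, {"a": 2}]'), orient="records")
```

Wrapping the literal in ``StringIO`` avoids the deprecation of passing raw JSON strings directly.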