xarray
python
import numpy as np
import pandas as pd
import xarray as xray
import xarray
import xarray as xr
np.random.seed(123456)
- Grouped and resampling quantile calculations now use the vectorized algorithm in flox>=0.9.4 if present. By Deepak Cherian.
- Do not broadcast in arithmetic operations when the global option arithmetic_broadcast=False is set (6806, 8784). By Etienne Schalk and Deepak Cherian.
- Add the .oindex property to Explicitly Indexed Arrays for orthogonal indexing functionality (8238, 8750). By Anderson Banihirwe.
- Add the .vindex property to Explicitly Indexed Arrays for vectorized indexing functionality (8238, 8780). By Anderson Banihirwe.
- Expand use of .oindex and .vindex properties (8790). By Anderson Banihirwe and Deepak Cherian.
- Allow creating xr.Coordinates objects with no indexes (8711). By Benoit Bovy and Tom Nicholas.
- Don't allow overwriting index variables with to_zarr region writes (8589, 8876). By Deepak Cherian.
- The default freq parameter in xr.date_range and xr.cftime_range is set to 'D' only if periods, start, or end are None (8770, 8774). By Roberto Chang.
- Ensure that non-nanosecond precision numpy.datetime64 and numpy.timedelta64 values are cast to nanosecond precision values when used in DataArray.expand_dims and Dataset.expand_dims (8781). By Spencer Clark.
- CF conform handling of _FillValue/missing_value and dtype in CFMaskCoder/CFScaleOffsetCoder (2304, 5597, 7691, 8713; see also discussion in 7654). By Kai Mühlbauer.
- Do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (8844, 8852).
- Adapt handling of copy keyword argument for numpy >= 2.0dev (8844, 8851, 8865). By Kai Mühlbauer.
- Import trapz/trapezoid depending on numpy version (8844, 8865). By Kai Mühlbauer.
- Warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (5563, 8874). By Kai Mühlbauer.
- Migrates treenode functionality into xarray/core (8757). By Matt Savoie and Tom Nicholas.
- Migrates datatree functionality into xarray/core (8789). By Owen Littlejohns, Matt Savoie and Tom Nicholas.
This release brings size information to the text repr, changes to the accepted frequency strings, and various bug fixes.
Thanks to our 12 contributors:
Anderson Banihirwe, Deepak Cherian, Eivind Jahren, Etienne Schalk, Justus Magin, Marco Wolsza, Mathias Hauser, Matt Savoie, Maximilian Roos, Rambaud Pierrick, Tom Nicholas
- Added a simple nbytes representation in DataArrays and Dataset repr (8690, 8702). By Etienne Schalk.
- Allow negative frequency strings (e.g. "-1YE"). These strings are used, for example, in date_range and cftime_range (8651). By Mathias Hauser.
- Add NamedArray.expand_dims, NamedArray.permute_dims and NamedArray.broadcast_to (8380). By Anderson Banihirwe.
- Xarray now defers to flox's heuristics to set the default method for groupby problems. This only applies to flox>=0.9. By Deepak Cherian.
- All quantile methods (e.g. DataArray.quantile) now use numbagg for the calculation of nanquantiles (i.e., skipna=True) if it is installed. This is currently limited to the linear interpolation method (method='linear') (7377, 8684). By Marco Wolsza.
- infer_freq always returns the frequency strings as defined in pandas 2.2 (8612, 8627). By Mathias Hauser.
- The dt.weekday_name parameter wasn't functional on modern pandas versions and has been removed (8610, 8664). By Sam Coleman.
- Fixed a regression that prevented multi-index level coordinates being serialized after resetting or dropping the multi-index (8628, 8672). By Benoit Bovy.
- Fix bug with broadcasting when wrapping array API-compliant classes (8665, 8669). By Tom Nicholas.
- Ensure DataArray.unstack works when wrapping array API-compliant classes (8666, 8668). By Tom Nicholas.
- Fix negative slicing of Zarr arrays without dask installed (8252). By Deepak Cherian.
- Preserve chunks when writing time-like variables to zarr by enabling lazy CF encoding of time-like variables (7132, 8230, 8432, 8575). By Spencer Clark and Mattia Almansi.
- Preserve chunks when writing time-like variables to zarr by enabling their lazy encoding (7132, 8230, 8432, 8253, 8575; see also discussion in 8253). By Spencer Clark and Mattia Almansi.
- Raise an informative error if dtype encoding of time-like variables would lead to integer overflow or unsafe conversion from floating point to integer values (8542, 8575). By Spencer Clark.
- Raise an error when unstacking a MultiIndex that has duplicates as this would lead to silent data loss (7104, 8737). By Mathias Hauser.
- Fix variables arg typo in Dataset.sortby() docstring (8663, 8670). By Tom Vo.
- Fixed documentation where the use of the deprecated pandas frequency string prevented the documentation from being built (8638). By Sam Coleman.
- DataArray.dt now raises an AttributeError rather than a TypeError when the data isn't datetime-like (8718, 8724). By Maximilian Roos.
- Move parallelcompat and chunk managers modules from xarray/core to xarray/namedarray (8319). By Tom Nicholas and Anderson Banihirwe.
- Imports datatree repository and history into internal location (8688). By Matt Savoie, Justus Magin and Tom Nicholas.
- Adds open_datatree into xarray/backends (8697). By Matt Savoie and Tom Nicholas.
- Refactor xarray.core.indexing.DaskIndexingAdapter.__getitem__ to remove an unnecessary rewrite of the indexer key (8377, 8758). By Anderson Banihirwe.
This release is to fix a bug with the rendering of the documentation, but it also includes changes to the handling of pandas frequency strings.
- Following pandas, infer_freq will return "YE", instead of "Y" (formerly "A"). This is to be consistent with the deprecation of the latter frequency string in pandas 2.2. This is a follow up to 8415 (8612, 8642). By Mathias Hauser.
- Following pandas, the frequency string "Y" (formerly "A") is deprecated in favor of "YE". These strings are used, for example, in date_range, cftime_range, DataArray.resample, and Dataset.resample among others (8612, 8629). By Mathias Hauser.
- Pin sphinx-book-theme to 1.0.1 to fix a rendering issue with the sidebar in the docs (8619, 8632). By Tom Nicholas.
This release brings support for weights in correlation and covariance functions, a new DataArray.cumulative aggregation, improvements to xr.map_blocks, an update to our minimum dependencies, and various bugfixes.
Thanks to our 17 contributors to this release:
Abel Aoun, Deepak Cherian, Illviljan, Johan Mathe, Justus Magin, Kai Mühlbauer, Llorenç Lledó, Mark Harfouche, Markel, Mathias Hauser, Maximilian Roos, Michael Niklas, Niclas Rieger, Sébastien Celles, Tom Nicholas, Trinh Quoc Anh, and crusaderky.
- xr.cov and xr.corr now support using weights (8527, 7392). By Llorenç Lledó.
- Accept the compression arguments new in netCDF 1.6.0 in the netCDF4 backend. See netCDF4 documentation for details. Note that some new compression filters need plugins to be installed which may not be available in all netCDF distributions (6929, 7551). By Markel García-Díez.
- Add DataArray.cumulative & Dataset.cumulative to compute cumulative aggregations, such as sum, along a dimension, for example da.cumulative('time').sum(). This is similar to pandas' .expanding, and mostly equivalent to .cumsum methods, or to DataArray.rolling with a window length equal to the dimension size (8512). By Maximilian Roos.
- Decode/Encode netCDF4 enums and store the enum definition in dataarrays' dtype metadata. If multiple variables share the same enum in netCDF4, each dataarray will have its own enum definition in their respective dtype metadata (8144, 8147). By Abel Aoun.
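As a minimal sketch of the cumulative aggregation mentioned above (the array values and the variable names are illustrative assumptions, not taken from the release notes), a running sum along one dimension might look like this:

```python
import numpy as np
import xarray as xr

# Small 1-D series along a "time" dimension (illustrative data only).
da = xr.DataArray(np.array([1.0, 2.0, 3.0, 4.0]), dims="time")

# Running sum via the cumulative accessor described in the entry above.
running_sum = da.cumulative("time").sum()

# For a plain sum this should agree with the long-standing cumsum method.
reference = da.cumsum("time")

print(running_sum.values)
print(reference.values)
```

The accessor form is mainly useful because other reductions (for example mean or max) can presumably be swapped in for sum without changing the rest of the expression.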
The minimum versions of some dependencies were changed (8586):

===================== ========= =========
Package               Old       New
===================== ========= =========
cartopy               0.20      0.21
dask-core             2022.7    2022.12
distributed           2022.7    2022.12
flox                  0.5       0.7
iris                  3.2       3.4
matplotlib-base       3.5       3.6
numpy                 1.22      1.23
numba                 0.55      0.56
packaging             21.3      22.0
seaborn               0.11      0.12
scipy                 1.8       1.10
typing_extensions     4.3       4.4
zarr                  2.12      2.13
===================== ========= =========
- The squeeze kwarg to GroupBy is now deprecated (2157, 8507). By Deepak Cherian.
- Support non-string hashable dimensions in xarray.DataArray (8546, 8559). By Michael Niklas.
- Reverse index output of bottleneck's rolling move_argmax/move_argmin functions (8541, 8552). By Kai Mühlbauer.
- Vendor SerializableLock from dask and use as default lock for netcdf4 backends (8442, 8571). By Kai Mühlbauer.
- Add tests and fixes for empty CFTimeIndex, including broken html repr (7298, 8600). By Mathias Hauser.
- The implementation of map_blocks has changed to minimize graph size and duplication of data. This should be a strict improvement even though the graphs are not always embarrassingly parallel any more. Please open an issue if you spot a regression (8412, 8409). By Deepak Cherian.
- Remove null values before plotting (8535). By Jimmy Westling.
- Redirect cumulative reduction functions internally through the ChunkManagerEntryPoint, potentially allowing xarray.DataArray.ffill and xarray.DataArray.bfill to use non-dask chunked array types (8019). By Tom Nicholas.
This release brings new hypothesis strategies for testing, significantly faster rolling aggregations as well as ffill and bfill with numbagg, a new Dataset.eval method, and improvements to reading and writing Zarr arrays (including a new "a-" mode).
Thanks to our 16 contributors:
Anderson Banihirwe, Ben Mares, Carl Andersson, Deepak Cherian, Doug Latornell, Gregorio L. Trevisan, Illviljan, Jens Hedegaard Nielsen, Justus Magin, Mathias Hauser, Max Jones, Maximilian Roos, Michael Niklas, Patrick Hoefler, Ryan Abernathey, Tom Nicholas
- Added hypothesis strategies for generating xarray.Variable objects containing arbitrary data, useful for parametrizing downstream tests. Accessible under testing.strategies, and documented in a new page on testing in the User Guide (6911, 8404). By Tom Nicholas.
- rolling uses numbagg for most of its computations by default. Numbagg is up to 5x faster than bottleneck where parallelization is possible. Where parallelization isn't possible (for example, a 1D array) it's about the same speed as bottleneck, and 2-5x faster than pandas' default functions (8493). numbagg is an optional dependency, so it requires installing separately.
- Use a concise format when plotting datetime arrays (8449). By Jimmy Westling.
- Avoid overwriting unchanged existing coordinate variables when appending with Dataset.to_zarr by setting mode='a-'. By Ryan Abernathey and Deepak Cherian.
- xarray.DataArray.rank now operates on dask-backed arrays, assuming the core dim has exactly one chunk (8475). By Maximilian Roos.
- Add a Dataset.eval method, similar to the pandas method of the same name (7163). This is currently marked as experimental and doesn't yet support the numexpr engine.
- Dataset.drop_vars & DataArray.drop_vars allow passing a callable, similar to Dataset.where & Dataset.sortby & others (8511). By Maximilian Roos.
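The two Dataset-level additions at the end of this list can be sketched as follows. The dataset contents and variable names are made up for illustration, and since Dataset.eval is described as experimental, its behaviour may differ between versions:

```python
import numpy as np
import xarray as xr

# Two small variables sharing the "x" dimension (illustrative data only).
ds = xr.Dataset(
    {
        "a": ("x", np.arange(3)),
        "b": ("x", np.array([10, 20, 30])),
    }
)

# Evaluate a string expression against the dataset's variables.
total = ds.eval("a + b")

# Pass a callable to drop_vars; it receives the dataset and returns names to drop.
only_a = ds.drop_vars(lambda d: [name for name in d.data_vars if name != "a"])

print(total.values)
print(list(only_a.data_vars))
```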
- Explicitly warn when creating xarray objects with repeated dimension names. Such objects will also now raise when DataArray.get_axis_num is called, which means many functions will raise. This latter change is technically a breaking change, but whilst allowed, this behaviour was never actually supported! (3731, 8491). By Tom Nicholas.
- As part of an effort to standardize the API, we're renaming the dims keyword arg to dim for the minority of functions which currently use dims. This started with xarray.dot & DataArray.dot and we'll gradually roll this out across all functions. The warnings are currently PendingDeprecationWarning, which are silenced by default. We'll convert these to DeprecationWarning in a future release. By Maximilian Roos.
- Raise a FutureWarning that the type of Dataset.dims will be changed from a mapping of dimension names to lengths to a set of dimension names. This is to increase consistency with DataArray.dims. To access a mapping of dimension names to lengths please use Dataset.sizes. The same change also applies to DatasetGroupBy.dims (8496, 8500). By Tom Nicholas.
- Dataset.drop & DataArray.drop are now deprecated, since pending deprecation for several years. DataArray.drop_sel & DataArray.drop_vars replace them for labels & variables respectively (8497). By Maximilian Roos.
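A short sketch of the replacements named in these deprecations, assuming a small made-up dataset: Dataset.sizes for the name-to-length mapping, and drop_sel / drop_vars in place of the deprecated drop.

```python
import numpy as np
import xarray as xr

# Illustrative dataset with one variable on a 2 x 3 grid.
ds = xr.Dataset(
    {"t": (("x", "y"), np.zeros((2, 3)))},
    coords={"x": [10, 20], "y": [1, 2, 3]},
)

# Mapping of dimension names to lengths: use sizes rather than dims.
sizes = dict(ds.sizes)

# Drop by coordinate label (replacement for drop with labels).
without_label = ds.drop_sel(x=10)

# Drop by variable name (replacement for drop with variable names).
without_var = ds.drop_vars("t")

print(sizes)
```

Code that only needs the dimension names can iterate over ds.sizes, which behaves the same before and after the planned change to Dataset.dims.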
- Fix dtype inference for pd.CategoricalIndex when categories are backed by a pd.ExtensionDtype (8481).
- Fix writing a variable that requires transposing when not writing to a region (8484). By Maximilian Roos.
- Static typing of p0 and bounds arguments of xarray.DataArray.curvefit and xarray.Dataset.curvefit was changed to Mapping (8502). By Michael Niklas.
- Fix typing of xarray.DataArray.to_netcdf and xarray.Dataset.to_netcdf when compute is evaluated to bool instead of a Literal (8268). By Jens Hedegaard Nielsen.
- Added illustration of updating the time coordinate values of a resampled dataset using time offset arithmetic. This is the recommended technique to replace the use of the deprecated loffset parameter in resample (8479). By Doug Latornell.
- Improved error message when attempting to get a variable which doesn't exist from a Dataset (8474). By Maximilian Roos.
- Fix default value of combine_attrs in xarray.combine_by_coords (8471). By Gregorio L. Trevisan.
- DataArray.bfill & DataArray.ffill now use numbagg (https://github.com/numbagg/numbagg) by default, which is up to 5x faster where parallelization is possible (8339). By Maximilian Roos.
- Update mypy version to 1.7 (8448, 8501). By Michael Niklas.
Tip
This is our 10th year anniversary release! Thank you for your love and support.
This release brings the ability to use opt_einsum for xarray.dot by default, support for auto-detecting region when writing partial datasets to Zarr, and the use of h5py drivers with h5netcdf.
Thanks to the 19 contributors to this release: Aman Bagrecha, Anderson Banihirwe, Ben Mares, Deepak Cherian, Dimitri Papadopoulos Orfanos, Ezequiel Cimadevilla Alvarez, Illviljan, Justus Magin, Katelyn FitzGerald, Kai Muehlbauer, Martin Durant, Maximilian Roos, Metamess, Sam Levang, Spencer Clark, Tom Nicholas, mgunyho, templiert
- Use opt_einsum for xarray.dot by default if installed (7764, 8373). By Deepak Cherian.
- Add DataArray.dt.total_seconds() method to match the Pandas API (8435). By Ben Mares.
- Allow passing region="auto" in Dataset.to_zarr to automatically infer the region to write in the original store. Also implement automatic transpose when dimension order does not match the original store (7702, 8421, 8434). By Sam Levang.
- Allow the usage of h5py drivers (e.g. ros3) via h5netcdf (8360). By Ezequiel Cimadevilla.
- Enable VLEN string fill_values, preserve VLEN string dtypes (1647, 7652, 7868, 7869). By Kai Mühlbauer.
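For the new timedelta accessor method in this list, a minimal sketch with made-up durations (one hour and one day) could be:

```python
import numpy as np
import xarray as xr

# Two durations stored as timedelta64 values (illustrative data only).
deltas = xr.DataArray(
    np.array([3600, 86400], dtype="timedelta64[s]"),
    dims="step",
)

# Convert each duration to a float number of seconds, as in pandas.
seconds = deltas.dt.total_seconds()

print(seconds.values)
```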
- Drop support for cdms2. Please use xcdat instead (8441). By Justus Magin.
- Following pandas, infer_freq will return "Y", "YS", "QE", "ME", "h", "min", "s", "ms", "us", or "ns" instead of "A", "AS", "Q", "M", "H", "T", "S", "L", "U", or "N". This is to be consistent with the deprecation of the latter frequency strings (8394, 8415). By Spencer Clark.
- Bump minimum tested pint version to >=0.22. By Deepak Cherian.
- Minimum supported versions for the following packages have changed: h5py>=3.7, h5netcdf>=1.1. By Kai Mühlbauer.
- The PseudoNetCDF backend has been removed. By Deepak Cherian.
- Supplying dimension-ordered sequences to DataArray.chunk & Dataset.chunk is deprecated in favor of supplying a dictionary of dimensions, or a single int or "auto" argument covering all dimensions. Xarray favors using dimension names rather than positions, and this was one place in the API where dimension positions were used (8341). By Maximilian Roos.
- Following pandas, the frequency strings "A", "AS", "Q", "M", "H", "T", "S", "L", "U", and "N" are deprecated in favor of "Y", "YS", "QE", "ME", "h", "min", "s", "ms", "us", and "ns", respectively. These strings are used, for example, in date_range, cftime_range, DataArray.resample, and Dataset.resample among others (8394, 8415). By Spencer Clark.
- Rename Dataset.to_array to Dataset.to_dataarray for consistency with DataArray.to_dataset & open_dataarray functions. This is a "soft" deprecation: the existing methods work and don't raise any warnings, given the relatively small benefits of the change. By Maximilian Roos.
- Finally remove keep_attrs kwarg from DataArray.resample and Dataset.resample. These were deprecated a long time ago. By Deepak Cherian.
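To make the frequency-string change concrete, here is a hedged sketch using two of the new aliases with date_range. The start date and period counts are arbitrary, and the exact behaviour is assumed to depend on the installed pandas and xarray versions:

```python
import xarray as xr

# "ME" (month end) replaces the deprecated "M".
month_ends = xr.date_range("2000-01-01", periods=3, freq="ME")

# "h" (hourly) replaces the deprecated "H".
hourly = xr.date_range("2000-01-01", periods=4, freq="h")

print(month_ends)
print(hourly)
```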
- Port bug fix from pandas to eliminate the adjustment of resample bin edges in the case that the resampling frequency has units of days and is greater than one day (e.g. "2D", "3D" etc.) and the closed argument is set to "right" to xarray's implementation of resample for data indexed by a CFTimeIndex (8393). By Spencer Clark.
- Fix to once again support date offset strings as input to the loffset parameter of resample and test this functionality (8422, 8399). By Katelyn FitzGerald.
- Fix a bug where DataArray.to_dataset silently drops a variable if a coordinate with the same name already exists (8433, 7823). By András Gunyhó.
- Fix for DataArray.to_zarr & Dataset.to_zarr to close the created zarr store when passing a path with .zip extension (8425). By Carl Andersson (https://github.com/CarlAndersson).
- Small updates to the documentation on distributed writes to Zarr; see io.zarr.appending. By Deepak Cherian.
This release updates our minimum numpy version in pyproject.toml to 1.22, consistent with our documentation below.
This release brings performance enhancements to reading Zarr datasets, the ability to use numbagg for reductions, an expansion in API for rolling_exp, fixes two regressions with datetime decoding, and many other bugfixes and improvements. Groupby reductions will also use numbagg if flox>=0.8.1 and numbagg are both installed.
Thanks to our 13 contributors: Anderson Banihirwe, Bart Schilperoort, Deepak Cherian, Illviljan, Kai Mühlbauer, Mathias Hauser, Maximilian Roos, Michael Niklas, Pieter Eendebak, Simon Høxbro Hansen, Spencer Clark, Tom White, olimcc
- Support high-performance reductions with numbagg. This is enabled by default if numbagg is installed (8316). By Deepak Cherian.
- Add corr, cov, std & var to .rolling_exp (8307). By Maximilian Roos.
- DataArray.where & Dataset.where accept a callable for the other parameter, passing the object as the only argument. Previously, this was only valid for the cond parameter (8255). By Maximilian Roos.
- .rolling_exp functions can now take a min_weight parameter, to only output values when there are sufficient recent non-nan values. numbagg>=0.3.1 is required (8285). By Maximilian Roos.
- DataArray.sortby & Dataset.sortby accept a callable for the variables parameter, passing the object as the only argument. By Maximilian Roos.
- .rolling_exp functions can now operate on dask-backed arrays, assuming the core dim has exactly one chunk (8284). By Maximilian Roos.
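The callable-accepting parameters in this list can be sketched as below. The data and the lambdas are illustrative assumptions; each callable receives the object itself as its only argument, as the entries state:

```python
import numpy as np
import xarray as xr

# Illustrative 1-D array with mixed signs.
da = xr.DataArray(np.array([1.0, -2.0, 3.0, -4.0]), dims="x")

# Both cond and other are callables: keep positives, replace the rest
# with their negation (so the result is the absolute value).
absolute = da.where(lambda obj: obj > 0, other=lambda obj: -obj)

# sortby with a callable that returns the array to sort by.
ascending = da.sortby(lambda obj: obj)

print(absolute.values)
print(ascending.values)
```

Callables are mostly useful in method chains, where the intermediate object has no name to refer to.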
- Made more arguments keyword-only (e.g. keep_attrs, skipna) for many xarray.DataArray and xarray.Dataset methods (6403). By Mathias Hauser.
- Dataset.to_zarr & DataArray.to_zarr require keyword arguments after the initial 7 positional arguments. By Maximilian Roos.
- Rename Dataset.reset_encoding & DataArray.reset_encoding to Dataset.drop_encoding & DataArray.drop_encoding for consistency with other drop & reset methods: drop generally removes something, while reset generally resets to some default or standard value (8287, 8259). By Maximilian Roos.
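A minimal sketch of the renamed method, assuming a made-up variable with some encoding attached by hand (the encoding keys here are only examples):

```python
import numpy as np
import xarray as xr

# Illustrative dataset; attach encoding to one variable by hand.
ds = xr.Dataset({"t": ("x", np.arange(3.0))})
ds["t"].encoding = {"dtype": "int16", "scale_factor": 0.1}

# drop_encoding returns a new object with the encoding removed.
clean = ds.drop_encoding()

print(ds["t"].encoding)
print(clean["t"].encoding)
```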
- DataArray.rename & Dataset.rename would emit a warning when the operation was a no-op (8266). By Simon Hansen.
- Fixed a regression introduced in the previous release checking time-like units when encoding/decoding masked data (8269, 8277). By Kai Mühlbauer.
- Fix datetime encoding precision loss regression introduced in the previous release for datetimes encoded with units requiring floating point values, and a reference date not equal to the first value of the datetime array (8271, 8272). By Spencer Clark.
- Fix excess metadata requests when using a Zarr store. Prior to this, metadata was re-read every time data was retrieved from the array; now metadata is retrieved only once when the array is initialized (8290, 8297). By Oliver McCormack.
- Fix to_zarr ending in a ReadOnlyError when consolidated metadata was used and the write_empty_chunks was provided (8323, 8326). By Matthijs Amesz.
- Added page on the interoperability of xarray objects (7992). By Tom Nicholas.
- Added xarray-regrid to the list of xarray related projects (8272). By Bart Schilperoort.
- More improvements to support the Python array API standard by using duck array ops in more places in the codebase (8267). By Tom White.
This release continues work on the new xarray.Coordinates object, allows providing preferred_chunks when reading from netcdf files, enables xarray.apply_ufunc to handle missing core dimensions, and fixes several bugs.
Thanks to the 24 contributors to this release: Alexander Fischer, Amrest Chinkamol, Benoit Bovy, Darsh Ranjan, Deepak Cherian, Gianfranco Costamagna, Gregorio L. Trevisan, Illviljan, Joe Hamman, JR, Justus Magin, Kai Mühlbauer, Kian-Meng Ang, Kyle Sunden, Martin Raspaud, Mathias Hauser, Mattia Almansi, Maximilian Roos, András Gunyhó, Michael Niklas, Richard Kleijn, Riulinchen, Tom Nicholas and Wiktor Kraśnicki.
We welcome the following new contributors to Xarray!: Alexander Fischer, Amrest Chinkamol, Darsh Ranjan, Gianfranco Costamagna, Gregorio L. Trevisan, Kian-Meng Ang, Riulinchen and Wiktor Kraśnicki.
- Added the Coordinates.assign method that can be used to combine different collections of coordinates prior to assigning them to a Dataset or DataArray at once (8102). By Benoît Bovy.
- Provide preferred_chunks for data read from netcdf files (1440, 7948). By Martin Raspaud.
- Added on_missing_core_dims to apply_ufunc to allow for copying or dropping a Dataset's variables with missing core dimensions (8138). By Maximilian Roos.
- The Coordinates constructor now creates a (pandas) index by default for each dimension coordinate. To keep the previous behavior (no index created), pass an empty dictionary to indexes. The constructor now also extracts and adds the indexes from another Coordinates object passed via coords (8107). By Benoît Bovy.
- Static typing of xlim and ylim arguments in plotting functions now must be tuple[float, float] to align with matplotlib requirements (7802, 8030). By Michael Niklas.
- Deprecate passing a pandas.MultiIndex object directly to the Dataset and DataArray constructors as well as to Dataset.assign and Dataset.assign_coords. A new Xarray Coordinates object has to be created first using Coordinates.from_pandas_multiindex (8094). By Benoît Bovy.
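The recommended pattern from this deprecation can be sketched as follows. The level names, level values, and the dimension name "x" are illustrative assumptions:

```python
import pandas as pd
import xarray as xr

# Illustrative two-level pandas MultiIndex.
midx = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1]], names=("letter", "num")
)

# Wrap it in a Coordinates object first, naming the dimension it spans.
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

# Then pass the Coordinates object to the constructor.
ds = xr.Dataset(coords=coords)

print(ds)
```

The same Coordinates object can presumably be passed to Dataset.assign_coords when the dataset already exists.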
- Improved static typing of reduction methods (6746). By Richard Kleijn.
- Fix bug where empty attrs would generate inconsistent tokens (6970, 8101). By Mattia Almansi.
- Improved handling of multi-coordinate indexes when updating coordinates, including bug fixes (and improved warnings for deprecated features) for pandas multi-indexes (8094). By Benoît Bovy.
- Fixed a bug in merge with compat='minimal' where the coordinate names were not updated properly internally (7405, 7588, 8104). By Benoît Bovy.
- Fix bug where DataArray instances on the right-hand side of DataArray.__setitem__ lose dimension names (7030, 8067). By Darsh Ranjan.
- Return float64 in presence of NaT in core.accessor_dt.DatetimeAccessor and special case NaT handling in core.accessor_dt.DatetimeAccessor.isocalendar (7928, 8084). By Kai Mühlbauer.
- Fix core.rolling.DatasetRolling.construct with stride on Datasets without indexes (7021, 7578). By Amrest Chinkamol and Michael Niklas.
- Calling plot with kwargs col, row or hue no longer squeezes dimensions passed via these arguments (7552, 8174). By Wiktor Kraśnicki.
- Fixed a bug where casting from float to int64 (undefined for NaN) led to varying issues (7817, 7942, 7790, 6191, 7096, 1064, 7827). By Kai Mühlbauer.
- Fixed a bug where inaccurate coordinates silently failed to decode variable (1809, 8195). By Kai Mühlbauer.
- .rolling_exp functions no longer mistakenly lose non-dimensioned coords (6528, 8114). By Maximilian Roos.
- In the event that user-provided datetime64/timedelta64 units and integer dtype encoding parameters conflict with each other, override the units to preserve an integer dtype for most faithful serialization to disk (1064, 8201). By Kai Mühlbauer.
- Static typing of dunder ops methods (like DataArray.__eq__) has been fixed. Remaining issues are upstream problems (7780, 8204). By Michael Niklas.
- Fix type annotation for center argument of plotting methods (like xarray.plot.dataarray_plot.pcolormesh) (8261). By Pieter Eendebak.
- Make documentation of DataArray.where clearer (7767, 7955). By Riulinchen.
- Many error messages related to invalid dimensions or coordinates now always show the list of valid dims/coords (8079). By András Gunyhó.
- Refactor of encoding and decoding times/timedeltas to preserve nanosecond resolution in arrays that contain missing values (7827). By Kai Mühlbauer.
- Transition .rolling_exp functions to use .apply_ufunc internally rather than .reduce, as the start of a broader effort to move non-reducing functions away from .reduce (8114). By Maximilian Roos.
- Test range of fill_value's in test_interpolate_pd_compat (8146, 8189). By Kai Mühlbauer.
This release brings changes to minimum dependencies, allows reading of datasets where a dimension name is associated with a multidimensional variable (e.g. finite volume ocean model output), and introduces a new xarray.Coordinates object.
Thanks to the 16 contributors to this release: Anderson Banihirwe, Articoking, Benoit Bovy, Deepak Cherian, Harshitha, Ian Carroll, Joe Hamman, Justus Magin, Peter Hill, Rachel Wegener, Riley Kuttruff, Thomas Nicholas, Tom Nicholas, ilgast, quantsnus, vallirep
The xarray.Variable class is being refactored out to a new project titled 'namedarray'. See the design doc for more details. Reach out to us on this [discussion topic](#8080) if you have any thoughts.
- Coordinates can now be constructed independently of any Dataset or DataArray (it is also returned by the Dataset.coords and DataArray.coords properties). Coordinates objects are useful for passing both coordinate variables and indexes to new Dataset / DataArray objects, e.g., via their constructor or via Dataset.assign_coords. We may also wrap coordinate variables in a Coordinates object in order to skip the automatic creation of (pandas) indexes for dimension coordinates. The Coordinates.from_pandas_multiindex constructor may be used to create coordinates directly from a pandas.MultiIndex object (it is preferred over passing it directly as coordinate data, which may be deprecated soon). Like Dataset and DataArray objects, Coordinates objects may now be used in align and merge (6392, 7368). By Benoît Bovy.
- Visually group together coordinates with the same indexes in the index section of the text repr (7225). By Justus Magin.
- Allow creating Xarray objects where a multidimensional variable shares its name with a dimension. Examples include output from finite volume models like FVCOM (2233, 7989). By Deepak Cherian and Benoit Bovy.
- When outputting Dataset objects as Zarr via Dataset.to_zarr, users can now specify that chunks that will contain no valid data will not be written. Originally, this could be done by specifying "write_empty_chunks": True in the encoding parameter; however, this setting would not carry over when appending new data to an existing dataset (8009). Requires zarr>=2.11.
The minimum versions of some dependencies were changed (8022):

===================== ========= =========
Package               Old       New
===================== ========= =========
boto3                 1.20      1.24
cftime                1.5       1.6
dask-core             2022.1    2022.7
distributed           2022.1    2022.7
h5netcdf              0.13      1.0
iris                  3.1       3.2
lxml                  4.7       4.9
netcdf4               1.5.7     1.6.0
numpy                 1.21      1.22
pint                  0.18      0.19
pydap                 3.2       3.3
rasterio              1.2       1.3
scipy                 1.7       1.8
toolz                 0.11      0.12
typing_extensions     4.0       4.3
zarr                  2.10      2.12
numbagg               0.1       0.2.1
===================== ========= =========
- Added page on the internal design of xarray objects (7991). By Tom Nicholas.
- Added examples to docstrings of Dataset.assign_attrs, Dataset.broadcast_equals, Dataset.equals, Dataset.identical, Dataset.expand_dims, Dataset.drop_vars (6793, 7937). By Harshitha.
- Add docstrings for the Index base class and add some documentation on how to create custom, Xarray-compatible indexes (6975). By Benoît Bovy.
- Added a page clarifying the role of Xarray core team members (7999). By Tom Nicholas.
- Fixed broken links in "See also" section of Dataset.count (8055, 8057). By Articoking.
- Extended the glossary by adding terms Aligning, Broadcasting, Merging, Concatenating, Combining, lazy, labeled, serialization, indexing (3355, 7732). By Harshitha.
- as_variable now consistently includes the variable name in any exceptions raised (7995). By Peter Hill.
- encode_dataset_coordinates now sorts coordinates automatically assigned to coordinates attributes during serialization (8026, 8034). By Ian Carroll.
This release brings improvements to the documentation on wrapping numpy-like arrays, improved docstrings, and bug fixes.
- hue_style is being deprecated for scatter plots (7907, 7925). By Jimmy Westling.
- Ensure no forward slashes in variable and dimension names for HDF5-based engines (7943, 7953). By Kai Mühlbauer.
- Added examples to docstrings of Dataset.assign_attrs, Dataset.broadcast_equals, Dataset.equals, Dataset.identical, Dataset.expand_dims, Dataset.drop_vars (6793, 7937). By Harshitha.
- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays (7951). By Tom Nicholas.
- Expanded the page on wrapping numpy-like "duck" arrays (7911). By Tom Nicholas.
- Added examples to docstrings of Dataset.isel, Dataset.reduce, Dataset.argmin, Dataset.argmax (6793, 7881). By Harshitha.
- Allow chunked non-dask arrays (i.e. Cubed arrays) in groupby operations (7941). By Tom Nicholas.
This release adds features to curvefit, improves the performance of concatenation, and fixes various bugs.
Thanks to our 13 contributors to this release: Anderson Banihirwe, Deepak Cherian, dependabot[bot], Illviljan, Juniper Tyree, Justus Magin, Martin Fleischmann, Mattia Almansi, mgunyho, Rutger van Haasteren, Thomas Nicholas, Tom Nicholas, Tom White.
- Added support for multidimensional initial guess and bounds in DataArray.curvefit (7768, 7821). By András Gunyhó.
- Add an errors option to Dataset.curvefit that allows returning NaN for the parameters and covariances of failed fits, rather than failing the whole series of fits (6317, 7891). By Dominik Stańczak and András Gunyhó.
- Deprecate the cdms2 conversion methods (7876). By Justus Magin.
- Improve concatenation performance (7833, 7824). By Jimmy Westling.
- Fix bug where weighted polyfit was changing the original object (5644, 7900). By Mattia Almansi.
- Don't call CachingFileManager.__del__ on interpreter shutdown (7814, 7880). By Justus Magin.
- Preserve vlen dtype for empty string arrays (7328, 7862). By Tom White and Kai Mühlbauer.
- Ensure dtype of reindex result matches dtype of the original DataArray (7299, 7917). By Anderson Banihirwe.
- Fix bug where a zero-length zarr chunk_store was ignored as if it was None (7923). By Juniper Tyree.
- Minor improvements to support of the python array api standard, internally using the function
xp.astype()
instead of the methodarr.astype()
, as the latter is not in the standard. (7847
) By Tom Nicholas. - Xarray now uploads nightly wheels to https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/ (
7863
,7865
). By Martin Fleischmann. - Stop uploading development wheels to TestPyPI (
7889
) By Justus Magin. - Added an exception catch for
AttributeError
along with ImportError
when duck typing the dynamic imports in pycompat.py. This catches some name collisions between packages. (7870
,7874
)
This release adds some new methods and operators, updates our deprecation policy for python versions, fixes some bugs with groupby, and introduces experimental support for alternative chunked parallel array computation backends via a new plugin system!
Note: If you are using a locally-installed development version of xarray then pulling the changes from this release may require you to re-install. This avoids an error where xarray cannot detect dask via the new entrypoints system introduced in 7019
. See 7856
for details.
Thanks to our 14 contributors: Alan Brammer, crusaderky, David Stansby, dcherian, Deeksha, Deepak Cherian, Illviljan, James McCreight, Joe Hamman, Justus Magin, Kyle Sunden, Max Hollmann, mgunyho, and Tom Nicholas
- Added new method :py
DataArray.to_dask_dataframe
, which converts a DataArray into a dask dataframe (7409
). By Deeksha. - Add support for lshift and rshift binary operators (
<<
,>>
) on :pyxr.DataArray
of type :pyint
(7727
,7741
). By Alan Brammer. - Keyword argument data='array' to both :py
xarray.Dataset.to_dict
and :pyxarray.DataArray.to_dict
will now return data as the underlying array type. Python lists are returned for data='list' or data=True. Supplying data=False only returns the schema without data. encoding=True
also returns the encoding dictionary for the underlying variable. (1599
,7739
). By James McCreight.
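As an illustration of the shift operators and the `data` modes of `to_dict` listed above, here is a minimal sketch. The array name `bits` and its values are made up for the example; it assumes an xarray version that includes these features.

```python
import numpy as np
import xarray as xr

# A small integer DataArray (hypothetical example data).
bits = xr.DataArray(np.array([1, 2, 4], dtype="int64"), dims="x", name="bits")

shifted_left = bits << 1   # element-wise left shift
shifted_right = bits >> 1  # element-wise right shift

# The three `data` modes of to_dict described in the entry above.
as_array = bits.to_dict(data="array")   # keeps the underlying array type
as_list = bits.to_dict(data="list")     # plain Python lists
schema_only = bits.to_dict(data=False)  # dims, dtype, shape, ... but no values
```

The result of each shift is again a DataArray, so coordinates and the name are handled the same way as for other binary operators.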
- Adjust the deprecation policy for Python to once again align with NEP-29 (
7765
,7793
) By Justus Magin.
- Optimize
.dt
accessor performance with CFTimeIndex
. (7796
) By Deepak Cherian.
- Fix as_compatible_data for masked float arrays, now always creates a copy when mask is present (
2377
,7788
). By Max Hollmann. - Fix groupby binary ops when grouped array is subset relative to other. (
7797
). By Deepak Cherian. - Fix groupby sum, prod for all-NaN groups with
flox
. (7808
). By Deepak Cherian.
- Experimental support for wrapping chunked array libraries other than dask. A new ABC is defined - :py
xr.core.parallelcompat.ChunkManagerEntrypoint
- which can be subclassed and then registered by alternative chunked array implementations. (6807
,7019
) By Tom Nicholas.
This is a patch release to fix a bug with binning (7766
)
- Fix binning when
labels
is specified. (7766
). By Deepak Cherian.
- Added examples to docstrings for :py
xarray.core.accessor_str.StringAccessor
methods. (7669
). By Mary Gathoni.
This is a patch release to fix a bug with binning (7759
)
- Fix binning by unsorted arrays. (
7759
)
This release includes support for pandas v2, allows refreshing of backend engines in a session, and removes deprecated backends for rasterio
and cfgrib
.
Thanks to our 19 contributors: Chinemere, Tom Coleman, Deepak Cherian, Harshitha, Illviljan, Jessica Scheick, Joe Hamman, Justus Magin, Kai Mühlbauer, Kwonil-Kim, Mary Gathoni, Michael Niklas, Pierre, Scott Henderson, Shreyal Gupta, Spencer Clark, mccloskey, nishtha981, veenstrajelmer
We welcome the following new contributors to Xarray!: Mary Gathoni, Harshitha, veenstrajelmer, Chinemere, nishtha981, Shreyal Gupta, Kwonil-Kim, mccloskey.
- New methods to reset an object's encoding (:py
Dataset.reset_encoding
, :pyDataArray.reset_encoding
). (7686
,7689
). By Joe Hamman. - Allow refreshing backend engines with :py
xarray.backends.refresh_engines
(7478
,7523
). By Michael Niklas. - Added ability to save
DataArray
objects directly to Zarr using :py~xarray.DataArray.to_zarr
. (7692
,7693
). By Joe Hamman.
- Remove deprecated rasterio backend in favor of rioxarray (
7392
). By Scott Henderson.
- Optimize alignment with
join="exact", copy=False
by avoiding copies. (7736
) By Deepak Cherian. - Avoid unnecessary copies of
CFTimeIndex
. (7735
) By Deepak Cherian.
- Fix :py
xr.polyval
with non-system standard integer coeffs (7619
). By Shreyal Gupta and Michael Niklas. - Improve error message when trying to open a file which you do not have permission to read (
6523
,7629
). By Thomas Coleman. - Proper plotting when passing :py
~matplotlib.colors.BoundaryNorm
type argument in :pyDataArray.plot
. (4061
,7014
,7553
) By Jelmer Veenstra. - Ensure the formatting of time encoding reference dates outside the range of nanosecond-precision datetimes remains the same under pandas version 2.0.0 (
7420
,7441
). By Justus Magin and Spencer Clark. - Various dtype related fixes needed to support pandas>=2.0 (
7724
) By Justus Magin. - Preserve boolean dtype within encoding (
7652
,7720
). By Kai Mühlbauer.
- Update FAQ page on how do I open format X file as an xarray dataset? (
1285
,7638
) using :py~xarray.open_dataset
By Harshitha and Tom Nicholas.
- Don't assume that arrays read from disk will be Numpy arrays. This is a step toward enabling reads from a Zarr store using the Kvikio or TensorStore libraries. (
6874
). By Deepak Cherian. - Remove internal support for reading GRIB files through the
cfgrib
backend. cfgrib
now uses the external backend interface, so no existing code should break. By Deepak Cherian. - Implement CF coding functions in
VariableCoders
(7719
). By Kai Mühlbauer. - Added a config.yml file with messages for the welcome bot when a GitHub user creates their first-ever issue or pull request, or has their first PR merged. (
7685
,7685
) By Nishtha P. - Ensure that only nanosecond-precision :py
pd.Timestamp
objects continue to be used internally under pandas version 2.0.0. This is mainly to ease the transition to this latest version of pandas. It should be relaxed when addressing 7493
. By Spencer Clark (7707
,7731
).
This release brings many bug fixes, and some new features. The maximum pandas version is pinned to <2
until we can support the new pandas datetime types. Thanks to our 19 contributors: Abel Aoun, Alex Goodman, Deepak Cherian, Illviljan, Jody Klymak, Joe Hamman, Justus Magin, Mary Gathoni, Mathias Hauser, Mattia Almansi, Mick, Oriol Abril-Pla, Patrick Hoefler, Paul Ockenfuß, Pierre, Shreyal Gupta, Spencer Clark, Tom Nicholas, Tom Vo
- :py
xr.cov
and :pyxr.corr
now support complex-valued arrays (7340
,7392
). By Michael Niklas. - Allow indexing along unindexed dimensions with dask arrays (
2511
,4276
,4663
,5873
). By Abel Aoun and Deepak Cherian. - Support dask arrays in
first
and last
reductions. By Deepak Cherian. - Improved performance in
open_dataset
for datasets with large object arrays (7484
,7494
). By Alex Goodman and Deepak Cherian.
- Following pandas, the
base
and loffset
parameters of :pyxr.DataArray.resample
and :pyxr.Dataset.resample
have been deprecated and will be removed in a future version of xarray. Using the origin
or offset
parameters is recommended as a replacement for using the base
parameter, and using time offset arithmetic is recommended as a replacement for using the loffset
parameter (8459
). By Spencer Clark.
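The migration described above might look roughly like the following sketch. The hourly series, the 24-hour bins and the 12-hour shift are illustrative assumptions, not values taken from the release notes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of hourly data (hypothetical example series).
times = pd.date_range("2000-01-01", periods=48, freq="h")
da = xr.DataArray(np.arange(48.0), coords={"time": times}, dims="time")

# Instead of `base`: shift the bin edges with `offset`.
shifted_bins = da.resample(time="24h", offset=pd.Timedelta(hours=12)).mean()

# Instead of `loffset`: resample normally, then shift the labels
# with plain time offset arithmetic.
daily = da.resample(time="24h").mean()
daily["time"] = daily.get_index("time") + pd.Timedelta(hours=12)
```

With `offset`, the bins themselves move, so the values change. Shifting the labels afterwards leaves the values untouched and only relabels them.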
- Improve the error message raised by :py
Dataset.drop_vars
to state which variables can't be dropped. (7518
) By Tom Nicholas. - Require explicitly defining optional dimensions such as hue and markersize for scatter plots. (
7314
,7277
). By Jimmy Westling. - Fix matplotlib raising a UserWarning when plotting a scatter plot with an unfilled marker (
7313
,7318
). By Jimmy Westling. - Fix issue with
max_gap
in interpolate_na
, when applied to multidimensional arrays. (7597
,7598
). By Paul Ockenfuß. - Fix :py
DataArray.plot.pcolormesh
which now works if one of the coordinates has str dtype (6775
,7612
). By Michael Niklas.
- Clarify language in contributor's guide (
7495
,7595
) By Tom Nicholas.
- Pin pandas to
<2
. By Deepak Cherian.
This release brings a major upgrade to :pyxarray.concat
, many bug fixes, and a bump in supported dependency versions. Thanks to our 11 contributors: Aron Gergely, Deepak Cherian, Illviljan, James Bourbeau, Joe Hamman, Justus Magin, Hauke Schulz, Kai Mühlbauer, Ken Mankoff, Spencer Clark, Tom Nicholas.
Support for python 3.8 has been dropped and the minimum versions of some dependencies were changed (7461):
Package      Old      New
python       3.8      3.9
numpy        1.20     1.21
pandas       1.3      1.4
dask         2021.11  2022.1
distributed  2021.11  2022.1
h5netcdf     0.11     0.13
lxml         4.6      4.7
numba        0.54     0.55
- Following pandas, the closed parameters of :py
cftime_range
and :pydate_range
are deprecated in favor of the inclusive parameters, and will be removed in a future version of xarray (6985
,7373
). By Spencer Clark.
- :py
xarray.concat
can now concatenate variables present in some datasets but not others (508
,7400
). By Kai Mühlbauer and Scott Chamberlin. - Handle
keep_attrs
option in binary operators of :pyDataset
(7390
,7391
). By Aron Gergely. - Improve error message when using dask in :py
apply_ufunc
with output_sizes
not supplied. (7509
) By Tom Nicholas. - :py
xarray.Dataset.to_zarr
now drops variable encodings that have been added by xarray during reading a dataset. (7129
,7500
). By Hauke Schulz.
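To illustrate the `xarray.concat` change listed above, here is a small sketch in which one dataset lacks a variable that the other has. The dataset contents are invented, and `data_vars="all"` is passed explicitly only to keep the example independent of default settings.

```python
import numpy as np
import xarray as xr

# "b" exists only in the first dataset (hypothetical example data).
ds1 = xr.Dataset(
    {"a": ("x", [1.0, 2.0]), "b": ("x", [10.0, 20.0])},
    coords={"x": [0, 1]},
)
ds2 = xr.Dataset({"a": ("x", [3.0])}, coords={"x": [2]})

combined = xr.concat([ds1, ds2], dim="x", data_vars="all")

# Where "b" was missing, the result is filled with NaN.
missing_filled = bool(np.isnan(combined["b"].values[-1]))
```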
- Mention the flox package in GroupBy documentation and docstrings. By Deepak Cherian.
This release includes a number of bug fixes. Thanks to the 14 contributors to this release: Aron Gergely, Benoit Bovy, Deepak Cherian, Ian Carroll, Illviljan, Joe Hamman, Justus Magin, Mark Harfouche, Matthew Roeschke, Paige Martin, Pierre, Sam Levang, Tom White, stefank0.
- :py
CFTimeIndex.get_loc
has removed the method
and tolerance
keyword arguments. Use .get_indexer([key], method=..., tolerance=...)
instead (7361
). By Matthew Roeschke.
- Avoid in-memory broadcasting when converting to a dask dataframe using
.to_dask_dataframe.
(6811
,7472
). By Jimmy Westling. - Accessing the property
.nbytes
of a DataArray or Variable no longer accidentally triggers loading the variable into memory. - Allow numpy-only objects in :py
where
when keep_attrs=True
(7362
,7364
). By Sam Levang. - Add a
keep_attrs
parameter to :pyDataset.pad
, :pyDataArray.pad
, and :pyVariable.pad
(7267
). By Justus Magin. - Fixed performance regression in alignment between indexed and non-indexed objects of the same shape (
7382
). By Benoît Bovy. - Preserve original dtype on accessing MultiIndex levels (
7250
,7393
). By Ian Carroll.
- Add the pre-commit hook absolufy-imports to convert relative xarray imports to absolute imports (
7204
,7370
). By Jimmy Westling.
This release includes a number of bug fixes and experimental support for Zarr V3. Thanks to the 16 contributors to this release: Deepak Cherian, Francesco Zanetta, Gregory Lee, Illviljan, Joe Hamman, Justus Magin, Luke Conibear, Mark Harfouche, Mathias Hauser, Mick, Mike Taves, Sam Levang, Spencer Clark, Tom Nicholas, Wei Ji, templiert
- Enable using offset and origin arguments in :py
DataArray.resample
and :pyDataset.resample
(7266
,7284
). By Spencer Clark. - Add experimental support for Zarr's in-progress V3 specification. (
6475
). By Gregory Lee and Joe Hamman.
The minimum versions of some dependencies were changed (7300):
Package            Old      New
boto               1.18     1.20
cartopy            0.19     0.20
distributed        2021.09  2021.11
dask               2021.09  2021.11
h5py               3.1      3.6
hdf5               1.10     1.12
matplotlib-base    3.4      3.5
nc-time-axis       1.3      1.4
netcdf4            1.5.3    1.5.7
packaging          20.3     21.3
pint               0.17     0.18
pseudonetcdf       3.1      3.2
typing_extensions  3.10     4.0
- The PyNIO backend has been deprecated (
4491
,7301
). By Joe Hamman.
- Fix handling of coordinate attributes in :py
where
. (7220
,7229
) By Sam Levang. - Import
nc_time_axis
when needed (7275
,7276
). By Michael Niklas. - Fix static typing of :py
xr.polyval
(7312
,7315
). By Michael Niklas. - Fix multiple reads on fsspec S3 files by resetting file pointer to 0 when reading file streams (
6813
,7304
). By David Hoese and Wei Ji Leong. - Fix :py
Dataset.assign_coords
resetting all dimension coordinates to default (pandas) index (7346
,7347
). By Benoît Bovy.
- Add example of reading and writing individual groups to a single netCDF file to I/O docs page. (
7338
) By Tom Nicholas.
This release brings a number of bugfixes and documentation improvements. Both text and HTML reprs now have a new "Indexes" section, which we expect will help with development of new Index objects. This release also features more support for the Python Array API.
Many thanks to the 16 contributors to this release: Daniel Goman, Deepak Cherian, Illviljan, Jessica Scheick, Justus Magin, Mark Harfouche, Maximilian Roos, Mick, Patrick Naylor, Pierre, Spencer Clark, Stephan Hoyer, Tom Nicholas, Tom White
- Add static typing to plot accessors (
6949
,7052
). By Michael Niklas. - Display the indexes in a new section of the text and HTML reprs (
6795
,7183
,7185
) By Justus Magin and Benoît Bovy. - Added methods :py
DataArrayGroupBy.cumprod
and :pyDatasetGroupBy.cumprod
. (5816
) By Patrick Naylor.
- repr(ds)
may not show the same result because it doesn't load small, lazy data anymore. Use ds.head().load()
when wanting to see just a sample of the data. (6722
,7203
). By Jimmy Westling. - Many arguments of plot methods have been made keyword-only.
xarray.plot.plot
module renamed to xarray.plot.dataarray_plot
to prevent shadowing of the plot
method. (6949
,7052
). By Michael Niklas.
- Positional arguments for all plot methods have been deprecated (
6949
,7052
). By Michael Niklas. - xarray.plot.FacetGrid.axes
has been renamed to xarray.plot.FacetGrid.axs
because it's not clear if axes
refers to single or multiple Axes
instances. This aligns with matplotlib.pyplot.subplots
. (7194
) By Jimmy Westling.
- Explicitly opening a file multiple times (e.g., after modifying it on disk) now reopens the file from scratch for h5netcdf and scipy netCDF backends, rather than reusing a cached version (
4240
,4862
). By Stephan Hoyer. - Fixed bug where :py
Dataset.coarsen.construct
would demote non-dimension coordinates to variables. (7233
) By Tom Nicholas. - Raise a TypeError when trying to plot empty data (
7156
,7228
). By Michael Niklas.
- Improves overall documentation around available backends, including adding docstrings for :py
xarray.backends.list_engines
. Add :py__str__
to surface the new :pyBackendEntrypoint
description
and url
attributes. (6577
,7000
) By Jessica Scheick. - Created docstring examples for :py
DataArray.cumsum
, :pyDataArray.cumprod
, :pyDataset.cumsum
, :pyDataset.cumprod
, :pyDatasetGroupBy.cumsum
, :pyDataArrayGroupBy.cumsum
. (5816
,7152
) By Patrick Naylor. - Add example of using :py
DataArray.coarsen.construct
to User Guide. (7192
) By Tom Nicholas. - Rename
axes
to axs
in plotting to align with matplotlib.pyplot.subplots
. (7194
) By Jimmy Westling. - Add documentation of specific BackendEntrypoints (
7200
). By Michael Niklas. - Add examples to docstring for :py
DataArray.drop_vars
, :pyDataArray.reindex_like
, :pyDataArray.interp_like
. (6793
,7123
) By Daniel Goman.
- Doctests fail on any warnings (
7166
) By Maximilian Roos. - Improve import time by lazy loading
dask.distributed
(7172). - Explicitly specify
longdouble=False
in :pycftime.date2num
when encoding times to preserve existing behavior and prevent future errors when it is eventually set to True
by default in cftime (7171
). By Spencer Clark. - Improved import time by lazily importing backend modules, matplotlib, dask.array and flox. (
6726
,7179
) By Michael Niklas. - Emit a warning under the development version of pandas when we convert non-nanosecond precision datetime or timedelta values to nanosecond precision. This was required in the past, because pandas previously was not compatible with non-nanosecond precision values. However pandas is currently working towards removing this restriction. When things stabilize in pandas we will likely consider relaxing this behavior in xarray as well (
7175
,7201
). By Spencer Clark.
This release brings numerous bugfixes, a change in minimum supported versions, and a new scatter plot method for DataArrays.
Many thanks to 11 contributors to this release: Anderson Banihirwe, Benoit Bovy, Dan Adriaansen, Illviljan, Justus Magin, Lukas Bindreiter, Mick, Patrick Naylor, Spencer Clark, Thomas Nicholas
- Add scatter plot for DataArrays. Scatter plots now also support 3D plots with the z argument. (
6778
) By Jimmy Westling. - Include the variable name in the error message when CF decoding fails to allow for easier identification of problematic variables (
7145
,7147
). By Spencer Clark.
The minimum versions of some dependencies were changed:
Package            Old      New
cftime             1.4      1.5
distributed        2021.08  2021.09
dask               2021.08  2021.09
iris               2.4      3.1
nc-time-axis       1.2      1.3
numba              0.53     0.54
numpy              1.19     1.20
pandas             1.2      1.3
packaging          20.0     21.0
scipy              1.6      1.7
sparse             0.12     0.13
typing_extensions  3.7      3.10
zarr               2.8      2.10
- Remove nested function from :py
open_mfdataset
to allow Dataset objects to be pickled. (7109
,7116
) By Daniel Adriaansen. - Support for recursively defined Arrays. Fixes repr and deepcopy. (
7111
,7112
) By Michael Niklas. - Fixed :py
Dataset.transpose
to raise a more informative error. (6502
,7120
) By Patrick Naylor. - Fix groupby on a multi-index level coordinate and fix :py
DataArray.to_index
for multi-index levels (convert to single index). (6836
,7105
) By Benoît Bovy. - Support for open_dataset backends that return datasets containing multi-indexes (
7139
,7150
) By Lukas Bindreiter.
This release brings a large number of bugfixes and documentation improvements, as well as an external interface for setting custom indexes!
Many thanks to our 40 contributors:
Anderson Banihirwe, Andrew Ronald Friedman, Bane Sullivan, Benoit Bovy, ColemanTom, Deepak Cherian, Dimitri Papadopoulos Orfanos, Emma Marshall, Fabian Hofmann, Francesco Nattino, ghislainp, Graham Inggs, Hauke Schulz, Illviljan, James Bourbeau, Jody Klymak, Julia Signell, Justus Magin, Keewis, Ken Mankoff, Luke Conibear, Mathias Hauser, Max Jones, mgunyho, Michael Delgado, Mick, Mike Taves, Oliver Lopez, Patrick Naylor, Paul Hockett, Pierre Manchon, Ray Bell, Riley Brady, Sam Levang, Spencer Clark, Stefaan Lippens, Tom Nicholas, Tom White, Travis A. O'Brien, and Zachary Moon.
- Add :py
Dataset.set_xindex
and :pyDataset.drop_indexes
and their DataArray counterpart for setting and dropping pandas or custom indexes given a set of arbitrary coordinates. (6971
) By Benoît Bovy and Justus Magin. - Enable taking the mean of dask-backed :py
cftime.datetime
arrays (6556
,6940
). By Deepak Cherian and Spencer Clark.
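A short sketch of the `set_xindex` and `drop_indexes` workflow mentioned above. The coordinate names `x` and `label` are hypothetical; the example relies on the default pandas-backed index being built for a one-dimensional coordinate.

```python
import xarray as xr

# "label" is a non-dimension coordinate along "x" (hypothetical data).
ds = xr.Dataset(coords={"x": [10, 20, 30], "label": ("x", ["a", "b", "c"])})

# Build an index for "label" so it can be used for label-based selection.
indexed = ds.set_xindex("label")
selected = indexed.sel(label="b")

# Drop the index again; "label" stays as a plain coordinate.
dropped = indexed.drop_indexes("label")
```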
- Allow reading netcdf files where the 'units' attribute is a number. (
7085
) By Ghislain Picard. - Allow decoding of 0 sized datetimes. (
1329
,6882
) By Deepak Cherian. - Make sure DataArray.name is always a string when used as label for plotting. (
6826
,6832
) By Jimmy Westling. - :py
DataArray.nbytes
now uses the nbytes
property of the underlying array if available. (6797
) By Max Jones. - Rely on the array backend for string formatting. (
6823
). By Jimmy Westling. - Fix incompatibility with numpy 1.20. (
6818
,6821
) By Michael Niklas. - Fix side effects on index coordinate metadata after aligning objects. (
6852
,6857
) By Benoît Bovy. - Make FacetGrid.set_titles send kwargs correctly using handle.update(kwargs). (
6839
,6843
) By Oliver Lopez. - Fix bug where index variables would be changed inplace. (
6931
,6938
) By Michael Niklas. - Allow taking the mean over non-time dimensions of datasets containing dask-backed cftime arrays. (
5897
,6950
) By Spencer Clark. - Harmonize returned multi-indexed indexes when applying
concat
along new dimension. (6881
,6889
) By Fabian Hofmann. - Fix step plots with
hue
arg. (6944
) By András Gunyhó. - Avoid use of random numbers in test_weighted.test_weighted_operations_nonequal_coords. (
6504
,6961
) By Luke Conibear. - Fix multiple regression issues with :py
Dataset.set_index
and :pyDataset.reset_index
. (6992
) By Benoît Bovy. - Raise a
UserWarning
when renaming a coordinate or a dimension creates a non-indexed dimension coordinate, and suggest that the user create an index with either swap_dims
or set_index
. (6607
,6999
) By Benoît Bovy. - Use
keep_attrs=True
in grouping and resampling operations by default. (7012
) This means :pyDataset.attrs
and :pyDataArray.attrs
are now preserved by default. By Deepak Cherian. - Dataset.encoding['source']
now exists when reading from a Path object. (5888
,6974
) By Thomas Coleman. - Better dtype consistency for
rolling.mean()
. (7062
,7063
) By Sam Levang. - Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler. (
7013
,7040
) By Francesco Nattino. - Fix deepcopy of attrs and encoding of DataArrays and Variables. (
2835
,7089
) By Michael Niklas. - Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect. (
7078
,7080
) By Michael Niklas.
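The `keep_attrs=True` default for grouping operations noted in the list above can be checked with a sketch like this one. The data, the grouping coordinate `g` and the `units` attribute are invented for the example.

```python
import xarray as xr

# Values grouped by a string coordinate, with an attribute attached
# (hypothetical example data).
da = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0],
    dims="x",
    coords={"g": ("x", ["a", "a", "b", "b"])},
    attrs={"units": "m"},
)

# No keep_attrs argument is passed; attrs are expected to survive by default.
means = da.groupby("g").mean()
units_after = means.attrs.get("units")
```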
- Update merge docstrings. (
6935
,7033
) By Zach Moon. - Raise a more informative error when trying to open a non-existent zarr store. (
6484
,7060
) By Sam Levang. - Added examples to docstrings for :py
DataArray.expand_dims
, :pyDataArray.drop_duplicates
, :pyDataArray.reset_coords
, :pyDataArray.equals
, :pyDataArray.identical
, :pyDataArray.broadcast_equals
, :pyDataArray.bfill
, :pyDataArray.ffill
, :pyDataArray.fillna
, :pyDataArray.dropna
, :pyDataArray.drop_isel
, :pyDataArray.drop_sel
, :pyDataArray.head
, :pyDataArray.tail
. (5816
,7088
) By Patrick Naylor. - Add missing docstrings to various array properties. (
7090
) By Tom Nicholas.
- Added test for DataArray attrs deepcopy recursion/nested attrs. (
2835
,7086
) By Paul Hockett.
This release brings a number of bug fixes and improvements, most notably a major internal refactor of the indexing functionality, the use of flox in groupby
operations, and experimental support for the new Python Array API standard. It also stops testing support for the abandoned PyNIO.
Much effort has been made to preserve backwards compatibility as part of the indexing refactor. We are aware of one unfixed issue.
Please also see the whats-new.2022.06.0rc0 for a full list of changes.
Many thanks to our 18 contributors: Bane Sullivan, Deepak Cherian, Dimitri Papadopoulos Orfanos, Emma Marshall, Hauke Schulz, Illviljan, Julia Signell, Justus Magin, Keewis, Mathias Hauser, Michael Delgado, Mick, Pierre Manchon, Ray Bell, Spencer Clark, Stefaan Lippens, Tom White, Travis A. O'Brien.
- Add :py
Dataset.dtypes
, :pycore.coordinates.DatasetCoordinates.dtypes
, :pycore.coordinates.DataArrayCoordinates.dtypes
properties: Mapping from variable names to dtypes. (6706
) By Michael Niklas. - Initial typing support for :py
groupby
, :pyrolling
, :pyrolling_exp
, :pycoarsen
, :pyweighted
, :pyresample
, (6702
) By Michael Niklas. - Experimental support for wrapping any array type that conforms to the python array api standard. (
6804
) By Tom White. - Allow string formatting of scalar DataArrays. (
5981
) By fmaussion.
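Two of the features above, the `dtypes` mapping and string formatting of scalar DataArrays, can be sketched as follows. Variable names and values are made up for illustration.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "a": ("x", np.array([1, 2], dtype="int32")),
        "b": ("x", np.array([1.5, 2.5], dtype="float64")),
    }
)

# Mapping from variable names to dtypes.
dtype_map = dict(ds.dtypes)

# A scalar (0-dimensional) DataArray accepts a format spec directly.
scalar = xr.DataArray(3.14159)
formatted = f"{scalar:.2f}"
```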
- :py
save_mfdataset
now passes **kwargs
on to :pyDataset.to_netcdf
, allowing the encoding
and unlimited_dims
options with :pysave_mfdataset
. (6684
) By Travis A. O'Brien. - Fix backend support of pydap versions <3.3.0 (
6648
,6656
). By Hauke Schulz. - :py
Dataset.where
with drop=True
now behaves correctly with mixed dimensions. (6227
,6690
) By Michael Niklas. - Accommodate newly raised
OutOfBoundsTimedelta
error in the development version of pandas when decoding times outside the range that can be represented with nanosecond-precision values (6716
,6717
). By Spencer Clark. - :py
open_dataset
with dask and ~
in the path now resolves the home directory instead of raising an error. (6707
,6710
) By Michael Niklas. - :py
DataArrayRolling.__iter__
with center=True
now works correctly. (6739
,6744
) By Michael Niklas.
- xarray.core.groupby
,xarray.core.rolling
,xarray.core.rolling_exp
,xarray.core.weighted
and xarray.core.resample
modules are no longer imported by default. (6702
)
This pre-release brings a number of bug fixes and improvements, most notably a major internal refactor of the indexing functionality and the use of flox in groupby
operations. It also stops testing support for the abandoned PyNIO.
Install it using
mamba create -n <name> python=3.10 xarray
python -m pip install --pre --upgrade --no-deps xarray
Many thanks to the 39 contributors:
Abel Soares Siqueira, Alex Santana, Anderson Banihirwe, Benoit Bovy, Blair Bonnett, Brewster Malevich, brynjarmorka, Charles Stern, Christian Jauvin, Deepak Cherian, Emma Marshall, Fabien Maussion, Greg Behm, Guelate Seyo, Illviljan, Joe Hamman, Joseph K Aicher, Justus Magin, Kevin Paul, Louis Stenger, Mathias Hauser, Mattia Almansi, Maximilian Roos, Michael Bauer, Michael Delgado, Mick, ngam, Oleh Khoma, Oriol Abril-Pla, Philippe Blain, PLSeuJ, Sam Levang, Spencer Clark, Stan West, Thomas Nicholas, Thomas Vogt, Tom White, Xianxiang Li
- reset_coords(drop=True) does not create indexes (
6607
)
- The zarr backend is now able to read NCZarr. By Mattia Almansi.
- Add a weighted
quantile
method to :py~core.weighted.DatasetWeighted
and :py~core.weighted.DataArrayWeighted
(6059
). By Christian Jauvin and David Huard. - Add a
create_index=True
parameter to :pyDataset.stack
and :pyDataArray.stack
so that the creation of multi-indexes is optional (5692
). By Benoît Bovy. - Multi-index levels are now accessible through their own, regular coordinates instead of virtual coordinates (
5692
). By Benoît Bovy. - Add a
display_values_threshold
option to control the total number of array elements which trigger summarization rather than full repr in (numpy) array detailed views of the html repr (6400
). By Benoît Bovy. - Allow passing chunks in
kwargs
form to :pyDataset.chunk
, :pyDataArray.chunk
, and :pyVariable.chunk
. (6471
) By Tom Nicholas. - Add :py
core.groupby.DatasetGroupBy.cumsum
and :pycore.groupby.DataArrayGroupBy.cumsum
. By Vladislav Skripniuk and Deepak Cherian. (3147
,6525
,3141
) - Expose the
inline_array
kwarg from :pydask.array.from_array
in :pyopen_dataset
, :pyDataset.chunk
, :pyDataArray.chunk
, and :pyVariable.chunk
. (6471
) By Tom Nicholas. - :py
polyval
now supports :pyDataset
and :pyDataArray
args of any shape, is faster and requires less memory. (6548
) By Michael Niklas. - Improved overall typing.
- :py
Dataset.to_dict
and :pyDataArray.to_dict
may now optionally include encoding attributes. (6635
) By Joe Hamman. - Upload development versions to TestPyPI. By Justus Magin.
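As a sketch of the `create_index` parameter of `stack` described above, the following compares the default behaviour with `create_index=False`. The array shape and coordinate values are illustrative only.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [0, 1], "y": [10, 20, 30]},
)

# Default: a multi-index is created for the stacked dimension "z".
with_index = da.stack(z=("x", "y"))

# Opt out of index creation; "x" and "y" become plain coordinates along "z".
without_index = da.stack(z=("x", "y"), create_index=False)
```

Skipping the index can be useful when the stacked dimension is only an intermediate step and no label-based selection on it is needed.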
PyNIO support is now untested. The minimum versions of some dependencies were changed:
Package          Old   New
cftime           1.2   1.4
dask             2.30  2021.4
distributed      2.30  2021.4
h5netcdf         0.8   0.11
matplotlib-base  3.3   3.4
numba            0.51  0.53
numpy            1.18  1.19
pandas           1.1   1.2
pint             0.16  0.17
rasterio         1.1   1.2
scipy            1.5   1.6
sparse           0.11  0.12
zarr             2.5   2.8
- The Dataset and DataArray
rename
methods do not implicitly add or drop indexes. (5692). By Benoît Bovy. - Many arguments like
keep_attrs
,axis
, and skipna
are now keyword-only for all reduction operations like .mean
. By Deepak Cherian, Jimmy Westling. - Xarray's ufuncs have been removed, now that they can be replaced by numpy's ufuncs in all supported versions of numpy. By Maximilian Roos.
- :py
xr.polyval
now uses the coord
argument directly instead of its index coordinate. (6548
) By Michael Niklas.
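A minimal sketch of `xr.polyval` evaluating a polynomial on the values of the `coord` argument itself. The polynomial 1 + 2x and the sample points are assumptions chosen for the example.

```python
import numpy as np
import xarray as xr

# Points at which to evaluate the polynomial (hypothetical values).
x = xr.DataArray([0.0, 1.0, 2.0], dims="x")

# Coefficients indexed by an integer "degree" coordinate: 1 + 2 * x.
coeffs = xr.DataArray([1.0, 2.0], dims="degree", coords={"degree": [0, 1]})

y = xr.polyval(x, coeffs)
```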
- :py
Dataset.to_zarr
now allows writing all attribute types supported by zarr-python. By Mattia Almansi. - Set
skipna=None
for all quantile
methods (e.g. :pyDataset.quantile
) and ensure it skips missing values for float dtypes (consistent with other methods). This should not change the behavior (6303
). By Mathias Hauser. - Many bugs fixed by the explicit indexes refactor, mainly related to multi-index (virtual) coordinates. See the corresponding pull-request on GitHub for more details. (
5692
). By Benoît Bovy. - Fixed "unhashable type" error trying to read NetCDF file with variable having its 'units' attribute not
str
(e.g. numpy.ndarray
) (6368
). By Oleh Khoma. - Omit warning about specified dask chunks separating chunks on disk when the underlying array is empty (e.g., because of an empty dimension) (
6401
). By Joseph K Aicher. - Fixed the poor html repr performance on large multi-indexes (
6400
). By Benoît Bovy. - Allow fancy indexing of duck dask arrays along multiple dimensions. (
6414
) By Justus Magin. - In the API for backends, support dimensions that express their preferred chunk sizes as a tuple of integers. (
6333
,6334
) By Stan West. - Fix bug in :py
where
when passing non-xarray objects with keep_attrs=True
. (6444
,6461
) By Sam Levang. - Allow passing both
other
and drop=True
arguments to :pyDataArray.where
and :pyDataset.where
(6466
,6467
). By Michael Delgado. - Ensure dtype encoding attributes are not added or modified on variables that contain datetime-like values prior to being passed to :py
xarray.conventions.decode_cf_variable
(6453
,6489
). By Spencer Clark. - Dark themes are now properly detected in Furo-themed Sphinx documents (
6500
,6501
). By Kevin Paul. - :py
Dataset.isel
, :pyDataArray.isel
with drop=True works as intended with scalar :pyDataArray
indexers. (6554
,6579
) By Michael Niklas. - Fixed silent overflow issue when decoding times encoded with 32-bit and below unsigned integer data types (
6589
,6598
). By Spencer Clark. - Fixed
.chunks
loading lazy data (6538
). By Deepak Cherian.
- Revise the documentation for developers on specifying a backend's preferred chunk sizes. In particular, correct the syntax and replace lists with tuples in the examples. (
6333
,6334
) By Stan West. - Mention that :py
DataArray.rename
can rename coordinates. (5458
,6665
) By Michael Niklas. - Added examples to :py
Dataset.thin
and :pyDataArray.thin
By Emma Marshall.
- GroupBy binary operations are now vectorized. Previously this involved looping over all groups. (
5804
,6160
) By Deepak Cherian. - Substantially improved GroupBy operations using flox. This is auto-enabled when
flox
is installed. Use xr.set_options(use_flox=False)
to use the old algorithm. (4473
,4498
,659
,2237
,271
). By Deepak Cherian, Anderson Banihirwe, Jimmy Westling.
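Switching back to the old GroupBy algorithm, as described above, might look like this sketch. The data is invented, and the example runs whether or not flox is installed.

```python
import xarray as xr

da = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0],
    dims="x",
    coords={"g": ("x", ["a", "a", "b", "b"])},
)

# Temporarily disable flox for the reductions inside this block.
with xr.set_options(use_flox=False):
    means = da.groupby("g").mean()
```

Using `set_options` as a context manager restores the previous setting on exit, which makes it easy to compare both code paths on the same data.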
- Many internal changes due to the explicit indexes refactor. See the corresponding pull-request on GitHub for more details. (
5692
). By Benoît Bovy.
This release brings a number of small improvements, as well as a move to calendar versioning (6176
).
Many thanks to the 16 contributors to the v2022.02.0 release!
Aaron Spring, Alan D. Snow, Anderson Banihirwe, crusaderky, Illviljan, Joe Hamman, Jonas Gliß, Lukas Pilz, Martin Bergemann, Mathias Hauser, Maximilian Roos, Romain Caneill, Stan West, Stijn Van Hoey, Tobias Kölling, and Tom Nicholas.
- Enabled multiplying tick offsets by floats. Allows
float
n
in :pyCFTimeIndex.shift
if shift_freq
is between Day
and Microsecond
. (6134
,6135
). By Aaron Spring. - Enable providing more keyword arguments to the pydap backend when reading OpenDAP datasets (
6274
). By Jonas Gliß <https://github.com/jgliss>. - Allow :py
DataArray.drop_duplicates
to drop duplicates along multiple dimensions at once, and add :pyDataset.drop_duplicates
. (6307
) By Tom Nicholas.
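Dropping duplicates along several dimensions at once, and the Dataset counterpart, can be sketched as below. The coordinate values are hypothetical and chosen so that both dimensions contain repeated labels.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(16).reshape(4, 4),
    dims=("x", "y"),
    coords={"x": [0, 0, 1, 1], "y": [0, 1, 1, 2]},
)

# Drop repeated labels along both dimensions (the first occurrence is kept).
deduped = da.drop_duplicates(dim=["x", "y"])

# The same method on a Dataset, here along a single dimension.
ds_deduped = da.to_dataset(name="v").drop_duplicates(dim="x")
```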
- Renamed the
interpolation
keyword of all quantile
methods (e.g. :pyDataArray.quantile
) to method
for consistency with numpy v1.22.0 (6108
). By Mathias Hauser.
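After the rename, the estimation method is passed via `method`, as in this small sketch. The data is made up, and the method names are the ones numpy uses from v1.22.0 onwards.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x")

# `method` replaces the former `interpolation` keyword.
median_linear = da.quantile(0.5, method="linear")
median_lower = da.quantile(0.5, method="lower")
```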
- Variables which are chunked using dask in larger (but aligned) chunks than the target zarr chunk size can now be stored using to_zarr() (
6258
) By Tobias Kölling. - Multi-file datasets containing encoded :py
cftime.datetime
objects can be read in parallel again (6226
,6249
,6305
). By Martin Bergemann and Stan West.
- Delete files of datasets saved to disk while building the documentation and enable building on Windows via sphinx-build (
6237
). By Stan West.
This is a bugfix release to resolve (6216
, 6207
).
- Add packaging as a dependency to Xarray (
6216
,6207
). By Sebastian Weigand and Joe Hamman.
Many thanks to the 20 contributors to the v0.21.0 release!
Abel Aoun, Anderson Banihirwe, Ant Gib, Chris Roat, Cindy Chiao, Deepak Cherian, Dominik Stańczak, Fabian Hofmann, Illviljan, Jody Klymak, Joseph K Aicher, Mark Harfouche, Mathias Hauser, Matthew Roeschke, Maximilian Roos, Michael Delgado, Pascal Bourgault, Pierre, Ray Bell, Romain Caneill, Tim Heap, Tom Nicholas, Zeb Nicholls, joseph nowak, keewis.
- New top-level function :py
cross
. (3279
,5365
). By Jimmy Westling. - keep_attrs
support for :pywhere
(4141
,4682
,4687
). By Justus Magin. - Enable the limit option for dask arrays in the following methods: :py
DataArray.ffill
, :pyDataArray.bfill
, :pyDataset.ffill
and :pyDataset.bfill
(6112
) By Joseph Nowak.
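A minimal sketch of the new top-level cross product, using two unit vectors along a dimension named `cartesian` (the dimension name is an arbitrary choice for this example).

```python
import xarray as xr

a = xr.DataArray([1, 0, 0], dims="cartesian")
b = xr.DataArray([0, 1, 0], dims="cartesian")

# The dimension holding the vector components must be named explicitly.
c = xr.cross(a, b, dim="cartesian")
```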
- Rely on matplotlib's default datetime converters instead of pandas' (
6102
,6109
). By Jimmy Westling. - Improve repr readability when there are a large number of dimensions in datasets or dataarrays by wrapping the text once the maximum display width has been exceeded. (
5546
,5662
) By Jimmy Westling.
- Removed the lock kwarg from the zarr and pydap backends, completing the deprecation cycle started in
5256
. By Tom Nicholas. - Support for
python 3.7
has been dropped. (5892
) By Jimmy Westling.
- Preserve chunks when creating a :py
DataArray
from another :pyDataArray
(5984
). By Fabian Hofmann. - Properly support :py
DataArray.ffill
, :pyDataArray.bfill
, :pyDataset.ffill
and :pyDataset.bfill
along chunked dimensions (6112
). By Joseph Nowak. - Subclasses of
byte
andstr
(e.g.np.str_
andnp.bytes_
) will now serialise to disk rather than raising aValueError: unsupported dtype for netCDF4 variable: object
as they did previously (5264
). By Zeb Nicholls. - Fix applying function with non-xarray arguments using :py
xr.map_blocks
. By Cindy Chiao. - No longer raise an error for an all-nan-but-one argument to :py
DataArray.interpolate_na
when using method='nearest' (5994
,6144
). By Michael Delgado. - dt.season can now handle NaN and NaT. (
5876
). By Pierre Loicq. - Determination of zarr chunks handles empty lists for encoding chunks or variable chunks that occur in certain circumstances (
5526
). By Chris Roat.
- Replace
distutils.version
withpackaging.version
(6092
). By Mathias Hauser. - Removed internal checks for
pd.Panel
(6145
). By Matthew Roeschke. - Add
pyupgrade
pre-commit hook (6152
). By Maximilian Roos.
This is a bugfix release to resolve (3391
, 5715
). It also includes performance improvements in unstacking to a sparse
array and a number of documentation improvements.
Many thanks to the 20 contributors:
Aaron Spring, Alexandre Poux, Deepak Cherian, Enrico Minack, Fabien Maussion, Giacomo Caria, Gijom, Guillaume Maze, Illviljan, Joe Hamman, Joseph Hardin, Kai Mühlbauer, Matt Henderson, Maximilian Roos, Michael Delgado, Robert Gieseke, Sebastian Weigand and Stephan Hoyer.
- Use complex nan when interpolating complex values out of bounds by default (instead of real nan) (
6019
). By Alexandre Poux.
- Significantly faster unstacking to a
sparse
array (5577).
By Deepak Cherian.
- :py
xr.map_blocks
and :pyxr.corr
now work when dask is not installed (3391
,5715
,5731
). By Gijom. - Fix plot.line crash for data of shape
(1, N)
in _title_for_slice on format_item (5948
). By Sebastian Weigand. - Fix a regression in the removal of duplicate backend entrypoints (
5944
,5959
) By Kai Mühlbauer. - Fix an issue that prevented datasets from being saved when time variables with units that
cftime
can parse but pandas cannot were present (6049
). By Tim Heap.
- Better examples in docstrings for groupby and resampling reductions (
5871
). By Deepak Cherian, Maximilian Roos, Jimmy Westling. - Add list-like possibility for the tolerance parameter in the reindex functions. By Antoine Gibek.
- Use
importlib
to replace functionality ofpkg_resources
in backend plugins tests. (5959
). By Kai Mühlbauer.
This is a bugfix release to fix 5930
.
- Fix a regression in the detection of the backend entrypoints (
5930
,5931
) By Justus Magin.
- Significant improvements to
api
. By Deepak Cherian.
This release brings improved support for pint arrays, methods for weighted standard deviation, variance, and sum of squares, the option to disable the use of the bottleneck library, significantly improved performance of unstack, as well as many bugfixes and internal changes.
Many thanks to the 40 contributors to this release:
Aaron Spring, Akio Taniguchi, Alan D. Snow, arfy slowy, Benoit Bovy, Christian Jauvin, crusaderky, Deepak Cherian, Giacomo Caria, Illviljan, James Bourbeau, Joe Hamman, Joseph K Aicher, Julien Herzen, Kai Mühlbauer, keewis, lusewell, Martin K. Scherer, Mathias Hauser, Max Grover, Maxime Liquet, Maximilian Roos, Mike Taves, Nathan Lis, pmav99, Pushkar Kopparla, Ray Bell, Rio McMahon, Scott Staniewicz, Spencer Clark, Stefan Bender, Taher Chegini, Thomas Nicholas, Tomas Chor, Tom Augspurger, Victor Negîrneac, Zachary Blackwood, Zachary Moon, and Zeb Nicholls.
- Add
std
,var
,sum_of_squares
to :py~core.weighted.DatasetWeighted
and :py~core.weighted.DataArrayWeighted
. By Christian Jauvin. - Added a :py
get_options
method to xarray's root namespace (5698
,5716
) By Pushkar Kopparla. - Xarray now does a better job rendering variable names that are long LaTeX sequences when plotting (
5681
,5682
). By Tomas Chor. - Add an option (
"use_bottleneck"
) to disable the use ofbottleneck
using :pyset_options
(5560
) By Justus Magin. - Added
**kwargs
argument to :pyopen_rasterio
to access overviews (3269
). By Pushkar Kopparla. - Added
storage_options
argument to :pyto_zarr
(5601
,5615
). By Ray Bell, Zachary Blackwood and Nathan Lis. - Added calendar utilities :py
DataArray.convert_calendar
, :pyDataArray.interp_calendar
, :pydate_range
, :pydate_range_like
and :pyDataArray.dt.calendar
(5155
,5233
). By Pascal Bourgault. - Histogram plots are set with a title displaying the scalar coords if any, similarly to the other plots (
5791
,5792
). By Maxime Liquet. - Slice plots display the coords units in the same way as x/y/colorbar labels (
5847
). By Victor Negîrneac. - Added a new :py
Dataset.chunksizes
, :pyDataArray.chunksizes
, and :pyVariable.chunksizes
property, which will always return a mapping from dimension names to chunking pattern along that dimension, regardless of whether the object is a Dataset, DataArray, or Variable. (5846
,5900
) By Tom Nicholas.
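The weighted statistics and the option helpers listed above can be sketched together. The data and weights are made up; the relationships checked in the comments follow from the usual definitions (weighted variance as the weighted sum of squared deviations divided by the sum of weights), which this example assumes xarray uses.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
weights = xr.DataArray([1.0, 1.0, 2.0], dims="x")
weighted = da.weighted(weights)

w_mean = weighted.mean()            # (1*1 + 1*2 + 2*3) / 4
w_ss = weighted.sum_of_squares()    # sum of w * (x - w_mean)**2
w_var = weighted.var()              # w_ss / sum of weights
w_std = weighted.std()              # square root of w_var

# Inspect an option while bottleneck is disabled for this block.
with xr.set_options(use_bottleneck=False):
    bottleneck_enabled = xr.get_options()["use_bottleneck"]
```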
The minimum versions of some dependencies were changed:
Package          Old    New
cftime           1.1    1.2
dask             2.15   2.30
distributed      2.15   2.30
lxml             4.5    4.6
matplotlib-base  3.2    3.3
numba            0.49   0.51
numpy            1.17   1.18
pandas           1.0    1.1
pint             0.15   0.16
scipy            1.4    1.5
seaborn          0.10   0.11
sparse           0.8    0.11
toolz            0.10   0.11
zarr             2.4    2.5
- The
__repr__
of a :pyxarray.Dataset
'scoords
anddata_vars
ignore xarray.set_options(display_max_rows=...)
and show the full output when called directly as, e.g.,ds.data_vars
orprint(ds.data_vars)
(5545
,5580
). By Stefan Bender.
- Deprecate :py
open_rasterio
(4697
,5808
). By Alan Snow. - Set the default argument for roll_coords to False for :py
DataArray.roll
and :pyDataset.roll
. (5653
) By Tom Nicholas. - :py
xarray.open_mfdataset
will now error instead of warn when a value forconcat_dim
is passed alongsidecombine='by_coords'
. By Tom Nicholas.
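To illustrate the `roll_coords` default described above, the following sketch rolls a small array with and without its coordinate; the values are arbitrary.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", coords={"x": [10, 20, 30]})

# Default: only the data are rolled, the coordinate stays in place.
rolled = da.roll(x=1)

# Opt in to rolling the coordinate together with the data.
rolled_with_coords = da.roll(x=1, roll_coords=True)
```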
- Fix ZeroDivisionError from saving dask array with empty dimension (5741). By Joseph K Aicher.
- Fixed performance bug where a
cftime
import was attempted within various core operations if cftime
was not installed (5640
). By Luke Sewell. - Fixed bug when combining named DataArrays using :py
combine_by_coords
. (5834
). By Tom Nicholas. - When a custom engine was used in :py
~xarray.open_dataset
the engine wasn't initialized properly, causing missing argument errors or inconsistent method signatures. (5684
) By Jimmy Westling. - Numbers are properly formatted in a plot's title (
5788
,5789
). By Maxime Liquet. - Faceted plots will no longer raise a pint.UnitStrippedWarning when a pint.Quantity array is plotted, and will correctly display the units of the data in the colorbar (if there is one) (
5886
). By Tom Nicholas. - With backends, check for path-like objects rather than
pathlib.Path
type, useos.fspath
(5879
). By Mike Taves. - open_mfdataset()
now accepts a singlepathlib.Path
object (5881). By Panos Mavrogiorgos. - Improved performance of :py
Dataset.unstack
(5906
). By Tom Augspurger.
- Users are instructed to try
use_cftime=True
if aTypeError
occurs when combining datasets and one of the types involved is a subclass ofcftime.datetime
(5776
). By Zeb Nicholls. - A clearer error is now raised if a user attempts to assign a Dataset to a single key of another Dataset. (
5839
) By Tom Nicholas.
- Explicit indexes refactor: avoid
len(index)
inmap_blocks
(5670
). By Deepak Cherian. - Explicit indexes refactor: decouple
xarray.Index
from xarray.Variable (5636). By Benoit Bovy. - Fix
Mapping
argument typing to allow mypy to pass onstr
keys (5690
). By Maximilian Roos. - Annotate many of our tests, and fix some of the resulting typing errors. This will also mean our typing annotations are tested as part of CI. (
5728
). By Maximilian Roos. - Improve the performance of reprs for large datasets or dataarrays. (
5661
) By Jimmy Westling. - Use isort's float_to_top config. (
5695
). By Maximilian Roos. - Remove use of the deprecated
kind
argument in :pypandas.Index.get_slice_bound
inside :pyxarray.CFTimeIndex
tests (5723
). By Spencer Clark. - Refactor xarray.core.duck_array_ops to no longer special-case dispatching to dask versions of functions when acting on dask arrays, instead relying on numpy and dask's adherence to NEP-18 to dispatch automatically. (
5571
) By Tom Nicholas. - Add an ASV benchmark CI and improve performance of the benchmarks (
5796
) By Jimmy Westling. - Use
importlib
to replace functionality ofpkg_resources
such as version setting and loading of resources. (5845
). By Martin K. Scherer.
This release brings improvements to plotting of categorical data, the ability to specify how attributes are combined in xarray operations, a new high-level :pyunify_chunks
function, as well as various deprecations, bug fixes, and minor improvements.
Many thanks to the 29 contributors to this release:
Andrew Williams, Augustus, Aureliana Barghini, Benoit Bovy, crusaderky, Deepak Cherian, ellesmith88, Elliott Sales de Andrade, Giacomo Caria, github-actions[bot], Illviljan, Joeperdefloep, joooeey, Julia Kent, Julius Busecke, keewis, Mathias Hauser, Matthias Göbel, Mattia Almansi, Maximilian Roos, Peter Andreas Entschev, Ray Bell, Sander, Santiago Soler, Sebastian, Spencer Clark, Stephan Hoyer, Thomas Hirtz, Thomas Nicholas.
- Allow passing argument
missing_dims
to :pyVariable.transpose
and :pyDataset.transpose
(5550
,5586
) By Giacomo Caria. - Allow passing a dictionary as coords to a :py
DataArray
(5527
, reverts1539
, which had deprecated this due to python's inconsistent ordering in earlier versions). By Sander van Rijn. - Added :py
Dataset.coarsen.construct
, :pyDataArray.coarsen.construct
(5454
,5475
). By Deepak Cherian. - Xarray now uses consolidated metadata by default when writing and reading Zarr stores (
5251
). By Stephan Hoyer. - New top-level function :py
unify_chunks
. By Mattia Almansi. - Allow assigning values to a subset of a dataset using positional or label-based indexing (
3015
,5362
). By Matthias Göbel. - Attempting to reduce a weighted object over missing dimensions now raises an error (
5362
). By Mattia Almansi. - Add
.sum
to :py~xarray.DataArray.rolling_exp
and :py~xarray.Dataset.rolling_exp
for exponentially weighted rolling sums. These require numbagg 0.2.1 (5178
). By Maximilian Roos. - :py
xarray.cov
and :pyxarray.corr
now lazily check for missing values if inputs are dask arrays (4804
,5284
). By Andrew Williams. - Attempting to
concat
list of elements that are not allDataset
or allDataArray
now raises an error (5051
,5425
). By Thomas Hirtz. - Allow passing a function to
combine_attrs
(4896
). By Justus Magin. - Allow plotting categorical data (
5464
). By Jimmy Westling. - Allow removal of the coordinate attribute
coordinates
on variables by setting .attrs['coordinates'] = None
(5510
). By Elle Smith. - Added :py
DataArray.to_numpy
, :pyDataArray.as_numpy
, and :pyDataset.as_numpy
. (5568
). By Tom Nicholas. - Units in plot labels are now automatically inferred from wrapped :py
pint.Quantity
arrays. (5561
). By Tom Nicholas.
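A few of the features in this list can be sketched in one place: `coarsen(...).construct`, `to_numpy`, and the `missing_dims` argument of `Dataset.transpose`. All names and sizes below are illustrative assumptions.

```python
import numpy as np
import xarray as xr

# Coordinates passed as a plain dictionary.
da = xr.DataArray(np.arange(12), dims="time", coords={"time": np.arange(12)})

# Split "time" into blocks of three without reducing: 12 -> (4, 3).
blocks = da.coarsen(time=3).construct(time=("block", "step"))

# Get the underlying data as a plain numpy array.
arr = da.to_numpy()

# Ignore dimensions that are named but not present ("z" here).
ds = xr.Dataset({"a": (("x", "y"), np.zeros((2, 3)))})
transposed = ds.transpose("y", "x", "z", missing_dims="ignore")
```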
- The default
mode
for :pyDataset.to_zarr
whenregion
is set has changed to the newmode="r+"
, which only allows for overriding pre-existing array values. This is a safer default than the priormode="a"
, and allows for higher performance writes (5252
). By Stephan Hoyer. - The main parameter to :py
combine_by_coords
has been renamed from datasets to data_objects, so anyone calling this function with a named parameter will need to update the name accordingly (3248
,4696
). By Augustus Ijams.
- Removed the deprecated
dim
kwarg to :pyDataArray.integrate
(5630
) - Removed the deprecated
keep_attrs
kwarg to :pyDataArray.rolling
(5630
) - Removed the deprecated
keep_attrs
kwarg to :pyDataArray.coarsen
(5630
) - Completed deprecation of passing an
xarray.DataArray
to :pyVariable
- will now raise aTypeError
(5630
)
- Fix a minor incompatibility between partial datetime string indexing with a :py
CFTimeIndex
and upcoming pandas version 1.3.0 (5356
,5359
). By Spencer Clark. - Fix 1-level multi-index incorrectly converted to single index (
5384
,5385
). By Benoit Bovy. - Don't cast a duck array in a coordinate to :py
numpy.ndarray
in :pyDataArray.differentiate
(5408
) By Justus Magin. - Fix the
repr
of :pyVariable
objects withdisplay_expand_data=True
(5406
) By Justus Magin. - Plotting a pcolormesh with
xscale="log"
and/oryscale="log"
works as expected after improving the way the interval breaks are generated (5333
). By Santiago Soler. - :py
combine_by_coords
can now handle combining a list of unnamedDataArray
as input (3248
,4696
). By Augustus Ijams.
- Run CI on the first & last python versions supported only; currently 3.7 & 3.9. (
5433
) By Maximilian Roos. - Publish test results & timings on each PR. (
5537
) By Maximilian Roos. - Explicit indexes refactor: add a
xarray.Index.query()
method in which one may eventually provide a custom implementation of label-based data selection (not ready yet for public use). Also refactor the internal, pandas-specific implementation intoPandasIndex.query()
andPandasMultiIndex.query()
(5322
). By Benoit Bovy.
This release reverts a regression in xarray's unstacking of dask-backed arrays.
This release is intended as a small patch release to be compatible with the new 2021.5.0 dask.distributed
release. It also includes a new drop_duplicates
method, some documentation improvements, the beginnings of our internal Index refactoring, and some bug fixes.
Thank you to all 16 contributors!
Anderson Banihirwe, Andrew, Benoit Bovy, Brewster Malevich, Giacomo Caria, Illviljan, James Bourbeau, Keewis, Maximilian Roos, Ravin Kumar, Stephan Hoyer, Thomas Nicholas, Tom Nicholas, Zachary Moon.
- Implement :py
DataArray.drop_duplicates
to remove duplicate dimension values (5239
). By Andrew Huang. - Allow passing
combine_attrs
strategy names to thekeep_attrs
parameter of :pyapply_ufunc
(5041
) By Justus Magin. - :py
Dataset.interp
now allows interpolation with non-numerical datatypes, such as booleans, instead of dropping them. (4761
,5008
). By Jimmy Westling. - Raise more informative error when decoding time variables with invalid reference dates. (
5199
,5288
). By Giacomo Caria.
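As a sketch of passing a `combine_attrs` strategy name through `keep_attrs`, the example below adds two arrays whose attributes partly conflict. The attribute names are invented for illustration.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m", "source": "a"})
b = xr.DataArray([3.0, 4.0], dims="x", attrs={"units": "m", "source": "b"})

# "drop_conflicts" keeps attributes that agree and drops those that differ.
total = xr.apply_ufunc(np.add, a, b, keep_attrs="drop_conflicts")
```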
- Opening netCDF files from a path that doesn't end in
.nc
without supplying an explicitengine
works again (5295
), fixing a bug introduced in 0.18.0. By Stephan Hoyer.
- Clean up and enhance docstrings for the :py
DataArray.plot
andDataset.plot.*
families of methods (5285
). By Zach Moon. - Explanation of deprecation cycles and how to implement them added to contributors guide. (
5289
) By Tom Nicholas.
- Explicit indexes refactor: add an
xarray.Index
base class andDataset.xindexes
/DataArray.xindexes
properties. Also renamePandasIndexAdapter
toPandasIndex
, which now inherits fromxarray.Index
(5102
). By Benoit Bovy. - Replace
SortedKeysDict
with python'sdict
, given dicts are now ordered. By Maximilian Roos. - Updated the release guide for developers. Now accounts for actions that are automated via github actions. (
5274
). By Tom Nicholas.
This release brings a few important performance improvements, a wide range of usability upgrades, lots of bug fixes, and some new features. These include a plugin API to add backend engines, a new theme for the documentation, curve fitting methods, and several new plotting functions.
Many thanks to the 38 contributors to this release: Aaron Spring, Alessandro Amici, Alex Marandon, Alistair Miles, Ana Paula Krelling, Anderson Banihirwe, Aureliana Barghini, Baudouin Raoult, Benoit Bovy, Blair Bonnett, David Trémouilles, Deepak Cherian, Gabriel Medeiros Abrahão, Giacomo Caria, Hauke Schulz, Illviljan, Mathias Hauser, Matthias Bussonnier, Mattia Almansi, Maximilian Roos, Ray Bell, Richard Kleijn, Ryan Abernathey, Sam Levang, Spencer Clark, Spencer Jones, Tammas Loughran, Tobias Kölling, Todd, Tom Nicholas, Tom White, Victor Negîrneac, Xianxiang Li, Zeb Nicholls, crusaderky, dschwoerer, johnomotani, keewis
- Apply
combine_attrs
on data variables and coordinate variables when concatenating and merging datasets and dataarrays (4902
). By Justus Magin. - Add :py
Dataset.to_pandas
(5247
) By Giacomo Caria. - Add :py
DataArray.plot.surface
which wraps matplotlib's plot_surface to make surface plots (2235
,5084
,5101
). By John Omotani. - Allow passing multiple arrays to :py
Dataset.__setitem__
(5216
). By Giacomo Caria. - Add 'cumulative' option to :py
Dataset.integrate
and :pyDataArray.integrate
so that the result is a cumulative integral, like :pyscipy.integrate.cumulative_trapezoid
(5153
). By John Omotani. - Add
safe_chunks
option to :pyDataset.to_zarr
which allows overriding checks made to ensure Dask and Zarr chunk compatibility (5056
). By Ryan Abernathey. - Add :py
Dataset.query
and :pyDataArray.query
which enable indexing of datasets and data arrays by evaluating query expressions against the values of the data variables (4984
). By Alistair Miles. - Allow passing
combine_attrs
to :pyDataset.merge
(4895
). By Justus Magin. - Support for dask.graph_manipulation (requires dask >=2021.3). By Guido Imperiale.
- Add :py
Dataset.plot.streamplot
for streamplot plots with :pyDataset
variables (5003
). By John Omotani. - Many of the arguments for the :py
DataArray.str
methods now support providing an array-like input. In this case, the array provided to the arguments is broadcast against the original array and applied elementwise. - :py
DataArray.str
now supports+
,*
, and%
operators. These behave the same as they do for :pystr
, except that they follow array broadcasting rules. - A large number of new :py
DataArray.str
methods were implemented: :pyDataArray.str.casefold
, :pyDataArray.str.cat
, :pyDataArray.str.extract
, :pyDataArray.str.extractall
, :pyDataArray.str.findall
, :pyDataArray.str.format
, :pyDataArray.str.get_dummies
, :pyDataArray.str.islower
, :pyDataArray.str.join
, :pyDataArray.str.normalize
, :pyDataArray.str.partition
, :pyDataArray.str.rpartition
, :pyDataArray.str.rsplit
, and :pyDataArray.str.split
. A number of these methods allow for splitting or joining the strings in an array. (4622
) By Todd Jennings. - Thanks to the new pluggable backend infrastructure, external packages may now use the
xarray.backends
entry point to register additional engines to be used in :pyopen_dataset
, see the documentation inadd_a_backend
(4309
,4803
,4989
,4810
and many others). The backend refactor has been sponsored with the "Essential Open Source Software for Science" grant from the Chan Zuckerberg Initiative and developed by B-Open. By Aureliana Barghini and Alessandro Amici. - :py
~core.accessor_dt.DatetimeAccessor.date
added (4983
,4994
). By Hauke Schulz. - Implement
__getitem__
for both :py~core.groupby.DatasetGroupBy
and :py~core.groupby.DataArrayGroupBy
, inspired by pandas' :py~pandas.core.groupby.GroupBy.get_group
. By Deepak Cherian. - Switch the tutorial functions to use pooch (which is now an optional dependency) and add :py
tutorial.open_rasterio
as a way to open example rasterio files (3986
,4102
,5074
). By Justus Magin. - Add typing information to unary and binary arithmetic operators operating on :py
Dataset
, :pyDataArray
, :pyVariable
, :py~core.groupby.DatasetGroupBy
or :py~core.groupby.DataArrayGroupBy
(4904
). By Richard Kleijn. - Add a
combine_attrs
parameter to :pyopen_mfdataset
(4971
). By Justus Magin. - Enable passing arrays with a subset of dimensions to :py
DataArray.clip
& :pyDataset.clip
; these methods now use :pyxarray.apply_ufunc
(5184
). By Maximilian Roos. - Disable the cfgrib backend if the eccodes library is not installed (
5083
). By Baudouin Raoult. - Added :py
DataArray.curvefit
and :pyDataset.curvefit
for general curve fitting applications. (4300
,4849
) By Sam Levang. - Add options to control expand/collapse of sections in display of Dataset and DataArray. The function :py
set_options
now takes keyword argumentsdisplay_expand_attrs
,display_expand_coords
,display_expand_data
,display_expand_data_vars
, all of which can be one ofTrue
to always expand,False
to always collapse, ordefault
to expand unless over a pre-defined limit (5126
). By Tom White. - Significant speedups in :py
Dataset.interp
and :pyDataArray.interp
. (4739
,4740
). By Deepak Cherian. - Prevent passing concat_dim to :py
xarray.open_mfdataset
when combine='by_coords' is specified, which should never have been possible (as :pyxarray.combine_by_coords
has no concat_dim argument to pass to). Also removes unneeded internal reordering of datasets in :pyxarray.open_mfdataset
when combine='by_coords' is specified. Fixes (5230
). By Tom Nicholas. - Implement
__setitem__
forxarray.core.indexing.DaskIndexingAdapter
if dask version supports item assignment. (5171
,5174
) By Tammas Loughran.
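The operator support on the `str` accessor mentioned in this list can be sketched as below. The strings and dimension names are arbitrary; the last line assumes the broadcasting behaviour described above when the right-hand side is itself an array along a different dimension.

```python
import xarray as xr

names = xr.DataArray(["alpha", "beta"], dims="x")

# "+" concatenates elementwise, "*" repeats each string.
suffixed = names.str + "_v1"
doubled = names.str * 2

# An array-like operand is broadcast against the original array.
tags = xr.DataArray(["_a", "_b"], dims="y")
combined = names.str + tags  # expected to span both "x" and "y"
```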
The minimum versions of some dependencies were changed:
Package      Old    New
boto3        1.12   1.13
cftime       1.0    1.1
dask         2.11   2.15
distributed  2.11   2.15
matplotlib   3.1    3.2
numba        0.48   0.49
- :py
open_dataset
and :pyopen_dataarray
now accept only the first argument as positional; all others need to be passed as keyword arguments. This is part of the refactor to support external backends (4309
,4989
). By Alessandro Amici. - Functions that are identities for 0d data return the unchanged data if axis is empty. This ensures that Datasets where some variables do not have the averaged dimensions are not accidentally changed (
4885
,5207
). By David Schwörer. - :py
DataArray.coarsen
and :pyDataset.coarsen
no longer support passingkeep_attrs
via its constructor. Passkeep_attrs
via the applied function, i.e. useds.coarsen(...).mean(keep_attrs=False)
instead ofds.coarsen(..., keep_attrs=False).mean()
. Further, coarsen now keeps attributes by default (5227
). By Mathias Hauser. - Switch the default of the :py
merge
combine_attrs
parameter to"override"
. This will keep the current behavior for merging theattrs
of variables but stop dropping theattrs
of the main objects (4902
). By Justus Magin.
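The coarsen change above can be illustrated with a small array carrying one attribute (the `units` attribute is a made-up example).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", attrs={"units": "m"})

# Attributes are kept by default after coarsening.
kept = da.coarsen(x=2).mean()

# To drop them, pass keep_attrs to the applied function, not the constructor.
dropped = da.coarsen(x=2).mean(keep_attrs=False)
```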
- Warn when passing concat_dim to :py
xarray.open_mfdataset
when combine='by_coords' is specified, which should never have been possible (as :pyxarray.combine_by_coords
has no concat_dim argument to pass to). Also removes unneeded internal reordering of datasets in :pyxarray.open_mfdataset
when combine='by_coords' is specified. Fixes (5230
), via (5231
,5255
). By Tom Nicholas. - The lock keyword argument to :py
open_dataset
and :pyopen_dataarray
is now a backend specific option. It will give a warning if passed to a backend that doesn't support it instead of being silently ignored. From the next version it will raise an error. This is part of the refactor to support external backends (5073
). By Tom Nicholas and Alessandro Amici.
- Properly support :py
DataArray.ffill
, :pyDataArray.bfill
, :pyDataset.ffill
, :pyDataset.bfill
along chunked dimensions. (2699
). By Deepak Cherian. - Fix 2d plot failure for certain combinations of dimensions when x is 1d and y is 2d (
5097
,5099
). By John Omotani. - Ensure standard calendar times encoded with large values (i.e. greater than approximately 292 years), can be decoded correctly without silently overflowing (
5050
). This was a regression in xarray 0.17.0. By Zeb Nicholls. - Added support for numpy.bool_ attributes in roundtrips using h5netcdf engine with invalid_netcdf=True (which casts bools to numpy.bool_) (
4981
,4986
). By Victor Negîrneac. - Don't allow passing
axis
to :pyDataset.reduce
methods (3510
,4940
). By Justus Magin. - Decode values as signed if attribute _Unsigned = "false" (
4954
) By Tobias Kölling. - Keep coords attributes when interpolating when the indexer is not a Variable. (
4239
,4839
,5031
) By Jimmy Westling. - Ensure standard calendar dates encoded with a calendar attribute with some or all uppercase letters can be decoded or encoded to or from
np.datetime64[ns]
dates with or withoutcftime
installed (5093
,5180
). By Spencer Clark. - Warn on passing
keep_attrs
toresample
androlling_exp
as they are ignored, passkeep_attrs
to the applied function instead (5265
). By Mathias Hauser.
- New section on
add_a_backend
in the "Internals" chapter aimed at backend developers (4803
,4810
). By Aureliana Barghini. - Add :py
Dataset.polyfit
and :pyDataArray.polyfit
under "See also" in the docstrings of :pyDataset.polyfit
and :pyDataArray.polyfit
(5016
,5020
). By Aaron Spring. - New sphinx theme & rearrangement of the docs (
4835
). By Anderson Banihirwe.
- Enable displaying mypy error codes and ignore only specific error codes using
# type: ignore[error-code]
(5096
). By Mathias Hauser. - Replace uses of
raises_regex
with the more standardpytest.raises(Exception, match="foo")
(5188
,5191
). By Maximilian Roos.
This release brings a few important performance improvements, a wide range of usability upgrades, lots of bug fixes, and some new features. These include better cftime
support, a new quiver plot, better unstack
performance, more efficient memory use in rolling operations, and some python packaging improvements. We also have a few documentation improvements (and more planned!).
Many thanks to the 36 contributors to this release: Alessandro Amici, Anderson Banihirwe, Aureliana Barghini, Ayrton Bourn, Benjamin Bean, Blair Bonnett, Chun Ho Chow, DWesl, Daniel Mesejo-León, Deepak Cherian, Eric Keenan, Illviljan, Jens Hedegaard Nielsen, Jody Klymak, Julien Seguinot, Julius Busecke, Kai Mühlbauer, Leif Denby, Martin Durant, Mathias Hauser, Maximilian Roos, Michael Mann, Ray Bell, RichardScottOZ, Spencer Clark, Tim Gates, Tom Nicholas, Yunus Sevinchan, alexamici, aurghs, crusaderky, dcherian, ghislainp, keewis, rhkleijn
xarray no longer supports python 3.6
The minimum version policy was changed to also apply to projects with irregular releases. As a result, the minimum versions of some dependencies have changed:
Package       Old    New
Python        3.6    3.7
setuptools    38.4   40.4
numpy         1.15   1.17
pandas        0.25   1.0
dask          2.9    2.11
distributed   2.9    2.11
bottleneck    1.2    1.3
h5netcdf      0.7    0.8
iris          2.2    2.4
netcdf4       1.4    1.5
pseudonetcdf  3.0    3.1
rasterio      1.0    1.1
scipy         1.3    1.4
seaborn       0.9    0.10
zarr          2.3    2.4
(
4688
,4720
,4907
,4942
)
- As a result of
4684
the default units encoding for datetime-like values (np.datetime64[ns]
orcftime.datetime
) will now always be set such thatint64
values can be used. In the past, no units finer than "seconds" were chosen, which would sometimes mean thatfloat64
values were required, which would lead to inaccurate I/O round-trips. - Variables referred to in attributes like
bounds
andgrid_mapping
can be set as coordinate variables. These attributes are moved to :pyDataArray.encoding
from :pyDataArray.attrs
. This behaviour is controlled by thedecode_coords
kwarg to :pyopen_dataset
and :pyopen_mfdataset
. The full list of decoded attributes is inweather-climate
(2844
,3689
) - As a result of
4911
the output from calling :pyDataArray.sum
or :pyDataArray.prod
on an integer array withskipna=True
and a non-None value formin_count
will now be a float array rather than an integer array.
- The dim
argument to :pyDataArray.integrate
is being deprecated in favour of acoord
argument, for consistency with :pyDataset.integrate
. For now usingdim
issues aFutureWarning
. It will be removed in version 0.19.0 (3993
). By Tom Nicholas. - Deprecated
autoclose
kwargs from :pyopen_dataset
are removed (4725
). By Aureliana Barghini. - The return value of :py
Dataset.update
is being deprecated to make it work more like :pydict.update
. It will be removed in version 0.19.0 (4932
). By Justus Magin.
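A minimal sketch of integrating with the `coord` argument that replaces `dim`; the line y = x on [0, 2] is an arbitrary test case whose integral is 2.

```python
import xarray as xr

da = xr.DataArray([0.0, 1.0, 2.0], dims="x", coords={"x": [0.0, 1.0, 2.0]})

# Trapezoidal integration along the "x" coordinate.
area = da.integrate(coord="x")
```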
- :py
~xarray.cftime_range
and :pyDataArray.resample
now support millisecond ("L"
or"ms"
) and microsecond ("U"
or"us"
) frequencies forcftime.datetime
coordinates (4097
,4758
). By Spencer Clark. - Significantly higher
unstack
performance on numpy-backed arrays which contain missing values; 8x faster than previous versions in our benchmark, and now 2x faster than pandas (4746
). By Maximilian Roos. - Add :py
Dataset.plot.quiver
for quiver plots with :pyDataset
variables. By Deepak Cherian. - Add
"drop_conflicts"
to the strategies supported by thecombine_attrs
kwarg (4749
,4827
). By Justus Magin. - Allow installing from git archives (
4897
). By Justus Magin. - :py
~core.rolling.DataArrayCoarsen
and :py~core.rolling.DatasetCoarsen
now implement areduce
method, enabling coarsening operations with custom reduction functions (3741
,4939
). By Spencer Clark. - Most rolling operations use significantly less memory. (
4325
). By Deepak Cherian. - Add :py
Dataset.drop_isel
and :pyDataArray.drop_isel
(4658
,4819
). By Daniel Mesejo. - Xarray now leverages updates as of cftime version 1.4.1, which enable exact I/O roundtripping of
cftime.datetime
objects (4758
). By Spencer Clark. - :py
open_dataset
and :pyopen_mfdataset
now acceptfsspec
URLs (including globs for the latter) forengine="zarr"
, and so allow reading from many remote and other file systems (4461
) By Martin Durant. - :py
DataArray.swap_dims
& :pyDataset.swap_dims
now accept dims in the form of kwargs as well as a dict, like most similar methods. By Maximilian Roos.
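Three of the additions in this list (`drop_isel`, a custom `reduce` on coarsen objects, and keyword arguments to `swap_dims`) can be sketched as follows. The data and the `label` coordinate are invented for illustration.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6.0), dims="x", coords={"x": np.arange(6) * 10})

# Drop the first and last positions by integer index.
trimmed = da.drop_isel(x=[0, 5])

# Coarsen in windows of three using a custom reduction function.
block_max = da.coarsen(x=3).reduce(np.max)

# swap_dims with keyword arguments instead of a dict.
labelled = xr.DataArray(
    [1, 2], dims="x", coords={"x": [0, 1], "label": ("x", ["a", "b"])}
)
by_label = labelled.swap_dims(x="label")
```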
- Use specific type checks in
xarray.core.variable.as_compatible_data
instead of blanket access tovalues
attribute (2097
) By Yunus Sevinchan. - :py
DataArray.resample
and :pyDataset.resample
do not trigger computations anymore if :pyDataset.weighted
or :pyDataArray.weighted
are applied (4625
,4668
). By Julius Busecke. - :py
merge
withcombine_attrs='override'
makes a copy of the attrs (4627
). - By default, when possible, xarray will now always use values of type
int64
when encoding and decodingnumpy.datetime64[ns]
datetimes. This ensures that maximum precision and accuracy are maintained in the round-tripping process (4045
,4684
). It also enables encoding and decoding standard calendar dates with time units of nanoseconds (4400
). By Spencer Clark and Mark Harfouche. - :py
DataArray.astype
, :pyDataset.astype
and :pyVariable.astype
support theorder
andsubok
parameters again. This fixes a regression introduced in version 0.16.1 (4644
,4683
). By Richard Kleijn. - Remove dictionary unpacking when using
.loc
to avoid collision with.sel
parameters (4695
). By Anderson Banihirwe. - Fix the legend created by :py
Dataset.plot.scatter
(4641
,4723
). By Justus Magin. - Fix a crash in orthogonal indexing on geographic coordinates with
engine='cfgrib'
(4733
,4737
). By Alessandro Amici. - Coordinates with dtype
str
orbytes
now retain their dtype on many operations, e.g.reindex
,align
,concat
,assign
, previously they were cast to an object dtype (2658
,4543
). By Mathias Hauser. - Limit number of data rows when printing large datasets. (
4736
,4750
). By Jimmy Westling. - Add
missing_dims
parameter to transpose (4647
,4767
). By Daniel Mesejo. - Resolve intervals before appending other metadata to labels when plotting (
4322
,4794
). By Justus Magin. - Fix regression when decoding a variable with a
scale_factor
andadd_offset
given as a list of length one (4631
). By Mathias Hauser. - Expand user directory paths (e.g.
~/
) in :pyopen_mfdataset
and :pyDataset.to_zarr
(4783
,4795
). By Julien Seguinot. - Raise DeprecationWarning when trying to typecast a tuple containing a :py
DataArray
. User now prompted to first call .data on it (4483
). By Chun Ho Chow. - Ensure that :py
Dataset.interp
raisesValueError
when interpolating outside coordinate range andbounds_error=True
(4854
,4855
). By Leif Denby. - Fix time encoding bug associated with using cftime versions greater than 1.4.0 with xarray (
4870
,4871
). By Spencer Clark. - Stop :py
DataArray.sum
and :pyDataArray.prod
computing lazy arrays when called with amin_count
parameter (4898
,4911
). By Blair Bonnett. - Fix bug preventing the
min_count
parameter to :pyDataArray.sum
and :pyDataArray.prod
working correctly when calculating over all axes of a float64 array (4898
,4911
). By Blair Bonnett. - Fix decoding of vlen strings using h5py versions greater than 3.0.0 with h5netcdf backend (
4570
,4893
). By Kai Mühlbauer. - Allow converting :py
Dataset
or :pyDataArray
objects with aMultiIndex
and at least one other dimension to apandas
object (3008
,4442
). By ghislainp.
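The `missing_dims` parameter of transpose mentioned above can be sketched as follows. This assumes a small two-dimensional array and a hypothetical extra dimension name `"z"` that does not exist on it; by default such a name is treated as an error.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"))

# "z" is not a dimension of `da`. With missing_dims="ignore" the
# unknown name is skipped instead of being treated as an error.
ignored = da.transpose("y", "x", "z", missing_dims="ignore")

print(ignored.dims, ignored.shape)
```

This can be convenient when the same transpose call is applied to several arrays that do not all share the same dimensions.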
- Add information about requirements for accessor classes (
2788
,4657
). By Justus Magin. - Start a list of external I/O integrating with
xarray
(683
,4566
). By Justus Magin. - Add concat examples and improve combining documentation (
4620
,4645
). By Ray Bell and Justus Magin. - Explicitly mention that :py
Dataset.update
updates inplace (2951
,4932
). By Justus Magin. - Added docs on vectorized indexing (
4711
). By Eric Keenan.
Speed up of the continuous integration tests on azure.
- Switched to mamba and use matplotlib-base for a faster installation of all dependencies (
4672
). - Use
pytest.mark.skip
instead ofpytest.mark.xfail
for some tests that can currently not succeed (4685
). - Run the tests in parallel using pytest-xdist (
4694
).
By Justus Magin and Mathias Hauser.
- Use
pyproject.toml
instead of thesetup_requires
option forsetuptools
(4897
). By Justus Magin. - Replace all usages of
assert x.identical(y)
withassert_identical(x, y)
for clearer error messages (4752
). By Maximilian Roos. - Speed up attribute style access (e.g.
ds.somevar
instead ofds["somevar"]
) and tab completion in IPython (4741
,4742
). By Richard Kleijn. - Added the
set_close
method toDataset
andDataArray
for backends to specify how to voluntarily release all resources. (4809
) By Alessandro Amici. - Update type hints to work with numpy v1.20 (
4878
). By Mathias Hauser. - Ensure warnings cannot be turned into exceptions in :py
testing.assert_equal
and the otherassert_*
functions (4864
). By Mathias Hauser. - Performance improvement when constructing DataArrays. Significantly speeds up repr for Datasets with large number of variables. By Deepak Cherian.
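To illustrate why `assert_identical(x, y)` gives clearer failures than `assert x.identical(y)`, here is a small sketch with two deliberately different arrays. The arrays are invented for the example.

```python
import xarray as xr
from xarray.testing import assert_identical

a = xr.DataArray([1, 2, 3], dims="x", name="a")
b = xr.DataArray([1, 2, 4], dims="x", name="a")

# `assert a.identical(b)` would fail with a bare AssertionError.
# assert_identical attaches a description of how the objects differ.
try:
    assert_identical(a, b)
except AssertionError as err:
    message = str(err)
    print(message)
```

The printed message describes the differing values, which a plain boolean assertion cannot do.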
This release brings the ability to write to limited regions of zarr
files, open zarr files with :pyopen_dataset
and :pyopen_mfdataset
, increased support for propagating attrs
using the keep_attrs
flag, as well as numerous bugfixes and documentation improvements.
Many thanks to the 31 contributors who contributed to this release: Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, alexamici, Alexandre Poux, Anderson Banihirwe, Andrew Pauling, Ashwin Vishnu, aurghs, Brian Ward, Caleb, crusaderky, Dan Nowacki, darikg, David Brochart, David Huard, Deepak Cherian, Dion Häfner, Gerardo Rivera, Gerrit Holl, Illviljan, inakleinbottle, Jacob Tomlinson, James A. Bednar, jenssss, Joe Hamman, johnomotani, Joris Van den Bossche, Julia Kent, Julius Busecke, Kai Mühlbauer, keewis, Keisuke Fujii, Kyle Cranmer, Luke Volpatti, Mathias Hauser, Maximilian Roos, Michaël Defferrard, Michal Baumgartner, Nick R. Papior, Pascal Bourgault, Peter Hausamann, PGijsbers, Ray Bell, Romain Martinez, rpgoldman, Russell Manser, Sahid Velji, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer, Thomas Zilio, Tobias Kölling, Tom Augspurger, Wei Ji, Yash Saboo, and Zeb Nicholls.
- :py
~core.accessor_dt.DatetimeAccessor.weekofyear
and :py~core.accessor_dt.DatetimeAccessor.week
have been deprecated. UseDataArray.dt.isocalendar().week
instead (4534
). By Mathias Hauser, Maximilian Roos, and Spencer Clark. - :py
DataArray.rolling
and :pyDataset.rolling
no longer support passingkeep_attrs
via its constructor. Passkeep_attrs
via the applied function, i.e. useds.rolling(...).mean(keep_attrs=False)
instead ofds.rolling(..., keep_attrs=False).mean()
. Rolling operations now keep their attributes by default (4510
). By Mathias Hauser.
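The change to `keep_attrs` handling in rolling operations described above can be sketched like this. The `units` attribute and the window size are illustrative only.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x", attrs={"units": "m"})

# Attributes are kept by default on the result of the reduction.
kept = da.rolling(x=2).mean()

# To drop them, pass keep_attrs to the applied function,
# not to the rolling constructor.
dropped = da.rolling(x=2).mean(keep_attrs=False)

print(kept.attrs, dropped.attrs)
```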
- :py
open_dataset
and :pyopen_mfdataset
now work with engine="zarr"
(3668
,4003
,4187
). By Miguel Jimenez and Wei Ji Leong. - Unary & binary operations follow the
keep_attrs
flag (3490
,4065
,3433
,3595
,4195
). By Deepak Cherian. - Added :py
~core.accessor_dt.DatetimeAccessor.isocalendar()
that returns a Dataset with year, week, and weekday calculated according to the ISO 8601 calendar. Requires pandas version 1.1.0 or greater (4534
). By Mathias Hauser, Maximilian Roos, and Spencer Clark. - :py
Dataset.to_zarr
now supports aregion
keyword for writing to limited regions of existing Zarr stores (4035
). Seeio.zarr.appending
for full details. By Stephan Hoyer. - Added typehints in :py
align
to reflect that the same type received inobjects
arg will be returned (4522
). By Michal Baumgartner. - :py
Dataset.weighted
and :pyDataArray.weighted
are now executing value checks lazily if weights are provided as dask arrays (4541
,4559
). By Julius Busecke. - Added the
keep_attrs
keyword torolling_exp.mean()
; it now keeps attributes per default. By Mathias Hauser (4592
). - Added
freq
as property to :pyCFTimeIndex
and into theCFTimeIndex.repr
. (2416
,4597
) By Aaron Spring.
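As a rough illustration of the `isocalendar()` accessor listed above (the suggested replacement for the deprecated `weekofyear` and `week`), the sketch below uses a handful of dates around a year boundary, chosen because ISO weeks and calendar years disagree there.

```python
import pandas as pd
import xarray as xr

# Five consecutive days spanning the 2020/2021 year boundary.
times = xr.DataArray(
    pd.date_range("2020-12-31", periods=5, freq="D"), dims="time"
)

# isocalendar() returns a Dataset with "year", "week" and "weekday".
iso = times.dt.isocalendar()

weeks = [int(w) for w in iso.week.values]
print(weeks)
```

The first four days still belong to ISO week 53 of 2020; ISO week 1 of 2021 only starts on Monday, 4 January.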
- Fix bug where reference times without padded years (e.g.
since 1-1-1
) would lose their units when being passed byencode_cf_datetime
(4422
,4506
). Such units are ambiguous about which digit represents the years (is it YMD or DMY?). Now, if such formatting is encountered, it is assumed that the first digit is the years, they are padded appropriately (to e.g.since 0001-1-1
) and a warning that this assumption is being made is issued. Previously, withoutcftime
, such times would be silently parsed incorrectly (at least based on the CF conventions) e.g. "since 1-1-1" would be parsed (viapandas
anddateutil
) tosince 2001-1-1
. By Zeb Nicholls. - Fix :py
DataArray.plot.step
. By Deepak Cherian. - Fix bug where reading a scalar value from a NetCDF file opened with the
h5netcdf
backend would raise aValueError
whendecode_cf=True
(4471
,4485
). By Gerrit Holl. - Fix bug where datetime64 times are silently changed to incorrect values if they are outside the valid date range for ns precision when provided in some other units (
4427
,4454
). By Andrew Pauling - Fix silently overwriting the
engine
key when passing :pyopen_dataset
a file object to an incompatible netCDF (4457
). Now incompatible combinations of files and engines raise an exception instead. By Alessandro Amici. - The
min_count
argument to :pyDataArray.sum()
and :pyDataArray.prod()
is now ignored when not applicable, i.e. whenskipna=False
or whenskipna=None
and the dtype does not have a missing value (4352
). By Mathias Hauser. - :py
combine_by_coords
now raises an informative error when passing coordinates with differing calendars (4495
). By Mathias Hauser. - :py
DataArray.rolling
and :pyDataset.rolling
now also keep the attributes and names of (wrapped) DataArray
objects, previously only the global attributes were retained (4497
,4510
). By Mathias Hauser. - Improve performance where reading small slices from huge dimensions was slower than necessary (
4560
). By Dion Häfner. - Fix bug where
dask_gufunc_kwargs
was silently changed in :pyapply_ufunc
(4576
). By Kai Mühlbauer.
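The `min_count` behaviour described in the list above can be sketched with one float array that contains a missing value and one integer array that cannot. The values are arbitrary.

```python
import numpy as np
import xarray as xr

floats = xr.DataArray([1.0, np.nan, 3.0], dims="x")

# Two valid values are present, so min_count=2 is satisfied ...
enough = floats.sum(min_count=2)
# ... but min_count=3 is not, and the result becomes NaN.
too_few = floats.sum(min_count=3)

# Integer data has no missing-value marker, so min_count is ignored.
ints = xr.DataArray([1, 2, 3], dims="x")
ignored = ints.sum(min_count=5)

print(float(enough), float(too_few), int(ignored))
```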
- Document the API not supported with duck arrays (
4530
). By Justus Magin. - Mention the possibility to pass functions to :py
Dataset.where
or :pyDataArray.where
in the parameter documentation (4223
,4613
). By Justus Magin. - Update the docstring of :py
DataArray
and :pyDataset
. (4532
); By Jimmy Westling. - Raise a more informative error when :py
DataArray.to_dataframe
is called on a scalar (4228
); By Pieter Gijsbers. - Fix grammar and typos in the
contributing
guide (4545
). By Sahid Velji. - Fix grammar and typos in the
user-guide/io
guide (4553
). By Sahid Velji. - Update link to NumPy docstring standard in the
contributing
guide (4558
). By Sahid Velji. - Add docstrings to
isnull
andnotnull
, and fix the displayed signature (2760
,4618
). By Justus Magin.
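The documentation entry above mentions passing functions to `where`. A minimal sketch, assuming a small integer array, looks like this; the callable receives the object itself and returns the boolean condition.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# The lambda is called on `da`; values where the condition is False
# are replaced by NaN, which promotes the result to a float dtype.
masked = da.where(lambda arr: arr > 1)

print(masked.values)
```

This form is handy in method chains, where the intermediate object has no name to refer to.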
- Optional dependencies can be installed along with xarray by specifying extras as
pip install "xarray[extra]"
whereextra
can be one ofio
,accel
,parallel
,viz
andcomplete
. See docs for updatedinstallation instructions <installation-instructions>
. (2888
,4480
). By Ashwin Vishnu, Justus Magin and Mathias Hauser. - Removed stray spaces that stem from black removing new lines (
4504
). By Mathias Hauser. - Ensure tests are not skipped in the
py38-all-but-dask
test environment (4509
). By Mathias Hauser. - Ignore select numpy warnings around missing values, where xarray handles the values appropriately, (
4536
); By Maximilian Roos. - Replace the internal use of
pd.Index.__or__
andpd.Index.__and__
withpd.Index.union
andpd.Index.intersection
as they will stop working as set operations in the future (4565
). By Mathias Hauser. - Add GitHub action for running nightly tests against upstream dependencies (
4583
). By Anderson Banihirwe. - Ensure all figures are closed properly in plot tests (
4600
). By Yash Saboo, Nirupam K N and Mathias Hauser.
This patch release fixes an incompatibility with a recent pandas change, which was causing an issue indexing with a datetime64
. It also includes improvements to rolling
, to_dataframe
, cov
& corr
methods and bug fixes. Our documentation has a number of improvements, including fixing all doctests and confirming their accuracy on every commit.
Many thanks to the 36 contributors who contributed to this release:
Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, Alexandre Poux, Caleb, Dan Nowacki, Deepak Cherian, Gerardo Rivera, Jacob Tomlinson, James A. Bednar, Joe Hamman, Julia Kent, Kai Mühlbauer, Keisuke Fujii, Mathias Hauser, Maximilian Roos, Nick R. Papior, Pascal Bourgault, Peter Hausamann, Romain Martinez, Russell Manser, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer, Thomas Zilio, Tobias Kölling, Tom Augspurger, alexamici, crusaderky, darikg, inakleinbottle, jenssss, johnomotani, keewis, and rpgoldman.
- :py
DataArray.astype
and :pyDataset.astype
now preserve attributes. Keep the old behavior by passing keep_attrs=False (2049
,4314
). By Dan Nowacki and Gabriel Joel Mitchell.
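A short sketch of the `astype` change above, using an invented `units` attribute:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", attrs={"units": "m"})

# Attributes now survive the cast by default.
as_float = da.astype(float)

# Passing keep_attrs=False restores the old behaviour.
stripped = da.astype(float, keep_attrs=False)

print(as_float.attrs, stripped.attrs)
```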
- :py
~xarray.DataArray.rolling
and :py~xarray.Dataset.rolling
now accept more than 1 dimension. (4219
) By Keisuke Fujii. - :py
~xarray.DataArray.to_dataframe
and :py~xarray.Dataset.to_dataframe
now accept adim_order
parameter for specifying the order of the resulting dataframe's dimensions (4331
,4333
). By Thomas Zilio. - Support multiple outputs in :py
xarray.apply_ufunc
when usingdask='parallelized'
. (1815
,4060
). By Kai Mühlbauer. min_count
can be supplied to reductions such as.sum
when specifying multiple dimensions to reduce over (4356
). By Maximilian Roos. - :py
xarray.cov
and :pyxarray.corr
now handle missing values (4351
). By Maximilian Roos. - Add support for parsing datetime strings formatted following the default string representation of cftime objects, i.e. YYYY-MM-DD hh:mm:ss, in partial datetime string indexing, as well as :py
~xarray.cftime_range
(4337
). By Spencer Clark. - Build
CFTimeIndex.__repr__
explicitly as :pypandas.Index
. Addcalendar
as a new property for :pyCFTimeIndex
and showcalendar
andlength
inCFTimeIndex.__repr__
(2416
,4092
) By Aaron Spring. - Use a wrapped array's
_repr_inline_
method to construct the collapsedrepr
of :pyDataArray
and :pyDataset
objects and document the new method ininternals/index
. (4248
). By Justus Magin. - Allow per-variable fill values in most functions. (
4237
). By Justus Magin. - Expose
use_cftime
option in :py~xarray.open_zarr
(2886
,3229
) By Samnan Rahee and Anderson Banihirwe.
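To illustrate the `dim_order` parameter of `to_dataframe` from the list above, the sketch below converts a small named array twice. The name `"v"` and the coordinate values are placeholders.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [0, 1], "y": ["a", "b", "c"]},
    name="v",  # to_dataframe needs a named DataArray
)

# Default: MultiIndex levels follow the array's dimension order.
default = da.to_dataframe()

# dim_order changes the order of the index levels.
reordered = da.to_dataframe(dim_order=["y", "x"])

print(list(default.index.names), list(reordered.index.names))
```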
- Fix indexing with datetime64 scalars with pandas 1.1 (
4283
). By Stephan Hoyer and Justus Magin. - Variables which are chunked using dask only along some dimensions can be chunked while storing with zarr along previously unchunked dimensions (
4312
) By Tobias Kölling. - Fixed a bug in backend caused by basic installation of Dask (
4164
,4318
) By Sam Morley. - Fixed a few bugs with :py
Dataset.polyfit
when encountering deficient matrix ranks (4190
,4193
). By Pascal Bourgault. - Fixed inconsistencies between docstring and functionality for :py
DataArray.str.get
and :pyDataArray.str.wrap
(4334
). By Mathias Hauser. - Fixed overflow issue causing incorrect results in computing means of :py
cftime.datetime
arrays (4341
). By Spencer Clark. - Fixed :py
Dataset.coarsen
, :pyDataArray.coarsen
dropping attributes on original object (4120
,4360
). By Julia Kent. - Fix the signature of the plot methods. (
4359
) By Justus Magin. - Fix :py
xarray.apply_ufunc
withvectorize=True
andexclude_dims
(3890
). By Mathias Hauser. - Fix KeyError when doing linear interpolation to an nd DataArray that contains NaNs (
4233
). By Jens Svensmark - Fix incorrect legend labels for :py
Dataset.plot.scatter
(4126
). By Peter Hausamann. - Fix
dask.optimize
onDataArray
producing an invalid Dask task graph (3698
) By Tom Augspurger - Fix
pip install .
when no.git
directory exists; namely when the xarray source directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH. By Guido Imperiale - Preserve dimension and coordinate order during :py
xarray.concat
(2811
,4072
,4419
). By Kai Mühlbauer. - Avoid relying on :py
set
objects for the ordering of the coordinates (4409
) By Justus Magin.
- Update the docstring of :py
DataArray.copy
to remove incorrect mention of 'dataset' (3606
) By Sander van Rijn. - Removed skipna argument from :py
DataArray.count
, :pyDataArray.any
, :pyDataArray.all
. (755
) By Sander van Rijn - Update the contributing guide to use merges instead of rebasing and state that we squash-merge. (
4355
). By Justus Magin. - Make sure the examples from the docstrings actually work (
4408
). By Justus Magin. - Updated Vectorized Indexing to a clearer example. By Maximilian Roos
- Fixed all doctests and enabled their running in CI. By Justus Magin.
Relaxed the
mindeps_policy
to support:
- all versions of setuptools released in the last 42 months (but no older than 38.4)
- all versions of dask and dask.distributed released in the last 12 months (but no older than 2.9)
- all versions of other packages released in the last 12 months
All are up from 6 months (
4295
) By Guido Imperiale. - Use :py
dask.array.apply_gufunc <dask.array.gufunc.apply_gufunc>
instead of :pydask.array.blockwise
in :pyxarray.apply_ufunc
when usingdask='parallelized'
. (4060
,4391
,4392
) By Kai Mühlbauer. - Align
mypy
versions to0.782
acrossrequirements
and.pre-commit-config.yml
files. (4390
) By Maximilian Roos - Only load resource files when running inside a Jupyter Notebook (
4294
) By Guido Imperiale - Silenced most
numpy
warnings such asMean of empty slice
. (4369
) By Maximilian Roos - Enable type checking for :py
concat
(4238
) By Mathias Hauser. - Updated plot functions for matplotlib version 3.3 and silenced warnings in the plot tests (
4365
). By Mathias Hauser. - Versions in
pre-commit.yaml
are now pinned, to reduce the chances of conflicting versions. (4388
) By Maximilian Roos
This release adds xarray.cov & xarray.corr for covariance & correlation respectively; the idxmax & idxmin methods, the polyfit method & xarray.polyval for fitting polynomials, as well as a number of documentation improvements, other features, and bug fixes. Many thanks to all 44 contributors who contributed to this release:
Akio Taniguchi, Andrew Williams, Aurélien Ponte, Benoit Bovy, Dave Cole, David Brochart, Deepak Cherian, Elliott Sales de Andrade, Etienne Combrisson, Hossein Madadi, Huite, Joe Hamman, Kai Mühlbauer, Keisuke Fujii, Maik Riechert, Marek Jacob, Mathias Hauser, Matthieu Ancellin, Maximilian Roos, Noah D Brenowitz, Oriol Abril, Pascal Bourgault, Phillip Butcher, Prajjwal Nijhara, Ray Bell, Ryan Abernathey, Ryan May, Spencer Clark, Spencer Hill, Srijan Saurav, Stephan Hoyer, Taher Chegini, Todd, Tom Nicholas, Yohai Bar Sinai, Yunus Sevinchan, arabidopsis, aurghs, clausmichele, dmey, johnomotani, keewis, raphael dussin, risebell
- Minimum supported versions for the following packages have changed:
dask >=2.9
,distributed>=2.9
. By Deepak Cherian groupby
operations will restore coord dimension order. Passrestore_coord_dims=False
to revert to previous behavior.DataArray.transpose
will now transpose coordinates by default. Passtranspose_coords=False
to revert to previous behaviour. By Maximilian Roos. - Alternate draw styles for :py
plot.step
must be passed using thedrawstyle
(ords
) keyword argument, instead of thelinestyle
(orls
) keyword argument, in line with the upstream change in Matplotlib. (3274
) By Elliott Sales de Andrade - The old
auto_combine
function has now been removed in favour of the :pycombine_by_coords
and :pycombine_nested
functions. This also means that the default behaviour of :pyopen_mfdataset
has changed to usecombine='by_coords'
as the default argument value. (2616
,3926
) By Tom Nicholas. - The
DataArray
andVariable
HTML reprs now expand the data section by default (4176
) By Stephan Hoyer.
- :py
DataArray.argmin
and :pyDataArray.argmax
now support sequences of 'dim' arguments, and if a sequence is passed return a dict (which can be passed to :pyDataArray.isel
to get the value of the minimum) of the indices for each dimension of the minimum or maximum of a DataArray. (3936
) By John Omotani, thanks to Keisuke Fujii for work in1469
. - Added :py
xarray.cov
and :pyxarray.corr
(3784
,3550
,4089
). By Andrew Williams and Robin Beer. - Implement :py
DataArray.idxmax
, :pyDataArray.idxmin
, :pyDataset.idxmax
, :pyDataset.idxmin
. (60
,3871
) By Todd Jennings - Added :py
DataArray.polyfit
and :pyxarray.polyval
for fitting polynomials. (3349
,3733
,4099
) By Pascal Bourgault. - Added :py
xarray.infer_freq
for extending frequency inferring to CFTime indexes and data (4033
). By Pascal Bourgault. chunks='auto'
is now supported in thechunks
argument of :pyDataset.chunk
. (4055
) By Andrew Williams. - Control over attributes of result in :py
merge
, :pyconcat
, :pycombine_by_coords
and :pycombine_nested
using combine_attrs keyword argument. (3865
,3877
) By John Omotani - missing_dims argument to :py
Dataset.isel
, :pyDataArray.isel
and :pyVariable.isel
to allow replacing the exception when a dimension passed toisel
is not present with a warning, or just ignore the dimension. (3866
,3923
) By John Omotani - Support dask handling for :py
DataArray.idxmax
, :pyDataArray.idxmin
, :pyDataset.idxmax
, :pyDataset.idxmin
. (3922
,4135
) By Kai Mühlbauer and Pascal Bourgault. - More support for unit aware arrays with pint (
3643
,3975
,4163
) By Justus Magin. - Support overriding existing variables in
to_zarr()
withmode='a'
even withoutappend_dim
, as long as dimension sizes do not change. By Stephan Hoyer. - Allow plotting of boolean arrays. (
3766
) By Marek Jacob - Enable using MultiIndex levels as coordinates in 1D and 2D plots (
3927
). By Mathias Hauser. - A
days_in_month
accessor for :pyxarray.CFTimeIndex
, analogous to thedays_in_month
accessor for a :pypandas.DatetimeIndex
, which returns the number of days in the month of each datetime in the index. Days-in-month weights for both standard and non-standard calendars can now be obtained using the :py~core.accessor_dt.DatetimeAccessor
(3935
). This feature requires cftime version 1.1.0 or greater. By Spencer Clark. - For the netCDF3 backend, added dtype coercions for unsigned integer types. (
4014
,4018
) By Yunus Sevinchan - :py
map_blocks
now accepts atemplate
kwarg. This allows use cases where the result of a computation could not be inferred automatically. By Deepak Cherian - :py
map_blocks
can now handle dask-backed xarray objects inargs
. (3818
) By Deepak Cherian - Add keyword
decode_timedelta
to :pyxarray.open_dataset
, (:pyxarray.open_dataarray
, :pyxarray.decode_cf
) that allows disabling or enabling the decoding of timedeltas independently of time decoding (1621
) By Aureliana Barghini.
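The `argmin`/`argmax` entry and the `idxmin`/`idxmax` entry in the list above can be sketched together. The sketch assumes a small 2-D integer array with a made-up `y` coordinate.

```python
import xarray as xr

da = xr.DataArray(
    [[3, 1, 2], [0, 5, 4]],
    dims=("x", "y"),
    coords={"y": [10, 20, 30]},
)

# Passing a sequence of dims returns a dict of indices, one per
# dimension, which can be handed straight to isel.
indices = da.argmin(dim=["x", "y"])
minimum = da.isel(indices)

# idxmin returns coordinate labels rather than integer positions.
labels = da.idxmin("y")

print(int(indices["x"]), int(indices["y"]), int(minimum))
print(labels.values)
```

Here the global minimum sits at position `x=1, y=0`, while `idxmin("y")` reports the `y` label of the minimum in each row.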
- Performance improvement of :py
DataArray.interp
and :pyDataset.interp
. Independent interpolations are now performed sequentially rather than in one large multidimensional space. (2223
) By Keisuke Fujii. - :py
DataArray.interp
now supports interpolation over chunked dimensions (4155
). By Alexandre Poux. - Major performance improvement for :py
Dataset.from_dataframe
when the dataframe has a MultiIndex (4184
). By Stephan Hoyer. - :py
DataArray.reset_index
and :pyDataset.reset_index
now keep coordinate attributes (4103
). By Oriol Abril. - Axes kwargs such as
facecolor
can now be passed to :pyDataArray.plot
insubplot_kws
. This works for both single axes plots and FacetGrid plots. By Raphael Dussin. - Array items with long string reprs are now limited to a reasonable width (
3900
) By Maximilian Roos - Large arrays whose numpy reprs would have greater than 40 lines are now limited to a reasonable length. (
3905
) By Maximilian Roos
- Fix errors combining attrs in :py
open_mfdataset
(4009
,4173
) By John Omotani - If groupby receives a
DataArray
with name=None, assign a default name (158
) By Phil Butcher. - Support dark mode in VS Code (
4024
) By Keisuke Fujii. - Fix bug when converting multiindexed pandas objects to sparse xarray objects. (
4019
) By Deepak Cherian. ValueError
is raised whenfill_value
is not a scalar in :pyfull_like
. (3977
) By Huite Bootsma. - Fix wrong order in converting a
pd.Series
with a MultiIndex toDataArray
. (3951
,4186
) By Keisuke Fujii and Stephan Hoyer. - Fix renaming of coords when one or more stacked coords is not in sorted order during stack+groupby+apply operations. (
3287
,3906
) By Spencer Hill - Fix a regression where deleting a coordinate from a copied :py
DataArray
can affect the original :pyDataArray
. (3899
,3871
) By Todd Jennings - Fix :py
~xarray.plot.FacetGrid
plots with a single contour. (3569
,3915
). By Deepak Cherian - Use divergent colormap if
levels
spans 0. (3524
) By Deepak Cherian - Fix :py
~xarray.plot.FacetGrid
whenvmin == vmax
. (3734
) By Deepak Cherian - Fix plotting when
levels
is a scalar andnorm
is provided. (3735
) By Deepak Cherian - Fix bug where plotting line plots with 2D coordinates depended on dimension order. (
3933
) By Tom Nicholas. - Fix
RasterioDeprecationWarning
when using avrt
inopen_rasterio
. (3964
) By Taher Chegini. - Fix
AttributeError
on displaying a :pyVariable
in a notebook context. (3972
,3973
) By Ian Castleden. - Fix bug causing :py
DataArray.interpolate_na
to always drop attributes, and added keep_attrs argument. (3968
) By Tom Nicholas. - Fix bug in time parsing failing to fall back to cftime. This was causing time variables with a time unit of 'msecs' to fail to parse. (
3998
) By Ryan May. - Fix weighted mean when passing boolean weights (
4074
). By Mathias Hauser. - Fix html repr in untrusted notebooks: fallback to plain text repr. (
4053
) By Benoit Bovy. - Fix :py
DataArray.to_unstacked_dataset
for single-dimension variables. (4049
) By Deepak Cherian - Fix :py
open_rasterio
forWarpedVRT
with specifiedsrc_crs
. (4104
) By Dave Cole.
- Update the docstring of :py
DataArray.assign_coords
: clarify how to add a new coordinate to an existing dimension and add an illustrative example (3952
,3958
) By Etienne Combrisson. - Update the docstring of :py
Dataset.diff
and :pyDataArray.diff
so that it documents the dim
parameter as required. (1040
,3909
) By Justus Magin. - Updated
Calculating Seasonal Averages from Timeseries of Monthly Means <examples/monthly-means>
example notebook to take advantage of the newdays_in_month
accessor for :pyxarray.CFTimeIndex
(3935
). By Spencer Clark. - Updated the list of current core developers. (
3892
) By Tom Nicholas. - Add example for multi-dimensional extrapolation and note different behavior of
kwargs
in :pyDataset.interp
and :pyDataArray.interp
for 1-d and n-d interpolation (3956
). By Matthias Riße. - Apply
black
to all the code in the documentation (4012
) By Justus Magin. - Narrative documentation now describes :py
map_blocks
:dask.automatic-parallelization
. By Deepak Cherian. - Document
.plot
,.dt
,.str
accessors the way they are called. (3625
,3988
) By Justus Magin. - Add documentation for the parameters and return values of :py
DataArray.sel
. By Justus Magin.
- Raise more informative error messages for chunk size conflicts when writing to zarr files. By Deepak Cherian.
- Run the
isort
pre-commit hook only on python source files and update theflake8
version. (3750
,3711
) By Justus Magin. - Add blackdoc to the list of checkers for development. (
4177
) By Justus Magin. - Add a CI job that runs the tests with every optional dependency except
dask
. (3794
,3919
) By Justus Magin. - Use
async
/await
for the asynchronous distributed tests. (3987
,3989
) By Justus Magin. - Various internal code clean-ups (
4026
,4038
). By Prajjwal Nijhara.
This release brings many new features such as :pyDataset.weighted
methods for weighted array reductions, a new jupyter repr by default, and the start of units integration with pint. There's also the usual batch of usability improvements, documentation additions, and bug fixes.
- Raise an error when assigning to the
.values
or.data
attribute of dimension coordinates i.e.IndexVariable
objects. This has been broken since v0.12.0. Please use :pyDataArray.assign_coords
or :pyDataset.assign_coords
instead. (3470
,3862
) By Deepak Cherian
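Instead of assigning to `.values` or `.data` of a dimension coordinate, the entry above recommends `assign_coords`. A minimal sketch, with invented coordinate values:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})

# Replace the dimension coordinate by building a new object
# rather than mutating the existing index in place.
relabelled = da.assign_coords(x=[10, 20, 30])

print(relabelled.x.values)
```

The original `da` is left untouched, which keeps its index and its data consistent.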
- Weighted array reductions are now supported via the new :py
DataArray.weighted
and :pyDataset.weighted
methods. Seecomput.weighted
. (422
,2922
). By Mathias Hauser. - The new jupyter notebook repr (
Dataset._repr_html_
andDataArray._repr_html_
) (introduced in 0.14.1) is now on by default. To disable, usexarray.set_options(display_style="text")
. By Julia Signell. - Added support for :py
pandas.DatetimeIndex
-style rounding ofcftime.datetime
objects directly via a :pyCFTimeIndex
or via the :py~core.accessor_dt.DatetimeAccessor
. By Spencer Clark - Support new h5netcdf backend keyword phony_dims (available from h5netcdf v0.8.0) for :py
~xarray.backends.H5NetCDFStore
. By Kai Mühlbauer. - Add partial support for unit aware arrays with pint. (
3706
,3611
) By Justus Magin. - :py
Dataset.groupby
and :pyDataArray.groupby
now raise a TypeError on multiple string arguments. Receiving multiple string arguments often means a user is attempting to pass multiple dimensions as separate arguments and should instead pass a single list of dimensions. (3802
) By Maximilian Roos - :py
map_blocks
can now apply functions that add new unindexed dimensions. By Deepak Cherian - An ellipsis (
...
) is now supported in thedims
argument of :pyDataset.stack
and :pyDataArray.stack
, meaning all unlisted dimensions, similar to its meaning in :pyDataArray.transpose
. (3826
) By Maximilian Roos - :py
Dataset.where
and :pyDataArray.where
accept a lambda as a first argument, which is then called on the input; replicating pandas' behavior. By Maximilian Roos. skipna
is available in :pyDataset.quantile
, :pyDataArray.quantile
, :pycore.groupby.DatasetGroupBy.quantile
, :pycore.groupby.DataArrayGroupBy.quantile
(3843
,3844
) By Aaron Spring. - Add a diff summary for testing.assert_allclose. (
3617
,3847
) By Justus Magin.
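The weighted reductions introduced above can be sketched with a tiny example. The data and weights are arbitrary and chosen so the result is easy to check by hand.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 4.0], dims="x")
weights = xr.DataArray([1.0, 1.0, 2.0], dims="x")

# Weighted mean: sum(data * weights) / sum(weights)
#              = (1 + 2 + 8) / 4
weighted_mean = da.weighted(weights).mean()
plain_mean = da.mean()

print(float(weighted_mean), float(plain_mean))
```

The same `.weighted(...)` call is available on a `Dataset`, where the weights are applied to every data variable.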
- Fix :py
Dataset.interp
when indexing array shares coordinates with the indexed variable (3252
). By David Huard. - Fix recombination of groups in :py
Dataset.groupby
and :pyDataArray.groupby
when performing an operation that changes the size of the groups along the grouped dimension. By Eric Jansen. - Fix use of multi-index with categorical values (
3674
). By Matthieu Ancellin. - Fix alignment with
join="override"
when some dimensions are unindexed. (3681
). By Deepak Cherian. - Fix :py
Dataset.swap_dims
and :pyDataArray.swap_dims
producing index with name reflecting the previous dimension name instead of the new one (3748
,3752
). By Joseph K Aicher. - Use
dask_array_type
instead ofdask_array.Array
for type checking. (3779
,3787
) By Justus Magin. - :py
concat
can now handle coordinate variables only present in one of the objects to be concatenated whencoords="different"
. By Deepak Cherian. - xarray now respects the over, under and bad colors if set on a provided colormap. (
3590
,3601
) By johnomotani. coarsen
androlling
now respectxr.set_options(keep_attrs=True)
to preserve attributes. :pyDataset.coarsen
accepts a keyword argumentkeep_attrs
to change this setting. (3376
,3801
) By Andrew Thomas. - Delete associated indexes when deleting coordinate variables. (
3746
). By Deepak Cherian. - Fix :py
Dataset.to_zarr
when usingappend_dim
andgroup
simultaneously. (3170
). By Matthias Meyer. - Fix html repr on :py
Dataset
with non-string keys (3807
). By Maximilian Roos.
- Fix documentation of :py
DataArray
removing the deprecated mention that when omitted, dims are inferred from a coords-dict. (3821
) By Sander van Rijn. - Improve the :py
where
docstring. By Maximilian Roos - Update the installation instructions: only explicitly list recommended dependencies (
3756
). By Mathias Hauser.
- Remove the internal
import_seaborn
function which handled the deprecation of theseaborn.apionly
entry point (3747
). By Mathias Hauser. - Don't test pint integration in combination with datetime objects. (
3778
,3788
) By Justus Magin. - Change test_open_mfdataset_list_attr to only run with dask installed (
3777
,3780
). By Bruno Pagani. - Preserve the ability to index with
method="nearest"
with a :pyCFTimeIndex
with pandas versions greater than 1.0.1 (3751
). By Spencer Clark. - Greater flexibility and improved test coverage of subtracting various types of objects from a :py
CFTimeIndex
. By Spencer Clark. - Update Azure CI MacOS image, given pending removal. By Maximilian Roos
- Remove xfails for scipy 1.0.1 for tests that append to netCDF files (
3805
). By Mathias Hauser. - Remove conversion to
pandas.Panel
, given its removal in pandas in favor of xarray's objects. By Maximilian Roos
This release brings many improvements to xarray's documentation: our examples are now binderized notebooks and we have new example notebooks from our SciPy 2019 sprint (many thanks to our contributors!).
This release also features many API improvements such as a new :py~core.accessor_dt.TimedeltaAccessor
and support for :pyCFTimeIndex
in :py~DataArray.interpolate_na
, as well as many bug fixes.
- Bumped minimum tested versions for dependencies:
- numpy 1.15
- pandas 0.25
- dask 2.2
- distributed 2.2
- scipy 1.3
- Remove
compat
andencoding
kwargs fromDataArray
, which have been deprecated since 0.12. (3650
). Instead, specify theencoding
kwarg when writing to disk or set the :pyDataArray.encoding
attribute directly. By Maximilian Roos. - :py
xarray.dot
, :pyDataArray.dot
, and the@
operator now usealign="inner"
(except whenxarray.set_options(arithmetic_join="exact")
;3694
) By Mathias Hauser.
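The inner-join behaviour of `xarray.dot` and the `@` operator described above can be sketched with two arrays whose `x` coordinates only partly overlap. The values are illustrative, and the sketch assumes the default `arithmetic_join` option.

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

# Only the shared labels x=1 and x=2 take part in the product:
# 2 * 10 + 3 * 20
via_function = xr.dot(a, b)
via_operator = a @ b

print(int(via_function), int(via_operator))
```

Labels present in only one operand (`x=0` and `x=3` here) are dropped before the product is computed.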
- Implement :py
DataArray.pad
and :pyDataset.pad
. (2605
,3596
). By Mark Boer. - :py
DataArray.sel
and :pyDataset.sel
now support :pypandas.CategoricalIndex
. (3669
) By Keisuke Fujii. - Support using an existing, opened h5netcdf
File
with :py~xarray.backends.H5NetCDFStore
. This permits creating an :py~xarray.Dataset
from a h5netcdfFile
that has been opened using other means (3618
). By Kai Mühlbauer. - Implement
median
andnanmedian
for dask arrays. This works by rechunking to a single chunk along all reduction axes. (2999
). By Deepak Cherian. - :py
~xarray.concat
now preserves attributes from the first Variable. (2575
,2060
,1614
) By Deepak Cherian. - :py
Dataset.quantile
, :pyDataArray.quantile
andGroupBy.quantile
now work with dask Variables. By Deepak Cherian. - Added the
count
reduction method to both :py~core.rolling.DatasetCoarsen
and :py~core.rolling.DataArrayCoarsen
objects. (3500
) By Deepak Cherian - Add
meta
kwarg to :py~xarray.apply_ufunc
; this is passed on to :pydask.array.blockwise
. (3660
) By Deepak Cherian. - Add
attrs_file
option in :py~xarray.open_mfdataset
to choose the source file for global attributes in a multi-file dataset (2382
,3498
). By Julien Seguinot. - :py
Dataset.swap_dims
and :pyDataArray.swap_dims
now allow swapping to dimension names that don't exist yet. (3636
) By Justus Magin. - Extend :py
~core.accessor_dt.DatetimeAccessor
properties and support.dt
accessor for timedeltas via :py~core.accessor_dt.TimedeltaAccessor
(3612
) By Anderson Banihirwe. - Improvements to interpolating along time axes (
3641
,3631
). By David Huard. - Support :py
CFTimeIndex
in :pyDataArray.interpolate_na
- define 1970-01-01 as the default offset for the interpolation index for both :py
pandas.DatetimeIndex
and :pyCFTimeIndex
, - use microseconds in the conversion from timedelta objects to floats to avoid overflow errors.
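A few of the features listed above can be tried on toy data. This is only a sketch; the array contents and the dimension name `sample` are assumptions chosen for illustration:

```python
import pandas as pd
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Pad one element before and two after along "x" with a constant value.
padded = da.pad(x=(1, 2), constant_values=0)

# Swap "x" for a dimension name that does not exist yet.
swapped = da.swap_dims({"x": "sample"})

# The .dt accessor also works on timedelta data.
deltas = xr.DataArray(pd.timedelta_range("1 day", periods=3, freq="D"), dims="t")
days = deltas.dt.days

print(padded.values, swapped.dims, days.values)
```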
- Applying a user-defined function that adds new dimensions using :py
apply_ufunc
andvectorize=True
now works withdask > 2.0
. (3574
,3660
). By Deepak Cherian. - Fix :py
~xarray.combine_by_coords
to allow for combining incomplete hypercubes of Datasets (3648
). By Ian Bolliger. - Fix :py
~xarray.combine_by_coords
when combining cftime coordinates which span long time intervals (3535
). By Spencer Clark. - Fix plotting with transposed 2D non-dimensional coordinates. (
3138
,3441
) By Deepak Cherian. - :py
plot.FacetGrid.set_titles
can now replace existing row titles of a :py~xarray.plot.FacetGrid
plot. In addition :py~xarray.plot.FacetGrid
gained two new attributes: :py~xarray.plot.FacetGrid.col_labels
and :py~xarray.plot.FacetGrid.row_labels
contain :pymatplotlib.text.Text
handles for both column and row labels. These can be used to manually change the labels. By Deepak Cherian. - Fix issue with Dask-backed datasets raising a
KeyError
on some computations involving :pymap_blocks
(3598
). By Tom Augspurger. - Ensure :py
Dataset.quantile
, :pyDataArray.quantile
issue the correct error whenq
is out of bounds (3634
) by Mathias Hauser. - Fix regression in xarray 0.14.1 that prevented encoding times with certain
dtype
,_FillValue
, andmissing_value
encodings (3624
). By Spencer Clark - Raise an error when trying to use :py
Dataset.rename_dims
to rename to an existing name (3438
,3645
) By Justus Magin. - :py
Dataset.rename
, :pyDataArray.rename
now check for conflicts with MultiIndex level names. - :py
Dataset.merge
no longer fails when passed a :pyDataArray
instead of a :pyDataset
. By Tom Nicholas. - Fix a regression in :py
Dataset.drop
: allow passing any iterable when dropping variables (3552
,3693
) By Justus Magin. - Fixed errors emitted by
mypy --strict
in modules that import xarray. (3695
) by Guido Imperiale. - Allow plotting of binned coordinates on the y axis in :py
plot.line
and :pyplot.step
plots (3571
,3685
) by Julien Seguinot. - setuptools is now marked as a dependency of xarray (
3628
) by Richard Höchenberger.
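The `Dataset.merge` fix above can be illustrated with a small sketch (the variable names `a` and `b` are hypothetical); a named DataArray is passed directly instead of a Dataset:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2])}, coords={"x": [0, 1]})

# A named DataArray can be merged directly; its name becomes the variable name.
extra = xr.DataArray([3, 4], dims="x", coords={"x": [0, 1]}, name="b")
merged = ds.merge(extra)

print(sorted(merged.data_vars))
```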
- Switch doc examples to use nbsphinx and replace
sphinx_gallery
scripts with Jupyter notebooks. (3105
,3106
,3121
) By Ryan Abernathey. - Added
example notebook <examples/ROMS_ocean_model>
demonstrating use of xarray with Regional Ocean Modeling System (ROMS) ocean hydrodynamic model output. (3116
) By Robert Hetland. - Added
example notebook <examples/ERA5-GRIB-example>
demonstrating the visualization of ERA5 GRIB data. (3199
) By Zach Bruick and Stephan Siemen. - Added examples for :py
DataArray.quantile
, :pyDataset.quantile
andGroupBy.quantile
. (3576
) By Justus Magin. - Add new
example notebook <examples/apply_ufunc_vectorize_1d>
demonstrating vectorization of a 1D function using :pyapply_ufunc
, dask and numba. By Deepak Cherian. - Added example for :py
~xarray.map_blocks
. (3667
) By Riley X. Brady.
- Make sure dask names change when rechunking by different chunk sizes. Conversely, make sure they stay the same when rechunking by the same chunk size. (
3350
) By Deepak Cherian. - 2x to 5x speed boost (on small arrays) for :py
Dataset.isel
, :pyDataArray.isel
, and :pyDataArray.__getitem__
when indexing by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray. (3533
) by Guido Imperiale. - Removed internal method
Dataset._from_vars_and_coord_names
, which was dominated byDataset._construct_direct
. (3565
) By Maximilian Roos. - Replaced versioneer with setuptools-scm. Moved contents of setup.py to setup.cfg. Removed pytest-runner from setup.py, as per deprecation notice on the pytest-runner project. (
3714
) by Guido Imperiale. - Use of isort is now enforced by CI. (
3721
) by Guido Imperiale
Broken compatibility with
cftime < 1.0.3
. By Deepak Cherian.
Warning
cftime version 1.0.4 is broken (cftime/126); please use version 1.0.4.2 instead.
- All leftover support for dates from non-standard calendars through
netcdftime
, the module included in versions of netCDF4 prior to 1.4 that eventually became the cftime package, has been removed in favor of relying solely on the standalonecftime
package (3450
). By Spencer Clark.
- Added the
sparse
option to :py~xarray.DataArray.unstack
, :py~xarray.Dataset.unstack
, :py~xarray.DataArray.reindex
, :py~xarray.Dataset.reindex
(3518
). By Keisuke Fujii. - Added the
fill_value
option to :pyDataArray.unstack
and :pyDataset.unstack
(3518
,3541
). By Keisuke Fujii. - Added the
max_gap
kwarg to :py~xarray.DataArray.interpolate_na
and :py~xarray.Dataset.interpolate_na
. This controls the maximum size of the data gap that will be filled by interpolation. By Deepak Cherian. - Added :py
Dataset.drop_sel
& :pyDataArray.drop_sel
for dropping labels. :pyDataset.drop_vars
& :pyDataArray.drop_vars
have been added for dropping variables (including coordinates). The existing :pyDataset.drop
& :pyDataArray.drop
methods remain as a backward compatible option for dropping either labels or variables, but using the more specific methods is encouraged. (3475
) By Maximilian Roos - Added :py
Dataset.map
&GroupBy.map
&Resample.map
for mapping / applying a function over each item in the collection, reflecting the widely used and least surprising name for this operation. The existingapply
methods remain for backward compatibility, though using themap
methods is encouraged. (3459
) By Maximilian Roos - :py
Dataset.transpose
and :pyDataArray.transpose
now support an ellipsis (...
) to represent all 'other' dimensions. For example, to move one dimension to the front, use.transpose('x', ...)
. (3421
) By Maximilian Roos - Changed
xr.ALL_DIMS
to equal python'sEllipsis
(...
), and changed internal usages to use...
directly. As before, you can use this to instruct agroupby
operation to reduce over all dimensions. While we have no plans to removexr.ALL_DIMS
, we suggest using...
. (3418
) By Maximilian Roos - :py
xarray.dot
, and :pyDataArray.dot
now support thedims=...
option to sum over the union of dimensions of all input arrays (3423
) by Mathias Hauser. - Added new
Dataset._repr_html_
andDataArray._repr_html_
to improve representation of objects in Jupyter. By default this feature is turned off for now. Enable it withxarray.set_options(display_style="html")
. (3425
) by Benoit Bovy and Julia Signell. - Implement dask deterministic hashing for xarray objects. Note that xarray objects with a dask.array backend already used deterministic hashing in previous releases; this change implements it when whole xarray objects are embedded in a dask graph, e.g. when :py
DataArray.map_blocks
is invoked. (3378
,3446
,3515
) By Deepak Cherian and Guido Imperiale. - xarray now respects the
DataArray.encoding["coordinates"]
attribute when writing to disk. Seeio.coordinates
for more. (3351
,3487
) By Deepak Cherian. - Add the documented-but-missing :py
~core.groupby.DatasetGroupBy.quantile
. (3525
,3527
). By Justus Magin.
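The more specific drop methods, `Dataset.map`, and the ellipsis form of `transpose` described above might be used as follows. This is a sketch on made-up data, not taken from the release itself:

```python
import numpy as np
import xarray as xr

# Move "z" to the front and leave the remaining dimensions in their order.
cube = xr.DataArray(np.zeros((2, 3, 4)), dims=("x", "y", "z"))
moved = cube.transpose("z", ...)

ds = xr.Dataset(
    {"a": ("x", [1, -2, 3]), "b": ("x", [-4, 5, -6])},
    coords={"x": [10, 20, 30]},
)

without_b = ds.drop_vars("b")   # drop a variable
without_20 = ds.drop_sel(x=20)  # drop an index label
absolute = ds.map(np.abs)       # apply a function to every data variable

print(moved.dims, list(without_b.data_vars), without_20["x"].values)
```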
- Ensure an index of type
CFTimeIndex
is not converted to aDatetimeIndex
when calling :pyDataset.rename
, :pyDataset.rename_dims
and :pyDataset.rename_vars
. By Mathias Hauser. (3522
). - Fix a bug in :py
DataArray.set_index
in case that an existing dimension becomes a level variable of MultiIndex. (3520
). By Keisuke Fujii. - Harmonize
_FillValue
,missing_value
during encoding and decoding steps. (3502
) By Anderson Banihirwe. - Fix regression introduced in v0.14.0 that would cause a crash if dask is installed but cloudpickle isn't (
3401
) by Rhys Doyle - Fix grouping over variables with NaNs. (
2383
,3406
). By Deepak Cherian. - Make alignment and concatenation significantly more efficient by using dask names to compare dask objects prior to comparing values after computation. This change makes it more convenient to carry around large non-dimensional coordinate variables backed by dask arrays. Existing workarounds involving
reset_coords(drop=True)
should now be unnecessary in most cases. (3068
,3311
,3454
,3453
). By Deepak Cherian. - Add support for cftime>=1.0.4. By Anderson Banihirwe.
- Rolling reduction operations no longer compute dask arrays by default. (
3161
). In addition, theallow_lazy
kwarg toreduce
is deprecated. By Deepak Cherian. - Fix
GroupBy.reduce
when reducing over multiple dimensions. (3402
). By Deepak Cherian - Allow appending datetime and bool data variables to zarr stores. (
3480
). By Akihiro Matsukawa. - Add support for numpy >=1.18; fix mean() on datetime64 arrays on dask backend (
3409
,3537
). By Guido Imperiale. - Add support for pandas >=0.26 (
3440
). By Deepak Cherian. - Add support for pseudonetcdf >=3.1 (
3485
). By Barron Henderson.
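To illustrate grouping over a variable that contains NaNs, here is a minimal sketch; the coordinate name `label` and the data are assumptions. Elements whose group label is NaN are expected to be left out of the result rather than causing an error:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [1, 2, 3, 4],
    dims="x",
    coords={"label": ("x", [1.0, np.nan, 1.0, 2.0])},
)

# The element with a NaN label does not belong to any group.
totals = da.groupby("label").sum()

print(totals["label"].values, totals.values)
```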
- Fix leap year condition in monthly means example. By Mickaël Lalande.
- Fix the documentation of :py
DataArray.resample
and :pyDataset.resample
, explicitly stating that a datetime-like dimension is required. (3400
) By Justus Magin. - Update the
terminology
page to address multidimensional coordinates. (3410
) By Jon Thielen. - Fix the documentation of :py
Dataset.integrate
and :pyDataArray.integrate
and add an example to :pyDataset.integrate
. (3469
) By Justus Magin.
Added integration tests against pint. (
3238
,3447
,3493
,3508
) by Justus Magin.
Note
At the moment of writing, these tests as well as the ability to use pint in general require a highly experimental version of pint (install with
pip install git+https://github.com/andrewgsavage/pint.git@refs/pull/6/head)
. Even with it, interaction with non-numpy array libraries, e.g. dask or sparse, is broken.
- Use Python 3.6 idioms throughout the codebase. (
3419
) By Maximilian Roos - Run basic CI tests on Python 3.8. (
3477
) By Maximilian Roos - Enable type checking on default sentinel values (
3472
) By Maximilian Roos - Add
Variable._replace
for simpler replacing of a subset of attributes (3472
) By Maximilian Roos
This release introduces a rolling policy for minimum dependency versions:
mindeps_policy
.
Several minimum versions have been increased:
Package      Old                  New
Python       3.5.3                3.6
numpy        1.12                 1.14
pandas       0.19.2               0.24
dask         0.16 (tested: 2.4)   1.2
bottleneck   1.1 (tested: 1.2)    1.2
matplotlib   1.5 (tested: 3.1)    3.1
Obsolete patch versions (x.y.Z) are not tested anymore. The oldest supported versions of all optional dependencies are now covered by automated tests (before, only the very latest versions were tested).
(
3222
,3293
,3340
,3346
,3358
). By Guido Imperiale. - Dropped the
drop=False
optional parameter from :pyVariable.isel
. It was unused and doesn't make sense for a Variable. (3375
). By Guido Imperiale. - Remove internal usage of :py
collections.OrderedDict
. After dropping support for Python <=3.5, most uses ofOrderedDict
in xarray were no longer necessary. We have removed the internal use of theOrderedDict
in favor of Python's builtindict
object which is now ordered itself. This change will be most obvious when interacting with theattrs
property on Dataset and DataArray objects. (3380
,3389
). By Joe Hamman.
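A quick way to observe the `OrderedDict` change is to inspect `attrs` directly. The attribute names below are arbitrary examples:

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m", "long_name": "height"})

# attrs is a plain dict; insertion order is still preserved.
attrs_type = type(da.attrs)
keys = list(da.attrs)

print(attrs_type.__name__, keys)
```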
- Added :py
~xarray.map_blocks
, modeled after :pydask.array.map_blocks
. Also added :pyDataset.unify_chunks
, :pyDataArray.unify_chunks
and :pytesting.assert_chunks_equal
. (3276
). By Deepak Cherian and Guido Imperiale.
core.groupby.GroupBy
enhancements. By Deepak Cherian.
- Added a repr (
3344
). Example:
>>> da.groupby("time.season")
DataArrayGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'
- Added a
GroupBy.dims
property that mirrors the dimensions of each group (3344
).
- Speed up :py
Dataset.isel
up to 33% and :pyDataArray.isel
up to 25% for small arrays (2799
,3375
). By Guido Imperiale.
- Reintroduce support for
weakref
(broken in v0.13.0). Support has been reinstated for :py~xarray.DataArray
and :py~xarray.Dataset
objects only. Internal xarray objects remain unaddressable by weakref in order to save memory (3317
). By Guido Imperiale. - Line plots with the
x
ory
argument set to a 1D non-dimensional coord now plot the correct data for 2D DataArrays (3334
). By Tom Nicholas. - Make :py
~xarray.concat
more robust when merging variables present in some datasets but not others (508
). By Deepak Cherian. - The default behaviour of reducing across all dimensions for :py
~xarray.core.groupby.DataArrayGroupBy
objects has now been properly removed as was done for :py~xarray.core.groupby.DatasetGroupBy
in 0.13.0 (3337
). Usexarray.ALL_DIMS
if you need to replicate previous behaviour. Also raise nicer error message when no groups are created (1764
). By Deepak Cherian. - Fix error in concatenating unlabeled dimensions (
3362
). By Deepak Cherian. - Warn if the
dim
kwarg is passed to rolling operations. This is redundant since a dimension is specified when the :py~core.rolling.DatasetRolling
or :py~core.rolling.DataArrayRolling
object is created. (3362
). By Deepak Cherian.
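The reinstated `weakref` support can be checked with a short sketch (the array is arbitrary):

```python
import weakref

import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# DataArray and Dataset objects can be the target of a weak reference.
ref = weakref.ref(da)
alive = ref() is da

print(alive)
```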
- Created a glossary of important xarray terms (
2410
,3352
). By Gregory Gundersen. - Created a "How do I..." section (
howdoi
) for solutions to common questions. (3357
). By Deepak Cherian. - Add examples for :py
Dataset.swap_dims
and :pyDataArray.swap_dims
(3331
,3331
). By Justus Magin. - Add examples for :py
align
, :pymerge
, :pycombine_by_coords
, :pyfull_like
, :pyzeros_like
, :pyones_like
, :pyDataset.pipe
, :pyDataset.assign
, :pyDataset.reindex
, :pyDataset.fillna
(3328
). By Anderson Banihirwe. - Fixed documentation to clean up an unwanted file created in
ipython
example (3353
). By Gregory Gundersen.
This release includes many exciting changes: wrapping of NEP18 compliant numpy-like arrays; new :py~Dataset.plot.scatter
plotting method that can scatter two DataArrays
in a Dataset
against each other; support for converting pandas DataFrames to xarray objects that wrap pydata/sparse
; and more!
- This release increases the minimum required Python version from 3.5.0 to 3.5.3 (
3089
). By Guido Imperiale. - The
isel_points
andsel_points
methods are removed, having been deprecated since v0.10.0. These are redundant with theisel
/sel
methods. Seevectorized_indexing
for details. By Maximilian Roos - The
inplace
kwarg for public methods now raises an error, having been deprecated since v0.11.0. By Maximilian Roos - :py
~xarray.concat
now requires thedim
argument. Itsindexers
,mode
andconcat_over
kwargs have now been removed. By Deepak Cherian - Passing a list of colors in
cmap
will now raise an error, having been deprecated since v0.6.1. Most xarray objects now define
__slots__
. This reduces overall RAM usage by ~22% (not counting the underlying numpy buffers); on CPython 3.7/x64, a trivial DataArray has gone down from 1.9kB to 1.5kB.
Caveats:
- Pickle streams produced by older versions of xarray can't be loaded using this release, and vice versa.
- Any user code that was accessing the
__dict__
attribute of xarray objects will break. The best practice to attach custom metadata to xarray objects is to use theattrs
dictionary. - Any user code that defines custom subclasses of xarray classes must now explicitly define
__slots__
itself. Subclasses that don't add any attributes must state so by defining__slots__ = ()
right after the class header. Omitting__slots__
will now cause aFutureWarning
to be logged, and will raise an error in a later release.
(
3250
) by Guido Imperiale. - The default dimension for :py
Dataset.groupby
, :pyDataset.resample
, :pyDataArray.groupby
and :pyDataArray.resample
reductions is now the grouping or resampling dimension. - :py
DataArray.to_dataset
requiresname
to be passed as a kwarg (previously ambiguous positional arguments were deprecated) - Reindexing with variables of a different dimension now raises an error (previously deprecated)
xarray.broadcast_array
is removed (previously deprecated in favor of :py~xarray.broadcast
)Variable.expand_dims
is removed (previously deprecated in favor of :pyVariable.set_dims
)
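For code that relied on the removed `isel_points`, pointwise selection can be expressed with `isel` and DataArray indexers that share a dimension. This is a sketch on made-up data; the names `points` and `field` are assumptions:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Indexers sharing the new "points" dimension select elements pairwise:
# (x=0, y=1) and (x=2, y=3).
picked = da.isel(
    x=xr.DataArray([0, 2], dims="points"),
    y=xr.DataArray([1, 3], dims="points"),
)

# The name is now passed to to_dataset as a keyword argument.
named = da.to_dataset(name="field")

print(picked.values, list(named.data_vars))
```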
- xarray can now wrap around any NEP18 compliant numpy-like library (important: read notes about
NUMPY_EXPERIMENTAL_ARRAY_FUNCTION
in the above link). Added explicit test coverage for sparse. (3117
,3202
). This requires sparse>=0.8.0. By Nezar Abdennur and Guido Imperiale. - :py
~Dataset.from_dataframe
and :py~DataArray.from_series
now supportsparse=True
for converting pandas objects into xarray objects wrapping sparse arrays. This is particularly useful with sparsely populated hierarchical indexes. (3206
) By Stephan Hoyer. The xarray package is now discoverable by mypy (although typing hints coverage is not complete yet). mypy type checking is now enforced by CI. Libraries that depend on xarray and use mypy can now remove from their setup.cfg the lines:
[mypy-xarray]
ignore_missing_imports = True
(
2877
,3088
,3090
,3112
,3117
,3207
) By Guido Imperiale and Maximilian Roos. - Added :py
DataArray.broadcast_like
and :pyDataset.broadcast_like
. By Deepak Cherian and David Mertz. - Dataset plotting API for visualizing dependencies between two DataArrays! Currently only :py
Dataset.plot.scatter
is implemented. By Yohai Bar Sinai and Deepak Cherian - Added :py
DataArray.head
, :pyDataArray.tail
and :pyDataArray.thin
; as well as :pyDataset.head
, :pyDataset.tail
and :pyDataset.thin
methods. (319
) By Gerardo Rivera.
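The new selection helpers and `broadcast_like` can be sketched as follows, using arbitrary example arrays:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10), dims="x")

first = da.head(x=3)   # first three elements along "x"
last = da.tail(x=2)    # last two elements along "x"
every4 = da.thin(x=4)  # every fourth element along "x"

# Broadcast a 1-D array against the dimensions of a 2-D one.
row = xr.DataArray([1, 2, 3], dims="x")
grid = xr.DataArray(np.zeros((3, 2)), dims=("x", "y"))
expanded = row.broadcast_like(grid)

print(first.values, last.values, every4.values, expanded.dims)
```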
Multiple enhancements to :py
~xarray.concat
and :py~xarray.open_mfdataset
. By Deepak Cherian. - Added
compat='override'
. When merging, this option picks the variable from the first dataset and skips all comparisons. - Added
join='override'
. When aligning, this only checks that index sizes are equal among objects and skips checking indexes for equality. - :py
~xarray.concat
and :py~xarray.open_mfdataset
now support thejoin
kwarg. It is passed down to :py~xarray.align
. - :py
~xarray.concat
now calls :py~xarray.merge
on variables that are not concatenated (i.e. variables withoutconcat_dim
whendata_vars
orcoords
are"minimal"
). :py~xarray.concat
passes its newcompat
kwarg down to :py~xarray.merge
. (2064
)
Users can avoid a common bottleneck when using :py
~xarray.open_mfdataset
on a large number of files with variables that are known to be aligned and some of which need not be concatenated. Slow equality comparisons can now be avoided, e.g.:
data = xr.open_mfdataset(files, concat_dim='time', data_vars='minimal', coords='minimal', compat='override', join='override')
- In :py
~xarray.Dataset.to_zarr
, passingmode
is not mandatory ifappend_dim
is set, as it will automatically be set to'a'
internally. By David Brochart. - Added the ability to initialize an empty or full DataArray with a single value. (
277
) By Gerardo Rivera. - :py
~xarray.Dataset.to_netcdf()
now supports theinvalid_netcdf
kwarg when used withengine="h5netcdf"
. It is passed toh5netcdf.File
. By Ulrich Herter. xarray.Dataset.drop
now supports keyword arguments; dropping index labels by using bothdim
andlabels
or using a :py~core.coordinates.DataArrayCoordinates
object are deprecated (2910
). By Gregory Gundersen. - Added examples of :py
Dataset.set_index
and :pyDataArray.set_index
, as well as more specific error messages when the user passes invalid arguments (
). By Gregory Gundersen. - :py
Dataset.filter_by_attrs
now filters the coordinates as well as the variables. By Spencer Jones.
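Two of the enhancements above, sketched on toy data: creating a DataArray filled with a single value, and using `join='override'` in `concat` so that index values are not compared. The slightly mismatched `y` values are an assumption chosen to make the effect visible:

```python
import xarray as xr

# A scalar is broadcast to the shape implied by the coordinates.
filled = xr.DataArray(0.0, dims=["x"], coords={"x": [0, 1, 2]})

# The "y" labels differ by a tiny amount between the two inputs.
first = xr.DataArray(
    [[1.0, 2.0]], dims=("time", "y"), coords={"time": [0], "y": [0.0, 1.0]}
)
second = xr.DataArray(
    [[3.0, 4.0]], dims=("time", "y"), coords={"time": [1], "y": [0.0, 1.0000001]}
)

# join="override" only checks that index sizes match and reuses the
# index of the first object instead of comparing values.
combined = xr.concat([first, second], dim="time", join="override")

print(filled.values, combined.shape, combined["y"].values)
```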
- Improve "missing dimensions" error message for :py
~xarray.apply_ufunc
(2078
). By Rick Russotto. - :py
~xarray.DataArray.assign_coords
now supports dictionary arguments (3231
). By Gregory Gundersen. - Fix regression introduced in v0.12.2 where
copy(deep=True)
would convert unicode indices to dtype=object (3094
). By Guido Imperiale. - Improved error handling and documentation for .expand_dims() read-only view.
- Fix tests for big-endian systems (
3125
). By Graham Inggs. - XFAIL several tests which are expected to fail on ARM systems due to a
datetime
issue in NumPy (2334
). By Graham Inggs. - Fix KeyError that arises when using .sel method with float values different from coords float type (
3137
). By Hasan Ahmad. - Fixed bug in
combine_by_coords()
causing a ValueError if the input had an unused dimension with coordinates which were not monotonic (3150
). By Tom Nicholas. - Fixed crash when applying
distributed.Client.compute()
to a DataArray (3171
). By Guido Imperiale. - Better error message when using groupby on an empty DataArray (
3037
). By Hasan Ahmad. - Fix error that arises when using open_mfdataset on a series of netcdf files having differing values for a variable attribute of type list. (
3034
) By Hasan Ahmad. - Prevent :py
~xarray.DataArray.argmax
and :py~xarray.DataArray.argmin
from calling dask compute (3237
). By Ulrich Herter. - Plots in 2 dimensions (pcolormesh, contour) now allow levels to be specified as a numpy array (
3284
). By Mathias Hauser. - Fixed bug in
DataArray.quantile
failing to keep attributes when keep_attrs was True (3304
). By David Huard.
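The dictionary form of `assign_coords` mentioned above might look like this (labels are made up):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Coordinates can be given as a dictionary instead of keyword arguments.
labelled = da.assign_coords({"x": [10, 20, 30]})
picked = labelled.sel(x=20)

print(int(picked))
```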
- Created a PR checklist as a quick reference for tasks before creating a new PR or pushing new commits. By Gregory Gundersen.
- Fixed documentation to clean up unwanted files created in
ipython
examples (3227
). By Gregory Gundersen.
- New methods :py
Dataset.to_stacked_array
and :pyDataArray.to_unstacked_dataset
for reshaping Datasets of variables with different dimensions (1317
). This is useful for feeding data from xarray into machine learning models, as described inreshape.stacking_different
. By Noah Brenowitz.
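A short sketch of the reshaping round trip, assuming a small Dataset whose variables have different dimensions (the names `a`, `b` and `z` are illustrative):

```python
import xarray as xr

data = xr.Dataset(
    data_vars={"a": (("x", "y"), [[0, 1, 2], [3, 4, 5]]), "b": ("x", [6, 7])},
    coords={"y": ["u", "v", "w"]},
)

# Stack every variable into one 2-D array: "x" is kept as the sample
# dimension and all remaining dimensions are flattened into "z".
stacked = data.to_stacked_array("z", sample_dims=["x"])

# Recover the separate variables from the stacked array.
unstacked = stacked.to_unstacked_dataset("z")

print(stacked.shape, sorted(unstacked.data_vars))
```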
- Support for renaming
Dataset
variables and dimensions independently with :py~Dataset.rename_vars
and :py~Dataset.rename_dims
(3026
). By Julia Kent. - Add
scales
,offsets
,units
anddescriptions
attributes to :py~xarray.DataArray
returned by :py~xarray.open_rasterio
. (3013
) By Erle Carrara.
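Renaming a dimension and a variable independently could look like the following sketch (the new names `pos` and `b` are arbitrary):

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3])})

# Rename only the dimension, leaving variable names untouched.
dims_renamed = ds.rename_dims({"x": "pos"})

# Rename only the variable, leaving dimension names untouched.
vars_renamed = ds.rename_vars({"a": "b"})

print(dict(dims_renamed.sizes), list(vars_renamed.data_vars))
```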
- Resolved deprecation warnings from newer versions of matplotlib and dask.
- Compatibility fixes for the upcoming pandas 0.25 and NumPy 1.17 releases. By Stephan Hoyer.
- Fix summaries for multiindex coordinates (
3079
). By Jonas Hörsch. - Fix HDF5 error that could arise when reading multiple groups from a file at once (
2954
). By Stephan Hoyer.
Two new functions, :py
~xarray.combine_nested
and :py~xarray.combine_by_coords
, allow for combining datasets along any number of dimensions, instead of the one-dimensional list of datasets supported by :py~xarray.concat
.The new
combine_nested
will accept the datasets as a nested list-of-lists, and combine by applying a series of concat and merge operations. The newcombine_by_coords
instead uses the dimension coordinates of datasets to order them.:py
~xarray.open_mfdataset
can use eithercombine_nested
orcombine_by_coords
to combine datasets along multiple dimensions, by specifying the argumentcombine='nested'
orcombine='by_coords'
.The older function
auto_combine
has been deprecated, because its functionality has been subsumed by the new functions. To avoid FutureWarnings switch to usingcombine_nested
orcombine_by_coords
, (or set thecombine
argument inopen_mfdataset
). (2159
) By Tom Nicholas. - :py
~xarray.DataArray.rolling_exp
and :py~xarray.Dataset.rolling_exp
added, similar to pandas'pd.DataFrame.ewm
method. Calling.mean
on the resulting object will return an exponentially weighted moving average. By Maximilian Roos. - New :py
DataArray.str <core.accessor_str.StringAccessor>
for string related manipulations, based onpandas.Series.str
. By 0x0L. - Added
strftime
method to.dt
accessor, making it simpler to hand a datetimeDataArray
to other code expecting formatted dates and times. (2090
). :py~xarray.CFTimeIndex.strftime
is also now available on :pyCFTimeIndex
. By Alan Brammer and Ryan May. GroupBy.quantile
is now a method ofGroupBy
objects (3018
). By David Huard. - Argument and return types are added to most methods on
DataArray
andDataset
, allowing static type checking both within xarray and external libraries. Type checking with mypy is enabled in CI (though not required yet). By Guido Imperiale and Maximilian Roos.
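The two combine functions and the new string and `strftime` helpers can be tried on toy inputs. This is a sketch; the datasets and names are assumptions:

```python
import pandas as pd
import xarray as xr

ds1 = xr.Dataset({"t": ("x", [10, 11])}, coords={"x": [0, 1]})
ds2 = xr.Dataset({"t": ("x", [12, 13])}, coords={"x": [2, 3]})

# Order is inferred from the "x" coordinate values, not the list order.
by_coords = xr.combine_by_coords([ds2, ds1])

# Order is taken from the (possibly nested) list structure.
nested = xr.combine_nested([ds1, ds2], concat_dim="x")

# String accessor, modelled on pandas.Series.str.
upper = xr.DataArray(["a", "b"], dims="x").str.upper()

# strftime on the .dt accessor.
times = xr.DataArray(pd.date_range("2000-01-01", periods=2), dims="time")
formatted = times.dt.strftime("%Y-%m-%d")

print(by_coords["t"].values, upper.values, formatted.values)
```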
- Add
keepdims
argument for reduce operations (2170
) By Scott Wales. - Enable
@
operator for DataArray. This is equivalent to :pyDataArray.dot
By Maximilian Roos. - Add
fill_value
argument for reindex, align, and merge operations to enable custom fill values. (2876
) By Zach Griffith. - :py
DataArray.transpose
now accepts a keyword argumenttranspose_coords
which enables transposition of coordinates in the same way as :pyDataset.transpose
. :pyDataArray.groupby
:pyDataArray.groupby_bins
, and :pyDataArray.resample
now accept a keyword argumentrestore_coord_dims
which keeps the order of the dimensions of multi-dimensional coordinates intact (1856
). By Peter Hausamann. - Clean up Python 2 compatibility in code (
2950
) By Guido Imperiale. - Better warning message when supplying invalid objects to
xr.merge
(2948
). By Mathias Hauser. - Add
errors
keyword argument toDataset.drop
and :pyDataset.drop_dims
that allows ignoring errors if a passed label or dimension is not in the dataset (2994
). By Andrew Ross.
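As an illustration of `keepdims` and `fill_value` (the array and the fill value of -1 are arbitrary choices):

```python
import xarray as xr

da = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"), coords={"x": [0, 1]})

# keepdims retains the reduced dimension with length one.
summed = da.sum("x", keepdims=True)

# A custom fill value replaces the default NaN for new labels.
extended = da.reindex(x=[0, 1, 2], fill_value=-1)

print(summed.shape, extended.values.tolist())
```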
- Implement :py
~xarray.load_dataset
and :py~xarray.load_dataarray
as alternatives to :py~xarray.open_dataset
and :py~xarray.open_dataarray
to open, load into memory, and close files, returning the Dataset or DataArray. These functions are helpful for avoiding file-lock errors when trying to write to files opened usingopen_dataset()
oropen_dataarray()
. (2887
) By Dan Nowacki. - It is now possible to extend existing
io.zarr
datasets, by usingmode='a'
and the newappend_dim
argument in :py~xarray.Dataset.to_zarr
. By Jendrik Jördening, David Brochart, Ryan Abernathey and Shikhar Goenka. xr.open_zarr
now accepts manually specified chunks with thechunks=
parameter.auto_chunk=True
is equivalent tochunks='auto'
for backwards compatibility. Theoverwrite_encoded_chunks
parameter is added to remove the original zarr chunk encoding. By Lily Wang. - netCDF chunksizes are now only dropped when original_shape is different, not when it isn't found. (
2207
) By Karel van de Plassche. - Character arrays' character dimension name decoding and encoding handled by
var.encoding['char_dim_name']
(2895
) By James McCreight. - open_rasterio() now supports rasterio.vrt.WarpedVRT with custom transform, width and height (
2864
). By Julien Michel.
- Rolling operations on xarray objects containing dask arrays could silently compute the incorrect result or use large amounts of memory (
2940
). By Stephan Hoyer. - Don't set encoding attributes on bounds variables when writing to netCDF. (
2921
) By Deepak Cherian. - NetCDF4 output: variables with unlimited dimensions must be chunked (not contiguous) on output. (
1849
) By James McCreight. - indexing with an empty list creates an object with zero-length axis (
2882
) By Mayeul d'Avezac. - Return correct count for scalar datetime64 arrays (
2770
) By Dan Nowacki. - Fixed max, min exception when applied to a multiIndex (
2923
) By Ian Castleden - A deep copy deep-copies the coords (
1463
) By Martin Pletcher. - Increased support for missing_value (
2871
) By Deepak Cherian. - Removed usages of pytest.config, which is deprecated (
2988
) By Maximilian Roos. - Fixed performance issues with cftime installed (
3000
) By 0x0L. - Replace incorrect usages of message in pytest assertions with match (
3011
) By Maximilian Roos. - Add explicit pytest markers, now required by pytest (
3032
). By Maximilian Roos. - Test suite fixes for newer versions of pytest (
3011
,3032
). By Maximilian Roos and Stephan Hoyer.
- Allow
expand_dims
method to support inserting/broadcasting dimensions with size > 1. (2710
) By Martin Pletcher.
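A minimal sketch of inserting dimensions with size greater than one (the dimension names `time` and `member` are assumptions):

```python
import xarray as xr

da = xr.DataArray([1, 2], dims="x")

# An integer gives the length of the new dimension.
repeated = da.expand_dims(time=3)

# A sequence gives both the length and the coordinate labels.
labelled = da.expand_dims(member=["a", "b"])

print(repeated.shape, labelled["member"].values)
```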
- Dataset.copy(deep=True) now creates a deep copy of the attrs (
2835
). By Andras Gefferth. - Fix incorrect
indexes
resulting from variousDataset
operations (e.g.,swap_dims
,isel
,reindex
,[]
) (2842
,2856
). By Stephan Hoyer.
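The deep-copy behaviour for `attrs` can be checked with a short sketch; the `history` attribute is a made-up example of a mutable value:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2])}, attrs={"history": ["created"]})

copied = ds.copy(deep=True)
copied.attrs["history"].append("edited")  # mutate the copy only

print(ds.attrs["history"], copied.attrs["history"])
```

The original list is left untouched because the attribute values are copied too.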
Highlights include:
- Removed support for Python 2. This is the first version of xarray that is Python 3 only!
- New :py
~xarray.DataArray.coarsen
and :py~xarray.DataArray.integrate
methods. Seecompute.coarsen
andcompute.using_coordinates
for details. - Many improvements to cftime support. See below for details.
- The
compat
argument toDataset
and theencoding
argument toDataArray
are deprecated and will be removed in a future release. (1188
) By Maximilian Roos.
- Resampling of standard and non-standard calendars indexed by :py
~xarray.CFTimeIndex
is now possible. (2191
). By Jwen Fai Low and Spencer Clark. - Taking the mean of arrays of :py
cftime.datetime
objects, and by extension, use of :py~xarray.DataArray.coarsen
with :pycftime.datetime
coordinates is now possible. By Spencer Clark. - Internal plotting now supports
cftime.datetime
objects as time series. (2164
) By Julius Busecke and Spencer Clark. - :py
~xarray.cftime_range
now supports QuarterBegin and QuarterEnd offsets (2663
). By Jwen Fai Low - :py
~xarray.open_dataset
now accepts ause_cftime
argument, which can be used to require thatcftime.datetime
objects are always used, or never used when decoding dates encoded with a standard calendar. This can be used to ensure consistent date types are returned when using :py~xarray.open_mfdataset
(1263
) and/or to silence serialization warnings raised if dates from a standard calendar are found to be outside the :pypandas.Timestamp
-valid range (2754
). By Spencer Clark. - :py
pandas.Series.dropna
is now supported for a :pypandas.Series
indexed by a :py~xarray.CFTimeIndex
(2688
). By Spencer Clark.
- Added ability to open netcdf4/hdf5 file-like objects with
open_dataset
. Requires (h5netcdf>0.7 and h5py>2.9.0). (2781
) By Scott Henderson - Add
data=False
option toto_dict()
methods. (2656
) By Ryan Abernathey - :py
DataArray.coarsen
and :pyDataset.coarsen
are newly added. Seecompute.coarsen
for details. (2525
) By Keisuke Fujii. - Upsampling an array via interpolation with resample is now dask-compatible, as long as the array is not chunked along the resampling dimension. By Spencer Clark.
- :py
xarray.testing.assert_equal
and :pyxarray.testing.assert_identical
now provide a more detailed report showing what exactly differs between the two objects (dimensions / coordinates / variables / attributes) (1507
). By Benoit Bovy. - Add
tolerance
option toresample()
methodsbfill
,pad
,nearest
. (2695
) By Hauke Schulz. - :py
DataArray.integrate
and :pyDataset.integrate
are newly added. Seecompute.using_coordinates
for details. (1332
) By Keisuke Fujii. - Added :py
~xarray.Dataset.drop_dims
(1949
). By Kevin Squire.
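`coarsen` and `integrate` can be sketched on a simple linear signal (the data are illustrative; trapezoidal integration is exact for a straight line):

```python
import numpy as np
import xarray as xr

x = np.arange(6.0)
da = xr.DataArray(x, dims="x", coords={"x": x})

# Block-average neighbouring pairs along "x".
coarse = da.coarsen(x=2).mean()

# Integrate y = x over [0, 5] using the "x" coordinate values.
area = da.integrate("x")

print(coarse.values, float(area))
```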
- Silenced warnings that appear when using pandas 0.24. By Stephan Hoyer
- Interpolating via resample now internally specifies
bounds_error=False
as an argument toscipy.interpolate.interp1d
, allowing for interpolation from higher frequencies to lower frequencies. Datapoints outside the bounds of the original time coordinate are now filled with NaN (2197
). By Spencer Clark. - Line plots with the
x
argument set to a non-dimensional coord now plot the correct data for 1D DataArrays. (2725
). By Tom Nicholas. - Subtracting a scalar
cftime.datetime
object from a :pyCFTimeIndex
now results in a :pypandas.TimedeltaIndex
instead of raising aTypeError
(2671
). By Spencer Clark. - backend_kwargs are no longer ignored when using open_dataset with pynio engine (2380) By Jonathan Joyce.
- Fix
open_rasterio
creating a WKT CRS instead of PROJ.4 withrasterio
1.0.14+ (2715
). By David Hoese. - Masking data arrays with :py
xarray.DataArray.where
now returns an array with the name of the original masked array (2748
and2457
). By Yohai Bar-Sinai. - Fixed error when trying to reduce a DataArray using a function which does not require an axis argument. (
2768
) By Tom Nicholas. - Concatenating a sequence of :py
~xarray.DataArray
with varying names sets the name of the output array toNone
, instead of the name of the first input array. If all the names are the same, the output keeps that name; previously it always took the name of the first DataArray in the list. (2775
). By Tom Nicholas. - Per the CF conventions section on calendars, specifying
'standard'
as the calendar type in :py~xarray.cftime_range
now correctly refers to the'gregorian'
calendar instead of the'proleptic_gregorian'
calendar (2761
).
- Saving files with times encoded with reference dates with timezones (e.g. '2000-01-01T00:00:00-05:00') no longer raises an error (
2649
). By Spencer Clark. - Fixed performance regression with
open_mfdataset
(2662
). By Tom Nicholas. - Fixed supplying an explicit dimension in the
concat_dim
argument to open_mfdataset
(2647
). By Ben Root.
Removes inadvertently introduced setup dependency on pytest-runner (2641
). Otherwise, this release is exactly equivalent to 0.11.1.
Warning
This is the last xarray release that will support Python 2.7. Future releases will be Python 3 only, but older versions of xarray will always be available for Python 2.7 users. For more details, see:
- Xarray GitHub issue discussing dropping Python 2 (1829)
- Python 3 Statement
- Tips on porting to Python 3
This minor release includes a number of enhancements and bug fixes, and two (slightly) breaking changes.
- Minimum rasterio version increased from 0.36 to 1.0 (for
open_rasterio
) - Time bounds variables are now also decoded according to CF conventions (
2565
). The previous behavior was to decode them only if they had specific time attributes; now these attributes are copied automatically from the corresponding time coordinate. This might break downstream code that relied on these variables not being decoded. By Fabien Maussion.
- Ability to read and write consolidated metadata in zarr stores (
2558
). By Ryan Abernathey. - :py
CFTimeIndex
uses slicing for string indexing when possible (like :pypandas.DatetimeIndex
), which avoids unnecessary copies. By Stephan Hoyer - Enable passing
rasterio.io.DatasetReader
orrasterio.vrt.WarpedVRT
toopen_rasterio
instead of file path string. Allows for in-memory reprojection, see (2588
). By Scott Henderson. - Like :py
pandas.DatetimeIndex
, :pyCFTimeIndex
now supports "dayofyear" and "dayofweek" accessors (2597
). Note this requires a version of cftime greater than 1.0.2. By Spencer Clark. - The option
'warn_for_unclosed_files'
(False by default) has been added to allow users to enable a warning when files opened by xarray are deallocated but were not explicitly closed. This is mostly useful for debugging; we recommend enabling it in your test suites if you use xarray for IO. By Stephan Hoyer - Support Dask
HighLevelGraphs
by Matthew Rocklin. - :py
DataArray.resample
and :pyDataset.resample
now support the loffset
kwarg just like pandas. By Deepak Cherian - Datasets are now guaranteed to have a
'source'
encoding, so the source file name is always stored (2550
). By Tom Nicholas. - The
apply
methods forDatasetGroupBy
,DataArrayGroupBy
,DatasetResample
andDataArrayResample
now support passing positional arguments to the applied function as a tuple to theargs
argument. By Matti Eskelinen. - 0d slices of ndarrays are now obtained directly through indexing, rather than extracting and wrapping a scalar, avoiding unnecessary copying. By Daniel Wennberg.
- Added support for
fill_value
with :py~xarray.DataArray.shift
and :py~xarray.Dataset.shift
By Maximilian Roos
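A small sketch of `shift` with and without `fill_value`; the integer array is illustrative only:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x")

# Default: the vacated slot is filled with NaN, so integers become floats.
shifted_default = da.shift(x=1)

# With fill_value, the vacated slot takes the given value instead.
shifted_filled = da.shift(x=1, fill_value=0)

print(shifted_default.values)
print(shifted_filled.values)
```

Supplying an integer fill value avoids the NaN that would otherwise force a float result.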
- Ensure files are automatically closed, if possible, when no longer referenced by a Python variable (
2560
). By Stephan Hoyer - Fixed possible race conditions when reading/writing to disk in parallel (
2595
). By Stephan Hoyer - Fix h5netcdf saving scalars with filters or chunks (
2563
). By Martin Raspaud. - Fix parsing of
_Unsigned
attribute set by OPENDAP servers. (2583
). By Deepak Cherian - Fix failure in time encoding when exporting to netCDF with versions of pandas less than 0.21.1 (
2623
). By Spencer Clark. - Fix MultiIndex selection to update label and level (
2619
). By Keisuke Fujii.
- Finished deprecations (changed behavior with this release):
Dataset.T
has been removed as a shortcut for :pyDataset.transpose
. Call :pyDataset.transpose
directly instead.- Iterating over a
Dataset
now includes only data variables, not coordinates. Similarly, callinglen
andbool
on aDataset
now includes only data variables. DataArray.__contains__
(used by Python'sin
operator) now checks array data, not coordinates.- The old resample syntax from before xarray 0.10, e.g.,
data.resample('1D', dim='time', how='mean')
, is no longer supported and will raise an error in most cases. Use the new resample syntax instead, e.g., data.resample(time='1D').mean()
or data.resample({'time': '1D'}).mean()
.
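The finished deprecations listed above can be illustrated with a short sketch. The dataset, its variable name `temperature`, and the timestamps are invented for the example.

```python
import pandas as pd
import xarray as xr

time = pd.to_datetime(
    ["2000-01-01 00:00", "2000-01-01 12:00", "2000-01-02 00:00", "2000-01-02 12:00"]
)
ds = xr.Dataset(
    {"temperature": ("time", [1.0, 3.0, 5.0, 7.0])},
    coords={"time": time},
)

# Iteration and len() cover data variables only, not coordinates.
names = list(ds)
n = len(ds)

# `in` on a DataArray checks the array values, not the coordinates.
has_three = 3.0 in ds["temperature"]

# New resample syntax: keyword form and the equivalent dictionary form.
daily = ds.resample(time="1D").mean()
daily_from_dict = ds.resample({"time": "1D"}).mean()

print(names, n, bool(has_three))
print(daily["temperature"].values)
```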
- New deprecations (behavior will be changed in xarray 0.12):
- Reduction of :py
DataArray.groupby
and :pyDataArray.resample
without a dimension argument will change in the next release. A FutureWarning is now emitted. By Keisuke Fujii. - The
inplace
kwarg of a number of DataArray and Dataset methods is being deprecated and will be removed in the next release. By Deepak Cherian.
- Refactored storage backends:
Xarray's storage backends now automatically open and close files when necessary, rather than requiring opening a file with
autoclose=True
. A global least-recently-used cache is used to store open files; the default limit of 128 open files should suffice in most cases, but can be adjusted if necessary withxarray.set_options(file_cache_maxsize=...)
. Theautoclose
argument toopen_dataset
and related functions has been deprecated and is now a no-op. This change, along with an internal refactor of xarray's storage backends, should significantly improve performance when reading and writing netCDF files with Dask, especially when working with many files or using Dask Distributed. By Stephan Hoyer
- Support for non-standard calendars used in climate science:
- Xarray will now always use :py
cftime.datetime
objects, rather than by default trying to coerce them intonp.datetime64[ns]
objects. A :py~xarray.CFTimeIndex
will be used for indexing along time coordinates in these cases. - A new method :py
~xarray.CFTimeIndex.to_datetimeindex
has been added to aid in converting from a :py~xarray.CFTimeIndex
to a :pypandas.DatetimeIndex
for the remaining use-cases where using a :py~xarray.CFTimeIndex
is still a limitation (e.g. for resample or plotting). - Setting the
enable_cftimeindex
option is now a no-op and emits aFutureWarning
.
- :py
xarray.DataArray.plot.line
can now accept multidimensional coordinate variables as input. hue must be a dimension name in this case. (2407
) By Deepak Cherian. - Added support for Python 3.7. (
2271
). By Joe Hamman. - Added support for plotting data with pandas.Interval coordinates, such as those created by :py
~xarray.DataArray.groupby_bins
By Maximilian Maahn. - Added :py
~xarray.CFTimeIndex.shift
for shifting the values of a CFTimeIndex by a specified frequency. (2244
). By Spencer Clark. - Added support for using
cftime.datetime
coordinates with :py~xarray.DataArray.differentiate
, :py~xarray.Dataset.differentiate
, :py~xarray.DataArray.interp
, and :py~xarray.Dataset.interp
. By Spencer Clark - There is now a global option to either always keep or always discard dataset and dataarray attrs upon operations. The option is set with
xarray.set_options(keep_attrs=True)
, and the default is to use the old behaviour. By Tom Nicholas. - Added a new backend for the GRIB file format based on ECMWF cfgrib python driver and ecCodes C-library. (
2475
) By Alessandro Amici, sponsored by ECMWF. - Resample now supports a dictionary mapping from dimension to frequency as its first argument, e.g.,
data.resample({'time': '1D'}).mean()
. This is consistent with other xarray functions that accept either dictionaries or keyword arguments. By Stephan Hoyer. - The preferred way to access tutorial data is now to load it lazily with :py
xarray.tutorial.open_dataset
. :pyxarray.tutorial.load_dataset
calls Dataset.load() prior to returning (and is now deprecated). This was changed in order to facilitate using tutorial datasets with dask. By Joe Hamman. DataArray
can now use xr.set_options(keep_attrs=True)
and retain attributes in binary operations, such as (+, -, * ,/
). Default behaviour is unchanged (Attributes will be dismissed). By Michael Blaschek
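A minimal sketch of the `keep_attrs` option, used here as a context manager so that the setting is scoped to one block. The attribute `units` is illustrative.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "K"})

# Inside the context, reductions and binary operations retain attributes.
with xr.set_options(keep_attrs=True):
    mean_kept = da.mean()
    doubled = da + da

print(mean_kept.attrs)
print(doubled.attrs)
```

Calling `xr.set_options(keep_attrs=True)` outside a `with` statement applies the setting globally instead.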
FacetGrid
now properly uses thecbar_kwargs
keyword argument. (1504
,1717
) By Deepak Cherian.- Addition and subtraction operators used with a CFTimeIndex now preserve the index's type. (
2244
). By Spencer Clark. - We now properly handle arrays of
datetime.datetime
anddatetime.timedelta
provided as coordinates. (2512
) By Deepak Cherian. xarray.DataArray.roll
correctly handles multidimensional arrays. (2445
) By Keisuke Fujii.xarray.plot()
now properly accepts anorm
argument and does not override the norm'svmin
andvmax
. (2381
) By Deepak Cherian.xarray.DataArray.std()
now correctly acceptsddof
keyword argument. (2240
) By Keisuke Fujii.- Restore matplotlib's default of plotting dashed negative contours when a single color is passed to
DataArray.contour()
e.g.colors='k'
. By Deepak Cherian. - Fix a bug that caused some indexing operations on arrays opened with
open_rasterio
to error (2454
). By Stephan Hoyer. - Subtracting one CFTimeIndex from another now returns a
pandas.TimedeltaIndex
, analogous to the behavior for DatetimeIndexes (2484
). By Spencer Clark. - Adding a TimedeltaIndex to, or subtracting a TimedeltaIndex from a CFTimeIndex is now allowed (
2484
). By Spencer Clark. - Avoid use of Dask's deprecated
get=
parameter in tests by Matthew Rocklin. - An
OverflowError
is now accurately raised and caught during the encoding process if a reference date is used that is so distant that the dates must be encoded using cftime rather than NumPy (2272
). By Spencer Clark. - Chunked datasets can now roundtrip to Zarr storage continually with to_zarr and
open_zarr
(2300
). By Lily Wang.
This minor release contains a number of backwards compatible enhancements.
Announcements of note:
- Xarray is now a NumFOCUS fiscally sponsored project! Read the announcement for more details.
- We have a new
roadmap
that outlines our future development plans. Dataset.apply
now properly documents the way func is called. By Matti Eskelinen.
- :py
~xarray.DataArray.differentiate
and :py~xarray.Dataset.differentiate
are newly added. (1332
) By Keisuke Fujii. - Default colormap for sequential and divergent data can now be set via :py
~xarray.set_options()
(2394
) By Julius Busecke. - min_count option is newly supported in :py
~xarray.DataArray.sum
, :py~xarray.DataArray.prod
and :py~xarray.Dataset.sum
, and :py~xarray.Dataset.prod
. (2230
) By Keisuke Fujii. - :py
~plot.plot()
now accepts the kwargsxscale, yscale, xlim, ylim, xticks, yticks
just like pandas. Alsoxincrease=False, yincrease=False
now use matplotlib's axis inverting methods instead of setting limits. By Deepak Cherian. (2224
) - DataArray coordinates and Dataset coordinates and data variables are now displayed as a b ... y z rather than a b c d .... (
1186
) By Seth P. - A new CFTimeIndex-enabled :py
cftime_range
function for use in generating dates from standard or non-standard calendars. By Spencer Clark. - When interpolating over a
datetime64
axis, you can now provide a datetime string instead of adatetime64
object. E.g.da.interp(time='1991-02-01')
(2284
) By Deepak Cherian. - A clear error message is now displayed if a
set
ordict
is passed in place of an array (2331
) By Maximilian Roos. - Applying
unstack
to a large DataArray or Dataset is now much faster if the MultiIndex has not been modified after stacking the indices. (1560
) By Maximilian Maahn. - You can now control whether or not to offset the coordinates when using the
roll
method. The current behavior (coordinates rolled by default) raises a deprecation warning unless the keyword argument is set explicitly. (1875
) By Andrew Huang. - You can now call
unstack
without arguments to unstack every MultiIndex in a DataArray or Dataset. By Julia Signell. - Added the ability to pass a data kwarg to
copy
to create a new object with the same metadata as the original object but using new values. By Julia Signell.
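Several of the enhancements above fit in one short sketch: `differentiate`, the `min_count` option of `sum`, `unstack()` without arguments, and `copy(data=...)`. All array contents and names are made up for illustration.

```python
import numpy as np
import xarray as xr

# differentiate: finite differences taken with respect to a coordinate.
line = xr.DataArray([0.0, 2.0, 4.0], dims="t", coords={"t": [0.0, 1.0, 2.0]})
slope = line.differentiate("t")

# min_count: require at least this many valid values, otherwise return NaN.
all_missing = xr.DataArray([np.nan, np.nan], dims="x")
total_default = all_missing.sum()
total_strict = all_missing.sum(min_count=1)

da = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0]],
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b"]},
    attrs={"units": "m"},
)

# unstack() with no arguments expands every MultiIndex in the object.
roundtrip = da.stack(z=("x", "y")).unstack()

# copy(data=...) keeps dims, coords and attrs but swaps in new values.
zeros = da.copy(data=np.zeros((2, 2)))

print(slope.values, float(total_default), float(total_strict))
print(roundtrip.dims, zeros.attrs)
```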
xarray.plot.imshow()
correctly uses theorigin
argument. (2379
) By Deepak Cherian.- Fixed
DataArray.to_iris()
failure while creatingDimCoord
by falling back to creatingAuxCoord
. Fixed dependency onvar_name
attribute being set. (2201
) By Thomas Voigt. - Fixed a bug in
zarr
backend which prevented use with datasets with invalid chunk size encoding after reading from an existing store (2278
). By Joe Hamman. - Tests can be run in parallel with pytest-xdist By Tony Tung.
- Follow up the renamings in dask; from dask.ghost to dask.overlap By Keisuke Fujii.
- Now raises a ValueError when there is a conflict between dimension names and level names of MultiIndex. (
2299
) By Keisuke Fujii.
- Now :py
~xarray.apply_ufunc
raises a ValueError when the size ofinput_core_dims
is inconsistent with the number of arguments. (2341
) By Keisuke Fujii. - Fixed
Dataset.filter_by_attrs()
behavior not matchingnetCDF4.Dataset.get_variables_by_attributes()
. When more than onekey=value
is passed intoDataset.filter_by_attrs()
it will now return a Dataset with variables which pass all the filters. (2315
) By Andrew Barna.
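The `filter_by_attrs` fix means that several `key=value` filters are combined with a logical AND. A small sketch, with variable names and attributes invented for the example:

```python
import xarray as xr

ds = xr.Dataset(
    {
        "air_temp": ("x", [280.0, 281.0], {"standard_name": "air_temperature", "units": "K"}),
        "sea_temp": ("x", [290.0, 291.0], {"standard_name": "sea_water_temperature", "units": "K"}),
        "pressure": ("x", [1000.0, 1001.0], {"standard_name": "air_pressure", "units": "hPa"}),
    }
)

# One filter: every variable whose units are kelvin.
by_units = ds.filter_by_attrs(units="K")

# Two filters: only variables that pass both of them.
both = ds.filter_by_attrs(units="K", standard_name="air_temperature")

print(sorted(by_units.data_vars))
print(list(both.data_vars))
```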
Xarray no longer supports Python 3.4. Additionally, the minimum supported versions of the following dependencies have been updated and/or clarified:
- pandas: 0.18 -> 0.19
- NumPy: 1.11 -> 1.12
- Dask: 0.9 -> 0.16
- Matplotlib: unspecified -> 1.5
(
2204
). By Joe Hamman.
- :py
~xarray.DataArray.interp_like
and :py~xarray.Dataset.interp_like
methods are newly added. (2218
) By Keisuke Fujii. - Added support for curvilinear and unstructured generic grids to :py
~xarray.DataArray.to_cdms2
and :py~xarray.DataArray.from_cdms2
(2262
). By Stephane Raynaud.
- Fixed a bug in
zarr
backend which prevented use with datasets with incomplete chunks in multiple dimensions (2225
). By Joe Hamman. - Fixed a bug in :py
~Dataset.to_netcdf
which prevented writing datasets when the arrays had different chunk sizes (2254
). By Mike Neish. - Fixed masking during the conversion to cdms2 objects by :py
~xarray.DataArray.to_cdms2
(2262
). By Stephane Raynaud. - Fixed a bug in 2D plots which incorrectly raised an error when 2D coordinates weren't monotonic (
2250
). By Fabien Maussion. - Fixed warning raised in :py
~Dataset.to_netcdf
due to deprecation of effective_get in dask (2238
). By Joe Hamman.
- Plot labels now make use of metadata that follow CF conventions (
2135
). By Deepak Cherian and Ryan Abernathey. - Line plots now support facetting with
row
andcol
arguments (2107
). By Yohai Bar Sinai. - :py
~xarray.DataArray.interp
and :py~xarray.Dataset.interp
methods are newly added. See interp
for details. (2079
) By Keisuke Fujii.
- Fixed a bug in
rasterio
backend which prevented use withdistributed
. Therasterio
backend now returns pickleable objects (2021
). By Joe Hamman.
This minor release includes a number of bug fixes and backwards-compatible enhancements.
- New PseudoNetCDF backend for many Atmospheric data formats including GEOS-Chem, CAMx, NOAA arlpacked bit and many others. See
io.PseudoNetCDF
for more details. By Barron Henderson. - The :py
Dataset
constructor now aligns :pyDataArray
arguments indata_vars
to indexes set explicitly incoords
, where previously an error would be raised. (674
) By Maximilian Roos. - :py
~DataArray.sel
, :py~DataArray.isel
& :py~DataArray.reindex
, (and their :pyDataset
counterparts) now support supplying adict
as a first argument, as an alternative to the existing approach of supplying kwargs. This allows for more robust behavior of dimension names which conflict with other keyword names, or are not strings. By Maximilian Roos. - :py
~DataArray.rename
now supports supplying**kwargs
, as an alternative to the existing approach of supplying adict
as the first argument. By Maximilian Roos. - :py
~DataArray.cumsum
and :py~DataArray.cumprod
now support aggregation over multiple dimensions at the same time. This is the default behavior when dimensions are not specified (previously this raised an error). By Stephan Hoyer - :py
DataArray.dot
and :pydot
are partly supported with older dask<0.17.4. (related to2203
) By Keisuke Fujii. - Xarray now uses Versioneer to manage its version strings. (
1300
). By Joe Hamman.
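The dictionary form of `sel`/`isel`, the keyword form of `rename`, and multi-dimensional `cumsum` can be sketched as follows. The dimension is deliberately called `method`, a name that would otherwise clash with the `method` keyword of `sel`; all data are illustrative.

```python
import xarray as xr

da = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    dims=("method", "x"),
    coords={"method": ["a", "b"], "x": [10, 20, 30]},
)

# A dict as first argument avoids the clash with sel's own `method` keyword.
row = da.sel({"method": "b"})
first = da.isel({"x": 0})

# rename accepts keyword arguments as an alternative to a dict.
renamed = da.rename(x="lon")

# With no dimension given, cumsum accumulates over all dimensions.
running = da.cumsum()

print(row.values, first.values)
print(renamed.dims)
print(running.values)
```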
- Fixed a regression in 0.10.4, where explicitly specifying
dtype='S1'
ordtype=str
inencoding
withto_netcdf()
raised an error (2149
). By Stephan Hoyer. - :py
apply_ufunc
now directly validates output variables (1931
). By Stephan Hoyer. - Fixed a bug where
to_netcdf(..., unlimited_dims='bar')
yielded NetCDF files with spurious 0-length dimensions (i.e.b
,a
, andr
) (2134
). By Joe Hamman. - Removed spurious warnings with
Dataset.update(Dataset)
(2161
) andarray.equals(array)
whenarray
containsNaT
(2162
). By Stephan Hoyer. - Aggregations with :py
Dataset.reduce
(includingmean
,sum
, etc) no longer drop unrelated coordinates (1470
). Also fixed a bug where non-scalar data-variables that did not include the aggregation dimension were improperly skipped. By Stephan Hoyer - Fix
~DataArray.stack
with non-unique coordinates on pandas 0.23 (2160
). By Stephan Hoyer - Selecting data indexed by a length-1
CFTimeIndex
with a slice of strings now behaves as it does when using a length-1DatetimeIndex
(i.e. it no longer falsely returns an empty array when the slice includes the value in the index) (2165
). By Spencer Clark. - Fix
DataArray.groupby().reduce()
mutating coordinates on the input array when grouping over dimension coordinates with duplicated entries (2153
). By Stephan Hoyer - Fix
Dataset.to_netcdf()
failing to create a group with engine="h5netcdf"
(2177
). By Stephan Hoyer
This minor release includes a number of bug fixes and backwards-compatible enhancements. A highlight is CFTimeIndex
, which offers support for non-standard calendars used in climate modeling.
- New FAQ entry,
ecosystem
. By Deepak Cherian. assigning_values
now includes examples on how to select and assign values to a :py~xarray.DataArray
with.loc
. By Chiara Lepore.
- Add an option for using a
CFTimeIndex
for indexing times with non-standard calendars and/or outside the Timestamp-valid range; this index enables a subset of the functionality of a standardpandas.DatetimeIndex
. SeeCFTimeIndex
for full details. (789
,1084
,1252
) By Spencer Clark with help from Stephan Hoyer. - Allow for serialization of
cftime.datetime
objects (789
,1084
,2008
,1252
) using the standalonecftime
library. By Spencer Clark. - Support writing lists of strings as netCDF attributes (
2044
). By Dan Nowacki. - :py
~xarray.Dataset.to_netcdf
withengine='h5netcdf'
now accepts h5py encoding settingscompression
andcompression_opts
, along with the NetCDF4-Python style settingsgzip=True
andcomplevel
. This allows using any compression plugin installed in hdf5, e.g. LZF (1536
). By Guido Imperiale. - :py
~xarray.dot
on dask-backed data will now calldask.array.einsum
. This greatly boosts speed and allows chunking on the core dims. The function now requires dask >= 0.17.3 to work on dask-backed data (2074
). By Guido Imperiale. plot.line()
learned new kwargs:xincrease
,yincrease
that change the direction of the respective axes. By Deepak Cherian.- Added the
parallel
option to :pyopen_mfdataset
. This option usesdask.delayed
to parallelize the open and preprocessing steps withinopen_mfdataset
. This is expected to provide performance improvements when opening many files, particularly when used in conjunction with dask's multiprocessing or distributed schedulers (1981
). By Joe Hamman. - New
compute
option in :py~xarray.Dataset.to_netcdf
, :py~xarray.Dataset.to_zarr
, and :py~xarray.save_mfdataset
to allow for the lazy computation of netCDF and zarr stores. This feature is currently only supported by the netCDF4 and zarr backends. (1784
). By Joe Hamman.
ValueError
is raised when coordinates with the wrong size are assigned to a :pyDataArray
. (2112
) By Keisuke Fujii.- Fixed a bug in :py
~xarray.DataArray.rolling
with bottleneck. Also, fixed a bug in rolling an integer dask array. (2113
) By Keisuke Fujii. - Fixed a bug where keep_attrs=True flag was neglected if :py
apply_ufunc
was used with :pyVariable
. (2114
) By Keisuke Fujii. - When assigning a :py
DataArray
to :pyDataset
, any conflicted non-dimensional coordinates of the DataArray are now dropped. (2068
) By Keisuke Fujii. - Better error handling in
open_mfdataset
(2077
). By Stephan Hoyer. plot.line()
does not callautofmt_xdate()
anymore. Instead it changes the rotation and horizontal alignment of labels without removing the x-axes of any other subplots in the figure (if any). By Deepak Cherian.- Colorbar limits are now determined by excluding ±Infs too. By Deepak Cherian. By Joe Hamman.
- Fixed
to_iris
to maintain lazy dask array after conversion (2046
). By Alex Hilson and Stephan Hoyer.
This minor release includes a number of bug fixes and backwards-compatible enhancements.
- :py
~xarray.DataArray.isin
and :py~xarray.Dataset.isin
methods, which test each value in the array for whether it is contained in the supplied list, returning a bool array. Seeselecting values with isin
for full details. Similar to thenp.isin
function. By Maximilian Roos. - Some speed improvement to construct :py
~xarray.core.rolling.DataArrayRolling
object (1993
) By Keisuke Fujii. - Handle variables with different values for
missing_value
and_FillValue
by masking values for both attributes; previously this resulted in aValueError
. (2016
) By Ryan May.
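A minimal sketch of `isin`, here combined with `where` to keep only the matching entries. The values are illustrative.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x")

# Boolean array: True wherever the value appears in the supplied list.
mask = da.isin([2, 4])

# Use the mask to keep only the matching entries.
selected = da.where(mask, drop=True)

print(mask.values)
print(selected.values)
```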
- Fixed
decode_cf
function to operate lazily on dask arrays (1372
). By Ryan Abernathey. - Fixed labeled indexing with slice bounds given by xarray objects with datetime64 or timedelta64 dtypes (
1240
). By Stephan Hoyer. - Attempting to convert an xarray.Dataset into a numpy array now raises an informative error message. By Stephan Hoyer.
- Fixed a bug in decode_cf_datetime where
int32
arrays weren't parsed correctly (2002
). By Fabien Maussion. - When calling xr.auto_combine() or xr.open_mfdataset() with a concat_dim, the resulting dataset will have that one-element dimension (it was silently dropped, previously) (
1988
). By Ben Root.
This minor release includes a number of bug fixes and enhancements, along with one possibly backwards-incompatible change.
- The addition of
__array_ufunc__
for xarray objects (see below) means that NumPy ufunc methods (e.g.,np.add.reduce
) that previously worked onxarray.DataArray
objects by converting them into NumPy arrays will now raiseNotImplementedError
instead. In all cases, the work-around is simple: convert your objects explicitly into NumPy arrays before calling the ufunc (e.g., with.values
).
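The work-around described above can be sketched as follows: ordinary ufunc calls operate on xarray objects directly, while ufunc methods such as `np.add.reduce` are given the underlying NumPy array. The sample array is illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x")

# A plain ufunc call works on the xarray object and returns a DataArray.
sine = np.sin(da)

# For ufunc methods, convert explicitly to a NumPy array first.
total = np.add.reduce(da.values)

print(type(sine).__name__, total)
```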
- Added :py
~xarray.dot
, equivalent to :pynumpy.einsum
. Also, :py~xarray.DataArray.dot
now supportsdims
option, which specifies the dimensions to sum over. (1951
) By Keisuke Fujii. - Support for writing xarray datasets to netCDF files (netcdf4 backend only) when using the dask.distributed scheduler (
1464
). By Joe Hamman. - Support lazy vectorized-indexing. After this change, flexible indexing such as orthogonal/vectorized indexing, becomes possible for all the backend arrays. Also, lazy
transpose
is now also supported. (1897
) By Keisuke Fujii. Implemented NumPy's
__array_ufunc__
protocol for all xarray objects (1617
). This enables using NumPy ufuncs directly onxarray.Dataset
objects with recent versions of NumPy (v1.13 and newer):
python
ds = xr.Dataset({"a": 1})
np.sin(ds)
This obviates the need for the
xarray.ufuncs
module, which will be deprecated in the future when xarray drops support for older versions of NumPy. By Stephan Hoyer.- Improve :py
~xarray.DataArray.rolling
logic. :py~xarray.core.rolling.DataArrayRolling
object now supports :py~xarray.core.rolling.DataArrayRolling.construct
method that returns a view of the DataArray / Dataset object with the rolling-window dimension added to the last axis. This enables more flexible operation, such as strided rolling, windowed rolling, ND-rolling, short-time FFT and convolution. (1831
,1142
,819
) By Keisuke Fujii. - :py
~plot.line()
learned to make plots with data on x-axis if so specified. (575
) By Deepak Cherian.
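Two of the enhancements above, `xarray.dot` and the rolling `construct` method, can be sketched on small arrays. The shapes and names are illustrative, and `xr.dot` is called without an explicit dimension argument so that it sums over the dimensions the inputs share.

```python
import numpy as np
import xarray as xr

# dot: contract over the shared dimension "y".
a = xr.DataArray([[1.0, 2.0], [3.0, 4.0]], dims=("x", "y"))
b = xr.DataArray([1.0, 1.0], dims="y")
row_sums = xr.dot(a, b)

# construct: expose each rolling window along a new trailing dimension.
da = xr.DataArray(np.arange(5, dtype=float), dims="x")
windows = da.rolling(x=3).construct("window")

# Strided rolling: keep every second window.
strided = windows.isel(x=slice(None, None, 2))

# Reducing over the window dimension gives a rolling mean;
# windows that run off the start contain NaN padding.
means = windows.mean("window", skipna=False)

print(row_sums.values)
print(windows.shape, strided.sizes["x"])
print(means.values)
```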
- Raise an informative error message when using
apply_ufunc
with numpy v1.11 (1956
). By Stephan Hoyer. - Fix the precision drop after indexing datetime64 arrays (
1932
). By Keisuke Fujii. - Silenced irrelevant warnings issued by
open_rasterio
(1964
). By Stephan Hoyer. - Fix kwarg colors clashing with auto-inferred cmap (
1461
) By Deepak Cherian. - Fix :py
~xarray.plot.imshow
error when passed an RGB array with size one in a spatial dimension. By Zac Hatfield-Dodds.
This minor release includes a number of bug fixes and backwards-compatible enhancements.
- Added a new guide on
contributing
(640
) By Joe Hamman. - Added apply_ufunc example to
/examples/weather-data.ipynb#Toy-weather-data
(1844
). By Liam Brannigan. - New entry Why don’t aggregations return Python scalars? in the
getting-started-guide/faq
(1726
). By 0x0L.
New functions and methods:
- Added :py
DataArray.to_iris
and :pyDataArray.from_iris
for converting data arrays to and from Iris Cubes with the same data and coordinates (621
and37
). By Neil Parley and Duncan Watson-Parris. - Experimental support for using Zarr as storage layer for xarray (
1223
). By Ryan Abernathey and Joe Hamman. - New :py
~xarray.DataArray.rank
on arrays and datasets. Requires bottleneck (1731
). By 0x0L. .dt
accessor can now ceil, floor and round timestamps to specified frequency. By Deepak Cherian.
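A brief sketch of rounding timestamps through the `.dt` accessor, assuming two made-up timestamps and a daily frequency:

```python
import pandas as pd
import xarray as xr

times = xr.DataArray(
    pd.to_datetime(["2000-01-01 06:00", "2000-01-01 18:00"]),
    dims="time",
)

# Round each timestamp down, up, or to the nearest day.
floored = times.dt.floor("D")
ceiled = times.dt.ceil("D")
rounded = times.dt.round("D")

print(floored.values)
print(ceiled.values)
print(rounded.values)
```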
Plotting enhancements:
xarray.plot.imshow
now handles RGB and RGBA images. Saturation can be adjusted withvmin
andvmax
, or withrobust=True
. By Zac Hatfield-Dodds.- :py
~plot.contourf()
learned to contour 2D variables that have both a 1D coordinate (e.g. time) and a 2D coordinate (e.g. depth as a function of time) (1737
). By Deepak Cherian. - :py
~plot.plot()
rotates x-axis ticks if x-axis is time. By Deepak Cherian. - :py
~plot.line()
can draw multiple lines if provided with a 2D variable. By Deepak Cherian.
Other enhancements:
Reduce methods such as :py
DataArray.sum()
now handle object-type arrays.
python
da = xr.DataArray(np.array([True, False, np.nan], dtype=object), dims="x")
da.sum()
(
1866
) By Keisuke Fujii.- Reduce methods such as :py
DataArray.sum()
now accept dtype
arguments. (1838
) By Keisuke Fujii. - Added nodatavals attribute to DataArray when using :py
~xarray.open_rasterio
. (1736
). By Alan Snow. - Use
pandas.Grouper
class in xarray resample methods rather than the deprecatedpandas.TimeGrouper
class (1766
). By Joe Hamman. - Experimental support for parsing ENVI metadata to coordinates and attributes in :py
xarray.open_rasterio
. By Matti Eskelinen. - Reduce memory usage when decoding a variable with a scale_factor, by converting 8-bit and 16-bit integers to float32 instead of float64 (
1840
), and keeping float16 and float32 as float32 (1842
). Correspondingly, encoded variables may also be saved with a smaller dtype. By Zac Hatfield-Dodds. - Speed of reindexing/alignment with dask array is orders of magnitude faster when inserting missing values (
1847
). By Stephan Hoyer. - Fix
axis
keyword ignored when applyingnp.squeeze
toDataArray
(1487
). By Florian Pinault. netcdf4-python
has moved its time handling in the netcdftime
module to a standalone package (netcdftime). As such, xarray now considers netcdftime an optional dependency. One benefit of this change is that it allows for encoding/decoding of datetimes with non-standard calendars without thenetcdf4-python
dependency (1084
). By Joe Hamman.
- Rolling aggregation with
center=True
option now gives the same result with pandas including the last element (1046
). By Keisuke Fujii. - Support indexing with a 0d-np.ndarray (
1921
). By Keisuke Fujii. - Added warning in api.py of a netCDF4 bug that occurs when the filepath has 88 characters (
1745
). By Liam Brannigan. - Fixed encoding of multi-dimensional coordinates in :py
~Dataset.to_netcdf
(1763
). By Mike Neish. - Fixed chunking with non-file-based rasterio datasets (
1816
) and refactored rasterio test suite. By Ryan Abernathey - Bug fix in open_dataset(engine='pydap') (
1775
) By Keisuke Fujii. - Bug fix in vectorized assignment (
1743
,1744
). Now item assignment to :pyDataArray.__setitem__
checks coordinates of target, destination and keys. If there is any conflict among these coordinates, an IndexError
will be raised. By Keisuke Fujii. - Properly point
DataArray.__dask_scheduler__
todask.threaded.get
. By Matthew Rocklin. - Bug fixes in :py
DataArray.plot.imshow
: all-NaN arrays and arrays with size one in some dimension can now be plotted, which is good for exploring satellite imagery (1780
). By Zac Hatfield-Dodds. - Fixed
UnboundLocalError
when opening netCDF file (1781
). By Stephan Hoyer. - The
variables
,attrs
, anddimensions
properties have been deprecated as part of a bug fix addressing an issue where backends were unintentionally loading the datastores data and attributes repeatedly during writes (1798
). By Joe Hamman. - Compatibility fixes to plotting module for NumPy 1.14 and pandas 0.22 (
1813
). By Joe Hamman. - Bug fix in encoding coordinates with
{'_FillValue': None}
in netCDF metadata (1865
). By Chris Roth. - Fix indexing with lists for arrays loaded from netCDF files with
engine='h5netcdf'
(1864
). By Stephan Hoyer. - Corrected a bug with incorrect coordinates for non-georeferenced geotiff files (
1686
). Internally, we now use the rasterio coordinate transform tool instead of doing the computations ourselves. Aparse_coordinates
kwarg has been added to :py~open_rasterio
(set toTrue
by default). By Fabien Maussion. - The colors of discrete colormaps are now the same regardless of whether seaborn is installed (
1896
). By Fabien Maussion. - Fixed dtype promotion rules in :py
where
and :pyconcat
to match pandas (1847
). A combination of strings/numbers or unicode/bytes now promote to object dtype, instead of strings or unicode. By Stephan Hoyer. - Fixed bug where :py
~xarray.DataArray.isnull
was loading data stored as dask arrays (1937
). By Joe Hamman.
This is a major release that includes bug fixes, new features and a few backwards incompatible changes. Highlights include:
- Indexing now supports broadcasting over dimensions, similar to NumPy's vectorized indexing (but better!).
- :py
~DataArray.resample
has a new groupby-like API like pandas. - :py
~xarray.apply_ufunc
facilitates wrapping and parallelizing functions written for NumPy arrays. - Performance improvements, particularly for dask and :py
open_mfdataset
.
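As a rough illustration of the broadcasting-aware indexing highlighted above, the sketch below contrasts outer indexing with pointwise indexing driven by `DataArray` indexers. The array contents, the coordinate labels and the `points` dimension name are made up for the example.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [10, 20, 30], "y": ["a", "b", "c", "d"]},
)

# Plain lists index each dimension independently (outer indexing).
outer = arr.sel(x=[10, 30], y=["a", "d"])

# DataArray indexers that share a dimension are broadcast against each
# other, so the result is indexed pointwise along the "points" dimension.
ind_x = xr.DataArray([10, 30], dims="points")
ind_y = xr.DataArray(["a", "d"], dims="points")
pointwise = arr.sel(x=ind_x, y=ind_y)
```

Here `outer` keeps both original dimensions, while `pointwise` has a single `points` dimension, because the dimensions of the result follow the dimensions of the indexers.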
xarray now supports a form of vectorized indexing with broadcasting, where the result of indexing depends on dimensions of indexers, e.g.,
array.sel(x=ind)
withind.dims == ('y',)
. Alignment between coordinates on indexed and indexing objects is also now enforced. Due to these changes, existing uses of xarray objects to index other xarray objects will break in some cases.The new indexing API is much more powerful, supporting outer, diagonal and vectorized indexing in a single interface. The
isel_points
andsel_points
methods are deprecated, since they are now redundant with theisel
/sel
methods. Seevectorized_indexing
for the details (1444
,1436
). By Keisuke Fujii and Stephan Hoyer.A new resampling interface to match pandas' groupby-like API was added to :py
Dataset.resample
and :pyDataArray.resample
(1272
).Timeseries resampling <resampling>
is fully supported for data with arbitrary dimensions as is both downsampling and upsampling (including linear, quadratic, cubic, and spline interpolation).Old syntax:
In [1]: ds.resample("24H", dim="time", how="max") Out[1]: <xarray.Dataset> [...]
New syntax:
In [1]: ds.resample(time="24H").max() Out[1]: <xarray.Dataset> [...]
Note that both versions are currently supported, but using the old syntax will produce a warning encouraging users to adopt the new syntax. By Daniel Rothenberg.
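A minimal, self-contained sketch of the new resampling syntax is shown below. The dataset, the variable name `t` and the two-day frequency are assumptions chosen only to keep the example small.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=6, freq="D")
ds = xr.Dataset({"t": ("time", np.arange(6.0))}, coords={"time": times})

# Groupby-like API: choose the frequency first, then apply a reduction.
two_day_max = ds.resample(time="2D").max()
two_day_mean = ds.resample(time="2D").mean()
```

Splitting the call this way means any reduction available on groupby objects can follow `resample`, instead of being passed by name through a `how` argument.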
- Calling
repr()
or printing xarray objects at the command line or in a Jupyter Notebook will no longer automatically compute dask variables or load data on arrays lazily loaded from disk (1522
). By Guido Imperiale. - Supplying
coords
as a dictionary to theDataArray
constructor without also supplying an explicitdims
argument is no longer supported. This behavior was deprecated in version 0.9 but will now raise an error (727
). - Several existing features have been deprecated and will change to new behavior in xarray v0.11. If you use any of them with xarray v0.10, you should see a
FutureWarning
that describes how to update your code:Dataset.T
has been deprecated as an alias for Dataset.transpose()
(1232
). In the next major version of xarray, it will provide short-cut lookup for variables or attributes with name'T'
.DataArray.__contains__
(e.g.,key in data_array
) currently checks for membership inDataArray.coords
. In the next major version of xarray, it will check membership in the array data found inDataArray.values
instead (1267
).- Direct iteration over and counting a
Dataset
(e.g.,[k for k in ds]
,ds.keys()
,ds.values()
,len(ds)
andif ds
) currently includes all variables, both data and coordinates. For improved usability and consistency with pandas, in the next major version of xarray these will change to only include data variables (884
). Useds.variables
,ds.data_vars
ords.coords
as alternatives.
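One way to write code that does not depend on which variables iteration includes is to name the mapping explicitly, as in the sketch below. The dataset and its variable names are hypothetical.

```python
import xarray as xr

ds = xr.Dataset(
    {"temperature": ("x", [11.0, 12.5, 14.0])},
    coords={"x": [0, 1, 2]},
)

data_names = list(ds.data_vars)   # data variables only
coord_names = list(ds.coords)     # coordinates only
all_names = sorted(ds.variables)  # both data variables and coordinates
```

Because each expression names the collection it wants, the results do not change when the behavior of `for k in ds` or `len(ds)` changes.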
- Changes to minimum versions of dependencies:
- Old numpy < 1.11 and pandas < 0.18 are no longer supported (
1512
). By Keisuke Fujii. - The minimum supported version bottleneck has increased to 1.1 (
1279
). By Joe Hamman.
New functions/methods
- New helper function :py
~xarray.apply_ufunc
for wrapping functions written to work on NumPy arrays to support labels on xarray objects (770
).apply_ufunc
also supports automatic parallelization for many functions with dask. See comput.wrapping-custom
anddask.automatic-parallelization
for details. By Stephan Hoyer. - Added new method :py
Dataset.to_dask_dataframe
, convert a dataset into a dask dataframe. This allows lazy loading of data from a dataset containing dask arrays (1462
). By James Munroe. New function :py
~xarray.where
for conditionally switching between values in xarray objects, like :pynumpy.where
:In [1]: import xarray as xr
In [2]: arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=("x", "y"))
In [3]: xr.where(arr % 2, "odd", "even") Out[3]: <xarray.DataArray (x: 2, y: 3)> array([['odd', 'even', 'odd'], ['even', 'odd', 'even']], dtype='<U4') Dimensions without coordinates: x, y
Equivalently, the :py
~xarray.Dataset.where
method also now supports theother
argument, for filling with a value other thanNaN
(576
). By Stephan Hoyer.- Added :py
~xarray.show_versions
function to aid in debugging (1485
). By Joe Hamman.
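To make the `apply_ufunc` entry above more concrete, here is a small sketch that wraps a plain NumPy reduction so that it works on labeled data. The helper `peak_to_peak`, the dimension names and the values are assumptions made for this example only.

```python
import numpy as np
import xarray as xr


def peak_to_peak(values, axis=-1):
    """Plain NumPy function: range (max - min) along one axis."""
    return np.max(values, axis=axis) - np.min(values, axis=axis)


da = xr.DataArray(
    [[1.0, 4.0, 2.0], [0.0, 10.0, 5.0]],
    dims=("station", "time"),
    coords={"station": ["a", "b"]},
)

# input_core_dims moves "time" to the last axis before the function is
# called; since no output core dims are declared, "time" is dropped.
spread = xr.apply_ufunc(
    peak_to_peak,
    da,
    input_core_dims=[["time"]],
    kwargs={"axis": -1},
)
```

The result keeps the `station` dimension and its coordinate labels, which the NumPy function itself knows nothing about.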
Performance improvements
- :py
~xarray.concat
was computing variables that aren't in memory (e.g. dask-based) multiple times; :py~xarray.open_mfdataset
was loading them multiple times from disk. Now, both functions will instead load them at most once and, if they do, store them in memory in the concatenated array/dataset (1521
). By Guido Imperiale. - Speed-up (x 100) of
xarray.conventions.decode_cf_datetime
. By Christian Chwala.
IO related improvements
- Unicode strings (
str
on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when usingengine='scipy'
) (1638
). This is controlled by the_Encoding
attribute convention, which is also understood directly by the netCDF4-Python interface. Seeio.string-encoding
for full details. By Stephan Hoyer. - Support for
data_vars
andcoords
keywords from :py~xarray.concat
added to :py~xarray.open_mfdataset
(438
). Using these keyword arguments can significantly reduce memory usage and increase speed. By Oleksandr Huziy. Support for :py
pathlib.Path
objects added to :py~xarray.open_dataset
, :py~xarray.open_mfdataset
,xarray.to_netcdf
, and :py~xarray.save_mfdataset
(799
):In [2]: from pathlib import Path # In Python 2, use pathlib2!
In [3]: data_dir = Path("data/")
In [4]: one_file = data_dir / "dta_for_month_01.nc"
In [5]: xr.open_dataset(one_file) Out[5]: <xarray.Dataset> [...]
By Willi Rath.
- You can now explicitly disable any default
_FillValue
(NaN
for floating point values) by passing the encoding{'_FillValue': None}
(1598
). By Stephan Hoyer. - More attributes available in :py
~xarray.Dataset.attrs
dictionary when raster files are opened with :py~xarray.open_rasterio
. By Greg Brener. - Support for NetCDF files using an
_Unsigned
attribute to indicate that a signed integer data type should be interpreted as unsigned bytes (1444
). By Eric Bruning. - Support using an existing, opened netCDF4
Dataset
with :py~xarray.backends.NetCDF4DataStore
. This permits creating an :py~xarray.Dataset
from a netCDF4Dataset
that has been opened using other means (1459
). By Ryan May. - Changed :py
~xarray.backends.PydapDataStore
to take a Pydap dataset. This permits opening Opendap datasets that require authentication, by instantiating a Pydap dataset with a session object. Also added :pyxarray.backends.PydapDataStore.open
which takes a url and session object (1068
). By Philip Graae. - Support reading and writing unlimited dimensions with h5netcdf (
1636
). By Joe Hamman.
Other improvements
- Added
_ipython_key_completions_
to xarray objects, to enable autocompletion for dictionary-like access in IPython, e.g.,ds['tem
+ tab ->ds['temperature']
(1628
). By Keisuke Fujii. - Support passing keyword arguments to
load
,compute
, andpersist
methods. Any keyword arguments supplied to these methods are passed on to the corresponding dask function (1523
). By Joe Hamman. - Encoding attributes are now preserved when xarray objects are concatenated. The encoding is copied from the first object (
1297
). By Joe Hamman and Gerrit Holl. - Support applying rolling window operations using bottleneck's moving window functions on data stored as dask arrays (
1279
). By Joe Hamman. - Experimental support for the Dask collection interface (
1674
). By Matthew Rocklin.
- Suppress
RuntimeWarning
issued bynumpy
for "invalid value comparisons" (e.g.NaN
). Xarray now behaves similarly to pandas in its treatment of binary and unary operations on objects with NaNs (1657
). By Joe Hamman. - Unsigned int support for reduce methods with
skipna=True
(1562
). By Keisuke Fujii. Fixes to ensure xarray works properly with pandas 0.21:
- Fix :py
~xarray.DataArray.isnull
method (1549
). - :py
~xarray.DataArray.to_series
and :py~xarray.Dataset.to_dataframe
should not return apandas.MultiIndex
for 1D data (1548
). - Fix plotting with datetime64 axis labels (
1661
).
By Stephan Hoyer.
- :py
~xarray.open_rasterio
method now shifts the rasterio coordinates so that they are centered in each pixel (1468
). By Greg Brener. - :py
~xarray.Dataset.rename
method now doesn't throw errors if someVariable
is renamed to the same name as anotherVariable
as long as that otherVariable
is also renamed (1477
). This method now does throw when twoVariables
would end up with the same name after the rename (since one of them would get overwritten in this case). By Prakhar Goel. - Fix :py
xarray.testing.assert_allclose
to actually useatol
andrtol
arguments when called onDataArray
objects (1488
). By Stephan Hoyer. - xarray
quantile
methods now properly raise aTypeError
when applied to objects with data stored asdask
arrays (1529
). By Joe Hamman. - Fix positional indexing to allow the use of unsigned integers (
1405
). By Joe Hamman and Gerrit Holl. - Creating a :py
Dataset
now raisesMergeError
if a coordinate shares a name with a dimension but is comprised of arbitrary dimensions (1120
). By Joe Hamman. - :py
~xarray.open_rasterio
method now skips rasterio'scrs
attribute if its value isNone
(1520
). By Leevi Annala. - Fix :py
xarray.DataArray.to_netcdf
to return bytes when no path is provided (1410
). By Joe Hamman. - Fix :py
xarray.save_mfdataset
to properly raise an informative error when objects other thanDataset
are provided (1555
). By Joe Hamman. - :py
xarray.Dataset.copy
would not preserve the encoding property (1586
). By Guido Imperiale. - :py
xarray.concat
would eagerly load dask variables into memory if the first argument was a numpy variable (1588
). By Guido Imperiale. - Fix bug in :py
~xarray.Dataset.to_netcdf
when writing in append mode (1215
). By Joe Hamman. - Fix
netCDF4
backend to properly roundtrip theshuffle
encoding option (1606
). By Joe Hamman. - Fix bug when using
pytest
class decorators to skip certain unit tests. The previous behavior unintentionally caused additional tests to be skipped (1531
). By Joe Hamman. - Fix pynio backend for upcoming release of pynio with Python 3 support (
1611
). By Ben Hillman. - Fix
seaborn
import warning for Seaborn versions 0.8 and newer when theapionly
module was deprecated. (1633
). By Joe Hamman. - Fix COMPAT: MultiIndex checking is fragile (
1833
). By Florian Pinault. - Fix
rasterio
backend for Rasterio versions 1.0alpha10 and newer. (1641
). By Chris Holden.
- Suppress warning in IPython autocompletion, related to the deprecation of
.T
attributes (1675
). By Keisuke Fujii. - Fix a bug in lazily-indexing netCDF array. (
1688
) By Keisuke Fujii. - (Internal bug) MemoryCachedArray now supports the orthogonal indexing. Also made some internal cleanups around array wrappers (
1429
). By Keisuke Fujii. - (Internal bug) MemoryCachedArray now always wraps
np.ndarray
byNumpyIndexingAdapter
. (1694
) By Keisuke Fujii. - Fix importing xarray when running Python with
-OO
(1706
). By Stephan Hoyer. - Saving a netCDF file with a coordinate that has spaces in its name now raises an appropriate warning (
1689
). By Stephan Hoyer. - Fix two bugs that were preventing dask arrays from being specified as coordinates in the DataArray constructor (
1684
). By Joe Hamman. - Fixed
apply_ufunc
withdask='parallelized'
for scalar arguments (1697
). By Stephan Hoyer. - Fix "Chunksize cannot exceed dimension size" error when writing netCDF4 files loaded from disk (
1225
). By Stephan Hoyer. - Validate the shape of coordinates with names matching dimensions in the DataArray constructor (
1709
). By Stephan Hoyer. - Raise
NotImplementedError
when attempting to save a MultiIndex to a netCDF file (1547
). By Stephan Hoyer. - Remove netCDF dependency from rasterio backend tests. By Matti Eskelinen
- Fixed unexpected behavior in
Dataset.set_index()
andDataArray.set_index()
introduced by pandas 0.21.0. Setting a new index with a single variable resulted in 1-levelpandas.MultiIndex
instead of a simplepandas.Index
(1722
). By Benoit Bovy. - Fixed unexpected memory loading of backend arrays after
print
. (1720
). By Keisuke Fujii.
This release includes a number of backwards compatible enhancements and bug fixes.
- New :py
~xarray.Dataset.sortby
method toDataset
andDataArray
that enables sorting along dimensions (967
). Seethe docs <reshape.sort>
for examples. By Chun-Wei Yuan and Kyle Heuton. - Add
.dt
accessor to DataArrays for computing datetime-like properties for the values they contain, similar topandas.Series
(358
). By Daniel Rothenberg. - Renamed internal dask arrays created by
open_dataset
to match new dask conventions (1343
). By Ryan Abernathey. - :py
~xarray.as_variable
is now part of the public API (1303
). By Benoit Bovy. - :py
~xarray.align
now supportsjoin='exact'
, which raises an error instead of aligning when indexes to be aligned are not equal. By Stephan Hoyer. - New function :py
~xarray.open_rasterio
for opening raster files with the rasterio library. Seethe docs <io.rasterio>
for details. By Joe Hamman, Nic Wayand and Fabien Maussion
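The sketch below exercises three of the enhancements listed above: `sortby`, the `.dt` accessor and `join='exact'` in `align`. The time stamps and values are invented for the illustration.

```python
import pandas as pd
import xarray as xr

times = pd.to_datetime(["2000-03-01", "2000-01-01", "2000-02-01"])
da = xr.DataArray([3, 1, 2], dims="time", coords={"time": times})

ordered = da.sortby("time")        # sort along a dimension coordinate
months = ordered["time"].dt.month  # datetime components via .dt

# join="exact" refuses to align objects whose indexes differ.
try:
    xr.align(da, ordered.isel(time=slice(2)), join="exact")
    aligned_ok = True
except ValueError:
    aligned_ok = False
```

In this case the two inputs to `align` have different `time` indexes, so the call raises instead of silently reindexing.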
- Fix error from repeated indexing of datasets loaded from disk (
1374
). By Stephan Hoyer. - Fix a bug where
.isel_points
wrongly assigns unselected coordinate todata_vars
. By Keisuke Fujii. - Tutorial datasets are now checked against a reference MD5 sum to confirm successful download (
1392
). By Matthew Gidden. DataArray.chunk()
now accepts dask specific kwargs likeDataset.chunk()
does. By Fabien Maussion.- Support for
engine='pydap'
with recent releases of Pydap (3.2.2+), including on Python 3 (1174
).
- A new gallery allows adding interactive examples to the documentation. By Fabien Maussion.
- Fix test suite failure caused by changes to
pandas.cut
function (1386
). By Ryan Abernathey. - Enhanced test suite by use of
@network
decorator, which is controlled via--run-network-tests
command line argument topy.test
(1393
). By Matthew Gidden.
Remove an inadvertently introduced print statement.
This minor release includes bug-fixes and backwards compatible enhancements.
- New :py
~xarray.DataArray.persist
method to Datasets and DataArrays to enable persisting data in distributed memory when using Dask (1344
). By Matthew Rocklin. - New :py
~xarray.DataArray.expand_dims
method forDataArray
andDataset
(1326
). By Keisuke Fujii.
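A short sketch of `expand_dims` follows. The dimension name `run` and the data are placeholders chosen for the example.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# Insert a new length-1 dimension; by default it becomes the first axis.
expanded = da.expand_dims("run")

# The position of the new dimension can be chosen with the axis argument.
trailing = da.expand_dims("run", axis=1)
```

This can be convenient before concatenating several arrays along a dimension that none of them has yet.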
- Fix
.where()
withdrop=True
when arguments do not have indexes (1350
). This bug, introduced in v0.9, resulted in xarray producing incorrect results in some cases. By Stephan Hoyer. - Fixed writing to file-like objects with :py
~xarray.Dataset.to_netcdf
(1320
). By Stephan Hoyer. - Fixed explicitly setting
engine='scipy'
withto_netcdf
when not providing a path (1321
). By Stephan Hoyer. - Fixed open_dataarray not properly passing its parameters to open_dataset (
1359
). By Stephan Hoyer. - Ensure the test suite works when run from an installed version of xarray (
1336
). Use@pytest.mark.slow
instead of a custom flag to mark slow tests. By Stephan Hoyer
This minor release includes bug-fixes and backwards compatible enhancements.
.rolling()
on Dataset is now supported (859
). By Keisuke Fujii.- When bottleneck version 1.1 or later is installed, use bottleneck for rolling
var
,argmin
,argmax
, andrank
computations. Also, rolling median now accepts amin_periods
argument (1276
). By Joe Hamman. - When
.plot()
is called on a 2D DataArray and only one dimension is specified withx=
ory=
, the other dimension is now guessed (1291
). By Vincent Noel. - Added new method :py
~Dataset.assign_attrs
toDataArray
andDataset
, a chained-method compatible implementation of thedict.update
method on attrs (1281
). By Henry S. Harrison. - Added new
autoclose=True
argument to :py~xarray.open_mfdataset
to explicitly close opened files when not in use to prevent occurrence of an OS Error related to too many open files (1198
). Note, the default isautoclose=False
, which is consistent with previous xarray behavior. By Phillip J. Wolfram. - The
repr()
ofDataset
andDataArray
attributes uses a similar format to coordinates and variables, with vertically aligned entries truncated to fit on a single line (1319
). Hopefully this will stop people writingdata.attrs = {}
and discarding metadata in notebooks for the sake of cleaner output. The full metadata is still available asdata.attrs
. By Zac Hatfield-Dodds. - Enhanced test suite by use of
@slow
and@flaky
decorators, which are controlled via--run-flaky
and--skip-slow
command line arguments topy.test
(1336
). By Stephan Hoyer and Phillip J. Wolfram. - New aggregation on rolling objects :py
~core.rolling.DataArrayRolling.count
which provides a rolling count of valid values (1138
).
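Two of the enhancements above, `assign_attrs` and the rolling `count` aggregation, can be combined in one method chain. The sketch below assumes a small made-up series with one missing value; the attribute names are arbitrary.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0, 4.0], dims="time")

# assign_attrs returns a new object, so it can be used in a method chain
# without modifying the original array.
labelled = da.assign_attrs(units="m", source="example")

# Rolling count of valid (non-NaN) values in each window of length 2.
# min_periods=1 keeps windows that contain at least one valid value.
valid = labelled.rolling(time=2, min_periods=1).count()
```

Note that `da.attrs` is left untouched; only `labelled` carries the new attributes.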
- Rolling operations now preserve the original dimension order (
1125
). By Keisuke Fujii. - Fixed
sel
withmethod='nearest'
on Python 2.7 and 64-bit Windows (1140
). By Stephan Hoyer. - Fixed
where
with drop=True
for empty masks (1341
). By Stephan Hoyer and Phillip J. Wolfram.
Renamed the "Unindexed dimensions" section in the Dataset
and DataArray
repr (added in v0.9.0) to "Dimensions without coordinates" (1199
).
This major release includes five months worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:
- Coordinates are now optional in the xarray data model, even for dimensions.
- Changes to caching, lazy loading and pickling to improve xarray's experience for parallel computing.
- Improvements for accessing and manipulating
pandas.MultiIndex
levels. - Many new methods and functions, including :py
~DataArray.quantile
, :py~DataArray.cumsum
, :py~DataArray.cumprod
, :py~DataArray.combine_first
, :py~DataArray.set_index
, :py~DataArray.reset_index
, :py~DataArray.reorder_levels
, :py~xarray.full_like
, :py~xarray.zeros_like
, :py~xarray.ones_like
, :py~xarray.open_dataarray
, :py~DataArray.compute
, :pyDataset.info
, :pytesting.assert_equal
, :pytesting.assert_identical
, and :pytesting.assert_allclose
.
Index coordinates for each dimension are now optional, and no longer created by default
1017
. You can identify such dimensions without coordinates by their appearance in the list of "Dimensions without coordinates" in the Dataset
orDataArray
repr:In [1]: xr.Dataset({"foo": (("x", "y"), [[1, 2]])}) Out[1]: <xarray.Dataset> Dimensions: (x: 1, y: 2) Dimensions without coordinates: x, y Data variables: foo (x, y) int64 1 2
This has a number of implications:
- :py
~align
and :py~Dataset.reindex
can now raise an error if dimension labels are missing and dimensions have different sizes. - Because pandas does not support missing indexes, methods such as
to_dataframe
/from_dataframe
andstack
/unstack
no longer roundtrip faithfully on all inputs. Use :py~Dataset.reset_index
to remove undesired indexes. Dataset.__delitem__
and :py~Dataset.drop
no longer delete/drop variables that have dimensions matching a deleted/dropped variable.DataArray.coords.__delitem__
is now allowed on variables matching dimension names..sel
and.loc
now handle indexing along a dimension without coordinate labels by doing integer based indexing. Seeindexing.missing_coordinates
for an example.- :py
~Dataset.indexes
is no longer guaranteed to include all dimension names as keys. The new method :py~Dataset.get_index
has been added, which is guaranteed to return an index for a dimension, falling back to a default RangeIndex
if necessary.
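The implications listed above can be checked on a small dataset without coordinates. This is only a sketch; the variable name `foo` and the values are arbitrary.

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({"foo": (("x", "y"), [[1, 2], [3, 4], [5, 6]])})

has_x_index = "x" in ds.indexes    # no index was created for "x"
default_index = ds.get_index("x")  # falls back to a default RangeIndex

# Without coordinate labels, .sel falls back to integer-based indexing.
row = ds["foo"].sel(x=1)
```

Code that previously relied on `ds.indexes` containing every dimension can switch to `get_index`, which always returns an index.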
- The default behavior of
merge
is nowcompat='no_conflicts'
, so some merges will now succeed in cases that previously raisedxarray.MergeError
. Setcompat='broadcast_equals'
to restore the previous default. Seecombining.no_conflicts
for more details. - Reading :py
~DataArray.values
no longer always caches values in a NumPy array1128
. Caching of.values
on variables read from netCDF files on disk is still the default when :pyopen_dataset
is called withcache=True
. By Guido Imperiale and Stephan Hoyer. - Pickling a
Dataset
orDataArray
linked to a file on disk no longer caches its values into memory before pickling (1128
). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer. - Coordinates used to index a dimension are now loaded eagerly into :py
pandas.Index
objects, instead of loading the values lazily. By Guido Imperiale. - Automatic levels for 2d plots are now guaranteed to land on
vmin
andvmax
when these kwargs are explicitly provided (1191
). The automated level selection logic also slightly changed. By Fabien Maussion. DataArray.rename()
behavior changed to strictly change theDataArray.name
if called with string argument, or strictly change coordinate names if called with dict-like argument. By Markus Gonser.- By default
to_netcdf()
adds a _FillValue = NaN
attribute to float types. By Frederic Laliberte. repr
onDataArray
objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (1207
). By Stephan Hoyer.- xarray no longer supports python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
- Renamed the
Coordinate
class from xarray's low level API to :py~xarray.IndexVariable
.Variable.to_variable
andVariable.to_coord
have been renamed to :py~xarray.Variable.to_base_variable
and :py~xarray.Variable.to_index_variable
. - Deprecated supplying
coords
as a dictionary to theDataArray
constructor without also supplying an explicitdims
argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (727
). - Removed a number of methods deprecated since v0.7.0 or earlier:
load_data
,vars
,drop_vars
,dump
,dumps
and thevariables
keyword argument toDataset
. - Removed the dummy module that enabled
import xray
.
- Added new method :py
~DataArray.combine_first
toDataArray
andDataset
, based on the pandas method of the same name (seecombine
). By Chun-Wei Yuan. - Added the ability to change default automatic alignment (arithmetic_join="inner") for binary operations via :py
~xarray.set_options()
(seemath automatic alignment
). By Chun-Wei Yuan. - Add checking of
attr
names and values when saving to netCDF, raising useful error messages if they are invalid. (911
). By Robin Wilson. - Added ability to save
DataArray
objects directly to netCDF files using :py~xarray.DataArray.to_netcdf
, and to load directly from netCDF files using :py~xarray.open_dataarray
(915
). These remove the need to convert aDataArray
to aDataset
before saving as a netCDF file, and deal with names to ensure a perfect 'roundtrip' capability. By Robin Wilson. - Multi-index levels are now accessible as "virtual" coordinate variables, e.g.,
ds['time']
can pull out the'time'
level of a multi-index (seecoordinates
).sel
also accepts providing multi-index levels as keyword arguments, e.g.,ds.sel(time='2000-01')
(seemulti-level indexing
). By Benoit Bovy. - Added
set_index
,reset_index
andreorder_levels
methods to easily create and manipulate (multi-)indexes (seereshape.set_index
). By Benoit Bovy. - Added the
compat
option'no_conflicts'
tomerge
, allowing the combination of xarray objects with disjoint (742
) or overlapping (835
) coordinates as long as all present data agrees. By Johnnie Gray. Seecombining.no_conflicts
for more details. - It is now possible to set
concat_dim=None
explicitly in :py~xarray.open_mfdataset
to disable inferring a dimension along which to concatenate. By Stephan Hoyer. - Added methods :py
DataArray.compute
, :pyDataset.compute
, and :pyVariable.compute
as a non-mutating alternative to :py~DataArray.load
. By Guido Imperiale. - Adds DataArray and Dataset methods :py
~xarray.DataArray.cumsum
and :py~xarray.DataArray.cumprod
. By Phillip J. Wolfram. - New properties :py
Dataset.sizes
and :pyDataArray.sizes
for providing consistent access to dimension length on bothDataset
andDataArray
(921
). By Stephan Hoyer. - New keyword argument
drop=True
for :py~DataArray.sel
, :py~DataArray.isel
and :py~DataArray.squeeze
for dropping scalar coordinates that arise from indexing.DataArray
(242
). By Stephan Hoyer. - New top-level functions :py
~xarray.full_like
, :py~xarray.zeros_like
, and :py~xarray.ones_like
By Guido Imperiale. - Overriding a preexisting attribute with :py
~xarray.register_dataset_accessor
or :py~xarray.register_dataarray_accessor
now issues a warning instead of raising an error (1082
). By Stephan Hoyer. - Options for axes sharing between subplots are exposed to :py
~xarray.plot.FacetGrid
and :py~xarray.plot.plot
, so axes sharing can be disabled for polar plots. By Bas Hoonhout. - New utility functions :py
~xarray.testing.assert_equal
, :py~xarray.testing.assert_identical
, and :py~xarray.testing.assert_allclose
for asserting relationships between xarray objects, designed for use in a pytest test suite. figsize
,size
andaspect
plot arguments are now supported for all plots (897
). Seeplotting.figsize
for more details. By Stephan Hoyer and Fabien Maussion.- New :py
~Dataset.info
method to summarizeDataset
variables and attributes. The method prints to a buffer (e.g.stdout
) with output similar to what the command line utilityncdump -h
produces (1150
). By Joe Hamman. - Added the ability to write unlimited netCDF dimensions with the
scipy
andnetcdf4
backends via the new xarray.Dataset.encoding
attribute or via the unlimited_dims
argument to xarray.Dataset.to_netcdf
. By Joe Hamman. - New :py
~DataArray.quantile
method to calculate quantiles from DataArray objects (1187
). By Joe Hamman.
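As an illustration of the `'no_conflicts'` option for `merge` described in the list above, the sketch below merges two datasets that each hold part of the same variable. The datasets are invented, and `compat` and `join` are passed explicitly so the example does not depend on the defaults of a particular xarray version.

```python
import numpy as np
import xarray as xr

a = xr.Dataset({"v": ("x", [1.0, np.nan, 3.0])}, coords={"x": [0, 1, 2]})
b = xr.Dataset({"v": ("x", [np.nan, 2.0, 3.0])}, coords={"x": [0, 1, 2]})

# Missing values are filled from the other object as long as all
# non-null values agree.
merged = xr.merge([a, b], compat="no_conflicts", join="outer")

# A genuinely conflicting value still raises MergeError.
c = xr.Dataset({"v": ("x", [9.0, 2.0, 3.0])}, coords={"x": [0, 1, 2]})
try:
    xr.merge([a, c], compat="no_conflicts", join="outer")
    conflict_raised = False
except xr.MergeError:
    conflict_raised = True
```

The first merge succeeds because the overlapping non-null values are identical; the second fails because `a` and `c` disagree at `x=0`.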
groupby_bins
now restores empty bins by default (1019
). By Ryan Abernathey.- Fix issues for dates outside the valid range of pandas timestamps (
975
). By Mathias Hauser. - Unstacking produced flipped array after stacking decreasing coordinate values (
980
). By Stephan Hoyer. - Setting
dtype
via theencoding
parameter ofto_netcdf
failed if the encoded dtype was the same as the dtype of the original array (873
). By Stephan Hoyer. - Fix issues with variables where both attributes
_FillValue
andmissing_value
are set toNaN
(997
). By Marco Zühlke. .where()
and.fillna()
now preserve attributes (1009
). By Fabien Maussion.- Applying :py
broadcast()
to an xarray object based on the dask backend won't accidentally convert the array from dask to numpy anymore (978
). By Guido Imperiale. Dataset.concat()
now preserves variables order (1027
). By Fabien Maussion.- Fixed an issue with pcolormesh (
781
). A newinfer_intervals
keyword gives control over whether the cell intervals should be computed or not. By Fabien Maussion. - Grouping over a dimension with non-unique values with
groupby
gives correct groups. By Stephan Hoyer. - Fixed accessing coordinate variables with non-string names from
.coords
. By Stephan Hoyer. - :py
~xarray.DataArray.rename
now simultaneously renames the array and any coordinate with the same name, when supplied via a :pydict
(1116
). By Yves Delley. - Fixed sub-optimal performance in certain operations with object arrays (
1121
). By Yves Delley. - Fix
.groupby(group)
whengroup
has datetime dtype (1132
). By Jonas Sølvsteen. - Fixed a bug with facetgrid (the
norm
keyword was ignored,1159
). By Fabien Maussion. - Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (
1172
). By Stephan Hoyer. - Fix to make
.copy()
actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (1180
). By Stephan Hoyer. - Fix opening NetCDF files with multi-dimensional time variables (
1229
). By Stephan Hoyer.
xarray.Dataset.isel_points
andxarray.Dataset.sel_points
now use vectorised indexing in numpy and dask (1161
), which can result in several orders of magnitude speedup. By Jonathan Chambers.
This release includes a number of bug fixes and minor enhancements.
- :py
~xarray.broadcast
and :py~xarray.concat
now auto-align inputs, using join='outer'
. Previously, these functions raisedValueError
for non-aligned inputs. By Guido Imperiale.
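The effect of the automatic outer join can be seen on two arrays with partially overlapping labels. The arrays below are hypothetical and exist only for this sketch.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([10.0, 20.0], dims="x", coords={"x": [1, 2]})

# The inputs are outer-joined on "x"; labels missing from one input
# are filled with NaN in that input's result.
a2, b2 = xr.broadcast(a, b)
```

Both outputs share the union of the `x` labels, so they can be combined elementwise afterwards.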
- New documentation on
panel transition
. By Maximilian Roos. - New
Dataset
andDataArray
methods :py~xarray.Dataset.to_dict
and :py~xarray.Dataset.from_dict
to allow easy conversion between dictionaries and xarray objects (432
). Seedictionary IO<dictionary io>
for more details. By Julia Signell. - Added
exclude
andindexes
optional parameters to :py~xarray.align
, andexclude
optional parameter to :py~xarray.broadcast
. By Guido Imperiale. - Better error message when assigning variables without dimensions (
971
). By Stephan Hoyer. - Better error message when reindex/align fails due to duplicate index values (
956
). By Stephan Hoyer.
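A round trip through the dictionary representation mentioned in the list above might look like the following sketch. The dataset contents and attribute values are assumptions made for the example.

```python
import xarray as xr

ds = xr.Dataset(
    {"t": ("x", [1.5, 2.5, 3.5], {"units": "K"})},
    coords={"x": [10, 20, 30]},
    attrs={"title": "example"},
)

as_dict = ds.to_dict()                    # nested plain-Python dictionary
restored = xr.Dataset.from_dict(as_dict)  # rebuild an equivalent Dataset
```

Because the intermediate object is made of ordinary dictionaries and lists, it can be passed to tools that do not know about xarray before being converted back.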
- Ensure xarray works with h5netcdf v0.3.0 for arrays with
dtype=str
(953
). By Stephan Hoyer. Dataset.__dir__()
(i.e. the method python calls to get autocomplete options) failed if one of the dataset's keys was not a string (852
). By Maximilian Roos.Dataset
constructor can now take arbitrary objects as values (647
). By Maximilian Roos.- Clarified
copy
argument for :py~xarray.DataArray.reindex
and :py~xarray.align
, which now consistently always return new xarray objects (927
). - Fix
open_mfdataset
withengine='pynio'
(936
). By Stephan Hoyer. groupby_bins
sorted bin labels as strings (952
). By Stephan Hoyer.- Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (
956
).
- Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (
943
). By Stephan Hoyer.
This release includes four months of new features and bug fixes, including several breaking changes.
- Dropped support for Python 2.6 (
855
). - Indexing on a multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (
802
). - Contour plots no longer add a colorbar by default (
866
). Filled contour plots are unchanged. DataArray.values
and.data
now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (867
). Previously,.values
returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the.item()
method.
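The difference between `.values` and `.item()` on a scalar array is sketched below. For simplicity the example uses a float scalar rather than an object-dtype one, which is an assumption of this sketch; the point it illustrates is the array versus native-object distinction.

```python
import numpy as np
import xarray as xr

scalar = xr.DataArray(3.5)

as_array = scalar.values   # 0-dimensional NumPy array
as_python = scalar.item()  # native Python object
```

Use `.item()` whenever a plain Python value is needed, for example as a dictionary key or in string formatting.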
- Groupby operations now support grouping over multidimensional variables. A new method called :py
~xarray.Dataset.groupby_bins
has also been added to allow users to specify bins for grouping. The new features are described ingroupby.multidim
and/examples/multidimensional-coords.ipynb
. By Ryan Abernathey. - DataArray and Dataset method :py
where
now supports adrop=True
option that clips coordinate elements that are fully masked. By Phillip J. Wolfram. - New top level :py
merge
function allows for combining variables from any number ofDataset
and/orDataArray
variables. Seemerge
for more details. By Stephan Hoyer. - :py
DataArray.resample
and :pyDataset.resample
now support thekeep_attrs=False
option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon. - Better multi-index support in :py
DataArray.sel
, :pyDataArray.loc
, :pyDataset.sel
and :pyDataset.loc
, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (seemulti-level indexing
). By Benoit Bovy. - New (experimental) decorators :py
~xarray.register_dataset_accessor
and :py~xarray.register_dataarray_accessor
for registering custom xarray extensions without subclassing. They are described in the new documentation page oninternals
. By Stephan Hoyer. - Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
- 2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (
872
). By Fabien Maussion. - New Dataset method :py
Dataset.filter_by_attrs
, akin to netCDF4.Dataset.get_variables_by_attributes
, to easily filter data variables using their attributes. By Filipe Fernandes.
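Attribute-based filtering, as described for `Dataset.filter_by_attrs` above, might look like the following sketch. The variable names and `units` values are invented for illustration, and the code uses the current `xarray` import name.

```python
import xarray as xr

# Two data variables that differ only in their "units" attribute
# (names and values are illustrative assumptions).
ds = xr.Dataset(
    {
        "temperature": ("x", [280.0, 285.0], {"units": "K"}),
        "pressure": ("x", [1000.0, 990.0], {"units": "hPa"}),
    }
)

# Keep only the data variables whose attributes match the keyword.
kelvin_only = ds.filter_by_attrs(units="K")
selected = list(kelvin_only.data_vars)
```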
- Attributes were being retained by default for some resampling operations when they should not. With the
keep_attrs=False
option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding thekeep_attrs=True
option. By Jeremy McGibbon. - Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (
875
). By Stephan Hoyer. - Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays
861
,837
. By Maciek Swat. - decode_cf_timedelta
now accepts arrays with ndim
>1 (842
). This fixes issue
665
. By Filipe Fernandes.
- Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (
876
). By Stephan Hoyer. - Support for pickling functions from
xarray.ufuncs
(901
). By Stephan Hoyer. - Variable.copy(deep=True)
no longer converts MultiIndex into a base Index (769
). By Benoit Bovy. - Fixes for groupby on dimensions with a multi-index (
867
). By Stephan Hoyer. - Fix printing datasets with unicode attributes on Python 2 (
892
). By Stephan Hoyer. - Fixed incorrect test for dask version (
891
). By Stephan Hoyer. - Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
- :py
~xarray.plot.contour
now plots the correct number of contours (866
). By Fabien Maussion.
This release includes two new, entirely backwards compatible features and several bug fixes.
- New DataArray method :py
DataArray.dot
for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil. - Rolling window operations on DataArray objects are now supported via a new
DataArray.rolling
method. For example: In [1]: import xarray as xr
...: import numpy as np
In [2]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y"))
In [3]: arr Out[3]: <xarray.DataArray (x: 3, y: 5)> array([[ 0. , 0.5, 1. , 1.5, 2. ], [ 2.5, 3. , 3.5, 4. , 4.5], [ 5. , 5.5, 6. , 6.5, 7. ]]) Coordinates: * x (x) int64 0 1 2 * y (y) int64 0 1 2 3 4
In [4]: arr.rolling(y=3, min_periods=2).mean() Out[4]: <xarray.DataArray (x: 3, y: 5)> array([[ nan, 0.25, 0.5 , 1. , 1.5 ], [ nan, 2.75, 3. , 3.5 , 4. ], [ nan, 5.25, 5.5 , 6. , 6.5 ]]) Coordinates: * x (x) int64 0 1 2 * y (y) int64 0 1 2 3 4
See
comput.rolling
for more details. By Joe Hamman.
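For the `DataArray.dot` method mentioned above, a small sketch of a product along a shared dimension could look like this. The array values are arbitrary and chosen only so the result is easy to check by hand.

```python
import numpy as np
import xarray as xr

# a has dimensions (x, y); b has only the shared dimension y.
a = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray([1, 10, 100], dims="y")

# The shared dimension "y" is summed over, leaving only "x".
product = a.dot(b)
```

The dimensions are matched by name, so the order in which `a` declares `x` and `y` does not matter.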
- Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute
projection
, allowing Cartopy to handle the extent of pcolormesh plots (781
). By Joe Hamman. - 2D plots now better handle additional coordinates which are not
DataArray
dimensions (788
). By Fabien Maussion.
This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.
- Numerical operations now return empty objects on no overlapping labels rather than raising
ValueError
(739
). - :py
~pandas.Series
is now supported as valid input to theDataset
constructor (740
).
- Restore checks for shape consistency between data and coordinates in the DataArray constructor (
758
). - Single dimension variables no longer transpose as part of a broader
.transpose
. This behavior was causingpandas.PeriodIndex
dimensions to lose their type (749
) - :py
~xarray.Dataset
labels remain as their native type on.to_dataset
. Previously they were coerced to strings (745
) - Fixed a bug where replacing a
DataArray
index coordinate would improperly align the coordinate (725
). - DataArray.reindex_like
now maintains the dtype of complex numbers when reindexing leads to NaN values (738
). - Dataset.rename
and DataArray.rename
support the old and new names being the same (724
). - Fix
~xarray.Dataset.from_dataframe
for DataFrames with Categorical column and a MultiIndex index (737
). - Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.
The following individuals contributed to this release:
- Edward Richards
- Maximilian Roos
- Rafael Guedes
- Spencer Hill
- Stephan Hoyer
This major release includes a redesign of xarray.DataArray
internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex
, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.
The project formerly known as "xray" is now "xarray", pronounced "x-array"! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:
To ease the transition, we have simultaneously released v0.7.0 of both xray
and xarray
on the Python Package Index. These packages are identical. For now, import xray
still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr
.
The internal data model used by
xray.DataArray
has been rewritten to fix several outstanding issues (367
,634
, this stackoverflow report). Internally, DataArray
is now implemented in terms of ._variable
and ._coords
attributes instead of holding variables in a Dataset
object. This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.
In practice, this means that creating a DataArray with the same
name
as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here's the old behavior:In [2]: xray.DataArray([4, 5, 6], dims="x", name="x") Out[2]: <xray.DataArray 'x' (x: 3)> array([4, 5, 6]) Coordinates: * x (x) int64 4 5 6
and the new behavior (compare the values of the
x
coordinate):In [2]: xray.DataArray([4, 5, 6], dims="x", name="x") Out[2]: <xray.DataArray 'x' (x: 3)> array([4, 5, 6]) Coordinates: * x (x) int64 0 1 2
- It is no longer possible to convert a DataArray to a Dataset with
xray.DataArray.to_dataset
if it is unnamed. This will now raiseValueError
. If the array is unnamed, you need to supply thename
argument.
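The `to_dataset` change can be sketched as below. The example uses the current `xarray` import name rather than `xray`, and the variable name `"foo"` is an arbitrary choice.

```python
import xarray as xr

unnamed = xr.DataArray([4, 5, 6], dims="x")

# Converting an unnamed array without a name is expected to fail.
try:
    unnamed.to_dataset()
    raised = False
except ValueError:
    raised = True

# Supplying the name explicitly makes the conversion well defined.
named = unnamed.to_dataset(name="foo")
```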
Basic support for :py
~pandas.MultiIndex
coordinates on xray objects, including indexing, :py~DataArray.stack
and :py~DataArray.unstack
:In [7]: df = pd.DataFrame({"foo": range(3), "x": ["a", "b", "b"], "y": [0, 0, 1]})
In [8]: s = df.set_index(["x", "y"])["foo"]
In [12]: arr = xray.DataArray(s, dims="z")
In [13]: arr Out[13]: <xray.DataArray 'foo' (z: 3)> array([0, 1, 2]) Coordinates: * z (z) object ('a', 0) ('b', 0) ('b', 1)
In [19]: arr.indexes["z"] Out[19]: MultiIndex(levels=[[u'a', u'b'], [0, 1]], labels=[[0, 1, 1], [0, 0, 1]], names=[u'x', u'y'])
In [14]: arr.unstack("z") Out[14]: <xray.DataArray 'foo' (x: 2, y: 2)> array([[ 0., nan], [ 1., 2.]]) Coordinates: * x (x) object 'a' 'b' * y (y) int64 0 1
In [26]: arr.unstack("z").stack(z=("x", "y")) Out[26]: <xray.DataArray 'foo' (z: 4)> array([ 0., nan, 1., 2.]) Coordinates: * z (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)
See
reshape.stack
for more details.Warning
xray's MultiIndex support is still experimental, and we have a long to-do list of desired additions (
719
), including better display of multi-index levels when printing aDataset
, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.- Support for reading GRIB, HDF4 and other file formats via PyNIO. See
io.pynio
for more details. - Better error message when a variable is supplied with the same name as one of its dimensions.
- Plotting: more control on colormap parameters (
642
).vmin
andvmax
will not be silently ignored anymore. Settingcenter=False
prevents automatic selection of a divergent colormap. New
xray.Dataset.shift
andxray.Dataset.roll
methods for shifting/rotating datasets or arrays along a dimension:python
array = xray.DataArray([5, 6, 7, 8], dims="x") array.shift(x=2) array.roll(x=2)
Notice that
shift
moves data independently of coordinates, butroll
moves both data and coordinates.- Assigning a
pandas
object directly as aDataset
variable is now permitted. Its index names correspond to thedims
of theDataset
, and its data is aligned. - Passing a :py
pandas.DataFrame
orpandas.Panel
to a Dataset constructor is now permitted. New function
xray.broadcast
for explicitly broadcastingDataArray
andDataset
objects against each other. For example:python
a = xray.DataArray([1, 2, 3], dims="x") b = xray.DataArray([5, 6], dims="y") a b a2, b2 = xray.broadcast(a, b) a2 b2
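To make the effect of explicit broadcasting concrete, here is the same pair of arrays with the resulting shapes inspected. This sketch assumes the current `xarray` import name; the shape checks are an addition for illustration.

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x")
b = xr.DataArray([5, 6], dims="y")

# Both results carry the union of dimensions, here (x, y).
a2, b2 = xr.broadcast(a, b)
shapes = (a2.shape, b2.shape)
```

After broadcasting, element-wise operations between `a2` and `b2` no longer depend on implicit alignment of dimensions.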
- Fixes for several issues found on
DataArray
objects with the same name as one of their coordinates (seev0.7.0.breaking
for more details). DataArray.to_masked_array
always returns masked array with mask being an array (not a scalar value) (684
)- Allows for (imperfect) repr of Coords when underlying index is PeriodIndex (
645
). - Attempting to assign a
Dataset
orDataArray
variable/attribute using attribute-style syntax (e.g.,ds.foo = 42
) now raises an error rather than silently failing (656
,714
). - You can now pass pandas objects with non-numpy dtypes (e.g.,
categorical
ordatetime64
with a timezone) into xray without an error (716
).
The following individuals contributed to this release:
- Antony Lee
- Fabien Maussion
- Joe Hamman
- Maximilian Roos
- Stephan Hoyer
- Takeshi Kanmae
- femtotrader
This release contains a number of bug and compatibility fixes, as well as enhancements to plotting, indexing and writing files to disk.
Note that the minimum required version of dask for use with xray is now version 0.6.
- The handling of colormaps and discrete color lists for 2D plots in
xray.DataArray.plot
was changed to provide more compatibility with matplotlib'scontour
andcontourf
functions (538
). Now discrete lists of colors should be specified usingcolors
keyword, rather thancmap
.
- Faceted plotting through
xray.plot.FacetGrid
and thexray.plot.plot
method. Seeplotting.faceting
for more details and examples. xray.Dataset.sel
andxray.Dataset.reindex
now support thetolerance
argument for controlling nearest-neighbor selection (629
):In [5]: array = xray.DataArray([1, 2, 3], dims="x")
In [6]: array.reindex(x=[0.9, 1.5], method="nearest", tolerance=0.2) Out[6]: <xray.DataArray (x: 2)> array([ 2., nan]) Coordinates: * x (x) float64 0.9 1.5
This feature requires pandas v0.17 or newer.
- New
encoding
argument inxray.Dataset.to_netcdf
for writing netCDF files with compression, as described in the new documentation section onio.netcdf.writing_encoded
. - Add
xray.Dataset.real
andxray.Dataset.imag
attributes to Dataset and DataArray (553
). - More informative error message with
xray.Dataset.from_dataframe
if the frame has duplicate columns. - xray now uses deterministic names for dask arrays it creates or opens from disk. This allows xray users to take advantage of dask's nascent support for caching intermediate computation results. See
555
for an example.
- Forwards compatibility with the latest pandas release (v0.17.0). We were using some internal pandas routines for datetime conversion, which unfortunately have now changed upstream (
569
). - Aggregation functions now correctly skip
NaN
for data forcomplex128
dtype (554
). - Fixed indexing 0d arrays with unicode dtype (
568
). xray.DataArray.name
and Dataset keys must be a string or None to be written to netCDF (533
).xray.DataArray.where
now uses dask instead of numpy if either the array orother
is a dask array. Previously, ifother
was a numpy array the method was evaluated eagerly.- Global attributes are now handled more consistently when loading remote datasets using
engine='pydap'
(574
). - It is now possible to assign to the
.data
attribute of DataArray objects. coordinates
attribute is now kept in the encoding dictionary after decoding (610
).- Compatibility with numpy 1.10 (
617
).
The following individuals contributed to this release:
- Ryan Abernathey
- Pete Cable
- Clark Fitzgerald
- Joe Hamman
- Stephan Hoyer
- Scott Sinclair
This release includes numerous bug fixes and enhancements. Highlights include the introduction of a plotting module and the new Dataset and DataArray methods xray.Dataset.isel_points
, xray.Dataset.sel_points
, xray.Dataset.where
and xray.Dataset.diff
. There are no breaking changes from v0.5.2.
- Plotting methods have been implemented on DataArray objects
xray.DataArray.plot
through integration with matplotlib (185
). For an introduction, seeplotting
. - Variables in netCDF files with multiple missing values are now decoded as NaN after issuing a warning if open_dataset is called with mask_and_scale=True.
- We clarified our rules for when the result from an xray operation is a copy vs. a view (see
copies_vs_views
for more details). - Dataset variables are now written to netCDF files in order of appearance when using the netcdf4 backend (
479
). Added
xray.Dataset.isel_points
andxray.Dataset.sel_points
to support pointwise indexing of Datasets and DataArrays (475
).- In [1]: da = xray.DataArray(
...: np.arange(56).reshape((7, 8)), ...: coords={"x": list("abcdefg"), "y": 10 * np.arange(8)}, ...: dims=["x", "y"], ...: )
- y (y) int64 0 10 20 30 40 50 60 70
- x (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'
# we can index by position along each dimension In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim="points") Out[3]: <xray.DataArray (points: 3)> array([ 0, 9, 48]) Coordinates: y (points) int64 0 10 0 x (points) |S1 'a' 'b' 'g' * points (points) int64 0 1 2
# or equivalently by label In [9]: da.sel_points(x=["a", "b", "g"], y=[0, 10, 0], dim="points") Out[9]: <xray.DataArray (points: 3)> array([ 0, 9, 48]) Coordinates: y (points) int64 0 10 0 x (points) |S1 'a' 'b' 'g' * points (points) int64 0 1 2
New
xray.Dataset.where
method for masking xray objects according to some criteria. This works particularly well with multi-dimensional data:python
ds = xray.Dataset(coords={"x": range(100), "y": range(100)})
ds["distance"] = np.sqrt(ds.x**2 + ds.y**2)
@savefig where_example.png width=4in height=4in ds.distance.where(ds.distance < 100).plot()
- Added new methods
xray.DataArray.diff
andxray.Dataset.diff
for finite difference calculations along a given axis. New
xray.DataArray.to_masked_array
convenience method for returning a numpy.ma.MaskedArray.python
da = xray.DataArray(np.random.random_sample(size=(5, 4))) da.where(da < 0.5) da.where(da < 0.5).to_masked_array(copy=True)
- Added new flag "drop_variables" to
xray.open_dataset
for excluding variables from being parsed. This may be useful to drop variables with problems or inconsistent values.
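The `diff`, `where` and `to_masked_array` methods listed above can be combined as in this small sketch. The input values and the threshold are arbitrary, and the code is written against the current `xarray` import name.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 4.0, 9.0, 16.0], dims="x")

steps = da.diff("x")              # finite differences along x
masked = da.where(da < 5)         # entries failing the condition become NaN
as_ma = masked.to_masked_array()  # NaN entries become masked values
```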
- Fixed aggregation functions (e.g., sum and mean) on big-endian arrays when bottleneck is installed (
489
). - Dataset aggregation functions dropped variables with unsigned integer dtype (
505
). - .any()
and .all()
were not lazy when used on xray objects containing dask arrays. - Fixed an error when attempting to save datetime64 variables to netCDF files when the first element is
NaT
(528
). - Fix pickle on DataArray objects (
515
). - Fixed unnecessary coercion of float64 to float32 when using netcdf3 and netcdf4_classic formats (
526
).
This release contains bug fixes, several additional options for opening and saving netCDF files, and a backwards incompatible rewrite of the advanced options for xray.concat
.
- The optional arguments
concat_over
andmode
inxray.concat
have been removed and replaced bydata_vars
andcoords
. The new arguments are both more easily understood and more robustly implemented, and allowed us to fix a bug whereconcat
accidentally loaded data into memory. If you set values for these optional arguments manually, you will need to update your code. The default behavior should be unchanged.
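A rough sketch of how the `data_vars` argument changes the result is shown below. It assumes the current `xarray` import name and the option values `"minimal"` and `"all"`; the dataset contents are invented for illustration.

```python
import xarray as xr

# Two pieces along "t"; "b" is a scalar that is identical in both.
ds1 = xr.Dataset({"a": ("t", [1, 2]), "b": 5}, coords={"t": [0, 1]})
ds2 = xr.Dataset({"a": ("t", [3, 4]), "b": 5}, coords={"t": [2, 3]})

# "minimal": only variables that already contain "t" are concatenated.
minimal = xr.concat([ds1, ds2], dim="t", data_vars="minimal", coords="minimal")

# "all": every data variable is concatenated, so "b" gains the "t" dimension.
everything = xr.concat([ds1, ds2], dim="t", data_vars="all", coords="minimal")
```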
xray.open_mfdataset
now supports apreprocess
argument for preprocessing datasets prior to concatenation. This is useful if datasets cannot be otherwise merged automatically, e.g., if the original datasets have conflicting index coordinates (443
).xray.open_dataset
andxray.open_mfdataset
now use a global thread lock by default for reading from netCDF files with dask. This avoids possible segmentation faults for reading from netCDF4 files when HDF5 is not configured properly for concurrent access (444
).- Added support for serializing arrays of complex numbers with engine='h5netcdf'.
The new
xray.save_mfdataset
function allows for saving multiple datasets to disk simultaneously. This is useful when processing large datasets with dask.array. For example, to save a dataset too big to fit into memory to one file per year, we could write:In [1]: years, datasets = zip(*ds.groupby("time.year"))
In [2]: paths = ["%s.nc" % y for y in years]
In [3]: xray.save_mfdataset(datasets, paths)
- Fixed
min
,max
,argmin
andargmax
for arrays with string or unicode types (453
). xray.open_dataset
andxray.open_mfdataset
support supplying chunks as a single integer.- Fixed a bug in serializing scalar datetime variable to netCDF.
- Fixed a bug that could occur in serialization of 0-dimensional integer arrays.
- Fixed a bug where concatenating DataArrays was not always lazy (
464
). - When reading datasets with h5netcdf, bytes attributes are decoded to strings. This allows conventions decoding to work properly on Python 3 (
451
).
This minor release fixes a few bugs and an inconsistency with pandas. It also adds the pipe
method, copied from pandas.
- Added
xray.Dataset.pipe
, replicating the new pandas method in version 0.16.2. Seetransforming datasets
for more details. xray.Dataset.assign
andxray.Dataset.assign_coords
now assign new variables in sorted (alphabetical) order, mirroring the behavior in pandas. Previously, the order was arbitrary.
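The `pipe` method can be sketched as follows. `add_offset` is a hypothetical helper defined only for this example, and the code uses the current `xarray` import name.

```python
import xarray as xr


def add_offset(ds, offset):
    """Hypothetical helper: shift every data variable by a constant."""
    return ds + offset


ds = xr.Dataset({"y": ("x", [1, 2, 3])})

# Equivalent to add_offset(ds, offset=10), but reads left to right in a chain.
shifted = ds.pipe(add_offset, offset=10)
```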
xray.concat
fails in an edge case involving identical coordinate variables (425
)- We now decode variables loaded from netCDF3 files with the scipy engine using native endianness (
416
). This resolves an issue when aggregating these arrays with bottleneck installed.
The headline feature in this release is experimental support for out-of-core computing (data that doesn't fit into memory) with user-guide/dask
. This includes a new top-level function xray.open_mfdataset
that makes it easy to open a collection of netCDF files (using dask) as a single xray.Dataset
object. For more on dask, read the blog post introducing xray + dask and the new documentation section user-guide/dask
.
Dask makes it possible to harness parallelism and manipulate gigantic datasets with xray. It is currently an optional dependency, but it may become required in the future.
The logic used for choosing which variables are concatenated with
xray.concat
has changed. Previously, by default any variables which were equal across a dimension were not concatenated. This led to some surprising behavior, where the behavior of groupby and concat operations could depend on runtime values (268
). For example:In [1]: ds = xray.Dataset({"x": 0})
In [2]: xray.concat([ds, ds], dim="y") Out[2]: <xray.Dataset> Dimensions: () Coordinates: empty Data variables: x int64 0
Now, the default always concatenates data variables:
python
ds = xray.Dataset({"x": 0})
python
xray.concat([ds, ds], dim="y")
To obtain the old behavior, supply the argument
concat_over=[]
.
New
xray.Dataset.to_dataarray
and enhancedxray.DataArray.to_dataset
methods make it easy to switch back and forth between arrays and datasets:python
ds = xray.Dataset(
    {"a": 1, "b": ("x", [1, 2, 3])},
    coords={"c": 42},
    attrs={"Conventions": "None"},
)
ds.to_dataarray()
ds.to_dataarray().to_dataset(dim="variable")
New
xray.Dataset.fillna
method to fill missing values, modeled off the pandas method of the same name:python
array = xray.DataArray([np.nan, 1, np.nan, 3], dims="x") array.fillna(0)
fillna
works on bothDataset
andDataArray
objects, and uses index based alignment and broadcasting like standard binary operations. It also can be applied by group, as illustrated in/examples/weather-data.ipynb#Fill-missing-values-with-climatology
.New
xray.Dataset.assign
andxray.Dataset.assign_coords
methods patterned off the new :pyDataFrame.assign <pandas.DataFrame.assign>
method in pandas:python
ds = xray.Dataset({"y": ("x", [1, 2, 3])}) ds.assign(z=lambda ds: ds.y**2) ds.assign_coords(z=("x", ["a", "b", "c"]))
These methods return a new Dataset (or DataArray) with updated data or coordinate variables.
xray.Dataset.sel
now supports themethod
parameter, which works like the parameter of the same name onxray.Dataset.reindex
. It provides a simple interface for doing nearest-neighbor interpolation:In [12]: ds.sel(x=1.1, method="nearest") Out[12]: <xray.Dataset> Dimensions: () Coordinates: x int64 1 Data variables: y int64 2
In [13]: ds.sel(x=[1.1, 2.1], method="pad") Out[13]: <xray.Dataset> Dimensions: (x: 2) Coordinates: * x (x) int64 1 2 Data variables: y (x) int64 2 3
See
nearest neighbor lookups
for more details.- You can now control the underlying backend used for accessing remote datasets (via OPeNDAP) by specifying
engine='netcdf4'
orengine='pydap'
. - xray now provides experimental support for reading and writing netCDF4 files directly via h5py with the h5netcdf package, avoiding the netCDF4-Python package. You will need to install h5netcdf and specify
engine='h5netcdf'
to try this feature. - Accessing data from remote datasets now has retrying logic (with exponential backoff) that should make it robust to occasional bad responses from DAP servers.
You can control the width of the Dataset repr with
xray.set_options
. It can be used either as a context manager, in which case the default is restored outside the context:python
ds = xray.Dataset({"x": np.arange(1000)}) with xray.set_options(display_width=40): print(ds)
Or to set a global option:
In [1]: xray.set_options(display_width=80)
The default value for the
display_width
option is 80.
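Several of the methods introduced in this release (`fillna`, `assign` and `sel` with `method`) can be exercised together in one short sketch. It is written against the current `xarray` import name, and the `x` coordinate is given explicitly as an assumption so that label-based selection has labels to work with.

```python
import numpy as np
import xarray as xr

array = xr.DataArray([np.nan, 1.0, np.nan, 3.0], dims="x")
filled = array.fillna(0)  # missing values replaced by 0

ds = xr.Dataset({"y": ("x", [1, 2, 3])}, coords={"x": [0, 1, 2]})
with_z = ds.assign(z=lambda d: d.y ** 2)   # returns a new Dataset
nearest = ds.sel(x=1.1, method="nearest")  # picks the closest label, x=1
```

Because `assign` returns a new object, `ds` itself is left without the `z` variable.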
- The method
load_data()
has been renamed to the more succinctxray.Dataset.load
.
The release contains bug fixes and several new features. All changes should be fully backwards compatible.
- New documentation sections on
time-series
andcombining multiple files
. xray.Dataset.resample
lets you resample a dataset or data array to a new temporal resolution. The syntax is the same as pandas, except you need to supply the time dimension explicitly:python
time = pd.date_range("2000-01-01", freq="6H", periods=10) array = xray.DataArray(np.arange(10), [("time", time)]) array.resample("1D", dim="time")
You can specify how to do the resampling with the
how
argument and other options such asclosed
andlabel
let you control labeling:python
array.resample("1D", dim="time", how="sum", label="right")
If the desired temporal resolution is higher than the original data (upsampling), xray will insert missing values:
python
array.resample("3H", "time")
first
andlast
methods on groupby objects let you take the first or last examples from each group along the grouped axis:python
array.groupby("time.day").first()
These methods combine well with
resample
:python
array.resample("1D", dim="time", how="first")
xray.Dataset.swap_dims
allows for easily swapping one dimension out for another:python
ds = xray.Dataset({"x": range(3), "y": ("x", list("abc"))}) ds ds.swap_dims({"x": "y"})
This was possible in earlier versions of xray, but required some contortions.
xray.open_dataset
andxray.Dataset.to_netcdf
now accept anengine
argument to explicitly select which underlying library (netcdf4 or scipy) is used for reading/writing a netCDF file.
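The `swap_dims` behaviour described above can be checked with a small sketch. It uses the current `xarray` import name, and the `x` coordinate is declared explicitly as an assumption.

```python
import xarray as xr

ds = xr.Dataset({"y": ("x", list("abc"))}, coords={"x": [0, 1, 2]})

# "y" becomes the dimension; "x" is kept as a coordinate along "y".
swapped = ds.swap_dims({"x": "y"})
sizes = dict(swapped.sizes)
```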
- Fixed a bug where data netCDF variables read from disk with
engine='scipy'
could still be associated with the file on disk, even after closing the file (341
). This manifested itself in warnings about mmapped arrays and segmentation faults (if the data was accessed). - Silenced spurious warnings about all-NaN slices when using nan-aware aggregation methods (
344
). - Dataset aggregations with
keep_attrs=True
now preserve attributes on data variables, not just the dataset itself. - Tests for xray now pass when run on Windows (
360
). - Fixed a regression in v0.4 where saving to netCDF could fail with the error
ValueError: could not automatically determine time units
.
This is one of the biggest releases yet for xray: it includes some major changes that may break existing code, along with the usual collection of minor enhancements and bug fixes. On the plus side, this release includes all hitherto planned breaking changes, so the upgrade path for xray should be smoother going forward.
We now automatically align index labels in arithmetic, dataset construction, merging and updating. This means the need for manually invoking methods like
xray.align
andxray.Dataset.reindex_like
should be vastly reduced.For arithmetic<math automatic alignment>
, we align based on the intersection of labels:python
lhs = xray.DataArray([1, 2, 3], [("x", [0, 1, 2])]) rhs = xray.DataArray([2, 3, 4], [("x", [1, 2, 3])]) lhs + rhs
For dataset construction and merging<merge>
, we align based on the union of labels:python
xray.Dataset({"foo": lhs, "bar": rhs})
For update and __setitem__<update>
, we align based on the original object:python
lhs.coords["rhs"] = rhs lhs
Aggregations like
mean
ormedian
now skip missing values by default:python
xray.DataArray([1, 2, np.nan, 3]).mean()
You can turn this behavior off by supplying the keyword argument
skipna=False
.These operations are lightning fast thanks to integration with bottleneck, which is a new optional dependency for xray (numpy is used if bottleneck is not installed).
Scalar coordinates no longer conflict with constant arrays with the same value (e.g., in arithmetic, merging datasets and concat), even if they have different shape (
243
). For example, the coordinatec
here persists through arithmetic, even though it has different shapes on each DataArray:python
a = xray.DataArray([1, 2], coords={"c": 0}, dims="x") b = xray.DataArray([1, 2], coords={"c": ("x", [0, 0])}, dims="x") (a + b).coords
This functionality can be controlled through the
compat
option, which has also been added to thexray.Dataset
constructor.Datetime shortcuts such as
'time.month'
now return aDataArray
with the name'month'
, not'time.month'
(345
). This makes it easier to index the resulting arrays when they are used withgroupby
:python
time = xray.DataArray(
    pd.date_range("2000-01-01", periods=365), dims="time", name="time"
)
counts = time.groupby("time.month").count()
counts.sel(month=2)
Previously, you would need to use something like
counts.sel(**{'time.month': 2})
, which is much more awkward.The
season
datetime shortcut now returns an array of string labels such as 'DJF':
In [92]: ds = xray.Dataset({"t": pd.date_range("2000-01-01", periods=12, freq="M")})
In [93]: ds["t.season"]
Out[93]: <xarray.DataArray 'season' (t: 12)> array(['DJF', 'DJF', 'MAM', ..., 'SON', 'SON', 'DJF'], dtype='<U3') Coordinates: * t (t) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-11-30 2000-12-31
Previously, it returned numbered seasons 1 through 4.
- We have updated our use of the terms "coordinates" and "variables". What were known in previous versions of xray as "coordinates" and "variables" are now referred to throughout the documentation as "coordinate variables" and "data variables". This brings xray in closer alignment to CF Conventions. The only visible change besides the documentation is that
Dataset.vars
has been renamedDataset.data_vars
. - You will need to update your code if you have been ignoring deprecation warnings: methods and attributes that were deprecated in xray v0.3 or earlier (e.g.,
dimensions
,attributes
) have gone away.
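The two most visible breaking changes in this release, automatic alignment and NaN-skipping aggregations, can be sketched together. The example is written against the current `xarray` import name and reuses the small arrays from the notes above.

```python
import numpy as np
import xarray as xr

lhs = xr.DataArray([1, 2, 3], coords=[("x", [0, 1, 2])])
rhs = xr.DataArray([2, 3, 4], coords=[("x", [1, 2, 3])])

total = lhs + rhs                              # arithmetic: intersection of labels
merged = xr.Dataset({"foo": lhs, "bar": rhs})  # construction: union of labels

values = xr.DataArray([1, 2, np.nan, 3])
mean_skipping = values.mean()            # NaN is ignored by default
mean_strict = values.mean(skipna=False)  # NaN propagates
```

Only the labels `x=1` and `x=2` survive the addition, while the dataset keeps all four labels and fills the gaps with NaN.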
Support for
xray.Dataset.reindex
with a fill method. This provides a useful shortcut for upsampling:python
data = xray.DataArray([1, 2, 3], [("x", range(3))]) data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
This will be especially useful once pandas 0.16 is released, at which point xray will immediately support reindexing with method='nearest'.
- Use functions that return generic ndarrays with DataArray.groupby.apply and Dataset.apply (
327
and329
). Thanks Jeff Gerard! - Consolidated the functionality of
dumps
(writing a dataset to a netCDF3 bytestring) intoxray.Dataset.to_netcdf
(333
). xray.Dataset.to_netcdf
now supports writing to groups in netCDF4 files (333
). It also finally has a full docstring -- you should read it!xray.open_dataset
andxray.Dataset.to_netcdf
now work on netCDF3 files when netcdf4-python is not installed as long as scipy is available (333
).The new
xray.Dataset.drop
andxray.DataArray.drop
methods make it easy to drop explicitly listed variables or index labels:python
# drop variables ds = xray.Dataset({"x": 0, "y": 1}) ds.drop("x")
# drop index labels arr = xray.DataArray([1, 2, 3], coords=[("x", list("abc"))]) arr.drop(["a", "c"], dim="x")
xray.Dataset.broadcast_equals
has been added to correspond to the newcompat
option.- Long attributes are now truncated at 500 characters when printing a dataset (
338
). This should make things more convenient for working with datasets interactively. - Added a new documentation example,
/examples/monthly-means.ipynb
. Thanks Joe Hamman!
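The upsampling shortcut offered by `reindex` with a fill method can be sketched as below. The code uses the current `xarray` import name, and the expected values follow from forward filling each new label from the nearest earlier one.

```python
import xarray as xr

data = xr.DataArray([1, 2, 3], coords=[("x", [0, 1, 2])])

# Each new label takes the value of the closest original label at or before it.
upsampled = data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
```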
- Several bug fixes related to decoding time units from netCDF files (
316
,330
). Thanks Stefan Pfenninger! - xray no longer requires
decode_coords=False
when reading datasets with unparsable coordinate attributes (308
). - Fixed
DataArray.loc
indexing with...
(318
). - Fixed an edge case that resulted in an error when reindexing multi-dimensional variables (
315
). - Slicing with negative step sizes (
312
). - Invalid conversion of string arrays to numeric dtype (
305
). - Fixed
repr()
on dataset objects with non-standard dates (347
).
dump
anddumps
have been deprecated in favor ofxray.Dataset.to_netcdf
.drop_vars
has been deprecated in favor ofxray.Dataset.drop
.
The biggest feature I'm excited about working toward in the immediate future is supporting out-of-core operations in xray using Dask, a part of the Blaze project. For a preview of using Dask with weather data, read this blog post by Matthew Rocklin. See 328
for more details.
This release focused on bug-fixes, speedups and resolving some niggling inconsistencies.
There are a few cases where the behavior of xray differs from the previous version. However, I expect that in almost all cases your code will continue to run unmodified.
Warning
xray now requires pandas v0.15.0 or later. This was necessary for supporting TimedeltaIndex without too many painful hacks.
Arrays of :py
datetime.datetime
objects are now automatically cast todatetime64[ns]
arrays when stored in an xray object, using machinery borrowed from pandas:python
from datetime import datetime
xray.Dataset({"t": [datetime(2000, 1, 1)]})
- xray now has support (including serialization to netCDF) for :py
~pandas.TimedeltaIndex
. :pydatetime.timedelta
objects are thus accordingly cast totimedelta64[ns]
objects when appropriate. - Masked arrays are now properly coerced to use
NaN
as a sentinel value (259
).
Due to popular demand, we have added experimental attribute style access as a shortcut for dataset variables, coordinates and attributes:
python
ds = xray.Dataset({"tmin": ([], 25, {"units": "celsius"})}) ds.tmin.units
Tab-completion for these variables should work in editors such as IPython. However, setting variables or attributes in this fashion is not yet supported because there are some unresolved ambiguities (
300
). You can now use a dictionary for indexing with labeled dimensions. This provides a safe way to do assignment with labeled dimensions:
python
array = xray.DataArray(np.zeros(5), dims=["x"])
array[dict(x=slice(3))] = 1
array
- Non-index coordinates can now be faithfully written to and restored from netCDF files. This is done according to CF conventions when possible by using the coordinates attribute on a data variable. When not possible, xray defines a global coordinates attribute.
- Preliminary support for converting xray.DataArray objects to and from CDAT cdms2 variables.
- We sped up any operation that involves creating a new Dataset or DataArray (e.g., indexing, aggregation, arithmetic) by 30 to 50%. The full speed up requires cyordereddict to be installed.
- Fix for to_dataframe() with 0d string/object coordinates (287)
- Fix for to_netcdf with 0d string variable (284)
- Fix writing datetime64 arrays to netCDF if NaT is present (270)
- Fix align silently upcasting data arrays when NaNs are inserted (264)
- I am contemplating switching to the terms "coordinate variables" and "data variables" instead of the (currently used) "coordinates" and "variables", following their use in CF Conventions (293). This would mostly have implications for the documentation, but I would also change the Dataset attribute vars to data.
- I am no longer certain that automatic label alignment for arithmetic would be a good idea for xray -- it is a feature from pandas that I have not missed (186).
- The main API breakage that I do anticipate in the next release is finally making all aggregation operations skip missing values by default (130). I'm pretty sick of writing ds.reduce(np.nanmean, 'time').
- The next version of xray (0.4) will remove deprecated features and aliases whose use currently raises a warning.
If you have opinions about any of these anticipated changes, I would love to hear them -- please add a note to any of the referenced GitHub issues.
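To make the motivation for skipping missing values concrete, here is a small NumPy-only sketch contrasting a plain mean with a NaN-skipping one. The array and its layout (rows as time steps, columns as stations) are invented for illustration.

```python
import numpy as np

# Rows are time steps, columns are stations; one observation is missing.
temps = np.array([[1.0, np.nan],
                  [3.0, 5.0]])

# A plain mean propagates the missing value into the result.
plain = np.mean(temps, axis=0)

# nanmean ignores missing values, which is the behavior the notes
# anticipate becoming the default for aggregations.
skipping = np.nanmean(temps, axis=0)

print(plain, skipping)
```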
This is mostly a bug-fix release to make xray compatible with the latest release of pandas (v0.15).
We added several features to better support working with missing values and exporting xray objects to pandas. We also reorganized the internal API for serializing and deserializing datasets, but this change should be almost entirely transparent to users.
Other than breaking the experimental DataStore API, there should be no backwards incompatible changes.
- Added xray.Dataset.count and xray.Dataset.dropna methods, copied from pandas, for working with missing values (247, 58).
- Added xray.DataArray.to_pandas for converting a data array into the pandas object with the same dimensionality (1D to Series, 2D to DataFrame, etc.) (255).
- Support for reading gzipped netCDF3 files (239).
- Reduced memory usage when writing netCDF files (251).
- 'missing_value' is now supported as an alias for the '_FillValue' attribute on netCDF variables (245).
- Trivial indexes, equivalent to range(n) where n is the length of the dimension, are no longer written to disk (245).
- Compatibility fixes for pandas v0.15 (262).
- Fixes for display and indexing of NaT (not-a-time) (238, 240)
- Fix slicing by label when an argument is a data array (250).
- Test data is now shipped with the source distribution (253).
- Ensure order does not matter when doing arithmetic with scalar data arrays (254).
- Order of dimensions preserved with DataArray.to_dataframe (260).
- Revamped coordinates: "coordinates" now refer to all arrays that are not used to index a dimension. Coordinates are intended to allow for keeping track of arrays of metadata that describe the grid on which the points in "variable" arrays lie. They are preserved (when unambiguous) even through mathematical operations.
- Dataset math: xray.Dataset objects now support all arithmetic operations directly. Dataset-array operations map across all dataset variables; dataset-dataset operations act on each pair of variables with the same name.
- GroupBy math: This provides a convenient shortcut for normalizing by the average value of a group.
- The dataset __repr__ method has been entirely overhauled; dataset objects now show their values when printed.
- You can now index a dataset with a list of variables to return a new dataset: ds[['foo', 'bar']].
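The Dataset math and GroupBy math features above might look like this in practice. The sketch assumes the modern `xarray` package name; the `temp` variable and `site` coordinate are hypothetical.

```python
import xarray as xr  # assumption: modern successor of the xray package

ds = xr.Dataset(
    {"temp": ("x", [1.0, 2.0, 3.0, 4.0])},
    coords={"site": ("x", ["a", "a", "b", "b"])},
)

# Dataset-scalar arithmetic maps across every data variable.
doubled = ds * 2

# Dataset-dataset arithmetic pairs up variables that share a name.
summed = ds + ds

# GroupBy math: subtract each group's mean from the members of that group.
anomalies = ds.groupby("site") - ds.groupby("site").mean()

# Indexing with a list of names returns a new dataset.
subset = ds[["temp"]]

print(anomalies["temp"].values)
```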
Dataset.__eq__ and Dataset.__ne__ are now element-wise operations instead of comparing all values to obtain a single boolean. To compare two datasets as a whole, use the method xray.Dataset.equals instead.
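A short sketch of the difference between element-wise comparison and whole-dataset comparison, again assuming the modern `xarray` package name and made-up data:

```python
import xarray as xr  # assumption: modern successor of the xray package

a = xr.Dataset({"v": ("x", [1, 2, 3])})
b = xr.Dataset({"v": ("x", [1, 0, 3])})

# == compares element by element and returns a Dataset of booleans.
elementwise = a == b

# equals() answers the yes/no question with a single boolean.
same = a.equals(b)

print(elementwise["v"].values, same)
```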
- Dataset.noncoords is deprecated: use Dataset.vars instead.
- Dataset.select_vars deprecated: index a Dataset with a list of variable names instead.
- DataArray.select_vars and DataArray.drop_vars deprecated: use xray.DataArray.reset_coords instead.
This is a major release that includes some new features and quite a few bug fixes. Here are the highlights:
- There is now a direct constructor for DataArray objects, which makes it possible to create a DataArray without using a Dataset. This is highlighted in the refreshed tutorial.
- You can perform aggregation operations like mean directly on xray.Dataset objects, thanks to Joe Hamman. These aggregation methods also work on grouped datasets.
- xray now works on Python 2.6, thanks to Anna Kuznetsova.
- A number of methods and attributes were given more sensible (usually shorter) names: labeled -> sel, indexed -> isel, select -> select_vars, unselect -> drop_vars, dimensions -> dims, coordinates -> coords, attributes -> attrs.
- New xray.Dataset.load_data and xray.Dataset.close methods for datasets facilitate lower-level control of data loaded from disk.
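As a quick illustration of the direct DataArray constructor and some of the shorter names (sel, isel, dims, coords, attrs), here is a sketch that assumes the modern `xarray` package name. The data values are invented.

```python
import xarray as xr  # assumption: modern successor of the xray package

# Build a DataArray directly, without going through a Dataset.
arr = xr.DataArray(
    [10.0, 20.0, 30.0],
    dims=["x"],
    coords={"x": [100, 200, 300]},
    attrs={"units": "m"},
)

# sel selects by coordinate label; isel selects by integer position.
by_label = arr.sel(x=200)
by_position = arr.isel(x=1)

print(arr.dims, list(arr.coords), arr.attrs, float(by_label), float(by_position))
```

Both selections pick the same element here, since label 200 sits at position 1 along x.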
xray 0.1.1 is a bug-fix release that includes changes that should be almost entirely backwards compatible with v0.1:
- Python 3 support (
53
) - Required numpy version relaxed to 1.7 (
129
) - Return numpy.datetime64 arrays for non-standard calendars (
126
) - Support for opening datasets associated with NetCDF4 groups (
127
) - Bug-fixes for concatenating datetime arrays (
134
)
Special thanks to new contributors Thomas Kluyver, Joe Hamman and Alistair Miles.
Initial release.