Merge branch 'feature/add_performance_test' of https://github.com/jdimatteo/great_expectations into feature/add_performance_test

* 'feature/add_performance_test' of https://github.com/jdimatteo/great_expectations:
  [MAINTENANCE] Tests for RuntimeDataConnector at Datasource-level (Spark and Pandas) (great-expectations#3318)
  [MAINTENANCE] Tests for RuntimeDataConnector at DataContext-level (great-expectations#3304)
  [BUGFIX] SQL dialect doesn't register for BigQuery for V2 (great-expectations#3324)
  [WIP] [FEATURE] add backend args to run_diagnostics (great-expectations#3257)
  Release Prep release-prep-2021-08-26 (great-expectations#3320)
  [DOCS] hide stubbed core skills (great-expectations#3316)
  [MAINTENANCE] Write integration/E2E tests for both `GCSDataConnectors` (great-expectations#3301)
  [DOCS] Standardize capitalization of various technologies in `docs` (great-expectations#3312)
  [DOCS] Fix misc errors in "How to create renderers for Custom Expectations" (great-expectations#3315)
  docs: Remove misc TODOs to tidy up docs (great-expectations#3313)
  [DOCS] GDOC-217 remove stub links (great-expectations#3314)
  [FEATURE] Enable `GCS DataConnector` integration with `PandasExecutionEngine` (great-expectations#3264)
Shinnnyshinshin committed Aug 31, 2021
2 parents 23f3c4d + f3d430d commit 97d2dd1
Showing 100 changed files with 32,727 additions and 631 deletions.
@@ -1,4 +1,6 @@
import json
import logging
import traceback
from typing import Any, Dict, Optional, Tuple

import numpy as np
@@ -26,11 +28,13 @@
ColumnMetricProvider,
column_aggregate_value,
)
from great_expectations.expectations.metrics.import_manager import F, sa
from great_expectations.expectations.metrics.metric_provider import (
MetricProvider,
metric_value,
from great_expectations.expectations.metrics.column_aggregate_metric_provider import (
ColumnAggregateMetricProvider,
column_aggregate_partial,
column_aggregate_value,
)
from great_expectations.expectations.metrics.import_manager import F, sa
from great_expectations.expectations.metrics.metric_provider import metric_value
from great_expectations.expectations.util import render_evaluation_parameter_string
from great_expectations.render.renderer.renderer import renderer
from great_expectations.render.types import RenderedStringTemplateContent
@@ -42,6 +46,32 @@
)
from great_expectations.validator.validation_graph import MetricConfiguration

logger = logging.getLogger(__name__)

try:
from sqlalchemy.exc import ProgrammingError
from sqlalchemy.sql import Select
except ImportError:
logger.debug(
"Unable to load SqlAlchemy context; install optional sqlalchemy dependency for support"
)
ProgrammingError = None
Select = None

try:
from sqlalchemy.engine.row import Row
except ImportError:
try:
from sqlalchemy.engine.row import RowProxy

Row = RowProxy
except ImportError:
logger.debug(
            "Unable to load SqlAlchemy Row class; please upgrade your sqlalchemy installation to the latest version."
)
RowProxy = None
Row = None


class ColumnSkew(ColumnMetricProvider):
    """MetricProvider class for the Column Skew metric"""
@@ -55,34 +85,80 @@ def _pandas(cls, column, abs=False, **kwargs):
return np.abs(stats.skew(column))
return stats.skew(column)

#
# @metric_value(engine=SqlAlchemyExecutionEngine, metric_fn_type="value")
# def _sqlalchemy(
# cls,
# execution_engine: "SqlAlchemyExecutionEngine",
# metric_domain_kwargs: Dict,
# metric_value_kwargs: Dict,
# metrics: Dict[Tuple, Any],
# runtime_configuration: Dict,
# ):
# (
# selectable,
# compute_domain_kwargs,
# accessor_domain_kwargs,
# ) = execution_engine.get_compute_domain(
# metric_domain_kwargs, MetricDomainTypes.COLUMN
# )
# column_name = accessor_domain_kwargs["column"]
# column = sa.column(column_name)
# sqlalchemy_engine = execution_engine.engine
# dialect = sqlalchemy_engine.dialect
#
# column_median = None
#
# # TODO: compute the value and return it
#
# return column_median
#
@metric_value(engine=SqlAlchemyExecutionEngine)
def _sqlalchemy(
cls,
execution_engine: "SqlAlchemyExecutionEngine",
metric_domain_kwargs: Dict,
metric_value_kwargs: Dict,
metrics: Dict[Tuple, Any],
runtime_configuration: Dict,
):
(
selectable,
compute_domain_kwargs,
accessor_domain_kwargs,
) = execution_engine.get_compute_domain(
metric_domain_kwargs, MetricDomainTypes.COLUMN
)

column_name = accessor_domain_kwargs["column"]
column = sa.column(column_name)
sqlalchemy_engine = execution_engine.engine
dialect = sqlalchemy_engine.dialect

column_mean = _get_query_result(
func=sa.func.avg(column * 1.0),
selectable=selectable,
sqlalchemy_engine=sqlalchemy_engine,
)

column_count = _get_query_result(
func=sa.func.count(column),
selectable=selectable,
sqlalchemy_engine=sqlalchemy_engine,
)

if dialect.name.lower() == "mssql":
standard_deviation = sa.func.stdev(column)
else:
standard_deviation = sa.func.stddev_samp(column)

column_std = _get_query_result(
func=standard_deviation,
selectable=selectable,
sqlalchemy_engine=sqlalchemy_engine,
)

column_third_moment = _get_query_result(
func=sa.func.sum(sa.func.pow(column - column_mean, 3)),
selectable=selectable,
sqlalchemy_engine=sqlalchemy_engine,
)

column_skew = column_third_moment / (column_std ** 3) / (column_count - 1)
if metric_value_kwargs["abs"]:
return np.abs(column_skew)
else:
return column_skew
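The SQL path above computes sample skewness as `sum((x - mean)^3) / stddev_samp^3 / (n - 1)`, while the pandas path uses `scipy.stats.skew`. A standalone sketch (assuming NumPy and SciPy are installed) showing that the two estimators agree for large samples:

```python
# Standalone sketch: compare the SQL-style sample skewness against
# scipy.stats.skew, which the pandas implementation uses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.exponential(size=50_000)  # a clearly right-skewed sample

n = x.size
mean = x.mean()
std_samp = x.std(ddof=1)  # sample standard deviation, i.e. stddev_samp
third_moment = np.sum((x - mean) ** 3)
skew_sql = third_moment / std_samp**3 / (n - 1)

skew_scipy = stats.skew(x)  # biased Fisher-Pearson coefficient

# The two estimators differ only by a factor of sqrt((n - 1) / n),
# which is negligible for large n.
```

For the exponential distribution the true skewness is 2, so both estimates land near 2; the `tolerance` keys added to the test examples in this diff absorb exactly this kind of small estimator discrepancy across backends.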


def _get_query_result(func, selectable, sqlalchemy_engine):
simple_query: Select = sa.select(func).select_from(selectable)

try:
result: Row = sqlalchemy_engine.execute(simple_query).fetchone()[0]
return result
except ProgrammingError as pe:
exception_message: str = "An SQL syntax Exception occurred."
exception_traceback: str = traceback.format_exc()
exception_message += (
f'{type(pe).__name__}: "{str(pe)}". Traceback: "{exception_traceback}".'
)
logger.error(exception_message)
        raise pe
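`_get_query_result` issues a single-aggregate `SELECT` and unwraps the scalar result. A minimal standalone sketch of that pattern, assuming SQLAlchemy 1.4+ and using an in-memory SQLite table in place of `selectable`:

```python
import sqlalchemy as sa

# In-memory SQLite table standing in for the `selectable` the
# execution engine would normally provide.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE t (a FLOAT)"))
    conn.execute(sa.text("INSERT INTO t VALUES (1.0), (2.0), (4.0)"))

# Mirrors sa.select(func).select_from(selectable) in _get_query_result.
selectable = sa.table("t", sa.column("a"))
query = sa.select(sa.func.avg(sa.column("a") * 1.0)).select_from(selectable)

with engine.connect() as conn:
    column_mean = conn.execute(query).fetchone()[0]  # (1 + 2 + 4) / 3
```

Note that `_get_query_result` executes directly on the `Engine`, a 1.x-style API; the sketch goes through an explicit `Connection`, which also works on SQLAlchemy 2.0.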

#
# @metric_value(engine=SparkDFExecutionEngine, metric_fn_type="value")
# def _spark(
# cls,
@@ -229,27 +305,31 @@ class ExpectColumnSkewToBeBetween(ColumnExpectation):
"title": "positive_test_positive_skew",
"exact_match_out": False,
"include_in_gallery": True,
"tolerance": 0.1,
"in": {"column": "a", "min_value": 0.25, "max_value": 10},
"out": {"success": True, "observed_value": 1.6974323016687487},
},
{
"title": "negative_test_no_skew",
"exact_match_out": False,
"include_in_gallery": True,
"tolerance": 0.1,
"in": {"column": "b", "min_value": 0.25, "max_value": 10},
"out": {"success": False, "observed_value": -0.07638895580386174},
},
{
"title": "positive_test_negative_skew",
"exact_match_out": False,
"include_in_gallery": True,
"tolerance": 0.1,
"in": {"column": "c", "min_value": -10, "max_value": -0.5},
"out": {"success": True, "observed_value": -0.9979514313860596},
},
{
"title": "negative_test_abs_skew",
"exact_match_out": False,
"include_in_gallery": True,
"tolerance": 0.1,
"in": {
"column": "c",
"abs": True,
@@ -262,6 +342,7 @@ class ExpectColumnSkewToBeBetween(ColumnExpectation):
"title": "positive_test_abs_skew",
"exact_match_out": False,
"include_in_gallery": True,
"tolerance": 0.1,
"in": {
"column": "c",
"abs": True,
Expand All @@ -271,7 +352,17 @@ class ExpectColumnSkewToBeBetween(ColumnExpectation):
"out": {"success": True, "observed_value": 0.9979514313860596},
},
],
},
"test_backends": [
{
"backend": "pandas",
"dialects": None,
},
{
"backend": "sqlalchemy",
"dialects": ["mysql", "postgresql"],
},
],
}
]

# This dictionary contains metadata for display in the public gallery
@@ -401,4 +492,5 @@ def _validate(

if __name__ == "__main__":
self_check_report = ExpectColumnSkewToBeBetween().run_diagnostics()

print(json.dumps(self_check_report, indent=2))
10 changes: 5 additions & 5 deletions docs/README.md
@@ -51,12 +51,12 @@ If you are using GitHub pages for hosting, this command is a convenient way to b

## Other relevant files

The following are a few details about other files docusaurus uses that you may wish to be familiar with.
The following are a few details about other files Docusaurus uses that you may wish to be familiar with.

- `../sidebars.js`: javascript that specifies the sidebar/navigation used in docs pages
- `../sidebars.js`: JavaScript that specifies the sidebar/navigation used in docs pages
- `../src`: non-docs pages live here
- `../static`: static assets used in docs pages (such as css) live here
- `../docusaurus.config.js`: the configuration file for docusaurus
- `../babel.config.js`: babel config file used when building
- `../static`: static assets used in docs pages (such as CSS) live here
- `../docusaurus.config.js`: the configuration file for Docusaurus
- `../babel.config.js`: Babel config file used when building
- `../package.json`: dependencies and scripts
- `../yarn.lock`: dependency lock file that ensures reproducibility
17 changes: 17 additions & 0 deletions docs/changelog.md
@@ -3,6 +3,23 @@ title: Changelog
---

### Develop
* [FEATURE] Add "test_backends" key to Expectation.examples for specifying test backends and dialects (#3257)


### 0.13.31
* [FEATURE] Enable `GCS DataConnector` integration with `PandasExecutionEngine` (#3264)
* [FEATURE] Enable column_pair expectations and tests for Spark (#3294)
* [FEATURE] Implement `InferredAssetGCSDataConnector` (#3284)
* [FEATURE]/CHANGE run time format (#3272) (Thanks @serialbandicoot)
* [DOCS] Fix misc errors in "How to create renderers for Custom Expectations" (#3315)
* [DOCS] GDOC-217 remove stub links (#3314)
* [DOCS] Remove misc TODOs to tidy up docs (#3313)
* [DOCS] Standardize capitalization of various technologies in `docs` (#3312)
* [DOCS] Fix broken link to Contributor docs (#3295) (Thanks @discdiver)
* [MAINTENANCE] Additional tests for RuntimeDataConnector at Datasource-level (query) (#3288)
* [MAINTENANCE] Update GCSStoreBackend + tests (#2630) (Thanks @hmandsager)
* [MAINTENANCE] Write integration/E2E tests for `ConfiguredAssetAzureDataConnector` (#3204)
* [MAINTENANCE] Write integration/E2E tests for both `GCSDataConnectors` (#3301)

### 0.13.30
* [FEATURE] Implement Spark Decorators and Helpers; Demonstrate on MulticolumnSumEqual Metric (#3289)
Expand Down
4 changes: 2 additions & 2 deletions docs/contributing/contributing_checklist.md
@@ -32,15 +32,15 @@ Once your code is ready, please go through the following checklist before submit

* You can also rebase your branch from upstream/develop. In general, the steps are:

* Run git fetch upstream then git rebase upstream/develop.
	* Run `git fetch upstream` then `git rebase upstream/develop`.

* Fix any merge conflicts that arise from the rebase.

* Make sure to add and commit all your changes in this step.

* Re-run tests to ensure the rebase did not introduce any new issues.

* Atlassian and Github both have good tutorials for rebasing: [Atlassian’s tutorial](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams), [Github’s tutorial](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork).
* Atlassian and GitHub both have good tutorials for rebasing: [Atlassian’s tutorial](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams), [GitHub’s tutorial](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork).

#### 4. Have you written and run all the tests you need?

8 changes: 4 additions & 4 deletions docs/contributing/contributing_misc.md
@@ -19,7 +19,7 @@ GE core team members use this checklist to ship releases.

1. If this is a major release (incrementing either the first or second version number) the manual acceptance testing must be completed.

* This [private google doc](https://docs.google.com/document/d/16QJPSCawEkwuEjShZeHa01TlQm9nbUwS6GwmFewJ3EY) outlines the procedure. (Note this will be made public eventually)
* This [private Google Doc](https://docs.google.com/document/d/16QJPSCawEkwuEjShZeHa01TlQm9nbUwS6GwmFewJ3EY) outlines the procedure. (Note this will be made public eventually)

2. Merge all approved PRs into `develop`.

@@ -48,17 +48,17 @@ GE core team members use this checklist to ship releases.
11. Check [PyPI](https://pypi.org/project/great-expectations/#history) for the new release


12. Create an annotated git tag:
12. Create an annotated Git tag:

* Run `git tag -a $VERSION -m $VERSION` with the correct new version.

* Push the tag up by running `git push origin $VERSION` with the correct new version.

* Merge main into develop so that the tagged commit becomes part of the history for develop: git checkout develop; git pull; git merge main
* Merge main into develop so that the tagged commit becomes part of the history for develop: `git checkout develop; git pull; git merge main`

* On develop, add a new “Develop” section header to changelog.md, and push the updated file with message “Update changelog for develop”

13. [Create the release on GitHub](https://github.com/great-expectations/great_expectations/releases) with the version number. Copy the changelog notes into the release notes, and update any rst-specific links to use github issue numbers.
13. [Create the release on GitHub](https://github.com/great-expectations/great_expectations/releases) with the version number. Copy the changelog notes into the release notes, and update any rst-specific links to use GitHub issue numbers.

* The deploy step will automatically create a draft for the release.

14 changes: 7 additions & 7 deletions docs/contributing/contributing_setup.md
@@ -65,7 +65,7 @@ This is not required, but highly recommended.

* This will ensure that sure you have the right libraries installed in your Python environment.

* Note that you can also substitute requirements-dev-test.txt to only install requirements required for testing all backends, and requirements-dev-spark.txt or requirements-dev-sqlalchemy.txt if you would like to add support for Spark or sqlalchemy tests, respectively. For some database backends, such as MSSQL additional driver installation may required in your environment; see below for more information.
* Note that you can also substitute requirements-dev-test.txt to only install requirements required for testing all backends, and requirements-dev-spark.txt or requirements-dev-sqlalchemy.txt if you would like to add support for Spark or SQLAlchemy tests, respectively. For some database backends, such as MSSQL, additional driver installation may be required in your environment; see below for more information.

* [Installing Microsoft ODBC driver for MacOS](https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos)

@@ -78,11 +78,11 @@ This is not required, but highly recommended.
* `-e` will install Great Expectations in “editable” mode. This is not required, but is often very convenient as a developer.

### (Optional) Configure resources for testing and documentation
Depending on which features of Great Expectations you want to work on, you may want to configure different backends for local testing, such as postgresql and Spark. Also, there are a couple of extra steps if you want to build documentation locally.
Depending on which features of Great Expectations you want to work on, you may want to configure different backends for local testing, such as PostgreSQL and Spark. Also, there are a couple of extra steps if you want to build documentation locally.

#### If you want to develop against local postgresql:
#### If you want to develop against local PostgreSQL:

* To simplify setup, the repository includes a `docker-compose` file that can stand up a local postgresql container. To use it, you’ll need to have [docker installed](https://docs.docker.com/install/).
* To simplify setup, the repository includes a `docker-compose` file that can stand up a local PostgreSQL container. To use it, you’ll need to have [Docker installed](https://docs.docker.com/install/).

* Navigate to `assets/docker/postgresql` in your `great_expectations` repo and run `docker-compose up -d`

@@ -96,7 +96,7 @@ Depending on which features of Great Expectations you want to work on, you may w

* Once you’re done testing, you can shut down your PostgreSQL container by running `docker-compose down` from the same directory.

* Caution: If another service is using port 5432, docker may start the container but silently fail to set up the port. In that case, you will probably see errors like this:
* Caution: If another service is using port 5432, Docker may start the container but silently fail to set up the port. In that case, you will probably see errors like this:

````console
psycopg2.OperationalError: could not connect to server: Connection refused
@@ -116,7 +116,7 @@ Depending on which features of Great Expectations you want to work on, you may w

#### If you want to develop against local mysql:

* To simplify setup, the repository includes a `docker-compose` file that can stand up a local mysqldb container. To use it, you’ll need to have [docker installed](https://docs.docker.com/install/).
* To simplify setup, the repository includes a `docker-compose` file that can stand up a local mysqldb container. To use it, you’ll need to have [Docker installed](https://docs.docker.com/install/).

* Navigate to `assets/docker/mysql` in your `great_expectations` repo and run `docker-compose up -d`

@@ -130,7 +130,7 @@ Depending on which features of Great Expectations you want to work on, you may w

* Once you’re done testing, you can shut down your mysql container by running `docker-compose down` from the same directory.

* Caution: If another service is using port 3306, docker may start the container but silently fail to set up the port.
* Caution: If another service is using port 3306, Docker may start the container but silently fail to set up the port.
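When a container starts but the port mapping silently fails, a quick way to tell is to probe the port directly. A minimal sketch (the helper name is illustrative, not part of the repo) that checks whether anything is listening on a backend's default port — 5432 for PostgreSQL, 3306 for MySQL:

```python
import socket

def port_open(host: str = "localhost", port: int = 5432, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False

# e.g. port_open(port=5432) for PostgreSQL, port_open(port=3306) for MySQL
```

If this returns `False` right after `docker-compose up -d`, the container is likely running without the port actually exposed, which matches the errors described above.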

#### If you want to develop against local Spark:

2 changes: 1 addition & 1 deletion docs/contributing/contributing_style.md
@@ -95,7 +95,7 @@ Within the table of contents, each section has specific role to play. Broadly sp

* **Tutorials** help users and contributors get started quickly. Along the way they orient new users to concepts that will be important to know later.

* **How-to guides** help users accomplish specific goals that go beyond the generic tutorials. Article titles within this section always start with “How to”: “How to create custom Expectations”. They often reference specific tools or infrastructure: “How to validate Expectations from within a notebook”, “How to build data docs in S3.”
* **How-to guides** help users accomplish specific goals that go beyond the generic tutorials. Article titles within this section always start with “How to”: “How to create custom Expectations”. They often reference specific tools or infrastructure: “How to validate Expectations from within a notebook”, “How to build Data Docs in S3.”

* **Reference** articles explain the architecture of Great Expectations. These articles explain core concepts, discuss alternatives and options, and provide context, history, and direction for the project. Reference articles avoid giving specific technical advice. They also avoid implementation details that can be captured in docstrings instead.

