Bigtable: systest module setup flakes creating clusters. #3

Closed
tseaver opened this issue Oct 24, 2019 · 3 comments
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.


tseaver commented Oct 24, 2019

From this CI failure:

__________ ERROR at setup of TestInstanceAdminAPI.test_cluster_exists __________

target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb024845240>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fb0249bc488>
sleep_generator = <generator object exponential_sleep_generator at 0x7fb024880930>
deadline = 10, on_error = None

    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.

        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.

        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.

        Returns:
            Any: the return value of the target function.

        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None

        last_exc = None

        for sleep in sleep_generator:
            try:
>               return target()

../api_core/google/api_core/retry.py:182:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <google.api_core.operation.Operation object at 0x7fb024845240>

    def _done_or_raise(self):
        """Check if the future is done and raise if it's not."""
        if not self.done():
>           raise _OperationNotComplete()
E           google.api_core.future.polling._OperationNotComplete

../api_core/google/api_core/future/polling.py:81: _OperationNotComplete

The above exception was the direct cause of the following exception:

self = <google.api_core.operation.Operation object at 0x7fb024845240>
timeout = 10

    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.

        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return

        retry_ = self._retry.with_deadline(timeout)

        try:
>           retry_(self._done_or_raise)()

../api_core/google/api_core/future/polling.py:101:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb024845240>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7fb024880930>

    @general_helpers.wraps(func)
    def retry_wrapped_func(*args, **kwargs):
        """A wrapper that calls target function with retry."""
        target = functools.partial(func, *args, **kwargs)
        sleep_generator = exponential_sleep_generator(
            self._initial, self._maximum, multiplier=self._multiplier
        )
        return retry_target(
            target,
            self._predicate,
            sleep_generator,
            self._deadline,
>           on_error=on_error,
        )

../api_core/google/api_core/retry.py:277:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb024845240>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fb0249bc488>
sleep_generator = <generator object exponential_sleep_generator at 0x7fb024880930>
deadline = 10, on_error = None

    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.

        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.

        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.

        Returns:
            Any: the return value of the target function.

        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None

        last_exc = None

        for sleep in sleep_generator:
            try:
                return target()

            # pylint: disable=broad-except
            # This function explicitly must deal with broad exceptions.
            except Exception as exc:
                if not predicate(exc):
                    raise
                last_exc = exc
                if on_error is not None:
                    on_error(exc)

            now = datetime_helpers.utcnow()
            if deadline_datetime is not None and deadline_datetime < now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
>                   last_exc,
                )

../api_core/google/api_core/retry.py:202:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

value = None, from_value = _OperationNotComplete()

>   ???
E   google.api_core.exceptions.RetryError: Deadline of 10.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb024845240>>), last exception:

<string>:3: RetryError

During handling of the above exception, another exception occurred:

    def setUpModule():
        from google.cloud.exceptions import GrpcRendezvous
        from google.cloud.bigtable.enums import Instance

        # See: https://github.com/googleapis/google-cloud-python/issues/5928
        interfaces = table_admin_config.config["interfaces"]
        iface_config = interfaces["google.bigtable.admin.v2.BigtableTableAdmin"]
        methods = iface_config["methods"]
        create_table = methods["CreateTable"]
        create_table["timeout_millis"] = 90000

        Config.IN_EMULATOR = os.getenv(BIGTABLE_EMULATOR) is not None

        if Config.IN_EMULATOR:
            credentials = EmulatorCreds()
            Config.CLIENT = Client(admin=True, credentials=credentials)
        else:
            Config.CLIENT = Client(admin=True)

        Config.INSTANCE = Config.CLIENT.instance(INSTANCE_ID, labels=LABELS)
        Config.CLUSTER = Config.INSTANCE.cluster(
            CLUSTER_ID, location_id=LOCATION_ID, serve_nodes=SERVE_NODES
        )
        Config.INSTANCE_DATA = Config.CLIENT.instance(
            INSTANCE_ID_DATA, instance_type=Instance.Type.DEVELOPMENT, labels=LABELS
        )
        Config.CLUSTER_DATA = Config.INSTANCE_DATA.cluster(
            CLUSTER_ID_DATA, location_id=LOCATION_ID
        )

        if not Config.IN_EMULATOR:
            retry = RetryErrors(GrpcRendezvous, error_predicate=_retry_on_unavailable)
            instances, failed_locations = retry(Config.CLIENT.list_instances)()

            if len(failed_locations) != 0:
                raise ValueError("List instances failed in module set up.")

            EXISTING_INSTANCES[:] = instances

            # After listing, create the test instances.
            admin_op = Config.INSTANCE.create(clusters=[Config.CLUSTER])
            admin_op.result(timeout=10)
            data_op = Config.INSTANCE_DATA.create(clusters=[Config.CLUSTER_DATA])
>           data_op.result(timeout=10)

tests/system.py:141:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../api_core/google/api_core/future/polling.py:122: in result
    self._blocking_poll(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <google.api_core.operation.Operation object at 0x7fb024845240>
timeout = 10

    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.

        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return

        retry_ = self._retry.with_deadline(timeout)

        try:
            retry_(self._done_or_raise)()
        except exceptions.RetryError:
            raise concurrent.futures.TimeoutError(
>               "Operation did not complete within the designated " "timeout."
            )
E           concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

../api_core/google/api_core/future/polling.py:104: TimeoutError

Repeated multiple times.
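
For readers unfamiliar with the polling loop in the traceback, here is a minimal standard-library sketch of the same pattern: poll, check a deadline, then sleep with exponential backoff. It is not the `google.api_core` implementation. `FakeClock`, `make_operation`, `exponential_delays`, and `poll_until_done` are hypothetical names, jitter is omitted, and the built-in `TimeoutError` stands in for `concurrent.futures.TimeoutError`.

```python
class FakeClock:
    """Simulated clock so the sketch runs instantly (an assumption for illustration)."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        # Advance simulated time instead of really sleeping.
        self.now += seconds


def make_operation(clock, finishes_at):
    """Return a hypothetical 'is the operation done?' check."""

    def is_done():
        return clock.now >= finishes_at

    return is_done


def exponential_delays(initial=1.0, maximum=10.0, multiplier=2.0):
    """Yield an endless series of delays that grow until capped at `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= multiplier


def poll_until_done(is_done, timeout, clock):
    """Poll `is_done` until it returns True or the deadline has passed.

    Returns the simulated time at which completion was observed.
    """
    deadline = clock.now + timeout
    for delay in exponential_delays():
        if is_done():
            return clock.now
        # As in the traceback, the deadline is only checked after a failed poll.
        if clock.now > deadline:
            raise TimeoutError(
                "Operation did not complete within the designated timeout."
            )
        clock.sleep(delay)


if __name__ == "__main__":
    clock = FakeClock()
    finished = poll_until_done(make_operation(clock, 20.0), timeout=60, clock=clock)
    print("operation observed complete at simulated t =", finished)
```

Under these assumptions, an operation that needs 20 simulated seconds times out with `timeout=10` but completes with `timeout=60`. Whether the real flake is simply instance creation occasionally taking longer than 10 seconds is not confirmed anywhere in this thread.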

@HemangChothani

I ran this test 10,000 times trying to reproduce the failure, but never hit this error. I assume it was a one-time problem.

(Attached: three screenshots dated 2019-11-04.)

@crwilcox crwilcox transferred this issue from googleapis/google-cloud-python Jan 31, 2020
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/python-bigtable API. label Jan 31, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Feb 3, 2020
@frankyn frankyn added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 4, 2020
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Apr 21, 2020

mf2199 commented May 14, 2020

@tseaver, I ran the tests independently for over 1,000 cycles, as @HemangChothani did last November, and so did @paul1319. No errors were caught. The failure may have been a one-time event with an unrelated cause. Unless there's a way to reproduce it, I'd suggest closing the issue for now and noting it for future reference.
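
A rough sketch of what a reproduction loop like this, and one way of reading its result, could look like. The actual runs used the real system tests. Here `make_flaky` is a hypothetical stand-in for the module setup, `count_failures` and `clean_streak_probability` are illustrative names, and the failure rates are assumed rather than measured. The probability helper also assumes each run fails independently.

```python
def count_failures(attempt, runs):
    """Call `attempt` repeatedly and count how many calls raise TimeoutError."""
    failures = 0
    for _ in range(runs):
        try:
            attempt()
        except TimeoutError:
            failures += 1
    return failures


def clean_streak_probability(failure_rate, runs):
    """Chance of seeing zero failures in `runs` independent attempts.

    `failure_rate` is an assumed per-run probability, not a measured one.
    """
    return (1.0 - failure_rate) ** runs


def make_flaky(every):
    """Hypothetical setup stand-in that times out on every `every`-th call."""
    calls = {"n": 0}

    def attempt():
        calls["n"] += 1
        if calls["n"] % every == 0:
            raise TimeoutError("simulated setup timeout")

    return attempt


if __name__ == "__main__":
    print("failures in 8 runs:", count_failures(make_flaky(4), 8))
    print("P(clean streak):", clean_streak_probability(1e-4, 1000))
```

For example, with an assumed independent failure rate of 1 in 10,000, `clean_streak_probability(1e-4, 1000)` comes out at roughly 0.90, so a clean streak of that length would not by itself rule out a rare flake. The thread gives no measured rate, so this is illustration only.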


kolea2 commented May 20, 2020

Per @mf2199's analysis, I'm going to close this for now. Let's reopen if it occurs again.
