append_rows call is slow: it takes 1-2 seconds on average #719

Open
amandaolens opened this issue Dec 6, 2023 · 0 comments
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API.

Comments

@amandaolens

The `append_rows` call is slow: it takes between 1 and 2 seconds on average. This is the sample code I'm using:

```python
from datetime import datetime, timezone
import time

from google.cloud.bigquery_storage_v1.types import (
    AppendRowsRequest,
    ProtoRows,
    ProtoSchema,
)
from google.protobuf import descriptor_pb2

# Excerpt from inside an async method; `data`, `table_id`, and
# `serialized_rows` come from earlier in that method (not shown).
for row in [data]:
    message = self._get_proto_message(table_id)
    for field_name, value in row.items():
        if field_name == "createdAt":
            # Parse the timestamp string after removing surrounding single quotes.
            timestamp_datetime = datetime.strptime(
                value.strip("'"), "%Y-%m-%dT%H:%M:%S.%fZ"
            ).replace(tzinfo=timezone.utc)  # the trailing "Z" means UTC
            # Convert datetime to epoch timestamp in microseconds.
            setattr(message, field_name, int(timestamp_datetime.timestamp() * 1e6))
        elif field_name == "firstTimeFilter":
            # Any value other than the string "True" is stored as False.
            setattr(message, field_name, value == "True")
        elif field_name.startswith("u_"):
            # Fields prefixed with "u_" are treated as repeated fields.
            repeated_field = getattr(message, field_name, None)
            if repeated_field is not None:
                for x in value:
                    repeated_field.extend(x)
        else:
            setattr(message, field_name, value)
    serialized_rows.append(message.SerializeToString())

# Build the writer schema from the table's proto descriptor.
proto_schema = ProtoSchema()
proto_descriptor = descriptor_pb2.DescriptorProto()
self._copy_proto_descriptor(proto_descriptor, table_id)
proto_schema.proto_descriptor = proto_descriptor

# Assemble the request: serialized rows plus the writer schema.
proto_rows = ProtoRows()
proto_rows.serialized_rows = serialized_rows
proto_data = AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
proto_data.writer_schema = proto_schema
request = AppendRowsRequest()
request.proto_rows = proto_data
request.write_stream = self.write_stream.name

# Time only the append_rows call.
start = time.time()
await self.write_client.append_rows(iter([request]))
print(""" V1 {} """.format(time.time() - start))
```
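
The measurement above covers a single call, so any one-off cost (for example connection setup, if that applies here) cannot be told apart from steady-state latency. A minimal sketch of timing several calls separately follows. `time_calls` and `fake_append` are hypothetical names introduced for illustration, and `fake_append` only sleeps in place of the real `append_rows` call, so its numbers say nothing about BigQuery itself.

```python
import asyncio
import statistics
import time


async def time_calls(make_call, repeats=5):
    """Await make_call() `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        await make_call()
        durations.append(time.perf_counter() - start)
    return durations


async def fake_append():
    # Hypothetical stand-in for `self.write_client.append_rows(...)`.
    # It only sleeps; replace it with the real call to get meaningful numbers.
    await asyncio.sleep(0.01)


durations = asyncio.run(time_calls(fake_append, repeats=5))
print(f"first call: {durations[0]:.3f}s")
print(f"mean: {statistics.mean(durations):.3f}s, max: {max(durations):.3f}s")
```

Comparing the first duration with the later ones would show whether the 1-2 seconds is paid on every call or only once.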
product-auto-label bot added the api: bigquerystorage label on Dec 6, 2023