Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB condition operator arity not handled correctly, produces semi-cryptic error #4113

Open
crh23 opened this issue May 3, 2024 · 3 comments
Assignees
Labels
bug This issue is a confirmed bug. dynamodb p2 This is a standard priority issue

Comments

@crh23
Copy link

crh23 commented May 3, 2024

Describe the bug

The binary condition operators in boto3.dynamodb.conditions, such as And and Or, don't correctly handle being given three or more arguments. This is not caught client-side, and causes a pretty cryptic error.

Expected Behavior

Either the operators are handled correctly (ideally with type hinting), or a more relevant error is raised

Current Behavior

Errors are like

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Scan operation: Value provided in ExpressionAttributeNames unused in expressions: keys: {#n2}

the generated request body from the below SSCCE is

{"TableName": "my_table", "FilterExpression": "(#n0 > :v0 AND #n1 < :v1)", "ExpressionAttributeNames": {"#n0": "age", "#n1": "age", "#n2": "height"}, "ExpressionAttributeValues": {":v0": {"N": "20"}, ":v1": {"N": "30"}, ":v2": {"N": "180"}}}

which is indeed invalid as indicated in the error message.

Reproduction Steps

Example I would expect to work but doesn't:

import boto3
from boto3.dynamodb.conditions import Attr, And

# Table doesn't need to exist to reproduce, since request is validated syntactically before hitting table
table_name = "my_table"

dynamodb = boto3.resource("dynamodb")

table = dynamodb.Table(table_name)

response = table.scan(
    FilterExpression=And(
        Attr("age").gt(20),
        Attr("age").lt(30),
        Attr("height").gt(180),
    )
)

Current workaround:

import boto3
from boto3.dynamodb.conditions import Attr, And
from functools import reduce

# Table does need to exist if you want a successful response!
table_name = "my_table"

dynamodb = boto3.resource("dynamodb")

table = dynamodb.Table(table_name)

response = table.scan(
    FilterExpression=reduce(
        And, [
            Attr("age").gt(20),
            Attr("age").lt(30),
            Attr("height").gt(180),
        ]
    )
)

Possible Solution

Explanation:

The conditions all inherit from ConditionBase, which has properties expression_operator (the character in the DynamoDB condition expression syntax used to indicate the relevant operator), values (the operands), and expression_format (a string with python format fields {operator} and some of {0}, {1}, {2}).

The expression_operator and expression_format are defined by the operator, and the values are set on construction. For example, And:

class And(ConditionBase):
expression_operator = 'AND'
expression_format = '({0} {operator} {1})'

Note that ConditionBase's constructor takes an arbitrary number of values.

The number of values passed to the condition operator is never validated, which is the root cause of this error. The core of the expression builder is

def _build_expression(
self,
condition,
attribute_name_placeholders,
attribute_value_placeholders,
is_key_condition,
):
expression_dict = condition.get_expression()
replaced_values = []
for value in expression_dict['values']:
# Build the necessary placeholders for that value.
# Placeholders are built for both attribute names and values.
replaced_value = self._build_expression_component(
value,
attribute_name_placeholders,
attribute_value_placeholders,
condition.has_grouped_values,
is_key_condition,
)
replaced_values.append(replaced_value)
# Fill out the expression using the operator and the
# values that have been replaced with placeholders.
return expression_dict['format'].format(
*replaced_values, operator=expression_dict['operator']
)

Note that every value passed (the number of which is never checked) gets a placeholder value, but only as many as there are number format fields in expression_format will actually be included in the final condition expression (since Python's str.format() silently ignores extraneous arguments).

Possible fix:

Add a property arity or number_of_values to ConditionBase, and have the constructor validate that a valid number has been received. Optionally, for quality of life also expand the And and Or constructs to take an arbitrary number of arguments (either by transparently expanding to "{0} AND {1} AND {2} AND {3}" or "((({0} AND {1}) AND {2}) AND {3})")

Additional Information/Context

No response

SDK version used

1.34.97

Environment details (OS name and version, etc.)

Windows 10

@crh23 crh23 added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels May 3, 2024
@crh23
Copy link
Author

crh23 commented May 3, 2024

As an aside, those operators don't seem to be documented anywhere, causing #4112 (doc issue now opened as #4117)

@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this May 9, 2024
@RyanFitzSimmonsAK RyanFitzSimmonsAK added investigating This issue is being investigated and/or work is in progress to resolve the issue. dynamodb p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels May 9, 2024
@RyanFitzSimmonsAK
Copy link
Contributor

Hi @crh23, thanks for reaching out. You should be able to chain these conditions using logical operators. The FilterExpression you used as an example would be written as FilterExpression = Attr("age").gt(20) & Attr("age").lt(30) & Attr("height").gt(180). Using this, I was able to successfully scan a table and have all three conditions applied. Does this fulfill your use case, or am I misunderstanding the issue in some way?

@RyanFitzSimmonsAK RyanFitzSimmonsAK added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels May 9, 2024
@crh23
Copy link
Author

crh23 commented May 10, 2024

Chaining does work, since using & makes Python collapse it down to a bunch of binary operations. I definitely don't think that there is missing functionality here, just a sharp edge for devs to catch themselves on.

The main issues are the lack of documentation (#4117) and the lack of validation (this issue). It would be reasonable to expect the example I gave to work, but:

  1. It doesn't work (OK)
  2. This isn't caught until the request is sent (not great - it would be easy to catch this client-side)
  3. The error message doesn't obviously relate to the actual cause of the error (not OK - makes debugging unnecessarily hard).

For context, this was raised by an actual user who I helped to fix their code - their code was similar to my given example, and they didn't understand why it wasn't working.

(this would also be addressed with proper type signatures for boto3, but that's a somewhat larger fix :P )

@RyanFitzSimmonsAK RyanFitzSimmonsAK removed the response-requested Waiting on additional information or feedback. label May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a confirmed bug. dynamodb p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

2 participants