Fix sanitizer errors #1266

antamel · 2023-10-10T10:35:30Z

Hello!

In the build with address and undefined behavior sanitizers, my colleague
Andrew Bille andrewbille@gmail.com found several errors
in the current master branch.

The build was made with options like that:

CC=gcc-11
CXX=g++-11
CPPFLAGS="-ggdb -O0 -g3 -fno-omit-frame-pointer -fsanitize=address -fsanitize=undefined -fsanitize=alignment -fno-sanitize-recover=all -fno-sanitize=nonnull-attribute -fstack-protector"
LDFLAGS='-fsanitize=address -fsanitize=undefined -static-libasan -static-libubsan'
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:halt_on_error=1:disable_coredump=0:strict_string_checks=1:check_initialization_order=1:strict_init_order=1:detect_odr_violation=0

There are several "load of misaligned address" sanitizer errors in the log
during installcheck tests with PG at REL_15_STABLE (6db384f2f9).
See master-at-a04fd69b.zip attached. (Other errors are masked by this, about them below.)
Some of these errors are not related to age, is a bug of the sanitizer itself
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111382).
The other ones i tried to fix with the Fix sanitizer aligment errors with memcpy. commit.

After that "misaligned" errors disappeared and other errors became visible.
Please see postmaster.log in the after-first-fix-at-de3633d2.zip.
The second commit Fix sanitizer "invalid values for type '_Bool' " errors with palloc0. fixes
runtime errors: "load of value xxx, which is not a valid value for type '_Bool'"
while the third Fix sanitizer heap-buffer-overflow errors during csv files parsing. one
fixes "heap-buffer-overflow on address.." one during csv file parsing.

On applying the first three patches, another error heap-buffer-overflow
became visible. See log in before-last-fix-at-2840ce1f.zip.
The last commit Save the metadata representing by the extensible node with the null t…
fixes it add all tests began to pass without errors with
REL_15_STABLE and REL_14_STABLE branches.

Would be happy for any comments and concerns.

With kind regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

…erminator.

jrgemignani

A lot of these changes appear unnecessary and need justification before being considered and added. This is not to say that they won't be added, they just need to be discussed first.

jrgemignani · 2023-10-10T17:03:26Z

src/backend/utils/load/ag_load_edges.c

@@ -160,7 +160,7 @@ int create_edges_from_csv_file(char *file_path,
    struct csv_parser p;
    char buf[1024];
    size_t bytes_read;
-    unsigned char options = 0;
+    unsigned char options = CSV_APPEND_NULL;


Please elaborate why. This is not dismissing it, but inquiring why.

jrgemignani · 2023-10-10T17:03:34Z

src/backend/utils/load/ag_load_labels.c

@@ -194,7 +194,7 @@ int create_labels_from_csv_file(char *file_path,
    struct csv_parser p;
    char buf[1024];
    size_t bytes_read;
-    unsigned char options = 0;
+    unsigned char options = CSV_APPEND_NULL;


Please elaborate why. This is not dismissing it, but inquiring why.

jrgemignani · 2023-10-10T17:35:45Z

src/backend/executor/cypher_set.c

@@ -255,7 +255,7 @@ static agtype_value *replace_entity_in_path(agtype_value *path,
    agtype *prop_agtype;
    int i;

-    r = palloc(sizeof(agtype_value));
+    r = palloc0(sizeof(agtype_value));


While it is generally good form to pre-initialize to zero, is it really necessary if the routine that uses it, sets it up properly? In the case of these iterator functions, which are generally slow, it takes up additional cpu cycles. So, I'm not sure this is necessary. Can you justify adding this?

jrgemignani · 2023-10-10T17:35:56Z

src/backend/utils/adt/agtype.c

@@ -9043,7 +9043,7 @@ agtype_value *agtype_composite_to_agtype_value_binary(agtype *a)
                        errmsg("cannot convert agtype scalar objects to binary agtype_value objects")));
    }

-    result = palloc(sizeof(agtype_value));
+    result = palloc0(sizeof(agtype_value));


While it is generally good form to pre-initialize to zero, is it really necessary if the routine that uses it, sets it up properly? In the case of these iterator functions, which are generally slow, it takes up additional cpu cycles. So, I'm not sure this is necessary. Can you justify adding this?

jrgemignani · 2023-10-10T17:36:19Z

src/backend/utils/adt/agtype_ext.c

@@ -195,7 +195,7 @@ static void ag_deserialize_composite(char *base, enum agtype_value_type type,
    //offset container by the extended type header
    char *container_base = base + AGT_HEADER_SIZE;

-    r = palloc(sizeof(agtype_value));
+    r = palloc0(sizeof(agtype_value));


While it is generally good form to pre-initialize to zero, is it really necessary if the routine that uses it, sets it up properly? In the case of these iterator functions, which are generally slow, it takes up additional cpu cycles. So, I'm not sure this is necessary. Can you justify adding this?

jrgemignani · 2023-10-10T17:55:02Z

src/backend/utils/adt/agtype_util.c

@@ -1306,7 +1306,7 @@ bool agtype_deep_contains(agtype_iterator **val, agtype_iterator **m_contained)
                    uint32 j = 0;

                    /* Make room for all possible values */
-                    lhs_conts = palloc(sizeof(agtype_value) * num_lhs_elems);
+                    lhs_conts = palloc0(sizeof(agtype_value) * num_lhs_elems);


Similar to the above palloc0 comment. I'm not opposed to adding this in, I'm just not sure that it is necessary.

jrgemignani · 2023-10-10T18:17:34Z

src/backend/utils/adt/agtype_ext.c

@@ -59,7 +59,7 @@ bool ag_serialize_extended_type(StringInfo buffer, agtentry *agtentry,
        /* copy in the int_value data */
        numlen = sizeof(int64);
        offset = reserve_from_buffer(buffer, numlen);
-        *((int64 *)(buffer->data + offset)) = scalar_val->val.int_value;
+        memcpy( buffer->data + offset, &scalar_val->val.int_value, numlen);


I don't see why this is necessary, maybe it is just me, could you please elaborate?

jrgemignani · 2023-10-10T18:17:40Z

src/backend/utils/adt/agtype_ext.c

@@ -70,7 +70,7 @@ bool ag_serialize_extended_type(StringInfo buffer, agtentry *agtentry,
        /* copy in the float_value data */
        numlen = sizeof(scalar_val->val.float_value);
        offset = reserve_from_buffer(buffer, numlen);
-        *((float8 *)(buffer->data + offset)) = scalar_val->val.float_value;
+        memcpy(buffer->data + offset, &scalar_val->val.int_value, numlen);


I don't see why this is necessary, maybe it is just me, could you please elaborate?

jrgemignani · 2023-10-10T18:17:46Z

src/backend/utils/adt/agtype_ext.c

@@ -156,12 +156,12 @@ void ag_deserialize_extended_type(char *base_addr, uint32 offset,
    {
    case AGT_HEADER_INTEGER:
        result->type = AGTV_INTEGER;
-        result->val.int_value = *((int64 *)(base + AGT_HEADER_SIZE));
+        memcpy(&result->val.int_value, base + AGT_HEADER_SIZE, sizeof(int64));


I don't see why this is necessary, maybe it is just me, could you please elaborate?

jrgemignani · 2023-10-10T18:17:54Z

src/backend/utils/adt/agtype_ext.c

        break;

    case AGT_HEADER_FLOAT:
        result->type = AGTV_FLOAT;
-        result->val.float_value = *((float8 *)(base + AGT_HEADER_SIZE));
+        memcpy(&result->val.float_value, base + AGT_HEADER_SIZE, sizeof(float8));


I don't see why this is necessary, maybe it is just me, could you please elaborate?

rafsun42 · 2024-02-15T23:45:37Z

+1 for the palloc0 changes as PostgreSQL recommends it as well:

https://www.postgresql.org/docs/current/xfunc-c.html

Always zero the bytes of your structures using memset (or allocate them with palloc0 in the first place). Even if you assign to each field of your structure, there might be alignment padding (holes in the structure) that contain garbage values. Without this, it's difficult to support hash indexes or hash joins, as you must pick out only the significant bits of your data structure to compute a hash. The planner also sometimes relies on comparing constants via bitwise equality, so you can get undesirable planning results if logically-equivalent values aren't bitwise equal.

github-actions · 2024-05-11T00:07:58Z

This PR is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

jrgemignani · 2024-05-13T18:06:56Z

@rafsun42 As the original creator just posted this PR and has yet to respond since, should we break this into 2 PRs, one for memcpy, one for palloc0, attribute this PR and implement ourselves?

antamel added 4 commits October 10, 2023 11:35

Fix sanitizer aligment errors with memcpy.

de3633d

Fix sanitizer "invalid values for type '_Bool' " errors with palloc0.

3afbad6

Fix sanitizer heap-buffer-overflow errors during csv files parsing.

2840ce1

Save the metadata representing by the extensible node with the null t…

71f445a

…erminator.

github-actions bot added the master label Oct 10, 2023

jrgemignani requested changes Oct 10, 2023

View reviewed changes

github-actions bot added the Stale Stale issues/PRs label May 11, 2024

github-actions bot removed the Stale Stale issues/PRs label May 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix sanitizer errors #1266

Fix sanitizer errors #1266

antamel commented Oct 10, 2023

jrgemignani left a comment •

edited

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

jrgemignani Oct 10, 2023

rafsun42 commented Feb 15, 2024

github-actions bot commented May 11, 2024

jrgemignani commented May 13, 2024

Fix sanitizer errors #1266

Are you sure you want to change the base?

Fix sanitizer errors #1266

Conversation

antamel commented Oct 10, 2023

jrgemignani left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rafsun42 commented Feb 15, 2024

github-actions bot commented May 11, 2024

jrgemignani commented May 13, 2024

jrgemignani left a comment •

edited