Support different word sizes for distinct memory regions #466

priyasiddharth · 2022-11-04T17:01:55Z

Is your feature request related to a problem? Please describe.
Today, we use a single word size for all memory. This is not optimal. A memory region may only be accessed in an aligned (word) granularity or may need to support byte-wise access. For such situations, we degrade to using byte access for all memories leading to more complex VC. We would like to keep word accesses VCs wherever it is correct.

Describe the solution you'd like
Infer an alignment type for each memory region and compute access strategy based on that.

For more discussion see (private) post in: https://seahornteam.slack.com/archives/C042CLDJDA7/p1667577583935569

agurfinkel · 2022-11-04T23:17:56Z

@priyasiddharth talk to @igcontreras . Prepreq for this is to compute for each dsa-node the minimum alignment required.
Then we can have different word size for different DSA nodes. One option to consider is to also split DSA nodes by alignment by requiring that aliasing is allowed only over equally aligned regions of memory.

priyasiddharth · 2022-11-21T01:23:39Z

I've prototyped local memory wordsize in priyasiddharth@bd0021f

IIUC node->isOffsetCollapsed() does not work as intended. It returns false for unaligned memcpy shown below. I gather the memcpy is unaligned because align 1 is given. The full example is from array_list_push_back

call i32 @shadow.mem.store(i32 109, i32 %shadow.mem.109.2, i8* null), !dbg !806, !shadow.mem !223, !shadow.mem.def !223
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 16 %_146, i8* nonnull align 1 %_149, i64 %_91, i1 false) #11, !dbg !806, !noalias !595

To reproduce:

1. build `align2` branch from https://github.com/priyasiddharth/seahorn/tree/align2
2. build `verify-c-common` 
3.  ./verify --cex seahorn/jobs/array_list_push_back/ --horn-bv2-word-size=8 (fails)* and  ./verify --cex seahorn/jobs/array_list_push_back/ --horn-bv2-word-size=1 (passes)
4. I expect the failing test case to pass because `isCollapsed` should return true

* Error: Word size and pointer are not aligned and alignment is not ignored! (BvOpSem2MemRepr.cc:125)

@igcontreras I would appreciate if you can tell me what I am missing or change sea-dsa to do the right thing.

igcontreras · 2022-11-21T15:28:10Z

I don't know if memcpy is fully supported. @caballa would know more.
I believe the code is here: https://github.com/seahorn/sea-dsa/blob/master/src/DsaLocal.cc#L1172 but it does not take into account the number of bytes to be copied.

Also, I don't know if this could be an issue but on line 1197 there is an assert that the destination of the copy has a cell (https://github.com/seahorn/sea-dsa/blob/master/src/DsaLocal.cc#L1197) but then it is not used, a new cell is created: https://github.com/seahorn/sea-dsa/blob/master/src/DsaLocal.cc#L1203

agurfinkel · 2022-11-21T15:38:48Z

@priyasiddharth can you provide more information about what is known about the node in question. It is likely that the node in question is not collapsed. This would be an expected result. A node is collapsed if its fields are used in some complex way. This node probably has no fields, so there is no reason to collapse it.

The node might be labeled as a sequence/array.

It is possible that the node is not marked as anything since DSA mostly cares about memory that stores pointers, but this node might not have any pointers in it.

In summary, we need to refine when we classify a node as aligned. One simple addition is to check whether it is a destination of any unaligned memcpy, for example.

igcontreras · 2022-11-21T15:47:24Z

@priyasiddharth you can run the seahorn command with options --mem-dot --sea-dsa-dot-outdir=tmp. It will create .dot files with the memory graphs.
Then run

$ dot -Tpdf tmp/XXX.dot -o tmp/XXX.pdf

to see memory looks like.

priyasiddharth · 2022-11-21T16:25:42Z

This is one instruction

call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 16 %_146, i8* nonnull align 1 %_149, i64 %_91, i1 false) #11, !dbg !806, !noalias !595

This is the associated Node info

(seadsa::Node) $0 = {
  m_graph = 0x000055555c786f40
  m_nodeType = {
    shadow = 0
    foreign = 0
    alloca = 1
    heap = 0
    global = 0
    externFunc = 0
    externGlobal = 0
    unknown = 0
    incomplete = 0
    modified = 1
    read = 1
    array = 1
    offset_collapsed = 0
    type_collapsed = 0
    external = 0
    inttoptr = 0
    ptrtoint = 0
    vastart = 0
    dead = 0
    null = 0
  }

main.mem.pdf

agurfinkel · 2022-11-21T16:35:15Z

great. it is marked as array, so this is good enough for alignment of 1. Would have been better if the size is also 1. Is this known?

priyasiddharth · 2022-11-21T18:55:55Z

I don't know how to answer that question unambiguously.

Perhaps you mean to ask if the compiler can statically infer the size of the data structure. The answer appears to be no since it is non deterministic.

https://sourcegraph.com/github.com/seahorn/verify-c-common@05bc66396b400a7af788a518fe50599434d33a30/-/blob/seahorn/lib/array_list_helper.c?L26

agurfinkel · 2022-11-21T20:40:49Z

Use these API to check whether the node is an Array/Sequence, and, optionally, if the element size is 1:

https://github.com/seahorn/sea-dsa/blob/fda066eff4271bafa626d0fa425ed34519d1fcaa/include/seadsa/Graph.hh#L675

https://github.com/seahorn/sea-dsa/blob/fda066eff4271bafa626d0fa425ed34519d1fcaa/include/seadsa/Graph.hh#L734

The rule is then, alignment = 1 if node is collapsed or node is array.
Or, if that rule is too weak

alignment = 1 if node is collapsed or (node is array and node size is 1)

You should be able to add this easily to the existing code you have to see if this solves the immediate problem.
Then we can refactor the code to support more such conditions.

priyasiddharth · 2022-11-23T15:53:11Z

One insight (atleast for me) is that there is difference in semantics of memory defined using memcpy between sea-dsa and BvOpSem2.

Problem definition

When copying bytes where source access is unaligned and dest access is aligned, in sea-dsa the source memory has unaligned access and destination memory has aligned access only. However, for BvOpSem2, a memcpy is a join point (ite) where either the destination memory is chosen or the source memory is chosen depending on the address being accessed. Thus, the sort of both the source and destination memory has to be the same. This defeats the purpose of having different sorts for memory values (e.g. bv8 vs bv32) to reduce the number of needed loads.

Solution

One solution is to propagate the unaligned access property such that if one of source or dest is unaligned, the other is too. This would make sure that if one memory is unaligned then all memories interacting with are unaligned too. This is drastic and perf would likely be close to what it is today ... setting everything to bv8 if unaligned access is seen.
A better solution is join memories more intelligently. When joining dest and source memories, we will bitwise manipulate the returned value from source such that its memsort matches that of dest. This ensures that load penalty (multiple select instructions) is only paid when accessing the unaligned sub-memory of any memory (atleast for case1).

Case 1 (src=unaligned (byte access), dst = aligned)

After memory is joined and a load occurs, we expect the load to be always aligned.

A memcpy ite with destArray: ptrSize -> 32bv and srcArray: ptrSize -> 8bv

ite(cond,
      array::select(dstarray, dst_index),
      bv::concat(
          array::select(srcarray, src_index), 
          array::select(srcarray, alu::add(src_index, 1),
          array::select(srcarray, alu::add(src_index, 2),
          array::select(srcarray, alu::add(src_index, 3))

Case 2 (src=aligned, dst = unaligned)

After memory is joined and a load occurs, we expect the load can be unaligned

A memcpy ite with destArray: ptrSize -> 8bv and srcArray: ptrSize -> 32bv. Assume getBase(index) returns the base (aligned) index given any index and getOffset(index) returns the offset in indices.

ite(cond,
      array::select(dstarray, dst_index),
      bv::extract(array::select(srcarray, 
                          getBase(src_index)), 
          alu::mult(getOffset(src_indx), 8)  /* start extract (in bits) */,  
          8 /* len in bits */) )

The following is a distilled test that should pass once this logic is implemented. It is also a useful artifact to study what is happening under-the-hood.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "seahorn/seahorn.h"

extern bool nd_bool();
extern uint32_t nd_uint32();
extern char nd_char();
extern size_t nd_size();
extern void memhavoc(void *ptr, size_t size);
extern void sea_printf(const char *format, ...);

typedef struct {
  char *data;
  size_t len;
} Buffer;

/**
 * This program copies data from one buffer to another using memcpy.
 * The source access is unaligned. The dest access is aligned.
 **/
int main() {
  // create Buffer b with havoc'ed data
  size_t len = nd_size();
  char *data = (char *) malloc(len);
  assume(len < 16);
  memhavoc(data, len);
  Buffer b = {.data = data, .len = len};
  // choose an unaligned offset to copy from
  size_t offset = nd_size();
  size_t copylen = nd_size();
  // The copylen is a multiple of word size
  assume (copylen % 8 == 0);
  size_t hot_idx = nd_size();
  assume (hot_idx < copylen);
  assume (offset < len);
  assume (copylen < len);
  assume (offset + copylen < len);

  // create Buffer b2 with havoc'ed data
  size_t len2 = nd_size();
  assume (len < len2);
  assume(len2 < 24);
  char *data2 = (char *) malloc(len2);
  memhavoc(data2, len2);
  Buffer b2 = {.data = data2, .len = len2};
  // non deterministically record a byte from source address
  char oldbyte = *(b.data + offset + hot_idx);

  // operation under verification
  // src access is unaligned
  // dst access is aligned
  memcpy(b2.data, b.data + offset, copylen);
  // record same byte from dest address
  char newbyte = *(b2.data + 0 + hot_idx);
  sea_printf("len:%ld len2:%ld: offset:%ld copylen:%ld hotidx: %ld", len, len2, offset, copylen, hot_idx);
  sassert(oldbyte == newbyte);
}

agurfinkel · 2022-11-23T17:18:34Z

Lets talk. If there is a difference in semantics, it is unintentional.

sea-dsa treats memcpy as sufficient conditions to loose field sensitivity (i.e., collapse), unless the copy does not affect any pointers. This 'unless' step is likely unsound and (probably) only optionaly enabled.

sea-dsa does not care about alignment of memcpy.

BvOpSem should support all forms of alignment (eventually). When it was written, there was only one alignment mark for memcpy that applied to both source and destination. Now, there are separate alignment for source and dest.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support different word sizes for distinct memory regions #466

Support different word sizes for distinct memory regions #466

priyasiddharth commented Nov 4, 2022

agurfinkel commented Nov 4, 2022

priyasiddharth commented Nov 21, 2022 •

edited

igcontreras commented Nov 21, 2022

agurfinkel commented Nov 21, 2022

igcontreras commented Nov 21, 2022

priyasiddharth commented Nov 21, 2022

agurfinkel commented Nov 21, 2022

priyasiddharth commented Nov 21, 2022 •

edited

agurfinkel commented Nov 21, 2022

priyasiddharth commented Nov 23, 2022 •

edited

agurfinkel commented Nov 23, 2022

Support different word sizes for distinct memory regions #466

Support different word sizes for distinct memory regions #466

Comments

priyasiddharth commented Nov 4, 2022

agurfinkel commented Nov 4, 2022

priyasiddharth commented Nov 21, 2022 • edited

igcontreras commented Nov 21, 2022

agurfinkel commented Nov 21, 2022

igcontreras commented Nov 21, 2022

priyasiddharth commented Nov 21, 2022

agurfinkel commented Nov 21, 2022

priyasiddharth commented Nov 21, 2022 • edited

agurfinkel commented Nov 21, 2022

priyasiddharth commented Nov 23, 2022 • edited

Problem definition

Solution

Case 1 (src=unaligned (byte access), dst = aligned)

Case 2 (src=aligned, dst = unaligned)

agurfinkel commented Nov 23, 2022

priyasiddharth commented Nov 21, 2022 •

edited

priyasiddharth commented Nov 21, 2022 •

edited

priyasiddharth commented Nov 23, 2022 •

edited