memory question

hi,

i have the following question:
How can misaligned memory affect memory operations, and how can the problem be solved?

Youka

I like this explanation.
For aligned memory, there are functions like _aligned_malloc and _aligned_free. Here is a nice implementation of these.

how does the following code work?

// cache line
#define ALIGN 64

void *aligned_malloc(int size) {
    void *mem = malloc(size+ALIGN+sizeof(void*));
    void **ptr = (void**)((long)(mem+ALIGN+sizeof(void*)) & ~(ALIGN-1));
    ptr[-1] = mem;
    return ptr;
}

void aligned_free(void *ptr) {
    free(((void**)ptr)[-1]);
}

thetopcoder wrote:

how does the following code work?

// cache line
#define ALIGN 64

void *aligned_malloc(int size) {
    void *mem = malloc(size+ALIGN+sizeof(void*));
    void **ptr = (void**)((long)(mem+ALIGN+sizeof(void*)) & ~(ALIGN-1));
    ptr[-1] = mem;
    return ptr;
}

void aligned_free(void *ptr) {
    free(((void**)ptr)[-1]);
}

The 'aligned_malloc' code mallocs size+ALIGN+sizeof(void*) bytes of memory and returns an address inside that block that is a multiple of ALIGN. The real address that 'malloc' returned is stored at index -1 of the address delivered by 'aligned_malloc', because the heap management needs it to free the whole memory block once it is no longer needed.

A number of things:

Firstly, this is generally unnecessary because malloc is guaranteed to return a slab of memory suitably aligned for all standard types. The times when you have to deal with memory alignment are times when you do strange things on raw memory, and then a function like this is not going to help you.

Secondly, this particular implementation has, strictly speaking, undefined behaviour because long is not guaranteed to be able to hold pointer values without truncation. It would be better to use uintptr_t as defined in <stdint.h>. It also does arithmetic on a void*, which is a GNU extension rather than standard C.
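For what it's worth, here is a minimal sketch of how the quoted code might look with uintptr_t. It keeps the original's shape and names; the NULL checks, the size_t parameter and the cast to unsigned char* are my additions, and it assumes the platform provides uintptr_t and that ALIGN is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* cache line, as in the original; must be a power of two */
#define ALIGN 64

void *aligned_malloc(size_t size) {
    /* Room for the payload, worst-case padding and the saved pointer. */
    unsigned char *mem = malloc(size + ALIGN + sizeof(void *));
    if (mem == NULL) {
        return NULL;
    }

    /* Same rounding as the original, but done in uintptr_t instead of long. */
    uintptr_t addr = ((uintptr_t)mem + ALIGN + sizeof(void *))
                     & ~(uintptr_t)(ALIGN - 1);
    assert(addr % ALIGN == 0);

    void **ptr = (void **)addr;
    ptr[-1] = mem; /* remember what malloc really returned */
    return ptr;
}

void aligned_free(void *ptr) {
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);
    }
}
```

Usage is the same as before: every pointer obtained from aligned_malloc goes back through aligned_free, never through plain free.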

Thirdly, the explanation on Stack Overflow is... semi-correct. There are processors that can only access memory on machine-word boundaries, but this is not true for the most common processors (those of the x86 family). If you write code that uses improperly aligned memory, your code is not going to be portable, but there's a good chance it will work on desktop PCs (if perhaps a bit more slowly).

On processors that do not allow memory operations to cross word boundaries, compilers may generate multiple memory operations to simulate the same effect, but only where they know it is necessary. This means that if you write code like

void some_function(int *p) {
  // compiler assumes p is properly aligned.
  *p = 0;
}

char foo[sizeof(int) * 2];
some_function((int*) (foo + 1)); // caller does not care.

and try to run it on a SPARC, it will crash. If you run it on an x86, it will work. Keep this in mind when writing network code.
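The usual portable way around this is to never dereference a possibly misaligned int* at all, and to copy the bytes instead. A rough sketch follows; the helper names are made up for illustration and are not from any library. Compilers typically turn the memcpy into a single load or store where the hardware allows it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store an int at any address, aligned or not. */
void store_int_unaligned(void *dst, int value) {
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: load an int from any address, aligned or not. */
int load_int_unaligned(const void *src) {
    int value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper for network buffers: assemble a big-endian
   32-bit value byte by byte, which also avoids byte-order issues. */
uint32_t read_u32_be(const unsigned char *buf) {
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

With these, the example above would become store_int_unaligned(foo + 1, 0), which is fine on SPARC as well.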

Moreover, to make things more complicated: Back in 1997, Intel introduced SIMD (single instruction, multiple data) instructions to the x86 family with MMX on the Pentium MMX processors; its successor SSE followed in 1999 with the Pentium III (you may have heard of it). Many SSE instructions have stringent alignment requirements (memory operands must be aligned along 16-byte boundaries unless the explicitly unaligned load/store instructions are used), so using them on unaligned data will result in processor faults and likely program crashes. For a long time, this has only been a problem if you were using SSE intrinsics manually, in which case it was your own bloody fault if you ran into trouble, but recently compilers (clang in particular) have been catching up with this. This means that code like

// DO NOT USE THIS!

double **two_dimensional_double_array(size_t height, size_t width) {
  // all in one, data behind index
  double **p = malloc(sizeof(double*) * height + height * width * sizeof(double));

  if(p != NULL) {
    for(size_t i = 0; i < height; ++i) {
      p[i] = (double*) (p + height) + width * i; // <-- this is now a time bomb.
    }
  }

  return p;
}

that would have worked five years ago now has a good chance of causing mayhem in completely unrelated parts of the program, because the doubles are no longer properly aligned and the compiler may generate SSE instructions to do calculations with them. This happens particularly on i686, where sizeof(double*) < sizeof(double), if and only if height is odd. As you may imagine, this is something of a debugging nightmare if you actually employ this sort of technique.
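If you really want the all-in-one layout, one way to defuse it is to pad the index so that the data part starts at an offset that is a multiple of malloc's own alignment guarantee. This is only a sketch: the function name is mine, it needs C11 for alignof and max_align_t, and it does not check height * width for overflow.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical replacement for the function above. */
double **two_dimensional_double_array_aligned(size_t height, size_t width) {
    /* Round the size of the index up to a multiple of the strictest
       fundamental alignment, so the data behind it is aligned too. */
    size_t align = alignof(max_align_t);
    size_t index_bytes = (sizeof(double *) * height + align - 1) / align * align;

    unsigned char *raw = malloc(index_bytes + height * width * sizeof(double));
    if (raw == NULL) {
        return NULL;
    }

    double **p = (double **)raw;
    double *data = (double *)(raw + index_bytes);
    assert((uintptr_t)data % alignof(double) == 0);

    for (size_t i = 0; i < height; ++i) {
        p[i] = data + width * i;
    }

    return p; /* release with a single free(p) */
}
```

The cost is at most a few bytes of padding per array, and the caller still frees everything with one free.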

So, what's the take-away from all this?

Well, the most important bit is to not descend into this sort of mess if you can reasonably avoid it. Instances where you actually need to work with raw, unaligned memory are rare because type declarations make the compiler give you properly aligned memory for the declared type and malloc memory is properly aligned for everything. The issue really only arises if you do type punning like in the code above, and you Do Not Want To Do That unless you have compelling reasons, know what you're doing and are very careful.
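For the rare case where you do need more than the default alignment, such as the 16-byte SSE requirement mentioned earlier, C11 lets you ask for it directly instead of rolling your own. A small sketch, assuming a C11 library that provides aligned_alloc (not every platform does); SIMD_ALIGN and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16 /* hypothetical name; 16 bytes as required by SSE */

/* Allocate count floats on a 16-byte boundary; release with free(). */
float *alloc_simd_floats(size_t count) {
    size_t bytes = count * sizeof(float);
    /* C11 wants the size to be a multiple of the alignment. */
    bytes = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;

    float *p = aligned_alloc(SIMD_ALIGN, bytes);
    assert(p == NULL || (uintptr_t)p % SIMD_ALIGN == 0);
    return p;
}

/* For static or automatic objects, alignas does the same job. */
float *static_simd_buffer(void) {
    static alignas(SIMD_ALIGN) float buffer[8];
    return buffer;
}
```

Because the alignment is part of the declaration or the allocation call, there is no pointer arithmetic for the compiler to be surprised by.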