SSE problems with -m32



  • Hi,

    I wrote this code:

    #include <stdlib.h>
    #include <stdio.h>
    #include <iostream>
    #include <xmmintrin.h>
    
    int main()
    {
      float *outIt;
      __m128 image4 ;
      int k;
      float sum;
    
      posix_memalign( (void**)&outIt, 16, sizeof(float)*8 );
      sum=0;
    
      printf("sum: %f\n", sum );
      for( k=0; k<8; k+=4 )
      {
        image4 = _mm_cvtpu8_ps( _mm_setr_pi8 ( 1+k, 2+k, 3+k, 4+k, 5+k, 6+k, 7+k, 8+k ) );
        _mm_store_ps( outIt, image4 );
        std::cerr <<  "16-byte aligned: " << outIt << std::endl;
        printf( "outIt[0]: %f, outIt[1]: %f, outIt[3]: %f, outIt[3]: %f\n", outIt[0], outIt[0], outIt[2], outIt[3] );
        printf( "outIt[0]: %f, outIt[1]: %f, outIt[3]: %f, outIt[3]: %f\n", outIt[0], outIt[1], outIt[2], outIt[3] );
        sum = outIt[0] + outIt[1] + outIt[2] + outIt[3];
        outIt += 4;
        printf("sum: %f\n", sum );
      }
    }
    

    When I compile this on a 64-bit (Linux) system, I get:

    sum: 0.000000
    16-byte aligned: 0x100100080
    outIt[0]: 1.000000, outIt[1]: 1.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
    outIt[0]: 1.000000, outIt[1]: 2.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
    sum: 10.000000
    16-byte aligned: 0x100100090
    outIt[0]: 5.000000, outIt[1]: 5.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
    outIt[0]: 5.000000, outIt[1]: 6.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
    sum: 26.000000

    which is what I expect. However, if I compile it for 32-bit mode (-m32), the output is:

    sum: 0.000000
    16-byte aligned: 0x100160
    outIt[0]: nan, outIt[1]: 1.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
    outIt[0]: 1.000000, outIt[1]: 2.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
    sum: 10.000000
    16-byte aligned: 0x100170
    outIt[0]: nan, outIt[1]: nan, outIt[3]: 7.000000, outIt[3]: nan
    outIt[0]: 5.000000, outIt[1]: 6.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
    sum: 26.000000

    There are always NaNs when I access elements of outIt for the first time after the store; the second read returns the correct values. Does anybody know what I am doing wrong?

    Thanks,

    A.O.


  • Mod

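    Depending on your compiler version and flags, this code may be compiled to instructions that use the MMX registers, which destroys the contents of the FPU registers (the MMX registers are aliased onto the x87 FPU registers). There is no single CPU instruction for _mm_cvtpu8_ps, and no simple way to convert packed 8-bit integers to packed 32-bit integers, so the intrinsic is expanded into several instructions, which may include MMX ones. An _mm_empty() call is missing after the store: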

    image4 = _mm_cvtpu8_ps( _mm_setr_pi8 ( 1+k, 2+k, 3+k, 4+k, 5+k, 6+k, 7+k, 8+k ) );
    _mm_store_ps( outIt, image4 );
    _mm_empty();  // leave MMX state before the FPU is used again
    std::cerr <<  "16-byte aligned: " << outIt << std::endl;
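
    For reference, the same fix can be sketched as a small self-contained helper. The function name store_four_floats is made up for this illustration and is not part of the original program; the intrinsics are the ones already used above. This is only one way to arrange it, assuming the destination pointer is 16-byte aligned:

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <xmmintrin.h>  // SSE intrinsics; also provides the MMX types used below

    // Hypothetical helper (not from the original post): converts the bytes
    // k+1 .. k+4 to floats and stores them at dst.
    // Assumption: dst is 16-byte aligned, as _mm_store_ps requires.
    inline void store_four_floats(float* dst, int k)
    {
        assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);

        // _mm_setr_pi8 builds an __m64 value; _mm_cvtpu8_ps converts its
        // lower four unsigned bytes to four packed floats.
        __m128 image4 = _mm_cvtpu8_ps(
            _mm_setr_pi8(1 + k, 2 + k, 3 + k, 4 + k, 5 + k, 6 + k, 7 + k, 8 + k));
        _mm_store_ps(dst, image4);

        // Leave MMX state so that later x87 floating-point code
        // (for example passing floats to printf in 32-bit mode) works again.
        _mm_empty();
    }
    ```

    Calling store_four_floats(outIt, k) inside the loop would replace the first two statements of the loop body. Keeping the _mm_empty() call inside the helper means the caller cannot forget it.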
    

    In 64-bit mode, ordinary floating-point math is probably done in the SSE registers, so the problem does not surface, or it does not exist at all because the MMX registers are not used in the first place.
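
    If SSE2 can be assumed (with -m32 on GCC this typically means also passing -msse2), the MMX registers can be avoided altogether by widening the bytes in an XMM register. The following is a rough sketch of that idea, not what the compiler generates for _mm_cvtpu8_ps; the helper name bytes_to_floats_sse2 is invented for this example:

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <emmintrin.h>  // SSE2 intrinsics

    // Hypothetical helper (not from the original post): converts four
    // unsigned bytes at src to four floats at dst without touching MMX.
    // Assumptions: SSE2 is available and dst is 16-byte aligned.
    inline void bytes_to_floats_sse2(const std::uint8_t* src, float* dst)
    {
        assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);

        std::int32_t packed;
        std::memcpy(&packed, src, sizeof packed);   // read the four bytes

        const __m128i zero = _mm_setzero_si128();
        __m128i v = _mm_cvtsi32_si128(packed);      // bytes in the low 32 bits
        v = _mm_unpacklo_epi8(v, zero);             // zero-extend 8-bit to 16-bit
        v = _mm_unpacklo_epi16(v, zero);            // zero-extend 16-bit to 32-bit

        _mm_store_ps(dst, _mm_cvtepi32_ps(v));      // int32 to float, aligned store
    }
    ```

    Since no __m64 value is involved, no _mm_empty() call is needed after this version.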



  • Thanks, that's it!

    A.O.

