Slow operator== and dirty hack



  • Hello!

    I have a C++ program that contains the following code fragment:

    template<class T> struct Pixel {
    	T v0, v1, v2, v3;
    
    	inline bool operator==(Pixel<T> const &p) const {
    		//return
    		//	reinterpret_cast<unsigned int const &>(  v0) ==
    		//	reinterpret_cast<unsigned int const &>(p.v0);
    		return v0 == p.v0 && v1 == p.v1 && v2 == p.v2 && v3 == p.v3;
    	}
    };
    

    The type T is currently unsigned char, but I am planning to use float and double as well. The program runs in 6 seconds.

    When I uncomment the dirty hack (the 3 commented lines in the snippet above, replacing the regular return), the program runs in 5 seconds, which is noticeably faster.

    I investigated the problem and found that the compilers are not smart enough to optimise this expression. For example, the compiler keeps the lazy (short-circuit) behaviour of the && operator. You can see the assembler output here:

    https://godbolt.org/g/lrdBh9

    I dug deeper and found the same problem with assignment!

    https://godbolt.org/g/DHqmfE

    Please help me to:

    • fix the code so it is both fast and elegant;
    • understand what is going on here.

    Thanks!



  • Try std::array!

    You have 4 variables of the same type called v0..v3
    -> use std::array<T, 4> v;

    operator== is straightforward, just apply == to the array!

    For your test code:

    bool compare3(Test const &t1, Test const &t2) {
      return t1.v == t2.v;
    }
    

    With clang (3.8, 3.9) the assembly output of compare1 and compare3 is identical, while gcc (4.9, 5, 6) calls memcmp for the array.
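
    Applied to the Pixel template from the question, the std::array suggestion could look roughly like the sketch below. The member name v and the added operator!= are assumptions made for illustration; whether the comparison compiles to a single 32-bit compare still depends on the compiler, as noted above.

```cpp
#include <array>

// Sketch: the four channels live in one std::array instead of v0..v3.
template <class T>
struct Pixel {
    std::array<T, 4> v;

    // std::array supplies an element-wise operator==,
    // so this compares all four channels.
    bool operator==(Pixel const &p) const { return v == p.v; }
    bool operator!=(Pixel const &p) const { return !(*this == p); }
};
```

    The struct stays an aggregate, so Pixel<unsigned char> p{{1, 2, 3, 4}}; still works, and the same template can be instantiated with float or double.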



  • Thanks, wob.

    For now I have the following solutions:

    Align the structure. This helps with GCC only:

    struct __attribute__((aligned(4))) Test {
      unsigned char a,b,c,d;
    };
    

    Use std::array. This helps with Clang only:

    struct Test {
      std::array<unsigned char, 4> v;
    };
    

    Use a bit-field. This helps with both Clang and GCC, but not with ICC:

    struct Test {
       unsigned int a: 8, b: 8, c: 8, d: 8;
    };
    

    So the Intel compiler is the most stupid one here.

    Is it possible to extend the bit-field idea to templates?
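
    One possible direction, sketched here as an assumption and not as the bit-field approach itself, is to dispatch on the channel type: compare the raw bytes with std::memcmp for integer channels and fall back to member-wise == otherwise. The helper name equal is hypothetical. The sketch assumes integer types without padding bits, which holds on mainstream platforms. Floating-point channels keep the member-wise path because a byte comparison would treat 0.0 and -0.0 as different and a NaN as equal to itself.

```cpp
#include <cstring>
#include <type_traits>

template <class T>
struct Pixel {
    T v0, v1, v2, v3;

    bool operator==(Pixel const &p) const {
        // Pick the implementation at compile time from the channel type.
        return equal(p, std::is_integral<T>());
    }

private:
    // Integer channels: compare the object bytes in one call.
    bool equal(Pixel const &p, std::true_type) const {
        static_assert(sizeof(Pixel) == 4 * sizeof(T),
                      "Pixel must not contain padding");
        return std::memcmp(this, &p, sizeof(Pixel)) == 0;
    }

    // Other channel types (float, double): keep value semantics.
    bool equal(Pixel const &p, std::false_type) const {
        return v0 == p.v0 && v1 == p.v1 && v2 == p.v2 && v3 == p.v3;
    }
};
```

    Unlike the reinterpret_cast hack, this does not read an unsigned char object through an unsigned int reference. Whether a fixed-size memcmp is turned into a single integer compare depends on the compiler and its version, so it is worth checking the assembly on godbolt.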

