virtual function performance issue



  • I am seeing a performance issue with virtual functions, and it affects every call to a virtual function. Why do virtual functions execute more slowly? Is it because the call requires the run-time type of the owning class to be identified, because it has to search for the proper class binding, because it requires a table lookup at run time before the call, or because the class must use memory to maintain a table of virtual function pointers? Please clarify.

    In the code below, for both the call obj.f() and the call obj2ref.f(), I assume there is the possibility that at run time the object is of some derived class, so the actual type has to be searched for in order to determine which f() is to be called.

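    ```cpp
    struct Class {
        virtual void f() { /* ... */ }
    };

    struct Class2 : public Class {
        void f() override { /* ... */ }
    };

    int main() {
        Class obj;
        obj.f();                // called directly on an object

        Class2 obj2;
        obj2.f();

        Class& obj2ref = obj2;
        obj2ref.f();            // called through a base-class reference
    }
    ```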



  • Virtual functions are implemented internally as tables/arrays of function pointers (called vtables). You have a vtable for every class that has virtual functions. Usually the entries for the functions declared in the base class come first, and the entries for functions added by derived classes follow at the end. (Well, modern implementations use some kind of hash mapping.) If you call a virtual function, the code the compiler generated has to find the proper function pointer first. So it searches this table first and then calls through the function pointer. "Searches" may not be the right description here, it is more like an offset-jumptable. But in the end this still uses more computing power than calling a common function.

    If you are implementing something where every cycle counts, using dynamic inheritance may be a very bad design decision. Take, for example, the vector implementations used in 3D software. You could base your Vector3 on Vector2 and Vector4 (for quaternions) on Vector2, but you are never going to see this in properly working code. There it would hurt your performance drastically.
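
    To illustrate the per-object side of this, here is a minimal sketch. PlainVec3 and VirtualVec3 are hypothetical names, not taken from any real 3D library. It only shows that a type with virtual members is polymorphic and, on typical implementations, carries extra hidden data (the vtable pointer) in every object; the exact sizes are implementation-defined.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <type_traits>

    // Hypothetical vector type without virtual functions: just three floats.
    struct PlainVec3 {
        float x, y, z;
        float dot(const PlainVec3& o) const { return x * o.x + y * o.y + z * o.z; }
    };

    // Hypothetical variant with virtual members: typical compilers add a
    // hidden vtable pointer to every object of this type.
    struct VirtualVec3 {
        float x, y, z;
        virtual ~VirtualVec3() = default;
        virtual float dot(const VirtualVec3& o) const { return x * o.x + y * o.y + z * o.z; }
    };

    static_assert(!std::is_polymorphic<PlainVec3>::value, "no virtual functions");
    static_assert(std::is_polymorphic<VirtualVec3>::value, "has virtual functions");

    // Extra bytes per object caused by the virtual members (implementation-defined).
    constexpr std::size_t per_object_overhead() {
        return sizeof(VirtualVec3) - sizeof(PlainVec3);
    }

    // Assumption: the implementation stores a vtable pointer in each object.
    inline void check_layout() {
        assert(sizeof(VirtualVec3) > sizeof(PlainVec3));
    }
    ```

    On a common 64-bit compiler one would expect the virtual variant to be larger by the size of a pointer plus padding, which matters when millions of such vectors sit in an array.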



  • @VLSI_Akiko Ok, so does that mean the v-table does not take a huge amount of storage? I just want to clarify if the slowness is because "the virtual function call requires the run time type of the owning class to be identified", when comparing it to a statically bound call ?



  • @tampere2021 said in virtual function performance issue:

    @VLSI_Akiko Ok, so does that mean the v-table does not take a huge amount of storage? I just want to clarify if the slowness is because "the virtual function call requires the run time type of the owning class to be identified", when comparing it to a statically bound call?

    Yeah, the costly part is the one where the right implementation of the virtual function has to be picked at run time. So, fun fact: the more work the function itself does, the smaller the relative impact of the virtual call. 😁 The virtual function call overhead is constant, because of the offset-jumptable approach, which is comparable to accessing an array by index.



  • @VLSI_Akiko said in virtual function performance issue:

    "the virtual function call requires the run time type of the owning class to be identified"

    Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?



  • It should also be mentioned that compilers can perform an optimization called "devirtualization". If the compiler can determine at compile time the actual dynamic type of the object the virtual member function is called on, it can omit the vtable lookup and you get performance similar to static inheritance. So it might be a good idea to make the relevant code visible to the compiler (headers, inlining, global optimizations like LTO; even C++20 modules might help, not sure) and to avoid hiding it behind an interface with the dynamic type only visible in a different compilation unit.
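
    A small sketch of situations in which a compiler is allowed to devirtualize. Shape and Triangle are made-up names, and whether a given compiler really removes the lookup depends on the compiler and the optimization level, so the generated assembly is the only reliable evidence.

    ```cpp
    #include <cassert>

    struct Shape {
        virtual ~Shape() = default;
        virtual int sides() const { return 0; }
    };

    // `final` tells the compiler that no further override can exist.
    struct Triangle final : Shape {
        int sides() const override { return 3; }
    };

    // Dynamic type unknown here: in general this needs the vtable lookup.
    int sides_via_base(const Shape& s) { return s.sides(); }

    // Static type is a final class: the compiler may call Triangle::sides directly.
    int sides_via_final(const Triangle& t) { return t.sides(); }

    // Object used by value: its dynamic type is known at compile time.
    int sides_of_local() {
        Triangle t;
        return t.sides();
    }

    // Usage sketch: all paths give the same answer, only the call mechanism may differ.
    void demo_devirtualization() {
        Triangle t;
        assert(sides_via_base(t) == 3);
        assert(sides_via_final(t) == 3);
        assert(sides_of_local() == 3);
    }
    ```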



  • @tampere2021 said in virtual function performance issue:

    @VLSI_Akiko said in virtual function performance issue:

    "the virtual function call requires the run time type of the owning class to be identified"

    Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

    Ah yeah right, I didn't answer that one. No, the vtables are part of their class tree, so there is no need to find the proper vtable first. Fetching the proper function pointer from the vtable is the only additional cost.

    @Finnegan said in virtual function performance issue:

    It should also be mentioned that compilers can perform an optimization called "devirtualization". If the compiler can determine at compile time the actual dynamic type of the object the virtual member function is called on, it can omit the vtable lookup and you get performance similar to static inheritance. So it might be a good idea to make the relevant code visible to the compiler (headers, inlining, global optimizations like LTO; even C++20 modules might help, not sure) and to avoid hiding it behind an interface with the dynamic type only visible in a different compilation unit.

    Yeah, correct. But this is quite an aggressive optimization that you may not get at the lower optimization levels, or if you use a lot of volatile constructs, which confuse the optimizer. That can get quite funny, like dead code elimination starting to remove code that is actually used. That happened in an older version of gcc and was easy to reproduce.



  • @tampere2021 said in virtual function performance issue:

    Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

    Judging from your other posts, I would disagree / not answer this question at all. Are you really really sure the "slowness" is because of virtual functions? Did you measure that? How? Measuring the overhead can be an art. Maybe the compiler has even removed the lookup. Maybe there's something else going on.

    Since conclusions about what to do to make a program perform better (faster) are hard, I would probably look elsewhere first.



  • @VLSI_Akiko Hi VLSI, did you say that the slowness is due to the virtual function call requiring the run-time type of the owning class to be identified? If you agree with this, then it is clear to me what needs to be done:

    1. Look into the object and get the v-table pointer
    2. Look up the v-table, get the address of the function and call the function at that address.
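
    The two steps above can be sketched by emulating the mechanism by hand. This is only an illustration of the typical implementation, not what any particular compiler literally generates, and all names (VTable, Object, call_f and so on) are hypothetical.

    ```cpp
    #include <cassert>

    struct Object;                       // forward declaration

    struct VTable {
        int (*f)(const Object*);         // slot 0: address of the "virtual" function
    };

    struct Object {
        const VTable* vptr;              // stands in for the hidden pointer a compiler adds
        int value;
    };

    int base_f(const Object* self)    { return self->value; }
    int derived_f(const Object* self) { return self->value * 2; }

    // One table per "class", shared by all objects of that class.
    const VTable base_vtable    = { &base_f };
    const VTable derived_vtable = { &derived_f };

    int call_f(const Object* obj) {
        const VTable* table = obj->vptr; // step 1: read the v-table pointer from the object
        return table->f(obj);            // step 2: load the slot and call through it
    }

    void demo_manual_vtable() {
        Object b = { &base_vtable, 21 };
        Object d = { &derived_vtable, 21 };
        assert(call_f(&b) == 21);        // dispatches to base_f
        assert(call_f(&d) == 42);        // dispatches to derived_f
    }
    ```

    Note that nothing is searched and no type is identified: the call is two dependent memory loads followed by an indirect call.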


  • @tampere2021 To avoid confusion, I have summarized below the possible reasons why virtual functions are slower than a statically bound call. Which one do you agree with? Please clarify, as it is a huge code base and there are a lot of virtual functions involved.

    1) The virtual function call has to search for the correct class binding
    2) The class must use memory to maintain a table of virtual function pointers
    3) The virtual function call requires the run-time type of the owning class to be identified
    4) The virtual function call requires a table lookup at run time before calling



  • @wob said in virtual function performance issue:

    @tampere2021 said in virtual function performance issue:

    Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

    Judging from your other posts, I would disagree / not answer this question at all. Are you really really sure the "slowness" is because of virtual functions?

    No, the question was whether virtual functions perform worse. Yes, they do. Are they the only factor? Nope, there may be more, but the code example is too incomplete to point out other issues. Using diamond-shaped inheritance, though, may make this much worse.

    Did you measure that? How? Measuring the overhead can be an art.

    Myself? Yeah, about 20 years ago. But today I trust the work of people who have made a name for themselves doing such intense work. If you want to dig into this, including measuring how many cycles a specific opcode needs, Agner Fog is your man. He is the Fabrice Bellard of measuring computing hardware. (Hint: if you don't know Bellard, you have really missed out on something.)

    EDIT: Did you measure that? How?

    1. TSC on more modern x86
    2. debug registers available since 80486 on x86 or most PowerPC or MIPS based architectures
    3. PMU on ARM or bigger PowerPC architecture
    4. Qemu + kvm with a connected debugger/perf tool
    5. debug mode on m68k (easy on an Amiga with connected serial cable and running SuShi/Enforcer)
    6. perf counters (oprofile) on modern hardware
    7. looking at the generated assembly code (for example in Compiler Explorer)
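
    For a first rough impression without any of the hardware facilities listed above, a portable timing loop with std::chrono can be sketched as follows. This is a different and much cruder technique than cycle counters, and all names (Counter, DoubleStep, time_virtual, time_plain) are hypothetical. An optimizer may devirtualize or fold the loops entirely, so the timings are only meaningful together with a look at the generated code.

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstdint>
    #include <memory>

    struct Counter {
        virtual ~Counter() = default;
        virtual std::uint64_t step(std::uint64_t x) const { return x + 1; }
    };

    struct DoubleStep : Counter {
        std::uint64_t step(std::uint64_t x) const override { return x + 2; }
    };

    // Plain function used as the statically bound baseline.
    inline std::uint64_t plain_step(std::uint64_t x) { return x + 2; }

    struct Timing {
        std::uint64_t result;     // checksum, so the loop has an observable effect
        double nanoseconds;       // wall-clock time for the whole loop
    };

    template <typename Fn>
    Timing time_loop(std::uint64_t iterations, Fn fn) {
        const auto start = std::chrono::steady_clock::now();
        std::uint64_t acc = 0;
        for (std::uint64_t i = 0; i < iterations; ++i) acc = fn(acc);
        const auto stop = std::chrono::steady_clock::now();
        return { acc, std::chrono::duration<double, std::nano>(stop - start).count() };
    }

    Timing time_virtual(std::uint64_t iterations) {
        std::unique_ptr<Counter> c(new DoubleStep);   // accessed through the base type
        return time_loop(iterations, [&](std::uint64_t x) { return c->step(x); });
    }

    Timing time_plain(std::uint64_t iterations) {
        return time_loop(iterations, [](std::uint64_t x) { return plain_step(x); });
    }

    // Both loops must compute the same value, otherwise the comparison is meaningless.
    void check_benchmark_consistency() {
        assert(time_virtual(1000).result == time_plain(1000).result);
    }
    ```

    Comparing `time_virtual(n).nanoseconds` with `time_plain(n).nanoseconds` for a large n, over several runs, gives at most a coarse indication; the counters in the list above are the more trustworthy tools.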

    Maybe the compiler has even removed the lookup. Maybe there's something else going on.

    This happens mostly if you use only one instance of the class tree, but yeah. Like I said before, the code example is too small to point out more. So, using proper exception correctness on function signatures (noexcept, noexcept(false)) may also improve the performance.



  • @tampere2021: Read "Virtual method table" and e.g. "Understanding Virtual Tables in C++" to answer your questions (in short: there is no real search, just an indirection + offset).



  • @VLSI_Akiko said in virtual function performance issue:

    So, using proper exception correctness on function signatures (noexcept, noexcept(false)) may also improve the performance.

    IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole, and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types stored in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. Otherwise, the possibility of exceptions being thrown often requires more complicated and less efficient code.
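
    The std::vector case can be made visible with a small counting sketch. The element types NoexceptMove and ThrowingMove and the helper relocate_existing_elements are hypothetical names invented for this illustration; the behaviour shown relies on std::vector falling back to copying during reallocation when the move constructor is not noexcept and a copy constructor is available.

    ```cpp
    #include <cassert>
    #include <type_traits>
    #include <vector>

    // Counts how elements are transferred to new storage.
    struct Counts { int copies = 0; int moves = 0; };

    struct NoexceptMove {
        static Counts counts;
        NoexceptMove() = default;
        NoexceptMove(const NoexceptMove&) { ++counts.copies; }
        NoexceptMove(NoexceptMove&&) noexcept { ++counts.moves; }
    };
    Counts NoexceptMove::counts;

    struct ThrowingMove {
        static Counts counts;
        ThrowingMove() = default;
        ThrowingMove(const ThrowingMove&) { ++counts.copies; }
        ThrowingMove(ThrowingMove&&) { ++counts.moves; }   // not declared noexcept
    };
    Counts ThrowingMove::counts;

    static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value, "");
    static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value, "");

    // Fill a vector, force one reallocation, and report how the existing
    // elements were transferred to the new storage.
    template <typename T>
    Counts relocate_existing_elements() {
        std::vector<T> v(4);              // four default-constructed elements
        T::counts = Counts{};             // reset counters
        v.reserve(v.capacity() + 1);      // forces reallocation of the 4 elements
        return T::counts;
    }

    void demo_vector_relocation() {
        const Counts fast = relocate_existing_elements<NoexceptMove>();
        const Counts safe = relocate_existing_elements<ThrowingMove>();
        assert(fast.moves == 4 && fast.copies == 0);   // elements were moved
        assert(safe.copies == 4 && safe.moves == 0);   // vector fell back to copying
    }
    ```

    For types that own heap memory, the difference between those two paths is the difference between stealing a pointer and duplicating the whole buffer.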



  • @Th69 said in virtual function performance issue:

    @tampere2021: Read "Virtual method table" and e.g. "Understanding Virtual Tables in C++" to answer your questions (in short: there is no real search, just an indirection + offset).

    Yeah, but it is still an implementation detail and not every compiler does it that way. Especially the pre-C++98 compilers can be a bit weird (SAS/C is such an odd one).

    @Finnegan said in virtual function performance issue:

    IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole, and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types stored in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. Otherwise, the possibility of exceptions being thrown often requires more complicated and less efficient code.

    Oh damn it, no, I didn't mean to use it everywhere. You really need to check the C++ references to see which of the C++ library functions actually throw.



  • @VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.





  • @tampere2021 said in virtual function performance issue:

    @VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.

    I said this in a later post, see:
    @VLSI_Akiko said in virtual function performance issue:

    The virtual function call overhead is constant, because of the offset-jumptable approach, which is comparable to accessing an array by index.

    And I'm aware that I talked about hash mapping before here:
    @VLSI_Akiko said in virtual function performance issue:

    (Well, modern implementations use some kind of hash mapping.)

    I wrote that in parentheses, because I write in more languages than C++ and sometimes I'm not 100% sure which language's compiler uses which approach. Yeah, it is wrong for C++, which is why I wrote about the offset-jumptable later.

    @tampere2021 said in virtual function performance issue:

    @tampere2021 READ SECTION 5.3 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf

    Now I'm confused. Why did you end up asking these questions here if you are able to use a search engine of your choice to find this technical report and are also willing to read through it?



  • @VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that the virtual function call requires the run-time type of the owning class to be identified, which might slow it down compared to a statically bound call.



  • @tampere2021 said in virtual function performance issue:

    @VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that the virtual function call requires the run-time type of the owning class to be identified, which might slow it down compared to a statically bound call.

    Sorry if it got a bit heated. Can you show more or, better, some actual code?



  • @VLSI_Akiko > "the virtual function call requires the run time type of the owning class to be identified"

    In the typical implementation, the object in question would have a v-table pointer as part of its object layout. That (and the table it points to) is what is looked up at run-time.

    ```cpp
    struct Sample {
        virtual void f() = 0;
        int i;
    };

    void foo(Sample& c)
    {
        c.f();
        /*
         mov     rax, QWORD PTR [rdi] // get the vtbl pointer in the object (into rax)
         call    [QWORD PTR [rax]]    // look up the vtbl and call the function
        */
    }
    ```
