virtual function performance issue

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko Ok, so does that mean the v-table does not take a huge amount of storage? I just want to clarify if the slowness is because "the virtual function call requires the run time type of the owning class to be identified", when comparing it to a statically bound call ?

Yeah, the costly part is where the runtime has to figure out which implementation of the virtual function needs to be called. So, fun fact: the more work the function itself does, the smaller the relative impact of the virtual call. The virtual function call overhead is constant because of the offset/jump-table approach, which is comparable to accessing an array by index.

tampere2021

@VLSI_Akiko said in virtual function performance issue:

"the virtual function call requires the run time type of the owning class to be identified"

Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

Finnegan

It should also be mentioned that compilers can perform an optimization called "devirtualization". If the compiler can determine, at compile time, the actual dynamic type of the object the virtual member function is called on, it will omit the vtable lookup and you get performance similar to static inheritance. So it might be a good idea to make the relevant code visible to the compiler (header/inline/global optimizations like LTO, even C++20 Modules might help - not sure) and to avoid hiding it behind an interface with the dynamic type only visible in a different compilation unit.

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko said in virtual function performance issue:

"the virtual function call requires the run time type of the owning class to be identified"

Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

Ah yeah right, I didn't answer that one. No, the vtables are part of their class tree, so there is no need to find the proper vtable first. Looking up the proper function in the vtable is the only additional cost.

@Finnegan said in virtual function performance issue:

It should also be mentioned that compilers can perform an optimization called "devirtualization". If the compiler can determine, at compile time, the actual dynamic type of the object the virtual member function is called on, it will omit the vtable lookup and you get performance similar to static inheritance. So it might be a good idea to make the relevant code visible to the compiler (header/inline/global optimizations like LTO, even C++20 Modules might help - not sure) and to avoid hiding it behind an interface with the dynamic type only visible in a different compilation unit.

Yeah, correct. But this is quite an aggressive optimization that you may not get at the lower optimization levels, or if you use a lot of volatile constructs that confuse the optimizer. That can get quite funny, like dead code elimination starting to remove code that is actually used. That happened in an older version of GCC and was easy to reproduce.

wob

@tampere2021 said in virtual function performance issue:

Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

Judging from your other posts, I would disagree / not answer this question at all. Are you really really sure the "slowness" is because of virtual functions? Did you measure that? How? Measuring the overhead can be an art. Maybe the compiler has even removed the lookup. Maybe there's something else going on.

Since conclusions about what to do to make a program perform better (faster) are hard, I would probably look elsewhere first.

tampere2021

@VLSI_Akiko Hi VLSI, did you say that the slowness is because the virtual function call requires the run-time type of the owning class to be identified? If you agree with this, then I understand what has to be done:

1) Look into the object and get the v-table pointer.
2) Look up the v-table, get the address of the function and call the function at that address.

tampere2021

@tampere2021 To avoid confusion, I have summarized below the possible reasons why virtual functions are slower than a statically bound call. Which one do you agree with? Please clarify, as it's a huge code base and there are a lot of virtual functions involved here.

1) The virtual function call has to search for the correct class binding
2) The class must use memory to maintain a table of virtual function pointers
3) The virtual function call requires the run time type of the owning class to be identified
4) The virtual function call requires a table lookup at runtime before calling.

VLSI_Akiko

@wob said in virtual function performance issue:

@tampere2021 said in virtual function performance issue:

Do you agree that the reason for the slowness is that the virtual function call requires the run-time type of the owning class to be identified?

Judging from your other posts, I would disagree / not answer this question at all. Are you really really sure the "slowness" is because of virtual functions?

No, the question was whether virtual functions perform worse. Yes, they do. Are they the only factor? Nope, there may be more, but the code example is too incomplete to point out other issues. Using diamond-shaped inheritance may also make this much worse, though.

Did you measure that? How? Measuring the overhead can be an art.

Myself? Yeah, about 20 years ago. But today I trust the work of people who made a name for themselves doing such intense work. If you want to dig into this, including measuring how many cycles a specific opcode needs, Agner Fog is your man. He is the Fabrice Bellard of measuring computing hardware. (Hint: if you don't know Bellard, you have really missed out on something.)

EDIT: Did you measure that? How?

TSC on more modern x86
debug registers available since 80486 on x86 or most PowerPC or MIPS based architectures
PMU on ARM or bigger PowerPC architecture
Qemu + kvm with a connected debugger/perf tool
debug mode on m68k (easy on an Amiga with connected serial cable and running SuShi/Enforcer)
perf counters (oprofile) on modern hardware
looking up generated Assembler code (for example in Compiler Explorer)

Maybe the compiler has even removed the lookup. Maybe there's something else going on.

This happens mostly if you use only one instance of the class tree, but yeah. Like I said before, the code example is too small to point out more. Using proper exception specifications on function signatures (noexcept, noexcept(false)) may also improve the performance.

Th69

@tampere2021: Read Virtual method table and e.g. Understanding Virtual Tables in C++ to answer your questions (in short: there is no real search, just an indirection + offset).

Finnegan

@VLSI_Akiko said in virtual function performance issue:

So, using proper exception correctness on function signatures (noexcept, noexcept(false)) may also improve the performance.

IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. The possibility of exceptions being thrown otherwise often requires more complicated and less efficient code.

VLSI_Akiko

@Th69 said in virtual function performance issue:

@tampere2021: Read Virtual method table and e.g. Understanding Virtual Tables in C++ to answer your questions (in short: there is no real search, just an indirection + offset).

Yeah, but it is still an implementation detail and not every compiler does it that way. Especially the pre-C++98 compilers can be a bit weird (SAS/C is such an odd one).

@Finnegan said in virtual function performance issue:

IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. The possibility of exceptions being thrown otherwise often requires more complicated and less efficient code.

Oh damn it, no, I didn't mean to use it everywhere. You really need to check the C++ reference to see which of the C++ library functions actually throw.

tampere2021

@VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.

tampere2021

@tampere2021 READ SECTION 5.3 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.

I said this in a later post, see:
@VLSI_Akiko said in virtual function performance issue:

The virtual function call overhead is constant because of the offset/jump-table approach, which is comparable to accessing an array by index.

And I'm aware that I talked about hash mapping before here:
@VLSI_Akiko said in virtual function performance issue:

(Well, modern implementations use some kind of hash mapping.)

I wrote that in parentheses because I write in more languages than C++, and sometimes I'm not 100% sure which language's compiler uses which approach. Yeah, it is wrong for C++, which is why I wrote about the offset/jump table later.

@tampere2021 said in virtual function performance issue:

@tampere2021 READ SECTION 5.3 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf

Now I'm confused. Why did you end up asking these questions here if you are able to use a search engine of your choice to find this technical report and also willing to read through it?

tampere2021

@VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that the virtual function call requires the run-time type of the owning class to be identified, which may slow it down compared to a statically bound call.

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that the virtual function call requires the run-time type of the owning class to be identified, which may slow it down compared to a statically bound call.

Sorry if it got a bit heated. Can you show more, or better, some actual code?

tampere2021

@VLSI_Akiko > "the virtual function call requires the run time type of the owning class to be identified"

In the typical implementation, the object in question would have a v-table pointer as part of its object layout. That (and the table it points to) is what is looked up at run-time.

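```cpp
struct Sample {
    virtual void f() = 0;
    int i;
};

int foo(Sample& c)
{
    c.f();
    /*
     Typical x86-64 code for the call above (GCC, Intel syntax):
     mov     rax, QWORD PTR [rdi]   // get the vtbl pointer in the object (into rax)
     call    [QWORD PTR [rax]]      // look up the vtbl and call the function
    */
    return c.i;  // the original snippet had no return statement
}
```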

hustbaer

@tampere2021
It's hard to compare virtual vs. non-virtual function calls. Mainly because a non-virtual function call can often be inlined = the call disappears completely. This also often leads to more optimization opportunities, e.g. by constant propagation.
(In theory, virtual calls can also be inlined in some situations, but in practice you will not see this very often. Vs. inlining with non-virtual functions, which is very very common.)

Assuming that the non-virtual function call is really a call (no inlining), then a non-virtual function call - on most platforms - is just a direct call. Which is typically very fast.

A virtual call on the other hand requires at least one pointer to be fetched from the object + an indirect function call. ~~In a typical implementation, it's even more because there's typically at least one additional indirection.~~

So ~~2 loads~~ 1 load + one indirect call for virtual vs. just a direct call for non-virtual. The main problem there is that the second load depends on the first one. This can noticeably slow down the call. Especially if the actual function that is called changes.

As long as the called function is always the same, modern CPUs will often remember and predict the correct target address & speculatively execute that piece of code. At least when compiled without Spectre mitigation.

EDIT: Seems I misremembered and the 2nd load isn't necessary. Hm. Strange. But the generated code doesn't lie.
(Of course there is a second load, because the indirect call is loading the target address from memory, but that's not what I meant.)

tampere2021

@hustbaer Hi hustbaer, do you know why virtual functions are slower compared to a statically bound call? Which of the reasons below do you agree with? Based on the discussion here, I lean towards option 3.
1) The virtual function call has to search for the correct class binding
2) The class must use memory to maintain a table of virtual function pointers
3) The virtual function call requires the run time type of the owning class to be identified
4) The virtual function call requires a table lookup at runtime before calling.

hustbaer

@tampere2021
You're quite persistent in trying to get people to pick one of those options. Which means you want to know this for some kind of test/assignment/... Which means I won't answer. Also the choices are worded in a strange way, a bit ambiguous. Many of them could be interpreted in a way so that the answer is "yes".