virtual function performance issue

VLSI_Akiko

@wob said in virtual function performance issue:

@tampere2021 said in virtual function performance issue:

Do you agree the reason for slowness is due to the virtual function call requires the run time type of the owning class to be identified?

Judging from your other posts, I would disagree / not answer this question at all. Are you really really sure the "slowness" is because of virtual functions?

No, the question was whether virtual functions perform worse. Yes, they do. Are they the only factor? No, there may be more, but the code example is too incomplete to point out other issues. Using diamond-shaped inheritance may make this much worse, though.

Did you measure that? How? Measuring the overhead can be an art.

Myself? Yeah, about 20 years ago. But today I trust the work of people who have made a name for themselves doing this kind of intensive work. If you want to dig into this, including measuring how many cycles a specific opcode needs, Agner Fog is your man. He is the Fabrice Bellard of measuring computing hardware. (Hint: if you don't know Bellard, you have really missed out on something.)

EDIT: Did you measure that? How?

- TSC on more modern x86
- debug registers, available since the 80486 on x86 and on most PowerPC or MIPS based architectures
- PMU on ARM or bigger PowerPC architectures
- Qemu + kvm with a connected debugger/perf tool
- debug mode on m68k (easy on an Amiga with a connected serial cable and running SuShi/Enforcer)
- perf counters (oprofile) on modern hardware
- looking at the generated assembly code (for example in Compiler Explorer)
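As a portable starting point before reaching for any of the tools above, a rough timing loop with `std::chrono::steady_clock` can be sketched as follows. This is only a sketch: `Base`, `Derived`, `direct_step` and the helper functions are hypothetical names, not code from the thread, and an optimizer may devirtualize or inline the virtual call here, so the generated assembly still needs to be checked.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical types for illustration only; not from the thread.
struct Base {
    virtual ~Base() = default;
    virtual std::uint64_t step(std::uint64_t x) const = 0;
};

struct Derived final : Base {
    std::uint64_t step(std::uint64_t x) const override { return x * 3 + 1; }
};

// Same arithmetic as Derived::step, but statically bound.
inline std::uint64_t direct_step(std::uint64_t x) { return x * 3 + 1; }

struct Timed {
    std::uint64_t result;              // checksum, keeps the loop from being dropped
    std::chrono::nanoseconds elapsed;  // wall-clock time of the loop
};

// Times `iterations` calls of `f`, feeding each result into the next call.
template <typename F>
Timed time_loop(std::uint64_t iterations, F f) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t x = 1;
    for (std::uint64_t i = 0; i < iterations; ++i) x = f(x);
    const auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

Timed run_virtual(std::uint64_t iterations) {
    std::unique_ptr<Base> p = std::make_unique<Derived>();
    return time_loop(iterations, [&](std::uint64_t x) { return p->step(x); });
}

Timed run_direct(std::uint64_t iterations) {
    return time_loop(iterations, [](std::uint64_t x) { return direct_step(x); });
}

// Both variants must compute the same value before their timings are compared.
void self_check() {
    assert(run_virtual(1000).result == run_direct(1000).result);
}
```

Comparing `run_virtual(n).elapsed` with `run_direct(n).elapsed` for a large `n` gives a first impression only; the numbers depend heavily on optimization level and on whether the direct call was inlined.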

Maybe the compiler has even removed the lookup. Maybe there's something else going on.

This happens mostly if you use only one instance of the class-tree, but yeah. Like I said before, the code example is too small to point out more. So, using proper exception correctness on function signatures (noexcept, noexcept(false)) may also improve the performance.

Th69

@tampere2021: Read Virtual method table and e.g. Understandig Virtual Tables in C++ to answer your questions (in short: there is no real search, just an indirection + offset).

Finnegan

@VLSI_Akiko said in virtual function performance issue:

So, using proper exception correctness on function signatures (noexcept, noexcept(false)) may also improve the performance.

IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. The possibility of exceptions being thrown otherwise often requires more complicated and less efficient code.
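The effect on `std::vector` can be made visible with a small counting experiment. This is a sketch with hypothetical types (`NoexceptMove`, `ThrowingMove`, `Counts`); it relies on the vector falling back to copying when the move constructor is not `noexcept`, which is what the mainstream standard libraries do to keep their exception guarantee.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical counters and types for illustration; not from the thread.
struct Counts { int copies = 0; int moves = 0; };

struct NoexceptMove {
    static Counts counts;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++counts.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++counts.moves; }
};
Counts NoexceptMove::counts;

struct ThrowingMove {
    static Counts counts;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++counts.copies; }
    ThrowingMove(ThrowingMove&&) { ++counts.moves; }  // deliberately not noexcept
};
Counts ThrowingMove::counts;

static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value, "");
static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value, "");

// Fills a vector, resets the counters, then forces exactly one reallocation.
template <typename T>
Counts counts_during_reallocation() {
    std::vector<T> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) v.emplace_back();  // constructed in place
    T::counts = Counts{};
    v.reserve(v.capacity() + 1);  // the four existing elements must be relocated
    return T::counts;
}

void self_check() {
    const Counts a = counts_during_reallocation<NoexceptMove>();
    assert(a.moves == 4 && a.copies == 0);   // relocated by moving
    const Counts b = counts_during_reallocation<ThrowingMove>();
    assert(b.copies == 4 && b.moves == 0);   // relocated by copying
}
```

The only difference between the two types is the `noexcept` on the move constructor, yet the second one pays for four copies on every reallocation.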

VLSI_Akiko

@Th69 said in virtual function performance issue:

@tampere2021: Read Virtual method table and e.g. Understandig Virtual Tables in C++ to answer your questions (in short: there is no real search, just an indirection + offset).

Yeah, but it is still an implementation detail and not every compiler does it that way. Especially the pre-C++98 compilers can be a bit weird (SAS/C is such an odd one).

@Finnegan said in virtual function performance issue:

IMHO using noexcept specifications everywhere can quickly turn into a rabbit hole and it is often hardly worth the effort. However, I do recommend using it for constructors, as container classes can often employ more efficient code when those are noexcept. Examples are noexcept move constructors for types in a std::vector, in case the objects need to be moved during a resize/erase/insert, or noexcept copy/default constructors for when objects need to be constructed in reserved memory. The possibility of exceptions being thrown otherwise often requires more complicated and less efficient code.

Oh dammit, no, I didn't mean to use it everywhere. You really need to check the C++ reference to see which of the standard library functions can actually throw.

tampere2021

@VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.

tampere2021

@tampere2021 READ SECTION 5.3 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko Calling a virtual function is roughly equivalent to calling a function through a pointer stored in an array. When a function is made virtual, C++ determines at run time which function is to be invoked, based on the type of the object pointed to by the base class pointer.

I said this in a later post, see:
@VLSI_Akiko said in virtual function performance issue:

The virtual function call overhead is static, because of the offset-jumptable approach, which is comparable to accessing an array by index.

And I'm aware that I talked about hash mapping before here:
@VLSI_Akiko said in virtual function performance issue:

(Well, modern implementations use some kind of hash mapping.)

I wrote that in parentheses because I write in more languages than C++ and sometimes I'm not 100% sure which language's compiler uses which approach. Yeah, it is wrong for C++, which is why I wrote about the offset-jumptable later.

@tampere2021 said in virtual function performance issue:

@tampere2021 READ SECTION 5.3 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf

Now I'm confused. Why did you end up asking these questions here if you are able to use a search engine of your choice to find this technical report and also willing to read through it?

tampere2021

@VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that a virtual function call requires the run-time type of the owning class to be identified, which can make it slower than a statically bound call.

VLSI_Akiko

@tampere2021 said in virtual function performance issue:

@VLSI_Akiko I am just trying to clarify the exact root cause of the slowness of virtual functions. Based on the discussion so far, I conclude that a virtual function call requires the run-time type of the owning class to be identified, which can make it slower than a statically bound call.

Sorry if it got a bit heated. Can you show more or, better, some actual code?

tampere2021

@VLSI_Akiko > "the virtual function call requires the run time type of the owning class to be identified"

In the typical implementation, the object in question would have a v-table pointer as part of its object layout. That (and the table it points to) is what is looked up at run-time.

```cpp
struct Sample {
    virtual void f() = 0;
    int i;
};

int foo(Sample& c)
{
    c.f();
    /*
     mov     rax, QWORD PTR [rdi] // get the vtbl pointer in the object (into rax)
     call    [QWORD PTR [rax]]    // look up the vtbl and call the function
    */
    return c.i; // return a value so that the function is well-formed
}
```

hustbaer

@tampere2021
It's hard to compare virtual vs. non-virtual function calls. Mainly because a non-virtual function call can often be inlined = the call disappears completely. This also often leads to more optimization opportunities, e.g. by constant propagation.
(In theory, virtual calls can also be inlined in some situations, but in practice you will not see this very often. Vs. inlining with non-virtual functions, which is very very common.)

Assuming that the non-virtual function call is really a call (no inlining), then a non-virtual function call - on most platforms - is just a direct call. Which is typically very fast.

A virtual call on the other hand requires at least one pointer to be fetched from the object + an indirect function call. ~~In a typical implementation, it's even more because there's typically at least one additional indirection.~~

So ~~2 loads~~ 1 load + one indirect call for virtual vs. just a direct call for non-virtual. The main problem there is that the second load depends on the first one. This can noticeably slow down the call. Especially if the actual function that is called changes.

As long as the called function is always the same, modern CPUs will often remember and predict the correct target address and speculatively execute that piece of code. At least when compiled without Spectre mitigation.

EDIT: Seems I mis-remembered and the 2nd load isn't necessary. Hm. Strange. But the generated code doesn't lie
(Of course there is a second load, because the indirect call is loading the target address from memory, but that's not what I meant.)

tampere2021

@hustbaer Hi hustbaer, do you know why virtual functions are slower than statically bound calls? Which of the reasons below do you agree with? Based on the discussion here, I lean towards option 3.
1) The virtual function call has to search for the correct class binding
2) The class must use memory to maintain a table of virtual function pointers
3) The virtual function call requires the run-time type of the owning class to be identified
4) The virtual function call requires a table lookup at run time before calling

hustbaer

@tampere2021
You're quite persistent in trying to get people to pick one of those options. Which means you want to know this for some kind of test/assignment/... Which means I won't answer. Also the choices are worded in a strange way, a bit ambiguous. Many of them could be interpreted in a way so that the answer is "yes".

tampere2021

@hustbaer I have summarized these points based on the discussion so far. It's not a test or an assignment. It is a huge code base and I am trying to narrow down the root cause to something specific, hence the need to clarify this.

hustbaer

@VLSI_Akiko said in virtual function performance issue:

Yeah, the costly part is the one where the runtime has to figure out which instance of the virtual function needs to be called.

I'm pretty sure you know what you're talking about, but you describe it in a strange way. I would never say the runtime needs to "figure that out". There is no "runtime" involved, just a few machine code instructions. Also there is nothing that I would call "figuring out". One load and one indirect jump and Bob's your uncle. Of course that's still enough to make things slow, especially when the call target changes frequently.

tampere2021

@hustbaer it takes longer to invoke a virtual method and that extra memory is required to store the information needed for the lookup. Virtual function calls must be resolved at run time by performing a vtable lookup, whereas non-virtual function calls can be resolved at compile time. This can make virtual function calls slower than non-virtual calls. In reality, this overhead may be negligible, particularly if our function does non-trivial work or if it is not called frequently.

hustbaer

@tampere2021 said in virtual function performance issue:

@hustbaer it takes longer to invoke a virtual method and that extra memory is required to store the information needed for the lookup.

Well, yes, kind-of. Yes, it takes memory. But usually not much. Each object typically only grows by the size of one pointer. (More if the class uses multiple inheritance.)
And then, one vtable is needed per class. If you have lots of classes with lots of virtual functions, this can also add up to a lot. But it's usually not a big problem.
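The per-object cost can be checked with `sizeof`. A small sketch with hypothetical types; the exact numbers are implementation-specific, and the checks below only describe what the common ABIs (one vtable pointer per object for single inheritance) are expected to produce.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration; not from the thread.
struct Plain {
    void f() {}
    int i = 0;
};

struct WithVirtual {
    virtual void f() {}
    virtual ~WithVirtual() = default;
    int i = 0;
};

struct WithTwoVirtuals {
    virtual void f() {}
    virtual void g() {}
    virtual ~WithTwoVirtuals() = default;
    int i = 0;
};

constexpr std::size_t plain_size = sizeof(Plain);
constexpr std::size_t virtual_size = sizeof(WithVirtual);
constexpr std::size_t two_virtuals_size = sizeof(WithTwoVirtuals);

void self_check() {
    // The object grows by at least one pointer (plus possible padding).
    assert(virtual_size >= plain_size + sizeof(void*));
    // More virtual functions enlarge the per-class table, not each object.
    assert(two_virtuals_size == virtual_size);
}
```

On a typical 64-bit target this means a few extra bytes per object, while the table itself exists once per class.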

Virtual function calls must be resolved at run time by performing a vtable lookup, whereas non-virtual function calls can be resolved at compile time.

Yes

This can make virtual function calls slower than non-virtual calls. In reality, this overhead may be negligible, particularly if our function does non-trivial work or if it is not called frequently.

Yes. I'd even go as far as to say: it's very often negligible.

But again: this comparison only makes sense when comparing to non-inlined direct calls. Whenever you see a noticeable slowdown from making a function virtual, chances are that the reason isn't actually the overhead of the virtual function call, but the fact that it can no longer be inlined.

Finnegan

Just wanted to throw in a broader argument: If the problem you're trying to solve really requires a solution akin to dynamic polymorphism, C++ virtual functions are probably one of the most efficient ways to go about it. In that case, you will have to pay the additional cost either way, likely in any language - e.g. in C you would probably solve that problem with a table of function pointers, which is roughly equivalent.
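To illustrate the "table of function pointers" approach, here is a minimal hand-written sketch in C-style C++. All names (`Shape`, `ShapeVTable`, `Square`, `Rect`, `area`) are hypothetical; it is not how any particular compiler lays out its vtables, only the same idea done by hand.

```cpp
#include <cassert>

// Hand-written "vtable": one table of function pointers per concrete type.
struct Shape;
struct ShapeVTable {
    double (*area)(const Shape*);
};

// "Base class": every object starts with a pointer to its type's table.
struct Shape {
    const ShapeVTable* vtable;
};

struct Square { Shape base; double side; };
struct Rect   { Shape base; double width; double height; };

// Implementations cast back to the concrete type; this is valid because
// `base` is the first member of a standard-layout struct.
double square_area(const Shape* s) {
    const Square* sq = reinterpret_cast<const Square*>(s);
    return sq->side * sq->side;
}

double rect_area(const Shape* s) {
    const Rect* r = reinterpret_cast<const Rect*>(s);
    return r->width * r->height;
}

const ShapeVTable square_vtable{&square_area};
const ShapeVTable rect_vtable{&rect_area};

Square make_square(double side) { return Square{Shape{&square_vtable}, side}; }
Rect make_rect(double w, double h) { return Rect{Shape{&rect_vtable}, w, h}; }

// The "virtual call": load the table pointer, load the slot, call indirectly.
double area(const Shape* s) { return s->vtable->area(s); }

void self_check() {
    const Square sq = make_square(3.0);
    const Rect r = make_rect(2.0, 5.0);
    assert(area(&sq.base) == 9.0);
    assert(area(&r.base) == 10.0);
}
```

The call in `area` costs the same pointer load plus indirect call that was described earlier in the thread for compiler-generated virtual calls.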

If you can somehow avoid that overhead - either via the compiler's devirtualization optimization or by rephrasing your program somehow using static types, static polymorphism like CRTP - chances are that you didn't really need dynamic polymorphism in the first place.
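For reference, static polymorphism via CRTP can look roughly like this. It is a sketch with hypothetical names (`Shape`, `Square`, `Rect`, `doubled_area`), and it only works when the concrete type is known at compile time at each call site.

```cpp
#include <cassert>

// CRTP base: the derived type is a template parameter, so the call below
// is bound at compile time and no vtable is involved.
template <typename Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived&>(*this).area_impl();
    }
};

struct Square : Shape<Square> {
    explicit Square(double s) : side(s) {}
    double area_impl() const { return side * side; }
    double side;
};

struct Rect : Shape<Rect> {
    Rect(double w, double h) : width(w), height(h) {}
    double area_impl() const { return width * height; }
    double width;
    double height;
};

// Generic code is written as a template instead of taking a base reference.
template <typename Derived>
double doubled_area(const Shape<Derived>& s) {
    return 2.0 * s.area();
}

void self_check() {
    assert(doubled_area(Square{3.0}) == 18.0);
    assert(doubled_area(Rect{2.0, 5.0}) == 20.0);
}
```

The trade-off is that `Square` and `Rect` no longer share a common run-time base type, so they cannot be stored together in one container of base pointers.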

So if you are worried about the performance impact of virtual function calls, try to establish whether they are really necessary or whether an alternative would also work. If not, I'd assume you cannot avoid that overhead anyway. My suggestion would be: don't use what you don't need, and in return C++ won't make you pay for what you don't use (mostly).

VLSI_Akiko

@hustbaer said in virtual function performance issue:

@VLSI_Akiko said in virtual function performance issue:

Yeah, the costly part is the one where the runtime has to figure out which instance of the virtual function needs to be called.

I'm pretty sure you know what you're talking about, but you describe it in a strange way. I would never say the runtime needs to "figure that out". There is no "runtime" involved, just a few machine code instructions. Also there is nothing that I would call "figuring out". One load and one indirect jump and Bob's your uncle. Of course that's still enough to make things slow, especially when the call target changes frequently.

Yeah sorry, I'm used to talking to people who have never programmed in their life or have only written a few lines of BASIC. From time to time I also teach C++20 to people who must learn it (to keep their jobs) but really don't want to learn it. You would be surprised how hard it is to explain pointers to them, or even better, what a pointer to an array of pointers is (you know, the char ** in main()). That can keep them busy for weeks. For some strange reason people understand it better if you put an entity behind stuff.

Back to the topic: Is the code compiled in debug mode or does it even run in some kind of analyzer (performance, debug, etc.)? What OS and compiler is used?

tampere2021

@VLSI_Akiko I used VS2019 (v142 compiler) on a 21H1 OS with a Release build.