Memory consumption issues.
-
Hi all,
I'm confused about the following situation.
First I do
    size_t count = 1000 * 1000 * 100;
    vector<double> tmp;
    getchar ();
    tmp.resize (count);
    for (size_t i = 0; i < count; ++i) {
        tmp[i] = i;
    }
    getchar ();

Measuring the used memory before and after the getchar() calls shows that 8 bytes are used per vector element. OK, so far so good, nothing special. But ...
    size_t count = 1000 * 1000 * 100;
    vector<double*> tmp;
    getchar ();
    tmp.resize (count);
    getchar ();
    for (size_t i = 0; i < count; ++i) {
        tmp[i] = new double(i);
    }
    getchar ();
    for (size_t i = 0; i < count; ++i) {
        delete tmp[i];
        tmp[i] = NULL;
    }
    getchar ();

Again I measured the memory consumption before and after the getchar() calls. At the second getchar() each vector element takes 8 bytes; no wonder for me, that's the pointer. After the next loop I calculated 32 bytes for each heap-allocated double. In total that is 40 bytes for every vector element. How can this be? And at the last getchar() call, after the delete loop, no memory was released to the system at all.
I'm using gcc 4.8.3 but I don't think that's relevant here. All memory measurements are done with top.
Hopefully somebody can explain the 40 bytes to me. In my mind, 16 bytes would be enough for every vector element.

Best regards
Michael
-
1. Each chunk of memory allocated on the heap requires some additional metadata.
2. Did you test this in release mode with NDEBUG defined? Otherwise it is likely that even more metadata is allocated for debugging purposes.
3. Common heap managers request memory from the operating system in pages and don't release those pages immediately after the memory is freed. Trust the heap manager to do the right thing.
-
In the case of Linux, you probably should trust the memory manager to do the wrong thing, because the default one used by glibc (ptmalloc, I think) is not very good.
It does not give back any pages to the OS for small allocation sizes.

I suggest using jemalloc (which is as simple as adding -ljemalloc to your linker command line). You'll find that not only does it return the pages to the OS, but it will also lower memory consumption to roughly 16 bytes per double for your example.
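For reference, a sketch of the two usual ways to switch allocators. The library path in the second variant is an assumption; it varies by distribution.

```shell
# Link jemalloc into the binary at build time
# (assumes the jemalloc development package is installed):
g++ -O2 main.cpp -o main -ljemalloc

# Or inject it into an existing binary without rebuilding
# (the exact .so path varies by distribution):
LD_PRELOAD=/usr/lib/libjemalloc.so ./main
```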
-
aqw wrote:
In the case of Linux, you probably should trust the memory manager to do the wrong thing, because the default one used by glibc (ptmalloc, I think) is not very good.
It does not give back any pages to the OS for small allocation sizes.

Explain why that is bad! You almost never want to spend time on returning a few pages to the OS. It is just an unnecessary waste of resources, and chances are the very next allocation will need those pages again.
Now, there are certainly a lot of good reasons to use an alternative memory manager from time to time. But there is no best memory manager for all situations* and the wish to eagerly return 40 bytes to the OS is not a good reason to switch to another memory manager.
*: Of course, they all claim to be the fastest, have the least overhead, and be the most lock-free implementation and so on. But that is because they only advertise benchmarks for the use cases they were made for.
-
It's not that glibc is delaying the deallocation of pages using some sort of garbage collector that only runs when needed - it does not free them at all. For example, if you temporarily need a 1 GB data structure like std::set or std::map during the initialization phase of your program, it will keep that 1 GB allocated forever - even if it only needs a few dozen MB during normal operation. That's not a particularly uncommon pattern.
It doesn't even have performance gains to show for it - jemalloc's performance is similar to glibc for single-threaded operation and it vastly outperforms glibc in multi-threaded programs.
Yes, there's no memory manager that's perfect for all situations, but ptmalloc seems pretty weak for a general-purpose one.
-
aqw wrote:
It's not that glibc is delaying the deallocation of pages using some sort of garbage collector that only runs when needed - it does not free them at all.
I know.
For example, if you temporarily need a 1 GB data structure like std::set or std::map during the initialization phase of your program, it will keep that 1 GB allocated forever - even if it only needs a few dozen MB during normal operation. That's not a particularly uncommon pattern.
That is not a terribly common pattern. Or we are not talking about the same kind of software. This forum mostly deals with rather casual problems, and the thread opener's problem seems to be rather casual, too. Your professional, business-critical, millions-of-lines-of-code software probably wants to explore the possibilities of alternative memory managers, but your small hobbyist tool is usually very well off with the default runtime environment. Yes, it would almost certainly be faster or slimmer or both with another memory manager, but the gains are so small that they are just not worth the effort.
-
I can't agree with SeppJ, because this problem is not casual at all.
I have different applications that use a big set of input data which is transformed into another big set of output data. However, I'm not alone on the server. So if all processes don't release their memory to the system, this will lead to problems some day.
The second part of my thread - the usage of 40 bytes instead of 16 bytes - is also a problem for me. The sample with 1E+8 values in a vector is realistic for my environment. And if the consumption of this very small object is blown up like this, what happens when the vector element is not a double but a more complex object with, e.g., some doubles in it?
Bests
Michael
-
M++ich wrote:
I have different applications that use a big set of input data which is transformed into another big set of output data. However, I'm not alone on the server. So if all processes don't release their memory to the system, this will lead to problems some day.
So all processes will run forever? Why is that?
The second part of my thread - the usage of 40 bytes instead of 16 bytes - is also a problem for me. The sample with 1E+8 values in a vector is realistic for my environment. And if the consumption of this very small object is blown up like this, what happens when the vector element is not a double but a more complex object with, e.g., some doubles in it?
The overhead per object will always be the same. So a 100-byte object will use something like 132 bytes.
But: why save pointers to objects in a container at all? Why don't you just allocate all objects at once, as is common practice? An array or vector of 100 objects of 100 bytes each will then use about 100*100+32 = 10032 bytes.
-
So all processes will run forever? Why is that?
No, not forever, but for a longer time (about 5 minutes). The problem is that the number of parallel processes increases.
The overhead per object will always be the same. So a 100-byte object will use something like 132 bytes.
Can you please explain what these extra bytes are for? Is this only relevant for the heap, and is it the same for all types? Does this mean an object on the heap that itself points to another heap object pays the 32-byte overhead twice?
But: why save pointers to objects in a container at all? Why don't you just allocate all objects at once, as is common practice? An array or vector of 100 objects of 100 bytes each will then use about 100*100+32 = 10032 bytes.
The sample was kept simple to show the problem. In my code it's a 2D array of vector elements, each of which holds an unpredictable number of objects.
-
M++ich wrote:
Can you please explain what these extra bytes are for?
Bookkeeping. The implementation of the memory manager needs to store information about how much memory has been malloc'ed where. Also in debug builds it is a common technique to mark the beginning and end of a malloc'ed block with special guard values.
Is this only relevant for the heap, and is it the same for all types?
Usually only on the heap. And types don't exist at this level; we are talking about raw memory.
Does this mean an object on the heap that itself points to another heap object pays the 32-byte overhead twice?
Every call to malloc incurs a small space overhead.
This is almost exactly the same explanation that LordJaxom already gave you at the very beginning of this thread.
-
Question solved - thank you all.