Memory Managment

tty

Beschreibung des MM von Badestrand:
http://www.c-plusplus.net/forum/viewtopic-var-t-is-261588-and-postdays-is-0-and-postorder-is-asc-and-start-is-4.html

---------------------------------------

Hi Sub-Forum,

ich hab mir nen paar gedanken über ein Memory Managment Konzept gemacht das die Seiten für die einzelnen Bereiche (Kernel, Applikation, Treiber) sauber trennt und "erweiterbar" ist.

Big-System

Das OS braucht einen Speicher-Verwaltungs Algorithmus der einzelne Seiten (4kb) Allokiert (eine oder mehrere)(die seiten können im Kernel-Land oder im User-Land oder für die Treiber benutzt werden und wir nennen sie hier "batch"), die Access-flags (read-only oder read/write| supervisor level oder user level) werden entsprechend der Anfrage für die Seiten gesetzt.
Dabei speichern wir "in band" (die Kontroll Datenstrukturen werden in der Datenstruktur drinnen gespeichert) die Verwaltungs-Daten (s_manage) in einer Seite (4k) vor dem Bereich.

Die Struktur der Verwaltungsdaten:

enum batch_level {
	USERLAND,
	KERNELLAND,
	DRIVERLAND
};

struct s_manage {
	uint count; // how many pages follow after this until the next s_mange 4kb page?

	// maybe we can combine this two flags in one byte
	bool last;  // is this the last allocated "batch" of pages
	bool free;  // is this "batch" that follows free

	// if it is not free we save here the data about the batch
	batch_level level; // this is the level for who it was allocated
	// TODO< maybe we save here for who exactly we had allocated it ? (except for kernel) >
};

Wenn der Kernel zum Beispiel 10 4-kb Seiten braucht sieht die Struktur nach dem reservieren wie folgt aus (die einzelnen Mappings für die Seiten):

Page 0                Page 1 - 10       Page 11
s_manage            |                 | s_manage
count = 10          | Data for Kernel | last = true
last = false        | (batch)         |
free = false        |                 |
level = KERNELLAND  |                 |

(Die Flags für die Seiten des Batches werden entsprechend der Anfrage gesetzt)

Small-System

Das Small System Teilt den Speicherbereich der vom Big-System gekommen ist nochmal ein weil der Kernel/Die Userland Programme meistens keine 4kb-Brocken brauchen.

Die Kontroll-Strukturen werden "out of Band" (außerhalb des Speicherbereichs wo die daten für das Programm/den Treiber später gespeicher werden) gespeichert.
Das hat den Vorteil das man es nicht so einfach überschreiben kann (wenn die Seite(n) wo die Kontroll-Daten liegen als Supervisor gekennzeichnet sind kann man als Treiber(Ring1/2?) und als User(ring3) nicht drauf zugreifen).

Wir nennen die Nutz-Strukturen hier "Junks".

Nutz-Strukturen
---------------

Das Prinzip ist das wir eine 4kb Seite n mal zerstückeln und so die Junks erhalten (wobei n = 1, 2, 3 ... 4096/8)(4096/8 weil ein 8 Byte-"Alignment" sehr gut ist).

n = 1
|                    4 kB Seite                                                 |

n = 2
|              2kb Junk                 |                 2 kb Junk             |

n = 3
4096/3 = 1365,3...(periode)

man sieht schon das der erste Junk 8 Byte aligned ist aber der zweite Junk ist es nicht, also was tun wir?

Wir wissen das der Junk kleiner sein soll als 1265 und durch 8 ohne rest teilbar (wegen dem Aligment), das heißt...

1365 % 8 = 5
so ziehen wir 5 bytes von der größe ab und wir liegen richtig
1365-5 = 1360

Das bedeutet das jeder Junk 1360 Bytes groß ist und die restlichen (4096 - 1360*3 = 16) Bytes werden am ende "verschmissen".
|       1360            |    1360               | 1368                   | ... |

usw.

Das Prinzip kann man natürlich auch nicht nur mit Seiten sondern auch mit größeren (8kb, 12kb, ...) Speicherbereichen machen.

Kontrollstrukturen
------------------

In den Kontrollstrukturen müssen wir nur speichern wie groß ein Junk ist und welche belegt/frei sind.

Irgendwelche Kritik oder Vorschläge, andere Techniken, pro/contra währ produktiv...

Badestrand

Hi tty,

so ganz habe ich dein Konzept noch nicht verstanden. Wo genau willst du die Daten des Big Systems speichern? Und das Small-System, ist das dann quasi sowas wie ein Heap?

tty

Hallo Badestrand,

Der Speicher der das Small-System bereitstellt kann man natürlich auch als "Heap" benutzen, aber es kann auch das konzept des pro-Programm-Heap ganz ersetzen (was auch Vorteile hat).
Die Speicherbereiche die das small-System abgibt haben eine n*4bk-Seiten Größe und sie können theoretisch belibig im Arbeitsspeicher gemappt werden (und weil es Seiten sind kann man auch die Zugriffsrechte setzen).

Die Daten des Big-System werden in den einzelnen Seiten (4kb) vor den "Nutzdaten"(fürn Kernel oder für die Anwendungen) Gespeichert.
Wobei man unter Daten folgendes versteht:
- was folgt nach der Seite in der die Daten über die folgenden Seiten Gespeichert sind?
- wie viele seiten folgen?

und das Beispiel was ich versucht habe zu illustrieren war das:

Page 0                Page 1 - 10       Page 11
s_manage            |                 | s_manage
count = 10          | Data for Kernel | last = true
last = false        | (batch)         |
free = false        |                 |
level = KERNELLAND  |                 |

Wobei das aus der Sicht des Virtuellen Speichers ist (der physikalische Speicher kann über das Mapping oberhalb von 16 MB oder sonstwo gemappt werden, ist eigentlich egal wo das genau hinkommt.

Hoffentlich habe ich das verständlich erklärt.
Ich kann auch später bissel Code "nachschieben" (natürlich gut dokumentiert).

Erhard Henkes

Beschreibung des MM von Badestrand:
http://www.c-plusplus.net/forum/viewtopic-var-t-is-261588-and-postdays-is-0-and-postorder-is-asc-and-start-is-4.html

Hi Erhard, magst du das mal in's Forum schreiben, in den Memory-Management Thread von tty? Ich kann mich z.Zt. nicht einloggen, ab Sonntag aber wohl wieder...

von "Badestrand" per mail:

Hab da mal ein Dokument zum Memory Management verfasst*. Ich hoffe es genügt so, ansonsten bitte konkrete Kritik abgeben (und keine Scheu, ich kann damit umgehen). Falls im Source Code noch irgendwo was unklar ist, sponsore ich auch da gerne noch einige Kommentare. Den Bereich von MMIO hatte ich auf 0xF....... beschränkt, weil Erhard ID-Mapping brauchte und ich nicht zulassen wollte, dass plötzlich irgendwo mittendrin in den Heap oder gar in unseren Code irgendwelche Hardware-Sachen reingeschrieben werden. However, Erhard und ich hatten zum Glück am Anfang aneinander vorbeigeredet; da hatte ich was gecodet, was den angeforderten physischen Speicher einfach an eine freie virtuelle Adresse mappt und alles konfliktfrei abläuft. Ich baue es die Tage mal wieder ein, vielleicht funktioniert's ja. IRC und wieder ordentliches Internet geht bei mir hoffentlich wieder ab Sonntag (ich will das lieber nicht erklären, damit könnte ich mittlerweile ein ganzes Buch füllen..).

Auch generell zu Kritik am Memory System bin ich offen. Aber bitte auch hier konkrete Anmerkungen.

* http://www.file-upload.net/download-2297022/PrettyOS-Memory-Layout.pdf.html

@tty: Ich werde mir dein Konzept am Wochenende mal zu Gemüte führen. Mal sehen, wenn sich das als einfacher (zum Verständnis und in der Implementierung) erweist, könnten wir das Memory System auch neu schreiben.

Badestrand

PrettyOS Memory Layout

General Memory Layout

0 MB - 16 MB         For DMA only. Kernel code resides somewhere inbetween, I think
16 MB - 20 MB        For placement allocation while the heap is not set up. Here resides
                       - "Physical" bit table                        (128 KB)
                       - Kernel page directory                       (circa 8 KB)
                       - Kernel's page tables for 0 MB - 20 MB       (20 KB)
                       - Kernel's page tables for 3 GB - 4 GB        (1024 KB)
                       - Heap's "region" list (nongrowable)          (Remaining)
20 MB - 3 GB         Intended for user programs' code, stack, heap
3 GB - 0xFFF00000    For the kernel heap only; malloc'd memory resides here
0xFFF00000 - 4 GB    Memory mapped stuff. E.g. for the PCI area.

About the code - Preface

The whole memory management code is contained in four files: paging.h, paging.c, kheap.h, kheap.c.
The interface (stuff in the header files) is designed to be as small as possible. The code itself is aimed to be most easy to understand, so the concepts and implementations are held rather simple than performant (also some optimizations were applied).
The implementation supports memory of up to 4 GB, requiring a page size of 4 KB.

About the physical memory management

As in most hobby OS, the physical memory is managed via a bit table, each bit reflects the state of a physical 4 KB memory frame. The state can be "free" or "reserved", resulting in a 0-bit for "free" and a 1-bit for "reserved".
To avoid calculations the bit table is 128 KB of size, thus covering 4 GB of memory.

No functions or data are exported, all functionality is used directly by the paging module. The code resides in the paging.c file, all functions and variables marked as "static".

Internal data structures:
The BIOS provides a "memory map", each entry specifying a memory block's address, size and whether it is usable for us.

typedef struct
{
    uint64_t base;   // The region's address
    uint64_t size;   // The region's size
    uint32_t type;   // Is "1" for "free"
    uint32_t ext;    // Unimportant for us
} __attribute__((packed)) mem_map_entry_t;

The internal variables:
Pointer to the bit table. We access 32 bits at a time since we cannot directly access each bit on it's own.

uint32_t* bittable;

When we (linearly) search for a free frame, it is a waste to always start searching at bit 0. So an optimization is applied in that we remember the first possible integer which might contain a free bit.

uint32_t first_free_dword;

The internal functions:
Scans through the memory map to find out whether the specified physical range (from beg inclusively to end exclusively) is free for use. Independent from the bit table, does not use the internal variables. Returns true if the area is free for use.

bool memorymap_availability( const mem_map_entry_t* entries, uint64_t beg, uint64_t end )

Sets the appropiate bits in the bittable for the physical address range given by addr_begin and addr_end .

void phys_set_bits( uint32_t addr_begin, uint32_t addr_end, bool reserved )

Does the initialization. At first, the memory map's entries get constrained to 4 GB in case they exceed that mark. Subsequently we check whether the region 16 MB - 20 MB is marked as free in the memory map as we want to store our private data here.
Afterwards we allocate the bit table, set it's bits according to the memory map, reserve 0 MB - 20 MB, initialize first_free_dword and determine and return the amount of available memory.

uint32_t phys_init()

Allocates (flips the appropiate bit from "free" to "reserved") a random 4 KB physical frame and returns the physical address. Adjusts the first_free_dword variable if necessary.

uint32_t phys_alloc()

De-allocates the given 4 KB physical frame by setting it's bit to "reserved" and besides eventually adjusts first_free_dword.

void phys_free( uint32_t addr )

About the paging facility

Uses conventional data structures and concepts. There are page directories and page tables. However, there may be one unconventional detail: All page tables for memory that belongs to the kernel are pre-allocated so there won't be a need to adjust all user page directories in case of (e.g. heap-) changes.

Several functions are exported but neither the datastructures nor the kernel's page directory address are. The code resides in the paging.c file.

Internal data structures:
Holds the page tables' physical and virtual pointers as well as it's own physical address.

struct page_directory_
{
    uint32_t      codes[1024];
    page_table_t* tables[1024];
    uint32_t      pd_phys_addr;
} __attribute__((packed));

Holds the page mappings: Their physical addresses and flags.

typedef struct
{
    uint32_t pages[1024];
} page_table_t;

The internal variables:
A pointer to the kernel's page directory.

page_directory_t* kernel_pd;

The internal functions:
Returns the mapped physical address for the given page directory and virtual address.

uint32_t paging_get_phys_addr( page_directory_t* pd, void* virt_addr )

The exported functions:
Setups the physical and paging management. Invokes phys_init , allocates and fills the kernel's page directory. Does identity mapping for 0 MB - 20 MB, sets the video area user-accessible (although I think we don't need that any more), initializes the page table for the kernel heap. The heap's page table entries remain unmapped here, i.e. they don't specify a physical address. Finally the actual paging is enabled in the CPU.

uint32_t paging_install()

Allocates the given amount (size) of memory and maps it consecutively to the given virtual address. E.g. for enhancing the user's heap, call paging_alloc( user_pd, heap_end, _1MB, MEM_USER | MEM_WRITE ).

bool paging_alloc( page_directory_t* pd, void* virt_addr, uint32_t size, uint32_t flags )

Frees the memory that was allocated by paging_alloc . As the paging module does not store the size for manually allocated blocks, you have to provide it here.

void paging_free( page_directory_t* pd, void* virt_addr, uint32_t size )

Creates a page directory, copies the kernel's entries into it and returns it's address.

page_directory_t* paging_create_user_pd()

Delete a page directory that you created by a call to paging_create_user_pd . All valid mappings of memory registered in that page directory that do not belong to the kernel get freed. Finally the page directory itself is freed.

void paging_destroy_user_pd( page_directory_t* pd )

Switches to the given page directory.

void paging_switch( page_directory_t* pd )

About the heap

The heap provides the malloc/free-functionality, i.e. dynamic allocation of memory. It manages a certain amount of continuous virtual memory, starting at "heap_start". Whenever more memory is requested than there is available, the heap expands.
For expansion, the heap asks the paging module to map physical memory to the following virtual addresses and increases it's "heap_size" variable afterwards.

To manage the free and reserved (allocated) areas of the heap, an array of "region" elements is held. Each region specifies it's size and whether it is reserved/allocated. Free regions always get merged. Regions don't store their addresses, the third region's address is calculated by adding the first and second region's size to "heap_start":
region_3_addr = heap_start + regions[0].size + regions[1].size.

Before the heap is set up memory is allocated on a "placement address". This is an identity mapped area of continuous memory, the allocation just moves a pointer forward by the requested size and returns it's previous value.

The heap's management data is placed at this placement address, too. Since this area cannot grow, the heap has a maximum amount of region-objects ("region_max_count").

Internal data structures:
Holds the info about a single region.

typedef struct
{
    uint32_t size;
    bool     reserved;
} region_t;

The internal variables:
Pointer to the array of region objects. This array is not resizeable as it resides in the placement allocation area.

region_t* regions = NULL;

Number of currently used region objects and the maximum count of region objects that fit into the regions-array.

uint32_t  region_count = 0;
uint32_t  region_max_count = 0;

Current size that the heap takes up. Grows dynamically when needed.

uint32_t  heap_size = 0;

The internal functions:
Receives a pointer to the heap's end and appends size bytes of memory to it. The region-array is adjusted, too.

bool heap_grow( uint32_t size, char* heap_end )

The exported functions:
Setups the internal variables.

void heap_install()

Allocates the requested amount of memory, assuring the given alignment. Eventually lets the heap grow.

void* malloc( uint32_t size, uint32_t alignment )

Frees the previously allocated memory.

void free( void* addr )

Miscellaneous

Maps some virtual address to the given physical address and returns that virtual address. Use this for memory mapped IO.

void* paging_acquire_pcimem( uint32_t phys_addr )

Erhard Henkes

wir haben im makefile -O. Das bedeutet ja minimale Optimierung. Wenn man es entfernt oder -O0 (o null) schreibt, kracht es in malloc.

kernel/kheap.c: In function 'malloc':
kernel/kheap.c:104: error: initializer element is not constant

Badestrand

Erhard Henkes schrieb:

kernel/kheap.c:104: error: initializer element is not constant

Witzig, weil ist 'ne Konstante. Dann wird es halt zum define.
Crasht dann aber mit -O0 beim Übergang von init nach main. Ist echt anstrengend zu debuggen.

Badestrand

Steckt man die paar Zeilen der init-Funktion direkt an den Anfang der main-Funktion, klappt alles. Verrücktes Verhalten. Ich würde sagen, wir lassen es erstmal bei -O.

taljeth

Der Code ist sehr wahrscheinlich auch mit -O kaputt, nur subtiler. Solche Bugs werden vom Liegenlassen jedenfalls meistens nicht einfacher.

Badestrand

Da hast du wohl recht.

abc.w

Oder es fehlt einfach irgendwo ein "volatile"...

taljeth

Bei Treibern mag das ja regelmäßig vorkommen, aber eine Speicherverwaltung, die volatile braucht, wäre mir suspekt.

Badestrand

An der Speicherverwaltung kann es gar nicht liegen, die wird erst später initialisiert, nach dem Crash. Der Code, der bis zum Absturz durchlaufen wird, ist auch relativ überschaubar. Direkt nach dem Eintritt in die main-Funktion wird die "init"-Funktion aufgerufen, die folgende Sachen ausführt:

clear_screen();
settextcolor(14,0);
printformat("PrettyOS [Version 0.0.0.221]\n");
gdt_install();
idt_install();
timer_install();
keyboard_install();
syscall_install();
settextcolor(15,0);

Auf den ersten Blick in diese Funktionen sieht das alles recht ungefährlich aus. Die Sachen laufen auch gut durch, beim return geht irgendwas schief, scheint also der Stack kaputt zu sein. Bloß woher Ich bin schon dabei, zu debuggen und zu disassemblern, ist aber eine eklige Arbeit..

Ist eher alles ein Problem mit den Kompilerflags. Folgendes Beispiel:

// ckernel.c Revision 219

int main()
{
for(;;); // <- Endlosschleife :)

    init();
(...)
}

Ohne Flag "-O" stürzt PrettyOS ab, woraus vorerst nur folgen soll, daß die Endlosschleife nicht ausgeführt wurde (also "verschwunden" ist).

Mit Flag "-O" bleibt PrettyOS in der Endlosschleife hängen, was heißt, daß sie nicht "verschwunden" ist.

Daraus folgt, daß der GCC auch ohne Angabe von Optionen zur Optimierung den Quellcode wie auch immer verändert. Insofern wirst Du wahrscheinlich im Quellcode auch keine Fehler finden.

Badestrand

Da läuft beim Linken was schief, aber ist der Grund jetzt klar. Der eigentlich als Einstiegscode gedachte "KernelStart" liegt irgendwo in der Mitte des Binaries.

Mit Compiler-Option -O0 wird die "init"-Funktion nicht ge-inline-t und liegt am Anfang des Binaries, d.h. der Code springt direkt in die init-Funktion, ohne vorher in der "main" gewesen zu sein. Logisch, dass dann keine gültige Rücksprungadresse auf dem Stack liegt und es kracht.

Mit -O wird die "init"-Funktion ge-inline-t und die "main"-Funktion liegt direkt am Anfang des Binaries. Damit klappt es zwar, ist aber nicht im Sinne des Erfinders, die "KernelStart"-Funktion gibt es ja nicht umsonst, sie setzt z.B. den Stack auf 0x01900000.

D.h. irgendwie bekommt der Linker es nicht hin, die "KernelMain" an den Anfang des Binaries zu setzen, obwohl die Funktion im Linkerscript als "ENTRY" markiert ist. Hat jemand eine Idee?

edit: Danke, +gjim+!

Badestrand

Ok, habs. Im Linkerscript muss STARTUP(kernel.o) gesetzt sein, das ENTRY spielt keine Rolle. Ich comitte.

edit: Was für ein Krampf

XanClic

Trotzdem könnte man hier mal anmerken, dass man genau wegen sowas ELF oder ähnliche Formate will (die also einen Entrypoint haben)...

Erhard Henkes

Eine Frage aus dem IRC:

ich frag mich, ob es wirklich nötig ist, die page tables für den kernel bereich im voraus zu allokieren.

lk

Kann jemand die meory map irgendwie erwitern ? Mehr details wären cool, vielleicht auch graphisch.

Apropo, ich kann auch osdev.org empfehlen.

Mr X

Im Repository befindet sich unter /documentation/memory.txt eine detailiertere und aktuellere Übersicht.