Tuesday, May 26, 2009

Memory Area Management

a memory area is a sequence of contiguous physical addresses. the idea comes from the fact that the kernel keeps using the same kinds of objects over and over, so grouping them saves a lot of memory space.

qus: what happens when a cache grows or shrinks? it is not supposed to get pages contiguous with the ones it already has. it just gets another group of contiguous pages.

there are 13 sizes of memory areas, from 32 to 131072 bytes, each double the previous one. each size has 2 caches: dma and normal.
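a tiny sketch of the size classes above, assuming each size is simply 32 << i. `area_size` and `area_index_for` are made-up helper names for illustration, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

#define MIN_AREA_SIZE  32u   /* smallest size class, from the notes */
#define NUM_AREA_SIZES 13    /* 32, 64, ..., 131072 */

/* size in bytes of the i-th size class (i = 0..12) */
static size_t area_size(unsigned int i)
{
    assert(i < NUM_AREA_SIZES);
    return (size_t)MIN_AREA_SIZE << i;
}

/*
 * index of the smallest size class that can hold `bytes`,
 * or -1 if the request is bigger than the largest class
 */
static int area_index_for(size_t bytes)
{
    unsigned int i;

    for (i = 0; i < NUM_AREA_SIZES; i++) {
        if (bytes <= area_size(i))
            return (int)i;
    }
    return -1;
}
```

with these helpers a 100-byte request lands in the 128-byte class (index 2), and anything above 131072 bytes gets -1.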

user processes do not have access to or use the slab allocator. it is solely used by the kernel.

2 types of caches: general (used by the slab alloc for its own purposes) and specific (used by the rest of the kernel)

kmem_cache is the cache whose objects are cache descriptors; it keeps track of the other caches.

pages that belong to a slab have the PG_slab flag set on them. it is not possible to get slab pages from ZONE_HIGHMEM because page_address() will return NULL for a high memory page that is not mapped :D.

every physical or abstract entity has a descriptor in the kernel. a page frame within a slab has its PG_slab flag set, lru.next=cache desc, lru.prev=slab desc. lru is normally used by the buddy allocator to chain free blocks of pages. as the page is no longer in the buddy alloc, the lru field can be reused for something useful. slab pages are pseudo-free: they can hold free objects but are not tracked by the buddy alloc.

the page frames of a slab can be tracked by s_mem of the slab desc (address of the first object) and cachedesc.gfporder (a slab spans 2^gfporder contiguous page frames)
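a toy sketch of the lru trick, assuming the 2.6-era convention where lru.next holds the cache desc and lru.prev holds the slab desc. all struct and function names here (`toy_page`, `slab_claim_pages`, ...) are invented stand-ins, not the real kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* toy stand-ins for the kernel structures; only the fields from the notes */
struct list_head { struct list_head *next, *prev; };
struct toy_cache { const char *name; unsigned int gfporder; };
struct toy_slab  { void *s_mem; };              /* address of first object */
struct toy_page  { unsigned long flags; struct list_head lru; };

#define TOY_PG_SLAB (1UL << 0)   /* bit position is arbitrary in this toy */

/* number of contiguous page frames in one slab of this cache */
static unsigned long slab_nr_pages(const struct toy_cache *c)
{
    return 1UL << c->gfporder;
}

/* flag every page frame of a slab and stash its owners in the lru field */
static void slab_claim_pages(struct toy_page *first,
                             struct toy_cache *c, struct toy_slab *s)
{
    unsigned long i, n = slab_nr_pages(c);

    assert(first != NULL);
    for (i = 0; i < n; i++) {
        first[i].flags |= TOY_PG_SLAB;
        first[i].lru.next = (struct list_head *)c;   /* cache desc */
        first[i].lru.prev = (struct list_head *)s;   /* slab desc */
    }
}

/* go back from a page frame to the cache / slab that owns it */
static struct toy_cache *page_to_cache(const struct toy_page *pg)
{
    return (struct toy_cache *)pg->lru.next;
}

static struct toy_slab *page_to_slab(const struct toy_page *pg)
{
    return (struct toy_slab *)pg->lru.prev;
}
```

the point of the design is that given any object address, the kernel can find the page desc and from there jump straight to the owning slab and cache without a separate lookup table.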

Saturday, May 23, 2009

Page Cache

during hibernation, RAM is written to swap space.

the page cache is a set of radix trees that help to quickly find a page from the address space object of its owner.

kinds of pages in the page cache:
  • regular files, directories
  • block device data
  • pages of swapped out processes
  • files of special filesystems (eg, ipc shm)
the owner of a page is the inode object (the idea of a page owner only matters when the page cache is in action). how r/w is done depends on the type of owner. 3 types of owner:
  • regular file
  • block device file
  • swap area and swap cache
the inode contains the address space object, which is the page cache for that owner (pages and methods). a block device also has an inode for it. the idea of the owner is based on the uniqueness of the resource.

the radix tree of the pages in a page cache is searched in a way similar to page table lookup. the number of bits taken for indexing depends on the height of the tree.
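a small sketch of how the page index gets cut into per-level slot numbers, assuming 6 bits (64 slots) per node as in 2.6-era kernels. `radix_slot` and `radix_max_index` are made-up names for illustration:

```c
#include <assert.h>

#define MAP_SHIFT 6                          /* 64 slots per node (assumed) */
#define MAP_MASK  ((1UL << MAP_SHIFT) - 1)

/* slot to follow at `level` (0 = root) in a tree with `height` levels */
static unsigned int radix_slot(unsigned long index,
                               unsigned int height, unsigned int level)
{
    unsigned int shift;

    assert(height > 0 && level < height);
    shift = (height - 1 - level) * MAP_SHIFT;
    return (unsigned int)((index >> shift) & MAP_MASK);
}

/* largest page index that a tree of the given height can hold */
static unsigned long radix_max_index(unsigned int height)
{
    unsigned int bits = height * MAP_SHIFT;

    if (bits >= sizeof(unsigned long) * 8)
        return ~0UL;
    return (1UL << bits) - 1;
}
```

for example, in a tree of height 2 the page index 131 is found by taking slot 2 in the root node and then slot 3 in the second level node. a height 1 tree covers indexes 0..63 and a height 2 tree covers 0..4095.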

find_get_page() increases the page usage counter (page->count). find_lock_page() additionally sets the PG_locked flag.

the radix tree is a kind of database for the pages (by utilizing the tag field)

PG_writeback means the page is currently being written back to disk (tag[1]).
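a toy model of the tag idea, assuming one bitmap per tag in each node with one bit per slot (tag 0 = dirty, tag 1 = writeback). `toy_node` and the helpers are invented for illustration and are not the kernel's radix tree API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TAG_DIRTY = 0, TAG_WRITEBACK = 1, NUM_TAGS = 2 };

#define SLOTS_PER_NODE 64

/* one 64-bit bitmap per tag: bit i says "slot i carries this tag" */
struct toy_node {
    uint64_t tags[NUM_TAGS];
};

static void node_tag_set(struct toy_node *n, int tag, unsigned int slot)
{
    assert(tag >= 0 && tag < NUM_TAGS && slot < SLOTS_PER_NODE);
    n->tags[tag] |= (uint64_t)1 << slot;
}

static void node_tag_clear(struct toy_node *n, int tag, unsigned int slot)
{
    assert(tag >= 0 && tag < NUM_TAGS && slot < SLOTS_PER_NODE);
    n->tags[tag] &= ~((uint64_t)1 << slot);
}

static bool node_tag_get(const struct toy_node *n, int tag, unsigned int slot)
{
    assert(tag >= 0 && tag < NUM_TAGS && slot < SLOTS_PER_NODE);
    return (n->tags[tag] >> slot) & 1;
}

/* true if any slot under this node carries the tag */
static bool node_any_tagged(const struct toy_node *n, int tag)
{
    assert(tag >= 0 && tag < NUM_TAGS);
    return n->tags[tag] != 0;
}
```

this is what makes the tree work like a database: a search for "all pages under writeback" can skip a whole subtree as soon as `node_any_tagged` is false for its node.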

A buffer page is a page with buffer heads. every page in a page cache should be a buffer page ... isn't it?

all anon/file reads and writes are executed on RAM only. if a page gets written, the PG_dirty flag is set during page fault. if a dirty page is in an addr space, then it must be written back to disk.

the private field of a buffer page points to the buffer head of the first block.

Sunday, May 17, 2009

Memory Mapping

2 kinds of pages: anonymous pages (pure RAM) and file mapped pages (backed by a file on disk/blk device). 2 levels of pages: physical and logical structure.

anonymous pages have mapping=NULL, mapped pages have mapping=addr_space. mapped pages also form a radix tree under the addr spc object.

the layer above the page layer is the memory region layer. that layer keeps some flags to specify the kind of pages underneath.

initially, the memory region is not linked with the page frames. during page faults, the mem reg gets pages (anon, map). during page reclaim the link is broken temporarily.

processes use file objects. the memory region links the file object to the (logical) pages. if the pages are anon, then the mem reg does not use 'vm_pgoff' and 'vm_file'. but if the pages are mapped, then 'vm_file'=file object, 'vm_pgoff'=offset in the file (in pages) where the region starts.
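a sketch of how a faulting address inside a file mapped region turns into a page index in the file, assuming 4 KB pages and vm_pgoff counted in pages. `toy_vma` only copies the few fields named in these notes, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* 4 KB pages (assumed) */

/* toy memory region: only the fields mentioned in the notes */
struct toy_vma {
    unsigned long vm_start;   /* first address of the region */
    unsigned long vm_end;     /* first address after the region */
    unsigned long vm_pgoff;   /* where the region starts in the file, in pages */
    const void   *vm_file;    /* NULL means the region maps no file */
};

/* a region with no file behind it is an anonymous mapping */
static int vma_is_anonymous(const struct toy_vma *v)
{
    return v->vm_file == NULL;
}

/* page index inside the file that backs address `addr` */
static unsigned long vma_file_page_index(const struct toy_vma *v,
                                         unsigned long addr)
{
    assert(!vma_is_anonymous(v));
    assert(addr >= v->vm_start && addr < v->vm_end);
    return ((addr - v->vm_start) >> PAGE_SHIFT) + v->vm_pgoff;
}
```

for a region starting at 0x40000000 with vm_pgoff=3, address 0x40002abc sits in the third page of the region, so it maps to page index 5 of the file. that index is the key used to look the page up in the radix tree of the address space.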

the address_space object links the physical pages and the virtual memory regions associated with a file.
  • page_tree member is the radix tree of physical pages (the page cache)
  • i_mmap is the radix priority search tree (PST) of the memory regions objects. PST is used for reverse mapping.
mapping (address space) has the host field pointing to the inode. file object->f_mapping = address space. one object on disk is one address space object. different modes of files are tracked by the file objects.

if vm_area_struct.vm_file is NULL then this region does not map any file.

if a memory region does not map a file, its nopage method is NULL (anonymous mapping).

Tuesday, May 5, 2009

Handling Swap-Page Faults

read_swap_cache_async locks the page, and the block layer unlocks it when the read finishes.

read ahead pages are kept in swap cache.

swap area is on disk. swap cache is in RAM.

the first part of read_swap_cache_async reserves RAM pages for the swap-in and puts them neatly in the swap cache. then in the 2nd part, swap_readpage reads the page from the swap area (disk) into the swap cache (RAM).

swap_readpage initiates the data transfer by { get_swap_bio: req gen, submit_bio: req send }

swap cache is just a radix tree.

the swap entry (the page's index in the swap area) is kept in the private field of the page.

swapin_readahead is just an additional layer around read_swap_cache_async ... tune it by using /proc/sys/vm/page-cluster
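page-cluster is a logarithmic value: 0 means 1 page, 1 means 2 pages, 3 means 8 pages per read-ahead attempt. a rough sketch of which swap slots one read-ahead would cover, assuming the window is an aligned group of 2^page-cluster slots and that slot 0 is skipped because it holds the swap header (that is how 2.6-era kernels behaved as far as i know; newer kernels may differ). `swap_window` and `readahead_window` are made-up names:

```c
#include <assert.h>

/* range of swap slots covered by one read-ahead attempt */
struct swap_window {
    unsigned long start;   /* first slot to read */
    unsigned long count;   /* number of slots to read */
};

static struct swap_window readahead_window(unsigned long offset,
                                           unsigned int page_cluster)
{
    struct swap_window w;

    assert(page_cluster < sizeof(unsigned long) * 8);
    w.count = 1UL << page_cluster;
    /* round the faulting slot down to the start of its aligned group */
    w.start = (offset >> page_cluster) << page_cluster;
    if (w.start == 0) {    /* slot 0 is the swap header (assumed) */
        w.start = 1;
        w.count--;
    }
    return w;
}
```

with page-cluster=3, a fault on slot 21 would read slots 16..23, and a fault on slot 5 would read slots 1..7.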