Wednesday, December 14, 2011

Swapper Subsystem Files Structure

swap.c: it moves pages around in the lru lists
swapfile.c: it handles actual swapping of the pages to backing store
                  it also provides function calls to control swapping from usermode
swap_state.c: it handles swap cache
vmscan.c: glues the swapper subsystem to the memory manager.
                 it provides higher-level functions for the memory manager.
                 kswapd is here. 

Tuesday, December 13, 2011

Discussion on swap.c/put_page related functions

There are two phases of these functions:
1. delete the page from lru cache (__page_cache_release)
2. free the page back to the memory allocator

Consider the allocation process: 1. the page is allocated, 2. the page table entries are fixed up, 3. the page is added to the lru cache.

The put_page functions do not handle page table entries. So the control path should fix or remove the appropriate page table entries before calling these functions.
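
The two phases can be sketched in userspace as below. This is only a toy model, not the kernel code: every toy_* name is made up for illustration, the reference count stands in for the page's usage counter, and the page tables are deliberately absent, since the put functions do not touch them.

```c
#include <stdlib.h>

/* Toy stand-in for struct page: only what the two phases need. */
struct toy_page {
    int refcount;                 /* number of users still holding the page */
    int on_lru;                   /* 1 while the page sits on the LRU list */
    struct toy_page *prev, *next; /* LRU list links */
};

/* Toy LRU list with a sentinel head (hypothetical, for illustration). */
struct toy_lru {
    struct toy_page head;
    int nr_pages;
};

int toy_pages_freed; /* counts pages handed back to the allocator */

void toy_lru_init(struct toy_lru *lru)
{
    lru->head.prev = lru->head.next = &lru->head;
    lru->nr_pages = 0;
}

struct toy_page *toy_alloc_page(void)
{
    struct toy_page *page = calloc(1, sizeof(*page));
    if (page)
        page->refcount = 1; /* the allocating path holds the first reference */
    return page;
}

void toy_get_page(struct toy_page *page)
{
    page->refcount++;
}

void toy_lru_add(struct toy_lru *lru, struct toy_page *page)
{
    page->next = lru->head.next;
    page->prev = &lru->head;
    lru->head.next->prev = page;
    lru->head.next = page;
    page->on_lru = 1;
    lru->nr_pages++;
}

/* Phase 1: take the page off the LRU list. */
static void toy_page_cache_release(struct toy_lru *lru, struct toy_page *page)
{
    if (!page->on_lru)
        return;
    page->prev->next = page->next;
    page->next->prev = page->prev;
    page->on_lru = 0;
    lru->nr_pages--;
}

/* Phase 2: give the page back to the allocator. */
static void toy_free_page(struct toy_page *page)
{
    free(page);
    toy_pages_freed++;
}

/* Drop one reference; only the last put runs both phases. */
void toy_put_page(struct toy_lru *lru, struct toy_page *page)
{
    if (--page->refcount > 0)
        return;
    toy_page_cache_release(lru, page);
    toy_free_page(page);
}
```

A page that was taken twice stays on the list after the first toy_put_page() and is unlinked and freed by the second one.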


Thursday, December 8, 2011

Get Tasks Page Tables

step 1: get the current task
step 2: get the runqueue
step 3: get the cfs rbtree root
step 4: traverse the tree (recursive)
for each node, get the scheduling entity
from the scheduling entity, get the task pointer
from the task structure, get the pgd
print pgd information :D
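
Steps 4 onward can be tried out in userspace with a sketch like the one below. It is not kernel code: the toy_* types are invented stand-ins, the tree is a plain binary tree rather than a real rbtree, and the pgd is kept as a plain number. What it does show is the recursive walk and the container_of trick for getting from an embedded tree node back to the scheduling entity and then to the task. It assumes, as in the kernel, that a task without an mm (a kernel thread) has no pgd of its own and is skipped.

```c
#include <stddef.h>

/* Toy tree node, embedded in the scheduling entity like rb_node is. */
struct toy_node {
    struct toy_node *left, *right;
};

struct toy_sched_entity {
    struct toy_node run_node;
};

struct toy_mm {
    unsigned long pgd; /* stand-in for the page global directory pointer */
};

struct toy_task {
    int pid;
    struct toy_mm *mm;          /* NULL models a kernel thread */
    struct toy_sched_entity se; /* embedded, not pointed to */
};

/* Same idea as the kernel's container_of: member pointer -> enclosing struct. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * In-order recursive walk. Stores the pgd of every task that has an mm
 * into out[] (at most max entries) and returns the new entry count.
 */
int toy_collect_pgds(struct toy_node *node, unsigned long *out, int max, int count)
{
    struct toy_sched_entity *se;
    struct toy_task *task;

    if (!node)
        return count;

    count = toy_collect_pgds(node->left, out, max, count);

    se = toy_container_of(node, struct toy_sched_entity, run_node);
    task = toy_container_of(se, struct toy_task, se);
    if (task->mm && count < max)
        out[count++] = task->mm->pgd;

    return toy_collect_pgds(node->right, out, max, count);
}
```

Calling toy_collect_pgds(root, out, max, 0) on the root node fills out[] in left-to-right order, which for the cfs tree would be the order of the scheduling keys.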

Notes on rbtree

definition
----------

#define RB_RED 0
#define RB_BLACK 1
struct rb_node
{
unsigned long rb_parent_color; /* parent pointer; the low bit holds the color */
struct rb_node *rb_right;
struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
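
The aligned attribute is what makes rb_parent_color work: because every node is aligned to at least sizeof(long), the low bits of a node address are always zero, so the color can live in bit 0 of the parent pointer. Below is a small userspace sketch of that packing. The toy_* names are made up here; the helpers are written to resemble the kernel's rb_parent/rb_color style accessors, but they are an illustration and not copied from rbtree.h.

```c
#include <stddef.h>

#define TOY_RB_RED   0
#define TOY_RB_BLACK 1

struct toy_rb_node {
    unsigned long parent_color; /* parent address | color in bit 0 */
    struct toy_rb_node *right;
    struct toy_rb_node *left;
} __attribute__((aligned(sizeof(long))));

/* Mask off the two low bits to recover the parent pointer. */
static inline struct toy_rb_node *toy_rb_parent(const struct toy_rb_node *n)
{
    return (struct toy_rb_node *)(n->parent_color & ~3UL);
}

/* Bit 0 is the color. */
static inline int toy_rb_color(const struct toy_rb_node *n)
{
    return (int)(n->parent_color & 1UL);
}

/* Replace the parent while keeping the low bits (the color). */
static inline void toy_rb_set_parent(struct toy_rb_node *n, struct toy_rb_node *p)
{
    n->parent_color = (n->parent_color & 3UL) | (unsigned long)p;
}

/* Replace the color while keeping the parent bits. */
static inline void toy_rb_set_color(struct toy_rb_node *n, int color)
{
    n->parent_color = (n->parent_color & ~1UL) | (unsigned long)color;
}
```

Setting the color never disturbs the parent and the other way round, which is why one word is enough for both.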

usage:
-------

Notes on Swapping (how to totally smash it)

modified pages that are backed by a block device are not swapped out, but they can be synchronized with (written back to) the block device.

kernel can discard pages (backed by a file) which are not modifiable.

page reclaim = { selecting, swapping, synchronizing, discarding }

private mapping = map a file in memory, but changes to that memory are not reflected to the backing block device.
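
A private mapping is easy to observe from usermode. The sketch below writes a small file, maps it with MAP_PRIVATE, changes the first byte through the mapping and then reads the file again. The function name and the idea of passing in a scratch file path are assumptions made for this example; the calls themselves are the usual POSIX ones.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the file content is unchanged after writing through a
 * MAP_PRIVATE mapping, 0 if it changed, -1 on error.
 * The scratch file at `path` is created and removed again.
 */
int private_write_leaves_file_unchanged(const char *path)
{
    const char original[] = "hello";
    char readback[sizeof(original)] = {0};
    char *map;
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    if (write(fd, original, sizeof(original)) != (ssize_t)sizeof(original))
        goto out;

    map = mmap(NULL, sizeof(original), PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[0] = 'J'; /* the write lands in a private copy of the page */

    /* Read the file itself again, bypassing the mapping. */
    if (lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, readback, sizeof(readback)) == (ssize_t)sizeof(readback))
        result = (memcmp(readback, original, sizeof(original)) == 0);

    munmap(map, sizeof(original));
out:
    close(fd);
    unlink(path);
    return result;
}
```

The modified page no longer has a file to go back to, which is why such pages are candidates for the swap area.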

pages candidates for the swap area
1. MAP_PRIVATE, 2. MAP_ANONYMOUS, 3. IPC pages

inside swap areas, there are slots and clusters. swap areas may have different priorities. clustering makes it fast to do readaheads to/from swap space.

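a bitmap is used to keep track of used and free slots in a swap area. (in the kernel source this is the swap_map array, which holds a usage count per slot rather than a single bit.)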
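
A minimal sketch of slot tracking with one bit per slot, as described in the note above. The toy_* names and the slot count are made up, and the kernel's own bookkeeping is more elaborate than this. Slot 0 is marked used from the start because the first slot of a swap area holds the identification data.

```c
#include <limits.h>
#include <string.h>

#define TOY_NR_SLOTS      64
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_NR_WORDS      ((TOY_NR_SLOTS + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

/* Toy swap area: one bit per slot, 1 = used, 0 = free. */
struct toy_swap_area {
    unsigned long used[TOY_NR_WORDS];
};

int toy_slot_used(const struct toy_swap_area *a, unsigned int slot)
{
    return (int)((a->used[slot / TOY_BITS_PER_WORD] >>
                  (slot % TOY_BITS_PER_WORD)) & 1UL);
}

void toy_swap_init(struct toy_swap_area *a)
{
    memset(a->used, 0, sizeof(a->used));
    a->used[0] |= 1UL; /* slot 0 is reserved for the swap header */
}

/* Returns the lowest free slot and marks it used, or -1 if the area is full. */
int toy_slot_alloc(struct toy_swap_area *a)
{
    unsigned int slot;

    for (slot = 0; slot < TOY_NR_SLOTS; slot++) {
        if (!toy_slot_used(a, slot)) {
            a->used[slot / TOY_BITS_PER_WORD] |= 1UL << (slot % TOY_BITS_PER_WORD);
            return (int)slot;
        }
    }
    return -1;
}

void toy_slot_free(struct toy_swap_area *a, unsigned int slot)
{
    a->used[slot / TOY_BITS_PER_WORD] &= ~(1UL << (slot % TOY_BITS_PER_WORD));
}
```

Handing out the lowest free slot first tends to keep allocations next to each other, which is in the spirit of clustering.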

to relieve memory pressure, there are kswapd (normal memory pressure) and direct reclaim (extreme memory pressure).

when swapping is not enough, there is OOM killer. so, we can see there are levels of mechanisms to keep the system running.

there are 2 lists for each zone: active and inactive. pages are transferred between these 2 lists based on their h/w accessed bit.

swap_info_struct keeps track of each swap area. similar to the way the kernel keeps track of a block device, it keeps track of the swap area.
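
The movement between the two lists can be sketched like this. It is a strong simplification under my own assumptions: the toy_* names are invented, an array stands in for the per-zone lists, and one scan pass decides everything, while the real policy in the kernel has more steps. The point is only the rule: accessed pages go (or stay) active, untouched active pages are demoted, untouched inactive pages become reclaim candidates.

```c
enum toy_list { TOY_INACTIVE, TOY_ACTIVE };

struct toy_lru_page {
    int accessed;          /* models the h/w accessed bit */
    enum toy_list list;    /* which list the page is on */
    int reclaim_candidate; /* set by the scan for pages that could be reclaimed */
};

/*
 * One scan pass over n pages. Returns how many pages ended up as
 * reclaim candidates (inactive and not accessed since the last scan).
 */
int toy_scan(struct toy_lru_page *pages, int n)
{
    int i, candidates = 0;

    for (i = 0; i < n; i++) {
        struct toy_lru_page *p = &pages[i];

        p->reclaim_candidate = 0;
        if (p->accessed) {
            p->accessed = 0;          /* clear the bit so the next scan sees new use */
            p->list = TOY_ACTIVE;     /* recently used: promote or keep active */
        } else if (p->list == TOY_ACTIVE) {
            p->list = TOY_INACTIVE;   /* not used lately: demote */
        } else {
            p->reclaim_candidate = 1; /* inactive and still untouched */
            candidates++;
        }
    }
    return candidates;
}
```

A page therefore needs two quiet scans in a row to go from active to reclaim candidate.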

the first slot of a swap area is used for identification purpose and keeping state information about the swap partition. (i have to see how that is done)

the extent list works like a page table in VM. it maps the linear page slots of a swap area to the scattered blocks on disk. a file-based swap area needs more extent structures than a partition-based swap area, where the blocks are sequential.
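
The lookup through the extent list can be sketched as follows. The field names follow my reading of the kernel's struct swap_extent (start_page, nr_pages, start_block); the toy_* names, the singly linked list and the return convention are assumptions for this example.

```c
#include <stddef.h>

/* One extent: nr_pages consecutive slots that are also consecutive on disk. */
struct toy_swap_extent {
    unsigned long start_page;  /* first slot covered by this extent */
    unsigned long nr_pages;    /* number of slots covered */
    unsigned long start_block; /* disk block backing start_page */
    struct toy_swap_extent *next;
};

/*
 * Translate a slot number into a disk block.
 * Returns 0 and stores the block on success, -1 if no extent covers the slot.
 */
int toy_map_swap_page(const struct toy_swap_extent *head,
                      unsigned long page, unsigned long *block)
{
    const struct toy_swap_extent *se;

    for (se = head; se != NULL; se = se->next) {
        if (page >= se->start_page &&
            page < se->start_page + se->nr_pages) {
            *block = se->start_block + (page - se->start_page);
            return 0;
        }
    }
    return -1;
}
```

A swap partition would need a single extent covering all slots, while a fragmented swap file needs one extent per contiguous run of blocks.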

Reverse Mapping

vm_area_struct: 3 members are needed: shared, anon_vma_node, anon_vma

2 alternatives: anonymous pages and file-mapped pages

General MM Notes

Linux kernel space has 4 kinds of memory mapping
1. direct mapping
2. vmalloc mapping
3. kmap mapping
4. fixmap mapping

direct mapping is used for low memory pages which are used to store permanent data structures, page tables, interrupt tables and so on.

vmalloc mapping is used for non-contiguous memory mapping. i can use it any way possible. if you have a space reserved in vmalloc space and a bunch of pages in your hand, you can map those pages to that vmalloc space by modifying the kernel page tables. (example usage: by insmod to store the modules)

kmap mapping is similar to the vmalloc space, but only one pgd entry in the master kernel page table is dedicated to kmap virtual addresses.

fixmap mapping is similar to kmap mapping. (example usage: map a page table residing on a high mem page)

for all these 4 kinds of mapping, the virtual addresses and the physical pages are assigned as soon as the mapping request is placed and served. a page fault here means something terribly wrong is going on inside.

user process pages are similar to the kernel vmalloc space, except that they live at virtual addresses in user space and the physical pages are not allocated and mapped until a page fault is generated on those virtual addresses.