Tuesday, January 3, 2012

buffered_rmqueue analysis

when the page (and its block) is in the buddy system, the page's mapcount equals PAGE_BUDDY_MAPCOUNT_VALUE

when the page is on a pcp list, the page's private field records the migratetype

the private field is reused to store either the order (while the page is free in the buddy system) or, temporarily, the migrate type.
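
To make the field reuse concrete, here is a small user-space model. It is only a sketch: struct page_model and the model_* helpers are made-up names, not kernel code, and the value -128 for PAGE_BUDDY_MAPCOUNT_VALUE is what I believe kernels of this period use, so treat it as an assumption.

```c
#include <stdbool.h>

/* Assumed value; kernels of this period appear to use -128. */
#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)

/* Hypothetical, heavily trimmed stand-in for struct page. */
struct page_model {
    int mapcount;           /* models page->_mapcount */
    unsigned long private;  /* models page->private */
};

/* Page becomes free in the buddy system: tag it and record the order. */
static void model_set_buddy(struct page_model *p, unsigned int order)
{
    p->mapcount = PAGE_BUDDY_MAPCOUNT_VALUE;
    p->private = order;
}

/* Page sits on a pcp list: drop the buddy tag and reuse private
 * for the migratetype. */
static void model_move_to_pcp(struct page_model *p, int migratetype)
{
    p->mapcount = -1;  /* -1 is the "no mappings" resting value */
    p->private = (unsigned long)migratetype;
}

/* True while the page carries the buddy tag. */
static bool model_page_is_buddy(const struct page_model *p)
{
    return p->mapcount == PAGE_BUDDY_MAPCOUNT_VALUE;
}
```

The point of the model is that one field (private) means two different things, and the mapcount tag is what tells you which reading applies.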

event:
sometimes we do not have enough pages of a specified migratetype. in that case it helps if we can move some free pages in a page block over to the free list of the corresponding migratetype.
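
A rough user-space sketch of that move follows. The types and the function name are hypothetical and the page block is modelled as a plain array; the kernel does this work on real free lists (I believe in move_freepages_block), which this does not try to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of migrate types. */
enum mt_sketch { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE };

/* One page of a page block: is it free, and on which type's list? */
struct block_page {
    bool is_free;
    enum mt_sketch list;
};

/* Retag every free page in the block so it belongs to the target
 * migratetype's list. Pages in use are left alone.
 * Returns the number of pages that changed lists. */
static size_t move_free_pages_sketch(struct block_page *block, size_t n,
                                     enum mt_sketch target)
{
    size_t moved = 0;

    for (size_t i = 0; i < n; i++) {
        if (block[i].is_free && block[i].list != target) {
            block[i].list = target;
            moved++;
        }
    }
    return moved;
}
```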



Wednesday, December 14, 2011

Swapper Subsystem Files Structure

swap.c: it moves pages around in the lru lists
swapfile.c: it handles actual swapping of the pages to backing store
                  it also provides function calls to control swapping from usermode
swap_state.c: it handles swap cache
vmscan.c: glues the swapper subsystem to the memory manager.
                 provides the higher-level functions that the memory manager calls.
                 kswapd is here. 

Tuesday, December 13, 2011

Discussion on swap.c/put_page related functions

These functions work in two phases:
1. delete the page from the lru cache (__page_cache_release)
2. free the page back to the memory allocator

Consider the allocation process: 1. page is allocated, 2. page table entries are set up, 3. page is added to the lru cache.

The put_page functions do not handle page table entries. So the calling control path must fix or remove the appropriate page table entries before calling these functions.
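
The ordering can be shown with a toy user-space model. Everything here (struct lru_page, the model_* helpers, the freed flag standing in for the allocator) is hypothetical; it only mirrors the phases described above.

```c
#include <stdbool.h>

/* Hypothetical page state, tracking only what the note talks about. */
struct lru_page {
    int refcount;
    bool mapped;  /* a page table entry still points at the page */
    bool on_lru;
    bool freed;   /* stands in for "returned to the allocator" */
};

/* The allocation side, in the order given in the note. */
static void model_alloc_and_map(struct lru_page *p)
{
    p->refcount = 1;   /* 1. page is allocated */
    p->mapped = true;  /* 2. page table entry is set up */
    p->on_lru = true;  /* 3. page is added to the lru */
    p->freed = false;
}

/* What the caller must do before the final put. */
static void model_unmap(struct lru_page *p)
{
    p->mapped = false;
}

/* Drop one reference. On the last reference, run the two phases.
 * Note that p->mapped is never touched here.
 * Returns true if the page was freed. */
static bool model_put_page(struct lru_page *p)
{
    if (--p->refcount > 0)
        return false;
    p->on_lru = false;  /* phase 1: remove from the lru */
    p->freed = true;    /* phase 2: give the page back */
    return true;
}
```

If the caller skips model_unmap(), the page ends up freed while still marked mapped, which is exactly the situation the control path has to prevent.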


Thursday, December 8, 2011

Get Tasks Page Tables

step 1: get the current task
step 2: get the runqueue
step 3: get the cfs rbtree root
step 4: traverse the tree (recursive)
for each node, get the scheduling entity
from the scheduling entity, get the task pointer
from the task structure, get the pgd
print pgd information :D
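
Steps 4 onward can be sketched in user space. This is not kernel code: all the *_model types are simplified stand-ins, and container_of_model is a local copy of the usual container_of idea. In the kernel the corresponding pieces would be the run_node member of the scheduling entity, the se member of the task structure and task->mm->pgd, as far as I understand the layout.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct rb_node_model { struct rb_node_model *left, *right; };
struct sched_entity_model { struct rb_node_model run_node; };
struct mm_model { unsigned long pgd; };
struct task_model {
    int pid;
    struct sched_entity_model se;
    struct mm_model *mm;  /* NULL for kernel threads */
};

/* Step from an embedded member back to the structure containing it. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recursive in-order walk. Prints the pgd of each task and records
 * pids in visit order. Returns the updated number of recorded pids. */
static int walk_tasks(struct rb_node_model *node, int *pids, int max,
                      int count)
{
    struct sched_entity_model *se;
    struct task_model *task;

    if (!node)
        return count;
    count = walk_tasks(node->left, pids, max, count);

    se = container_of_model(node, struct sched_entity_model, run_node);
    task = container_of_model(se, struct task_model, se);
    if (task->mm)
        printf("pid %d pgd %#lx\n", task->pid, task->mm->pgd);
    if (count < max)
        pids[count++] = task->pid;

    return walk_tasks(node->right, pids, max, count);
}
```

Inside the kernel the walk would also need the runqueue lock held, which this sketch ignores.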

Notes on rbtree

definition
----------

#define RB_RED 0
#define RB_BLACK 1
struct rb_node
{
unsigned long rb_parent_color; /* parent pointer; the low bit holds the color */
struct rb_node *rb_right;
struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

usage:
-------
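
A minimal sketch of how the packed rb_parent_color word is read and written. In the kernel's rbtree the color lives in the low bit of rb_parent_color rather than in a separate field; the alignment attribute is what keeps the low pointer bits free. The struct and helper names below are my own (the kernel has similar rb_parent / rb_color macros), and the code assumes a pointer fits in an unsigned long, as it does on Linux.

```c
#define RB_RED   0
#define RB_BLACK 1

/* User-space copy of the node layout (hypothetical name). */
struct rb_node_sketch {
    unsigned long rb_parent_color;
    struct rb_node_sketch *rb_right;
    struct rb_node_sketch *rb_left;
} __attribute__((aligned(sizeof(long))));

/* Mask off the low bits to recover the parent pointer. */
static struct rb_node_sketch *sketch_rb_parent(const struct rb_node_sketch *n)
{
    return (struct rb_node_sketch *)(n->rb_parent_color & ~3UL);
}

/* The lowest bit is the color. */
static int sketch_rb_color(const struct rb_node_sketch *n)
{
    return (int)(n->rb_parent_color & 1UL);
}

/* Pack parent pointer and color into the single word. */
static void sketch_rb_set_parent_color(struct rb_node_sketch *n,
                                       struct rb_node_sketch *parent,
                                       int color)
{
    n->rb_parent_color = (unsigned long)parent | (unsigned long)color;
}
```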

Notes on Swapping (how to totally smash it)

modified pages backed by a block device are not swapped out. instead they can be synchronized (written back) with the block device.

the kernel can discard file-backed pages which are not modifiable, since they can be read back from the file.

page reclaim = { selecting, swapping, synchronizing, discarding }

private mapping = a file mapped into memory where changes to that memory are not reflected to the backing block device.

candidate pages for the swap area:
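
The private-mapping behaviour is easy to check from user space. The sketch below assumes a POSIX system with a writable /tmp; the function name and the file name template are made up for the example.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes "old" to a temp file, maps it MAP_PRIVATE, overwrites the
 * mapping with "new", then rereads the file.
 * Returns 1 if the file still holds "old", 0 if it changed,
 * -1 if any setup step failed. */
static int private_write_stays_private(void)
{
    char path[] = "/tmp/private_map_XXXXXX";
    char back[3];
    char *map;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);  /* the file lives on until fd is closed */

    if (write(fd, "old", 3) == 3) {
        map = mmap(NULL, 3, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, "new", 3);  /* lands in a private copy of the page */
            if (pread(fd, back, 3, 0) == 3)
                result = (memcmp(back, "old", 3) == 0);
            munmap(map, 3);
        }
    }
    close(fd);
    return result;
}
```

The modified page now has no home in the file, which is why such pages need swap space if they are to be reclaimed.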
1. MAP_PRIVATE, 2. MAP_ANONYMOUS, 3. IPC pages

inside swap areas, there are slots and clusters. swap areas may have different priorities. clustering makes it fast to do readaheads to/from swap space.

bitmap is used to keep track of used and free slots in swap area.
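
A sketch of the bitmap idea in user space. It only illustrates one bit per slot; the names and the 64-slot size are arbitrary, and it is not meant to match the kernel's actual bookkeeping. Slot 0 is kept permanently in use because the first slot of a swap area holds the header.

```c
#include <limits.h>
#include <stddef.h>

#define SLOT_COUNT 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SLOT_WORDS ((SLOT_COUNT + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical slot bitmap: bit set = slot in use. */
struct slot_map {
    unsigned long bits[SLOT_WORDS];
};

static void slot_map_init(struct slot_map *m)
{
    for (size_t i = 0; i < SLOT_WORDS; i++)
        m->bits[i] = 0;
    m->bits[0] |= 1UL;  /* slot 0 is reserved for the swap header */
}

/* Find the lowest free slot, mark it used, return its index.
 * Returns -1 when the area is full. */
static long slot_alloc(struct slot_map *m)
{
    for (size_t s = 0; s < SLOT_COUNT; s++) {
        unsigned long mask = 1UL << (s % BITS_PER_WORD);

        if (!(m->bits[s / BITS_PER_WORD] & mask)) {
            m->bits[s / BITS_PER_WORD] |= mask;
            return (long)s;
        }
    }
    return -1;
}

/* Mark a slot free again. Slot 0 and out-of-range slots are ignored. */
static void slot_free(struct slot_map *m, size_t s)
{
    if (s > 0 && s < SLOT_COUNT)
        m->bits[s / BITS_PER_WORD] &= ~(1UL << (s % BITS_PER_WORD));
}
```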

to relieve memory pressure, there are kswapd (normal memory pressure) and direct reclaim (extreme memory pressure).

when swapping is not enough, there is OOM killer. so, we can see there are levels of mechanisms to keep the system running.

there are 2 lists for each zone: active and inactive. pages are moved between these 2 lists based on their h/w accessed bit.
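
A deliberately simplified model of that movement. The real kernel uses more state than a single accessed bit, so take this as a sketch of the idea only; every name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-page state: which list, and the accessed bit. */
struct lru_entry {
    bool active;    /* true = active list, false = inactive list */
    bool accessed;  /* models the h/w accessed bit */
};

enum lru_verdict {
    LRU_KEPT,              /* active and recently used: stays active */
    LRU_PROMOTED,          /* inactive but used: moves to active */
    LRU_DEMOTED,           /* active but unused: moves to inactive */
    LRU_RECLAIM_CANDIDATE  /* inactive and unused */
};

/* One aging pass over a page: test and clear the accessed bit,
 * then move the page between the lists accordingly. */
static enum lru_verdict lru_age(struct lru_entry *p)
{
    bool was_accessed = p->accessed;

    p->accessed = false;
    if (p->active) {
        if (was_accessed)
            return LRU_KEPT;
        p->active = false;
        return LRU_DEMOTED;
    }
    if (was_accessed) {
        p->active = true;
        return LRU_PROMOTED;
    }
    return LRU_RECLAIM_CANDIDATE;
}
```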

swap_info_struct keeps track of each swap area, similar to the way the kernel keeps track of a block device.

the first slot of a swap area is used for identification purpose and keeping state information about the swap partition. (i have to see how that is done)

the extent list works like a page table in VM: it maps the linear page slots of a swap area to the scattered blocks on disk. a file-based swap area needs more extent structures than a partition-based swap area, where the blocks are sequential.
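
The lookup can be sketched as follows. The field names loosely follow what I remember of the kernel's swap extent structure, but the struct, the array (instead of a linked list) and the function are all simplifications of my own, with blocks counted in page-sized units.

```c
#include <stddef.h>

/* One run of contiguous slots mapped to contiguous disk blocks. */
struct extent_sketch {
    unsigned long start_page;        /* first slot covered */
    unsigned long nr_pages;          /* number of slots covered */
    unsigned long long start_block;  /* disk block of the first slot */
};

/* Translate a slot number into a disk block.
 * Returns 0 and stores the block on success, -1 if no extent
 * covers the slot. */
static int extent_lookup(const struct extent_sketch *ext, size_t n,
                         unsigned long page, unsigned long long *block)
{
    for (size_t i = 0; i < n; i++) {
        if (page >= ext[i].start_page &&
            page - ext[i].start_page < ext[i].nr_pages) {
            *block = ext[i].start_block + (page - ext[i].start_page);
            return 0;
        }
    }
    return -1;
}
```

A partition needs just one extent covering every slot, while a fragmented swap file needs one extent per contiguous run on disk.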

Reverse Mapping

vm_area_struct: 3 members are needed: shared, anon_vma_node, anon_vma

2 alternatives: anonymous pages and file-mapped pages