Monday, November 21, 2011

Dissection of Page Fault Handler

__do_fault (memory.c, 3.1.1)

as this function creates a new mapping for a page that does not exist yet, TLB entries are not altered.

This function has two parts:
1. allocate a page and prepare it appropriately.
2. fix page tables to point to this page.

In the first if block there are three tasks (sketched below, after the COW discussion):
1. prepare the anonymous vma
2. allocate the page
3. register the page with the memory control group
       if unable, release the page
in all three cases, return VM_FAULT_OOM on failure.

the COW page is allocated before any other processing because doing so reduces how long lock_page() is held on the page-cache page.

how to detect whether it is a COW request: FAULT_FLAG_WRITE is set and the vma is NOT shared. COW comes into play when a parent forks and a new process is created, but the memory pages are not copied right away. instead, the virtual memory regions are marked not shareable and the page table entries are marked read-only, so the next time there is a page fault on such an entry, it can be detected as COW.
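
a minimal sketch of that first block, assuming the 3.1.1 helper names (anon_vma_prepare(), alloc_page_vma(), mem_cgroup_newpage_charge()) from memory; note that the if condition is exactly the COW test above:

    if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
        if (unlikely(anon_vma_prepare(vma)))        /* 1. prepare the anon vma */
            return VM_FAULT_OOM;
        cow_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); /* 2. allocate */
        if (!cow_page)
            return VM_FAULT_OOM;
        if (mem_cgroup_newpage_charge(cow_page, mm, GFP_KERNEL)) { /* 3. charge memcg */
            page_cache_release(cow_page);           /* if unable, release the page */
            return VM_FAULT_OOM;
        }
    } else
        cow_page = NULL;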

after setting up the COW page, the function prepares the vmf structure (struct vm_fault).

next, the fs fault handler is invoked. by the time control reaches __do_fault, it is already decided that filesystem code is involved.
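
roughly, from the 3.1-era source (a sketch from memory, not verbatim):

    vmf.virtual_address = (void __user *)(address & PAGE_MASK);
    vmf.pgoff = pgoff;
    vmf.flags = flags;
    vmf.page = NULL;

    ret = vma->vm_ops->fault(vma, &vmf);   /* e.g. filemap_fault() for most filesystems */
    if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
        goto uncharge_out;                 /* error path drops the memcg charge */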


...

the backing address space might want to know whether the page is about to become writable. the filesystem code implements this functionality; in that case, the vma->vm_ops->page_mkwrite function will be present.
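
a simplified sketch of that notification, again assuming the 3.1-era flow (the FAULT_FLAG_MKWRITE flag and the unlock/relock dance are from memory):

    if ((flags & FAULT_FLAG_WRITE) && (vma->vm_flags & VM_SHARED) &&
        vma->vm_ops->page_mkwrite) {
        unlock_page(page);
        vmf.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE;
        tmp = vma->vm_ops->page_mkwrite(vma, &vmf);
        /* on error the fault fails; otherwise the page lock is re-taken
           and the page is marked dirty later */
    }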




Wednesday, March 23, 2011

Small Test Project Idea on Reverse Mapping?

Difficulty Levels:
[6 - very easy, 5 - easy, 4 - medium, 3 - hard, 2 - very hard, 1 - extremely hard]

P1. find out mapcounts of each of the memory pages (a sketch follows this list). -- L6
P2. find out PTEs for a shared anonymous page (involves handling a simple anon region) -- L4
P3. find out PTEs for a shared file page (involves handling address space) -- L3
P4. find out the names of the processes that are sharing a page. -- L5 after P2/3 are done
P5. force remap of shared page -- L3
P6. measure TLB effect using 'perf' -- L4

P7. find out PGD, PMD, PTEs for a given memory region descriptor -- L4
P8. find out the number of allocated page frames for a process (using the rss fields of its mm descriptor) -- L5

P9. Restrict memory for a process.
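
for P1, a minimal sketch of a toy module that dumps _mapcount for a range of page frames (the module name and the pfn range are arbitrary, purely for illustration):

    #include <linux/module.h>
    #include <linux/mm.h>

    static int __init mapcount_init(void)
    {
        unsigned long pfn;

        /* walk an arbitrary range of page frames and print their mapcounts */
        for (pfn = 0; pfn < 4096; pfn++) {
            if (!pfn_valid(pfn))
                continue;
            pr_info("pfn %lu: mapcount %d\n", pfn,
                    page_mapcount(pfn_to_page(pfn)));
        }
        return 0;
    }

    static void __exit mapcount_exit(void)
    {
    }

    module_init(mapcount_init);
    module_exit(mapcount_exit);
    MODULE_LICENSE("GPL");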











Monday, March 21, 2011

Page Reclamation III (Reverse Mapping)

LWN article (anon_mm) (anon_vma) (comparison)

from PLKA:
the page structure contains a single element to implement reverse mapping: { atomic_t _mapcount; }.
two other data structures are needed: a priority search tree for file address spaces, and a linked list for anon address spaces.
the region descriptor (vm_area_struct) has all the info needed to generate the reverse mapping: the shared union, anon_vma_node, and anon_vma.
this is called object-based reverse mapping because a page is not directly associated with a process; rather, memory regions are associated with the page (and, therefore, the processes too).

anon mapping:
page->mapping: points to the anon_vma that the memory region object desc (vma) references
page->index: relative position of the page in the memory region
last bit of page->mapping will be 1 for anon mapping (PAGE_MAPPING_ANON) or 0 for file mapping.
adding into any mapping increments page->_mapcount count.
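
a sketch of how the anon bit is tested, following the kernel's own PageAnon() (2.6-era, from memory):

    /* PAGE_MAPPING_ANON is bit 0 of page->mapping */
    static inline int PageAnon(struct page *page)
    {
        return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
    }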

mapcount and activity are not synonymous: mapcount is static, whereas activity means the page is being actively used right now. activity means the _PAGE_ACCESSED bit is set in the page table entry for that page in some memory region of a process. so, in the page_referenced() function, we need to visit each memory region for that page, get the page table entry, check the _PAGE_ACCESSED bit, and clear it if it is set. interestingly, the return value of page_referenced() is the number of _PAGE_ACCESSED bits that were set for that particular page across all the processes (memory regions) using it.
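
a sketch of the per-region check, modeled on page_referenced_one() in rmap.c (helper names from the 2.6/3.x era, from memory):

    static int check_one_region(struct page *page, struct vm_area_struct *vma,
                                unsigned long address)
    {
        pte_t *pte;
        spinlock_t *ptl;
        int referenced = 0;

        /* find the PTE mapping this page in this mm, taking the pte lock */
        pte = page_check_address(page, vma->vm_mm, address, &ptl, 0);
        if (!pte)
            return 0;

        /* test-and-clear the accessed (_PAGE_ACCESSED) bit */
        if (ptep_clear_flush_young_notify(vma, address, pte))
            referenced = 1;

        pte_unmap_unlock(pte, ptl);
        return referenced;
    }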

from ULK3:
the page structure stores a backward link to the memory region descriptors. a mem reg desc contains the PGD, which can be used to find the PTE for that page. thus, we can get the list of PTEs from a given page structure easily. to find the number of places from where this page is mapped, we can use the page->_mapcount field. to see whether the mapping is file or anon, we look at the last bit of page->mapping. page->index contains the relative position of that page from the beginning of the mem reg.
[note: a page in the buddy system should have a mapcount of -1, a non-shared page 0, a shared page 1+]

now, page->mapping links to the data structure that connects the memory regions for this page.
page->mapping == NULL: this page is in the swap cache.
page->mapping == anon_vma if the last bit is 1 (anon mapping)
page->mapping == address_space if the last bit is 0 (file mapping)
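
a sketch of decoding that pointer, in the spirit of the kernel's page_anon_vma()/page_mapping() helpers (masking details from memory):

    static struct anon_vma *anon_vma_of(struct page *page)
    {
        unsigned long mapping = (unsigned long)page->mapping;

        if ((mapping & PAGE_MAPPING_ANON) == 0)
            return NULL;    /* file mapping: this is a struct address_space * */
        return (struct anon_vma *)(mapping & ~PAGE_MAPPING_ANON);
    }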

anon memory desc:
when the kernel assigns the first page to an anonymous memory region, it allocates an anon_vma data structure, which has a lock and a list head.
memory regions are kept in that list: mem_reg->anon_vma = anon_vma, and mem_reg->anon_vma_node maintains the list.
notice there is a lock involved here, so think about it when considering scalability with many shared (anonymous) pages.
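
the 2.6-era structure, as described in ULK3 (a sketch; later kernels changed this):

    struct anon_vma {
        spinlock_t lock;        /* serializes walks of the list below */
        struct list_head head;  /* vm_area_structs sharing these anon pages */
    };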

[note: vma->vm_pgoff = offset of memory region in the mapped file, page->index = offset of the page in the memory region]

to find the PTE, we need the actual linear address of the page in that memory region; this is very important. if somehow we can't figure out the linear address of the page for a memory region, we need to search all the PTEs in that particular memory region for that page; this happens for non-linear memory mappings. for a particular memory region, we can get the PTEs because we have the beginning and ending addresses, so it is easy to query the page table structures to view its current state.

A page might have different linear addresses depending on the memory region it is mapped into. to find the PTE, we need the PGD and the linear address. whenever thinking about a page mapped in memory, think about both the linear address and the physical address.
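
computing that linear address from page->index and vma->vm_pgoff, along the lines of vma_address() in rmap.c (a sketch):

    static unsigned long linear_address(struct page *page,
                                        struct vm_area_struct *vma)
    {
        pgoff_t pgoff = page->index;
        unsigned long address;

        /* offset of the page within the file, minus where the region starts
           in the file, scaled to bytes, added to the region's start address */
        address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

        if (address < vma->vm_start || address >= vma->vm_end)
            return -EFAULT;    /* outside this region: e.g. non-linear mapping */
        return address;
    }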

[pitfalls]
mremap() may crash the party by directly modifying page table entries.
if the PTE says the page was accessed, then unmapping won't take place, as that page is considered in-use.
locked/reserved memory regions can also nullify the remapping effort.

file address space desc:








Friday, March 18, 2011

Page Reclamation II (Policy)

There are four levels of page activity: AR = (00, 01, 10, 11) [A = PG_active, R = PG_referenced page flags]

the page referenced flag is cleared each time an activity check is performed. there are two types of functionality: activity checks and moving pages between lists.

mark_page_accessed() pushes pages towards AR=11 and page_referenced() pushes them towards AR=00. ironically, page_referenced should have been called page_ref_killed()... :))

page_referenced() additionally tells us how many references to this page have been made since it was mapped in.

the swap token overrides the resetting of the PG_referenced bit and keeps it set, so that the token-holding process does not suffer under heavy swapping pressure.

shrink_active_list() moves some pages from the active LRU list to the inactive list. shrink_inactive_list() reclaims (swaps out) some pages from the inactive LRU list.

the active and inactive lists are protected by the spinlock zone->lru_lock

PG_lru is set only when the page is on an LRU list. the code moves pages from the LRU list onto a local list, so that lru_lock is not held for a long time.
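
the pattern looks roughly like this (a sketch; field names are 2.6-era, and the real isolate_lru_pages() does more):

    LIST_HEAD(page_list);               /* private list, no lock needed   */
    struct page *page, *next;
    unsigned long nr_taken = 0;

    spin_lock_irq(&zone->lru_lock);
    list_for_each_entry_safe(page, next, &zone->inactive_list, lru) {
        if (nr_taken == nr_to_scan)
            break;
        ClearPageLRU(page);             /* page leaves the LRU            */
        list_move(&page->lru, &page_list);
        nr_taken++;
    }
    spin_unlock_irq(&zone->lru_lock);

    /* page_list can now be processed without holding lru_lock */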





Thursday, March 17, 2011

Source Files Used in Page Reclamation

generic:
pagevec.h -- page vectors


functional:
mm/swap.c -- LRU, activate_page(), mark_page_accessed()
mm/rmap.c -- page unmapping, page_referenced()
mm/vmscan.c -- isolate_lru_pages()

Page Usage Counter

increments:
page_cache_get() in lru_cache_add() -- because this page is now in the LRU cache

decrements:
try_to_unmap_one() -- because one more process stopped using this page after a successful unmap

Locks and Sequential Processing in Swapping

lru_cache_add() -- disables preemption (it works on a per-CPU pagevec)
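
a sketch of why preemption is disabled there, following the 2.6-era lru_cache_add(); get_cpu_var() is what disables preemption:

    void lru_cache_add(struct page *page)
    {
        /* get_cpu_var() disables preemption so the per-CPU pagevec is safe */
        struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);

        page_cache_get(page);           /* usage counter: page joins the LRU cache */
        if (!pagevec_add(pvec, page))   /* pagevec full? drain it to the LRU lists */
            __pagevec_lru_add(pvec);
        put_cpu_var(lru_add_pvecs);     /* re-enable preemption */
    }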