Friday, November 19, 2010

Huge Pages in Linux

ref: http://lwn.net/Articles/374424/

Part 1 of that series gives some useful formulas for the TLB miss penalty. Everyone thinks about fitting application data and kernel data inside the CPU cache, which boosts performance a lot.
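
As a rough illustration of why page size matters for the TLB, here is the usual "TLB reach" arithmetic: number of entries times page size. This is not one of the article's formulas, and the entry count in the usage note is a made-up example value, not a measurement of any particular CPU.

```c
#include <stddef.h>

/* TLB reach: how much memory the TLB can map before it starts missing. */
size_t tlb_reach_bytes(size_t entries, size_t page_size)
{
    return entries * page_size;
}
```

With a hypothetical 64-entry TLB, tlb_reach_bytes(64, 4096) covers 256 KiB, while tlb_reach_bytes(64, 2097152) covers 128 MiB with 2 MiB huge pages.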

Database workloads gain about 2-7% in performance from huge pages, whereas scientific workloads can gain anywhere between 1% and 45%.

In the initial support for huge pages on Linux, huge pages were faulted at the same time as mmap() was called. This guaranteed that all references would succeed for shared mappings once mmap() returned successfully. Private mappings were safe until fork() was called. Once called, it's important that the child call exec() as soon as possible or that the huge page mappings were marked MADV_DONTFORK with madvise() in advance. Otherwise, a Copy-On-Write (COW) fault could result in application failure by either parent or child in the event of allocation failure.
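
A minimal sketch of the MADV_DONTFORK precaution mentioned above. The function name is hypothetical, and a plain anonymous mapping is assumed so the sketch runs without any reserved huge pages; a huge page mapping would take the same madvise() call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS and MADV_DONTFORK */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes privately and mark the range MADV_DONTFORK, so a child
 * created by fork() does not inherit the mapping and the fork cannot
 * lead to a COW fault on it. Returns NULL on failure.
 *
 * Assumption: ordinary anonymous memory stands in for a huge page mapping.
 */
void *map_private_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The madvise() call has to happen before fork(); after the fork the child simply has no mapping at that address.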

Saturday, March 13, 2010

waiting on a locked page

A page might be locked because it is under I/O or is being migrated.

Each zone has a hash table of wait queues, and there is a wait-on-a-bit mechanism on top of it. When a process waits on a page, it actually waits on a page flag bit, in a wait queue picked from that zone's hash table.

pagemap.h:lock_page(), filemap.c:page_waitqueue()
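
A user-space sketch of the idea behind page_waitqueue(): hash the page's address to pick one of the zone's wait queues. All toy_ names, the table size and the hash constant are assumptions for illustration; the kernel's own structures and hash differ.

```c
#include <stdint.h>

#define WAIT_TABLE_BITS 4                        /* hypothetical: 16 queues */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

struct toy_wait_queue { int nr_waiters; };       /* stand-in for wait_queue_head_t */
struct toy_zone { struct toy_wait_queue wait_table[WAIT_TABLE_SIZE]; };
struct toy_page { unsigned long flags; };

/* Multiplicative hash of the address, keeping the top `bits` bits. */
unsigned int toy_hash_ptr(const void *ptr, unsigned int bits)
{
    uint64_t v = (uint64_t)(uintptr_t)ptr;
    return (unsigned int)((v * 0x9E3779B97F4A7C15ull) >> (64 - bits));
}

/*
 * Many pages share one queue, so a real waiter also has to remember
 * which page and which flag bit it is waiting for.
 */
struct toy_wait_queue *toy_page_waitqueue(struct toy_zone *zone,
                                          const struct toy_page *page)
{
    return &zone->wait_table[toy_hash_ptr(page, WAIT_TABLE_BITS)];
}
```

The point of hashing is that the zone needs only a small, fixed number of queues instead of one per page.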

Wednesday, March 10, 2010

read --> disk blocks

There are two kinds of high-level access: sync and async.
There are two kinds of low-level access: page cache and direct IO.
Sync in fact uses async, and then waits for the async operation to complete.
Sync and async both go through a low-level access, either through the page cache or skipping the page cache.
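
A toy, single-threaded model of "sync is async plus a wait". Nothing here is kernel API: the request struct, the pending queue and every toy_ name are made up for illustration, and a memcpy stands in for the device completing the request.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PENDING 8

struct toy_request {
    const char *src;   /* where the data lives (stands in for the device) */
    char *dst;         /* caller's buffer */
    size_t len;
    int done;          /* set when the request completes */
};

static struct toy_request *toy_pending[TOY_MAX_PENDING];
static int toy_npending;

/* Async path: queue the request and return immediately. */
int toy_read_async(struct toy_request *req)
{
    if (toy_npending == TOY_MAX_PENDING)
        return -1;
    req->done = 0;
    toy_pending[toy_npending++] = req;
    return 0;
}

/* Stand-in for the device finishing the oldest queued request. */
void toy_complete_one(void)
{
    if (toy_npending == 0)
        return;
    struct toy_request *req = toy_pending[0];
    memmove(&toy_pending[0], &toy_pending[1],
            (size_t)(toy_npending - 1) * sizeof(toy_pending[0]));
    toy_npending--;
    memcpy(req->dst, req->src, req->len);
    req->done = 1;
}

/* Sync path: the async submit plus a wait for its completion. */
int toy_read_sync(struct toy_request *req)
{
    if (toy_read_async(req) != 0)
        return -1;
    while (!req->done)
        toy_complete_one();
    return 0;
}
```

In a real kernel the waiting process sleeps rather than driving completions itself; the loop is only there so the sketch can run on its own.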

Generally, Unix filesystems use the native Unix read/write implementation. From the file, control goes through the mapping (the inode's address space) to its readpage method. That method converts file pages to disk blocks.
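
The arithmetic half of that conversion can be sketched as below. It assumes 4 KiB pages, and the function names are made up. It only yields file-relative block numbers; turning those into on-disk block numbers is filesystem-specific and not shown.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Index of the page-cache page that holds byte `offset` of the file. */
uint64_t page_index_of(uint64_t offset)
{
    return offset >> TOY_PAGE_SHIFT;
}

/* Filesystem blocks per page, for a block size of (1 << blkbits) bytes. */
unsigned int blocks_per_page(unsigned int blkbits)
{
    return 1u << (TOY_PAGE_SHIFT - blkbits);
}

/* First file-relative block number covered by page `index`. */
uint64_t first_block_of_page(uint64_t index, unsigned int blkbits)
{
    return index << (TOY_PAGE_SHIFT - blkbits);
}
```

For example, with 1 KiB blocks (blkbits = 10), byte offset 10000 falls in page 2, and that page covers file blocks 8 through 11.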

Sunday, February 21, 2010

To Do's

1. Signal handling in a kernel thread
2.

Thursday, February 18, 2010

Locks while extracting pages from Free List (PCP and Buddy)

access pcp cache
----------------
get_cpu()
local_irq_save()
zone lock (nex)
access buddy to replenish
zone unlock (nex)
local_irq_restore()
put_cpu()

access buddy
------------
get_cpu()
zone lock (ex)
zone unlock (ex)
put_cpu()

zone_lock_irqsave: local_irq_save + zone_lock
zone_unlock_irqrestore: zone_unlock + local_irq_restore

* When a page is given to the pcp list, page->private is set to the page's migration type.
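
A user-space model of the pcp sequence above: serve from the per-CPU cache, and take the zone lock only to replenish from the buddy pool. The batch size, the struct layouts and all toy_ names are assumptions. Pinning the CPU and disabling interrupts cannot be modelled in user space, so they appear only as comments.

```c
#include <stdatomic.h>

#define TOY_BATCH 4                       /* hypothetical refill batch size */

struct toy_zone_free {
    atomic_flag lock;                     /* stand-in for the zone lock */
    int nr_free;                          /* pages left in the "buddy" pool */
};

struct toy_pcp {
    int count;                            /* pages cached for this CPU */
};

static void toy_zone_lock(struct toy_zone_free *z)
{
    while (atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire))
        ;                                 /* spin until the lock is free */
}

static void toy_zone_unlock(struct toy_zone_free *z)
{
    atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Returns 1 if a page was handed out, 0 if both pools are empty. */
int toy_alloc_from_pcp(struct toy_pcp *pcp, struct toy_zone_free *zone)
{
    /* get_cpu() and local_irq_save() would go here */
    if (pcp->count == 0) {
        toy_zone_lock(zone);              /* access buddy to replenish */
        int take = zone->nr_free < TOY_BATCH ? zone->nr_free : TOY_BATCH;
        zone->nr_free -= take;
        pcp->count += take;
        toy_zone_unlock(zone);
    }

    int ok = pcp->count > 0;
    if (ok)
        pcp->count--;
    /* local_irq_restore() and put_cpu() would go here */
    return ok;
}
```

Only the refill touches the shared lock; the common case of a non-empty pcp list takes no zone lock at all.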

Tuesday, July 28, 2009

Page Reclaim IV (Swap Cache)

A page in the swap cache has the following properties:

page->mapping = NULL (why is that? AFAIK it should point to the swap cache address space object)
PG_swapcache flag is on
page->private = swapped-out page id

A swap cache page is linked to a page slot in the swap area and to one or more processes.

A single swapper_space address space object is used as the swap cache for the whole system. Question: how is a page found inside the swap cache? Answer: by its swapped-out page identifier.
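
A toy sketch of looking a page up by its swapped-out page identifier. The bit split of the identifier, the fixed-size array and all toy_ names are assumptions for illustration; the kernel uses a proper index structure rather than a linear scan.

```c
#include <stddef.h>

#define TOY_TYPE_BITS   5                 /* assumption: 5 bits pick the swap area */
#define TOY_OFFSET_BITS (sizeof(unsigned long) * 8 - TOY_TYPE_BITS)
#define TOY_CACHE_SLOTS 16

struct toy_swap_page {
    unsigned long private;                /* swapped-out page identifier */
    int in_swapcache;                     /* stand-in for PG_swapcache */
};

static struct toy_swap_page *toy_swap_cache[TOY_CACHE_SLOTS];

/* Pack the swap area number and the page slot offset into one identifier. */
unsigned long toy_swp_entry(unsigned long type, unsigned long offset)
{
    unsigned long mask = (1UL << TOY_OFFSET_BITS) - 1;
    return (type << TOY_OFFSET_BITS) | (offset & mask);
}

int toy_add_to_swap_cache(struct toy_swap_page *page, unsigned long entry)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (toy_swap_cache[i] == NULL) {
            page->private = entry;
            page->in_swapcache = 1;
            toy_swap_cache[i] = page;
            return 0;
        }
    }
    return -1;                            /* toy cache is full */
}

/* Find the cached page for `entry`, or NULL if it is not in the cache. */
struct toy_swap_page *toy_lookup_swap_cache(unsigned long entry)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (toy_swap_cache[i] != NULL && toy_swap_cache[i]->private == entry)
            return toy_swap_cache[i];
    }
    return NULL;
}
```

A process faulting on a swapped-out page would first try the lookup with the identifier stored in its page table entry, and only go to the swap area on a miss.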

It is possible to know how many processes were sharing a swapped-out page. Until every one of them has brought it back in, the page is held temporarily in the swap cache.

swap area has clustering of page slots (for efficiency during r/w: swap in-out)

Page Structure Finds

PG_swapcache: the page is in the swap cache

PG_locked: undergoing IO, in swap space update, in file r/w etc.