Tuesday, December 14, 2010

using red-black tree

http://lwn.net/Articles/184495/

#include <linux/rbtree.h>

we need an rbtree root of type struct rb_root (it holds a pointer to the top struct rb_node).

[looking for an entry]
/* find the interval containing time, else the first one that ends after it */
struct intv_obj *intv = NULL;
struct rb_node *rb_node = chnl->rb_root.rb_node; /* assumes rb_root is a struct rb_root */
while (rb_node) {
    struct intv_obj *intv_tmp;
    intv_tmp = rb_entry(rb_node, struct intv_obj, rb_node);
    if (time < intv_tmp->end) {
        intv = intv_tmp;
        if (intv_tmp->start <= time)
            break; /* we are done */
        rb_node = rb_node->rb_left;
    } else
        rb_node = rb_node->rb_right;
}
return intv;
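
to try the lookup logic outside the kernel, here is a minimal userspace sketch. it assumes a plain (unbalanced) binary search tree ordered by interval end; struct intv, intv_find() and intv_demo_tree() are made-up names for this note, not kernel API, and no red-black rebalancing is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for intv_obj: a half-open interval
 * [start, end) in a plain binary search tree ordered by end. */
struct intv {
    long start, end;
    struct intv *left, *right;
};

/* Return the interval containing time, or else the first interval that
 * ends after time (NULL if there is none). */
struct intv *intv_find(struct intv *root, long time)
{
    struct intv *best = NULL;

    while (root) {
        if (time < root->end) {
            best = root;            /* candidate: ends after time */
            if (root->start <= time)
                break;              /* time lies inside it: done */
            root = root->left;      /* look for an earlier candidate */
        } else {
            root = root->right;     /* ends too early, go right */
        }
    }
    return best;
}

/* Three-node tree: [10,20) at the root, [0,5) left, [30,40) right. */
struct intv *intv_demo_tree(void)
{
    static struct intv a = { 0, 5, NULL, NULL };
    static struct intv c = { 30, 40, NULL, NULL };
    static struct intv b = { 10, 20, &a, &c };
    return &b;
}
```

with the demo tree, a time that falls in a gap (for example 7) returns the next interval [10,20), and a time past every interval returns NULL.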

[inserting an entry]

[deleting an entry]

Friday, November 19, 2010

Huge Pages in Linux

ref: http://lwn.net/Articles/374424/

Part 1 gives some useful formulas for the TLB miss penalty. Everyone thinks about fitting application data and kernel data inside the CPU cache, which boosts performance a lot.

Database workloads gain about 2-7% in performance from huge pages, whereas gains for scientific workloads range between 1% and 45%.

In the initial support for huge pages on Linux, huge pages were faulted at the same time as mmap() was called. This guaranteed that all references would succeed for shared mappings once mmap() returned successfully. Private mappings were safe until fork() was called. Once called, it's important that the child call exec() as soon as possible or that the huge page mappings were marked MADV_DONTFORK with madvise() in advance. Otherwise, a Copy-On-Write (COW) fault could result in application failure by either parent or child in the event of allocation failure.
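
A minimal userspace sketch of the "mark it MADV_DONTFORK in advance" advice, assuming Linux with glibc. map_dontfork() is a made-up helper name, and the fallback to normal pages when no huge pages are available is a choice made for this sketch, not something the article prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory, preferring huge pages, and
 * mark the region MADV_DONTFORK so that a child created by fork() does
 * not inherit it. Returns NULL on failure. */
void *map_dontfork(size_t len)
{
    void *p = MAP_FAILED;

    assert(len > 0);
#ifdef MAP_HUGETLB
    /* Fails unless huge pages are reserved and len is a multiple of
     * the huge page size. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p == MAP_FAILED)            /* sketch-only fallback: normal pages */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Calling map_dontfork(2UL * 1024 * 1024) before any fork() keeps the mapping out of the child, so neither parent nor child can take a COW fault on it.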

Saturday, March 13, 2010

waiting on a locked page

a page might be locked because it is under IO or being migrated.

each zone has a hash table of wait queues, and there is a wait-on-bit mechanism. when a process waits on a page, it actually waits on a page flag bit, in a wait queue picked from that zone's hash table.
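
a rough userspace model of the idea, to make the hashing concrete. the table size, the hash multiplier and every name here (zone_model, page_waitqueue_model, wait_key) are assumptions for this note; the kernel's own table sizing and hash function may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_BITS 8                    /* hypothetical: 256 queues */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

struct wait_queue { int nr_waiters; };       /* stand-in for a wait queue head */

struct zone_model {
    struct wait_queue wait_table[WAIT_TABLE_SIZE];
};

/* Pick the wait queue for a page by hashing its address. The multiplier
 * is a 64-bit golden-ratio constant chosen for this sketch. */
struct wait_queue *page_waitqueue_model(struct zone_model *zone, const void *page)
{
    uint64_t h = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;

    return &zone->wait_table[h >> (64 - WAIT_TABLE_BITS)];
}

/* A waiter is identified by (page, bit). Many pages share one queue, so
 * a wake-up has to compare the key and skip waiters for other pages or
 * other bits. */
struct wait_key { const void *page; int bit_nr; };

int wait_key_matches(const struct wait_key *waiter, const void *page, int bit_nr)
{
    return waiter->page == page && waiter->bit_nr == bit_nr;
}
```

the point of the shared table is that a wait queue head does not have to live in every struct page; the price is that unrelated pages can land on the same queue, hence the key comparison.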

pagemap.h:lock_page(), filemap.c:page_waitqueue()

Wednesday, March 10, 2010

read --> disk blocks

there are two kinds of high-level accesses: sync, async
there are two kinds of low-level accesses: page cache and direct IO
sync in fact uses async, then waits for the async operation to complete
both sync and async go through the low-level accesses, either through the page cache or skipping it (direct IO)
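
a toy model of "sync = async + wait", single-threaded and in userspace. everything here (io_req, read_async, device_complete_one, wait_for, read_sync) is hypothetical; the "device" is driven by hand and the wait polls it, where a real kernel would sleep until the completion arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: queued by the async path, completed later. */
struct io_req {
    const char *src;        /* stand-in for the data on disk */
    char *buf;
    size_t len;
    int done;
    struct io_req *next;
};

static struct io_req *pending;  /* submitted, not yet completed (LIFO) */

/* Async read: queue the request and return immediately. */
void read_async(struct io_req *req)
{
    req->done = 0;
    req->next = pending;
    pending = req;
}

/* Complete one queued request, as a device interrupt would. */
void device_complete_one(void)
{
    struct io_req *req = pending;

    if (!req)
        return;
    pending = req->next;
    memcpy(req->buf, req->src, req->len);
    req->done = 1;
}

/* Wait until the request is done by polling the model device. */
void wait_for(struct io_req *req)
{
    while (!req->done) {
        assert(pending != NULL);    /* req must have been submitted */
        device_complete_one();
    }
}

/* Sync read = async submit + wait for completion. */
void read_sync(struct io_req *req)
{
    read_async(req);
    wait_for(req);
}
```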

generally unix filesystems use the generic unix read/write implementation. from the file, control goes through its mapping (the inode's address_space) to the mapping's readpage method. that method converts file pages to disk blocks.

Sunday, February 21, 2010

To Do's

1. Signal handling in a kernel thread
2.

Thursday, February 18, 2010

Locks while extracting pages from Free List (PCP and Buddy)

access pcp cache
----------------
get_cpu()
local_irq_save()
zone lock (nex)
access buddy to replenish
zone unlock (nex)
local_irq_restore()
put_cpu()

access buddy
------------
get_cpu()
zone lock (ex)
zone unlock (ex)
put_cpu()

zone_lock_irqsave: local_irq_save + zone_lock
zone_unlock_irqrestore: zone_unlock + local_irq_restore

* when a page is given to pcp, the page's private field is set to its migration type.
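
a single-CPU userspace model of the pcp sequence above, just to show that the zone lock is only touched when the pcp cache runs dry. counters stand in for the free lists and flags stand in for the IRQ state and the zone spinlock; zone_sim, pcp_sim and pcp_alloc are made-up names, not kernel code.

```c
#include <assert.h>

/* Hypothetical model state. */
struct zone_sim { int locked; int lock_count; int free_pages; };
struct pcp_sim  { int count; int batch; };

static int irqs_off;    /* models the local IRQ state */

void zone_lock(struct zone_sim *z)
{
    assert(!z->locked);
    z->locked = 1;
    z->lock_count++;
}

void zone_unlock(struct zone_sim *z)
{
    assert(z->locked);
    z->locked = 0;
}

/* Take one page from the per-CPU cache, refilling a batch from the
 * buddy free pages only when the cache is empty. Returns 1 on success. */
int pcp_alloc(struct pcp_sim *pcp, struct zone_sim *z)
{
    int got = 0;

    assert(!irqs_off);
    irqs_off = 1;                   /* get_cpu() + local_irq_save() */
    if (pcp->count == 0) {          /* cache empty: replenish */
        int n;

        zone_lock(z);
        n = pcp->batch < z->free_pages ? pcp->batch : z->free_pages;
        z->free_pages -= n;
        pcp->count += n;
        zone_unlock(z);
    }
    if (pcp->count > 0) {
        pcp->count--;
        got = 1;
    }
    irqs_off = 0;                   /* local_irq_restore() + put_cpu() */
    return got;
}
```

with batch = 4, the first allocation takes the zone lock once and moves 4 pages; the next three are served from the pcp cache without touching the zone lock.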