Nabendu's Blog

Posts

Showing posts from October, 2013

How malloc works in userspace

- October 31, 2013

I was checking how malloc works in userspace, compared to the kernel's kmalloc, in terms of physical memory allocation. What I found is below.

glibc (the C library) provides the malloc implementation. By default, for requests smaller than 128 KB it uses brk()/sbrk(); for larger requests it uses mmap() with MAP_PRIVATE|MAP_ANONYMOUS. To test this I wrote a small userspace program:

    #include <stdlib.h>

    int main(void)
    {
        int *a;
        a = (int *)malloc(150 * 1024);   /* mmap invocation */
        //a = (int *)malloc(10);         /* brk invocation */
        return 0;
    }

I ran strace on the resulting binary, and there I could see the brk and mmap invocations.

What I understood: the brk call raises the end of the data section (the program break), and the added memory goes to malloc. With mmap, the kernel allocates anonymous private pages and adds them as one VMA in the process address space; glibc then allocates memory from there and hands it out through malloc.

More details: Link1 Link2 Link3 bestone__ Best_link
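The two kernel interfaces named above can also be called directly. The following is a minimal sketch, assuming Linux with glibc; the helper names grow_break and map_anonymous are made up for illustration and are not part of glibc, whose real malloc does far more bookkeeping than this.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose sbrk() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: grow the data segment by len bytes, the way a
 * brk-based allocator would. Returns the old program break (the start
 * of the newly added region), or NULL on failure. */
void *grow_break(size_t len)
{
    assert(len > 0);
    void *old = sbrk((intptr_t)len);
    return old == (void *)-1 ? NULL : old;
}

/* Hypothetical helper: ask the kernel for len bytes of private anonymous
 * memory. The kernel adds a new mapping to the process address space and
 * the pages read as zero. Returns NULL on failure. */
void *map_anonymous(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling grow_break(4096) and then map_anonymous(150 * 1024) from a small main() and running it under strace should show a brk call followed by an mmap call; the mapped region is released with munmap().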

Process Virtual to Physical Translation

- October 31, 2013

Allocating aligned address and freeing them

- October 31, 2013

    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024 + align - 1);
    void *ptr = (void *)(((uintptr_t)mem + align - 1) & mask);

ptr is the aligned address (align must be a power of two).

Some time back, in an interview, I was asked to write a custom malloc and free, built on top of malloc and free, that return memory at a predefined alignment. Here is the solution. The byte just before the aligned address is always unused, so we keep there the offset back to the memory actually allocated by malloc. For that we have to allocate (desired + alignment) bytes in total instead of (desired + alignment - 1). The downside is that one byte can store only 2^8 different offsets, which limits the alignment.

    void *mallocX(size_t X, size_t Y)   /* Y: power of two, at most 128 */
    {
        unsigned char *p = malloc(X + Y);
        if (p == NULL)
            return NULL;
        unsigned char *ret = (unsigned char *)(((uintptr_t)p + Y) & ~(uintptr_t)(Y - 1));
        ret[-1] = (unsigned char)(ret - p);   /* offset back to p, in 1..Y */
        return ret;
    }

Similarly for free:

    void freeX(void *mem)
    {
        unsigned char *m = mem;
        free(m - m[-1]);
    }
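One way around the one-byte offset limit is to store the whole original pointer in front of the aligned address instead of an offset. The following is a sketch of that variant, not the solution given in the interview; the names aligned_malloc_ptr and aligned_free_ptr are made up for illustration, and align is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variant: keep the pointer returned by malloc() in the
 * sizeof(void *) bytes just below the aligned address. */
void *aligned_malloc_ptr(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */

    /* Room for the payload, the worst-case padding and the saved pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the header slot first, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    ((void **)aligned)[-1] = raw;   /* remember what to pass to free() */
    return (void *)aligned;
}

void aligned_free_ptr(void *mem)
{
    if (mem != NULL)
        free(((void **)mem)[-1]);
}
```

The cost is sizeof(void *) bytes of overhead per block instead of one byte, in exchange for supporting any power-of-two alignment.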

Tracking Linux kworker threads

- October 27, 2013

How to find out which part of the kernel or which module has created a workqueue? How to track a kworker thread named, for example, "kworker/0:3" to its origin in kernel space?

I found this thread on lkml that answers your question a little. (It seems even Linus himself was puzzled as to how to find out the origin of those threads.) Basically, there are two ways of doing this:

    $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
    (wait a few secs)

For this you will need ftrace to be compiled in your kernel, and to enable it with:

    mount -t debugfs nodev /sys/kernel/debug

More information on the function tracer facilities of Linux is available in the ftrace.txt documentation. This will output what all the threads are doing, and is useful for tracing multiple small jobs.

    cat /proc/THE_OFFENDING_KWORKER/stack

This will output the stack of a single thread doing a lot of work. ...

Bit reversing tips

- October 26, 2013

Reversing bit pairs

    unsigned int i, j; // positions of bit sequences to swap
    unsigned int n;    // number of consecutive bits in each sequence
    unsigned int b;    // bits to swap reside in b
    unsigned int r;    // bit-swapped result goes here

    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
    r = b ^ ((x << i) | (x << j));

Another standard simple method:

    unsigned int reverseBits(unsigned int num)
    {
        unsigned int count = sizeof (num) * 8 - 1;
        unsigned int reverse_num = num;
        num >>= 1;
        while (num)
        {
            reverse_num <<= 1;
            reverse_num |= num & 1;
            n...
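The bit-sequence swap can be wrapped in a function and checked against a worked value. This is a minimal sketch; the function names swap_bit_ranges and reverse_bits_sketch are made up for illustration, and the two ranges are assumed not to overlap. The loop in reverse_bits_sketch is a common shift-and-or reversal written out in full as an assumption; it is not necessarily the same as the listing that is cut off above.

```c
#include <assert.h>
#include <limits.h>

/* Swap the n-bit sequence starting at bit i with the one starting at
 * bit j. XOR-ing both sequences with x = (seq_i ^ seq_j) exchanges them. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i,
                             unsigned int j, unsigned int n)
{
    assert(n > 0 && n < sizeof b * CHAR_BIT);
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1);
    return b ^ ((x << i) | (x << j));
}

/* Reverse all bits of v: shift bits out of v on the right and into r on
 * the left, then shift r up by the number of leading zero bits of v. */
unsigned int reverse_bits_sketch(unsigned int v)
{
    unsigned int r = v;
    unsigned int s = sizeof v * CHAR_BIT - 1;  /* extra shift needed at the end */
    for (v >>= 1; v; v >>= 1) {
        r <<= 1;
        r |= v & 1;
        s--;
    }
    return r << s;
}
```

With b = 00101111 (0x2F), i = 1, j = 5 and n = 3, the three bits starting at position 1 (111) trade places with the three bits starting at position 5 (001), giving 11100011 (0xE3).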

Nice value and priority and relations

- October 26, 2013

The priority of a process in Linux is dynamic: the longer it runs, the lower its priority will be. A process is running when it is actually using the CPU; most processes on a typical Linux box just wait for I/O and thus do not count as running.

The priority is taken into account when there are more running processes than CPU cores available: the highest priority wins. But as the winning process loses its priority over time, other processes will take over the CPU at some point.

nice and renice add or remove some "points" from the priority. A process with a higher nice value gets less CPU time. Root can also set a negative nice value, which gives the process more CPU time.

Example: there are two processes (1 and 2) calculating the halting problem and one CPU core in the system. The default is nice 0, so both processes get about half of the CPU time each. Now let's renice process 1 to value 10. Result: Process 2 gets a s...
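From inside a program, the same adjustment is made with nice(2), and the current value can be read back with getpriority(2). The following is a minimal sketch, assuming Linux with glibc and an unprivileged process, so only increases of the nice value are attempted; add_nice and current_nice are made-up helper names.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose nice() on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: nice value of the calling process, or INT_MIN on
 * error. -1 is a legal nice value, so errno has to be checked as well. */
int current_nice(void)
{
    errno = 0;
    int val = getpriority(PRIO_PROCESS, 0);
    if (val == -1 && errno != 0)
        return INT_MIN;
    return val;
}

/* Hypothetical helper: add inc to the nice value of the calling process
 * (i.e. ask for less CPU time). Returns the new nice value, or INT_MIN
 * on error. */
int add_nice(int inc)
{
    assert(inc >= 0);   /* lowering the nice value needs privileges */
    errno = 0;
    int val = nice(inc);
    if (val == -1 && errno != 0)
        return INT_MIN;
    return val;
}
```

The nice value is capped at 19, so add_nice(3) on a process that is already at 18 ends up at 19 rather than 21.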

Forked process, thread and address spaces, a little deeper

- October 23, 2013

In fork():

- The child process has a unique process ID.
- The child process has a different parent process ID (i.e., the process ID of the parent process).
- The child process has its own copy of the parent's descriptors. These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, and an lseek(2) on a descriptor in the child process can affect a subsequent read(2) or write(2) by the parent. This descriptor copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes. Semaphores that are open in the parent are inherited as well.
- Memory mappings created in the parent are retained in the child process. (If MAP_PRIVATE was used in the parent, the mapping is MAP_PRIVATE in the child too; after the fork, any change to the mapped area is visible only to the process that made it.)
- The child process' resource utilizations are set to 0; see setrli...
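The shared file offset described in the descriptor item can be observed directly. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the function name offset_seen_by_parent is made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose mkstemp() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child moves the file offset with lseek(); the parent then asks for
 * the current offset on its own copy of the descriptor. Returns the
 * offset the parent sees, or -1 on error. */
long offset_seen_by_parent(long child_offset)
{
    assert(child_offset >= 0);

    char path[] = "/tmp/fork_offset_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the open descriptor keeps the file alive */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: change the offset of the shared open file, then leave. */
        lseek(fd, (off_t)child_offset, SEEK_SET);
        _exit(0);
    }

    /* Parent: wait for the child, then read back the offset. */
    if (waitpid(pid, NULL, 0) < 0) {
        close(fd);
        return -1;
    }
    off_t seen = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)seen;
}
```

Because both descriptors refer to the same open file, offset_seen_by_parent(5) returns 5 even though the parent itself never moved the offset.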

Global and static variables are always initialized, but auto variables are not

- October 10, 2013

- Security: leaving memory alone would leak information from other processes or the kernel.
- Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops.
- Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.
- Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.

One might wonder why the auto storage class does start as garbage. The answer is two-fold:

- It doesn't, in a sense. The very first stack frame does receive zero values. The "garbage", or "uninitialized" values that subsequent instances at the same stack level see are really the previous values left by the same program.
- There might be a runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large ar...
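The zero-initialization guarantee for objects with static storage duration can be checked in a few lines. This is a minimal sketch; the variable and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

int global_counter;              /* file scope, no initializer */
static double file_static[3];    /* internal linkage, no initializer */

/* Every object with static storage duration starts out as zero, even
 * without an explicit initializer. */
void check_statics_start_at_zero(void)
{
    static int function_static;  /* static local, no initializer */

    assert(global_counter == 0);
    assert(function_static == 0);
    for (size_t k = 0; k < sizeof file_static / sizeof file_static[0]; k++)
        assert(file_static[k] == 0.0);
}

/* An auto variable gets no such guarantee and must be given a value
 * before it is read. */
int auto_needs_initializer(void)
{
    int local = 0;               /* without "= 0" the value is indeterminate */
    return local;
}
```

Reading an auto variable that was never assigned is not checked here on purpose, since the result is not defined by the language.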

Speed of execution in terms of code in some cases

- October 10, 2013

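Is "else if" faster than "switch() case"?

For a small number of cases it makes no difference. For a very large number of cases the compiler can implement switch with a jump table, so it executes faster than a long chain of comparisons. But with a good compiler and optimization enabled, the two are the same.

Why is i++ faster than i=i+1 in C?

Older compilers generated two operations for i=i+1: (1) an ADD to compute i+1 and (2) an assignment to store the result back in i. For i++ they generated a single increment instruction (INR). Current good compilers with optimizations enabled create the same assembly code for both.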
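The pairs being compared look like this in code. This is a small sketch with made-up function names and return values; it only checks that each pair behaves identically, not how fast it runs.

```c
#include <assert.h>

/* Hypothetical dispatch written as an else-if chain. */
int dispatch_if(int code)
{
    if (code == 0)      return 10;
    else if (code == 1) return 11;
    else if (code == 2) return 12;
    else if (code == 3) return 13;
    else                return -1;
}

/* The same dispatch written as a switch. */
int dispatch_switch(int code)
{
    switch (code) {
    case 0:  return 10;
    case 1:  return 11;
    case 2:  return 12;
    case 3:  return 13;
    default: return -1;
    }
}

int bump_postfix(int i) { i++;       return i; }
int bump_assign(int i)  { i = i + 1; return i; }

/* Check that both variants of each pair agree on the range lo..hi. */
void check_same_behaviour(int lo, int hi)
{
    assert(lo <= hi);
    for (int c = lo; c <= hi; c++) {
        assert(dispatch_if(c) == dispatch_switch(c));
        assert(bump_postfix(c) == bump_assign(c));
    }
}
```

To see what a particular compiler does with each pair, one option is to compile the file with something like gcc -O2 -S and compare the generated assembly for the paired functions.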

ARM Nesting of Interrupts

- October 06, 2013

NESTING INTERRUPTS

Applies to: RealView C Compiler

Information in this article applies to: RealView Compiler Version 3.0 or higher

QUESTION

The classic ARM architecture only provides two interrupts (IRQ and FIQ). The Vectored Interrupt Controller or Advanced Interrupt Controller provides interrupt priorities and interrupt nesting for the standard interrupt, but it requires that you set the I bit in the CPSR. What is the best method to allow interrupt nesting with the RealView compiler?

ANSWER

It should be noted that good programming technique implies that you keep interrupt functions very short. When you are using short interrupt functions, interrupt nesting becomes unimportant. When you are using a Real-Time Operating System (such as the RTX Kernel), the stack usage of user tasks becomes unpredictable when you allow interrupt nesting. However, if you still need interrupt nesting in your application, you may implement it using an assem...