Showing posts from October, 2013

How malloc works in userspace

I was checking how malloc works compared with the kernel's kmalloc in terms of physical memory allocation. This is what I found.

glibc (the C library) implements malloc. By default, for requests smaller than 128 KB it uses brk()/sbrk(); for larger requests it uses mmap() with MAP_PRIVATE|MAP_ANONYMOUS.

To test this I wrote a small userspace program:

    #include <stdlib.h>

    int main(void)
    {
        int *a;
        a = (int *)malloc(150 * 1024);   /* mmap invocation */
        //a = (int *)malloc(10);         /* brk invocation */
        return 0;
    }

I ran strace on the resulting binary, and there I could see the brk and mmap invocations.

What I understood: the brk call raises the end of the data section (the break limit), and the new memory is added to malloc's pool. With mmap, the kernel allocates anonymous private pages and adds them as one VMA in the process address space; glibc then carves memory out of that region and hands it out through malloc.

More details: Link1, Link2, Link3. Best one: Best_link
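If strace is not at hand, the same behaviour can be observed from inside the process by watching the program break. This is only a rough sketch: break_growth_for is a name I made up, and it assumes glibc with its default mmap threshold.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes the program break grew
 * while serving one malloc(size), or -1 if the allocation failed. */
long break_growth_for(size_t size)
{
    char *before = sbrk(0);      /* current break, nothing is changed */
    void *p = malloc(size);
    char *after = sbrk(0);       /* break after the allocation */

    if (p == NULL)
        return -1;

    long growth = (long)(after - before);
    free(p);
    return growth;
}
```

On a typical glibc system I would expect break_growth_for(150 * 1024) to report 0, because that request is served by mmap and the break does not move. A 10-byte request grows the break only when the heap has no free space left to serve it.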

Process Virtual to Physical Translation

Image

Allocating aligned address and freeing them

    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024 + align - 1);
    void *ptr = (void *)(((uintptr_t)mem + align - 1) & mask);

ptr is the aligned address (align must be a power of two).

Some time back, in an interview, I was asked to write a custom malloc and free, built on top of malloc and free, that return memory at a predefined alignment. Here is the solution. The byte just before the aligned address is always unused, so we store there the offset back to the memory actually returned by malloc. For this to work we have to allocate (desired + alignment) bytes in total instead of (desired + alignment - 1). The downside is that the offset is kept in a single byte, so it can take only 2^8 distinct values, which limits the alignment this scheme can support.

    void *mallocX(size_t X, size_t Y)
    {
        unsigned char *p = malloc(X + Y);
        if (p == NULL)
            return NULL;
        unsigned char *ret = (unsigned char *)(((uintptr_t)p + Y) & ~(uintptr_t)(Y - 1));
        *(ret - 1) = (unsigned char)(ret - p);   /* offset back to p */
        return ret;
    }

Similarly for free:

    void freeX(void *mem)
    {
        unsigned char *m = mem;
        free(m - *(m - 1));
    }
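One way around the one-byte limit is to store the whole original pointer just in front of the aligned block instead of a small offset. This is a different variant from the interview answer, and the names aligned_malloc_ptr and aligned_free_ptr are mine. A minimal sketch, assuming the alignment is a power of two:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: keeps the pointer returned by malloc right
 * before the aligned address, so any power-of-two alignment works. */
void *aligned_malloc_ptr(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                          /* not a power of two */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                          /* request would overflow */

    /* Room for the payload, worst-case padding and the saved pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* memcpy avoids a misaligned store when align < sizeof(void *). */
    memcpy((unsigned char *)aligned - sizeof(void *), &raw, sizeof raw);
    return (void *)aligned;
}

void aligned_free_ptr(void *ptr)
{
    if (ptr == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof raw);
    free(raw);
}
```

The cost is sizeof(void *) extra bytes per allocation compared with the single offset byte.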

Tracking Linux kworker threads

How do you find out which part of the kernel or which module created a workqueue? How do you track a kworker thread named, for example, "kworker/0:3" back to its origin in kernel space?

I found this thread on lkml that answers your question a little. (It seems even Linus himself was puzzled as to how to find out the origin of those threads.) Basically, there are two ways of doing this:

    $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
    (wait a few secs)

For this you will need ftrace to be compiled into your kernel, and to enable it with:

    mount -t debugfs nodev /sys/kernel/debug

More information on the function tracer facilities of Linux is available in the ftrace.txt documentation. This will output what all the threads are doing, and is useful for tracing multiple small jobs.

    cat /proc/THE_OFFENDING_KWORKER/stack

This will output the stack of a single thread doing a lot of work. ...

Bit reversing tips

Reversing bit pairs

    unsigned int i, j; // positions of bit sequences to swap
    unsigned int n;    // number of consecutive bits in each sequence
    unsigned int b;    // bits to swap reside in b
    unsigned int r;    // bit-swapped result goes here

    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
    r = b ^ ((x << i) | (x << j));

-----------------------------------------------------------

Another standard simple method:

    unsigned int reverseBits(unsigned int num)
    {
        unsigned int count = sizeof (num) * 8 - 1;
        unsigned int reverse_num = num;
        num >>= 1;
        while (num)
        {
            reverse_num <<= 1;
            reverse_num |= num & 1;
            n...
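Both ideas can be wrapped into small functions to try them out. The sketch below is my own packaging, not the code from the excerpt: swap_bit_ranges is the XOR swap shown above as a function, and reverse_bits is a plain loop-based reversal rather than a completion of the truncated reverseBits. It assumes the two ranges do not overlap and that n is smaller than the width of unsigned int.

```c
#include <limits.h>

/* Swap the n-bit sequence starting at bit i with the one starting at
 * bit j. Assumes the sequences do not overlap and n < width of b. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i,
                             unsigned int j, unsigned int n)
{
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); /* XOR temporary */
    return b ^ ((x << i) | (x << j));
}

/* Reverse all bits of num, one bit per iteration. */
unsigned int reverse_bits(unsigned int num)
{
    unsigned int result = 0;
    for (unsigned int k = 0; k < sizeof(num) * CHAR_BIT; k++) {
        result = (result << 1) | (num & 1U); /* move lowest bit of num into result */
        num >>= 1;
    }
    return result;
}
```

For example, swap_bit_ranges(0x2F, 1, 5, 3) swaps bits 1..3 with bits 5..7 of 00101111 and gives 11100011 (0xE3).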

Nice value and priority and relations

The priority of a process in Linux is dynamic: the longer it runs, the lower its priority will be. A process runs when it is actually using the CPU; most processes on a typical Linux box just wait for I/O and thus do not count as running.

The priority is taken into account when there are more processes running than CPU cores available: the highest priority wins. But as the winning process loses its priority over time, other processes will take over the CPU at some point.

nice and renice add or remove some "points" from the priority. A process with a higher nice value gets less CPU time. Root can also set a negative nice value, in which case the process gets more CPU time.

Example: there are two processes (1 and 2) calculating the halting problem and one CPU core in the system. The default is nice 0, so both processes get about half of the CPU time each. Now let's renice process 1 to value 10. Result: Process 2 gets a s...
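A process can also change its own nice value from C through getpriority() and setpriority(). The helper names below are made up for illustration, and the sketch only raises the nice value, since lowering it normally needs root:

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: reads the nice value of the calling process.
 * Returns 0 on success and stores the value in *out, -1 on error. */
int current_nice(int *out)
{
    errno = 0;                               /* -1 is also a valid nice value */
    int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}

/* Hypothetical helper: adds delta (expected >= 0) to the nice value of
 * the calling process, capped at 19. Returns 0 on success, -1 on error. */
int add_to_nice(int delta)
{
    int current;
    if (current_nice(&current) != 0)
        return -1;

    int target = current + delta;
    if (target > 19)
        target = 19;                         /* highest nice value on Linux */
    return setpriority(PRIO_PROCESS, 0, target);
}
```

Calling add_to_nice(10) from a process at nice 0 should have the same effect as running renice 10 on it from the shell.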

Forked process, thread and address spaces, a little deeper

In fork():

The child process has a unique process ID.

The child process has a different parent process ID (i.e., the process ID of the parent process).

The child process has its own copy of the parent's descriptors. These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, and an lseek(2) on a descriptor in the child process can affect a subsequent read(2) or write(2) by the parent. This descriptor copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes. Open semaphores are inherited as well.

Memory mappings created in the parent are retained in the child process. (If MAP_PRIVATE was used in the parent, the mapping is MAP_PRIVATE in the child too; after the fork, any change to the mapped area is visible only to the process that made it.)

The child process' resource utilizations are set to 0; see setrli...
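The shared file offset is easy to check with a small experiment. In this sketch (parent_offset_after_child_seek is a hypothetical name) the child moves the offset with lseek() and the parent reads it back after the child has exited:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child seeks the inherited descriptor to
 * child_offset; returns the offset the parent sees afterwards,
 * or -1 on any error. */
long parent_offset_after_child_seek(long child_offset)
{
    FILE *tmp = tmpfile();                   /* anonymous temporary file */
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        /* Child: move the offset of the shared open file description. */
        off_t r = lseek(fd, (off_t)child_offset, SEEK_SET);
        _exit(r == (off_t)-1 ? 1 : 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        fclose(tmp);
        return -1;
    }

    /* Parent: query the current offset without moving it. */
    off_t seen = lseek(fd, 0, SEEK_CUR);
    fclose(tmp);
    return (long)seen;
}
```

The parent never seeks itself, so a result equal to child_offset shows that both descriptors share one file offset.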

Global and static variables are always initialized, but auto variables are not

Security: leaving memory alone would leak information from other processes or the kernel.

Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops.

Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.

Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.

One might wonder why the auto storage class does start as garbage. The answer is two-fold:

It doesn't, in a sense. The very first stack frame does receive zero values. The "garbage", or "uninitialized" values that subsequent instances at the same stack level see are really the previous values left by the same program.

There might be a runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large ar...
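The guarantee for globals and statics can be checked directly. A small sketch with made-up names; none of the objects below has an explicit initializer:

```c
#include <stddef.h>

int global_counter;                 /* file scope: starts at 0 */
static double static_values[4];     /* static storage: all elements 0.0 */

/* Returns 1 if the uninitialized globals above really start at zero. */
int statics_start_zeroed(void)
{
    if (global_counter != 0)
        return 0;
    for (size_t k = 0; k < sizeof static_values / sizeof static_values[0]; k++) {
        if (static_values[k] != 0.0)
            return 0;
    }
    return 1;
}

/* A function-local static is also zero-initialized, once, before first
 * use, and keeps its value between calls. */
int next_id(void)
{
    static int id;                  /* starts at 0 */
    return ++id;
}
```

An auto variable declared the same way, for example a plain int inside a function, has an indeterminate value, and reading it before assigning to it is a bug.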

Speed of execution in terms of code in some cases

Is "else if" faster than "switch() case"?

For a small number of cases it does not matter. For a very large number of cases the compiler can implement the switch with a jump table, so it executes faster. But a good compiler with optimization enabled produces the same code for both.

Why is i++ faster than i=i+1 in C?

Older compilers generated, for i=i+1:

    (1) ADD          // i+1
    (2) assignment   // i=x

and for i++:

    (1) INR

But current good compilers with optimizations enabled create the same assembly code for both.
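To see this for yourself, put equivalent functions side by side and compare the generated assembly, for example with gcc -O2 -S. The functions below are just made-up test subjects; what the compiler emits for them depends on the compiler and its flags.

```c
/* The same mapping written as an else-if chain... */
int kind_if(int c)
{
    if (c == 0)
        return 10;
    else if (c == 1)
        return 20;
    else if (c == 2)
        return 30;
    else
        return -1;
}

/* ...and as a switch. */
int kind_switch(int c)
{
    switch (c) {
    case 0:  return 10;
    case 1:  return 20;
    case 2:  return 30;
    default: return -1;
    }
}

/* The same increment written both ways. */
int inc_postfix(int i)
{
    i++;
    return i;
}

int inc_assign(int i)
{
    i = i + 1;
    return i;
}
```

Each pair returns the same results, so any difference can only show up in the generated code, and with optimization enabled I would expect none.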

ARM Nesting of Interrupts

NESTING INTERRUPTS

Applies to: RealView C Compiler

Information in this article applies to: RealView Compiler Version 3.0 or higher

QUESTION

The classic ARM architecture only provides two interrupts (IRQ and FIQ). The Vectored Interrupt Controller or Advanced Interrupt Controller provides interrupt priorities and interrupt nesting for the standard interrupt, but it requires that you set the I bit in the CPSR. What is the best method to allow interrupt nesting with the RealView compiler?

ANSWER

It should be noted that good programming technique implies that you keep interrupt functions very short. When you are using short interrupt functions, interrupt nesting becomes unimportant. When you are using a Real-Time Operating System (such as the RTX Kernel), the stack usage of user tasks becomes unpredictable when you allow interrupt nesting. However, if you still need interrupt nesting in your application, you may implement it using an assem...