Friday 26 October 2012

Solution to problem of module getting marked as [permanent]

Quite often you will find that your own module has been marked as permanent:

$ lsmod
 Module Size Used by
 hello 78567 0 [permanent] 


      Such modules can't be removed until the system is rebooted. When you try to remove them with rmmod, you get messages like the following.

ERROR: Module hello is in use by [permanent]
or
ERROR: Removing 'hello': Device or resource busy
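
To spot affected modules before trying to remove them, the lsmod output shown earlier can be filtered. This is only a small sketch: permanent_modules is a hypothetical helper name, and it assumes lsmod prints the module name in the first column and the literal text [permanent] somewhere on the same line, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and print the
# names of modules whose line carries the [permanent] marker.
permanent_modules() {
    awk '/\[permanent\]/ { print $1 }'
}

# Run against the live system only if lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | permanent_modules
fi
```

Any module name it prints will be refused by rmmod with one of the errors above.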

      The solution to this problem is quite simple: recompile your hello.ko module with the -DCC_HAVE_ASM_GOTO flag. The problem is that the layout of struct module depends on HAVE_JUMP_LABEL, which depends on CC_HAVE_ASM_GOTO, which is set by the gcc-goto.sh script according to the gcc version in use.
      When the kernel and the module disagree on this setting, the kernel reads the module's exit callback (destructor) at the wrong offset and sees NULL, so the module is marked as permanent. In the following snippet from kernel/module.c, mod->exit is therefore NULL, and because CONFIG_MODULE_FORCE_UNLOAD is not set in the config file, -EBUSY is returned.

/* If it has an init func, it must have an exit func to unload */
if (mod->init && !mod->exit) {
    forced = try_force_unload(flags);
    if (!forced) {
        /* This module can't be removed */
        ret = -EBUSY;
        goto out;
    }
}


To find out whether your Linux platform needs this flag, run the following command in the kernel header directory:

sh scripts/gcc-goto.sh gcc

If the output is "y", you must include this flag whenever you compile a module.
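
The check and the resulting flag can be combined in a small script. This is a sketch under assumptions: asm_goto_flag is a hypothetical helper, KDIR defaults to the usual /lib/modules/<release>/build location, and passing the flag through EXTRA_CFLAGS assumes your kernel's kbuild honours that variable. The script only prints the build command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: translate the output of scripts/gcc-goto.sh
# into the flag the module build needs ("y" means the flag is required).
asm_goto_flag() {
    if [ "$1" = "y" ]; then
        printf '%s\n' "-DCC_HAVE_ASM_GOTO"
    fi
}

# Assumption: the kernel header directory lives in the usual place.
KDIR="${KDIR:-/lib/modules/$(uname -r)/build}"

if [ -f "$KDIR/scripts/gcc-goto.sh" ]; then
    probe=$(cd "$KDIR" && sh scripts/gcc-goto.sh gcc)
    flag=$(asm_goto_flag "$probe")
    # Print the command rather than running it, so nothing is built by accident.
    echo "make -C $KDIR M=\$PWD EXTRA_CFLAGS=$flag modules"
else
    echo "scripts/gcc-goto.sh not found under $KDIR"
fi
```

If the printed command ends with an empty EXTRA_CFLAGS=, the script did not report "y" and the flag is not needed on that platform.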

Tuesday 23 October 2012

Linux kernel crash in apply_alternatives function

If you hit a kernel crash while loading a module, with a stack trace similar to this,

 [<ffffffff810b90c6>] ? crash_kexec+0x66/0x110 
 [<ffffffff810121b8>] ? apply_alternatives+0x328/0x3b0 
 [<ffffffff814f1410>] ? oops_end+0xc0/0x100 
 [<ffffffff8100f2bb>] ? die+0x5b/0x90 
 [<ffffffff814f0d04>] ? do_trap+0xc4/0x160 
 [<ffffffff8100ce75>] ? do_invalid_op+0x95/0xb0 
 [<ffffffff810121b8>] ? apply_alternatives+0x328/0x3b0 
 [<ffffffff8126c0b0>] ? idr_get_empty_slot+0x110/0x2c0 
 [<ffffffff81133229>] ? zone_statistics+0x99/0xc0 
 [<ffffffff8100bf1b>] ? invalid_op+0x1b/0x20 
 [<ffffffff810121b8>] ? apply_alternatives+0x328/0x3b0 
 [<ffffffff81096a5f>] ? up+0x2f/0x50 
 [<ffffffff8106a36f>] ? release_console_sem+0x1cf/0x220 
 [<ffffffff810aca32>] ? each_symbol+0xa2/0x1f0 
 [<ffffffff8106a931>] ? vprintk+0x1d1/0x4f0 
 [<ffffffff814ed360>] ? printk+0x41/0x49 
 [<ffffffff810339ac>] ? module_finalize+0x10c/0x1b0 
 [<ffffffff810af1a2>] ? load_module+0x17c2/0x1ca0 




then it is likely that your module is missing the .rheldata ELF section. Check whether the module has this section with the following command:
objdump -h module.ko
To add this section to your .ko, you must build your module with modpost. This issue is generally seen on RHEL 6, but I have hit it on other Linux distributions as well.
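
The objdump check can be scripted so it gives a yes/no answer. A minimal sketch: section_listed is a hypothetical helper that assumes the usual objdump -h layout, where the section name is the second field of each section line.

```shell
#!/bin/sh
# Hypothetical helper: read `objdump -h` output on stdin and succeed
# only if a section with the given name appears in the listing.
section_listed() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# module.ko is the file name used in the text; nothing runs if it is absent.
if [ -f module.ko ] && command -v objdump >/dev/null 2>&1; then
    if objdump -h module.ko | section_listed .rheldata; then
        echo ".rheldata present"
    else
        echo ".rheldata missing"
    fi
fi
```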
Thanks,
Pritam

Linux source compilation error

Often when you start compiling the Linux source code, you come across the following error:


[pritam@pritam-pc 2.6.32-220.7.1.el6.x86_64]$ sudo make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  SYMLINK include/asm -> include/asm-x86
make[1]: *** No rule to make target `missing-syscalls'.  Stop.
make: *** [prepare0] Error 2


    It is very likely that you are trying to compile a set of header files rather than the full source tree when you hit this. To cross-check, run:


[pritam@pritam-pc 2.6.32-220.7.1.el6.x86_64]$ find . -name '*.h' | wc -l
4606
[pritam@pritam-pc 2.6.32-220.7.1.el6.x86_64]$ find . -name '*.c' | wc -l
55


This is the output of the commands on a tree where compiling only header files was attempted. The second command found only 55 files, but there should have been many more (around 4500).
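
The same cross-check can be wrapped in a script. This is a rough sketch: count_files and looks_header_only are hypothetical helpers, and the "ten times more .h than .c files" threshold is an arbitrary assumption for illustration, not a kernel rule.

```shell
#!/bin/sh
# Count files under a directory whose names match a pattern.
# The pattern is quoted at the call site so that find, not the shell, expands it.
count_files() {
    find "$1" -name "$2" | wc -l | tr -d ' '
}

# Hypothetical heuristic: treat a tree as header-only when it has
# more than ten times as many .h files as .c files.
looks_header_only() {
    c_count=$(count_files "$1" '*.c')
    h_count=$(count_files "$1" '*.h')
    [ $((c_count * 10)) -lt "$h_count" ]
}

tree="${1:-.}"
if looks_header_only "$tree"; then
    echo "$tree looks like a header-only tree"
else
    echo "$tree looks like a full source tree"
fi
```

With the counts shown above (55 .c files against 4606 .h files), this heuristic would report a header-only tree.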

    Also check whether the Makefile points to the correct source directory. Remember that on some Linux distributions the devel/header RPMs are installed into /usr/src/kernels, so this path contains header files only, and building there results in the error above.

Thanks,
Pritam