Thursday, August 11, 2011

Ruby __END__ and DATA

The __END__ keyword in Ruby, placed on a line by itself, tells the parser to stop reading the source file: nothing after it is compiled or executed. It is often used for appending documentation, such as a LICENSE file, to the end of a source code file.

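More interesting is the fact that the contents of the file following the __END__ line are available through DATA, a global constant holding a File object positioned just after __END__. DATA is only defined for the main program file, not for files loaded with require.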

This means that it is possible to include test data -- even binary data -- at the end of a Ruby source file:

bash$ cat od.rb
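#!/usr/bin/env ruby
# Dump everything after the __END__ line in octal, 16 bytes per row.
if __FILE__ == $0
  DATA.binmode                  # treat the appended bytes as binary
  offset = 0
  while (buf = DATA.read(16)) do
    bytes = buf.unpack 'C*'
    # join the octets; on Ruby 1.9+ an Array would print as ["...", ...]
    puts "%08X: %s" % [ offset, bytes.map { |b| " %03o" % b }.join ]
    offset += buf.bytesize
  end
end
__END__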
bash$ cat /bin/true >> od.rb
bash$ ./od.rb
00000000:  177 105 114 106 002 001 001 000 000 000 000 000 000 000 000 000
00000010:  002 000 076 000 001 000 000 000 220 044 100 000 000 000 000 000
00000020:  100 000 000 000 000 000 000 000 060 226 001 000 000 000 000 000
...
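Because DATA only exists when the script itself is run, the loop above is awkward to test. A minimal sketch, assuming the IO is passed in explicitly, pulls the dump into a method (the name octal_dump is hypothetical) that works on DATA or on any other IO, such as a StringIO holding test bytes:

```ruby
require 'stringio'

# Hypothetical helper: dump any IO in the same layout as od.rb, i.e. a hex
# offset followed by up to +width+ bytes printed as three-digit octal.
def octal_dump(io, width = 16)
  rows = []
  offset = 0
  while (buf = io.read(width))
    octets = buf.unpack('C*').map { |b| ' %03o' % b }.join
    rows << format('%08X: %s', offset, octets)
    offset += buf.bytesize
  end
  rows
end

if __FILE__ == $0
  # DATA only exists when this file is the main script and ends in __END__;
  # otherwise fall back to a few sample bytes.
  source = defined?(DATA) ? DATA : StringIO.new("\x7FELF")
  puts octal_dump(source)
end
```

In a script that ends in __END__, octal_dump(DATA) should produce the same rows as od.rb, while a test can feed it a StringIO instead.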

Monday, August 8, 2011

knotify4 uses 100% CPU

This is true, and has been for a while. It tends to occur when a laptop is suspended while online, then resumed while offline. A nice side effect is that battery life gets reduced by about 75% while offline, which is generally when one wants the longest battery life.

There are plenty of bug reports open on this, but it's pretty clear that the KDE/Kubuntu guys either have no clue how to fix this, or cannot be bothered.

Since knotify4 isn't really all that useful (especially when using E17 for a WM), there is a brutal hack that will effectively silence it:



sudo mv /usr/bin/knotify4  /usr/bin/knotify4.orig
sudo cp /bin/true /usr/bin/knotify4

Needless to say, the original file should be restored (sudo mv /usr/bin/knotify4.orig /usr/bin/knotify4) before doing an upgrade.
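Since the stub is easy to forget about, a small Ruby sketch can check whether /usr/bin/knotify4 is still byte-for-byte identical to /bin/true before an upgrade. The method name stubbed? is hypothetical; it relies on FileUtils.compare_file from the standard library:

```ruby
require 'fileutils'

# Hypothetical helper: true when +target+ has exactly the same bytes as
# +stub+, i.e. the /bin/true copy from the hack above is still in place.
def stubbed?(target, stub = '/bin/true')
  File.file?(target) && File.file?(stub) && FileUtils.compare_file(target, stub)
end

if __FILE__ == $0
  target = '/usr/bin/knotify4'
  backup = "#{target}.orig"
  if stubbed?(target)
    puts "#{target} is still the stub; restore it before upgrading:"
    if File.exist?(backup)
      puts "  sudo mv #{backup} #{target}"
    else
      puts "  (warning: #{backup} is missing)"
    end
  else
    puts "#{target} is not the stub"
  end
end
```

The comparison is only meaningful as long as /bin/true itself has not changed since the copy was made.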

Disabling startup (init.d) services in Ubuntu

Ubuntu has never made it obvious what the "Ubuntu way" of removing services from System V run levels is. The GUI tools in GNOME and KDE are incomplete, and a quick look at the run level directories shows that their contents are generated automatically, so symlinks added or removed by hand might be ignored in the future.

Fortunately, the README in /etc/init.d ends with the following advice:


Use the update-rc.d command to create symbolic links in the /etc/rc?.d
as appropriate. See that man page for more details.

The man page lists the following forms for invoking update-rc.d:


       update-rc.d [-n] [-f] name remove
       update-rc.d [-n] name defaults [NN | SS KK]
       update-rc.d [-n] name start|stop NN runlevel [ runlevel ]...
       update-rc.d [-n] name disable|enable [ S|2|3|4|5 ]

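The following command disables the service collectd in all run levels. Note from the output that the links are not simply deleted: the S95 start links in run levels 2 through 5 are replaced with K05 kill links, so the service is stopped rather than started there: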


bash$ sudo update-rc.d collectd disable
update-rc.d: warning: collectd start runlevel arguments (none) do not match LSB Default-Start values (2 3 4 5)
update-rc.d: warning: collectd stop runlevel arguments (none) do not match LSB Default-Stop values (0 1 6)
 Disabling system startup links for /etc/init.d/collectd ...
 Removing any system startup links for /etc/init.d/collectd ...
   /etc/rc0.d/K95collectd
   /etc/rc1.d/K95collectd
   /etc/rc2.d/S95collectd
   /etc/rc3.d/S95collectd
   /etc/rc4.d/S95collectd
   /etc/rc5.d/S95collectd
   /etc/rc6.d/K95collectd
 Adding system startup for /etc/init.d/collectd ...
   /etc/rc0.d/K95collectd -> ../init.d/collectd
   /etc/rc1.d/K95collectd -> ../init.d/collectd
   /etc/rc6.d/K95collectd -> ../init.d/collectd
   /etc/rc2.d/K05collectd -> ../init.d/collectd
   /etc/rc3.d/K05collectd -> ../init.d/collectd
   /etc/rc4.d/K05collectd -> ../init.d/collectd
   /etc/rc5.d/K05collectd -> ../init.d/collectd


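According to the man page, the following command should disable a service in the current run level only (runlevel prints the previous and current run levels, and cut selects the second field):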


sudo update-rc.d BASENAME disable `runlevel | cut -d ' ' -f 2`

...however, a quick experiment shows that the runlevel argument is ignored, and the service is disabled in every run level:


bash$ sudo update-rc.d collectd disable 2
update-rc.d: warning: collectd start runlevel arguments (none) do not match LSB Default-Start values (2 3 4 5)
update-rc.d: warning: collectd stop runlevel arguments (none) do not match LSB Default-Stop values (0 1 6)
 Disabling system startup links for /etc/init.d/collectd ...
 Removing any system startup links for /etc/init.d/collectd ...
   /etc/rc0.d/K95collectd
   /etc/rc1.d/K95collectd
   /etc/rc2.d/S95collectd
   /etc/rc3.d/S95collectd
   /etc/rc4.d/S95collectd
   /etc/rc5.d/S95collectd
   /etc/rc6.d/K95collectd
 Adding system startup for /etc/init.d/collectd ...
   /etc/rc0.d/K95collectd -> ../init.d/collectd
   /etc/rc1.d/K95collectd -> ../init.d/collectd
   /etc/rc6.d/K95collectd -> ../init.d/collectd
   /etc/rc2.d/K05collectd -> ../init.d/collectd
   /etc/rc3.d/K05collectd -> ../init.d/collectd
   /etc/rc4.d/K05collectd -> ../init.d/collectd
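To see exactly which links update-rc.d left behind, and therefore whether a runlevel argument made any difference, a small Ruby sketch can list the S (start) or K (kill) link for a service in each /etc/rc?.d directory. The method rc_links is hypothetical, and it assumes the /etc/rcN.d/[SK]NNname naming seen in the output above:

```ruby
# Hypothetical helper: map each run level to the S/K link prefix found for
# +service+ in <root>/etc/rc?.d, e.g. {"0"=>"K95", "2"=>"K05"}.
# +root+ defaults to "/" and exists so the method can be tried on a scratch tree.
def rc_links(service, root = '/')
  pattern = File.join(root, 'etc', 'rc?.d', "[SK][0-9][0-9]#{service}")
  # Paths are sorted, so if a run level somehow has both, the S link wins.
  Dir.glob(pattern).sort.each_with_object({}) do |path, links|
    runlevel = File.basename(File.dirname(path))[2]  # "rc2.d" -> "2"
    links[runlevel] = File.basename(path)[0, 3]      # "K05collectd" -> "K05"
  end
end

if __FILE__ == $0
  service = ARGV.fetch(0, 'collectd')
  rc_links(service).sort.each { |level, link| puts "run level #{level}: #{link}" }
end
```

Running it (saved as, say, rc_links.rb) before and after sudo update-rc.d collectd disable 2 shows which run levels were actually changed.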

Sunday, August 7, 2011

perf-backed disassembly

Since 2.6.31 or thereabouts, the Linux kernel has included built-in support for hardware performance counters, along with a user-space tool for using them known as perf.

The most common use of perf is perf stat, which gathers performance statistics for a program while it runs:


bash$ perf stat -cv ./a.out 


cache-misses: 11313 2020574449 2020574449
cache-references: 62031796 2020574449 2020574449
branch-misses: 17909 2020574449 2020574449
branches: 606684832 2020574449 2020574449
instructions: 6324531571 2020574449 2020574449
cycles: 6408533747 2020574449 2020574449
page-faults: 304 2019963367 2019963367
CPU-migrations: 7 2019963367 2019963367
context-switches: 205 2019963367 2019963367
task-clock-msecs: 2019963367 2019963367 2019963367


 Performance counter stats for './a.out':

             11313 cache-misses             #      0.006 M/sec
          62031796 cache-references         #     30.709 M/sec
             17909 branch-misses            #      0.003 %    
         606684832 branches                 #    300.344 M/sec
        6324531571 instructions             #      0.987 IPC  
        6408533747 cycles                   #   3172.599 M/sec
               304 page-faults              #      0.000 M/sec
                 7 CPU-migrations           #      0.000 M/sec
               205 context-switches         #      0.000 M/sec
       2019.963367 task-clock-msecs         #      0.996 CPUs 


        2.027948307  seconds time elapsed
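The figures after each # are derived from the raw counts. The sketch below reproduces them for the output above, using one reading of the arithmetic that matches these numbers (IPC = instructions / cycles, M/sec = count / task-clock in ms / 1000, branch-miss % = branch-misses / branches, CPUs = task-clock / elapsed time). It assumes the older output format shown here, with plain integer counts and a task-clock-msecs line; the method names are hypothetical:

```ruby
# Hypothetical helpers: parse the old-style perf stat report shown above
# and recompute the ratios printed after each "#".
def parse_perf_stat(text)
  counts = {}
  text.each_line do |line|
    case line
    when /\A\s*(\d+(?:\.\d+)?)\s+seconds time elapsed/
      counts['elapsed-secs'] = $1.to_f
    when /\A\s*(\d+(?:\.\d+)?)\s+([A-Za-z][\w-]*)/
      counts[$2] = $1.to_f  # e.g. "cycles" => 6408533747.0
    end
  end
  counts
end

def derived_metrics(c)
  msecs = c.fetch('task-clock-msecs')
  {
    ipc:             c.fetch('instructions') / c.fetch('cycles'),
    # (count per ms) / 1000 = millions per second
    cycles_m_per_s:  c.fetch('cycles') / msecs / 1000.0,
    branch_miss_pct: 100.0 * c.fetch('branch-misses') / c.fetch('branches'),
    cpus:            msecs / (c.fetch('elapsed-secs') * 1000.0)
  }
end

if __FILE__ == $0 && ARGV[0]
  derived_metrics(parse_perf_stat(File.read(ARGV[0]))).each do |name, value|
    printf("%-16s %.3f\n", name, value)
  end
end
```

perf stat writes its report to stderr, so the input file could be captured with something like perf stat ./a.out 2> stat.txt.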

The events to be counted can be selected with the -e option, once per event, to narrow the output:

bash$ perf stat -e cpu-clock -e instructions ./a.out

 Performance counter stats for './a.out':

       2026.748812 cpu-clock-msecs         
        6324293589 instructions             #      0.000 IPC  

        2.032519896  seconds time elapsed

A list of available events can be obtained via perf list:

bash$ perf list | head
List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]


The perf toolchain also includes perf top, which shows a continuously updated profile of either a single process or the entire system, kernel included:

bash$ sudo perf top 2>/dev/null
-------------------------------------------------------------------------------
   PerfTop:       0 irqs/sec  kernel:-nan%  exact: -nan% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt function               DSO
             _______ _____ ______________________ __________________

               77.00 39.3% intel_idle             [kernel.kallsyms] 
               13.00  6.6% __pthread_mutex_unlock libpthread-2.13.so
               13.00  6.6% pthread_mutex_lock     libpthread-2.13.so
               12.00  6.1% __ticket_spin_lock     [kernel.kallsyms] 
                7.00  3.6% schedule               [kernel.kallsyms] 
                6.00  3.1% menu_select            [kernel.kallsyms] 
                6.00  3.1% fget_light             [kernel.kallsyms] 
                6.00  3.1% clear_page_c           [kernel.kallsyms] 


Where things start to get interesting, however, is with perf record. It samples the performance counters of a process and writes the samples to a file (perf.data by default, or the file given with -o), which can be reviewed later with perf report (-i selects the input file).

This can be used, for example, to generate a call graph:

bash$  perf record -g -o /tmp/a.out.perf ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.148 MB /tmp/a.out.perf (~6461 samples) ]
bash$ perf report -g -i /tmp/a.out.perf
# Events: 1K cycles
#
# Overhead        Command  Shared Object  Symbol
# ........  .............  .............  ......
#
    99.90%          a.out  a.out          [.] main
            |
            --- main
                __libc_start_main

     0.10%          a.out  [l2cap]        [k] 0xffffffff8103804a
            |
            --- 0xffffffff8105f438
                0xffffffff8105f675
...


Once perf data has been recorded, the perf annotate utility can display a disassembly of the sampled code, with each instruction prefixed by the percentage of samples that landed on it:

bash$  perf annotate -i /tmp/a.out.perf |more

------------------------------------------------
 Percent |      Source code & Disassembly of a.out
------------------------------------------------
         :
         :
         :
         :      Disassembly of section .text:
         :
         :      0000000000400554 <main>:

    0.00 :  400554:       55                      push   %rbp
    0.00 :  400555:       48 89 e5                mov    %rsp,%rbp
    0.00 :  400558:       48 81 ec 30 00 0c 00    sub    $0xc0030,%rsp
    0.00 :  40055f:       48 8d 85 d0 ff fb ff    lea    -0x40030(%rbp),%rax
    0.00 :  400566:       ba 00 00 04 00          mov    $0x40000,%edx
    0.00 :  40056b:       be 00 00 00 00          mov    $0x0,%esi
    0.00 :  400570:       48 89 c7                mov    %rax,%rdi
    0.00 :  400573:       e8 b0 fe ff ff          callq  400428 <memset@plt>
    0.00 :  400578:       c7 45 fc 00 00 00 04    movl   $0x4000000,-0x4(%rbp)
    ...

    4.21 :  4006a5:       8b 45 d0                mov    -0x30(%rbp),%eax
   15.54 :  4006a8:       83 c0 01                add    $0x1,%eax
    4.97 :  4006ab:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.87 :  4006ae:       8b 45 d0                mov    -0x30(%rbp),%eax
   17.79 :  4006b1:       83 c0 01                add    $0x1,%eax
    4.36 :  4006b4:       89 45 d0                mov    %eax,-0x30(%rbp)
    4.72 :  4006b7:       48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
    0.00 :  4006bc:       48 8b 45 f0             mov    -0x10(%rbp),%rax
    ...
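Since annotate output can run to thousands of lines, it can help to pull out just the hottest instructions. A rough Ruby sketch, assuming the column layout shown above (percent, colon, address, colon, opcode bytes, instruction); hot_spots is a hypothetical name, and the text would be captured with something like perf annotate -i /tmp/a.out.perf > annotate.txt:

```ruby
# One annotated instruction: percent, address, opcode bytes (hex pairs),
# then the disassembled instruction.
ANNOTATE_LINE = /\A\s*(\d+\.\d+)\s*:\s*([0-9a-f]+):\s+(?:[0-9a-f]{2}\s)+\s*(.+?)\s*\z/

# Hypothetical helper: return the +n+ instructions with the highest sample
# percentage as [percent, address, instruction] triples.
def hot_spots(text, n = 5)
  rows = text.each_line.map do |line|
    m = ANNOTATE_LINE.match(line) or next  # skip headers and "..." lines
    [m[1].to_f, m[2], m[3]]
  end
  rows.compact.sort_by { |pct, _addr, _insn| -pct }.first(n)
end

if __FILE__ == $0 && ARGV[0]
  hot_spots(File.read(ARGV[0])).each do |pct, addr, insn|
    printf("%6.2f%%  %s  %s\n", pct, addr, insn)
  end
end
```

For the excerpt above, the two add instructions at 4006b1 and 4006a8 come out on top.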

As is to be expected from Torvalds and company, the utilities include a number of options for generating parser-friendly output, limiting reporting to specified events and symbols, and so forth. Check the man pages for details.