pthread debugging

pthreads, gdb
general debugging advice for pthread programs
Debugging threaded programs can be tricky because there are multiple streams of execution. Here are a few things to try to make debugging easier:
gdb and pthreads
gdb supports debugging threaded programs, including setting breakpoints for only some threads and examining each thread's stack. Some basics for debugging threaded programs in gdb are presented below. More information about gdb and pthreads can be found here: gdb and pthreads

One thing to keep in mind as you debug pthread programs on our system is that there are at least three different identifiers for the same thread when you run it in gdb:

  1. the pthread library's id for the thread (its pthread_t value)
  2. the operating system's id for the thread (its LWP id value). The OS uses this id, in part, to keep track of the thread for scheduling purposes.
  3. the gdb id for the thread: this is the id you should use when specifying gdb commands for a single thread.
The correspondence between these identifiers can differ from one OS and pthread library implementation to another, but on our systems there is a one-to-one-to-one correspondence between a pthread id, an LWP id, and a gdb thread id.

A few gdb thread-specific commands:

  set print thread-events on      # print thread start and exit events
  info threads                    # list all existing threads in the program;
                                  # the gdb thread number is the first value listed,
                                  # and the current thread (e.g., the one that
                                  # hit the breakpoint) is marked with *
  thread threadno                 # switch to thread threadno's context
                                  # (e.g., then type where to see its stack)
  break [where] thread [threadno] # set a breakpoint at [where] just for
                                  # thread threadno
  # apply the gdb command to all or a subset of threads
  thread apply [threadno|all] command
Basically, to apply a gdb command to all threads or to just a subset of them (e.g., 2-5), prefix the command with thread apply and the threads' gdb ids:
thread apply [thread_id | all]  command
For example, thread apply all where prints the stack trace of every thread. This prefix doesn't seem to work for setting a breakpoint on a single thread, so use break's thread option instead:
break line_no thread thread_no

By default, when a thread hits a breakpoint, gdb suspends all threads wherever they happen to be until the user types cont. You can change this so that threads that are not at a breakpoint keep executing while you debug the ones that hit their breakpoints (gdb calls this non-stop mode; enable it with set non-stop on before starting the program). However, it is hard to think of scenarios where this would make debugging easier, so I'd suggest sticking with the default.


Setting a Breakpoint for a Single Thread

Here is an example run of a multithreaded executable named "racecond". In this example, a breakpoint for all threads is set at the beginning of the thread's main function (named worker_loop), and then a breakpoint is set at line 76 in worker_loop just for thread 3. REMEMBER: gdb's thread number 3 may not correspond to a logical thread number in your program (i.e., myid may not be 3 for gdb thread 3).
$ gdb ./racecond
(gdb) delete all
(gdb) break worker_loop            # set breakpoint for all threads
(gdb) run 5                        # start racecond with command line arg 5
(gdb) info threads
(gdb) break 76 thread 3     # sets the breakpoint just for thread 3
(gdb) display myid
(gdb) cont 

Longer example run

Here is some more output from using gdb on the racecond program that shows how to use some of the thread commands and what their output might look like:
% gdb ./racecond
(gdb) set print thread-events on
(gdb) run 5

Starting program: ./racecond 5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".

[New Thread 0x7ffff77fd700 (LWP 17471)]
hello I'm thread 0 with pthread_id 140737345738496
# LWP 17471 means Light Weight Process number 17471: an LWP is a thread
# the OS knows about, and 17471 is the OS's id for this thread.
# 140737345738496 is the pthread library's id for the same thread; in hex
# it is 0x7ffff77fd700, the value gdb prints after "New Thread".

[New Thread 0x7ffff6ffc700 (LWP 17472)]
hello I'm thread 1 with pthread_id 140737337345792
[New Thread 0x7ffff67fa700 (LWP 17473)]
hello I'm thread 2 with pthread_id 140737328948992
[New Thread 0x7ffff5ff9700 (LWP 17474)]
hello I'm thread 3 with pthread_id 140737320556288
[New Thread 0x7ffff57f8700 (LWP 17475)]
hello I'm thread 4 with pthread_id 140737312163584
[Thread 0x7ffff6ffc700 (LWP 17472) exited]
[Thread 0x7ffff77fd700 (LWP 17471) exited]
[Thread 0x7ffff67fa700 (LWP 17473) exited]
[Thread 0x7ffff57f8700 (LWP 17475) exited]
count = 141335712
[Thread 0x7ffff5ff9700 (LWP 17474) exited]
[Inferior 1 (process 17451) exited normally]

(gdb) break worker_loop
(gdb) run 3

(gdb) break 76     # sets the breakpoint for every thread 

Breakpoint 2, worker_loop (arg=0x602030) at racecond.c:76
76	      count += i; 

(gdb) info threads     # the starred (*) thread is the current one
  Id   Target Id         Frame 
  4    Thread 0x7ffff67fa700 (LWP 17587) "racecond" worker_loop (arg=0x602038)
    at racecond.c:68
  3    Thread 0x7ffff6ffc700 (LWP 17549) "racecond" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
* 2    Thread 0x7ffff77fd700 (LWP 17548) "racecond" worker_loop (arg=0x602030)
    at racecond.c:76
  1    Thread 0x7ffff7fcd700 (LWP 17539) "racecond" 0x00007ffff7bc6148 in 
    pthread_join (threadid=140737345738496, thread_return=0x0) at pthread_join.c:89

# thread 2 is the current thread, so where shows thread 2's stack trace:
(gdb) where
#0  worker_loop (arg=0x602030) at racecond.c:76
#1  0x00007ffff7bc4e9a in start_thread (arg=0x7ffff77fd700)
    at pthread_create.c:308
#2  0x00007ffff78f1dbd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#3  0x0000000000000000 in ?? ()

# switch to thread 3's context
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6ffc700 (LWP 17549))]
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
132	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

# get thread 3's stack trace
(gdb) where
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007ffff7bc7065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/
#2  0x00007ffff7bc6eba in __pthread_mutex_lock (mutex=0x6010c0)
    at pthread_mutex_lock.c:61
#3  0x0000000000400aa2 in worker_loop (arg=0x602034) at racecond.c:75
#4  0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6ffc700)
    at pthread_create.c:308
#5  0x00007ffff78f1dbd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

# move into stack frame 3 of thread 3
(gdb) frame 3
#3  0x0000000000400aa2 in worker_loop (arg=0x602034) at racecond.c:75
75	      pthread_mutex_lock(&my_mutex);

(gdb) print my_mutex