CVS and GDB for the NCBI Toolkit

28 November 2009

CVS - Concurrent Versions System

"CVS (Concurrent Versions System) is a version control system that can
record the history of your files (usually, but not always, source
code). CVS only stores the differences between versions, instead of
every version of every file you have ever created. CVS also keeps a log
of who, when, and why changes occurred.

CVS is very helpful for managing releases and controlling the
concurrent editing of source files among multiple authors. Instead of
providing version control for a collection of files in a single
directory, CVS provides version control for a hierarchical collection
of directories consisting of revision controlled files. These
directories and files can then be combined together to form a software
release." from

Other CVS Documentation
Version Management with CVS by Per Cederqvist

Want to use CVS to check out the NCBI C toolkit directly from NCBI?
See the NCBI C++ Toolkit manual

Want to set up a CVS server with the NCBI toolkit?

Installing CVS on CentOS (client)

as root:
yum install cvs.i386

man cvs


cvs --help

CVS Source Code Retrieval for NCBI Toolkit

Public Read-only Access (cannot check-in)

  • The $CVSROOT environment variable should be set to:

HOW?  On CentOS, edit the .bash_profile file in your home directory.
Near your export NCBI line, add the line

then type the command
source .bash_profile

  • Use empty password to login:
    > cvs login
    Logging in to
    CVS password: <just press ENTER here>

  • cd ~ (your local directory)
  • cvs checkout ncbi


checkout: fetch a working copy of repository code / directories 

cvs --help checkout

Lists checkout options

import: create a new project / directory in the repository

cvs --help import

add: new files in the project / directory

cvs --help add

update: fetch changes of other developers since checkout

cvs --help update

commit (check-in): Send your changed files to the repository.

This puts your changes back in the repository for other users.

cvs --help commit

cvs commit -m "Added new library code seqfast"

Source Code Inspection commands...

log: Changes to the file: who has changed what, when, and how

cvs --help log

diff: Show the differences between two versions of files

cvs --help diff

NOTE these are better done with web-based cvs repository browsers:

ViewVC - a Web Based CVS or Subversion Browser

Graphical CVS Clients on Windows/Mac/Linux:

Setting Environment Variables on Windows
(from NCBI C++ tk manual)

  • Create environment variable CVSROOT:

    • Click the right mouse button on the icon of your PC "My Computer" (it is usually situated in the upper left corner of the desktop), and then select "Properties" from the pop-up menu.

    • Form "System Properties" shows up. Here, choose tab "Advanced" and then press the button "Environment Variables" (users of older NT systems may instead want to choose tab "Environment"). Locate the part of the window titled "User Variables for Yourname", and then click at the end of the list the line containing variable TEMP.

    • Press button "New...".

    • Now, type CVSROOT in the text field "Variable Name", then type   in the text field "Variable Value". 

    • Press the button "OK" (or "Set"). The new variable CVSROOT and its value should appear in the pane "User Variables for Yourname ".

    • Apply the changes by pressing the "OK", "Apply", etc. buttons until all the pop-up windows opened in the previous steps are closed.

    • Log out, then log in to your PC again.

GDB - GNU Debugger

"GDB, the GNU debugger, allows you to debug programs written in C, C++, Java, and other languages, by executing them in a controlled fashion and printing their data."

Manual - Debugging with GDB

GDB is powerful software with many commands.

Three of the most often used debugger commands are:

1a.  Stopping execution at a specific line of code (or even fancier, when a specific condition occurs).

This is called a breakpoint.
Look up the line number in your source code - e.g. to stop at line 140.
You can then give this line number to gdb at its prompt.

1b. Stopping execution where something goes wrong.

Either you already have code that fails with a Segmentation Fault or other run-time error, or you can cause one intentionally with something very, very bad like:
int bad_array[5];
printf ("%s",bad_array[8]);
/* Index 8 is out of bounds for the array - C won't warn you. */
/* printf will also try to treat this uninitialized int as a string - this will either print out garbage or create a classic Segmentation Fault. */

2.  Printing variable values.

Once your program has stopped, you can see inside variables! Just know which one you want to print out - e.g. myargs[0].intvalue - so you can give its name to gdb at the (gdb) prompt.
Even fancier, you can change variable values while the program has stopped and see what happens.

3.  Figure out what functions have been called already (stack trace).

This tells you which functions have been called (including library functions) before your program stopped (either by a breakpoint or by some run-time error).
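
As a practice target for the three commands above, here is a minimal sketch of a bounds-checked lookup. The names get_checked and print_element are hypothetical; they do not come from fetchseqs.c or the NCBI Toolkit, and only illustrate a safe counterpart to the bad_array example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of fetchseqs.c): returns 1 and stores the
   element in *out when idx is inside the array, otherwise returns 0. */
int get_checked(const int *values, size_t count, size_t idx, int *out)
{
    if (values == NULL || out == NULL || idx >= count) {
        return 0;   /* out of bounds: refuse instead of reading */
    }
    *out = values[idx];
    return 1;
}

/* Prints one element with %d (an int is not a string, so %s is wrong).
   Returns the number of characters printed, or -1 if idx is out of bounds. */
int print_element(const int *values, size_t count, size_t idx)
{
    int value = 0;

    if (!get_checked(values, count, idx, &value)) {
        return -1;
    }
    return printf("%d\n", value);
}
```

If these functions are called from a small main() and compiled with gcc -g, then break get_checked should stop at every lookup, print idx and print count show the arguments, and where shows who made the call. A conditional breakpoint such as break get_checked if idx >= count should stop only on the out-of-bounds calls.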

STEP 1. Installing GDB on CentOS

as root:
yum install gdb.i386

STEP 2. Setting up your makefiles for debugging.

In order for GDB to work well, you have to compile your code with the gcc option -g.

This embeds information inside the executable so that the debugger knows which source code line number each bit of executable code comes from, and what the variable names are.  This makes the debug versions of your code larger, too.

Since all your compiling is controlled with make files, you must first change the make files.  For this example we will use the fetchseqs.c and libseqfast.a in debug mode. 

Edit the two makefiles: make.fetchseqs and make.seqfast.
Find the line:
 and uncomment it.
and save both makefiles - they are now set to compile in debug mode.
NOTE this also removes the OPTFLAG -O3, which turns off the compiler's built-in optimizations. If your program suddenly works in debug mode (no bug), it is usually because of this change: you may have code that does not behave properly when optimized.

Remove all the *.o and *.a files from your fetchseqs directory so that the compiler will be forced to replace them with debug versions.
Compile with:
make -f make.seqfast
make -f make.fetchseqs

You should see the -g option in the compiler output.

Now you should have a new, larger version of fetchseqs, with debug information embedded within.  Now we can use gdb.

NOTE only the code compiled with -g is visible to the debugger.  If the bug is in some other library - you cannot step through it line by line. In other words, you have to recompile any library you want to debug, otherwise they will be "silent".  That is why we have recompiled both the source code and library in this case.

STEP 3. Start the Debugger
Start gdb from the command line passing it the name of your executable.

> gdb fetchseqs
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...

The (gdb) prompt offers a multitude of commands.
You can type help and it will give you a list of command classes.

(gdb) help
List of classes of commands:

aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
data -- Examining data
files -- Specifying and examining files
internals -- Maintenance commands
obscure -- Obscure features
running -- Running the program
stack -- Examining the stack
status -- Status inquiries
support -- Support facilities
tracepoints -- Tracing of program execution without stopping the program
user-defined -- User-defined commands

Type "help" followed by a class name for a list of commands in that class.
Type "help all" for the list of all commands.
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.

Try this:
(gdb) run -

NOTE: The argument passed to fetchseqs is now "-"
To lock in arguments, you can use the command set args

Here is what you should see - the program running and printing out the argument list.

Starting program: /home/chogue/readseqs/fetchseqs -
[Thread debugging using libthread_db enabled]
[New Thread 0xb7fdd6c0 (LWP 27924)]
FetchSeqs   arguments:
  -g  Single GI number [Integer]  Optional
    default = 0
  -i  Input File List of GI or Accessions, one per line [File In]
    default = NULL
  -o  Output File [File Out]
    default = stdout
  -a  Single Accession Code [String]  Optional
    default = NULL
  -q  Quiet Mode (T/F) [T/F]  Optional
    default = F
  -d  Database To Use [Data In]  Optional
    default = pdbaa.faa
  -r  Report ONLY
        (0) FASTA Files
        (1) FASTA Definition Lines
        (2) Accession Codes
        (3) GI numbers
         [Integer]  Optional
    default = 0
Program exited with code 01.

OK, so far no bugs...

SO - The three most useful gdb commands are:

1. run -args-

Runs the program (to completion if no bug, breakpoint or condition).
Pass the arguments here after the run command!

2. break myprog.c:140

Sets a breakpoint at line 140 in file myprog.c, making the run command stop there.  Shorthand is just the letter b

3. where

Prints out the stack trace once the program is stopped.

Other commands:

continue or 'c' Proceeds to next breakpoint or to the end of the program

print -var-
Prints the variable in current scope. Shorthand is letter p. Examples:
    print i
    print a[3]
    p myargs[0].intvalue

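next or 'n' Executes the next line of the program, stepping over any function calls on that line

step or 's' Executes the next line of the program, stepping into any function called on that line (step follows the program into functions that next runs through in one go)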
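
To see the difference between next and step, it helps to have one function that calls another. The following is a minimal sketch for that purpose; is_vowel and count_vowels are hypothetical names invented for this illustration and are not part of the NCBI Toolkit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if c is an ASCII vowel, 0 otherwise. */
static int is_vowel(char c)
{
    /* strchr would also match the terminator, so rule out '\0' first */
    return c != '\0' && strchr("aeiouAEIOU", c) != NULL;
}

/* Hypothetical caller: counts the vowels in a '\0'-terminated string. */
int count_vowels(const char *text)
{
    int count = 0;
    size_t i;

    if (text == NULL) {
        return 0;
    }
    for (i = 0; text[i] != '\0'; i++) {
        if (is_vowel(text[i])) {   /* step enters is_vowel here; next does not */
            count++;
        }
    }
    return count;
}
```

With this compiled using gcc -g and stopped at the line that calls is_vowel, next should run the call and stop on the following line, while step should stop inside is_vowel. The finish command then runs until the current function returns.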

Try the following:
RUN the program WITH a BOGUS GI as argument,

Set a BREAKPOINT at the line where GetArgs is called.
Look up the line number in fetchseqs.c

RUN it again with the BOGUS GI.
Print the value of the GI in the variable myargs[0].intvalue

(gdb) run -g 1234
Starting program: /home/chogue/readseqs/fetchseqs -g 1234
[Thread debugging using libthread_db enabled]
[New Thread 0xb7efa6c0 (LWP 27943)]
[fetchseqs] ERROR: GetSeqByGI: GI was not found in database.
Program exited normally.
(gdb) b fetchseqs.c:[USE GETARGS LINE NUMBER HERE]
Breakpoint 1 at 0x8049d11: file fetchseqs.c, line 145.
(gdb) run -g 1234
Starting program: /home/chogue/readseqs/fetchseqs -g 1234
[Thread debugging using libthread_db enabled]
[New Thread 0xb7f456c0 (LWP 27945)]
Breakpoint 1, Nlm_Main () at fetchseqs.c:145
145                 f = stdout;
(gdb) print myargs[0].intvalue
$1 = 1234

STEP 4.  Try a BUGGY version of fetchseqs.c

Download the attached file fetchseqs_bug.c

Rename the good version of fetchseqs.c
>mv fetchseqs.c fetchseqs_good.c

Compare the two versions with the Unix "diff" command, so you can see what line numbers are different.
>diff fetchseqs_bug.c fetchseqs_good.c
<         int Bad_array[5]; /* THIS IS the bug */
<         printf("%s",Bad_array[8]);
<  /* This will SEG FAULT -  consequence of randomly assigning some unitialized block of memory to a string-handling statement */

Good idea to set your break point here, at line 145.

Copy the buggy version to fetchseqs.c and compile it with the debug version of the make.fetchseqs makefile.

Try running this version with no arguments - it should report a Segmentation fault error (or something like that) because the error we put in is at the very start of the code.

Then try running it under gdb. 
Once the code stops try the where command and you will see the function trace (like the bottom of the session below).

[chogue@localhost readseqs]$ ./fetchseqs
Segmentation fault
[chogue@localhost readseqs]$ gdb fetchseqs
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) run
Starting program: /home/chogue/readseqs/fetchseqs
[Thread debugging using libthread_db enabled]
[New Thread 0xb7f3f6c0 (LWP 27990)]

Program received signal SIGSEGV, Segmentation fault.
0x008c1fab in strlen () from /lib/
(gdb) where
#0  0x008c1fab in strlen () from /lib/
#1  0x008941ff in vfprintf () from /lib/
#2  0x008999c3 in printf () from /lib/
#3  0x08049d0b in Nlm_Main () at fetchseqs.c:145
#4  0x082f6908 in main ()

What does this mean?  The standard C library function strlen() died. WHY? printf was told to print a string, but it was handed a garbage value from uninitialized, out-of-bounds memory instead of a valid string address (whoops!), so strlen() crashed while searching for the '\0' terminator at the end of the string.  This is a very common problem in C code.
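
The reliance on the '\0' terminator can be made visible with a small sketch. The function bounded_length below is a hypothetical helper written for this page, not a standard library or NCBI Toolkit call; it uses the standard memchr to search for the terminator within a known buffer size instead of reading until one happens to turn up, as strlen does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, or -1 if no
   '\0' terminator appears within the first cap bytes. */
long bounded_length(const char *buf, size_t cap)
{
    const char *end;

    if (buf == NULL) {
        return -1;
    }
    end = memchr(buf, '\0', cap);   /* never reads past buf[cap - 1] */
    if (end == NULL) {
        return -1;                  /* no terminator: not a valid C string */
    }
    return (long)(end - buf);
}
```

A check like this only helps when the buffer address itself is valid and its size is known. In the buggy fetchseqs.c the value passed to printf was not a valid address at all, so the real fix there is to remove the bad printf call.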

STEP 5.  Try debugging the good version, stepping through it, listing the code around each step, printing variable values.

For ordinary C code you can set a breakpoint at main with

(gdb) b main

For NCBI Toolkit code you need to use this:
(gdb) b Nlm_Main

Here is a session - note the list (l), next (n) and step (s) commands.

[chogue@localhost readseqs]$ gdb fetchseqs
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) b Nlm_Main
Breakpoint 1 at 0x8049ca5: file fetchseqs.c, line 128.
(gdb) run -g 1234
Starting program: /home/chogue/readseqs/fetchseqs -g 1234
[Thread debugging using libthread_db enabled]
[New Thread 0xb7ff36c0 (LWP 28082)]

Breakpoint 1, Nlm_Main () at fetchseqs.c:128
128             ReadDBFILEPtr rdbfp=NULL;
(gdb) l
123     }
126     Int2 Main(void)
127     {
128             ReadDBFILEPtr rdbfp=NULL;
129             CharPtr seq=NULL;
130             Boolean start=TRUE;
131             FILE *f;
132             FILE *fin;
(gdb) n
129             CharPtr seq=NULL;
(gdb) n
130             Boolean start=TRUE;
(gdb) n
133             Int4 gi=0;
(gdb) n
134             ValNodePtr pvnList = NULL; /* This linked list will hold the GI numbers parsed from the input file */
(gdb) n
135             ValNodePtr pvnHere = NULL;
(gdb) n
138             if ( !GetArgs("FetchSeqs", NUMARGS, myargs) ) return 1;
(gdb) p myargs[0].intvalue
$1 = 0
(gdb) n
144             if (!StringCmp(myargs[arg_output_filename].strvalue,"stdout")) { /* this test returns 0=FALSE if they match */
(gdb) p myargs[0].intvalue
$2 = 1234
(gdb) s
145                 f = stdout;
(gdb) s
159             rdbfp=OpenProteinFastaDB(myargs[arg_database].strvalue);  /* This library function reports mislabeled file errors by itself */
(gdb) continue
(gdb) quit

Final thoughts.

GDB supports many languages, including C and C++.

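Yes, you can run software in reverse (this needs GDB 7.0 or later, with execution recording switched on first using the record command).
set exec-direction reverse

to go forwards again
set exec-direction forward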


When you compile with -pg, the GNU profiler (gprof) can tell you how much time your code spends in each function. Use this to find time-wasting problems in your code.
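
As a rough illustration of the kind of problem a profiler can expose, here is a sketch with two functions that return the same answer at very different cost. Both names are hypothetical and were invented for this example; the results are only valid for n up to about one million, beyond which the 64-bit arithmetic may overflow.

```c
#include <stdint.h>

/* Slow on purpose: adds the squares 1*1 + 2*2 + ... + n*n one at a time. */
uint64_t sum_squares_loop(uint64_t n)
{
    uint64_t total = 0;
    uint64_t i;

    for (i = 1; i <= n; i++) {
        total += i * i;
    }
    return total;
}

/* Fast: the closed form n(n+1)(2n+1)/6 gives the same value in one step. */
uint64_t sum_squares_formula(uint64_t n)
{
    return n * (n + 1) * (2 * n + 1) / 6;
}
```

If both functions are called many times from a main() in a program compiled with gcc -pg and without optimization, running the program writes a file called gmon.out, and gprof run on the executable and gmon.out should show most of the time attributed to sum_squares_loop.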
Christopher Hogue,
Nov 28, 2009, 2:13 AM