NOTE - THIS PAGE IS UNDER REVISION

PRACTICAL ASSIGNMENT – NCBI C Toolkit
Processing Arguments: Strings, Memory, Variables, Pointers, Casting and Linked Lists.
RESOURCES: NCBI C Toolkit Cross Reference http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/ident/
NCBI C Toolkit Documentation (Old…) http://www.ncbi.nlm.nih.gov/IEB/ToolBox/SDKDOCS/INDEX.HTML

The application will use a number of C functions, data structures, and memory constructs such as linked lists.
Make sure you have downloaded and compiled the NCBI C Toolkit, that you can compile and run the test application from the previous lecture notes, and that the $NCBI environment variable is defined. Obtain a fresh copy of the readseq code from the course website attachments.

Previous instructions:
1. Extract readseq.tar.gz into a separate directory.
2. Build it with: make -f make.readseq
3. Copy formatdb from $NCBI/build into your directory.
4. Format the pdbaa.faa database of PDB amino acid sequences with the command: formatdb -t PDB -i pdbaa.faa -o T
   (-t sets the database title, -i names the input FASTA file, and -o T parses the SeqIds and creates the indexes used for lookups by GI.)
Run the newly compiled readseq application and see what happens.
PART A. Prepare a new library and program.
Task 1 - Create a linkable library out of the readseqs.c code. Make a separate library copy of the C code:
Create a matching header file:
Make your functions “Library” portable for either static or dynamic linking.
Question – how does the LIBCALL macro make your library portable?
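A minimal sketch of the header pattern, using hypothetical names (seqfast.h, seqfast.c, SeqFastWriteBanner) and shown as a single file so it compiles on its own. In a real build the toolkit headers supply NLM_EXTERN and LIBCALL; the empty fallbacks below only stand in for them so the sketch builds without the toolkit, and they say nothing about how the toolkit actually defines those macros:

```c
/* ---- seqfast.h (hypothetical header for the library copy) ---- */
#ifndef SEQFAST_H
#define SEQFAST_H

#include <stdio.h>

/* Fallbacks ONLY for building this sketch without the NCBI headers.
 * With the toolkit, include <ncbi.h> first and use its definitions. */
#ifndef NLM_EXTERN
#define NLM_EXTERN extern
#endif
#ifndef LIBCALL
#define LIBCALL
#endif

#ifdef __cplusplus
extern "C" {            /* keep C linkage if a C++ program links the library */
#endif

/* Every exported function carries the same linkage and LIBCALL macros. */
NLM_EXTERN int LIBCALL SeqFastWriteBanner(FILE *fp);

#ifdef __cplusplus
}
#endif

#endif /* SEQFAST_H */

/* ---- seqfast.c (hypothetical implementation file) ---- */
/* The definition repeats LIBCALL so it matches the declaration exactly. */
int LIBCALL SeqFastWriteBanner(FILE *fp)
{
    if (fp == NULL)
        return 0;                               /* nothing to write to */
    return fprintf(fp, "seqfast library\n") > 0; /* 1 on success */
}
```

Once compiled, the object file can be archived into the static library with, for example, ar rcs libseqfast.a seqfast.o.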
Task 2 - Create a simple program streamseqs that uses the libseqfast.a library to echo all the sequences out of the database.
Question - Which program is larger - readseqs or streamseqs?
PART B. Create a simple program, fetchseqs, that processes command-line parameters with GetArgs.
Look up “Getting Program Arguments” in the CoreLib part of the NCBI C Toolkit Documentation web site. Search for “GetArgs” and “Args” in the C Toolkit Cross Reference. For examples, look at several programs in the /demo directory that use arguments. Modify fetchseqs.c so that it handles arguments as follows:
#define NUMARGS 7
static Args myargs[NUMARGS] = {
  /*0*/ { "Single GI number", "0", NULL, NULL, TRUE, 'g', ARG_INT, 0.0, 0, NULL},
  /*1*/ { "Input File List of GI or Accessions, one per line", "stdin", NULL, NULL, FALSE, 'i', ARG_FILE_IN, 0.0, 0, NULL},
  /*2*/ { "Output File", "stdout", NULL, NULL, FALSE, 'o', ARG_FILE_OUT, 0.0, 0, NULL},
  /*3*/ { "Single Accession Code", "NULL", NULL, NULL, TRUE, 'a', ARG_STRING, 0.0, 0, NULL},
  /*4*/ { "Quiet Mode (T/F)", "F", NULL, NULL, TRUE, 'q', ARG_BOOLEAN, 0.0, 0, NULL},
  /*5*/ { "Database To Use", "nr", NULL, NULL, TRUE, 'd', ARG_DATA_IN, 0.0, 0, NULL},
  /*6*/ { "Report ONLY \n\t(0) FASTA Files\n\t(1) FASTA Definition Lines \n\t(2) Accession Codes \n\t(3) GI numbers\n\t", "0", NULL, NULL, TRUE, 'r', ARG_INT, 0.0, 0, NULL}
};
FetchSeqs arguments:
FYI: these are the fields of the Args structure (it is declared elsewhere - do not put it into your program!):
typedef struct mainargs {
  const char *prompt;        /* prompt for field */
  const char *defaultvalue;  /* default */
  char *from;                /* range or datalink type */
  char *to;
  Nlm_Boolean optional;      /* is the arg optional? */
  Nlm_Char tag;              /* argument on command line */
  Nlm_Int1 type;             /* type of value */
  Nlm_FloatHi floatvalue;    /* result goes here float */
  Nlm_Int4 intvalue;         /* result goes here int */
  CharPtr strvalue;          /* result goes here string */
} Nlm_Arg, * Nlm_ArgPtr;
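The code in PART C indexes myargs[] by name (myargs[arg_output_filename].strvalue, myargs[arg_report].intvalue, and the report_* constants), but those names are not defined on this page. A minimal sketch, assuming the indices simply follow the array order above; arg_accession, arg_quiet, arg_database, arg_count and report_gi are hypothetical names for the entries the later code does not mention:

```c
/* Index names for myargs[], in the same order as the array initializer. */
enum ArgIndex {
    arg_gi_number = 0,     /* -g */
    arg_input_filename,    /* -i */
    arg_output_filename,   /* -o */
    arg_accession,         /* -a (hypothetical name) */
    arg_quiet,             /* -q (hypothetical name) */
    arg_database,          /* -d (hypothetical name) */
    arg_report,            /* -r */
    arg_count              /* number of entries; must equal NUMARGS */
};

/* Values of the -r argument, matching the help text of entry 6. */
enum ReportType {
    report_fasta = 0,      /* (0) FASTA files */
    report_defline = 1,    /* (1) FASTA definition lines */
    report_acc = 2,        /* (2) accession codes */
    report_gi = 3          /* (3) GI numbers (hypothetical name) */
};

/* Same value as the #define in fetchseqs.c; repeated so the sketch compiles alone. */
#ifndef NUMARGS
#define NUMARGS 7
#endif

/* C11 compile-time check that the enum and the array size agree. */
_Static_assert(arg_count == NUMARGS, "enum ArgIndex and myargs[] are out of sync");
```

With the indices kept in one enum, adding an argument means inserting one line at the matching position, and the static assertion fails the build if the enum and NUMARGS drift apart.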
PART C. Change fetchseqs to carry out the requested command-line operations, fetching information from the database and writing it out.
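Main has to handle the -i and -o arguments, whose GetArgs defaults are the strings "stdin" and "stdout". A minimal sketch in standard C of mapping those names to the standard streams and opening anything else as a file; toolkit code would typically use StringCmp, FileOpen and FileClose instead, and OpenArgStream and CloseArgStream are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Return the standard stream when name is the GetArgs default,
 * otherwise open the named file. Returns NULL on failure. */
FILE *OpenArgStream(const char *name, const char *mode)
{
    if (name == NULL)
        return NULL;
    if (strcmp(name, "stdout") == 0)   /* strcmp returns 0 when they match */
        return stdout;
    if (strcmp(name, "stdin") == 0)
        return stdin;
    return fopen(name, mode);          /* NULL if missing or not permitted */
}

/* Close only streams this program opened, never stdin or stdout. */
void CloseArgStream(FILE *fp)
{
    if (fp != NULL && fp != stdin && fp != stdout)
        fclose(fp);
}
```

Checking for the standard streams before closing avoids accidentally closing stdin or stdout when the defaults are in effect.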
Task 1 – Add an output function that, given a single GI number, writes the sequence to a stream. This is the output function:
Boolean WriteSequenceToStream(FILE *fpStream, ReadDBFILEPtr rdbfp, Int4 Gi)
{
  CharPtr pcSequence = NULL;

  if (fpStream == NULL) {
    ErrPostEx(SEV_ERROR, 0, 0, "WriteSequenceToStream: Passed fpStream was NULL.");
    return FALSE;
  }
  if (rdbfp == NULL) {
    ErrPostEx(SEV_ERROR, 0, 0, "WriteSequenceToStream: Passed ReadDBFILEPtr was NULL.");
    return FALSE;
  }
  pcSequence = GetSeqByGI(rdbfp, Gi);
  if (pcSequence == NULL) {
    /* GI not found or lookup failed: passing NULL to %s below would be undefined */
    ErrPostEx(SEV_ERROR, 0, 0, "WriteSequenceToStream: No sequence for GI %ld.", (long) Gi);
    return FALSE;
  }
  /* can change this to break FASTA up into readable length lines according to Unix or MS-DOS standard formats.. */
  /* for now this just prints out one long string without any breaks in it */
  fprintf(fpStream, "%s\n", pcSequence);
  MemFree(pcSequence);
  return TRUE;
}
Your Main() must be altered from the previous version to do the following:
    /* this test returns 0=FALSE if they match */
    f = stdout;
  } else {
    f = FileOpen(myargs[arg_output_filename].strvalue, "w");
    if (f == NULL) {
      printf("Unable to open output stream %s\n", myargs[arg_output_filename].strvalue);
      return 2; /* you will hit this error if you don't have permission to write to the intended directory */
    }
  }
/* This library function reports mislabeled file errors by itself */
/* At this stage the program can retrieve ONE sequence by GI number */
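The comment inside WriteSequenceToStream suggests breaking the sequence into readable-length lines. A minimal sketch in standard C of that idea; WriteWrappedSequence is a hypothetical helper that takes the string GetSeqByGI returned plus a line width (fixed widths such as 60 or 80 residues are common choices):

```c
#include <stdio.h>
#include <string.h>

/* Write seq to fp in lines of at most width characters.
 * Returns the number of lines written, or -1 on bad arguments. */
int WriteWrappedSequence(FILE *fp, const char *seq, size_t width)
{
    size_t len, pos;
    int lines = 0;

    if (fp == NULL || seq == NULL || width == 0)
        return -1;
    len = strlen(seq);
    for (pos = 0; pos < len; pos += width) {
        size_t chunk = (len - pos < width) ? len - pos : width;
        /* %.*s prints at most chunk characters; the precision must be an int */
        fprintf(fp, "%.*s\n", (int) chunk, seq + pos);
        lines++;
    }
    return lines;
}
```

On the Unix versus MS-DOS question: a stream opened in text mode translates "\n" into the platform's line ending, so the helper does not need to write "\r\n" itself.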
The working example of the code up to this point is attached below as fetchseqs2.c.
Task 2 – Add a function that opens the input file (if specified) and reads GI numbers one line at a time, building a ValNode linked list.
ValNodePtr ProcessInputFile(FILE *fpStream)
{
  ValNodePtr pvnHead = NULL;
  Char pcBuf[100];
  CharPtr pcTest = NULL;
  long iGI = 0;

  do {
    pcBuf[0] = '\0';
    pcTest = fgets(pcBuf, (int) sizeof(pcBuf), fpStream);
    if (pcTest != NULL) {
      /* minimal checking: lines that do not start with a number (e.g. blank lines) are skipped */
      /* may change later to recognize Accession Numbers */
      if (sscanf(pcBuf, "%ld", &iGI) == 1)
        ValNodeAddInt(&pvnHead, 0, (Int4) iGI); /* This adds a link onto the linked list */
    }
  } while (pcTest != NULL);
  return pvnHead;
}
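ProcessInputFile relies on ValNodeAddInt to manage the list. To show what such a list involves underneath, here is a minimal sketch in plain C with a hypothetical IntNode type standing in for ValNode: it reads one number per line, skips lines sscanf cannot parse, appends at the tail so the list keeps file order, and frees every node afterwards (the job ValNodeFree does for a ValNode list):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ValNode holding an integer. */
typedef struct IntNode {
    long value;
    struct IntNode *next;
} IntNode;

/* Build a list from one integer per line of fp, in file order. */
IntNode *ReadIntList(FILE *fp)
{
    IntNode *head = NULL;
    IntNode **tail = &head;   /* the link the next node will be stored in */
    char buf[100];
    long value;

    if (fp == NULL)
        return NULL;
    while (fgets(buf, (int) sizeof buf, fp) != NULL) {
        if (sscanf(buf, "%ld", &value) != 1)
            continue;                 /* blank or non-numeric line */
        IntNode *node = malloc(sizeof *node);
        if (node == NULL)
            break;                    /* out of memory: keep what was read */
        node->value = value;
        node->next = NULL;
        *tail = node;                 /* append at the end */
        tail = &node->next;
    }
    return head;
}

/* Free every node; returns NULL so callers can write list = FreeIntList(list). */
IntNode *FreeIntList(IntNode *head)
{
    while (head != NULL) {
        IntNode *next = head->next;   /* save the link before freeing */
        free(head);
        head = next;
    }
    return NULL;
}
```

Keeping a pointer to the last link means each append is constant time instead of a walk down the whole list.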
if (fin == NULL) {
  fprintf(f, "Unable to open input stream %s\n", myargs[arg_input_filename].strvalue);
  goto bad_exit; /* you will hit this error if you specify the wrong file name or a missing file */
}
pvnList = ProcessInputFile(fin); /* Parses out GIs, one per line, and makes a linked list */
FileClose(fin);
while (pvnHere != NULL) {
  WriteSequenceToStream(f, rdbfp, (Int4) pvnHere->data.intvalue);
  pvnHere = pvnHere->next;
}
ValNodeFree(pvnList); /* free the linked list, if any */

The working example of the code up to this point is attached below as fetchseqs3.c.
Task 3 – Create functions that write deflines or PDB codes to the output stream, then call them depending on the value of the -r (Report) parameter.
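For WritePDBCodeToStream, one option is to fetch the defline for the GI and extract the PDB code from it. A minimal sketch, assuming the defline contains a field of the form pdb|1ABC|A (check the actual layout of the deflines in your formatted pdbaa database); ExtractPDBCode is a hypothetical helper, and the example defline in the comment is illustrative only:

```c
#include <stddef.h>
#include <string.h>

/* Copy the PDB code that follows "pdb|" in defline into out.
 * Illustrative input: "gi|1234|pdb|1ABC|A Chain A, ..." gives "1ABC".
 * Returns 1 on success, 0 if there is no "pdb|" field or out is too small. */
int ExtractPDBCode(const char *defline, char *out, size_t outlen)
{
    const char *start;
    size_t len;

    if (defline == NULL || out == NULL || outlen == 0)
        return 0;
    start = strstr(defline, "pdb|");
    if (start == NULL)
        return 0;
    start += 4;                              /* skip past "pdb|" */
    len = strcspn(start, "| \t\r\n");        /* code ends at a bar or whitespace */
    if (len == 0 || len >= outlen)
        return 0;                            /* empty code or no room for '\0' */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

WritePDBCodeToStream could then print the extracted code, or report an error when the defline has no PDB field.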
This handles the -r flag for the single GI passed in with -g:
if (myargs[arg_gi_number].intvalue > 0) {
  /* single GI value as argument - which kind of output? */
  if ((int) myargs[arg_report].intvalue == report_fasta)
    WriteSequenceToStream(f, rdbfp, myargs[arg_gi_number].intvalue);
  else if ((int) myargs[arg_report].intvalue == report_defline)
    WriteDeflineToStream(f, rdbfp, myargs[arg_gi_number].intvalue);
  else if ((int) myargs[arg_report].intvalue == report_acc)
    WritePDBCodeToStream(f, rdbfp, myargs[arg_gi_number].intvalue);
  goto done;
}
This handles the -r flag for the ValNode list of GIs passed in as a file:
pvnHere = pvnList;
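Both the -g branch and the ValNode-list branch need the same choice on -r. A minimal sketch of factoring that choice into a single switch; the Stub* writers are hypothetical stand-ins for WriteSequenceToStream, WriteDeflineToStream and WritePDBCodeToStream (which need the toolkit and a database), report_gi is a hypothetical name for option 3, and a plain array of GIs stands in for the ValNode list:

```c
#include <stdio.h>

/* Same values as the -r help text. */
enum ReportType { report_fasta = 0, report_defline = 1, report_acc = 2, report_gi = 3 };

/* Hypothetical stand-ins for the real writers. */
static void StubFasta(FILE *fp, long gi)   { fprintf(fp, "[FASTA for gi %ld]\n", gi); }
static void StubDefline(FILE *fp, long gi) { fprintf(fp, "[defline for gi %ld]\n", gi); }
static void StubPDBCode(FILE *fp, long gi) { fprintf(fp, "[PDB code for gi %ld]\n", gi); }

/* Write one report for gi; returns 0 for an unknown report type. */
int WriteReport(FILE *fp, int report, long gi)
{
    switch (report) {
    case report_fasta:   StubFasta(fp, gi);        break;
    case report_defline: StubDefline(fp, gi);      break;
    case report_acc:     StubPDBCode(fp, gi);      break;
    case report_gi:      fprintf(fp, "%ld\n", gi); break;
    default:             return 0;
    }
    return 1;
}

/* Apply the same report to every GI, as the list branch would.
 * Returns how many reports were written. */
size_t WriteReports(FILE *fp, int report, const long *gis, size_t n)
{
    size_t i, written = 0;

    for (i = 0; i < n; i++)
        written += (size_t) WriteReport(fp, report, gis[i]);
    return written;
}
```

In fetchseqs, the loop over pvnList would call the dispatcher with pvnHere->data.intvalue for each node, so both branches share one switch and a new report type touches only that switch.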
Christopher Hogue's Research > RCE in Mechanobiology Advanced Bioinformatics Software Development Workshop >