BIRLA INSTITUTE OF SCIENTIFIC RESEARCH, JAIPUR

Summer Training May-July, 2010



Day 1

Accessing Bioinformatics Resources and Databases

Objective

The objective of this exercise is to make students aware of the biological information and databases available over the internet, and to show how to download the desired information from these databases. Numerous websites deal with bioinformatics-related information, but only a limited number of dedicated servers offer a range of services to the scientific community; such servers are referred to as "metaservers". The major information repositories include NCBI, EBI, ExPASy and DDBJ. In this exercise we will access the NCBI website for different information and resources.

Lab 1: Searching the literature

For the present exercise we take "Pyruvate kinase" as the research subject. The purpose of this exercise is to search the research literature in PubMed, a service of the National Library of Medicine that contains many millions of records in abstract form and links to the journal articles where these are available online.

Steps:

  • Go to the NCBI home page by clicking on this link
    (http://www.ncbi.nlm.nih.gov)

  • Select the "PubMed" database from the drop-down list of databases. In the blank space provided after "for", type "Pyruvate kinase". Now click "Go".

  • The result page reports 8244 records, of which only 20 are visible (this number may vary, as the database is updated regularly). You can change the number of visible records by selecting the desired option from the "show" field near the top of the web page. To look at the abstract of an article, select "abstract" from the "display" option.

  • To download an abstract to your PC, select the checkbox of the abstract of interest and then select the "file" option from the "send to" field. A dialogue box will appear asking you to save the file. Click OK.
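
The same search can also be expressed as a web address rather than as clicks on the page. The sketch below is not part of the original exercise: it assumes NCBI's E-utilities "ESearch" service and its db, term and retmax parameters, and the function name build_pubmed_search_url is a hypothetical helper introduced only for illustration. It builds the address without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumed base address of NCBI's E-utilities search service (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_records=20):
    """Return an ESearch address that looks up `term` in PubMed.

    `max_records` plays a role similar to the "show" field on the web
    page: it caps how many record identifiers come back in one response.
    """
    params = {"db": "pubmed", "term": term, "retmax": max_records}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Printing the address does not contact NCBI; paste it into a
    # browser to run the search and see the matching record IDs.
    print(build_pubmed_search_url("Pyruvate kinase"))
```

Opening the printed address in a browser should return the record count and a list of PubMed identifiers; as on the web page, the count will differ from the 8244 quoted above because the database keeps growing.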

Lab 2: Searching for Nucleotide and Protein Sequence

In this exercise we will fetch the nucleotide(s) and protein sequence(s) of "Pyruvate kinase" from NCBI.

Steps:

  • Go to the NCBI home page by clicking on this link
    (http://www.ncbi.nlm.nih.gov)

  • From the database options select the "Nucleotide" or "Protein" database and type "Pyruvate kinase" in the query space. Now click "Go".

  • Options similar to those discussed above are available. Here, however, we will download the sequence in FASTA format. Select the checkbox of the specific record and select the "fasta" option from the display field. This converts the records from summary to FASTA format. You can now download the files to a folder on your PC by selecting the "file" option from the "send to" field.

  • For proteins the search shows 4304 items, and for nucleotides 3169 items (records).

  • You can restrict your search by organism. For example, if you are interested only in human sequences, type Pyruvate kinase AND human[organism] in the query field. The search will then return only human pyruvate kinase sequences, so it will show only 88 sequences.

  • The Entrez query engine accepts Boolean searches, that is, the use of the AND, OR and NOT operators within a query.

  • You can limit your search by field, organism, gene location, dates etc.; go to the "limit" option for further details.
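
The two ideas in the steps above, combining query words with Boolean operators and saving sequences in FASTA format, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: build_entrez_term and parse_fasta are hypothetical helper names, not NCBI tools, and the short FASTA text in the usage example is made up rather than a real database record.

```python
def build_entrez_term(keyword, organism=None, exclude=None):
    """Combine a keyword with optional Boolean clauses.

    `organism` adds an AND clause restricted to the organism field;
    `exclude` adds a NOT clause for a word that must not appear.
    """
    term = keyword
    if organism:
        term += " AND " + organism + "[organism]"
    if exclude:
        term += " NOT " + exclude
    return term


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


if __name__ == "__main__":
    print(build_entrez_term("Pyruvate kinase", organism="human"))
    # Made-up FASTA text, standing in for a file saved via "send to".
    example = ">example_record made-up sequence\nATGGCT\nTGGTAA\n"
    print(parse_fasta(example))
```

A FASTA file is plain text: each record starts with a ">" header line followed by one or more lines of sequence, which is why the parser simply joins the lines that follow each header.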

Assignment:

  • Search for the following nucleotide and protein sequences:

    • Prohibitin from Drosophila melanogaster (hint: 47 proteins)

    • Insulin from human (hint: 2232 proteins)

  • Explore other resources available at NCBI, such as genomes, taxonomy, books etc.

Lab 3: Working with ORFs - Discovering Genes

Steps:

  • Retrieve the DNA fragment with accession no. NM_001743 (the current version is 4).

  • Go to
    http://www.ncbi.nlm.nih.gov/gorf/gorf.html

  • Paste the downloaded sequence into the open space, or type the above accession number into the field. Click to find the ORFs.

  • The result contains the six reading frames of the given sequence. Now carefully check all the reading frames for the longest ORFs. In most cases, the longest ORF gives the correct frame for the amino acid sequence.

Note: Bear in mind that the longest ORF is not always the correct frame for the amino acid sequence, so confirm the frame by performing a BLAST search.

  • Now select the longest ORF by clicking on it in the graphical pane. The corresponding amino acid sequence will appear below the DNA sequence. Here you can accept the ORF or go on checking alternative options.

  • Once you have accepted the ORF, you will be shown a form (just above) for identifying homologous sequences using BlastP against the 'nr' database. Without changing the parameters, click on BLAST. This will take you to the BLAST form. Click on View Report to produce the BLAST result for your query. You can then look into the details of your query sequence and its homologs, if any, and gather information related to its function.

Note: This is a very common criterion for checking whether a query sequence may be a gene. This is how one can functionally annotate a newly sequenced DNA sequence. Even at NCBI, if curators find good similarity with an existing database sequence, they tag the sequence with a putative function.
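
To see what "six reading frames" and "longest ORF" mean in practice, the search can be sketched in Python. This is a simplified illustration, not the method used by the NCBI ORF finder: it assumes the standard genetic code, counts an ORF only if it runs from an ATG to an in-frame stop codon, and uses hypothetical function names (reverse_complement, find_orfs, longest_orf). The test sequence is made up.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna):
    """Yield (frame, start, end, protein) for each ATG...stop ORF.

    frame is +1..+3 on the given strand and -1..-3 on its reverse
    complement; start and end are 0-based positions on the strand read.
    """
    dna = dna.upper()
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            start, protein = None, []
            for pos in range(offset, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = unknown codon
                if start is None:
                    if aa == "M":  # ATG opens a new ORF
                        start, protein = pos, ["M"]
                elif aa == "*":  # stop codon closes the ORF
                    yield strand * (offset + 1), start, pos + 3, "".join(protein)
                    start = None
                else:
                    protein.append(aa)


def longest_orf(dna):
    """Return the ORF with the longest protein, or None if there is none."""
    return max(find_orfs(dna), key=lambda orf: len(orf[3]), default=None)


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGGTAAGG"))  # (3, 2, 14, 'MAW')
```

As the note above warns, the longest ORF found this way is only a candidate; its protein sequence still has to be checked against known sequences with BLAST.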
