iNquiry implemented by the Princeton Genomics Group: help

Description

The Princeton Genomics Computational Grid currently runs the iNquiry bioinformatics cluster tool on a set of Apple Xserves. iNquiry includes a port of Sun's GridEngine for Mac OS X and provides commonly used bioinformatics tools such as the EMBOSS software suite, BLAST, and HMMer. Several of these tools have been optimized for use on Mac OS X. iNquiry wraps the bioinformatics tools in a web interface via PISE. You can use the web interface, or you can run the programs on the command line. This page describes how to perform basic molecular biology/bioinformatics analyses using the tools available through our iNquiry installation.

Other cluster resources are also available.

Getting Started: the iNquiry web interface

Note to Safari web browser users: we have noticed some problems when using Safari to view frame-based pages, including these iNquiry pages. You might consider using Explorer or Netscape when using iNquiry on the web.

Requesting an account

Contact gridhelp@genomics.princeton.edu to request an account and access to the iNquiry system.

Page layout

A very basic introduction to the layout of the iNquiry web pages is available through iNquiry. It might be useful to keep that introduction page open while you go through this help document so that you can scroll through its screen shots. Basically, all the tools are listed in the left frame of the page, and the user and job administration tools are displayed across the top of the page.

Results: retrieval and storing

You will generally get results immediately via the web. The initial results page usually provides a link to the actual results file; some tools provide options to output results in multiple formats.

The results are stored on the server for 10 days, after which the jobs are deleted; you can use the Job History link at the top of the Home page to see the jobs that are saved. You can save the results files on your desktop by viewing the results and using the Save As option under the File menu of your browser.

Example Scenarios

Example 1: Generating a multiple sequence alignment (ClustalW)

I did a BLAST of the yeast Fpr3 protein against the nr database at NCBI and want to generate a multiple sequence alignment of the interesting hits with the ClustalW application via iNquiry. So, I log on to iNquiry and click on the ClustalW application (within the All Applications folder and also the Alignment::multiple folder).

ClustalW takes as input multiple fasta sequences and generates a multiple sequence alignment from them. I get the sequences from the BLAST results at NCBI, entering them into a file (or directly into the ClustalW form) in fasta format. Note: to get the sequences in fasta format, enter the "greater than" sign followed by a unique name as the first line for each sequence. For example (note that the numbers and spacing between amino acids are not required):

>cerevisiae
        1 msdllplaty slnvepytpv paidvtmpit vritmaalnp eaideenkps tlriikrnpd
       61 fedddflggd fdedeidees seeeeeektq kkkkskgkka eseseddeed ddeddefqes
      121 vlltlspeaq yqqsldltit peeevqfivt gsyaislsgn yvkhpfdtpm gvegededed
      181 adiydsedyd ltpdedeiig ddmddlddee eeevrieevq eedeedndge eeqeeeeeee
      241 qkeevkpepk kskkekkrkh eekeeekkak kvkkvefkkd leegptkpks kkeqdkhkpk
      301 skvleggivi edrtigdgpq akrgarvgmr yigklkngkv fdkntsgkpf afklgrgevi
      361 kgwdigvagm svggerriii papyaygkqa lpgipansel tfdvklvsmk n
>pombe
        1 mslpiavysl svkgkdvpav eestdasihl tmasidagek snkpttllvk vrpripvede
       61 ddeeldeqmq elleesqref vlctlkpgsl yqqplnltit pgdevffsas gdatihlsgn
      121 flvdeedeee eesdedydls pteedlvetv sgdeeseees esednsasee deldsapakk
      181 aqvkkkrtkd eseqeeaasp kknntkkqkv egtpvkekkv afaekleqgp tgpaakkekq
      241 qassnapssp ktrtlkggvv vtdvktgsga satngkkvem ryigklengk vfdkntkgkp
      301 fafilgrgev irgwdvgvag mqeggerkit ipapmaygnq sipgipknst lvfevklvrv
      361 h
>neurospora
        1 maplmpvavf glevppgeil ipaasefpai ihitmaaldp tkapeadgqg nipalprstl
       61 kiikatghdh ddddeeedey lqsllgggds ddeanggpsd pskskkakqe aaikklmaat
      121 qeesdeemed akpngkkgkg kgkasesdee esdeesdccg dddlqledyv vctldterny
      181 qqpinitige gekvffcvqg thsvyltgnf vvpeddeeds eddedesdde dydfplgged
      241 ddsddmsdel deldgtprvk eitsedeeee apklvdtskk gkkrpaedda egldamiskd
      301 dkklskkqqk kqkveeakke epkketksdk kvqfaknleq gptgpakdkl enkkptstvk
      361 vvqgvtiddr kvgtgraakn gdrvgmryig klqngkvfds nkkgapfsfk lgkgevikgw
      421 digvagmavg gerrltipah laygsralpg ippnstlifd vklleik
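
The position numbers and spaces in the listing above are optional. If a tool turns out to be stricter about its input, they can be stripped first. The following is a minimal sketch, assuming a standard unix sed is available; the function name clean_fasta and the demonstration record are hypothetical and not part of iNquiry.

```shell
#!/bin/sh
# clean_fasta: read FASTA text on stdin, write cleaned FASTA on stdout.
# Header lines (starting with ">") pass through unchanged; on every other
# line, digits and whitespace characters are removed.
clean_fasta() {
    sed '/^>/!s/[0-9[:space:]]//g'
}

# Demonstration with a shortened, made-up record:
printf '>demo\n        1 msdllplaty slnvepytpv\n       61 fedddflggd\n' | clean_fasta
```

To clean a whole file, redirect it through the function, for example: clean_fasta < hits_numbered.fasta > hits.fasta (both file names are placeholders).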

I can copy and paste the entire file directly into the Actual Data box on the ClustalW form, or upload the file. I then click the Submit clustalw button and receive a results page with a link to the alignment file (infile.aln); I click on the file to view my results:

CLUSTAL W (1.82) multiple sequence alignment


cerevisiae      MSDLLPLATYSLNVEPYTPVPAIDVTMPITVRITMAALNPEAIDEENKP--------STL
pombe           MS--LPIAVYSLSVKGKD-VPAVEESTDASIHLTMASID--AGEKSNKP--------TTL
neurospora      MAPLMPVAVFGLEVPPGEILIPAASEFPAIIHITMAALDPTKAPEADGQGNIPALPRSTL
                *:  :*:*.:.*.*     : .        :::***:::     : :          :**

cerevisiae      RIIKRNPDFEDDD-----------FLGGDFDEDE-------------------------I
pombe           -LVKVRPRIPVE----------------DEDDEE-------------------------L
neurospora      KIIKATGHDHDDDDEEEDEYLQSLLGGGDSDDEANGGPSDPSKSKKAKQEAAIKKLMAAT
                 ::*       :                * *::                           

cerevisiae      DEESSEEEEEEK-TQKKKKSKGKKAESESEDDEEDD----DEDDEFQESVLLTLSPEAQY
pombe           DEQMQELLEE---SQR-------------------------------EFVLCTLKPGSLY
neurospora      QEESDEEMEDAKPNGKKGKGKGKASESDEEESDEESDCCGDDDLQLEDYVVCTLDTERNY
                :*: .*  *:   . :                               : *: **..   *

cerevisiae      QQSLDLTITPEEEVQFIVTGSYAISLSGNYVKHPFDTPMGVEGEDEDEDADIYDSEDYDL
pombe           QQPLNLTITPGDEVFFSASGDATIHLSGNFLVD--------EEDEEEEESD----EDYDL
neurospora      QQPINITIGEGEKVFFCVQGTHSVYLTGNFVVPE------DDEEDSEDDEDESDDEDYDF
                **.:::**   ::* * . *  :: *:**::          : ::.::: *    ****:

cerevisiae      TPDEDEIIGDDMDDLDDEEEEEVRIEEVQEEDEEDNDGEEEQEEEEEEEQKEEVKPE---
pombe           SPTEEDLVETVSGDEESEEESESEDNSASEEDELDSAPAKKAQVKKKRTKDESEQEE---
neurospora      PLGGEDDDSDDMSDELDELDGTPRVKEITSEDEEEEAPKLVDTSKKGKKRPAEDDAEGLD
                .   ::      .*  .* :   . :.  .*** :.        :: . :  . . *   

cerevisiae      ---PKKSKKEKKRKHEEKEEEKKAKKV--------KKVEFKKDLEEGPTKPKSKKEQDKH
pombe           ---AASPKKNNTKKQK---VEGTPVKE--------KKVAFAEKLEQGPTGPAAKKEKQQA
neurospora      AMISKDDKKLSKKQQKKQKVEEAKKEEPKKETKSDKKVQFAKNLEQGPTGPAKDKLENKK
                   . . ** ..::::    *    :         *** * :.**:*** *  .* ::: 

cerevisiae      K------PKSKVLEGGIVIEDRTIGDGPQAKRGARVGMRYIGKLKNGKVFDKNTSGKPFA
pombe           SSNAPSSPKTRTLKGGVVVTDVKTGSGASATNGKKVEMRYIGKLENGKVFDKNTKGKPFA
neurospora      P-----TSTVKVVQG-VTIDDRKVGTGRAAKNGDRVGMRYIGKLQNGKVFDSNKKGAPFS
                       .. :.::* :.: * . * *  *..* :* *******:******.*..* **:

cerevisiae      FKLGRGEVIKGWDIGVAGMSVGGERRIIIPAPYAYGKQALPGIPANSELTFDVKLVSMKN
pombe           FILGRGEVIRGWDVGVAGMQEGGERKITIPAPMAYGNQSIPGIPKNSTLVFEVKLVRVH-
neurospora      FKLGKGEVIKGWDIGVAGMAVGGERRLTIPAHLAYGSRALPGIPPNSTLIFDVKLLEIK-
                * **:****:***:*****  ****:: ***  ***.:::**** ** * *:***: :: 
 
To get an even nicer alignment with boxed regions, colors, etc., I can copy and paste the above ClustalW results into another application, prettyplot. By default, prettyplot generates a png file; other file formats can be chosen in the Output section of the form.

EMBOSS links

Because many of the tools in the iNquiry package are part of the EMBOSS suite, links to the EMBOSS documentation are provided below:

Running iNquiry programs on the command line

Getting started

As with the web interface, you will need an account to access the iNquiry cluster via the command line. Once you have an account, use the ssh command to connect to genomics-grid.princeton.edu.

Tip: files transferred from a Mac or PC to a unix machine sometimes cause problems because the platforms use different end-of-line characters. If, for example, a fasta file transferred from a Mac is generating errors, open the file on the unix server to see whether Control-M's (carriage returns) are present. If so, there are several ways to get rid of them. One simple way is to use the unix 'tr' utility, which here translates each carriage return (octal 15) into a newline (octal 12), for example:

tr '\15' '\12' < original_fasta_with_controlM.fasta > new_fasta.fasta
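
The tr command above suits files where a lone carriage return ends each line. Files from a PC typically end each line with a carriage return followed by a newline, and translating those would leave a blank line after every line; deleting the carriage returns is the better fix there. The following is a minimal sketch that picks between the two; the function name fix_line_endings is hypothetical, and it assumes the file uses a single convention throughout.

```shell
#!/bin/sh
# fix_line_endings: $1 = input file, $2 = output file (must be different files).
# Writes a copy of the input with unix (newline-only) line endings.
fix_line_endings() {
    # Count the newline characters already present in the input.
    lf_count=$(tr -cd '\n' < "$1" | wc -c)
    if [ "$((lf_count))" -gt 0 ]; then
        # Newlines exist, so any carriage return is part of a CR+LF pair:
        # delete the carriage returns.
        tr -d '\r' < "$1" > "$2"
    else
        # No newlines at all: every carriage return is a line ending,
        # so translate each one into a newline.
        tr '\r' '\n' < "$1" > "$2"
    fi
}

# Usage (file names taken from the example above):
#   fix_line_endings original_fasta_with_controlM.fasta new_fasta.fasta
```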

Running one job on the command line

To run an individual job on the command line, simply type the name of the program that you wish to run along with any relevant parameters. For example, to run BLASTp using a query protein sequence against the nr database, you can enter the following:
blastall -p blastp -d /common/data/nr -i vps8.fasta -a 2 > vps8.txt ;
html4blast -o vps8.html -g -e vps8.txt

In this example, the query protein sequence is in a file named vps8.fasta. blastall writes the BLASTp results to a file called vps8.txt; html4blast then uses vps8.txt as input to generate an HTML-formatted results file named vps8.html.
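
Because the two commands above are joined with a semicolon, html4blast runs even if blastall fails. A possible variation, sketched below, joins them with && so that the HTML step runs only after a successful search. The wrapper name run_blastp and the rule for deriving output names from the query file name are assumptions for illustration; the blastall and html4blast options are copied from the example above.

```shell
#!/bin/sh
# run_blastp: $1 = query FASTA file, e.g. vps8.fasta
# Writes <name>.txt (plain BLAST output) and <name>.html (formatted output).
run_blastp() {
    base=${1%.*}    # strip the extension: vps8.fasta -> vps8
    # "&&" runs html4blast only if blastall exits successfully.
    blastall -p blastp -d /common/data/nr -i "$1" -a 2 > "$base.txt" &&
        html4blast -o "$base.html" -g -e "$base.txt"
}

# Usage on the cluster:
#   run_blastp vps8.fasta
```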

Tip: use the web interface first to help you set the parameters properly. When you submit a job using the web interface, the results page will tell you the exact unix command that was used. You can then use that to help generate a batch query on the command line.

Sending batch jobs

The easiest way to submit batch jobs is to create a simple shell script containing all the commands that you want to execute, then run the shell script.

A sample shell script:

#!/bin/sh
# Run blastpgp on each query against the nr database; each command must be on a single line.
/usr/local/biotools/bin/blastpgp -i fpr1p.fsa -d /common/data/nr -e 1 -o fpr1.2.txt

/usr/local/biotools/bin/blastpgp -i tfc3.fasta -d /common/data/nr -e 1 -o tfc3.2.txt

/usr/local/biotools/bin/blastpgp -i vps8.fasta -d /common/data/nr -e 1 -o vps8.2.txt

/usr/local/biotools/bin/blastpgp -i efb1.fasta -d /common/data/nr -e 1 -o efb1.2.txt

/usr/local/biotools/bin/blastpgp -i ssa1.fasta -d /common/data/nr -e 1 -o ssa1.2.txt
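
When there are many query files, typing one line per file becomes tedious. The following is a minimal sketch of a helper that prints a batch script like the sample above for any list of query files. The function name write_blast_script is hypothetical, and the output naming rule (query name without its extension, plus .2.txt) is an assumption: it would name the first sample's output fpr1p.2.txt rather than fpr1.2.txt. File names containing spaces are not handled.

```shell
#!/bin/sh
# write_blast_script: print a batch script that runs blastpgp once per
# query file given as an argument. Paths and options are copied from the
# sample script above.
write_blast_script() {
    printf '%s\n' '#!/bin/sh'
    for query in "$@"; do
        base=${query%.*}    # strip the extension: vps8.fasta -> vps8
        printf '%s\n' "/usr/local/biotools/bin/blastpgp -i $query -d /common/data/nr -e 1 -o $base.2.txt"
    done
}

# Demonstration: print the script for two of the queries used above.
write_blast_script vps8.fasta ssa1.fasta
```

Redirecting the output to a file, for example write_blast_script *.fasta > my_batch.sh (a placeholder name), gives a script that can be inspected before it is run.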

Executing a shell script

To submit your shell script to the cluster, use the qsub command, for example:
qsub ./Sample_shell_script.sh

Checking your job

There are several commands that let you check on the status of a job that is running:

List your job:

qstat
Example:
user@portal2net:~$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
-------------------------------------------------------------------------------------------
   1118     0 proteinSys user         r     04/01/2005 10:55:00 node03.q   MASTER
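
Rather than rerunning qstat by hand, a script can poll it until a job disappears from the list. The following is a minimal sketch under two assumptions: that qstat prints the job ID in the first column, as in the example above, and that a finished job is no longer listed. The function names job_listed and wait_for_job are hypothetical.

```shell
#!/bin/sh
# job_listed: read qstat output on stdin; succeed if job ID $1 appears
# in the first column of any line.
job_listed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# wait_for_job: poll qstat until job $1 is no longer listed.
# $2 is the polling interval in seconds (default 30).
wait_for_job() {
    while qstat | job_listed "$1"; do
        sleep "${2:-30}"
    done
}

# Usage on the cluster (job ID taken from the qstat example above):
#   wait_for_job 1118 && echo "job 1118 is no longer listed"
```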

Check on details of your job (after getting the job id from qstat):

qstat -j jobid
Example:
user@portal2net:~$ qstat -j 111
job_number:                 111
exec_file:                  job_scripts/111
submission_time:            Fri Apr  1 10:54:58 200
owner:                      user
uid:                        102
group:                      user
gid:                        102
sge_o_home:                 /Users/user
sge_o_log_name:             user
sge_o_path:                 /sw/bin:/sw/sbin:/usr/local/bin:/usr/X11R6/bin:/common/sge/bin/darwin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/biotools/bin:/common/biotools/bin:/usr/X11R6/bin:/Users/postgres/current/pgsql/bi
sge_o_mail:                 /var/mail/user
sge_o_shell:                /bin/bas
sge_o_workdir:              /Users/user
sge_o_host:                 portal2net
account:                    sg
mail_list:                  user@portal2net.cluster.private
notify:                     FALSE
job_name:                   proteinSysCalls.s
script_file:                ./proteinSysCalls.s
usage    1:                  cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/
scheduling info:            There are no messages available

Delete your job:

qdel jobid
