iNquiry implemented by the Princeton Genomics Group: help
- Description
- Getting Started
- Description of all the bioinformatics applications
- Example scenarios
- Multiple sequence alignment (ClustalW)
- EMBOSS links
- Running iNquiry programs on the command line
The Princeton Genomics Computational Grid currently runs the iNquiry bioinformatics cluster tool on a set of Apple Xserves. iNquiry includes a port of Sun's GridEngine for Mac OS X and provides commonly used bioinformatics tools such as the EMBOSS software suite, BLAST, and HMMer. Several of these tools have been optimized for use on Mac OS X. iNquiry wraps the tools in a web interface via PISE. You can use the web interface, or you can run the programs on the command line. This page describes how to perform basic molecular biology/bioinformatics analyses using the tools available through our iNquiry installation.
Other cluster resources are also available.
Results are stored on the server for 10 days; jobs older than 10 days are deleted. Use the Job History link at the top of the Home page to see the jobs that are currently saved. To keep a results file, view it in your browser and save it to your desktop with the Save As option under the File menu.
ClustalW takes multiple fasta sequences as input and generates a multiple
sequence alignment from them. I get the sequences from the BLAST results at
NCBI and enter them into a file (or directly into the ClustalW form) in
fasta format. Note: to put the sequences in fasta format, enter the
"greater than" sign (>) followed by a unique name as the first line of each
sequence. For example (note that the numbers and the spacing between amino
acids are not required):
>cerevisiae
1 msdllplaty slnvepytpv paidvtmpit vritmaalnp eaideenkps tlriikrnpd
61 fedddflggd fdedeidees seeeeeektq kkkkskgkka eseseddeed ddeddefqes
121 vlltlspeaq yqqsldltit peeevqfivt gsyaislsgn yvkhpfdtpm gvegededed
181 adiydsedyd ltpdedeiig ddmddlddee eeevrieevq eedeedndge eeqeeeeeee
241 qkeevkpepk kskkekkrkh eekeeekkak kvkkvefkkd leegptkpks kkeqdkhkpk
301 skvleggivi edrtigdgpq akrgarvgmr yigklkngkv fdkntsgkpf afklgrgevi
361 kgwdigvagm svggerriii papyaygkqa lpgipansel tfdvklvsmk n
>pombe
1 mslpiavysl svkgkdvpav eestdasihl tmasidagek snkpttllvk vrpripvede
61 ddeeldeqmq elleesqref vlctlkpgsl yqqplnltit pgdevffsas gdatihlsgn
121 flvdeedeee eesdedydls pteedlvetv sgdeeseees esednsasee deldsapakk
181 aqvkkkrtkd eseqeeaasp kknntkkqkv egtpvkekkv afaekleqgp tgpaakkekq
241 qassnapssp ktrtlkggvv vtdvktgsga satngkkvem ryigklengk vfdkntkgkp
301 fafilgrgev irgwdvgvag mqeggerkit ipapmaygnq sipgipknst lvfevklvrv
361 h
>neurospora
1 maplmpvavf glevppgeil ipaasefpai ihitmaaldp tkapeadgqg nipalprstl
61 kiikatghdh ddddeeedey lqsllgggds ddeanggpsd pskskkakqe aaikklmaat
121 qeesdeemed akpngkkgkg kgkasesdee esdeesdccg dddlqledyv vctldterny
181 qqpinitige gekvffcvqg thsvyltgnf vvpeddeeds eddedesdde dydfplgged
241 ddsddmsdel deldgtprvk eitsedeeee apklvdtskk gkkrpaedda egldamiskd
301 dkklskkqqk kqkveeakke epkketksdk kvqfaknleq gptgpakdkl enkkptstvk
361 vvqgvtiddr kvgtgraakn gdrvgmryig klqngkvfds nkkgapfsfk lgkgevikgw
421 digvagmavg gerrltipah laygsralpg ippnstlifd vklleik
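The numbers and spacing are optional, so no cleanup is required before submitting. If you would rather keep a plain sequence file, the sketch below shows one way to strip them on the command line. The function name clean_fasta is made up for this illustration and is not part of iNquiry. It assumes that header lines start with ">" and that everything other than digits and whitespace on the remaining lines is sequence.

```shell
# clean_fasta: hypothetical helper, not part of iNquiry.
# Reads fasta-like text on stdin and writes it to stdout with position
# numbers, spaces, tabs, and carriage returns removed from sequence lines.
clean_fasta() {
    awk '
        /^>/ { print; next }            # header line: pass through unchanged
        {
            gsub(/[0-9 \t\r]/, "")      # drop digits and whitespace
            if (length($0) > 0) print   # skip lines that end up empty
        }
    '
}

# Demonstration on a fragment of the pombe entry shown above.
printf '>pombe\n  1 mslpiavysl svkgkdvpav\n361 h\n' | clean_fasta
```

For a whole file, the same function could be used as: clean_fasta < sequences.fasta > sequences_clean.fasta (both file names are placeholders).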
I can copy and paste the entire file directly into the Actual
Data box on the ClustalW form or upload the file.
I then click the Submit clustalw button, and receive the
results page with a link to the alignment file (infile.aln); I click
on the file to view my results:
CLUSTAL W (1.82) multiple sequence alignment
cerevisiae MSDLLPLATYSLNVEPYTPVPAIDVTMPITVRITMAALNPEAIDEENKP--------STL
pombe MS--LPIAVYSLSVKGKD-VPAVEESTDASIHLTMASID--AGEKSNKP--------TTL
neurospora MAPLMPVAVFGLEVPPGEILIPAASEFPAIIHITMAALDPTKAPEADGQGNIPALPRSTL
*: :*:*.:.*.* : . :::***::: : : :**
cerevisiae RIIKRNPDFEDDD-----------FLGGDFDEDE-------------------------I
pombe -LVKVRPRIPVE----------------DEDDEE-------------------------L
neurospora KIIKATGHDHDDDDEEEDEYLQSLLGGGDSDDEANGGPSDPSKSKKAKQEAAIKKLMAAT
::* : * *::
cerevisiae DEESSEEEEEEK-TQKKKKSKGKKAESESEDDEEDD----DEDDEFQESVLLTLSPEAQY
pombe DEQMQELLEE---SQR-------------------------------EFVLCTLKPGSLY
neurospora QEESDEEMEDAKPNGKKGKGKGKASESDEEESDEESDCCGDDDLQLEDYVVCTLDTERNY
:*: .* *: . : : *: **.. *
cerevisiae QQSLDLTITPEEEVQFIVTGSYAISLSGNYVKHPFDTPMGVEGEDEDEDADIYDSEDYDL
pombe QQPLNLTITPGDEVFFSASGDATIHLSGNFLVD--------EEDEEEEESD----EDYDL
neurospora QQPINITIGEGEKVFFCVQGTHSVYLTGNFVVPE------DDEEDSEDDEDESDDEDYDF
**.:::** ::* * . * :: *:**:: : ::.::: * ****:
cerevisiae TPDEDEIIGDDMDDLDDEEEEEVRIEEVQEEDEEDNDGEEEQEEEEEEEQKEEVKPE---
pombe SPTEEDLVETVSGDEESEEESESEDNSASEEDELDSAPAKKAQVKKKRTKDESEQEE---
neurospora PLGGEDDDSDDMSDELDELDGTPRVKEITSEDEEEEAPKLVDTSKKGKKRPAEDDAEGLD
. :: .* .* : . :. .*** :. :: . : . . *
cerevisiae ---PKKSKKEKKRKHEEKEEEKKAKKV--------KKVEFKKDLEEGPTKPKSKKEQDKH
pombe ---AASPKKNNTKKQK---VEGTPVKE--------KKVAFAEKLEQGPTGPAAKKEKQQA
neurospora AMISKDDKKLSKKQQKKQKVEEAKKEEPKKETKSDKKVQFAKNLEQGPTGPAKDKLENKK
. . ** ..:::: * : *** * :.**:*** * .* :::
cerevisiae K------PKSKVLEGGIVIEDRTIGDGPQAKRGARVGMRYIGKLKNGKVFDKNTSGKPFA
pombe SSNAPSSPKTRTLKGGVVVTDVKTGSGASATNGKKVEMRYIGKLENGKVFDKNTKGKPFA
neurospora P-----TSTVKVVQG-VTIDDRKVGTGRAAKNGDRVGMRYIGKLQNGKVFDSNKKGAPFS
.. :.::* :.: * . * * *..* :* *******:******.*..* **:
cerevisiae FKLGRGEVIKGWDIGVAGMSVGGERRIIIPAPYAYGKQALPGIPANSELTFDVKLVSMKN
pombe FILGRGEVIRGWDVGVAGMQEGGERKITIPAPMAYGNQSIPGIPKNSTLVFEVKLVRVH-
neurospora FKLGKGEVIKGWDIGVAGMAVGGERRLTIPAHLAYGSRALPGIPPNSTLIFDVKLLEIK-
* **:****:***:***** ****:: *** ***.:::**** ** * *:***: ::
To get an even nicer alignment with boxed regions, colors, etc., I can copy and
paste the above clustalw results into another application, prettyplot.
By default, prettyplot generates a png file, but other file
formats are available; you can choose one in the Output
section of the form.
As with the web interface, you will need an account to access the
iNquiry cluster via the command line. Once you have an account, use
the ssh command to connect to: genomics-grid.princeton.edu
Tip: files transferred from a Mac or PC to a unix machine sometimes cause problems because the platforms use different end-of-line characters. If, for example, a fasta file transferred from a Mac generates errors, open the file on the unix server and look for Control-M's (carriage returns). If they are present, there are several ways to get rid of them. One simple way is the unix 'tr' utility, which in this example replaces each carriage return (octal 15) with a newline (octal 12):
tr '\15' '\12' < original_fasta_with_controlM.fasta > new_fasta.fasta
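The tr command above turns every carriage return into a newline, which suits files whose lines end in a carriage return alone (classic Mac). Files from a PC usually end each line with a carriage return followed by a newline, so the same command would leave an empty line after every line. The sketch below is one way to handle both cases. The function name normalize_eol is made up for this illustration and is not part of iNquiry.

```shell
# normalize_eol: hypothetical helper, not part of iNquiry.
# Reads text on stdin and writes it to stdout with unix (LF) line endings.
normalize_eol() {
    cr=$(printf '\r')    # a literal carriage return (Control-M)
    # Step 1 (sed): CR LF -> LF, by dropping a CR at the end of a line.
    # Step 2 (tr):  any remaining lone CR -> LF (classic Mac files).
    sed "s/${cr}\$//" | tr '\r' '\n'
}

# Demonstration on a two-line, PC-style (CR LF) fasta fragment.
printf '>vps8\r\nMSDLLPLATY\r\n' | normalize_eol
```

For a whole file: normalize_eol < original_fasta_with_controlM.fasta > new_fasta.fasta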
blastall -p blastp -d /common/data/nr -i vps8.fasta -a 2 > vps8.txt ; html4blast -o vps8.html -g -e vps8.txt
In this example, the query protein sequence is in a file named vps8.fasta. blastall runs BLASTp on two processors (-a 2) and writes the results to vps8.txt; html4blast then reads vps8.txt and writes an HTML-formatted results file named vps8.html.
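Because the two commands above are joined with a semicolon, html4blast runs even if blastall fails. The sketch below is one way to run the second step only when the first succeeds, and to derive the output names from the query file name. The function name blast_to_html is made up for this illustration; the program options are the ones used in the example above.

```shell
# blast_to_html: hypothetical wrapper, not part of iNquiry.
# Usage: blast_to_html vps8.fasta   (writes vps8.txt and vps8.html)
blast_to_html() {
    query=$1
    base=${query%.*}    # file name without its extension, e.g. vps8

    # Stop early with a clear message if blastall is not available.
    if ! command -v blastall >/dev/null 2>&1; then
        echo "blastall not found on PATH" >&2
        return 127
    fi

    # "&&" runs html4blast only if blastall exits successfully.
    blastall -p blastp -d /common/data/nr -i "$query" -a 2 > "$base.txt" &&
        html4blast -o "$base.html" -g -e "$base.txt"
}
```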
Tip: use the web interface first to help you set the parameters properly. When you submit a job using the web interface, the results page will tell you the exact unix command that was used. You can then use that to help generate a batch query on the command line.
#!/bin/sh
/usr/local/biotools/bin/blastpgp -i fpr1p.fsa -d /common/data/nr -e 1 -o fpr1.2.txt
/usr/local/biotools/bin/blastpgp -i tfc3.fasta -d /common/data/nr -e 1 -o tfc3.2.txt
/usr/local/biotools/bin/blastpgp -i vps8.fasta -d /common/data/nr -e 1 -o vps8.2.txt
/usr/local/biotools/bin/blastpgp -i efb1.fasta -d /common/data/nr -e 1 -o efb1.2.txt
/usr/local/biotools/bin/blastpgp -i ssa1.fasta -d /common/data/nr -e 1 -o ssa1.2.txt
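The batch script above repeats one blastpgp line per query file. For a longer list of queries, a script like it could be generated rather than typed by hand. The sketch below is one way to do that. The function name make_blast_script is made up for this illustration, and the output naming (query name without its extension, plus .2.txt) is an assumption based on most of the lines above.

```shell
# make_blast_script: hypothetical helper, not part of iNquiry.
# Prints a batch script with one blastpgp line per query file given as an argument.
make_blast_script() {
    echo '#!/bin/sh'
    for query in "$@"; do
        base=${query%.*}    # strip the extension: vps8.fasta -> vps8
        echo "/usr/local/biotools/bin/blastpgp -i $query -d /common/data/nr -e 1 -o $base.2.txt"
    done
}

# Demonstration: print a script for three of the queries used above.
make_blast_script vps8.fasta efb1.fasta ssa1.fasta
```

Redirect the output to a file (for example make_blast_script *.fasta > my_batch.sh, where my_batch.sh is a placeholder name) and submit that file with qsub.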
qsub ./Sample_shell_script.sh
qstat
Example:
user@portal2net:~$ qstat
job-ID  prior  name        user  state  submit/start at      queue     master  ja-task-ID
-------------------------------------------------------------------------------------------
   1118     0  proteinSys  user  r      04/01/2005 10:55:00  node03.q  MASTER
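In a script it can be handy to pull the state of a single job out of the qstat listing. The sketch below is one way to do that with awk. The function name job_state is made up for this illustration, and it assumes the column layout of the example above, where the job ID is the first column and the state is the fifth. In GridEngine a state of r means the job is running.

```shell
# job_state: hypothetical helper, not part of iNquiry or GridEngine.
# Reads qstat output on stdin and prints the state column for the given job ID.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Demonstration using the example listing above instead of a live qstat call.
printf '%s\n' \
    'job-ID prior name user state submit/start at queue master ja-task-ID' \
    '-------------------------------------------------------------------' \
    '1118 0 proteinSys user r 04/01/2005 10:55:00 node03.q MASTER' |
    job_state 1118
```

On the cluster the same function would be used as: qstat | job_state 1118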
qstat -j jobid
Example:
user@portal2net:~$ qstat -j 111
job_number:       111
exec_file:        job_scripts/111
submission_time:  Fri Apr 1 10:54:58 200
owner:            user
uid:              102
group:            user
gid:              102
sge_o_home:       /Users/user
sge_o_log_name:   user
sge_o_path:       /sw/bin:/sw/sbin:/usr/local/bin:/usr/X11R6/bin:/common/sge/bin/darwin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/biotools/bin:/common/biotools/bin:/usr/X11R6/bin:/Users/postgres/current/pgsql/bi
sge_o_mail:       /var/mail/user
sge_o_shell:      /bin/bas
sge_o_workdir:    /Users/user
sge_o_host:       portal2net
account:          sg
mail_list:        user@portal2net.cluster.private
notify:           FALSE
job_name:         proteinSysCalls.s
script_file:      ./proteinSysCalls.s
usage 1:          cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/
scheduling info:  There are no messages available
qdel jobid