Sequence Alignment - BLAST

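BLAST, short for Basic Local Alignment Search Tool, compares the primary structure of sequences by searching a protein or DNA database for similar sequences. The October 12 class introduced four of the BLAST tools provided by NCBI: BLASTn, BLASTp, BLASTx, and Primer-BLAST.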

〝 The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. 〞

Standard Nucleotide BLAST (Blastn)

Blastn compares nucleotide sequences (nucleotide vs nucleotide) and offers three programs:

· Megablast: intra-species comparison, sequence identification

· Discontiguous megablast: cross-species comparison, searching with coding sequences

· Blastn: searching with shorter queries, cross-species comparison

For practice, I used the sequence obtained by sequencing a cloned 16S rDNA fragment.

The Blastn page is divided into three blocks: Enter Query Sequence, Choose Search Set, and Program Selection.

In the query box you can simply paste the sequence, or you can upload a FASTA file.

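A. Query subrange restricts the search to part of the query. For example, my uploaded sequence is about 1500 bp long, but if I only want to compare positions 42 to 1250, I can set that range in this field.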

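B. Here you choose the database to search. The default, nr (nt), searches all nucleotide sequences from GenBank + EMBL + DDBJ + PDB. Because I am comparing a bacterial 16S rDNA, I can set the database directly to 16S ribosomal RNA sequences.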

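C. Here you choose the organism to search against; in this case it can be set to bacteria. Ticking the checkbox after the field excludes the selected organism from the search instead of restricting the search to it. To search several organisms, click the + symbol after the field to add more entry fields.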
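The two query options above (FASTA input and Query subrange) can be mimicked offline. The following is only a minimal sketch: `read_fasta` and `query_subrange` are hypothetical helper names, the toy record is made up and is not the 16S rDNA clone from this post, and the sketch assumes the subrange positions are 1-based and inclusive.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def query_subrange(sequence, start, stop):
    """Return positions start..stop (assumed 1-based and inclusive)."""
    if not 1 <= start <= stop <= len(sequence):
        raise ValueError("subrange must lie within the sequence")
    return sequence[start - 1:stop]


# Hypothetical toy record, used only to show the mechanics.
fasta_text = ">clone_01 toy example\nACGTACGTAC\nGTTTGACCAA\n"
records = read_fasta(fasta_text)
full = records["clone_01 toy example"]
trimmed = query_subrange(full, 3, 12)
print(len(full), trimmed)
```

For the sequence in this post, the equivalent call would be `query_subrange(sequence, 42, 1250)`.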

Under Program Selection you choose the search mode. The three modes differ in speed and sensitivity and suit different tasks:

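· Megablast is suited to identifying the input sequence and to searching with longer input sequences

· Discontiguous megablast is suited to finding related sequences from other organisms

· Blastn is suited to shorter input sequences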

The options under Algorithm parameters adjust settings such as the sensitivity of the whole search.

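A sets the maximum number of matching sequences to return.

B is an automatic parameter adjustment intended for short query sequences.

C (the Expect threshold) filters out matches that are less significant than the set value.

Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported.

D sets the length of the "seed" word: only an exact match of at least this length triggers extension into a full alignment against that sequence. The smaller the word size, the more sensitive the search.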

〝 an exact match of the entire word is required before an extension is initiated, so that one normally regulates the sensitivity and speed of the search by increasing or decreasing the word-size.〞
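To make the effect of word size concrete, here is a minimal sketch of the seeding step only. It assumes plain exact word matching and skips the extension and scoring stages entirely, so it is an illustration of the idea rather than the actual BLAST implementation. `find_seeds` is a hypothetical helper and the two short sequences are made up.

```python
def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word of length
    word_size matches exactly in both sequences (0-based positions)."""
    # Index every word of the subject by its starting position.
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)
    # Look up every word of the query in that index.
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


# Two made-up sequences that differ at a single position.
query = "ACGTTGCA"
subject = "ACGTAGCA"

seeds_3 = find_seeds(query, subject, 3)
seeds_4 = find_seeds(query, subject, 4)
seeds_7 = find_seeds(query, subject, 7)
print(len(seeds_3), len(seeds_4), len(seeds_7))
```

With a word size of 7 the single mismatch is enough to leave no seed at all, so this pair would never be extended; with a word size of 3 several seeds are found, at the cost of more lookups and more candidate extensions.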

BLASTn results are presented in several parts: Graphic overview, description table, and alignment sections.

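· The description table provides the following information:

· Description/Title of matched sequence

· Max score / Total score: the alignment scores computed by the algorithm

· Query coverage: the percentage of the query sequence that is aligned to the matched sequence

· E value: the number of matches with a score at least this good that would be expected to occur by chance

· Max ident: the highest percent identity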

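Finally, the Accession column links to the GenBank record of each sequence. Ticking the checkbox in front of a title lets you download the selected sequences in different file formats. The BLAST results show that my sequence is very close to Candidatus Jettenia. Jettenia is one of the five genera of anaerobic ammonium oxidation (anammox) bacteria, and wastewater treatment built around anammox bacteria has in recent years become a popular research topic in environmental engineering for treating nitrogen-containing wastewater. The Alignment section shows in detail the base-by-base comparison between the query and each matched sequence.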
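Three of the columns in the description table can be reproduced by hand for a single alignment, which may help when reading the report. This is a simplified sketch with hypothetical function names and made-up inputs: the E value uses the textbook relation E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the database length, whereas the web report uses corrected (effective) lengths, so the numbers will not match exactly. Query coverage is computed here for one aligned segment only.

```python
def expect_value(bit_score, query_length, database_length):
    """Textbook estimate E = m * n * 2**(-S') for bit score S'."""
    return query_length * database_length * 2.0 ** (-bit_score)


def percent_identity(aligned_query, aligned_subject):
    """Identical positions divided by alignment length (gaps included)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(query_start, query_end, query_length):
    """Percent of the query spanned by one aligned segment (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


# Made-up numbers, chosen only to show the behaviour.
e_low_score = expect_value(40, 1500, 10**9)
e_high_score = expect_value(50, 1500, 10**9)
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
coverage = query_coverage(1, 9, 12)
print(e_low_score, e_high_score, identity, coverage)
```

Each additional bit of score halves the E value, and doubling the database size doubles it, which is why the same alignment can look less significant against a larger database.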

Standard Protein BLAST (BLASTP)

In Blastp, the Enter Query Sequence and Choose Search Set sections are much the same as in Blastn. Under Program Selection there are five modes to choose from:

· QuickBLASTP is an accelerated version of BLASTP that is very fast and works best if the target percent identity is 50% or more.

· BlastP simply compares a protein query to a protein database.

· PSI-BLAST allows the user to build a PSSM (position-specific scoring matrix) using the results of the first BlastP run.

· PHI-BLAST performs the search but limits alignments to those that match a pattern in the query.

· DELTA-BLAST constructs a PSSM using the results of a Conserved Domain Database search and searches a sequence database.

Blastp is the default option. It is a general-purpose mode that can be used to identify the query sequence or to find sequences similar to it.

Other BLAST programs

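The other BLAST programs each have their own use. Blastx identifies the protein products that a query DNA sequence may encode (translated nucleotide vs protein); tblastn finds DNA sequences that could encode a protein similar to the query amino acid sequence (protein vs translated nucleotide); tblastx identifies the query by comparing the coding potential of the query DNA sequence with that of the DNA sequences in the database.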
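The "translated" part of these programs means the nucleotide sequence is conceptually translated in all six reading frames before the protein-level comparison. A minimal sketch of that translation step, assuming the standard genetic code and using hypothetical helper names and a made-up 12-base sequence:

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCAAATAG")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

Blastx would then compare each of these six conceptual translations against a protein database; the comparison itself is not part of this sketch.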

Primer-Blast : A tool for finding specific primers

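Primer-BLAST is often used to predict the products of a query primer set. Here I tested a primer set targeting the Nitrospira nxrB gene. Nitrospira is the main group of nitrite-oxidizing bacteria (NOB) in wastewater treatment systems and plays an important role in removing ammonia nitrogen from wastewater. nxrB encodes one subunit of nitrite oxidoreductase and is often used as a marker gene for detecting Nitrospira or NOB.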

The results show that the PCR product of this primer set is 179 bp long and that the set can amplify the Nitrospira nxrB gene.
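The product-length prediction can be illustrated with a very small in-silico PCR. This sketch assumes perfect primer matches on a single template, which is much simpler than what Primer-BLAST does (it tolerates mismatches and checks specificity against a whole database). `predict_amplicon` is a hypothetical helper, and the template and primers are made up; they are not the nxrB primers used in this post.

```python
def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def predict_amplicon(template, forward_primer, reverse_primer):
    """Return (start, end, length) of the predicted product on the
    template's forward strand, or None if either primer has no exact site.
    start is 0-based and end is exclusive."""
    template = template.upper()
    fwd = forward_primer.upper()
    # The reverse primer anneals to the forward strand, so on the given
    # template its binding site appears as its reverse complement.
    rev_site = reverse_complement(reverse_primer.upper())
    start = template.find(fwd)
    if start == -1:
        return None
    site = template.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    end = site + len(rev_site)
    return start, end, end - start


# Made-up template: 4 flanking bases, forward site, spacer, reverse site, flank.
template = "TTTT" + "ACGTGACC" + "A" * 10 + "GGATCCTA" + "CCCC"
result = predict_amplicon(template, "ACGTGACC", "TAGGATCC")
print(result)
```

The reported length counts both primer sites plus the sequence between them, which is how an amplicon size such as the 179 bp above is normally stated.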

SPECIAL TOPICS ON MOLECULAR BIOLOGY INFORMATION RESOURCES
