<P> BLAST uses a heuristic method to find similar sequences by locating short matches between the two sequences. This process of finding initial short matches is called seeding. Only after this first match does BLAST begin to build local alignments. Sets of common letters, known as words, are central to this search. For example, suppose the sequence contains the stretch of letters GLKFA. If a BLAST search were conducted under normal conditions, the word size would be 3 letters, so the words searched for would be GLK, LKF, and KFA. The heuristic algorithm of BLAST locates all common three-letter words between the sequence of interest and the hit sequence or sequences from the database. This result is then used to build an alignment. After the words for the sequence of interest are made, neighborhood words are also assembled. Each neighborhood word must score at least the threshold T when compared with a word from the sequence of interest using a scoring matrix. One commonly used scoring matrix for BLAST searches is BLOSUM62, although the optimal scoring matrix depends on sequence similarity. Once both words and neighborhood words are assembled and compiled, they are compared to the sequences in the database in order to find matches. The threshold score T determines whether or not a particular word will be included in the alignment. </P> <P> Once seeding has been conducted, the alignment, which is only 3 residues long, is extended in both directions by the algorithm used by BLAST. Each extension changes the score of the alignment, either increasing or decreasing it. If this score is higher than a pre-determined T, the alignment will be included in the results given by BLAST. However, if this score is lower than this pre-determined T, the alignment will cease to extend, preventing areas of poor alignment from being included in the BLAST results.
Note that increasing T shrinks the search space by decreasing the number of neighborhood words, which in turn speeds up BLAST. </P> <P> To run, BLAST requires a query sequence to search for, and either a sequence to search against (also called the target sequence) or a sequence database containing multiple such sequences. BLAST finds subsequences in the database that are similar to subsequences in the query. In typical usage, the query sequence is much smaller than the database; e.g., the query may be one thousand nucleotides while the database is several billion nucleotides. </P> <P> The main idea of BLAST is that a statistically significant alignment often contains high-scoring segment pairs (HSPs). BLAST searches for high-scoring sequence alignments between the query sequence and the existing sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm. The exhaustive Smith-Waterman approach, however, is too slow for searching large genomic databases such as GenBank. The BLAST algorithm therefore uses a heuristic approach that is less accurate than the Smith-Waterman algorithm but over 50 times faster. (8) The speed and relatively good accuracy of BLAST are among the key technical innovations of the BLAST programs. </P> <P> An overview of the BLAST algorithm (a protein to protein search) is as follows: </P>
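<P> The seeding step described earlier (three-letter words, neighborhood words, and the threshold T) can be illustrated with a minimal Python sketch. It is not the BLAST implementation: the function names <code>words</code>, <code>toy_score</code>, <code>neighborhood</code>, and <code>find_seeds</code> are hypothetical, and <code>toy_score</code> is an assumed stand-in for a real scoring matrix such as BLOSUM62, chosen only to keep the example self-contained. </P>

```python
from itertools import product

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def words(seq, k=3):
    """Return all overlapping k-letter words of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_score(a, b):
    """ASSUMPTION: a toy scoring rule, not BLOSUM62.

    Identical residues score +5, any mismatch scores -1.
    """
    return 5 if a == b else -1


def neighborhood(word, threshold, alphabet=AMINO_ACIDS, score=toy_score):
    """Return every word of the same length scoring >= threshold against word."""
    return [
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, candidate)) >= threshold
    ]


def find_seeds(query, target, threshold, k=3):
    """Return (query_pos, target_pos) pairs where a target word falls in the
    neighborhood of a query word. These pairs are the seeds."""
    # Index every neighborhood word by the query positions that produced it.
    index = {}
    for qpos, word in enumerate(words(query, k)):
        for neighbor in neighborhood(word, threshold):
            index.setdefault(neighbor, []).append(qpos)

    # Scan the target once, looking each of its words up in the index.
    seeds = []
    for tpos, word in enumerate(words(target, k)):
        for qpos in index.get(word, []):
            seeds.append((qpos, tpos))
    return seeds


print(words("GLKFA"))                               # ['GLK', 'LKF', 'KFA']
print(find_seeds("GLKFA", "MMGLKFAMM", threshold=15))
```

<P> With this toy scoring, T = 15 admits only exact three-letter matches, while T = 9 also admits words with one mismatch. This mirrors the trade-off noted above: a higher T yields fewer neighborhood words and a faster but less sensitive search. </P>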

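<P> For contrast, the exhaustive approach that BLAST approximates can be sketched as a small dynamic program. The following is a minimal illustration of Smith-Waterman local alignment scoring, assuming a simple match/mismatch scheme and a linear gap penalty (the parameter values and the function name <code>smith_waterman_score</code> are assumptions, not taken from any BLAST release). It fills one cell for every pair of positions in the two sequences, which is why the method becomes slow against very large databases. </P>

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    ASSUMPTION: simple match/mismatch scoring with a linear gap penalty;
    real searches would use a substitution matrix and affine gaps.
    """
    prev = [0] * (len(b) + 1)   # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GLKFA", "MMGLKFAMM"))   # 10
```

<P> Every cell of the table is computed regardless of whether the two regions look alike, whereas a seeded search only extends alignments around short word matches. </P>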
What is the major advantage of BLAST over Smith-Waterman?
find me the text answering this question