A phylogenetic tree (Additional File 1) was also generated from the same data using the dnaml (maximum likelihood) program of the PHYLIP package version 3.6 [18]. Node pairings which discriminated between subspecies or clades were selected for the development of diagnostic typing assays. Criteria used to select SNP locations for the assay were: 1. The SNP location must cleanly differentiate the two nodes of interest. Within each of the nodes, all of the member strains must share the same base call at the location, and the two nodes must differ at the location. 2. The sequences downstream of the SNP location must be in sufficient agreement among all strains
from both nodes so that an appropriate primer can be chosen from the consensus sequence (the consensus at the primer location may not contain “N” calls or any conflicting base calls). 3. The primer sequences must have melting temperatures within a specific limited range (60°C to 70°C). 4. The predicted PCR product size must be within the range 150 to 500 bp. We developed a set of programs to identify candidate SNP locations for the real-time PCR (RT-PCR) assay. SNPTree uses the phylogenetic tree and
the multi-FASTA files from the resequencing experiments as input, assigns arbitrary node numbers to all nodes in the tree, and produces a set of multi-FASTA files, one for each node in the tree, of the consensus base calls for each node. The consensus call is “N” unless all members of a particular node share the same base call at that location. The program also produces a set of files, one for each node, listing the base calls
that occur at every SNP location, for all SNP positions detected within the entire set of 40 samples (19,897 locations). The program CompareNodes uses the SNP list files for any two nodes and produces a list of SNP locations that cleanly differentiate the two nodes (criterion 1 above). The program CreatePrimer3 uses a list of discriminating SNP locations and the multi-FASTA files for two nodes, and creates an input file for the Primer3 program [19]. CreatePrimer3 also chooses the 5′-forward primers, which are constrained by the locations of the SNPs. The Primer3 software [19] is then used to identify appropriate 3′-reverse primers. The Primer3 program enforces the last three criteria listed above. This process resulted in the design of a large number of primers for candidate SNP locations that may be used as diagnostic markers for most node pairs. The final set of SNP markers/locations we used was selected manually by identifying primers distributed over the entire genome. The programs SNPTree, CompareNodes and CreatePrimer3 were developed at the J. Craig Venter Institute specifically for this study and are freely available for download at ftp://ftp.jcvi.org/pub/software/pfgrc/SNPTree/SNPTreePackage.tar.gz.
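The node consensus rule and the first selection criterion described above can be illustrated with a short sketch. This is not the SNPTree or CompareNodes code. The function names and the toy sequences are hypothetical, and the sketch assumes that all sequences are already aligned to the same length. The last function only restates the numeric limits of criteria 3 and 4, which in the actual workflow are enforced by Primer3.

```python
from typing import List


def node_consensus(seqs: List[str]) -> str:
    """Consensus of aligned member sequences for one node.

    A position gets a base only if every member shares that call;
    otherwise it is reported as 'N'.
    """
    if not seqs:
        raise ValueError("node has no member sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*seqs):
        bases = {b.upper() for b in column}
        consensus.append(bases.pop() if len(bases) == 1 else "N")
    return "".join(consensus)


def discriminating_positions(cons_a: str, cons_b: str) -> List[int]:
    """0-based positions where both nodes have an unambiguous call and differ."""
    if len(cons_a) != len(cons_b):
        raise ValueError("consensus sequences must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(cons_a, cons_b))
        if a != "N" and b != "N" and a != b
    ]


def within_assay_limits(primer_tms: List[float], product_size: int) -> bool:
    """Check the stated limits: primer Tm 60-70 degrees C, product 150-500 bp."""
    tm_ok = all(60.0 <= tm <= 70.0 for tm in primer_tms)
    return tm_ok and 150 <= product_size <= 500


# Toy data (hypothetical): three aligned strains per node.
node_a = ["ACGTAC", "ACGTAC", "ACGTAT"]
node_b = ["ACCTAC", "ACCTAC", "ACCTAC"]

cons_a = node_consensus(node_a)  # "ACGTAN": the last column is not unanimous
cons_b = node_consensus(node_b)  # "ACCTAC"
snps = discriminating_positions(cons_a, cons_b)  # [2]
print(cons_a, cons_b, snps)
```

In this toy example position 2 qualifies because every strain in the first node carries G and every strain in the second carries C. Position 5 is excluded because the first node is not unanimous there, so its consensus call is “N”.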