If one or more of the targets was missing, the sample was eliminated (Additional file 1: Table S7). The final data set consisted of 63 of the original 84 samples (63% of asymptomatically colonized stool samples, 80% of diarrheal stool samples, 73% of xenic cultures and 84% of amebic liver aspirates) that passed quality control and had the greater than 8-fold sequence
coverage needed to confidently call SNPs. The libraries generated from stool samples and from polyxenic cultures contained a greater number of reads that did not map to the E. histolytica amplicons than those obtained from amebic liver abscess aspirates. This was likely due in part to off-target amplification of gut flora (Figure 1),
or to a reduction in specificity, because most of these samples did not undergo nested PCR amplification prior to library preparation. Unlike pyogenic abscesses, amebic liver abscess aspirates have no associated bacterial flora, so a higher proportion of their template DNA is E. histolytica. Figure 1 Amplicon sequencing efficiency for individual samples. A) Number of reads obtained from the Illumina libraries prepared from different sample sources; x-axis, sample source; y-axis, number of reads (log2 scale). B) Average coverage of the reads when mapped to the concatenated amplicon reference; x-axis, sample source; y-axis, average coverage of mapped reads (log2 scale). The line indicates the median number of reads. In the samples that passed quality control, the read depth for individual SNPs was >8x coverage, which was considered adequate for SNP verification. SNPs were scored as described in materials and
methods. The results of the Illumina sequencing and the presence of predicted and novel SNPs within the amplicon sequences were tabulated as homozygous Reference (the same as the reference HM-1:IMSS sequence at this position), heterozygous (containing both the HM-1:IMSS nucleotide and the variant nucleotide at this position) or homozygous Non-Reference (only the variant base at this position) (Additional file 1: Table S8). Figure 2 shows the diversity of the SNPs at each locus both in the original sequence data (genomes shown in Table 1) and in the Bangladeshi samples analyzed in this study (further details in Additional file 1: Table S9). Figure 2 Similarity of E. histolytica diversity in Bangladeshi and whole-genome-sequenced strains. The y-axis shows the calculated heterozygosity (H), obtained by subtracting the sum of the squared allele frequencies from 1; the x-axis shows the loci containing the SNPs genotyped by MSLT (■, values in the Bangladeshi samples genotyped during this study; □, values in the sequenced genomes described in Table 1). Our work supports previous findings of extensive diversity among E.
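The heterozygosity plotted in Figure 2 is H = 1 − Σ pᵢ², where pᵢ is the frequency of allele i at a locus. The short Python sketch below computes H from a list of observed alleles at one locus. How heterozygous samples contribute allele observations is an assumption here (the caller simply lists every observed allele), and the function name `heterozygosity` is hypothetical, not part of the authors' pipeline.

```python
from collections import Counter


def heterozygosity(alleles):
    """Return H = 1 - sum(p_i^2) for the allele labels observed at one locus.

    `alleles` is a hypothetical input: one entry per allele observation
    (e.g. ["A", "A", "G"]); how samples are converted into observations
    is left to the caller.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one allele observation")
    # Sum of squared allele frequencies, subtracted from 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Two alleles at equal frequency give H = 0.5; a fixed allele gives H = 0.
print(heterozygosity(["A", "A", "G", "G"]))
print(heterozygosity(["C", "C", "C"]))
```

With two alleles at frequencies 0.75 and 0.25, H = 1 − (0.5625 + 0.0625) = 0.375, so a locus dominated by one allele scores lower than an evenly split one.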
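The quality-control and scoring rules described above (drop a sample if any target is missing or under-covered; call each SNP as homozygous Reference, heterozygous or homozygous Non-Reference once depth exceeds 8x) can be sketched as follows. This is an illustration under stated assumptions, not the scoring procedure from the materials and methods: the per-target coverage input, the 0.2 minimum allele fraction used to count a base as present, and the names `sample_passes_qc` and `call_genotype` are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def sample_passes_qc(coverage_by_target: Dict[str, float],
                     required_targets: Iterable[str],
                     min_coverage: float = 8) -> bool:
    """Hypothetical QC: every target must be present with coverage > min_coverage."""
    # A missing target counts as zero coverage, so the sample is eliminated.
    return all(coverage_by_target.get(t, 0) > min_coverage for t in required_targets)


def call_genotype(ref_base: str, base_counts: Dict[str, int],
                  min_depth: int = 8, min_fraction: float = 0.2) -> Optional[str]:
    """Classify one SNP position relative to the reference (e.g. HM-1:IMSS) base.

    min_fraction is an assumed threshold for treating a base as present.
    Returns None when depth is not above min_depth.
    """
    depth = sum(base_counts.values())
    if depth <= min_depth:  # the text requires >8x read depth per SNP
        return None
    # Bases supported by at least min_fraction of the reads
    supported = {b for b, c in base_counts.items() if c / depth >= min_fraction}
    has_ref = ref_base in supported
    has_alt = any(b != ref_base for b in supported)
    if has_ref and has_alt:
        return "heterozygous"
    if has_alt:
        return "homozygous Non-Reference"
    if has_ref:
        return "homozygous Reference"
    return None


print(sample_passes_qc({"t1": 20, "t2": 9}, ["t1", "t2"]))
print(call_genotype("A", {"A": 10, "G": 10}))
```

A fixed allele-fraction cutoff is only one way to separate true heterozygous sites from sequencing noise; the threshold the study actually used may differ.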