9 between both datasets. To identify highly expressed transcripts and their putative functions, we selected the 100 most abundant transcripts based on their RPKM values in the CP and CS datasets, and investigated the biological processes in which those transcripts might be involved. Although many transcripts (15
in CP and 23 in CS) could not be assigned to known biological processes, most (52 in CP and 51 in CS) were involved in stress response and protein metabolism, including pathogenesis-related proteins, antioxidant enzymes, heat-shock proteins, and metallothionein-like proteins in the stress response category, and translation- and protein degradation-related proteins in the protein metabolism category (Fig. 4). After these, transcripts related to lipid metabolism, such as fatty acid desaturases and lipid transfer proteins, were most abundant. Ginsenosides are the most important phytochemicals in ginseng and are known to be synthesized through the mevalonic acid pathway [24]. We focused on downstream enzymes from farnesyl diphosphate synthase (FDS) to UDP-glycosyltransferase (UGT) in the mevalonic acid pathway (Fig. 5A). In previous studies, 17 genes for the seven downstream enzymes (FDS to protopanaxatriol synthase) have been reported in P. ginseng [25], [26], [27], [28], [29], [30], [31] and [32] (Table 2). We used amino acid sequences of the 17 genes as queries for TBLASTN searches against transcript
datasets of the CP cultivar, resulting in the identification of 10 genes encoding the seven downstream enzymes. Of them, a single transcript for FDS was identified with 15 isoforms in the CP dataset (Table 2). Squalene synthase, dammarenediol synthase, β-amyrin synthase,
protopanaxadiol synthase (CYP716A47), and protopanaxatriol synthase (CYP716A53v2) were each also found to be encoded by a single transcript with several isoforms. As an exception, four transcripts were identified for squalene epoxidase. Although we identified the isoforms using a reliable algorithm (Trinity assembler), the forthcoming P. ginseng genome sequence will provide more solid information about them. Based on our analysis, we consider that the isoforms of each transcript are likely to originate from a single gene. To investigate the expression levels of the transcripts, the RPKM values of isoforms from the same transcript were averaged and compared (Fig. 5B). All transcripts showed similar expression levels in the CP and CS cultivars, with the transcripts encoding the cytochrome P450 protopanaxatriol synthase showing the highest expression in both cultivars. Because UGT genes for ginsenoside biosynthesis had not been identified in P. ginseng, three UGT proteins, SvUGT74M1, MtUGT73K1, and MtUGT71G1, were used as queries for TBLASTN searches. These three UGT proteins were reported to function in triterpene saponin biosynthesis in Medicago truncatula and Saponaria vaccaria [33] and [34].
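
The per-transcript averaging of isoform RPKM values described above can be illustrated with a minimal Python sketch. It assumes a simple arithmetic mean over isoforms. The function name, the input layout, and all numeric values are hypothetical placeholders chosen for illustration and are not taken from the CP or CS datasets.

```python
from collections import defaultdict
from statistics import mean


def average_isoform_rpkm(isoform_rpkm):
    """Average RPKM over all isoforms of each transcript.

    isoform_rpkm: iterable of (transcript_id, isoform_id, rpkm) tuples.
    Returns a dict mapping transcript_id to the arithmetic mean RPKM.
    """
    grouped = defaultdict(list)
    for transcript_id, _isoform_id, rpkm in isoform_rpkm:
        grouped[transcript_id].append(rpkm)
    return {tid: mean(values) for tid, values in grouped.items()}


# Hypothetical isoform-level RPKM values (NOT real data from the study).
cp_isoforms = [("FDS", "seq1", 10.0), ("FDS", "seq2", 14.0), ("PPTS", "seq1", 80.0)]
cs_isoforms = [("FDS", "seq1", 11.0), ("FDS", "seq2", 15.0), ("PPTS", "seq1", 76.0)]

cp_mean = average_isoform_rpkm(cp_isoforms)
cs_mean = average_isoform_rpkm(cs_isoforms)

# Compare the averaged expression of each transcript between cultivars.
for tid in sorted(cp_mean):
    print(f"{tid}\tCP={cp_mean[tid]:.1f}\tCS={cs_mean[tid]:.1f}")
```

Grouping by transcript before averaging keeps a transcript with many isoforms, such as FDS, from being over-represented when expression is compared between cultivars.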