Codon usage and evolutionary dynamics of genetic diversity of novel imported porcine reproductive and respiratory syndrome virus in China

Porcine reproductive and respiratory syndrome (PRRS) has a significant economic impact on the global pig industry. In recent years, pork imports into China have increased, contributing to the emergence of novel imported porcine reproductive and respiratory syndrome virus (PRRSV) sub-types. Nevertheless, the codon usage patterns of these newly imported PRRSV sub-types, and their effects on viral evolution and host adaptation, remain elusive. To investigate this, we employed a Bayesian approach to analyze two novel imported PRRSV sub-types, namely NADC30-like and NADC34-like viruses. These sub-types have different codon preferences. In addition, the Effective Number of Codons (ENC) analysis revealed that both NADC30-like and NADC34-like viruses fall within the expected curve distribution, indicating balanced codon usage in both. Based on the Codon Adaptation Index (CAI), NADC30-like showed the highest similarity to the host, aligning with the main prevalence trend in the host. In contrast, NADC34-like exhibited the highest frequency of optimal codon usage, as measured by the Frequency of Optimal Codons (FOP). Moreover, the Relative Codon Deoptimization Index (RCDI) indicates that the NADC30-like sub-type has a greater degree of codon deoptimization. These findings suggest that mutational pressure affects the codon usage preferences of genes in newly imported PRRSV, and that natural selection plays a vital role in determining PRRSV gene codon preferences. Our study provides new insights into the disease, origin, evolutionary patterns, and host adaptation of these newly imported PRRSV sub-types in China. It also contributes to the development of theoretical frameworks for studying the genetics and evolution of PRRSV.


Introduction
Porcine reproductive and respiratory syndrome virus (PRRSV) is a major issue affecting the global pig industry, resulting in substantial economic losses each year [1]. PRRSV infection destroys the whole lymphocyte population in infected animals, thereby eliminating or impairing the adaptive immune response and resulting in immunosuppression through various mechanisms. This makes infected pigs more susceptible to other pathogens, leading to subsequent infections that are harmful to the animals; PRRSV itself is genetically diverse [2]. Since the first report of PRRS in China in 1995, it has become one of the significant infectious diseases threatening the swine industry in this country. In recent years, several novel imported PRRSV strains have emerged, including the NADC30-like strains in 2013 [3] and the NADC34-like strains in 2017 [4]. Moreover, the NADC34-like strain designated PRRSV/CN/FJGD01/2021 was first identified in 2022. This recombinant virus originates from the combination of NADC30-, NADC34-, and JXA1-like isolates. In 2006, a novel PRRSV variant (HP-PRRSV) emerged in China, inflicting devastating damage on the country's pig industry [3]. It is recognized to cause moderately serious respiratory symptoms and significant histopathological damage to the lungs in piglets [5].
A codon is the smallest functional unit within a DNA or RNA sequence responsible for encoding amino acids during protein translation. Under ideal conditions, without selective pressure and neutral mutations, synonymous codons should theoretically occur with equal probability [6,7]. However, numerous studies have demonstrated that the use of synonymous codons is a non-random process [8] and that some codons are used preferentially over others during translation [9][10][11]. This phenomenon of codon usage preference reflects not only simple random mutations in base composition, but also functional constraints that can explain, at a biological level, why codon usage patterns differ between species [12]. It also helps explain why similar codon selection strategies might be employed, thus complementing the molecular data with biological significance.
It is presumed that codon preference is the result of long-term evolution and is influenced by multiple factors, including the external environment and internal molecular mutations during the evolutionary process. Studying codon preference and the factors leading to its formation is therefore essential for understanding the genome characteristics, molecular evolution, and ecological adaptation of novel imported PRRSV. In recent years, NADC30-like and NADC34-like strains have been spreading in China. To help prevent outbreaks of these two PRRSV sub-types, we examined the codon usage of NADC30-like and NADC34-like strains identified in China. Our findings revealed that their codon usage preferences are significantly shaped by natural selection.

Phylogenetic analysis of novel imported PRRSV sub-types
To assess the genetic relationships between the two novel imported PRRSV sub-types, namely the NADC30-like and NADC34-like strains, we conducted a phylogenetic analysis. This analysis included fully sequenced HP-PRRSV strains. The results clearly illustrated that the NADC30-like and NADC34-like strains clustered in two distinct branches, indicated by different colors (Fig. 1).

Construction of phylogenies based on the Bayesian Markov Chain method
The codon mutation rates of the GP5 structural protein genes of the two novel PRRSV sub-types, NADC30-like and NADC34-like, were calculated using the Bayesian Markov Chain method. The results showed that the rates for the NADC30-like strain (CP1 + 2: 0.77; CP3: 1.461; Fig. 2A) were distinct from those for the NADC34-like strain (CP1 + 2: 0.619; CP3: 1.763; Fig. 3A), indicating divergence. Since the third codon position has the highest mutation rate, and some of these mutations do not change the encoded amino acid, a high degree of homology is retained between the emerging imported PRRSV sub-types. Additionally, the geographical distribution of the two novel imported PRRSV sub-types, namely the NADC30-like and NADC34-like strains, has been progressively expanding. The skyline plot illustrates that since 2018, the effective population size of the NADC30-like strain has declined in comparison with its peak in 2013 (Fig. 2B, C). Moreover, the NADC34-like strain was first detected in China in 2017 and has shown an upward trend through 2021 (Fig. 3B, C).

Nucleotide bias of novel imported PRRSV sub-types
Among the two novel imported PRRSV sub-types and their antigenic variants, the highest %T ratios were determined in the NADC30-like and NADC34-like strains, with values of 29.560 ± 1.224 and 28.754 ± 0.701 (means ± SD), respectively (Table 1). With regard to synonymous codons, the mean values of C3s (0.360 ± 0.021) and of the third codon position (0.402 ± 0.022) were the highest (Table 1). The NADC30-like and NADC34-like strains had comparable patterns of synonymous codon composition at the third position. Apart from that, the GC content at the different synonymous codon positions was essentially the same in the NADC30-like and NADC34-like strains. GC3s in both strains were substantially higher than GC1s and GC2s, and in both sub-types GC3s > GC1s > GC2s.
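As an illustration of how position-specific GC content of this kind can be obtained, the following minimal Python sketch computes the G + C fraction at the first, second, and third codon positions of an in-frame coding sequence. It is not the authors' pipeline: the function name `gc_by_codon_position` is hypothetical, and excluding ATG, TGG, and stop codons from all three positions is an assumption about how the "s" (synonymous) indices are defined.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons are
# skipped; applying this exclusion to all three positions is an assumption.
SINGLE_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_by_codon_position(seq):
    """Return (GC1s, GC2s, GC3s) as fractions for an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    gc = [0, 0, 0]
    used = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Ignore excluded codons and codons with ambiguous bases.
        if codon in SINGLE_OR_STOP or set(codon) - set("ACGT"):
            continue
        used += 1
        for pos, base in enumerate(codon):
            if base in "GC":
                gc[pos] += 1
    if used == 0:
        raise ValueError("no usable codons in sequence")
    return tuple(count / used for count in gc)


# Hypothetical toy sequence: GCC, GCA, ATG (ATG is skipped).
gc1s, gc2s, gc3s = gc_by_codon_position("GCCGCAATG")
```

Averaging the first two values gives GC12s, the quantity plotted against GC3s in the neutrality analysis.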

Measurement of codon bias
The ENC values for the NADC30-like and NADC34-like strains were 59.223 ± 1.729 and 56.714 ± 2.387, respectively. All ENC values were higher than 35, indicating balanced codon usage with low codon bias (Table 1).

Relative synonymous codon usage analysis of the two novel imported PRRSV sub-type genomes
Relative synonymous codon usage (RSCU) analysis is a common method for studying synonymous codon usage patterns. In our study, we noticed that 11 of the most commonly used synonymous codons are G/C-terminal codons (8 ending in C), and the numbers of T-terminal, A-terminal, and G-terminal codons are 4, 3, and 3, respectively (Table 2). Importantly, four of the 18 preferred synonymous codons have RSCU values greater than 1.6, the conventional threshold for over-representation (RSCU < 0.6 indicates under-representation) (Fig. 4). Next, we analyzed the tendency of synonymous codons in clades and found that G/C-terminal codons are more prevalent than A/T-terminal codons. We subsequently explored the relationship between the NADC30-like and NADC34-like strains and their natural host, pigs. We determined that the 18 most abundant host codons were essentially identical to those of the NADC30-like and NADC34-like strains. These data demonstrate that, across the set of 18 preferred synonymous codons, the majority of synonymous codon preferences of the two novel imported PRRSV sub-types are identical, which is important for exploring receptor binding in different sub-types of PRRSV.

Comparison of codon usage between the virus and the host
Regarding the Codon Adaptation Index (CAI), both novel imported PRRSV sub-types had CAI values converging to 0, indicating a low codon bias (Fig. 5A). The Relative Codon Deoptimization Index (RCDI) compares the codon distribution of a gene with the general codon distribution of the host and is used as an indicator of translation rate. Among the virus genes tested, the NADC30-like strain had the highest RCDI value (Fig. 5B); in general, the greater the similarity to the host genes, the higher the translation rate.
The Frequency of Optimal Codons (FOP) represents the proportion of codons in a gene that are optimal codons, that is, the codons most frequently used in a species' highly expressed genes. A comparison of the FOP values showed that the NADC34-like strain had a higher value than the NADC30-like strain, indicating more efficient codon usage as well as a stronger codon preference (Fig. 5C). As a PRRSV sub-type that has emerged in recent years, the NADC30-like strain has CAI, RCDI, and FOP values that indicate a robust foreign gene expression ability and superior codon usage frequency compared with the novel imported PRRSV sub-type strains.

ENC-GC3s plot analysis
The effect of GC3s on codon preference was studied using ENC-GC3s plots. As shown in Fig. 6, the ENC-GC3s distributions of the two novel imported PRRSV sub-types are relatively similar and closely fit the expected curve. These data indicate that, in the absence of any natural selection, codon preference is subject only to mutational pressure, as in the ideal state.

GC12s/GC3s neutrality plot analysis
Neutrality analysis using GC12 and GC3 provides a quantitative means of assessing the effects of mutational pressure and natural selection. We observed a negative correlation between GC12s and GC3s in the NADC30-like strain (slope of regression line −0.3562, R² = 0.0128), whereas the NADC34-like strain exhibited a positive correlation (slope of regression line 0.7706, R² = 0.0544) (Fig. 7). It is important to note, however, that neither correlation was statistically significant. This suggests that natural selection is the primary factor shaping codon bias in the two novel imported PRRSV sub-types, particularly NADC30-like. Taken together, these observations suggest that mutational pressure has an effect on the codon usage preferences of the two newly imported PRRSV sub-type genes, and that natural selection plays an essential or even dominant role in the generation of codon preferences.

Discussion
PRRSV is a positive-stranded RNA virus with a genome approximately 15 kb in length that contains at least 10 open reading frames (ORFs), including ORF1a, ORF1b, ORF2a, ORF2b, ORF3, ORF4, ORF5, ORF5a, ORF6, and ORF7 [1]. PRRSV-2 is widespread in China and has a substantial impact on the pig farming industry. Chinese PRRSV-2 strains can be divided into four sub-types: VR-2332-like (lineage 5), JXA1-like/CH-1a-like (lineage 8), QYYZ-like (lineage 3) and NADC30-like (lineage 1) [13,14]. Recently, NADC34-like isolates appeared in the United States, and both sow herds and piglets showed relatively high mortality [15]. The NADC34-like PRRSV strain was first reported in China (Shenyang) in 2017 and in Peru in 2019 [16]; it has also been isolated from MLV-inoculated pigs in South Korea. In addition, a recombinant PRRSV derived from NADC30-, NADC34-, and JXA1-like strains was reported in 2022 [5]. To explore the evolutionary features, host adaptation, and codon usage of distinct imported PRRSV sub-types, we conducted MCC phylogenetic analysis, identified two major imported PRRSV sub-types (NADC30-like and NADC34-like strains), and for the first time characterized the evolutionary dynamics of novel imported PRRSV sub-types. Studying codon usage preference helps to enhance the efficiency of target gene expression in the recipient expression system, because the translation and expression of target genes in host cells depend on the expression system of those cells.
If the target gene introduced into the host cell contains numerous rare codons that are not frequently used by the host cell's expression system, the expression level of the target gene can decrease or translation can terminate prematurely [17]. Codon optimization has emerged as one of the most successful and efficient methods for enhancing expression when ectopic genes are expressed in host cells [18]. The study of codon preference provides valuable insights into the evolutionary relationships between species and reflects the fundamental principles governing biological evolution. The degree of genetic relatedness between species and the extent of differences in codon preference are closely correlated. Analysis of mutational pressure and translational selection provides significant insights into the characteristics of an organism's genes or gene families, molecular evolutionary pressure, and ecological adaptation. Furthermore, understanding codon preference is crucial for several purposes, including identifying highly expressed genes, determining whether gene repression or horizontal gene transfer is taking place, and discovering novel genes.
Previous studies have not only validated the bat origin of PCV3 with greater accuracy but have also successfully identified various sub-types of PCV3. Consequently, PCV3 has replaced PCV2 and secured its position as the single-stranded DNA virus exhibiting the highest substitution rate known to date [19]. Natural selection is the main factor affecting the codon usage preference of the PCV2 clade [20]. In the analysis of synonymous codon use, 59 of the four sub-types of DENV indicated similar overall trends. Some codons, which are paired with low tRNA copy numbers in both primate species, tend to occur more frequently in the translation start area than in the rest of the open reading frame in DENV [21]. Codon usage bias was lower for the ICV HE gene, and the main factor shaping codon usage was natural selection [22]. Mutation is the main factor in the codon usage preference of the PPV gene, and natural selection plays a leading role in the codon usage pattern [23].
In this study, we conducted a comprehensive analysis using CAI, RCDI, and FOP to compare two newly imported sub-type strains of PRRSV. The results revealed a high expression capability of foreign genes and an optimal frequency of codon usage in both strains. These findings provide strong evidence of their adaptability in the host and suggest a potential correlation with virus-host infection [12]. While numerous studies look for similarities between virus and host genomes, most of these similarities are more closely related to a biological function than to codon usage [12]. However, it is not unusual for viruses and hosts to differ in codon usage bias in ways that correlate with, for example, the range of hosts infected, host translation, and virulence [24][25][26]. It has been demonstrated that codon usage is more similar among related viruses than between viral and host genomes [27]. These insights are crucial for advancing our understanding of this virus and provide important guidelines for studying the infectivity and pathogenicity of the novel imported PRRSV sub-type strains. The effect of GC3s on codon preference was explored using the ENC-GC3s plot, which demonstrated that the codon preference of the two novel imported PRRSV sub-type strains is affected only by mutational pressure in the ideal state. Neutrality analyses based on GC12 and GC3 can quantitatively assess the effects of mutational pressure and natural selection; they showed that mutational pressure affects the codon usage preference of the genes of the two novel imported PRRSV sub-types, and that natural selection plays an essential or even dominant role in shaping codon preference. There are subtle differences between the two PRRSV sub-types, one with a higher CAI and the other with a higher FOP, yet it is not clear whether either has an evolutionary advantage. Nonetheless, whether one sub-group becomes more dominant over time is also worth exploring and following up [28]. Overall, novel imported PRRSV sub-types are spreading in China, and the analysis of codon bias indicates that the novel imported PRRSV sub-type strains are better adapted to the host for enhanced transmission.
To conclude, our study has successfully identified the impact of natural selection and mutational pressure on the codon usage preferences of two newly imported PRRSV sub-type strains. These findings shed light on the evolutionary dynamics and origins of these viruses, highlighting the crucial role of codon bias. Moreover, our study may serve as a valuable model for comprehending the evolution and codon preferences of novel imported PRRSV viruses, which is significant for predicting dynamic phylogenetic trends and establishing effective prevention strategies in clinical practice.

Genome alignment and phylogenetic analysis
Bootstrap analysis of nucleotide sequences with 1000 replicates was performed using iTOL v6 (https://itol.embl.de/). The reference sequence information is shown in Table 3 (the nucleotide database was recorded in 2021).

Phylogeographic model
Phylogenetic and phylodynamic analyses were based on the GP5 genes of NADC30-like and NADC34-like strains. Maximum clade credibility (MCC) trees were inferred using the Markov Chain Monte Carlo (MCMC) method (computation: 10 million cycles). BEAST software (version 1.10.4) was used to estimate the tMRCA and the evolutionary rate. The best-fit substitution model (GTR + I + G) and a relaxed (lognormal) clock were applied, and the effective population size was evaluated by building a coalescent Bayesian skyline model [29].

Calculation of Effective Number of Codons (ENC)
The Effective Number of Codons (ENC) is a quantitative measure of how far codon usage in a gene deviates from equal usage of synonymous codons. It describes the extent to which codon usage deviates from random selection and reflects the degree of codon usage preference in genes [31].

Fig. 6 Linear relationship between ENC and GC3 for two novel imported PRRSV sub-types
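The ENC described in this subsection can be sketched in Python as follows. This is a minimal illustration that assumes Wright's classical formulation (codon homozygosity averaged within each degeneracy class, then ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6) and the standard genetic code; the function name, the capping at 61, and the handling of a missing three-fold class are assumptions, and the software actually used in the study may differ in such details.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the NCBI table 1 ordering (T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def effective_number_of_codons(seq):
    """Wright-style ENC for an in-frame coding sequence (range 20 to 61)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    by_aa = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            by_aa[aa].append(counts.get(codon, 0))
    homozygosity = defaultdict(list)  # degeneracy class -> list of F values
    for observed in by_aa.values():
        k, n = len(observed), sum(observed)
        if k == 1 or n < 2:  # Met/Trp, or too few codons to estimate F
            continue
        f = (n * sum((c / n) ** 2 for c in observed) - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    # Ile is the only three-fold amino acid; if it cannot be estimated,
    # approximate F3 by the mean of F2 and F4 (an assumption of this sketch).
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short to estimate ENC")
    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)
```

A value near 20 corresponds to extreme bias (one codon per amino acid), while values near 61 correspond to nearly equal use of all synonymous codons.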

Relative Synonymous Codon Usage Analysis (RSCU)
The RSCU is the ratio of the observed frequency of a synonymous codon to the frequency expected if all synonymous codons for that amino acid were used equally. It equals 1 if the codon shows no preference, is greater than 1 if the codon is used preferentially compared with other synonymous codons, and is less than 1 if it is used less often [32] (http://www.bioinformatics.nl/cgi-bin/emboss/cusp) [23].
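The definition above can be illustrated with a short Python sketch. It is a minimal example under the standard genetic code, not the cusp-based workflow cited in the text; the function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the NCBI table 1 ordering (T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def rscu(seq):
    """Return {codon: RSCU} for every amino acid observed in the sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:  # amino acid absent from the sequence
            continue
        for codon in family:
            # observed count divided by the count expected under equal usage
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: leucine encoded three times by CTG, once by CTA.
toy = rscu("CTGCTGCTGCTA")
```

In this toy case CTG has an RSCU of 4.5 (3 observed against 4/6 expected), CTA has 1.5, and the four unused leucine codons have 0.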

Relative Codon Deoptimization Index (RCDI)
The RCDI is a method for comparing the codon distribution of a gene with the general codon distribution of a reference (host) genome. An RCDI value close to 1 indicates high similarity between viral and host codon usage and is associated with a higher translation rate, whereas higher RCDI values may indicate that certain genes are expressed at a low level or even that the replication rate is lower (http://genomes.urv.cat/CAIcal/RCDI) [23].
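A minimal Python sketch of this index is given below. It assumes the commonly cited form RCDI = Σ (CiFa / CiFh) × Ni / N, where CiFa and CiFh are the relative frequencies of codon i among its synonymous codons in the gene and in the host, Ni is the number of occurrences of codon i in the gene, and N is the total number of codons in the gene. The function names and the host codon counts are hypothetical, and the CAIcal server cited above may handle edge cases differently.

```python
from collections import Counter

# Standard genetic code, built from the NCBI table 1 ordering (T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def codon_counts(seq):
    """Count sense codons of an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(c for c in codons if CODE.get(c, "*") != "*")


def relative_frequencies(counts):
    """Frequency of each codon among the synonymous codons of its amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    return {c: n / totals[CODE[c]] for c, n in counts.items() if n > 0}


def rcdi(gene_seq, host_counts):
    """RCDI of a gene against a host codon-count table (DNA-alphabet keys)."""
    gene = codon_counts(gene_seq)
    total = sum(gene.values())
    if total == 0:
        raise ValueError("no sense codons in gene")
    f_gene = relative_frequencies(gene)
    f_host = relative_frequencies(host_counts)
    score = 0.0
    for codon, n in gene.items():
        if codon not in f_host:
            raise ValueError(f"codon {codon} is absent from the host table")
        score += f_gene[codon] / f_host[codon] * n
    return score / total


# Hypothetical host table in which CTG and CTA are used equally often.
example = rcdi("CTGCTG", {"CTG": 1, "CTA": 1})
```

A gene scored against its own codon counts gives exactly 1; the toy example, which uses only CTG while the host splits leucine evenly between CTG and CTA, gives 2.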

Measurement of Codon Adaptation Index (CAI)
The CAI is used to measure codon usage preference. It is computed from RSCU ratios: the observed RSCU value of each codon in a gene is compared with the ideal RSCU value that would be obtained if the protein were encoded exclusively by optimal codons [33].
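One way to make this concrete is the following Python sketch, which assumes the classical formulation in which each codon receives a relative adaptiveness w (its reference count divided by the count of the most used synonymous codon) and the CAI is the geometric mean of w over the codons of the gene. The function names and the reference counts are hypothetical, and replacing zero reference counts with 0.5 is an assumption of this sketch rather than something stated in the text.

```python
import math

# Standard genetic code, built from the NCBI table 1 ordering (T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def relative_adaptiveness(ref_counts):
    """Return {codon: w} from codon counts of a reference (host) gene set."""
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    weights = {}
    for codons in families.values():
        if len(codons) == 1:  # Met and Trp carry no information
            continue
        # Assumption: a zero count is replaced by 0.5 so that w is never 0.
        adjusted = {c: ref_counts.get(c, 0) or 0.5 for c in codons}
        best = max(adjusted.values())
        for c in codons:
            weights[c] = adjusted[c] / best
    return weights


def cai(gene_seq, ref_counts):
    """Geometric mean of relative adaptiveness over the codons of a gene."""
    weights = relative_adaptiveness(ref_counts)
    seq = gene_seq.upper().replace("U", "T")
    logs = [math.log(weights[seq[i:i + 3]])
            for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in weights]
    if not logs:
        raise ValueError("no informative codons in gene")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference: CTG used 8 times and CTA 2 times for leucine.
example = cai("CTGCTA", {"CTG": 8, "CTA": 2})
```

With these toy counts, a gene made only of CTG scores 1, and the two-codon example scores 0.5 (the geometric mean of 1 and 0.25).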

Analysis of Frequency of Optimal Codons (FOP)
The Frequency of Optimal Codons (FOP) is the ratio of the number of optimal codons to the total number of codons in a gene, given that the optimal codons of highly expressed genes have been identified [23].
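The ratio defined above can be sketched in a few lines of Python. The set of optimal codons must come from highly expressed host genes; the set used here is purely hypothetical, as is the function name, and excluding ATG, TGG, and stop codons from the denominator is an assumption of this sketch.

```python
# Codons without synonymous alternatives and stop codons are not counted.
SINGLE_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(seq, optimal_codons):
    """Fraction of informative codons in seq that belong to optimal_codons."""
    seq = seq.upper().replace("U", "T")
    optimal = {c.upper().replace("U", "T") for c in optimal_codons}
    total = hits = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Skip excluded codons and codons with ambiguous bases.
        if codon in SINGLE_OR_STOP or set(codon) - set("ACGT"):
            continue
        total += 1
        hits += codon in optimal
    if total == 0:
        raise ValueError("no informative codons in sequence")
    return hits / total


# Hypothetical optimal set: CTG (Leu) and GCC (Ala).
example = fop("CTGCTAGCCATG", {"CTG", "GCC"})
```

In the toy sequence, two of the three informative codons are optimal, so the result is 2/3.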

Calculation of Codon Bias Index (CBI)
The Codon Bias Index (CBI) measures the extent to which the best (optimal) codons are used. It correlates well with the ENC values and reflects the expression of foreign genes in the target host. Identifying the best codons in highly expressed genes is a prerequisite for calculating the CBI [34].

ENC-GC3s plot
The relationship between ENC (vertical axis) and GC3s (horizontal axis) was explored to investigate whether factors other than mutational pressure are involved in shaping codon usage patterns. If codon preference is affected only by mutational pressure, in the absence of natural selection, the ENC value will fall on or close to the ideal (expected) curve. If codon preference is affected by natural selection and other factors in addition to mutational pressure, the ENC value will fall below the expected curve [35].
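For reference, the expected curve is usually drawn from the relation ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3s value. The Python sketch below evaluates it; whether the authors used exactly this expression is an assumption, and the function name is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC under mutational pressure alone, for GC3s in [0, 1]."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Points of the expected curve across the GC3s range, in steps of 0.05.
curve = [(s / 20, expected_enc(s / 20)) for s in range(21)]
```

The curve peaks at 60.5 when GC3s is 0.5 and falls to about 31 at either extreme; genes whose observed ENC lies well below it at their GC3s value are the ones for which factors other than mutational pressure are suspected.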

Neutrality plot analysis (GC12s/GC3s)
The linear relationship between GC12 and GC3 mainly indicates the effect of mutational pressure and natural selection on codon usage bias. When the correlation between GC12 and GC3 is significant, mutation is the major factor and natural selection is a minor factor. Conversely, when the correlation between GC12 and GC3 is not significant, natural selection is the major factor and mutation is a minor factor [23].
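The regression behind a neutrality plot can be sketched in Python as an ordinary least-squares fit of GC12 (vertical axis) on GC3 (horizontal axis). This is an illustrative sketch only: the function name and the data points are hypothetical, and the significance test used in the study is not reproduced here.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 on GC3; returns (slope, intercept, R^2)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0:
        raise ValueError("GC3 values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 is reported as 0 when GC12 does not vary at all.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical values for four sequences (not data from this study).
slope, intercept, r_squared = neutrality_regression(
    [0.50, 0.55, 0.60, 0.65], [0.48, 0.47, 0.50, 0.49]
)
```

A slope close to 1 with a high R² would point to mutational pressure acting similarly on all codon positions, whereas a shallow slope with a low R² is the pattern interpreted in the text as a dominant role for natural selection.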

Fig. 1 PRRSV phylogenetic tree based on the nucleotide sequences of two novel imported PRRSV sub-types

Fig. 2 Codon mutation rates and skyline plots for the structural protein of the novel imported PRRSV NADC30-like strain (A, B). The Bayesian Markov Chain method (BEAST) was applied to estimate the codon mutation rate of the NADC30-like GP5 structural protein gene (C)

Fig. 3 Codon mutation rates and skyline plots for the structural protein of the novel imported PRRSV NADC34-like strain (A, B). The Bayesian Markov Chain method (BEAST) was applied to estimate the codon mutation rate of the NADC34-like GP5 structural protein gene (C)

Fig. 4 Relative synonymous codon usage (RSCU) of PRRSV sub-type values.The colors of NADC30-like strain and NADC34-like strain are red and blue, respectively

Fig. 5 A Scatter-plot of CAI of structural protein genes of two novel imported PRRSV sub-types. B Scatter-plot of RCDI of structural protein genes of two novel imported PRRSV sub-types. C Scatter-plot of FOP of structural protein genes of two novel imported PRRSV sub-types. Asterisks indicate significant differences between two groups; 'ns' stands for not statistically significant, **P < 0.01 and ****P < 0.0001

Fig. 7 Neutrality plot of two novel imported PRRSV sub-types

Table 1
Properties of structural protein genes from PRRSV strains analyzed in this study (mean value ± SD)

Table 2
Properties of structural protein genes from two novel imported PRRSV sub-type strains, with relative synonymous codon usage analysis. Preferred codons, sub-types, and potential hosts are displayed in bold (mean value ± SD)

Table 3
Reference strains information