Sample collection and laboratory procedures
Snow leopard samples were collected from ten of the twelve snow leopard range countries (Russia, Mongolia, Kyrgyzstan, Tajikistan, Uzbekistan, Afghanistan, Pakistan, Nepal, Bhutan, and China; see Appendix S1 in Supporting Information). Scat samples were collected opportunistically from 2008 to 2019 through extensive international collaboration among researchers, academic institutions, NGOs, and local communities (Table 1). Scats were georeferenced and stored in either silica gel beads or ethanol until DNA extraction. Blood samples were collected for research activities outside this study; they were obtained under the proper permits and transported to the United States according to CITES guidelines. DNA from a total of 507 samples (scat = 443, blood = 25, and DNA previously extracted from scats = 39) was analyzed; 336 were confirmed as snow leopards and suitable for analysis. Of the 336 individuals, 108 were previously genotyped for the same loci we used (Caragiulo et al. unpublished data, American Museum of Natural History, New York, NY, USA), 99 were genotyped for other markers (Korablev et al. 2021) and re-genotyped using our loci (see below), and the remaining 129 were processed for the first time in this study. Genetic analyses, including DNA extraction, species identification (as needed), and genotyping, were conducted at the National Genomics Center for Wildlife and Fish Conservation (NGC), United States Forest Service, Rocky Mountain Research Station (Missoula, MT, USA). DNA was extracted from scats with QIAamp DNA Stool Mini Kits (Qiagen, Inc., Valencia, CA, USA) and from blood samples with DNeasy Tissue Kits (Qiagen, Inc., Valencia, CA, USA). Quality and quantity of template DNA were determined by 1.6% agarose gel electrophoresis.
To determine species in unknown samples, we amplified ~360 base pairs (bp) of mitochondrial DNA using 16S rRNA universal primers (modified from Hoelzel and Green 1992). Reaction volumes of 30 µl contained 50–100 ng DNA, 1× reaction buffer (Life Technologies, NY, USA), 2.5 mM MgCl₂, 200 µM each dNTP, 1 µM each primer, and 1 U AmpliTaq Gold polymerase (Life Technologies, NY, USA). The polymerase chain reaction (PCR) program was 94 °C/5 min, [94 °C/1 min, 50 °C/1 min, 72 °C/90 s] × 34 cycles, 72 °C/5 min. PCR products were purified using ExoSAP-IT (Affymetrix-USB Corporation, OH, USA) according to the manufacturer’s instructions. Reactions were sequenced at Eurofins Genomics (Louisville, KY) using standard Sanger sequencing protocols. DNA sequence data were viewed and aligned with Sequencher v.5.4.6 (Gene Codes Corp., MI) and compared to reference sequences from known species.
We assessed genetic variation in nuclear DNA to characterize range-wide population structure and to determine whether patterns of microsatellite diversity differed from those observed in mtDNA. We amplified 12 microsatellites known to be variable in snow leopards: FCA032, FCA075, FCA096, FCA100, FCA124, FCA126, FCA132, FCA208, FCA212, FCA225, FCA229, FCA275 (Menotti-Raymond et al. 1999). Microsatellite loci were amplified in PCR multiplexes, where the total volume (10 µl) contained 1.0 µl DNA, 1× reaction buffer (Life Technologies, Grand Island, NY, USA), 2.0 mM MgCl₂, 200 µM of each dNTP, 1 µM reverse primer, 1 µM dye-labeled forward primer, 1.5 mg/ml BSA, and 1 U Taq polymerase (Life Technologies). The PCR profile was 94 °C/5 min, [94 °C/1 min, 55 °C/1 min, 72 °C/30 s] × 36 cycles. Resultant products were visualized on a LI-COR DNA analyzer (LI-COR Biotechnology).
To standardize the previous genotypes (Caragiulo et al. unpublished) with the new genotypes, 15 individuals from the Caragiulo et al. dataset that encompassed the range of observed alleles were used as standards in scoring the new genotypes, ensuring consistency across scoring profiles and allowing us to confidently combine datasets. To screen for allelic dropout, we performed replicate genotyping for all new samples. Samples with amplification failure or discrepancies between replicates were repeated in duplicate to minimize genotyping error. For statistical analyses, we used only unique individuals that successfully amplified at 8 or more (≥ 66%) of our loci, which provided 43 individuals from the 129 unknown samples, 87 of 108 individuals from Caragiulo et al. (unpublished), and 52 of 99 individuals from Korablev et al. (2021), for a microsatellite dataset of 182 individual snow leopards.
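As an illustration of the call-rate filter described above, the following Python sketch keeps individuals typed at 8 or more of the 12 loci. The data layout (a dictionary mapping each individual to per-locus allele pairs, with `None` marking a failed locus), the function names, the allele sizes, and the sample IDs are assumptions made for this example; they are not the format used in the study.

```python
# Locus names are taken from the text; everything else is hypothetical.
LOCI = ["FCA032", "FCA075", "FCA096", "FCA100", "FCA124", "FCA126",
        "FCA132", "FCA208", "FCA212", "FCA225", "FCA229", "FCA275"]
MIN_LOCI = 8  # threshold stated in the text: 8 or more of 12 loci


def count_typed_loci(genotype):
    """Count loci with a scored genotype (None marks amplification failure)."""
    return sum(1 for locus in LOCI if genotype.get(locus) is not None)


def filter_by_call_rate(genotypes, min_loci=MIN_LOCI):
    """Keep only individuals typed at min_loci or more loci."""
    return {ind: g for ind, g in genotypes.items()
            if count_typed_loci(g) >= min_loci}


def make_genotype(n_typed):
    """Build a hypothetical genotype with the first n_typed loci scored."""
    return {locus: ((150, 152) if i < n_typed else None)
            for i, locus in enumerate(LOCI)}


# Hypothetical individuals typed at 12, 8, and 7 loci.
example = {"SL_A": make_genotype(12),
           "SL_B": make_genotype(8),
           "SL_C": make_genotype(7)}
kept = filter_by_call_rate(example)
```

In this toy input, the individual typed at only 7 loci falls below the threshold and is excluded.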
For phylogenetic analyses, we sequenced a spatially stratified subset of 80 individual snow leopard samples across 607 bp of concatenated mtDNA sequence: 445 bp of cytochrome b using primers mcb398 and mcb869 (Verma and Singh 2003), and 162 bp of control region using primers PUN-CCR-F and PUN-CCR-R254 (Janečka et al. 2017). PCR profile and sequencing steps for both markers followed the same protocol used for species identification.
Fifty-eight individuals were represented in both the mitochondrial and the microsatellite datasets; the remaining individuals in the mitochondrial dataset (n = 22) and microsatellite dataset (n = 124) represented different individuals from different sampling locations (Table 1). No individuals from the Tibetan Plateau successfully amplified for the mitochondrial dataset.
Data analysis
Previous studies on snow leopard connectivity suggest mountain ranges are important linkages for global snow leopard populations (Riordan et al. 2016; Li et al. 2020). Therefore, we assigned individual samples to groups based on their geographic location in one of eight major mountain ranges (Altai, Gobi Desert, Himalaya, Karakoram, Pamir, Sayan, Tian Shan, Tibetan Plateau; see Table 1 for sample size by range). These mountain ranges served as a priori population units for subsequent analyses requiring sample groupings.
We described phylogeographic dynamics of snow leopards using two approaches with mtDNA data: (i) comparing phylogenies using Bayesian and maximum likelihood (ML) methods, and (ii) testing for signatures of population expansion. This phylogeographic framework simultaneously addressed our second question regarding genetic signatures of glacial refugia, as spatial structuring of distinct genetic lineages may indicate divergence in glacial refugia (Provan and Bennett 2008). We estimated the number of polymorphic sites, number of haplotypes (k), haplotype diversity (h), and nucleotide diversity (π) for mtDNA sequences (Table 1) using DnaSP v6 (Rozas et al. 2017).
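For readers unfamiliar with these summary statistics, the sketch below computes them with their standard textbook definitions: h = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i, and π as the mean number of pairwise differences per site. It is a minimal illustration, assuming equal-length aligned sequences with no gaps or missing data; DnaSP handles such cases with its own rules, so values may not match exactly. The function names and the four toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def polymorphic_sites(seqs):
    """Number of alignment columns with more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """pi = mean pairwise differences between sequences, per site."""
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2))
                      for s1, s2 in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Toy alignment (hypothetical): three haplotypes among four sequences.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
k = len(set(toy))                 # number of haplotypes
s = polymorphic_sites(toy)        # number of polymorphic sites
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```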
To infer phylogenetic relationships among mtDNA haplotypes, we used Bayesian and maximum likelihood criteria. We used the software jModelTest (Posada 2008) to identify the best substitution model for concatenated snow leopard sequences based on consensus between the corrected Akaike Information Criterion (AICc) and the Bayesian Information Criterion (BIC). A Bayesian maximum clade credibility tree was created in BEAST v1.10 (Suchard et al. 2018) using a strict clock model, the best substitution model, the default optimization schedule, and a Markov chain Monte Carlo (MCMC) chain length of ten million. We sampled every 10,000 generations, discarded the first 10% of samples, and estimated Bayesian posterior probabilities (BPP) on the 50% majority rule consensus of the remaining trees. We analyzed results in Tracer v1.7.2 (Rambaut et al. 2018). Phylogenetic trees were summarized in TreeAnnotator v1.10.4 (Drummond et al. 2007) and visualized and stylized in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). We also calculated a maximum likelihood phylogeny with the software MEGA X (Kumar et al. 2018) and assessed the resulting relationships with 1000 bootstrap replicates. All phylogenies were rooted with a concatenated sequence from a tiger mitogenome (Zhang et al. 2011; GenBank accession HM589215.1). To further examine intraspecific genetic variation and relationships among haplotypes, we also conducted a haplotype network analysis on the mtDNA sequences with PopART software (https://popart.maths.otago.ac.nz/) using the median joining algorithm (Bandelt et al. 1999).
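The model-selection step described above (consensus between AICc and BIC) can be sketched as follows. The formulas are the standard ones, AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1) and BIC = k ln n − 2 ln L. The candidate models, log-likelihoods, and parameter counts are invented for illustration, and taking the alignment length (607 bp) as the sample size n is an assumption of this sketch rather than a setting reported in the text.

```python
import math


def aicc(log_lik, k, n):
    """Corrected Akaike Information Criterion."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_lik


def best_by_consensus(models, n):
    """Return the model favoured by both criteria, or None if they disagree."""
    best_aicc = min(models, key=lambda m: aicc(models[m][0], models[m][1], n))
    best_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n))
    return best_aicc if best_aicc == best_bic else None


# Hypothetical (log-likelihood, number of free parameters) per model.
candidates = {"JC": (-1200.0, 1), "HKY": (-1150.0, 5), "GTR": (-1148.0, 9)}
N_SITES = 607  # assumed sample size: length of the concatenated alignment
chosen = best_by_consensus(candidates, N_SITES)
```

When the two criteria disagree, the function returns `None` so that the choice is made explicitly rather than silently defaulting to one criterion.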
Prior work has identified common genetic signatures underlying population dynamics that occur through glaciation cycles (Stewart et al. 2010). Populations that have undergone demographic events such as rapid growth and expansion exhibit either high haplotype diversity with low nucleotide diversity, or low haplotype diversity over large areas (Avise 2000). We calculated Tajima’s D (Tajima 1989) and Fu’s FS statistic (Fu 1997) in DnaSP (Rozas et al. 2017) for the global population and any distinctive clades denoted in the phylogenies. For both of these neutrality tests, significantly negative values are indicative of population expansion. To distinguish patterns of expansion from background selection, we calculated Fu and Li’s (1993) D* and F* statistics. If FS is significant but D* and F* are not, population growth or range expansion is supported (Fu 1997). We also calculated mismatch distributions of pairwise nucleotide differences in DnaSP, where unimodal distributions indicate population expansion and multimodal distributions indicate stable populations.
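To make the first of these neutrality tests concrete, the sketch below computes Tajima’s D from an alignment using the standard formulation (Tajima 1989): D = (k̂ − S/a₁) / √(e₁S + e₂S(S − 1)), where k̂ is the mean number of pairwise differences and S the number of segregating sites. It assumes gap-free, equal-length sequences and does not assess significance, which DnaSP evaluates separately. The function name and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences; None when it is undefined."""
    n = len(seqs)
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or seg_sites == 0:
        return None  # variance term is zero for n < 4; D undefined if S = 0

    # Mean number of pairwise differences (not divided by sequence length).
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2))
                      for s1, s2 in combinations(seqs, 2))
    k_hat = total_diffs / (n * (n - 1) / 2)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k_hat - seg_sites / a1) / math.sqrt(variance)


# Toy alignment (hypothetical).
d_toy = tajimas_d(["ACGT", "ACGT", "ACGA", "TCGA"])
```

A negative D reflects an excess of rare variants relative to neutral expectation, which is the pattern the text associates with population expansion.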
Using our microsatellite data, we took two approaches to address our third question, characterizing contemporary population genetic structure and connectivity: (i) determining range-wide population genetic structure with non-spatial Bayesian clustering approaches, and (ii) testing for concurrence among multiple exploratory multivariate analyses. We derived all standard estimates of genetic diversity for each mountain range in GenAlEx 6.503 (Peakall and Smouse 2012), and used the R package genepop (Rousset et al. 2008) to calculate allelic richness (Table 1). We tested loci for linkage disequilibrium (LD) and deviations from Hardy-Weinberg equilibrium (HWE), with significant P-values adjusted for multiple comparisons (Rice 1989). For populations experiencing spatially limited dispersal, genetic differentiation among individuals increases as geographic distance increases, resulting in spatially autocorrelated mating patterns (isolation by distance; Wright 1943). We tested for isolation by distance (IBD) for all cats via a simple Mantel test, which quantifies the correlation between individual pairwise genetic and Euclidean distances (calculated in GenAlEx using default genetic distances; statistical significance assessed with 10,000 permutations), and evaluated patterns of spatial autocorrelation for all 182 individuals to detect departures from random mating within 100 km distance categories reflecting snow leopard dispersal (Riordan et al. 2016).
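The logic of a simple Mantel test can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reproduction of GenAlEx: it takes two symmetric distance matrices as nested lists, uses the Pearson correlation of their upper triangles as the test statistic, permutes individuals in one matrix, and reports a one-tailed P-value for a positive correlation. The function names, the seed, and the one-tailed convention are choices made for this example.

```python
import math
import random


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=10000, seed=0):
    """Return (r, one-tailed P) for genetic vs. geographic distance."""
    n = len(genetic)
    identity = list(range(n))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel individuals in the genetic matrix
        r_perm = pearson(upper_triangle(genetic, order), geo_vec)
        if r_perm >= r_obs - 1e-12:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole individuals, rather than shuffling matrix cells independently, preserves the dependence among pairwise distances that share an individual, which is the reason a Mantel test is used instead of an ordinary correlation test.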
We used the non-spatial Bayesian clustering program STRUCTURE (Pritchard et al. 2000) to determine the optimal number of populations (K) based on genotype data. This approach minimizes deviations from Hardy-Weinberg proportions and linkage disequilibrium within each cluster. However, simulation studies have shown STRUCTURE to be unreliable at identifying population structure when sampling is unbalanced, including merging populations comprising small sample sizes (Puechmaille 2016; Wang 2017). Wang (2017) showed this reduced performance is primarily caused by using the default ancestry prior, which assumes all populations contribute equally, and the default alpha value (α = 1), which inhibits the mixing of the MCMC sampler and falsely attributes an individual’s ancestry to a single population. Instead of these default parameter settings, a combination of the alternative ancestry prior, which assumes populations contribute variably to the sampled individuals, the uncorrelated allele frequency model, and a smaller alpha value improved STRUCTURE’s ability to obtain accurate individual assignments (Wang 2017). To address the likely influence of uneven sampling on population clustering, we explored multiple parameter combinations in STRUCTURE to optimize our inferences on snow leopard population genetic structure. We tested the following parameter combinations: for all models we used the alternative ancestry prior, an initial alpha value α = 1/K, where K is the assumed number of clusters (1–5), and either a correlated or uncorrelated allele frequency model. For all models, we also applied the prior model parameter LOCPRIOR, which can be informative when there are weak population signals due to small sample sizes or low levels of genetic differentiation between populations (Hubisz et al. 2009). This resulted in ten separate STRUCTURE runs for the microsatellite dataset. For each model, we performed ten independent runs of K = 1–5 with 1,000,000 Markov chain Monte Carlo (MCMC) steps and 300,000 burn-in steps under admixture, uncorrelated allele frequency model. Optimal K among tested values was determined by visual examination of likelihood scores in STRUCTURE HARVESTER (Earl and vonHoldt 2012) using both the standard procedure and the ΔK statistic (Evanno et al. 2005). When genetic differentiation is strong, the ΔK statistic performs well, but under moderate to low genetic differentiation the standard procedure performs better (Waples and Gaggiotti 2006). We applied both approaches to identify the uppermost level of population structure in each model, and evaluated pairwise FST values for the most highly supported K in Arlequin 3.5.2 (Excoffier and Lischer 2010). Individual snow leopards were assigned to a putative population based on their highest ancestry coefficient (q), which represents the estimated probability an individual belongs to a given cluster. We created STRUCTURE bar plots in STRUCTURE PLOT (Ramasamy et al. 2014).
For populations with weak genetic differentiation, congruence among multiple methods may be especially useful in determining the true K while minimizing various criticisms associated with each method (Latch et al. 2006; Kanno et al. 2011); thus, we used principal component analysis (PCA) as implemented in the R package adegenet (Jombart 2008). PCA complements Bayesian analysis because it does not require the data to meet assumptions of linkage equilibrium and HWE. For PCA, we retained the first three axes and performed k-means clustering to evaluate the optimal number of clusters.
We assessed recent population bottlenecks in each mountain range by testing for heterozygote excess under a two-phase model of mutation (TPM) in the program BOTTLENECK (Piry et al. 1999). We constrained models by defining multistep mutations to account for 5%, 10%, 20%, and 25% of mutations for cats in each mountain range. For all four scenarios, the Wilcoxon signed rank test was used to determine which mountain ranges exhibited a heterozygote excess (Luikart et al. 1998).
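The final step of the bottleneck analysis, the one-tailed Wilcoxon signed rank test for heterozygote excess, can be sketched as below. The sketch assumes that per-locus observed heterozygosity and the heterozygosity expected at mutation-drift equilibrium are already available; BOTTLENECK derives the latter by simulation under the chosen mutation model, which is not reproduced here. The function names and the input values in the example are hypothetical, and computing the exact P-value by enumerating all sign assignments is a choice made for this illustration (feasible for 12 loci, i.e. 4096 assignments).

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def wilcoxon_excess(he_observed, he_equilibrium):
    """Return (W+, exact one-tailed P) for heterozygote excess."""
    diffs = [o - e for o, e in zip(he_observed, he_equilibrium) if o != e]
    if not diffs:
        raise ValueError("all loci have zero difference")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null, each difference is equally likely to be + or -.
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus - 1e-12:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical per-locus values for one mountain range.
he_obs = [0.70, 0.65, 0.80, 0.75, 0.60]
he_eq = [0.60, 0.50, 0.77, 0.70, 0.58]
w_plus, p_value = wilcoxon_excess(he_obs, he_eq)
```

A small P-value here means that observed heterozygosity exceeds the equilibrium expectation at more loci, and by larger margins, than chance alone would produce.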