Introduction

Reiss (1967) conducted the first large-scale study of attitudes toward “various degrees of sexual permissiveness embodied in our premarital standards” (p. 6), in which he noted that egalitarianism had not yet been achieved and that a sexual double standard (SDS) existed for men and women (Crawford & Popp, 2003). Many studies on the sexual double standard followed, and the concept has been thoroughly reviewed over recent decades (Bordini & Sperb, 2013; Crawford & Popp, 2003; Eaton & Rose, 2011; Fugère, Escoto, Cousins, Riggs, & Haerich, 2008; Sanchez, Fetterolf, & Rudman, 2012). These reviews conclude that heterosexual romantic relationships seem to have become somewhat more egalitarian, but that the sexual double standard still exists, albeit in a different form. Whereas the central notion of the sexual double standard originally pertained to premarital courting and sexual behavior (Reiss, 1967), later definitions focused less on marital status and included gendered expectations about sexual roles (e.g., prescribing divergent [re]active and [sub]assertive sexual roles to men and women; Sanchez et al., 2012).

The sexual double standard has been related to a multitude of negative sexual and health outcomes, such as increased dating violence and sexual violence (Shen, Chiu, & Gao, 2012), poor sexual functioning among young women (Kiefer & Sanchez, 2007), higher STI/HIV infection risk (Bermúdez, Castro, Gude, & Buela-Casal, 2010), and decreased sexual and relationship satisfaction for both men and women (Sanchez, Crocker, & Boike, 2005). However, other studies in this field have produced mixed and contradictory results (Fugère et al., 2008; Marks & Fraley, 2005, 2006; Sanchez et al., 2012). Reviews have partially ascribed this to methodological issues (Sanchez et al., 2012), as well as to the use of outdated measures (Bordini & Sperb, 2013; Crawford & Popp, 2003; Fugère et al., 2008). It seems that the concept of the sexual double standard has evolved along with changes in the display of gendered behavior in dating and sexuality, but research methods have not been able to keep pace. This calls for the development of modernized methods and measures.

In this study, we respond to this call for modernization (Bordini & Sperb, 2013) by introducing a new measure to examine sexual double standard endorsement in its contemporary form. The new measure was designed with a number of desired features in mind. Firstly, we set out to develop an instrument suitable for capturing sexual double standard endorsement from the moment people begin to experience romantic and partnered sexual situations. Adolescence is a period when people start to explore sexual and romantic interactions (Collins, Welsh, & Furman, 2009), whereas emerging adulthood is a time when romantic interactions become more serious and, compared with adolescence, are more likely to include sexual intercourse (Arnett, 2000). Moreover, there is already evidence of sexual double standards in first-time sexual interactions (Sanchez et al., 2012). At the same time, older measures mostly asked about abstinence before marriage, which translates poorly to young people’s current reality, in which marriage rates are falling and other relationship forms, such as cohabitation, are on the rise. Therefore, the suitability of the new instrument for both adolescents and emerging adults was a key factor in its design.

Secondly, the instrument was designed to encompass a greater variety of aspects of sexual double standards than older measures. Many previous measures have focused heavily on permissiveness and sexual abstinence before marriage, such as the Sexual Double Standard Scale (Muehlenhard & Quackenbush, 1998) and the Double Standard Scale (Caron, Davis, Halteman, & Stickle, 1993). However, in a contemporary context, the sexual double standard encompasses several other aspects that were insufficiently highlighted in, or absent from, previous measures. Numerous studies indicate that sexual double standard endorsement is no longer related only to premarital sex and virginity status, but also to as many as 13 other beliefs relevant to the SDS construct (Allen, 2003; Bay-Cheng, 2015; Hayes, Lorenz, & Bell, 2013; Horne & Zimmer-Gembeck, 2005, 2006; Hyde, 2005; Kehily, 2001; Kreager & Staff, 2009; Moss-Racusin, Phelan, & Rudman, 2010; Petersen & Hyde, 2010; Reidy, Shirk, Sloan, & Zeichner, 2009; Sanchez et al., 2012; Sprecher, Regan, & McKinney, 1998; ter Bogt, Engels, Bogers, & Kloosterman, 2010; Tolman, 2009, 2012; Vanwesenbeeck, 2009; Yasan, Essizoglu, & Yildirim, 2009).

We therefore chose to reflect this multifaceted nature of the contemporary sexual double standard in the item pool of the new instrument (Study 1). In doing so, we broadly define the contemporary sexual double standard as “the degree to which an individual’s attitude reflects a divergent set of expectations for boys and girls, in that boys are expected to be relatively more sexually active, assertive, and knowledgeable and girls are expected to be relatively more sexually reserved, passive, and inexperienced” (Emmerink, van den Eijnden, ter Bogt, & Vanwesenbeeck, 2016). The instrument thus assesses an individual’s attitude toward perceived social norms concerning sexuality for boys and girls. When deleting items on statistical grounds, we aimed to avoid eliminating any of the themes established above entirely, although this could not be completely prevented. Our guiding principle was that the multifaceted nature of the scale should not be compromised.

Thirdly, because the sexual double standard is a highly heteronormative phenomenon, the instrument was designed specifically for assessment in heterosexual samples. Although non-heterosexual populations are also bound to be affected by heteronormative gender norms (Szymanski & Henrichs-Beck, 2014), it is plausible that they are affected differently. It may be possible to adapt the instrument for use in non-heterosexual samples in the future. Fourthly, to enhance the comparability of results for young men and women, and to enable the instrument to be used in multiple study types, we designed the instrument to be suitable for both females and males. Fifthly and lastly, the original item pool was constructed in line with our expectation that the instrument would measure a single construct, namely the sexual double standard; the present studies test whether this is indeed the case.

This article covers both instrument development (Study 1) and tests of psychometric properties (Studies 1 and 2). The following research questions are addressed in the two studies:

  1. Which subset of items for assessing sexual double standard endorsement among male and female adolescents and emerging adults forms the best one-factor, internally consistent scale? (Study 1)

  2. Does the factor structure of the newly developed instrument established in the first study replicate in another sample, and what are its test–retest reliability, construct validity, and convergent validity? (Study 2)

  3. Does the newly developed instrument show measurement (in)variance across time, gender, age, education level, and sexual experience level? (Study 2)

Study 1

Method

Participants

The sample consisted of 512 adolescents and emerging adults (46.9% boys, 53.1% girls), aged between 16 and 20 years (M = 18.12, SD = 1.37). Participants completed a survey on “adolescent sexuality” and were assured of anonymity and of their ability to end their participation at any time. Based on a sexual orientation question with a five-point response scale, participants were excluded if they indicated that they were attracted exclusively or mainly to members of their own sex, were attracted to both sexes equally, or were undecided as to their sexual orientation. Using this criterion, 47 participants were excluded from the analyses. The final sample consisted of 465 heterosexual adolescents and emerging adults (45.2% boys, 54.8% girls; 90.8% of the original sample), aged between 16 and 20 years (M = 18.08, SD = 1.34). Sample characteristics are shown in Table 1.

Table 1 Sample characteristics

Procedure

Ethical approval for the study was granted by the Child and Adolescent Studies department board at Utrecht University. A commercial online panel was contracted to recruit community participants for our study. Participants could win prizes by participating but received no financial reward. The aim was to obtain a sample that included often underrepresented groups (e.g., non-native Dutch and lower educated participants) in order to adequately reflect Dutch society. This method of data collection provided us with a sample of community adolescents and emerging adults aged between 16 and 20 years, and the use of a panel made acquiring this sample more feasible. Moreover, using an Internet panel was of added value because our study involved rather personal questions and the Internet offers relative anonymity, allowing participants to complete the questionnaire in the comfort and privacy of their own homes. Participants ticked a box stating that they understood that the questions would be of a sexual nature and that they wanted to continue to the questionnaire. They were further informed that they could cease their participation at any time. No parental consent was needed, because the minimum age for completing the questionnaire was 16.

Measures

The proposed scale items were designed with older sexual double standard measures in mind (e.g., Traditional Sexual Attitudes [Kiefer & Sanchez, 2007]; Gender-Equitable Men Scale [Pulerwitz & Barker, 2008]; Male Role Attitudes Scale [Pleck, Sonenstein, & Ku, 1994]; Double Standard Scale [Caron et al., 1993]; Sexual Double Standard Scale [Muehlenhard & Quackenbush, 1998]), as well as on empirically and theoretically derived insights from the literature analysis described in the Introduction. We made sure to design items that would be suitable for heterosexual male and female adolescents and emerging adults (i.e., avoiding difficult wording and limiting the number of items describing marriage). In total, we generated 35 items on which participants indicated their degree of agreement on a six-point Likert scale ranging from “1 = completely disagree” to “6 = completely agree.” The items consisted of statements reflecting perceived social norms concerning sexuality for boys and girls. The study was conducted in Dutch. To facilitate readability for an international audience, English translations of the items are given in the “Original 35-Item Pool for the SASSY” section. The original Dutch item wording can be obtained from the corresponding author upon request.

Demographics

Gender and age. Participants indicated their biological sex (male or female) and age.

Education. Participants answered a question on their current occupation: studying or not studying. They also indicated the highest academic qualification they had attained. If participants’ main occupation was studying, the type of education they were following was taken as their education level. If participants indicated that they were currently not studying, the highest-level qualification they had obtained was taken as their education level. Education level was categorized as lower (primary school and junior vocational training), intermediate (intermediate education and vocational training), and higher education (pre-university education and university).

Sexual experience. Participants answered the question, “How many people have you had sex with in your life?” on a five-point scale with response categories ranging from “1 = none” to “5 = more than 10.” A definition of sex was given: “By ‘sex,’ we mean everything from feeling each other naked or caressing each other, to intercourse (penetration of the vagina or anus by the penis).” The responses were then recoded into a binary variable for use in the analyses: no sexual experience (participants who answered “none”) versus sexual experience (participants who reported one or more partners).
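The two coding rules above (deriving education level from current or completed education, and dichotomizing sexual experience) can be stated compactly in code. The following Python sketch is illustrative only: the function names and the education category strings are hypothetical placeholders, because the exact Dutch response options are not listed here.

```python
from typing import Optional

# Hypothetical labels for the education types; the actual Dutch response
# options are not reported in the text, so these strings are placeholders.
LOWER = {"primary school", "junior vocational training"}
INTERMEDIATE = {"intermediate education", "vocational training"}
HIGHER = {"pre-university education", "university"}


def education_level(is_studying: bool, current_type: Optional[str],
                    highest_attained: Optional[str]) -> str:
    """Use the current education type if the participant is studying,
    otherwise the highest qualification obtained, mapped to three levels."""
    source = current_type if is_studying else highest_attained
    if source in LOWER:
        return "lower"
    if source in INTERMEDIATE:
        return "intermediate"
    if source in HIGHER:
        return "higher"
    raise ValueError(f"unrecognized education type: {source!r}")


def has_sexual_experience(partners_item: int) -> bool:
    """Recode the five-point partner-count item (1 = none ... 5 = more than 10)
    into the binary variable: False for 'none', True for one or more partners."""
    if not 1 <= partners_item <= 5:
        raise ValueError("response must lie between 1 and 5")
    return partners_item > 1


# Usage with made-up responses
print(education_level(True, "university", "junior vocational training"))  # higher
print(education_level(False, None, "vocational training"))                # intermediate
print(has_sexual_experience(1), has_sexual_experience(3))                 # False True
```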

Analytical Strategy

We assessed the factor structure and internal consistency of the 35-item pool to determine which subset of items formed the best one-factor, internally consistent scale for the assessment of sexual double standard endorsement. We employed an exploratory factor analysis for this purpose. There were no missing values; therefore, no missing data procedure was needed.
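For readers less familiar with principal axis factoring, the core of the item-screening step reported in the Results (retaining items that load above .40 on the first factor) can be sketched in plain Python. This is a simplified sketch, not the software routine used in the study: it extracts a single unrotated factor by iterating communality estimates, starts from each item's largest absolute correlation rather than squared multiple correlations, and finds the leading eigenvector by power iteration. Because only one factor is extracted, no rotation is involved, so loadings can differ somewhat from those of a rotated multi-factor solution. The function names are our own.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dominant_eigenpair(a: Matrix, max_iter: int = 1000,
                       tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration on a symmetric matrix. Assumes the largest-magnitude
    eigenvalue is positive, as it is for positively correlated item sets."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient of the unit-length vector gives the eigenvalue
    lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def one_factor_paf(r: Matrix, max_iter: int = 500, tol: float = 1e-8) -> List[float]:
    """Iterated principal axis factoring extracting one factor from a
    correlation matrix r; returns one loading per item."""
    n = len(r)
    # Initial communalities: largest absolute correlation with any other item
    h2 = [max(abs(r[i][j]) for j in range(n) if j != i) for i in range(n)]
    loadings = [0.0] * n
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the unit diagonal
        reduced = [[h2[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
        lam, vec = dominant_eigenpair(reduced)
        loadings = [x * math.sqrt(max(lam, 0.0)) for x in vec]
        new_h2 = [x * x for x in loadings]
        change = max(abs(a - b) for a, b in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:
            break
    # The sign of a factor is arbitrary; orient it so loadings are mostly positive
    if sum(loadings) < 0:
        loadings = [-x for x in loadings]
    return loadings


def items_to_keep(loadings: List[float], cutoff: float = 0.40) -> List[int]:
    """Indices of items whose first-factor loading exceeds the cutoff."""
    return [i for i, x in enumerate(loadings) if x > cutoff]


# Usage: a correlation matrix implied by known (made-up) one-factor loadings
population_loadings = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2]
r = [[1.0 if i == j else population_loadings[i] * population_loadings[j]
      for j in range(6)] for i in range(6)]
estimated = one_factor_paf(r)
print(items_to_keep(estimated))  # [0, 1, 2, 3]
```

With a correlation matrix generated from known one-factor loadings, the sketch recovers those loadings and keeps the four items that load above .40.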

Results

Factor Structure and Internal Consistency

First, the factor structure and reliability of the 35-item pool we had constructed were assessed. The items were subjected to an exploratory factor analysis using principal axis factoring with oblique rotation. The Kaiser–Meyer–Olkin value was .88, which is above the recommended cutoff value of .60 (Kaiser, 1970, 1974), and Bartlett’s Test of Sphericity (Bartlett, 1954) was statistically significant, supporting factorability. Furthermore, inspection of the scree plot showed a break after the first factor extracted. As the aim was to construct a one-dimensional (single-factor) measure, we excluded the 11 items that did not load above .40 on the first factor (see the “Original 35-Item Pool for the SASSY” section for a breakdown of which items were excluded in this step). An analysis of internal consistency was then conducted with the remaining 24 items, yielding a Cronbach’s alpha of .80. However, analyses indicated that removing an additional four items would substantially increase internal consistency; removing them yielded a Cronbach’s alpha of .90 for the resulting 20-item instrument. In the last step, the factor analysis was repeated, confirming a single-factor solution that explained 34% of the variance. Factor loadings for the items in the 20-item instrument are shown in Table 2.
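The internal consistency steps above (computing Cronbach’s alpha and checking how alpha changes when individual items are removed) can be illustrated with a short Python sketch. The data are made up, and alpha-if-item-deleted is shown as one common way to identify items whose removal raises alpha; the text does not specify the exact procedure or software used, and the function names are our own.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix:
    k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_if_item_deleted(rows: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for j, x in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Made-up data: items 0-2 agree perfectly, item 3 is unrelated to them
scores = [[1, 1, 1, 4], [2, 2, 2, 1], [3, 3, 3, 3], [4, 4, 4, 2]]
print(round(cronbach_alpha(scores), 3))                      # 0.632
print([round(a, 3) for a in alpha_if_item_deleted(scores)])  # [0.176, 0.176, 0.176, 1.0]
```

Dropping the item whose deletion yields the largest alpha, and repeating, mirrors the kind of reliability-driven pruning described above; in practice such pruning is best weighed against content coverage, as the Introduction notes.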

Table 2 Scale for the Assessment of Sexual Standards among Youth items and factor loadings across Studies 1 and 2

Study 2

A subsequent study was conducted in two waves to examine whether the factor structure would replicate in a different sample covering a slightly broader age range, and to examine the test–retest reliability, construct and convergent validity, and measurement invariance of instrument scores. Test–retest reliability was assessed by comparing scores across the two waves, which were eight weeks apart. Construct validity was addressed by relating participants’ scores on the new instrument to their scores on the Sexual Double Standard Scale (SDSS; Muehlenhard & Quackenbush, 1998). This scale was chosen because it is widely used for the assessment of sexual double standards (Bordini & Sperb, 2013). We expected a strong positive relationship between the scale scores, because both scales were designed to measure sexual double standard endorsement. However, we did not expect the correlation to approach unity, because the new instrument was additionally designed for a specific context and target group, and to be more multifaceted than previous instruments. Lastly, convergent validity was assessed by relating participants’ scores on the new instrument to their gendered attitudes in another context, namely the family context. We expected a weak to moderate positive relationship between these scores, because both sets of measures were designed to capture gendered attitudes.

Method

Participants

The original sample obtained at Wave 1 consisted of 873 adolescents and emerging adults. As in the first study, we excluded participants who indicated that they were attracted mainly or exclusively to members of their own sex, were attracted to both sexes equally, or were unsure of their sexual orientation (n = 55). The final Wave 1 analytic sample thus consisted of 818 heterosexual adolescents and emerging adults (93.7% of the original sample). A further 202 participants were lost because they did not complete the Wave 2 questionnaire, leaving a final Wave 2 analytic sample of 616 heterosexual adolescents and emerging adults (70.6% of the original sample and 75.3% of the Wave 1 analytic sample). A comparison between participants who dropped out between Wave 1 and Wave 2 (n = 202) and participants who completed both waves (n = 616) showed no significant differences in gender, age, or scores on the variable of interest (scores on the new instrument). Sample characteristics are shown in Table 3.
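The analytic sample sizes and retention percentages reported for both studies follow from simple arithmetic. The sketch below re-derives them from the counts given in the text (Study 1: 512 recruited, 47 excluded; Study 2: 873 recruited, 55 excluded, 202 lost between waves); the helper name is our own.

```python
def retained_pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Study 1: 512 recruited, 47 excluded on sexual orientation
study1_final = 512 - 47                 # 465
# Study 2: 873 recruited, 55 excluded, 202 lost between waves
wave1_final = 873 - 55                  # 818
wave2_final = wave1_final - 202         # 616

print(study1_final, retained_pct(study1_final, 512))   # 465 90.8
print(wave1_final, retained_pct(wave1_final, 873))     # 818 93.7
print(wave2_final, retained_pct(wave2_final, 873))     # 616 70.6
print(retained_pct(wave2_final, wave1_final))          # 75.3
```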

Table 3 Sample characteristics in Study 2

Procedure

This study was granted ethical approval by the Ethics Committee of the Faculty of Social and Behavioral Sciences at Utrecht University (Reference: FETC15-003). Data collection was outsourced to CentERdata, Institute for Data Collection and Research, which is attached to Tilburg University, the Netherlands, and was carried out using the LISS (Longitudinal Internet Studies for the Social Sciences) panel. The LISS panel is a representative sample of Dutch individuals who participate in monthly Internet surveys from their own homes in exchange for a small reward. The panel is based on a true probability sample of approximately 5,000 households drawn from the population register. Households without a computer or Internet access are provided with these by LISS. A random selection of LISS panel members from those households was invited to participate in the study. The number of eligible candidates per household varied according to how many household members were subscribed to the panel and whether they met our age inclusion criterion. The sample thus consisted of a unique draw from the participants in the LISS panel. More information about the panel can be found on its Web site (www.lissdata.nl). This method of data collection provided us with a sample of community adolescents and emerging adults aged between 16 and 25 years that was large enough for a solid validation process. The use of a panel made acquiring this sample more feasible, as the response rate within the panel was known and adequate oversampling could be provided. Moreover, using an Internet panel was of added value because our study involved rather personal questions and the Internet offers relative anonymity. Participants ticked a box stating that they understood that the questions would be of a sexual nature and that they wanted to continue to the questionnaire. The study was described to them as “a study on young people and sexuality.” They were further informed that they could cease their participation at any time. No parental consent was needed, because the minimum age for completing the questionnaire was 16. Data collection took place between May and July of 2014. To enable test–retest reliability to be examined, the same participants completed the questionnaire in two waves, the second wave taking place eight weeks after the first.

Measures

The revised instrument described in Study 1, now consisting of 20 items, was administered to participants in both Wave 1 and Wave 2. Participants indicated their degree of agreement on a six-point scale ranging from “1 = completely disagree” to “6 = completely agree.” See “Original 35-Item Pool for the SASSY” section for English-language item wording.

Demographics

Gender, age, education level, and sexual experience were assessed in an identical manner to Study 1.

Construct Validity

The Sexual Double Standard Scale (Muehlenhard & Quackenbush, 1998) was included in the survey in order to examine construct validity. The scale contained 20 items on which participants could indicate their degree of agreement on a four-point scale ranging from “1 = completely disagree” to “4 = completely agree.” In our study, we obtained a Cronbach’s alpha of .52 for this scale. An example item is: “It is just as important for a man to be a virgin when he marries as it is for a woman.”

Convergent Validity

Scores on the constructs used to assess convergent validity were taken from the longitudinal database of the LISS panel. Complete data were available for 504 of the participants who had also completed the new instrument and SDSS questionnaires.

Family gender norms. Questions were derived from the European Values Study (EVS, 2016). Higher scores on this measure indicated more conservative family gender norms. The scale contained seven items, rated on a five-point scale ranging from “1 = completely disagree” to “5 = completely agree.” The items formed a reliable scale, with a Cronbach’s alpha of .75 in this study. An example item is: “A child who is not yet attending school is likely to suffer if his or her mother has a job.”

Traditional values. Questions were derived from the European Social Survey (ESS, 2016). Higher scores on this measure indicated more conservative gender norms for child-rearing. The scale contained four items, rated on a five-point scale ranging from “1 = completely disagree” to “5 = completely agree.” The items formed a reliable scale, with a Cronbach’s alpha of .72 in this study. An example item is: “Generally speaking, boys can be brought up more liberally than girls.”

Analytical Strategy

First, the factor structure and internal consistency of the new instrument were reassessed. We employed a confirmatory factor analysis for this purpose. Subsequently, analyses were performed to ascertain test–retest reliability, construct validity, and convergent validity. Lastly, measurement (in)variance was examined across time, gender, age, education, sexual experience level, and ethnicity using confirmatory factor analysis. There were no missing values; therefore, no missing data procedure was needed.

Results

Factor Structure

The factor structure of the new instrument was reassessed using confirmatory factor analysis with principal axis factoring. The Kaiser–Meyer–Olkin value was .91 for both Wave 1 and Wave 2, which is above the recommended cutoff value of .60 (Kaiser, 1970, 1974), and Bartlett’s Test of Sphericity (Bartlett, 1954) was statistically significant in both waves, supporting factorability. All items except one loaded sufficiently strongly (above .40) on the first factor in both Wave 1 and Wave 2, supporting a one-factor solution. Item 2 (“Girls like boys who take the lead in sex”), however, did not exceed this threshold in either wave. Based on these factor loadings, we decided to drop Item 2, and subsequent analyses were performed with the 19-item instrument. The single factor of the 19-item instrument explained 32% of the variance in Wave 1 and 34% in Wave 2. Factor loadings are shown in Table 2. This final set of items was named the Scale for the Assessment of Sexual Standards among Youth (SASSY). The final 19-item instrument can be found in the “Original 35-Item Pool for the SASSY” section. Mean scores on the individual SASSY items as a function of gender are shown in Table 4.
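Both factorability checks used above have closed forms. The sketch below computes the overall Kaiser–Meyer–Olkin measure (squared correlations relative to squared correlations plus squared partial correlations) and Bartlett’s sphericity statistic, χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, from a correlation matrix R of p items and n respondents. It is a plain-Python illustration with helper names of our own; the p-value would come from a chi-square survival function (e.g., `scipy.stats.chi2.sf`), which is omitted here to keep the block dependency-free.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def inverse_and_det(a: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        p = m[col][col]
        det *= p
        m[col] = [x / p for x in m[col]]
        for row in range(n):
            if row != col and m[row][col] != 0.0:
                f = m[row][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return [row[n:] for row in m], det


def kmo(r: Matrix) -> float:
    """Overall Kaiser-Meyer-Olkin measure from the off-diagonal correlations
    and the partial correlations implied by the inverse correlation matrix."""
    inv, _ = inverse_and_det(r)
    n = len(r)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sum_r2 += r[i][j] ** 2
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)


def bartlett_sphericity(r: Matrix, n_obs: int) -> Tuple[float, int]:
    """Bartlett's statistic and df for H0: the population correlation matrix is identity."""
    p = len(r)
    _, det = inverse_and_det(r)
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return statistic, p * (p - 1) // 2


# Usage: three made-up items generated by loadings .8, .7 and .6
r3 = [[1.0, 0.56, 0.48], [0.56, 1.0, 0.42], [0.48, 0.42, 1.0]]
print(round(kmo(r3), 2))  # 0.67
print(bartlett_sphericity(r3, 818))
```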

Table 4 t-tests of separate SASSY item means as a function of gender (Study 2)

Internal Consistency

The reliability of the 19-item SASSY was assessed in both Wave 1 and Wave 2. Cronbach’s alphas were well above the conventional .70 threshold for a reliable scale: .89 in Wave 1 and .90 in Wave 2.

Test–Retest Reliability

The correlation between Wave 1 and Wave 2 SASSY scores was substantial and highly significant (see Table 5), and within-gender SASSY scores did not differ significantly between Wave 1 and Wave 2.
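Test–retest reliability and the within-gender comparison of waves can be illustrated as follows. The text does not name the test used for the within-gender wave comparison; the paired-samples t statistic below is an assumption, the scores are made up, and the function names are our own. In practice, `scipy.stats.pearsonr` and `scipy.stats.ttest_rel` return these statistics (up to the sign convention of the difference) together with p-values.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(wave1: Sequence[float], wave2: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and df for the Wave 2 minus Wave 1 differences."""
    diffs = [b - a for a, b in zip(wave1, wave2)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up mean scale scores for five participants of one gender
w1 = [2.1, 3.4, 1.8, 2.9, 4.0]
w2 = [2.3, 3.1, 2.0, 3.0, 3.8]
print(round(pearson_r(w1, w2), 3))
print(paired_t(w1, w2))
```

A high correlation indicates stable rank ordering across the eight weeks, while a non-significant paired t indicates no systematic mean shift; the two criteria answer different questions, which is why the text reports both.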

Table 5 Correlations between SASSY and other measured constructs

Construct Validity

The correlation between the SASSY and the SDSS (Wave 1) was large (see Table 5).

Convergent Validity

A small but significant positive correlation was found between the SASSY (in both Wave 1 and Wave 2) and Family Gender Norms (see Table 5), indicating that stronger sexual double standard endorsement was related to less liberal family gender norms regarding women. A moderate, significant positive correlation was found between the SASSY (in both Wave 1 and Wave 2) and Traditional Values (see Table 5), indicating that stronger sexual double standard endorsement was related to less liberal gender norms for child-rearing.

Measurement Invariance

Lastly, measurement (in)variance was examined across time, gender, age, education, sexual experience level, and ethnicity using confirmatory factor analysis. Following Steenkamp and Baumgartner (1998), we assessed configural invariance (the same factor structure holds, with acceptable fit, across groups), metric invariance (factor loadings are invariant across groups), and scalar (or strong) invariance (item intercepts are invariant across groups). As cited in Steenkamp and Baumgartner, measurement invariance refers to “whether or not, under different conditions of observing and studying phenomena, measurement operations yield measures of the same attribute.” In other words, it concerns whether a scale captures true differences between groups or whether observed differences result from systematic bias. To evaluate goodness of fit, we used the standard Root Mean Square Error of Approximation (RMSEA) cutoff of < .05, with a non-significant PCLOSE, and a Comparative Fit Index (CFI) > .90. Nested models were compared with χ² difference tests; a non-significant test was taken to indicate measurement invariance. Because the χ² difference test is sensitive to sample size, whereas the CFI is more robust (Milfont & Fischer, 2015), we additionally considered the decrease in CFI between nested models: if the χ² difference test was significant but the CFI decreased by no more than .01, measurement invariance was nonetheless concluded to be present (Cheung & Rensvold, 2002).
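The nested-model decision rule described above can be written as a small function. This is a sketch under stated assumptions: the fit values are hypothetical, the p-value of the χ² difference test is taken as an input (as reported by SEM software, or computed with a chi-square survival function such as `scipy.stats.chi2.sf`), and the class and function names are our own.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelFit:
    chi2: float
    df: int
    cfi: float


def chi2_difference(free: ModelFit, constrained: ModelFit) -> Tuple[float, int]:
    """Delta chi-square and delta df between two nested models."""
    return constrained.chi2 - free.chi2, constrained.df - free.df


def invariance_holds(free: ModelFit, constrained: ModelFit, p_diff: float,
                     alpha: float = 0.05, max_cfi_drop: float = 0.01) -> bool:
    """Accept the constrained model if the chi-square difference test is
    non-significant; otherwise accept it if CFI drops by no more than .01
    (Cheung & Rensvold, 2002)."""
    if p_diff >= alpha:
        return True
    # Round to avoid floating-point noise when the drop is exactly .01
    return round(free.cfi - constrained.cfi, 6) <= max_cfi_drop


# Hypothetical configural and metric models
configural = ModelFit(chi2=300.0, df=262, cfi=0.935)
metric = ModelFit(chi2=345.0, df=280, cfi=0.930)
print(chi2_difference(configural, metric))  # (45.0, 18)
# Hypothetical p-value: chi-square(18) = 45 corresponds to p < .001
print(invariance_holds(configural, metric, p_diff=0.0005))  # True
```

In this made-up example the χ² difference test is significant, but the CFI drops by only .005, so metric invariance would still be concluded under the rule described in the text.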

The fit of the factor model was good: χ²(131) = 449.518, RMSEA = .055 (PCLOSE = .077), and CFI = .932. Although the RMSEA point estimate slightly exceeded .05, the non-significant PCLOSE indicates that the hypothesis of close fit (RMSEA ≤ .05) could not be rejected. All factor loadings were > .41. As shown in Table 6, the instrument showed configural and metric measurement invariance across gender, age, educational level, sexual experience level, and ethnicity, and configural, metric, and scalar measurement invariance across time.
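As a consistency check, the RMSEA point estimate can be recomputed from the reported χ² and degrees of freedom using RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). The sample size entering the model is not stated above; assuming the 818 Wave 1 participants, the formula reproduces the reported .055. Some programs use N instead of N − 1 in the denominator; with these values both versions round to .055.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Reported chi2 and df; n = 818 (the Wave 1 analytic sample) is an assumption
print(round(rmsea(449.518, 131, 818), 3))  # 0.055
```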

Table 6 Tests of invariance constraints (1 configural, 2 metric and 3 scalar) for gender, age, education, sexual experience, time, and ethnicity

Discussion

As the concept of the sexual double standard has evolved, along with changes in the display of gendered behavior in dating and sexuality, and negative effects of sexual double standard endorsement on sexual health are evident (Bordini & Sperb, 2013; Crawford & Popp, 2003; Fugère et al., 2008; Sanchez et al., 2012), the development of modernized methods and measures is warranted. In response to this call, this study proposed a new, multifaceted, and one-dimensional 19-item scale to measure sexual double standard endorsement.

The SASSY demonstrated good one-factor model fit, high internal consistency, test–retest reliability, and construct and convergent validity. It showed configural and metric measurement invariance across gender, age, education level, sexual experience level, and ethnicity, and configural, metric, and scalar measurement invariance across time. Overall, these results support the use of the scale in future studies.

Of course, our study also had some limitations. Firstly, no scalar measurement invariance was found across gender, age, education level, sexual experience level, and ethnicity. Strictly speaking, because configural and metric invariance do hold, this means that the SASSY can be used to compare structural relationships (e.g., correlations or regression coefficients) across these groups, but not to compare group means. However, measurement invariance is often ignored in (validation) studies altogether, and when it is tested, stronger forms of invariance (such as scalar invariance) only rarely hold (Van De Schoot, Schmidt, De Beuckelaer, Lek, & Zondervan-Zwijnenburg, 2015). Therefore, we see no reason to be overly cautious in comparing group means.

Secondly, although we tried to avoid deleting entire themes through statistically motivated item deletion, all items for the theme of “gender violations” had to be dropped, based on either the factor loadings or the subsequent reliability analysis. However, since this was the only established theme that dropped out completely in the process of creating the final instrument, we do not think that the multifaceted nature of the scale was compromised. We additionally assessed whether the deleted items had anything in common (for instance, lower variability than the retained items), but this did not appear to be the case.

The present study used self-report data. As the instrument was designed to assess the individual’s attitude toward perceived social norms concerning sexuality for boys and girls, socially desirable responding can never be ruled out completely. We also note that using either a between- or within-subjects design in sexual double standard research may lead to different results, including with regard to socially desirable responding. Both types of designs have previously been used in this field, generally assessing either individual double standards (mostly using within-subjects designs) or perceptions of societal double standards (mostly using between-subjects designs) (Crawford & Popp, 2003). Although it remains open to discussion whether the SASSY is more a measure of individual or societal double standards (or a combination), we believe it leans more toward the societal double standard. It could, therefore, be argued that the instrument is most suitable for between-subjects designs (Crawford & Popp, 2003). It is possible that, as a result of the gender comparison inherent in the scale, participants may be somewhat more inclined toward socially desirable responding and report a similar standard for boys and girls. However, previous research shows that this is not necessarily the case, even when assessing the SDS in a between-subjects design (Sakaluk & Milhausen, 2012).

In the future, the instrument might be adapted for use in sexual minority samples as well. However, as the sexual double standard is a highly heteronormative phenomenon, it seemed best to focus first on heterosexual populations.