As social media become a staple for knowledge discovery and sharing, questions arise about how self-organizing communities manage learning outside the domain of organized, authority-led institutions. Yet examination of such communities is challenged by the quantity of posts and the variety of media now used for learning. This paper addresses the challenges of identifying (1) what information, communication, and discursive practices support successful online communities, (2) whether such practices are similar on Twitter and Reddit, and (3) whether machine learning classifiers can be successfully used to analyze larger datasets of learning exchanges. This paper builds on earlier work that used manual coding of learning and exchange in Reddit ‘Ask’ communities to derive a coding schema we refer to as ‘learning in the wild’. The schema comprises eight categories: explanation with disagreement, agreement, or neutral presentation; socializing with negative or positive intent; information seeking; providing resources; and comments about forum rules and norms. To compare across media, results from coding Reddit’s AskHistorians are compared to results from coding a sample of #Twitterstorians tweets (n = 594). High agreement between coders affirmed the applicability of the coding schema to this different medium. LIWC lexicon-based text analysis was used to build machine learning classifiers, which were then applied to code a larger dataset of tweets (n = 69,101). This research shows that the ‘learning in the wild’ coding schema holds across at least two different platforms and is partially scalable to the study of larger online learning communities.