
Open Access 18-01-2024 | Methodological Paper

Beyond text: Marketing strategy in a world turned upside down

Authors: Xin (Shane) Wang, Neil Bendle, Yinjie Pan

Published in: Journal of the Academy of Marketing Science


Abstract

Analyzing unstructured text, e.g., online reviews and social media, has already made a major impact, yet a vast array of publicly available, unstructured non-text data houses latent insight into consumers and markets. This article focuses on three specific types of such data: image, video, and audio. Many researchers see the potential in analyzing these data sources, going beyond text, but remain unsure about how to gain insights. We review prior research, give practical methodological advice, highlight relevant marketing questions, and suggest avenues for future exploration. Critically, we spotlight the machine learning capabilities of major platforms like AWS, GCP, and Azure, and how they are equipped to handle such data. By evaluating the performance of these platforms in tasks relevant to marketing managers, we aim to guide researchers in optimizing their methodological choices. Our study has significant managerial implications by identifying actionable procedures where abundant data beyond text could be utilized.
Notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s11747-023-01000-x.
Mark Houston served as Editor for this article.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Marketing researchers once faced data shortages investigating consumer insights outside the lab. Choices were largely limited to expensive mass surveys, cumbersome focus groups, and secondary data that was scarce, costly, messy, and rarely addressed the right questions. With the rise of internet panels, respondents could answer surveys in their pajamas, allowing for a more diverse respondent pool. Yet not all research questions proved suitable for the panels, while data quality and worker exploitation concerns lurked just off camera. Efforts to improve such panels brought budget-breaking expenses and access to data continued to keep researchers up at night.
Then everything changed. A revolution’s first stirrings could be seen with the rise of text-mining. Social media, online reviews, and blogs all became data sources for enterprising marketing researchers (Van Laer et al., 2019). Researchers developed the skills to wrangle unstructured text data, i.e., data not in neat rows and columns. They imposed upon the data the structure necessary for analysis. Innovative researchers viewing the horizon saw that text was only the nearest shore. There was a vast continent of unstructured data beyond text.
Marketers can now observe potential consumer insights in many data forms. It is even cliché to complain that younger consumers share pictures of every meal with the entire world. Image, video, and audio are as revealing as text, indeed, often more so. Non-text data is often quicker to provide and less formal, making consumers less likely to censor emotional reactions.
The world has been turned upside down. Marketers now have access to more data than could possibly be used in multiple lifetimes, with more shared every minute. The abundance of unstructured data beyond text, and the increasingly convenient analysis tools, offer managers an opportunity to monitor, structure, and enhance their offerings, but how should they do so?
This research aims to help overcome these constraints by drawing attention to the possibilities for marketing research and highlighting work already done in the main beyond text areas: image, video, and audio. Past studies have demonstrated the potential in exploiting beyond text data. For example, Li and Xie (2020) predict the impact of photo content on social media engagement, Lin et al. (2021) consider emotions’ impact on viewers, while Wang et al. (2021) connect vocal tones to funding decisions. Wedel and Kannan (2016) proposed a framework utilizing various forms of data to facilitate and evaluate marketing decisions, enhance planning regarding customer relationship management, personalization, and the marketing mix, as well as addressing privacy and security. The recent dramatic rise of LLMs (Large Language Models) only increases the need to better understand and incorporate artificial intelligence into marketing research and decision-making.
To aid adoption of the new methods, we describe and test three cloud-computing platforms. We also offer insights for researchers and marketers on how to utilize beyond text data for various marketing tasks in a cost-efficient manner, while reducing the technological barrier for those unfamiliar with computer coding.
What unstructured data could prove especially useful for providing insight?

Consumer data

Understanding the consumer is central to marketing, not least to the acquisition and retention processes. Marketers often face the challenge of gaining access to appropriate data and tools to enhance their understanding. The increasing availability of different data sources has enabled firms to analyze consumer behavior and purchase processes more effectively. Throughout the consumer journey, from before being exposed to the product to making the final purchase decision and beyond, the consumers’ thoughts, feelings, attitudes, and behaviors change with the firm’s marketing mix and their particular needs.
Consumers’ psychological processes are often not explicitly expressed and therefore cannot be easily assessed. Yet, managers can make inferences. Consumers leave traces of their feelings and attitudes in their verbal and non-verbal expressions and consumption behaviors. Managers generally have access to, and could utilize, the data generated through interactions between the firm and its consumers. Consequently, firms could grasp information about the consumer process of “gathering information, making a purchase, receiving the product, making a return, and receiving post-purchase service” (Cui et al., 2021). However, as these processes generate a large body of available data including consumer conversations, transaction data, customer relationship management records, etc., the challenge is to find and utilize the thoughts and feelings hidden within a massive quantity of data.
In addition, consumer touchpoints are not equally accessible across different channels (Du et al., 2021). This introduces notable challenges for marketers aiming to exploit omnichannel information. To help with this, many firms are turning to big data technologies and machine learning techniques to increase task scalability and efficiency.
Data generated outside the direct interaction between the firm and consumer contains valuable information about market trends (Klaus & Maklan, 2013). With online platforms, consumers express their thoughts, and influence others, through word-of-mouth. These information exchange processes impact consumer information searching and decision-making. However, many firms struggle to (a) monitor and influence online exchanges, and (b) strategically respond to current market trends, due to the challenge of collecting and analyzing relevant data. In addition to consumers' and regulators' concerns surrounding data privacy and data security, problems also emerge for those marketers who are deterred by the technological barriers from utilizing appropriate analytical tools or constrained by their budgets (Wedel & Kannan, 2016).
Analyzing consumer data poses significant technical difficulties, especially when integrating across unstructured data types like image, video, and audio. These challenges arise given the different underlying structures (Grewal et al., 2021). Embracing the insights of Balducci and Marinova (2018), we recognize that each data type can be multifaceted, with multiple facets manifesting at the same time. This not only highlights the richness of the data but also underscores the challenges of weaving together different encoding layers such as pitch, volume, phoneme, and the words used. Despite these challenges, integrating these data types offers the potential to develop superior models and novel theoretical frameworks, a potential exemplified by the machine learning data fusion approach of Boughanmi and Ansari (2021).

Competitor data

Competitor data is crucial to strategic decision-making. Firms often struggle to directly observe or gather insights on their competitors' underlying marketing strategies which, by their nature, are concealed. They can, however, deduce these strategies from the tangible, observable actions—the marketing behaviors—displayed by competitors. These behaviors, evidenced in aspects like promotional discounts and advertising content, as well as the discernible reactions from consumers, provide a blend of structured and unstructured data that managers can leverage. Such competitor data also offers an opportunity for firms to understand the effects of different marketing strategies without experimenting with every option themselves and absorbing all the risks of such experiments. Managers can learn from their competitors’ successes and failures.
However, it is costly and difficult to monitor and identify successful competitor marketing elements from massive amounts of competitor data and then make predictions based upon this. It is especially difficult to isolate the effect of specific marketing tools. For example, is an advertising video successful because of its acoustics, color palette, a particular image object, or a combination of these? Uncovering insights may require advanced machine learning tools, which deters marketers who are unfamiliar with what is available. While these tasks seemed impossibly difficult before big data and user-friendly machine learning tools, when employing beyond text data and cloud platforms a marketer can now hope to glean helpful clues to such questions.

Firm strategy data

Marketers can improve their offerings by better understanding consumer feedback and adjusting their strategies given findings about how image, video, and audio messages influence consumer behavior. Villarroel Ordenes et al. (2019) found that the mere presence of images and video in consumer generated content encourages sharing behaviors by other consumers. Product presentation, including design aesthetics, helps marketers to transmit messages, and Liu et al. (2017) proposed an image-based method to quantify such product attributes.
Managers can also quantify and evaluate company logo attributes. For example, Satomura et al. (2014) developed a quantification method to measure consumer confusion about copycat logos. In terms of advertising content, Xiao and Ding (2014) showed that spokesperson faces in print advertisements have a significant impact on viewers’ reactions to the ads, and called for managers to devote efforts to selecting the right face for a product category. Brickman (1980) established that the voice pitch in TV advertisements could be used to predict purchases. These findings have encouraged marketers to customize messages for their audiences.
Managers should also accurately assess their own marketing performance. They may struggle to identify a key product differentiator, as many design elements may be hard to put into words. This may cause miscommunication among departments within the firm and lead to the inability to explain the product and attract consumers. It is challenging for marketers to quantify visual attractiveness and innovativeness and predict market performance of advertising before trials. With increasing adoption of omnichannel marketing, problems of data integration and marketing attribution escalate (Cui et al., 2021). As the consumers are exposed to multiple contacts from the same firm, marketers find it hard to accurately attribute effects and assess the impact of marketing tools at various touchpoints of the consumer journey. This leads to difficulties in evaluating and optimizing marketing spending. See Table 1 for some objectives of marketing, challenges associated with these, and how getting beyond text can help.
Table 1
How marketers’ challenges are addressed by getting beyond text

Consumer data
  • Objectives: understand consumer trends; understand consumer decision processes; consumer profiling and targeting; relationship management; predict consumer behavior
  • Marketers’ challenges: access to consumer data; consumer thoughts cannot be directly observed; high cost and limited availability of useful data; gaining access to and monitoring consumer data from indirect sources
  • How beyond text helps: provides access to unfiltered consumer commentary; data sources often free to access; data typically publicly available; consumers often share freely

Competitor data
  • Objectives: monitor competitor strategy; allow for real-time adjustment of marketing strategies
  • Marketers’ challenges: competitor strategies are not directly observable
  • How beyond text helps: can observe how consumers react to competitors; data is real-time; can analyze competitor advertising across formats

Firm data
  • Objectives: assess and optimize marketing effectiveness and efficiency
  • Marketers’ challenges: hard to disentangle the impact of a marketer’s choices; hard to attribute success of marketing activity; integration of divergent data sources
  • How beyond text helps: can test reactions from large numbers of consumers in real time; link unstructured data to observable outcomes; can convert unstructured data to structured data and compare

Cloud platforms

So how can marketers get beyond text? New possibilities are arising through cloud computing. Cloud platforms are easy to use and provide analysis using pre-existing machine learning tools. For example, the platforms give classification tools for a variety of unstructured data. As envisioned by Lu et al. (2016), classification methods useful for text (kNN, CNN, SVM) can be applied to non-text data. Convolutional neural networks (CNN) have become dominant for image and video analysis in marketing research due to their superior ability to capture and analyze hidden patterns. Whilst some researchers rely on self-developed CNN programs (Liu et al., 2020a), CNN-based application programming interfaces (APIs), such as Clarifai, VGGFace, and Google Cloud Vision, are popular. These APIs have already constructed the core structure of the complex machine learning algorithms and require only basic knowledge of the underlying methods from users to apply the models and tune parameters. Using a pre-trained algorithm thus greatly reduces a researcher’s entry cost. They do not have to spend the time and money, or have rare expertise, to effectively extract information from unstructured data. Added to this, the user-friendly interfaces and convenient storage options of commercial APIs have further lowered the entry bar for those without coding experience.
Comparisons between the classifications of the algorithms and those of Mechanical Turk workers (Ghose et al., 2012) have validated the commercial APIs’ reliability. Yet, there is little systematic comparison among the APIs and less knowledge about how appropriate they are for marketers. Computer science researchers have only considered appropriateness for questions relevant to their field, so whatever they deem the ‘best’ API may not necessarily be the best for marketing tasks. As researchers did for text classification methods (Hartmann et al., 2019), we ask: which platform is best for which situation?
Major tech firms provide cloud platforms with suites of tools and there is significant overlap of services. We considered three: AWS (Amazon Web Services), GCP (Google Cloud Platform), and Azure (Microsoft), each offering cloud services and artificial intelligence solutions. We focused on these platforms because they come from well-known firms whose offerings are commonly used. Our assumption is that marketers’ familiarity with these firms should ease any reticence to use their tools, making a discussion of their offerings especially helpful.
Each firm has its own general machine learning (ML) platform with built-in options to train custom-built models. Table 2 shows the services each platform provides. We consider the free tier services (Appendix 1 gives current offers), but each cloud platform also provides paid tools for advanced analysis. We next consider image, video, and audio, why each type of analysis might be useful, and explain how to use the cloud platforms for each.
Table 2
Capabilities of cloud platforms

| Capability | AWS | GCP | Azure |
|---|---|---|---|
| Image and video classification | ✓ | ✓ | ✓ |
| Image detection | ✓ | ✓ | ✓ |
| Object detection (video) | ✓ | ✓ | ✓ |
| Speech to text | ✓ | ✓ | ✓ |
| Custom model (image and audio) | ✓ | ✓ | ✓ |
| Custom model (video) | x | ✓ | x |
| Pre-trained model (image, video, audio) | ✓ | ✓ | ✓ |

Image analysis

Understanding image analysis

Consumers display their meals; marketers share carefully judged product photos; dating app users package themselves to create appealing personas just on the right side of the truth. Images matter in marketing, and automated image analyses seem set to only increase in importance.
There are two major types of image analysis tasks, a.k.a., image mining.
  • Image classification categorizes similar images together. Automated analysis can even identify patterns that humans cannot detect and group based on these patterns.
  • Object detection involves spotting an image within a larger image. It incorporates localization: building a box around each object within an image before classifying it.
Image classification and object detection are important techniques within “computer vision.” The algorithm used might “look” at a picture, see a pixel grouping, and treat this as a feature. Computers use multiple image features, e.g., pixels, positions, lighting, camera angles, to predict what an image represents using supervised or unsupervised approaches.
  • Supervised approaches use labeled pictures, e.g., cats/not cats. Pixel groupings recurring in labeled images, e.g., cats, are noted, and new images with similar features are given a label.
  • Unsupervised approaches group by common features, but algorithms do not look to group on any prior features, e.g., the computer ‘sees’ the images as similar but does not see them as cats. After the algorithm has created the groups, the researcher can post-hoc label the groups.
Image classification and object detection have been widely applied in marketing research. Researchers might seek a product placement within an image to test a relevant question, e.g., does having a soda in the foreground or background of a picture change engagement? The most obvious benefit of using an algorithm is scale. It is impractical for humans to view thousands of images, never mind judge performance consistently. Nor can we ensure consistency between different human viewers. In comparison, algorithms never tire and are always consistent, reducing noise.
Of course, some uses of beyond text analysis will be controversial. Marketers could predict personal details of consumers from the images they post. The consumers though may not wish to share such details. Individual consumers could be targeted through facial recognition. We see that as a reason for academics to embrace the field. To push back on inappropriate uses, one has to understand what types of image analysis are possible, and how this can be conducted.

Using images for marketing tasks

One can expect that managers would be able to benefit from the large amount of data that consumers post. Consumers can be seen interacting with products in much more novel and less constrained ways than might be encountered in a focus group. The manager will also have access to the interactions between the firm and the consumer, for example, images attached to tweets directed at the firm. Many firms classify their products using images to aid consumer selection. Social media platforms, e.g., Instagram, allow tags that can be used to better understand the images. For example, by building a custom model for classifying product offerings, retailers can use social media to identify which influencers wear a firm’s clothing.
Hartmann and colleagues (2021) find that purchase intentions and brand engagement increase as consumers share selfie images holding a branded product. Dynamic monitoring of images posted can be used to analyze engagement and design optimized messages. Commercial APIs have quantified happiness, naturalness, and sadness (Lee, 2021). Improving understanding of images can further consumer engagement, facilitate positive brand exposure, expand the consumer base, and, ultimately, should lead to favorable financial performance.
Consumers can even perform such analyses themselves to get ahead of trends. One might expect those on dating apps, or applying for jobs, to be able to seek advice from an algorithm on what image will help ‘sell’ their profile.
Note that much of this beyond text data, here images, will come from outside the direct consumer-firm interaction. Social media monitoring allows the firm to observe images being shared by consumers across the marketplace.
Locating pertinent images in order to comprehend competitors' strategies is relatively straightforward compared to searching for direct information about the competitors’ strategy. Marketers can ascertain the consumer usage and settings highlighted in competitor advertising and deduce the intended target market by analyzing the individuals featured in the advertising. Images in competitors' advertising campaigns can give insight into their focus, and marketers can extrapolate a competitor’s marketing strategy before devising an appropriate response. Marketers can even group competitors based on the themes conveyed in their advertisements.
In addition, marketers can scrutinize product designs to glean insights into their competitors' strategy. Examining the themes portrayed in the design of competitors' products can provide valuable information about their promotional tactics. Various images and related items, such as logos, color schemes, and packaging visuals, are commonly utilized in business. By analyzing these elements, marketers can deduce the relative positioning of their competitors.
Managers can apply similar methods to their own actions. Some intriguing questions include:
  • How has our advertising changed over time?
  • Does our image use fit with what we said our target market was supposed to be?
  • Do our products look trendy or classic? Does this fit with what we are supposed to be aiming for?
Similar ideas apply to all other forms of company image, such as logos. Combining images with text can allow the marketer to identify the relationship between consumer commentary and image use, allowing for a measure of creative effectiveness. Academics have already started working in the area. One study was able to extract perceptions of a brand through user generated data (Klostermann et al., 2018). Researchers have used images to create brand prototypical collages (Dzyabura & Peres, 2021), predict restaurant survival (Zhang & Luo, 2022), and evaluate product aesthetics (Burnap et al., 2023).
Regarding the technical aspects of image analysis, investigating advertising images, product designs, and company logos mostly centers around (1) deviation from the population average to group by visual similarity, (2) design typicality, or (3) facial recognition. To assess similarity, an RGB histogram uses pixel-level color distribution; texture histograms describe the pixel-level distribution of pre-defined patterns, while Gabor features use a linear filter to analyze the frequency and orientation of content and recognize patterns (Dzyabura et al., 2023). Photo morphing or principal component analysis can create a prototype average picture (Landwehr et al., 2013). Design typicality may be based on the difference between each product and the average (Landwehr et al., 2011) and used as a component to quantify product design aesthetics (Liu et al., 2017). Studies also use a machine learning facial recognition technique, Eigenface, which creates an average face that can be compared to any other face (Xiao & Ding, 2014), thus improving advertising efficiency.
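As a concrete illustration of the first of these measures, the sketch below computes RGB color histograms for two images and compares them with a simple Euclidean distance. This is a minimal sketch of histogram-based visual similarity, not the exact procedure of the cited studies; the file names and bin count are hypothetical.

```python
# Minimal sketch: RGB-histogram similarity between two images (hypothetical file names).
import cv2
import numpy as np

def rgb_histogram(path, bins=8):
    """Return a normalized color histogram (bins per channel) for one image."""
    image = cv2.imread(path)                      # OpenCV loads images as BGR arrays
    hist = cv2.calcHist([image], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256] * 3)
    hist = cv2.normalize(hist, hist).flatten()    # normalize so images of any size are comparable
    return hist

h1 = rgb_histogram("product_a.jpg")
h2 = rgb_histogram("product_b.jpg")

# Smaller distance = more similar color distributions; other metrics
# (correlation, chi-square) are available via cv2.compareHist.
distance = np.linalg.norm(h1 - h2)
print(f"Color-histogram distance: {distance:.3f}")
```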

Using the cloud platforms for image analysis

As a demonstration of how cloud platforms could be used for image analysis, we give a detailed roadmap on how to train and test an algorithm to classify a specific image (i.e., supervised learning) with three platforms: AWS, GCP, and Azure. The tested platforms all use CNNs as their image classification algorithm, consistent with recent trends in marketing research. Before analysis one must import the data, e.g., images. (See Appendix 2 for sample code for retrieving images from the web). Note a general rule is that images uploaded should be similar to those later assessed to enhance prediction accuracy. For example, if the model is trained on nighttime images, performance on daytime images will likely disappoint.
Amazon’s image recognition service can be accessed through the AWS console or an API. To begin creating the model, specific preparation of images within a defined domain is required. It is advisable to include a minimum of ten images depicting the object of interest in diverse lighting, backgrounds, and resolutions comparable to those utilized with the model. Amazon Rekognition Custom Labels employs automated machine learning techniques to create a customized model. To accomplish this, the user can generate a project via the console or the API.
Images can be imported from four different locations. For a small dataset, uploading images from a local computer directly to the console is convenient. Otherwise, we recommend sorting the images into folders and importing these into an Amazon S3 bucket. Amazon SageMaker and existing Rekognition Custom Labels datasets can also be imported. Images can be labeled through the dashboard, while images uploaded through an S3 bucket can have labels assigned through folder names. There are two types of labels for classifying objects: scenes (the environment around an object, e.g., a beach), and concepts (the specific image, e.g., a Coke can).
To train a model, select the project and training set. AWS Rekognition Custom Labels offers several ways to create a test set. The easiest approach is to use the built-in option, which automatically splits the dataset into 80% for training (to learn the associations between labels and images) and 20% for testing. The AWS console provides a summary of the training results with metrics to judge model performance. Once trained, the model can be applied to new image datasets using its Amazon Resource Name (ARN). To do this, one can start the model in the console with the “Use Model” tab and apply it to other images.
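For those who prefer the API route over the console, a minimal sketch of applying a trained Custom Labels model with the boto3 SDK might look as follows; the project version ARN, bucket, file name, and confidence threshold are hypothetical placeholders.

```python
# Sketch: applying a trained Rekognition Custom Labels model via boto3
# (ARN, bucket, and key are hypothetical placeholders).
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
MODEL_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/apparel/version/1"

# Start the model; wait until its status is RUNNING before requesting predictions.
rekognition.start_project_version(ProjectVersionArn=MODEL_ARN, MinInferenceUnits=1)

# Classify a new image stored in an S3 bucket.
response = rekognition.detect_custom_labels(
    ProjectVersionArn=MODEL_ARN,
    Image={"S3Object": {"Bucket": "my-image-bucket", "Name": "new_image.jpg"}},
    MinConfidence=50,
)
for label in response["CustomLabels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Stop the model when finished to avoid ongoing charges.
rekognition.stop_project_version(ProjectVersionArn=MODEL_ARN)
```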
Google Cloud Platform’s Vertex AI is a machine learning model builder for image classification accessible through the GCP console or an API. GCP has a requirement similar to AWS Rekognition Custom Labels: image datasets of at least 10 images. In the console, select Vertex AI to create a custom model with a new dataset and select single-label (one label per image) or multi-label classification. Images can be imported either by uploading images from a computer directly into the console or by uploading images into GCS (Google Cloud Storage) and supplying Vertex AI with a CSV file containing the image paths. Only 500 images can be imported through the console, so upload via the CSV file route if you have a larger dataset.
Once all the images in the dataset are labelled, GCP automatically splits the images (80/10/10) into three sets, TRAIN, VALIDATION, and TEST. Users can check which images will be used for which purpose and customize this, assigning images to groups by editing the CSV file and reimporting. Vertex AI provides metrics to determine model performance overall and for each label. The first graph shows the trade-off between precision and recall at different confidence thresholds. (The metrics are explained later). The second graph shows how different confidence thresholds would affect the precision and recall metrics along with true and false positive rates. Although Vertex AI does not provide the precision and recall estimates for each label, it provides a confusion matrix that shows the percentage of times each label was predicted correctly. This could be used to calculate these performance metrics.
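If the same workflow is scripted rather than run through the console, a rough sketch with the google-cloud-aiplatform SDK might look like the following; the project ID, region, bucket, CSV path, display names, and training budget are hypothetical placeholders.

```python
# Sketch: training a single-label image classifier on Vertex AI
# (project, region, and GCS paths are hypothetical placeholders).
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# The CSV lists gs:// image paths and their labels, as described above.
dataset = aiplatform.ImageDataset.create(
    display_name="apparel-images",
    gcs_source="gs://my-bucket/apparel_labels.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="apparel-classifier",
    prediction_type="classification",
    multi_label=False,
)

# budget_milli_node_hours=8000 is roughly the minimum AutoML training budget (8 node hours).
model = job.run(dataset=dataset, budget_milli_node_hours=8000)
print(model.resource_name)
```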
The third platform is Microsoft’s Azure Custom Vision. This classifies labels based on visual characteristics and allows users to build and deploy customized image classifiers. Azure Custom Vision can be accessed through a web portal or the Custom Vision SDK (software development kit). To use the web portal, create a new project, and select “Classification” under Project Types. Azure distinguishes between “Multilabel” (multiple tags per image, appropriate when an image contains different objects) and “Multiclass”. The latter allows only a single tag per image. After creating the project, add images and assign labels before uploading and training the model. Azure’s algorithm will train the model and evaluate the classifier. The Custom Vision Service uses the k-fold cross validation process to train the model, reporting precision and recall.
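For completeness, a minimal sketch of the SDK route (rather than the web portal) might look as follows; the endpoint, training key, project name, tag, and image file are hypothetical placeholders.

```python
# Sketch: building a Custom Vision classifier via the Python SDK
# (endpoint, key, and file paths are hypothetical placeholders).
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from msrest.authentication import ApiKeyCredentials

ENDPOINT = "https://my-resource.cognitiveservices.azure.com/"
credentials = ApiKeyCredentials(in_headers={"Training-key": "my-training-key"})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)

# "Multiclass" = one tag per image; use classification_type="Multilabel" for several tags.
project = trainer.create_project("apparel", classification_type="Multiclass")
black_dress = trainer.create_tag(project.id, "black_dress")

# Upload a labeled image (Custom Vision expects raw image bytes).
with open("dress_01.jpg", "rb") as image:
    trainer.create_images_from_data(project.id, image.read(), tag_ids=[black_dress.id])

# Kick off training; the iteration status can be polled until it reads "Completed",
# after which precision and recall are available via trainer.get_iteration_performance.
iteration = trainer.train_project(project.id)
print(iteration.status)
```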

Cloud platform performance for images

Dataset

Platforms may excel at different tasks. Running an image dataset through its free image analysis is a way to test a platform’s abilities. We used three sets of data downloaded from Kaggle.com to test the image classification performance of the three platforms. The first dataset of 17,760 images tests whether platforms can recognize images with or without vehicles in them. The second dataset contains 11,385 images with 24 tags formed as combinations of color (black, blue, brown, green, red, and white) and clothing (dress, pants, shirt, shoes, and shorts). Limitations in storage with the free tier meant both GCP and Azure could only train on 5,000 images of the apparel dataset; we therefore only looked at images with tags featuring black apparel. The last dataset requires the platforms to identify real (versus fake) faces in 2,041 images. (See Table 3 and further details at https://github.com/BeyondText/Beyond_Text).
Table 3
Image datasets used to test the cloud platforms

| Dataset | # of Tags | Tag Description | # of Images | Source |
|---|---|---|---|---|
| Vehicle | 2 | Vehicle and non-vehicle | 17,760 | Kaggle.com |
| Apparel | 24 | Combinations of color* and clothing** | 11,385 | Kaggle.com |
| Real vs Fake Faces | 2 | Real and fake | 2,041 | Kaggle.com |

* black, blue, brown, green, red, white; ** dress, pants, shirt, shoes, shorts
What can marketers use these cloud services for? We tested the relative proficiency of the platforms in tasks of varying complexity, encompassing the identification of vehicles, clothing attributes, and facial image authenticity. Marketing uses include basic object detection functions (e.g., detection of vehicles) in user generated content. Marketers can search for insights that involve the interaction of object detection (e.g., apparel) and object visual characteristics (e.g., color). We also tested a computationally demanding task, real face detection, which has received significant attention within, and outside, marketing. Knowing an image’s contents allows marketers to investigate consumer preferences relating to, for example, Instagram posts. Marketing researchers studying color’s impact on shopping (Crowley, 1993) can do so at scale. Investigating the different attitudes that consumers exhibit towards human versus artificial agents could also yield novel insights (Miao et al., 2022).

Performance metrics

All models report overall precision and recall so we compared these metrics. We also analyzed performance on each tag, e.g., Black Dress, looking at precision, recall and confidence threshold.
Precision is the number of correct positive predictions (true positives) divided by all positive predictions. Thus, precision measures the percentage of those identified as having a characteristic that actually have it: \(\text{Precision} = \frac{\#\text{ of True Positives}}{\#\text{ of True Positives} + \#\text{ of False Positives}}\).
Recall is the fraction of the test set labels that were predicted correctly. Thus, recall measures the percentage of those that truly have a characteristic that are correctly identified as having it: \(\text{Recall} = \frac{\#\text{ of True Positives}}{\#\text{ of True Positives} + \#\text{ of False Negatives}}\).
The F1 score combines precision and recall to measure average model performance for each label or dataset. A higher F1 score indicates that the model performs well on both precision and recall: \(F_{1} = \frac{\#\text{ of True Positives}}{\#\text{ of True Positives} + \frac{1}{2}(\#\text{ of False Positives} + \#\text{ of False Negatives})}\).
Confidence score measures the assessed probability of correct prediction. If precision is more important (i.e., that images identified as having a characteristic must actually have it), a confidence threshold should be set higher. The threshold can be set lower to try to identify all images with a characteristic even at the risk of identifying some without the characteristic.
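To make the link between the confidence threshold and these metrics concrete, the short sketch below computes precision, recall, and F1 at two thresholds for a handful of made-up confidence scores; the values are purely illustrative.

```python
# Sketch: how the confidence threshold shifts precision and recall
# (scores and labels are made-up illustrative values).
def precision_recall_f1(scores, labels, threshold):
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = tp / (tp + 0.5 * (fp + fn)) if tp + fp + fn else 0.0
    return precision, recall, f1

scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30]     # model confidence the image has the characteristic
labels = [True, True, False, True, False, False]  # ground truth

for threshold in (0.5, 0.8):
    p, r, f = precision_recall_f1(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}, F1={f:.2f}")
```

Raising the threshold from 0.5 to 0.8 in this toy example lifts precision (fewer false positives) at the cost of recall, which is exactly the trade-off described above.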

Image analysis results

For the vehicle dataset, which is relatively simple, all the cloud platform tools performed very well (see Table 4). AWS performed best with a precision and recall rate of 100%. The three platforms all performed very well with the apparel dataset, with AWS having a small edge.
Table 4
Overall image analysis results

| Dataset | AWS Precision | GCP Precision | Azure Precision | AWS Recall | GCP Recall | Azure Recall |
|---|---|---|---|---|---|---|
| Vehicle | 100% | 99.80% | 99.90% | 100% | 99.80% | 99.90% |
| Apparel | 97.90% | 97.6% | 97.60% | 98.50% | 97.5% | 97.40% |
| (Real vs Fake) Faces | 66% | 97% | 66.40% | 86.50% | 98.50% | 66.40% |
A clear strength of AWS was that its free tier could handle all tags in the apparel dataset. Appendix 3 shows performance by individual tag, i.e., detecting a specified object. The faces dataset saw meaningful differences emerge. GCP did best, with an overall precision of 97.0% and overall recall of 98.5%. AWS’s overall precision was only 66%, though its overall recall of 86.5% was better. Azure’s overall precision and recall were both only 66.4%, but at a higher confidence threshold. That its confidence threshold can be changed gives Azure helpful flexibility.

What platform to use?

GCP’s superiority in distinguishing real from fake faces suggests strength in facial recognition. AWS presented the best performance for the vehicle and the apparel categories. Overall, all platforms demonstrated satisfactory results, and the performance differences in the non-face categories are not significant enough for marketers to favor one platform (see Table 5).
Table 5
Image analysis: Platform advantages and disadvantages

| Platform | Advantages | Disadvantages |
|---|---|---|
| AWS | Object detection; comprehensive free tier offering | Facial recognition |
| GCP | Facial recognition | Limited free tier offering |
| Azure | Single large project possible | Facial recognition |
Their free tier services differ. AWS offers 5,000 images per month. GCP offers only 1,000. Azure offers 5,000 images per project and two projects per account, so marketers can use Azure’s free service for bigger projects. The storage options offered by these platforms require marketers to consider their particular needs given varied strengths (see Fig. 1). To increase computing capacity, a marketer can always opt for the pay-to-use tiers based on their individual needs.
Two platforms provide a wide range of image analysis beyond image classification. Azure Custom Vision offers object detection to tag pre-trained objects. AWS Rekognition offers an object detection function and many others: facial comparison and search to analyze similarities of one face against another; face detection and analysis to identify a face and its related attributes (e.g., eyes, glasses, and facial hair); celebrity detection to recognize famous faces; labels from pre-trained models (e.g., brand logos, object labels); and text detection. Marketers with complex image analysis needs may find AWS the most versatile.

Video analysis

Understanding video analysis

Video is becoming increasingly prevalent in social media. Analyzing such content will be a significant aspect of marketing, both academic and managerial, in the near future. Furthermore, as consumer data is captured in increasing amounts, video content can aid in more sophisticated eye-tracking type tasks while in-store behavior can also be monitored. For example, affective states have been deciphered using eye movement tracking (Zhou et al., 2021) and variation from neutral emotion facial features (Lu et al., 2016). Kawaf (2019) proposed a screencast videography recording method to allow managers to understand the consumer digital experience. Linking video of behavior to actual sales can illustrate effective marketing choices.
Social media platforms offer new data sources, such as unboxing videos, where consumers and influencers offer opinions on products and packaging that can appear unfiltered and genuine. Such phenomena of the social media era offer valuable opportunities for greater understanding. As new trends emerge, those who quickly observe them, both academics and managers, will gain fresh insights from novel data sources. Capture of such data does raise public policy privacy concerns, which marketers must be at the forefront of addressing. Understanding the prevalence, and strengths and weaknesses, of methods can inform such work.
Video, including live streaming, poses practical challenges beyond still images. This is, perhaps, why it has received less attention. Analysis of video data starts with basic information including length, speaking rate, and video duration. Sentiment analyses can be applied to text transcripts accompanying videos (Zhou et al., 2021). The overarching objective of video classification is to determine actions within a scene. One approach is to convert the video into a massive dataset of static images. As such, many video classification tools employ the same basis as image classification and object detection e.g., AWS Rekognition. For example, Zhou et al. (2021) analyze online course videos by extracting one frame every ten seconds and applying a Microsoft image emotion recognition model. Other research compared facial expressions in selected frames to emotion categories (Lu et al., 2016). Facial tracking and smile classifier software has been used to code viewer response to advertisements, to evaluate their attractiveness and persuasiveness (Teixeira et al., 2014). Eye tracking technology has assisted marketing researchers to gain a better understanding of attention to advertisements (Brasel & Gips, 2008; Pieters & Wedel, 2012; Pieters et al., 2010).
Another approach is to analyze video holistically, such as, measuring variation between frames. For example, the work of Li et al. (2019) examined how stimulating a music video is using pixel-level distances between frames, to explain the success of pitches for funds. Others have separated foreground (speakers) from background video, applying algorithms, such as OpenCV (computer vision), to calculate motion, magnitude, and direction (Zhou et al., 2021).
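As a simple illustration of these frame-based approaches, the sketch below samples one frame every ten seconds from a video and computes pixel-level differences between consecutive sampled frames with OpenCV; the file name and sampling interval are hypothetical choices, not those of the cited studies.

```python
# Sketch: sampling frames from a video and measuring frame-to-frame variation
# (file name and the 10-second interval are illustrative choices).
import cv2
import numpy as np

video = cv2.VideoCapture("advertisement.mp4")
fps = video.get(cv2.CAP_PROP_FPS)
step = int(fps * 10) or 1     # one frame every ten seconds (fall back to every frame)

frames, index = [], 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % step == 0:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    index += 1
video.release()

# Mean absolute pixel difference between consecutive sampled frames:
# larger values suggest a more visually fast-changing ("stimulating") video.
distances = [np.mean(cv2.absdiff(a, b)) for a, b in zip(frames, frames[1:])]
print(f"Sampled {len(frames)} frames; mean inter-frame distance {np.mean(distances):.1f}")
```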
Table 2 summarizes the video analysis tools. Note that GCP offers custom video models, but AWS and Azure do not.

Using video for marketing tasks

Video analysis has great potential when applied to social media and consumer feedback through user generated content (UGC). Marketers could gain valuable consumer insights by asking questions about the content of social media videos. What do consumers share, what messages do they send within their videos, and how do these differ between and within groups? How consistent are the messages any consumer sends? The vast quantity of UGC gives marketers the opportunity to uncover implicit consumer sentiments reflected through video-based reviews, including YouTube review videos. Marketers can seek to understand (Liu et al., 2020a), and visualize (Dzyabura & Peres, 2021) how their brand is perceived.
Marketers can enhance their persuasiveness with a better understanding of consumers collectively and at the individual level. Evaluating their communications can aid in assessing the accuracy of their perceptions of consumers’ interests and preferences and enable marketers to adapt messages to their intended audience. With advanced facial recognition and analysis tools, marketers could access information on recipients’ real-time reactions to a transmitted message. For example, Teixeira et al. (2012) measured viewers’ real-time emotional responses and focus levels with eye movement tracking and automated facial expression detection. With a better understanding of the impact of its own marketing, a firm’s managers can leverage video to provide better training for their salespeople.
On the competitor side, video advertising can be accessed by all players in the market and consumer reaction is also publicly available online. This has the potential to fundamentally change the data advantage held by marketers compared to that held by their competitors. The ability to mine publicly available video and consumer reaction removes much of the privileged access that a marketer has over their ‘own’ data. As a result, all parties will be on a more equal footing, leading to a new competitive norm. Any shift in market power is not clear yet. A move towards the use of non-proprietary data may induce greater competitive intensity. Alternatively, interrogation of these new data sources might so heavily reward those who can more effectively use the insights that they find it a source of differentiation, and their unique capabilities could then limit the impact of competition.

Using the cloud platforms for video analysis

Amazon Rekognition Video detects objects, scenes, celebrities, text, activities, and inappropriate content from stored and streaming videos. AWS Rekognition does not allow users to build custom video models; rather, they utilize pre-built operations, which include label detection, content moderation, text detection, face detection, and face search. The two major ways to access the Rekognition API are through the AWS CLI (Command Line Interface) and through videos stored in Amazon S3 buckets. The GetLabelDetection operation from AWS Rekognition Video returns a JSON (JavaScript Object Notation) object, useful for data interchange, containing information about the labels created for objects observed. (Appendix 4 gives sample code).
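Complementing the sample code in Appendix 4, one plausible shape of this asynchronous flow with the boto3 SDK is sketched below; the bucket, file name, and confidence threshold are hypothetical placeholders.

```python
# Sketch: asynchronous label detection on a stored video with boto3
# (bucket and key are hypothetical placeholders).
import time
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

job = rekognition.start_label_detection(
    Video={"S3Object": {"Bucket": "my-video-bucket", "Name": "unboxing.mp4"}},
    MinConfidence=70,
)

# Poll until the asynchronous job completes (SNS notifications are the production alternative).
while True:
    result = rekognition.get_label_detection(JobId=job["JobId"])
    if result["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(10)

# Each detection carries a timestamp (milliseconds into the video), a label, and a confidence.
for detection in result.get("Labels", []):
    print(detection["Timestamp"], detection["Label"]["Name"], detection["Label"]["Confidence"])
```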
GCP Vertex AI Video has three categories. Video action recognition helps identify the human actions or activities within a video. Classification allows users to train a custom model to classify videos, or specific shots and segments. Video object tracking detects objects in a video. The API can be accessed through the GCP console or the command line. The first step in uploading videos is to sort them into folders named after their respective tags. This makes it easier to create the CSV file the console uses to import the videos. Videos can only be imported in the form of a CSV file through Google Cloud Storage. This involves creating a GCS bucket to house the videos and the CSV files to be imported. Do label the videos in the CSV files before importing them.
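Scripted rather than run through the console, the import and training steps might look roughly like this with the google-cloud-aiplatform SDK; the project, region, bucket, CSV path, and display names are hypothetical placeholders.

```python
# Sketch: importing labeled videos into Vertex AI and training a classifier
# (project, bucket, and CSV path are hypothetical placeholders).
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Each CSV row points to a video in GCS and its tag, e.g.:
#   gs://my-bucket/ads/video_001.mp4,exciting
dataset = aiplatform.VideoDataset.create(
    display_name="ad-videos",
    gcs_source="gs://my-bucket/ad_labels.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification,
)

job = aiplatform.AutoMLVideoTrainingJob(
    display_name="exciting-or-not",
    prediction_type="classification",
)
model = job.run(dataset=dataset)
print(model.resource_name)
```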
The Microsoft Azure Video Indexer for Media comes with models trained to extract audio and video data from videos. To start using Azure Video Indexer for Media, a Video Indexer account, Storage account, and Managed Identity need to be set up. The Azure Video Indexer for Media can be accessed at https://www.videoindexer.ai/. At this console, upload video files either individually or as a project (up to 10 videos at a time). After uploading the videos, the console will output insights for each video including speaker recognition, audio effects, keywords, labels, and sentiment. The pre-built model can also be slightly customized based on the language the video is in, brands, and people.

Cloud platform performance for video

Dataset

GCP is the only platform with a custom model offering for video, so we exclusively test that platform's capabilities. We analyze three publicly available datasets (see Table 6) to assess the detection of video content. The Fight Scenes data contains fighting and non-fighting actions. The other two datasets are randomly drawn subsets of the data from Hussain et al. (2017), excluding private and unavailable videos. The source dataset is of video advertisements labeled with persuasive strategies. In particular, these subsets capture whether a video is exciting or not, and funny or not, two commonly recognized advertising strategies. Using these categories, we can test the cloud services’ potential to understand a specific marketing activity: analyzing the persuasive elements in advertising messages. (We are grateful to an anonymous reviewer for this suggestion).
Table 6
Video datasets used to test the cloud platforms

| Dataset | # Tags | Tag Description | Videos | Source |
|---|---|---|---|---|
| Fight Scenes | 2 | Fight or not fight | 200 | — |
| Advertisement 1 | 2 | Exciting or non-exciting | 300 | Hussain et al. (2017) |
| Advertisement 2 | 2 | Funny or not-funny | 284 | Hussain et al. (2017) |

Video analysis results

As with the metrics used in image analysis, the overall results of the video analysis are measured by precision and recall. (Precision measures the model's accuracy among items classified positively; recall measures whether the algorithm correctly identifies the items that should be classified positively.) We assessed GCP’s custom model’s performance on the three video analysis tasks, Fight Scenes, Exciting, and Funny (see Table 7). We split training and testing across the three datasets 70:30, 80:20, and 80:20 respectively. For all three tasks, GCP accurately predicted the relationship between a video and its associated tag.
Table 7
Cloud platform video analysis summary

| Dataset | Platform | Overall Precision | Overall Recall | Tag Name | Confidence Threshold | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Fight Scenes | GCP | 97.6% | 100% | Fight | 8% | 100% | 100% |
| | | | | No Fight | 8% | 96.6% | 100% |
| Exciting | GCP | 68.9% | 70% | Exciting | 50% | 62.7% | 84% |
| | | | | Not Exciting | 50% | 75.8% | 50% |
| Funny | GCP | 72.4% | 73.7% | Funny | 50% | 73.7% | 73% |
| | | | | Not-Funny | 50% | 73.7% | 74% |
The overall precision and overall recall metrics range from 68.9% to 97.6% for precision and from 70% to 100% for recall, indicating that GCP's custom model can offer a reliable solution for video content analysis, particularly for clear-cut classification tasks.
The performance disparities among the datasets highlight the complexity of classifying sentiment-related content compared to more straightforward categorizations like Fight or No Fight. Specifically, the Fight Scenes dataset exhibits nearly perfect precision and recall rates, likely because distinguishing fight scenes from non-fight scenes is often a matter of recognizing certain explicit physical actions. These are more easily quantifiable and less open to interpretation, making the classification task more straightforward.
On the other hand, tags like Exciting and Funny involve more nuanced analysis, which can be highly subjective and culturally dependent. For the Exciting tag, a precision of 62.7% versus a recall of 84% suggests that the model is better at correctly identifying truly exciting instances (high recall) but tends to mislabel non-exciting instances as exciting (lower precision). In contrast, for the Not Exciting tag, the model shows higher precision (75.8%) but lower recall (50%), indicating that while the model is careful about labeling a video as not exciting, it fails to catch many actual not exciting cases. Similar nuances can be observed in the Funny dataset. The balanced but less-than-perfect precision and recall indicate that sentiment-based attributes, like humor, are inherently challenging to pin down algorithmically.
In summary, while the GCP model excels at categorizing more explicit content like Fight or No Fight, it faces challenges in capturing the subtleties involved in sentiment-related tags. The false positives and false negatives within these sentiment groups can provide valuable insights for model refinement and underscore the complexities of automated sentiment analysis in video content.

What platform to use?

As the only platform we considered with a custom model function for video, GCP Vertex AI is marketers' only choice among these offerings for training their own supervised machine learning model. The GCP Vertex AI API also provides important features including object detection and tracking, explicit content detection, logo recognition, text detection and extraction, and automated captions and subtitles. In its beta version, the Vertex AI API grants users access to celebrity recognition, face detection, and person detection features.
For the general public, AWS Rekognition is a more accessible source for face/person-related features. It provides face search, celebrity recognition, and person pathing features even for its free tier users, along with other features including label detection and content moderation. Microsoft Azure has a unique analysis capacity regarding audio content within the video, including speaker recognition, audio effect, keywords, and a sentiment process function.
The three platforms have different policies for their free tier services: AWS provides 10 training hours plus 4 inference hours per month, and so is suitable for marketers with monthly continuous needs. GCP and Microsoft Azure offer 40 training hours, enabling marketers to complete several intense tasks in a limited time frame. (See Table 8).
Table 8
Video analysis: Advantages and disadvantages

| Platform | Advantages | Disadvantages |
|---|---|---|
| AWS | More accessible for face-related features; free tier offering good for continuous use | Free tier offering not as good for limited intense tasks |
| GCP | Custom model availability; classification availability; free tier offering good for limited intense tasks | Free tier offering not as good for continuous use |
| Azure | Free tier offering good for limited intense tasks | Free tier offering not as good for continuous use |

Audio analysis

Understanding audio analysis

To analyze audio data, there are two different approaches: acoustic information analysis and transcribing. The information contained within the sound of audio data, acoustic information, provides marketers with opportunities to infer and understand the implicit messages communicated by the speaker. Researchers have used commercial software to perform this type of analysis, which can be an effective method for understanding real-world marketing applications, but it leaves the researcher reliant upon the software’s capabilities. For instance, Wang et al. (2021) measured focus, stress, and extreme emotion by analyzing variations in vocal tones. Lowe and Haws (2017) manipulated voice pitch and music in video ads using the digital sound engineering software Logic Pro, to test its effect on viewer perception of product size. Hwang et al. (2021) looked at loudness, pitch, and duration of speech in YouTube videos and linked this to whether they were sponsored. Research has also tackled the nature of voices. Chatbots can be effective at sales calls, but consumers may react negatively if the bot’s AI nature is revealed (Luo et al., 2019). In general, the commercial applications of AI used have yet to be systematically tested or compared, but researchers can assess validity through additional experiments, for example, to manipulate the attributes of voice (Wang et al., 2021).
Emotion can be detected in music (Fong et al., 2021). Researchers can find acoustic fingerprints including physical aspects (key, loudness, mode, tempo, and time signature) and listening experience (acousticness, danceability, energy, instrumentalness, liveness, speechiness, and valence) (Boughanmi & Ansari, 2021).
The second broad approach is transcribing the words in audio data before analyzing the content using text-based techniques. This is also a potentially beneficial method. Cloud platforms’ advanced features include speaker identification and live transcription, as well as solutions to technical challenges like isolating words from background noise. In addition to creating their own transcriptions, researchers can obtain existing transcriptions such as movie scripts (Toubia, 2021), video subtitles (Zhou et al., 2021), and YouTube transcripts (Liu et al., 2020b), which can then be analyzed.
Audio transcription has also been used to analyze verbal cues in communication captured in videos. For instance, text analysis of verbal communications recording consumer interaction has been used to study the effectiveness of salespersons’ query handling (Singh et al., 2020) and frontline problem-solving behaviors (Marinova et al., 2018).
Transcription relies upon speech recognition, and the three primary approaches to this are synchronous recognition, asynchronous recognition, and streaming recognition. Synchronous recognition performs a request on an audio file, typically with a maximum length of one minute. Asynchronous recognition allows multiple requests to be processed simultaneously, making it a more efficient method for handling large volumes of audio data. This approach can be particularly useful in situations where real-time transcription is unnecessary. Streaming recognition provides real time transcription of audio data as it is being recorded. This method is often used in situations where immediate analysis or action is required, such as in live transcription for closed captioning or real-time language translation.

Using audio data for marketing tasks

While our “Understanding audio analysis” section offers a balanced exploration of acoustic and transcription techniques, the emphasis on transcription in the coming sections is intentional. This is largely influenced by the cost-efficiency and accessibility of free-tier cloud services that have made transcription an especially attractive method for real-world marketing tasks. Yet, it is crucial to understand that audio analysis spans both dimensions: understanding explicit verbal cues through transcription and capturing implicit tones and emotions through acoustic tools. Together, they offer marketers holistic insight into consumer communication.
Historically, audio transcription was performed by humans, which was slow and prone to error. However, with the rapid advancement of automated text transcription, the process has become much more efficient and affordable. A major application is call center transcriptions, where automated transcription can help analyze customer needs, identify new marketing opportunities, as well as evaluate and improve agents’ performance. Again, a core benefit is the ability to scale.
For academics, automated transcription is simply more practical than research assistants. Larger datasets can be addressed, and the work done much more consistently than could be performed by humans. As for marketing managers, audio data analysis is of great significance in monitoring and adjusting the quality of customer service. It also offers great value to evaluate and suggest modifications to salespeople’s behaviors.
Audio transcription provides a vast amount of textual data that can be analyzed to gain valuable insights into the speech patterns of consumers. Combined with acoustic analysis tools that examine the tone and pitch of consumers’ speech, marketers may be able to detect consumer excitement, fears, and boredom and use this information to craft persuasive messages. However, marketers should always be aware of the potential for consumers to become aware of their tactics and develop countermeasures to combat them; a Red Queen scenario could arise. Where ethically appropriate, marketers can strive to remain ahead of the curve by continually refining their techniques and adapting to new technologies, data sources, and consumer behaviors.
One positive potential result for consumer welfare is that by analyzing audio posted to social media platforms, marketers can gain valuable information on consumer preferences and behaviors to serve them better. This approach is relatively less intrusive, as consumers presumably wanted to be listened to when they posted the audio. This method also allows for the scalability of data collection and analysis. Similarly, consumer reaction to the actions of competitors can be analyzed. By monitoring and analyzing competitors’ audio content, marketers can identify potential threats and opportunities in the market. This analysis can also help track changes in consumer sentiment towards competitors over time.
Internally, the firm can assess its call center operations to see what works and what does not, which is useful for employee training and performance evaluation. Automated voices can also be used in call center work, although their effectiveness remains a crucial question. Even so, the cost advantage of automated voices over live agents gives marketers a clear incentive to use them. It is therefore important to leverage audio data analysis to understand the characteristics of successful automated voice agents and the precise circumstances in which those characteristics are effective.

Using the cloud platforms for audio transcription

All three platforms allow for the analysis of audio data, as shown in Table 1. AWS Transcribe converts speech to text and, like the other platforms, provides a custom solution for call centers. Beyond the free tier, AWS Transcribe offers more complex products, such as building a custom language model that trains Transcribe's standard models with domain-specific text. To use AWS Transcribe, sign up for an AWS free tier account and open the Transcribe console, where audio files can be uploaded and transcriptions received directly. Alternatively, an API request can be sent to AWS Transcribe, which returns the result as a JSON response.
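As an illustration of the API route, the following sketch submits a transcription job through boto3 and polls for the JSON result. The job name, S3 URI, and region are placeholder assumptions, not values from our study.

```python
# Minimal sketch of an AWS Transcribe request via boto3 (placeholder names).
import time
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_transcription_job(
    TranscriptionJobName="podcast-001",
    Media={"MediaFileUri": "s3://my-bucket/podcast-001.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
)

# Poll until the job finishes, then print the URI of the JSON transcript.
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName="podcast-001")
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(15)

if status == "COMPLETED":
    print(job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"])
```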
GCP Speech to Text offers complex features such as model adaptation and multi-channel audio transcription. Beyond the free tier, opting into Data Logging reduces the cost of speech recognition with both standard and enhanced models. GCP also has a suite of prebuilt custom solutions. To start, create a project in the GCP console, upload audio files to a Google Cloud Storage (GCS) bucket, and then import the files from that bucket when submitting recognition requests.
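A minimal sketch of this staging step, assuming the google-cloud-storage client and placeholder bucket and file names, is shown below; the resulting gs:// URI can then be passed to Speech to Text as in the earlier recognition sketch.

```python
# Minimal sketch: upload a local audio file to a GCS bucket (placeholder names).
from google.cloud import storage

client = storage.Client()  # assumes a project and credentials are configured
bucket = client.bucket("my-audio-bucket")

blob = bucket.blob("podcasts/podcast-001.wav")
blob.upload_from_filename("podcast-001.wav")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```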
Microsoft Azure's Video Indexer includes a standard Audio Indexer that provides rich metadata using predefined models. Azure Speech to Text transcribes audio in more than 85 languages and variants and also supports batch transcription operations. Customization options allow specific words to be added to a base vocabulary or a custom speech-to-text model to be built. Azure Speech to Text can be accessed through the console, where each audio file is uploaded individually for transcription.
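For completeness, the sketch below transcribes a single WAV file with the Azure Speech SDK for Python. The subscription key, region, and file name are placeholders, and longer files would typically use continuous recognition or batch transcription rather than a single recognize_once() call.

```python
# Minimal sketch: single-utterance transcription with the Azure Speech SDK.
# Key, region, and file name are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="eastus")
audio_config = speechsdk.audio.AudioConfig(filename="podcast-001.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)

# recognize_once() returns after the first utterance; longer files are usually
# handled with continuous recognition or the batch transcription service.
result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```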
We next consider the performance of the various platforms. This is a constantly evolving area, so the reader should consider our results as indicative of capabilities at a given point in time. We anticipate the abilities, and ease of use, of all platforms will only improve with time.

Cloud platform performance for audio transcription

Dataset

Free tiers only provide speech-to-text analysis, so we focused on transcription by the cloud platforms we discuss. We did not build a custom model but instead tested transcriptions of 3,021 podcast summaries already available online, which we collected in the GitHub repository associated with this paper. The MP3 format caused a problem for Azure, which only accepts WAV files; these files were converted to WAV at a 16,000 Hz sample rate, 16-bit resolution, and a single (mono) audio channel.
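The conversion can be scripted; the sketch below uses pydub (which relies on ffmpeg) with a placeholder file name and is one of several ways to produce the 16,000 Hz, 16-bit, mono WAV files Azure expects.

```python
# Minimal sketch of the MP3-to-WAV conversion described above
# (16,000 Hz, 16-bit, mono). Requires pydub with ffmpeg installed.
from pydub import AudioSegment

audio = AudioSegment.from_mp3("podcast-001.mp3")
audio = audio.set_frame_rate(16000).set_sample_width(2).set_channels(1)  # 2 bytes = 16-bit
audio.export("podcast-001.wav", format="wav")
```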

Audio transcription analysis results

Each platform was tasked with transcribing large audio files. The output from all the speech-to-text analyses includes the transcript, confidence levels, and alternative words. The format of the results differs because the cloud platforms report confidence in different ways; see Table 9.
Table 9
Podcast analysis results

Podcast | AWS: confidence on word | GCP (sample rate 48,000 Hz): confidence per ~1-min segment | Azure: overall confidence
1 | High: 1.0, Low: 0.255 | 0.96, 0.97, 0.97, 0.78, 0.93 | 0.83
2 | High: 1.0, Low: 0.3453 | 0.96, 0.93, 0.97, 0.87 | 0.87
3 | High: 1.0, Low: 0.3383 | 0.97, 0.9, 0.97, 0.97, 0.96 | 0.88
4 | High: 1.0, Low: 0.3514 | 0.98, 0.95, 0.97, 0.97 | 0.91
5 | High: 1.0, Low: 0.1937 | 0.97, 0.95, 0.97, 0.87 | 0.91
See the GitHub associated with this paper for the audio data used
AWS provided a transcript of the file with no overall confidence result but gave a confidence score for each word. (AWS's approach is useful for locating words that require human review for accuracy.) GCP gives a transcription, splits the audio file into smaller segments, and provides a confidence level for each segment. Azure provided an overall confidence level. All three platforms gave alternative transcriptions when unsure about words. Because it is the most consistently shared measure, we used the confidence level of the transcription to compare the cloud platforms. Our overall assessment was that all performed well.
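To illustrate how AWS's word-level scores can support human review, the sketch below flags low-confidence words in a Transcribe output file; the file name and the 0.5 threshold are illustrative assumptions.

```python
# Minimal sketch: flag low-confidence words in an AWS Transcribe JSON output
# for human review. File name and threshold are placeholders.
import json

with open("podcast-001-transcript.json") as f:
    output = json.load(f)

THRESHOLD = 0.5
flagged = [
    (item["alternatives"][0]["content"],
     float(item["alternatives"][0]["confidence"]),
     item["start_time"])
    for item in output["results"]["items"]
    if item["type"] == "pronunciation"               # skip punctuation items
    and float(item["alternatives"][0]["confidence"]) < THRESHOLD
]

for word, confidence, start in flagged:
    print(f"{start}s: '{word}' (confidence {confidence:.2f})")
```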

What platform to use?

For the free tier service, AWS offers 60 minutes of transcription per month for 12 months; GCP offers a total of 60 minutes; while Azure offers the greatest amount of free capacity at five audio hours per month. Aside from transcription features, AWS's other services focus on more specific commercial settings (e.g., custom solutions for call centers and medical transcription). For medical applications, Amazon Transcribe Medical can transcribe doctor notes or patient conversations, while GCP Speech to Text allows users to select a medical dictation or medical conversation model. GCP can increase domain-specific transcription accuracy with a selection of trained models. Being the only platform that allows users to upload and process multiple audio files simultaneously, Microsoft Azure also has the unique ability to perform speaker indexing, caption formatting, and sentiment analysis. Researchers with these analysis needs may find Microsoft Azure the most suitable choice (see Table 10).
Table 10
Audio analysis: Advantages and disadvantages

Platform | Advantages | Disadvantages
AWS | Use in specialized commercial settings | No textual analysis capacity
GCP | Medical applications are strong | Free tier offering limited; no textual analysis capacity
Azure | Free tier offering most extensive; multiple audio uploading; speaker indexing; caption formatting; sentiment analysis | Limited domain knowledge

Discussion and research agenda

Marketing researchers are recognizing the importance of unstructured non-text data and the potential it holds for marketing insights. It is hard to overemphasize the potential for insight hidden in plain sight. Key advantages of using beyond text data include the ability to capture implicit messages that text-based analysis misses and a marked increase in the scalability of marketing analysis. Thanks to cloud platforms, the entry costs of such research have been substantially reduced. Researchers can now investigate entire streams of unstructured data beyond text: mining the images consumers post, the videos persuaders use to influence consumers, the words consumers utter, and even how they say them. Academics and managers can look at the advertising that marketers create and the persuasive messages that are sent, and go further by seeking to understand the effectiveness of these. However, the ability to generate interesting and relevant marketing questions will remain a key constraint.
The challenge in this area is the sheer number of topics that academic researchers could address. In Table 11 we provide some suggestions, but these should by no means be considered an exhaustive list.
Table 11
Some relevant questions for use of AI in marketing
Data Type
Questions That Could Create Substantive Contribution
Image
Can the images consumers post on social media predict sales? Can images be used as a leading indicator by practicing marketers?
What else besides sales do the images predict, such as engagement? Do some images predict other metrics but not sales? Why? What does this tell us about consumer behavior?
Can image mining be used to construct competitive maps? What technical barriers would be faced? How could managers be persuaded of the value of these? Would such maps be of aid in market definition during anti-trust investigations?
Do these maps created using consumer posted images predict real-world switching behavior, e.g., closer competitors on the image map are more likely to experience switching between them?
Do some types of images work better than others? How can marketers stimulate the posting of specific image types?
Video
What methods can academics devise to measure the addictiveness of video media, a key consumer behavior question? What factors in video can explain the addictiveness of such media?
What are marketers doing to abet, or reduce, addiction? How can firms use these findings ethically? How can young consumers be protected in an effective yet cost-feasible manner?
How can consumers understand and manage their own reactions to addictive video? How can parents help their children?
Precisely what sort of video is most likely to lead to positive business results, e.g., sales?
As a practical matter, how can such insights be integrated with a creative process? Can academics use these insights to understand consumer behavior and provide theoretical explanations for any finding?
Audio
Can consumers’ vocal tones in call center recordings predict defection? If so, how can vocal tones be used by managers as an early warning system? As a practical matter, what actions can managers take given such early warnings?
What tasks would most benefit from transcription services?
How will the presence of the transcription services impact academic fields and market research? Will the datasets from transcriptions be made available to fellow academics for replication? If scholars extend other academics’ findings, how will this be viewed by journals?
What predictive power is lost when audio is transcribed? I.e., what can be predicted from vocal tones that is not predictable from the transcription of the words used? How can we measure loss of meaning through transcription?
What types of audio, and from what fields, e.g., sales calls versus in-person service, lose the most meaning when words are conveyed as transcriptions versus listened to?
What impact will transcription and translation services have for international business scholarship? How will their quality be assessed and reported?
Comparison
What is the relative power of the various beyond text areas? Does, for example, product placement in a video generate significantly more impact than placement in a static image? What theory can explain any difference?
Given differential costs between audio, image, and video marketing what are the most fruitful avenues for those with limited budgets? How does the effectiveness of beyond text media interact with product and industry type?
For what situations do the precise words used matter more than the accompanying video and vice versa? Does TV advertising rely more on soundtrack than an online video?
What works in a static image but not in a video and vice versa? Why?
Integrated Data Sources
How can the training of sales forces and other client facing staff be aided through use of video and audio analysis? How can such training recognize customer heterogeneity and avoid a one-size-fits-all approach?
How would the use of the platforms in B2B differ from traditional consumer facing operations? How could a manufacturer use beyond text insights to persuade a retailer to carry their products?
What constraints should be put on the marketer’s use of consumer data to protect the privacy of the consumer? How do, or even should, privacy concerns differ between text and beyond text data and within the various beyond text categories?
How can marketers help protect the privacy of third parties (background characters) in beyond text data? What actions should regulators take?
What is the value of creative? If structure can be created to describe creative, how can we use that to predict market outcomes? What is the difference between the outcomes we observe using various types of creative?
Can we create a method to measure the broader idea of creativity in marketing using beyond text techniques? Can we create measures of creativity? Do these relate to meaningful commercial outcomes? The Marketing Accountability Standards Board (MASB) have launched a Measuring Creativity Initiative (MASB, 2023). How can beyond text data help power such initiatives?
With just a modest re-tooling, the researcher can shift from a state of data shortage to one of abundance. Of course, challenges remain. Not least, researchers must still uncover the insights hidden within the data they find. Nevertheless, it is an exciting time to be a marketing researcher.

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Supplementary Information

Below is the link to the electronic supplementary material.
Literature
Balducci, B., & Marinova, D. (2018). Unstructured data in marketing. Journal of the Academy of Marketing Science, 46(4), 557–590.
Boughanmi, K., & Ansari, A. (2021). Dynamics of musical success: A machine learning approach for multimedia data fusion. Journal of Marketing Research, 58(6), 1034–1057.
Brasel, S. A., & Gips, J. (2008). Breaking through fast-forwarding: Brand information and visual attention. Journal of Marketing, 72(6), 31–48.
Brickman, G. A. (1980). Uses of voice-pitch analysis. Journal of Advertising Research, 20(2), 69–73.
Crowley, A. E. (1993). The two-dimensional impact of color on shopping. Marketing Letters, 4, 59–69.
Cui, T. H., Ghose, A., Halaburda, H., Iyengar, R., Koen Pauwels, S., Sriram, C. T., & Venkataraman, S. (2021). Informational challenges in omnichannel marketing: Remedies and future research. Journal of Marketing, 85(1), 103–120.
Du, R. Y., Netzer, O., Schweidel, D. A., & Mitra, D. (2021). Capturing marketing information to fuel growth. Journal of Marketing, 85(1), 163–183.
Dzyabura, D., & Peres, R. (2021). Visual elicitation of brand perception. Journal of Marketing, 85(4), 44–66.
Fong, H., Kumar, V., & Sudhir, K. (2021). A theory-based interpretable deep learning architecture for music emotion. Available at SSRN 4025386.
Ghose, A., Ipeirotis, P. G., & Li, B. (2012). Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science, 31(3), 493–520.
Grewal, R., Gupta, S., & Hamilton, R. (2021). Marketing insights from multimedia data: Text, image, audio, and video. Journal of Marketing Research, 58(6), 1025–1033.
Hartmann, J., Huppertz, J., Schamp, C., & Heitmann, M. (2019). Comparing automated text classification methods. International Journal of Research in Marketing, 36(1), 20–38.
Hartmann, J., Heitmann, M., Schamp, C., & Netzer, O. (2021). The power of brand selfies. Journal of Marketing Research, 58(6), 1159–1177.
Hussain, Z., Zhang, M., Zhang, X., Ye, K., Thomas, C., Agha, Z., Ong, N., & Kovashka, A. (2017). Automatic understanding of image and video advertisements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1705–1715).
Hwang, S., Liu, X., & Srinivasan, K. (2021). Voice analytics of online influencers—Soft selling in branded videos. Available at SSRN 3773825.
Kawaf, F. (2019). Capturing digital experience: The method of screencast videography. International Journal of Research in Marketing, 36(2), 169–184.
Klaus, P. P., & Maklan, S. (2013). Towards a better measure of customer experience. International Journal of Market Research, 55(2), 227–246.
Klostermann, J., Plumeyer, A., Böger, D., & Decker, R. (2018). Extracting brand information from social networks: Integrating image, text, and social tagging data. International Journal of Research in Marketing, 35(4), 538–556.
Van Laer, T., Escalas, J. E., Ludwig, S., & Van Den Hende, E. A. (2019). What happens in Vegas stays on TripAdvisor? A theory and technique to understand narrativity in consumer reviews. Journal of Consumer Research, 46(2), 267–285.
Landwehr, J. R., Labroo, A. A., & Herrmann, A. (2011). Gut liking for the ordinary: Incorporating design fluency improves automobile sales forecasts. Marketing Science, 30(3), 416–429.
Landwehr, J. R., Wentzel, D., & Herrmann, A. (2013). Product design for the long run: Consumer responses to typical and atypical designs at different stages of exposure. Journal of Marketing, 77(5), 92–107.
Lee, J. K. (2021). Emotional expressions and brand status. Journal of Marketing Research, 58(6), 1178–1196.
Li, X., Shi, M., & Wang, X. S. (2019). Video mining: Measuring visual information using automatic methods. International Journal of Research in Marketing, 36(2), 216–231.
Li, Y., & Xie, Y. (2020). Is a picture worth a thousand words? An empirical study of image content and social media engagement. Journal of Marketing Research, 57(1), 1–19.
Lin, Y., Yao, D., & Chen, X. (2021). Happiness begets money: Emotion and engagement in live streaming. Journal of Marketing Research, 58(3), 417–438.
Liu, Y., Li, K. J., Chen, H., & Balachander, S. (2017). The effects of products’ aesthetic design on demand and marketing-mix effectiveness: The role of segment prototypicality and brand consistency. Journal of Marketing, 81(1), 83–102.
Liu, L., Dzyabura, D., & Mizik, N. (2020a). Visual listening in: Extracting brand image portrayed on social media. Marketing Science, 39(4), 669–686.
Liu, X., Susarla, A., & Padman, R. (2020b). Ask your doctor to prescribe a YouTube video: An augmented intelligence approach to assess understandability of YouTube videos for patient education. Available at SSRN 3711751.
Lowe, M. L., & Haws, K. L. (2017). Sounds big: The effects of acoustic pitch on product perceptions. Journal of Marketing Research, 54(2), 331–346.
Lu, S., Xiao, L., & Ding, M. (2016). A video-based automated recommender (VAR) system for garments. Marketing Science, 35(3), 484–510.
Luo, X., Tong, S., Fang, Z., & Qu, Z. (2019). Frontiers: Machines vs. humans: The impact of artificial intelligence chatbot disclosure on customer purchases. Marketing Science, 38(6), 937–947.
Marinova, D., Singh, S. K., & Singh, J. (2018). Frontline problem-solving effectiveness: A dynamic analysis of verbal and nonverbal cues. Journal of Marketing Research, 55(2), 178–192.
Miao, F., Kozlenkova, I. V., Wang, H., Xie, T., & Palmatier, R. W. (2022). An emerging theory of avatar marketing. Journal of Marketing, 86(1), 67–90.
Pieters, R., & Wedel, M. (2012). Ad gist: Ad communication in a single eye fixation. Marketing Science, 31(1), 59–73.
Pieters, R., Wedel, M., & Batra, R. (2010). The stopping power of advertising: Measures and effects of visual complexity. Journal of Marketing, 74(5), 48–60.
Satomura, T., Wedel, M., & Pieters, R. (2014). Copy alert: A method and metric to detect visual copycat brands. Journal of Marketing Research, 51(1), 1–13.
Singh, S. K., Marinova, D., & Singh, J. (2020). Business-to-business e-negotiations and influence tactics. Journal of Marketing, 84(2), 47–68.
Teixeira, T., Wedel, M., & Pieters, R. (2012). Emotion-induced engagement in internet video advertisements. Journal of Marketing Research, 49(2), 144–159.
Teixeira, T., Picard, R., & El Kaliouby, R. (2014). Why, when, and how much to entertain consumers in advertisements? A web-based facial tracking field study. Marketing Science, 33(6), 809–827.
Toubia, O. (2021). A Poisson factorization topic model for the study of creative documents (and their summaries). Journal of Marketing Research, 58(6), 1142–1158.
Villarroel Ordenes, F., Grewal, D., Ludwig, S., De Ruyter, K., Mahr, D., & Wetzels, M. (2019). Cutting through content clutter: How speech and image acts drive consumer sharing of social media brand messages. Journal of Consumer Research, 45(5), 988–1012.
Wang, X. S., Lu, S., Li, X., Khamitov, M., & Bendle, N. (2021). Audio mining: The role of vocal tone in persuasion. Journal of Consumer Research, 48(2), 189–211.
Wedel, M., & Kannan, P. K. (2016). Marketing analytics for data-rich environments. Journal of Marketing, 80(6), 97–121.
Xiao, L., & Ding, M. (2014). Just the faces: Exploring the effects of facial features in print advertising. Marketing Science, 33(3), 338–352.
Zhang, M., & Luo, L. (2022). Can consumer-posted photos serve as a leading indicator of restaurant survival? Evidence from Yelp. Management Science, 69(1), 5–50.
Zhou, M., Chen, G. H., Ferreira, P., & Smith, M. D. (2021). Consumer behavior in the online classroom: Using video analytics and machine learning to understand the consumption of video courseware. Journal of Marketing Research, 58(6), 1079–1100.