01.12.2019 | Regular article | Issue 1/2019 | Open Access

# Understanding news outlets’ audience-targeting patterns

Journal:
EPJ Data Science > Issue 1/2019
Authors:
Erick Elejalde, Leo Ferres, Rossano Schifanella
Important notes

## Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## 1 Introduction

The mass media is one of the social forces with the strongest transformative power. However, news reaches people unequally. According to Herman and Chomsky’s Propaganda Model (PM) [1], many factors shape the distribution and influence of news media coverage. Among the most important are the geographic reach of newspapers (national versus regional newspapers), the direct targeting of specific sectors of the population, and the political ideology of the media outlet itself. The PM states that each linguistic account of an event must pass through five filters that define what is newsworthy. One such filter, the advertising filter (the second in the PM), predicts that outlets will try to cater to a target demographic’s expectations rather than treat what is news fairly. For instance, some advertisers will prefer outlets whose target audiences have high purchasing power, marginalizing working-class audiences; others will favor outlets of certain political colors, declining to do business with outlets perceived as ideological enemies or as unfavorable to their private interests.
In [2], Prat and Strömberg provide another model (henceforth PS) that seems to better define the same concept of a media system driven entirely by profit maximization (PM’s second filter). In that work, the authors identify which characteristics an audience needs to have for an outlet to create content that is relevant to it (Proposition 4 in [2]). To make Proposition 4 more concrete, Prat and Strömberg suggest that three main factors influence the mass media coverage of an event, namely whether: (1) the matter is of interest to a large group of people (a group may be characterized by a political stand, geographic location, ethnicity, etc.), (2) it has significant advertising potential (e.g., it may attract readers with higher purchasing power), (3) it is newsworthy to a group within easy reach (i.e., it is cheap to distribute news to that group). Thus, in a media system driven by profit, areas of low population density, minorities, and low-income classes will be relatively under-served and underrepresented in mainstream news coverage. This creates a self-reinforcing loop that ends up neglecting (policy-wise) some of the most vulnerable segments of society just because they have limited access to the media.
To date, there has been comparatively little large-scale, quantitative research on the relationship between the quality and diversity of the content generated by the media and the socioeconomic indices of a particular area of coverage. In this paper, we investigate whether an outlet’s coverage deviates from purely geographic influence toward a more sophisticated behavior that involves, for example, the weight of political and socioeconomic interests, as operationalized by Prat and Strömberg. Using Chilean social media data, we examine the degree to which different geographic locations in the same region are covered by existing news outlets. We quantify how much of this coverage can be explained by natural geographic targeting (e.g., local newspapers will give more importance to local news), and how much can be attributed to the political and socioeconomic profile of the areas they serve. To find the coverage effects predicted by the PM and PS models, we look for empirical evidence in the massive adoption of social networks. More specifically, we use statistical models that show how much of the distribution of Twitter followers can be explained by the geographic, political and socioeconomic features of the different areas.

## 2 Background

In this section we give a short account of the geographic, socioeconomic, and political targeting strategies. We use the concepts laid out in Herman and Chomsky’s Propaganda Model, and their operationalization in Prat and Strömberg’s model, to identify them.

### 2.1 Geographic targeting

Newspapers work on an economy of scale with a considerable first-copy cost: news outlets tend to cover stories that reporters can reach quickly and easily (again, to minimize the cost of the piece of news). According to Zipf’s Gravity Model [3, 4], the interest in or relevance of a story should drop as we move farther away from its source. Hence, an outlet’s followers should be located predominantly in populations that are closer to it, and the size of the population at a particular place should influence how newspapers cover events originating in that area. In fact, distance and population size are also essential magnitudes to describe profit in the model proposed by Prat and Strömberg (Equation 5 in [2]). In their coverage, news outlets will favor issues that may draw the attention of larger groups (e.g., big cities) and places where it is cheaper to deliver the news (e.g., at a shorter distance).
Using the Gravity Model, we predict the flow of information in the news media system in terms of geography and hence, indirectly, the proportion of a given news outlet’s followers that a target area will have.

### 2.2 Socioeconomic targeting

Another factor that influences news coverage is the socioeconomic profile of an area. As mentioned earlier, one strategy that news outlets could implement to increase advertising revenue is to target sectors of the population with higher purchasing power. Herman and Chomsky point out in the second filter of the PM that advertising, being a fundamental source of income for news outlets, plays an important role in maintaining the hegemony of the top news companies in the free market. News outlets that can secure good advertising contracts can afford lower sale prices and become more competitive. This business model breaks the natural market rules that give the final buyer’s choice the power to decide. In this case, the advertisers’ contracts have a significant impact on the growth of the media, or even on their survival. So, outlets are forced to comply and demonstrate to the advertiser how their content may serve its needs. The audience of a newspaper becomes its product, which can then be “sold” to the sponsors.
The second filter of the PM is in line with the predictions of Proposition 4(b) in [2]. This filter suggests that, in their efforts to align their content with the advertisers’ interests, the media have shifted to lighter and less controversial programming (e.g., lifestyle, fashion, sports, etc.) [5, 6]. In [7], the author presents some evidence of this, showing, for example, the media’s preference for “soft news” content, which advertisers favor because it targets a demographic of women and young people. A more recent and direct example of how advertisers may influence the content of the media is the evolution from product placements to Native Ads [8], which make it difficult for the reader to differentiate between news and advertisement. This type of pseudo-content provides a significant part of the outlets’ revenue [9].
Being able to detect this kind of behavior in the media is of utmost importance. For example, a socioeconomic bias in the media system can be very damaging as it may exacerbate the gap between rich and poor areas. A population with limited access to the news is less informed and, consequently, less likely to hold authorities responsible for public expenditure and providing broad public welfare [10, 11]. In turn, this motivates the incumbent to prioritize and divert resources to places where they will receive more media coverage and not necessarily where they are most needed. According to Chomsky and Herman [1], these characteristics make the news media system comparable to a political scheme where votes are weighted by income.

### 2.3 Political targeting

Political bias is probably the most studied type of bias in the mass media [12–15]. In previous work [16] (and references therein), we analyze the nature of bias through a political quiz. Our study shows that even political bias could have some economic factors. Extra evidence of this is given in [13]. The authors estimate the bias in newspapers according to how similar their language is to that used by congressmen for whom a right/left stand is known. They do not find a direct relationship between the “slant” of a newspaper and the political preference of the owners (cf. our own work on the topic [17]). Instead, bias in the news is found to be more correlated with the political inclinations of the readers, showing a tendency of these news outlets to align themselves with the political preferences of their target audience and hence maximize sales profits. We think that this is an important result because, although outlets may seem to take a political stand in their editorial line, evidence suggests that this may be another strategy to generate revenue by targeting a specific group of people. For example, governmental offices at various levels assign a considerable part of their budgets to advertising. Newspapers sympathetic to the government’s policies may benefit from lucrative advertising contracts with the incumbent.
Thus, discrimination among outlets can also be influenced by political reasons, with advertisers declining to do business with media that are perceived as ideological enemies or generally unfavorable to their interests. Likewise, other authors have focused on identifying the political bias of newspapers based on their audiences [18, 19]. They infer the political leaning of an outlet from the stand of its readers. These results complement our research, as they also show a tendency of the outlets to tailor content to a specific audience.

### 2.4 Online news dynamics, coverage, bias

Several studies indicate that online news distribution and consumption can be subject to considerable bias, for example through the so-called Filter Bubble effect [20] and the prevalent tendency towards homophilic connections [19, 21] in online social networks.
In an interview with Mullen [22], Herman and Chomsky express their confidence in the applicability of the PM to forms of media other than newspapers, especially the Internet, where traditional news outlets compete with new digital media and advertising is more relevant than ever before. If anything, the “old” news industry has evolved and has adapted to the new environment.
More recently, Robinson [23] evaluates whether the introduction of new communication technologies and the Internet has affected the influence of the economic structure over the news media system. He argues that even though there has been a shift from printed news to digital media (newspaper website audiences grew by 7.4% in 2012 [24]), the news cycle is still controlled by the big news corporations. The three most significant outlets in the U.S. (i.e., Wall Street Journal, USA Today and New York Times) had a circulation of over 5 million users (which includes digital subscribers), each of them counting at least one order of magnitude more readers than the next closest competitor [24]. With regard to advertising on the Internet, the author concludes that news outlets have had to rely even more heavily on this source of income: online subscription revenue does not cover the previous earnings made from selling print newspapers. So, the content has become more profit-driven, with a shift to soft news and corporate-friendly reports [24].
More specifically, in [28], the authors analyze the dynamics of news and journalism on the Twitter platform. They found that 0.8% of the tweets are news media related, which gives an idea of how significant news media are to Twitter. They also report, confirming the results of Robinson [23], that the traditional notions of gatekeeping and news production have not changed, and that large news organizations still control what is newsworthy. Moreover, they show that news entities do not use social media to engage with their audience but rather as a channel for content dissemination (mostly by redirecting users to their own websites). They also find a difference in interest (topic-wise) between Twitter users and news outlets. This difference of focus gives some evidence of the agenda-setting behavior of the news industry, reinforcing the hypothesis of a profit-driven system instead of an informative one.
In order to find out the extent of geographic, socioeconomic and political effects in audience-targeting patterns, we examine a whole country’s media system in terms of its audience, that is, the users who “follow” those newspapers. Before moving on, a word should be said about the notion of “audience”. There are two alternative definitions of “audience” when using Twitter’s Application Programming Interface (API): one based on Followers and one based on Retweeters. Both approaches have pros and cons. Adopting Retweeters is perhaps a “stronger” definition, since retweeting is an overt act of “endorsement” of the media. However, it characterizes only the active portion of an outlet’s audience and ignores passive users, whose behavior we cannot capture using the regular Twitter API. Using Followers instead provides a “weaker” characterization, since we cannot be sure that all followers read the tweets of their subscriptions. However, the information produced by the news outlet still reaches them. We decided to be less restrictive and rely on the weaker definition, as we think it better represents the general audience.

## 3 Data

We use the following data sources in the studies reported here: a manually curated dataset of news outlets’ Twitter handles, all the followers of those outlets and all the followers of the Chilean national soccer team, and several sources of demographic and political information (the Chilean 2012 census, the National Socio-Economic Characterization Survey (CASEN), and the election information from the local elections authority (Servel)). We now describe each of them in turn:
News outlets
Our database contains 403 active accounts. To build it, we used Poderopedia’s “influence” database [29] and Wikipedia [30] as our baseline, manually adding other news outlets in Chile. An account is considered active if it tweets at least once a month. We enriched the profile of each outlet by adding relevant information such as geographic location (see Sect. 4.2), scope, Twitter account, and number of Twitter followers.
Twitter’s API gives automatic access to the flow of tweets and allows querying the system for user profiles, followers and tweeting history. This data availability makes it possible to explore the behavior and interactions of personal and institutional accounts, developing and testing social theories at a scale that was unfeasible a few years ago. This is the closest thing we have to a record of the everyday life of over 300 million people (Twitter reported 328 million monthly active users in the first quarter of 2017 [31]).
Further, we exploit the fact that Chile ranks among the top-10 countries regarding the average number of Twitter users per 1000 individuals [32]. Because of the massive adoption of Twitter and the strong presence of the news media on the social networks, we use the Twitter followers as a proxy for the audience of a news outlet.
Our Twitter dataset consists of the user profile of each follower of the outlets in our database. In total, the 403 news outlets have 39,020,390 followers. Since some users follow more than one news outlet account, we collected only the 4,943,351 unique user profiles using the Twitter API. The user profiles were collected in April 2017. We geolocate each user in a commune using the information in the location field of the personal profile (see Sect. 4.1 for more details).
Socioeconomic and political information
We obtained the total population of each commune and other demographic indices from the National Institute of Statistics (INE) [33]. We decided to use the commune as our location unit because it is the smallest political division in Chile and, at the same time, big enough to have both a statistical and a popularly perceived socioeconomic profile at the population level. The demographic indices from the INE were already aggregated by commune.
We also needed information on the socioeconomic development of each zone. This kind of information is harder to obtain. The most reliable source is the national census. The problem is that censuses are very expensive and are therefore performed very infrequently (sometimes more than a decade apart; the last completed valid census in Chile was performed in 2002). Instead, we use the CASEN Survey from 2013 [34]. This study is conducted by the Ministry of Social Development in Chile. From the CASEN survey, we obtain the socioeconomic indicators at the level of a commune, using the available expansion factor to calculate the weighted average income per household.
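As an illustration of the weighting step just described, the following minimal sketch computes an expansion-weighted mean household income per commune. The record layout `(commune, income, expansion_factor)` and the numbers are hypothetical and do not reflect the actual CASEN file format.

```python
from collections import defaultdict

def weighted_income_by_commune(households):
    """Return the expansion-weighted mean household income per commune.

    `households` is an iterable of (commune, income, expansion_factor)
    tuples. This layout is an assumption of the sketch, not the CASEN format.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for commune, income, factor in households:
        weighted_sum[commune] += income * factor
        weight_total[commune] += factor
    # Skip communes whose weights sum to zero to avoid dividing by zero.
    return {
        commune: weighted_sum[commune] / weight_total[commune]
        for commune in weighted_sum
        if weight_total[commune] > 0
    }

# Toy survey records (made-up values, for illustration only).
sample = [
    ("A", 1000.0, 10),
    ("A", 3000.0, 30),
    ("B", 500.0, 5),
]
incomes = weighted_income_by_commune(sample)
```

Each surveyed household counts as many times as its expansion factor indicates, so the result estimates the mean over all households in the commune rather than over the surveyed ones only.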
Our last dimension has to do with the political leaning of the communes. To measure the political tendencies of each geographical area, we use the results from the presidential election. The Chilean Electoral Service [35] provides detailed information district-wise on the Chilean presidential elections since 1989 (that is, since Chile’s return to democracy after the dictatorship of Augusto Pinochet).
To help understand the collected data, Fig. 1 shows the geographic distribution of each dimension for the most populated region of the country (i.e., Santiago). Note that most of the news outlets are located in the city center (Fig. 1(b)) and are surrounded by very densely populated areas (Fig. 1(a)). Also, Fig. 1(c) shows a difference in income level between communes from East to West (which coincides with the popular perception). Finally, more populated and urbanized areas show a right-leaning political predominance, while areas farther from the city center, which are mostly rural, show the opposite tendency (see Fig. 1(d)).

## 4 Methods

### 4.1 Geolocation of the followers

Regarding the location of the Twitter followers, there is an extensive body of work that focuses on geo-tagging Twitter users [36–38]. Most of this work can be divided into two groups according to approach: content-based and network-based. Content-based methods can be further subdivided into those that use a gazetteer [39], as in our case, to find direct references to geographic places, and those based on Language Models that try to learn a probabilistic text model [40]. The performance of the former depends heavily on the quality of the dictionary used. The latter may achieve high precision for the geo-localization of users at a country level, or even within country regions or cities [41, 42]. However, to perform well at a finer-grained classification, such as the commune/neighborhood level, massive corpora of social media annotations are required [38]. On the other hand, the geo-localization of users based on their network (on the assumption that users are more likely to interact with other users who are geographically closer to them) is more accurate at a finer level [43, 44]. The problem is that crawling the connections of several million users and dealing with the corresponding graph is time consuming and computationally intensive.
In this paper, we decided to test our hypothesis using only the users that we were able to geolocate based on their profile’s location field. We use these as a sample of the population. The cumulative number of followers per commune in our sample is highly correlated with the estimated population distribution for 2017 [33] ($$r =0.61$$, $$p<0.01$$). So, we use this information to model the coverage of the news outlets in our database.
To find the accounts that follow more than one outlet, we use the identifier from each user’s profile on Twitter. We use only those profiles that have a non-empty location field, which brings our list down to 1,579,068 accounts (31% of the initial amount).
In [45] the authors analyze the nature of the location field in the Twitter profile. Given that this is an open text field, users do not always enter valid (or even geographic) information. So, this data must be pre-processed before any study that geolocates users based on this field.
In our remaining 31%, some of the users have a set of Global Positioning System (GPS) coordinates, and others have a text description of their location. Since the text description is free text entered by the user, it ranges from an exact postal address to unrelated text (e.g., “The milky way”). Using a gazetteer, we could extract 996,326 users with a recognizable location, which represents 20% of the initial amount. We tried to assign each user to a commune with a given level of confidence. For the users with a pair of GPS coordinates, we used a shapefile [46] of the communes of Chile to find the one that enclosed the point. Only 4829 of the users had GPS coordinates. The users with a text description making explicit mention of a commune were assigned to that commune. Those who mentioned only a province or a region could be allocated to the capital city/commune of that region. Given that these cities have the highest population density in the area, this guess maximizes the chance of being correct. Nevertheless, we chose to work only with users for whom we have high confidence in their location, namely those with GPS coordinates or an explicit mention of a commune. Thus, our final list contains 602,810 users, which is over 12% of the total number of unique followers (Table 1 summarizes the follower statistics).
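The text-based part of this assignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the gazetteer or matching rules actually used: the tiny `COMMUNES` and `REGION_CAPITAL` lookups, the substring matching, and the `"high"`/`"low"` labels are all hypothetical, and the GPS/shapefile step is omitted.

```python
import unicodedata

def normalize(text):
    """Lowercase and strip accents so 'Concepción' matches 'concepcion'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()

# Hypothetical miniature gazetteer: commune names, and region -> capital commune.
COMMUNES = {"san miguel", "lo prado", "concepcion"}
REGION_CAPITAL = {"biobio": "concepcion"}

def locate(location_text):
    """Return (commune, confidence), or (None, None) for unusable text."""
    text = normalize(location_text)
    for commune in COMMUNES:
        if commune in text:
            return commune, "high"  # explicit mention of a commune
    for region, capital in REGION_CAPITAL.items():
        if region in text:
            return capital, "low"   # only a region: fall back to its capital
    return None, None

def high_confidence_only(profiles):
    """Keep {user_id: commune} for users located with high confidence."""
    located = {}
    for user_id, location_text in profiles:
        commune, confidence = locate(location_text)
        if confidence == "high":
            located[user_id] = commune
    return located

users = [(1, "San Miguel, Chile"), (2, "Región del Biobío"), (3, "The milky way")]
kept = high_confidence_only(users)
```

Only the first user survives the filter: the second is resolved to a regional capital (a lower-confidence guess) and the third has no recognizable location.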
Table 1
Summary of news outlets and football players followers on Twitter

| | # Outlets’ followers | # Players’ followers |
|---|---|---|
| Unique users | 4,943,351 (100%) | 6,568,769 (100%) |
| Users w/ non-empty location | 1,579,068 (31%) | 2,434,183 (37%) |
| Users w/ useful location | 996,326 (20%) | 540,828 (8%) |
| Users w/ GPS coord | 4829 (0.1%) | 2041 (0.03%) |
| Users w/ high confidence location | 597,981 (12%) | 383,207 (6%) |

### 4.2 Geographic targeting

We use the Gravity Model to identify how much of news coverage can be explained by the geographic factors of distance and population. For this we use the population and location of both the source of the medium and the target area. Equation (1) represents this relation.
$$F_{i}^{j} = \frac{P_{i} \cdot P_{j}}{D_{ij}}. \tag{1}$$
Here $$P_{i}$$ is the population of commune i, and $$P_{j}$$ is the population of the commune in which outlet j is located. $$D_{ij}$$ represents the distance between the two communes. Then, $$F_{i}^{j}$$ gives a value that represents the expected number of followers that outlet j will have in commune i. We run the model for each news outlet to analyze the geographic targeting behavior of different types of media.
For this study, we manually located each news outlet in its source commune. The location is determined by the intended audience if the name of the commune is in the name of the outlet (e.g., soyConcepcion is assigned to Concepcion city), or otherwise by the location of its headquarters. At the intra-country level, big news media companies may have more than one headquarters; in most cases, however, these either work under a different (more “local”) name or report directly to the central headquarters, which ultimately defines the editorial line. For example, Soy Concepcion is owned by the El Mercurio Group, which also owns one of the largest newspapers of the capital region (also called El Mercurio).
Finally, for every pair of communes we use their estimated populations (obtained from the INE [33]) and GPS coordinates. We calculate the direct distance between them using the Haversine formula. We represent each outlet j as a vector $$F^{j}$$. The elements of $$F^{j}$$ are the predicted proportion of followers in each commune for outlet j obtained from the Gravity Model. We also create a vector $$T^{j}$$ for each outlet with the number of Twitter followers in each commune i obtained from our ground truth (see Sect. 4.1). Using the two vectors that represent each outlet, we calculate the Pearson product-moment correlation coefficient. This coefficient gives us, for each news outlet, an idea of how much of the distribution of readers can be attributed to the geographic dimension.
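The pipeline of this subsection (Haversine distance, Gravity Model score, Pearson correlation against observed followers) can be sketched in a few lines. The commune data and follower counts are made up, and `min_distance_km` is an assumption of this sketch to avoid a zero distance for the outlet’s own commune; the text does not state how that case is handled.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def gravity_vector(source, communes, min_distance_km=1.0):
    """Return {commune i: P_i * P_j / D_ij} for an outlet j located in `source`.

    `communes` maps a name to (population, lat, lon). `min_distance_km` is a
    hypothetical floor that avoids dividing by zero for the source commune.
    """
    pop_j, lat_j, lon_j = communes[source]
    scores = {}
    for name, (pop_i, lat_i, lon_i) in communes.items():
        distance = max(haversine_km(lat_i, lon_i, lat_j, lon_j), min_distance_km)
        scores[name] = pop_i * pop_j / distance
    return scores

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy communes on the equator (made-up values, for illustration only).
toy = {"A": (100, 0.0, 0.0), "B": (50, 0.0, 1.0), "C": (50, 0.0, 2.0)}
predicted = gravity_vector("A", toy)          # plays the role of F^j
observed = {"A": 900, "B": 40, "C": 25}       # hypothetical follower counts T^j
names = sorted(toy)
rho = pearson([predicted[n] for n in names], [observed[n] for n in names])
```

In this toy setting, commune B is half as far from the source as commune C and has the same population, so its predicted score is twice as large.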

### 4.3 Audience-targeting model

According to the PM and PS models, direct targeting of specific sectors of the population shapes the distribution of news. If motivated by a profit-driven model of the media system, this targeting may be based on socioeconomic and/or political characteristics of the intended audience, instead of only geography.
We use a regression model to study the influence of geographic, socioeconomic and political characteristics of the communes that may attract profit-driven media coverage. We try to predict the ranking of communes for each outlet based on the number of followers from each commune.
As the geography feature, we use the distance from the commune to the news source rather than the population, given that the actual population is closely related to our target variable (a function of the number of followers).
We take the socioeconomic feature of an area to be its expected household income, using the CASEN survey. Although this may seem simplistic, Chile is quite segregated, and income has a very strong correlation with every other socioeconomic index, especially education, since it is not free. Thus, we used income as a proxy, without any intended loss of generality.
Finally, for the political feature, we use the right/left leaning of the commune (see Sect. 3). To calculate the political factor per commune, we first aggregated the raw number of votes received by each party in each commune in the past three elections (i.e., 2013, 2009, 2005). We manually annotated each political party as left-wing, right-wing or centrist according to its self-declared position. Political parallelism in the media system is seen when media outlets are popularly perceived as leaning to one broad side of the political spectrum (not necessarily linked to a political party but rather to a political range) [47]. So, we aggregated the votes for all parties that have a similar political ideology. With this, we measure how left-leaning or right-leaning a commune is.
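One possible way to turn the aggregated votes into a single leaning score is sketched below. The exact formula is not given in the text, so the score used here (right share minus left share of all votes) is an assumption, as are the party names, the `PARTY_BLOC` annotation and the vote counts.

```python
from collections import defaultdict

# Hypothetical manual annotation of parties into broad blocs.
PARTY_BLOC = {"P1": "left", "P2": "right", "P3": "center"}

def political_leaning(votes):
    """Return {commune: score in [-1, 1]}; positive means right-leaning.

    `votes` is an iterable of (commune, election_year, party, n_votes).
    Votes are pooled over all elections before computing the score, which
    is an assumed definition: (right - left) / all votes.
    """
    totals = defaultdict(lambda: {"left": 0, "right": 0, "center": 0})
    for commune, _year, party, n_votes in votes:
        totals[commune][PARTY_BLOC[party]] += n_votes
    leaning = {}
    for commune, blocs in totals.items():
        all_votes = sum(blocs.values())
        if all_votes > 0:
            leaning[commune] = (blocs["right"] - blocs["left"]) / all_votes
    return leaning

# Toy results (made-up numbers, for illustration only).
toy_votes = [
    ("A", 2013, "P1", 60), ("A", 2009, "P2", 30), ("A", 2005, "P3", 10),
    ("B", 2013, "P2", 80), ("B", 2013, "P1", 20),
]
scores = political_leaning(toy_votes)
```

Pooling by bloc rather than by party follows the idea above that parallelism relates to a political range, not to an individual party.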
The data in all three dimensions was aggregated at the level of communes and normalized by calculating the z-score of each area on each feature. As the learning model, we use a random forest regressor [48] (implemented in the class RandomForestRegressor of the Python library scikit-learn). This estimator averages the predictions of an ensemble of decision trees. Such ensembles are less susceptible to overfitting than a single tree, which matters because our training sets are relatively small (for each newspaper we only have as many samples as communes with a valid entry).
We evaluated the model using random shuffle cross-validation that leaves 20% of the dataset for testing and trains the regressor on the remaining 80%. Each experiment is repeated 100 times, and the average score and standard deviation are reported. We measure the quality of the fit with the explained variance.
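The evaluation protocol can be sketched with scikit-learn as follows. This is an illustration under assumptions rather than the original experimental code: the forest uses library default hyperparameters (none are reported in the text), the data is synthetic, and the helper names `zscore` and `evaluate_outlet` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import ShuffleSplit

def zscore(columns):
    """Standardize each feature column to zero mean and unit variance."""
    return (columns - columns.mean(axis=0)) / columns.std(axis=0)

def evaluate_outlet(features, target, n_repeats=100, seed=0):
    """Mean and standard deviation of explained variance over random 80/20 splits."""
    X = zscore(np.asarray(features, dtype=float))
    y = np.asarray(target, dtype=float)
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=0.2, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        # Default hyperparameters: an assumption, since none are reported.
        model = RandomForestRegressor(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])
        scores.append(explained_variance_score(y[test_idx], predictions))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic stand-in for one outlet: 60 communes, 3 features
# (distance, income, political leaning), and a noisy target.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))
y_toy = 2.0 * X_toy[:, 0] - X_toy[:, 1] + rng.normal(scale=0.1, size=60)
mean_score, std_score = evaluate_outlet(X_toy, y_toy, n_repeats=5)
```

The usage example runs only 5 repetitions to stay fast; passing `n_repeats=100` matches the number of repetitions described above.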
We also measure the explanatory power of each individual dimension on the media coverage. For this, we calculate the Kendall-Tau (KT) correlation of the corresponding feature against the number of followers per commune for each news outlet. The results of these measurements should give information on the marketing strategy of different outlets.
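For reference, a rank correlation of this kind can be computed directly from pairwise comparisons. The sketch below implements the tau-b variant, which corrects for ties; the text does not say which variant was used, so that choice is an assumption, and the feature and follower values are made up.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both variables: ignored
            elif dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1     # the pair is ordered the same way in x and y
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denominator if denominator else float("nan")

# Hypothetical feature values (e.g., income z-scores) and follower counts.
income = [0.2, 1.5, -0.7, 0.9]
followers = [120, 800, 40, 300]
tau = kendall_tau_b(income, followers)
```

Because the statistic depends only on the ordering of the values, it is unaffected by the z-score normalization of the features.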
An alternative extra step is to normalize the data by the population of the communes. For example, more populated areas might have a different political leaning from less populated ones. Preliminary experiments that use the ratio of followers to population, instead of just the followers, to calculate the target ranking of the communes achieve similar levels of accuracy but a slightly different distribution of weight among the features used for the prediction. We think this is because the alternative experiment instead predicts the percentage of the population in the commune that follows the news outlets. This may exacerbate some bias in the selected sample (e.g., a certain socioeconomic level of the Twitter users). Moreover, the theoretical models presuppose that news outlets will try to reach a given audience independently of the percentage of the population that it represents, as long as it gives them an economic benefit (e.g., wealthy people may be a small percentage of the population, but still a desirable audience for the outlets). The alternative target variable presents an interesting hypothesis that we would like to explore in future work.

### 4.4 Validation of the results: Chilean national soccer team followers

To validate the results and ensure that they are peculiar to the online media ecosystem and not an artifact of the social media attention dynamics as a whole, we repeat the same experiment using a different topical dataset. In particular, we gather data on the Twitter followers of the soccer players who were part of the Chilean national football team in the “Copa America Centenario 2016” tournament. We expect the coverage variable in a sport-celebrity fans scenario to be influenced by different aspects and, consequently, the link to the investigated features to be weaker.
We downloaded the Twitter profiles of 6,568,769 unique users who follow at least one of the 21 players for whom we were able to find an official Twitter account. Of these, only 2,434,183 had a non-empty location field. We followed the same methodology for the geolocation of these users. We found 540,828 users that match a valid location in Chile. Of the valid users, only 2041 had a set of GPS coordinates, and 381,166 made explicit reference to a commune. This gave us a total of 383,207 unique followers that we were able to assign with high confidence to one of the 346 communes in Chile. This is comparable with the 602,810 users that we use as our sample of followers of the news outlets.
In this new dataset, the number of followers per commune is also strongly correlated with the actual population distribution of Chile ($$r =0.66$$, $$p<0.01$$). So, it is comparable in size to our newspapers-followers dataset (see Table 1) and it is a representative sample of the actual Chilean population.

## 5 Results

Our primary task in this work is to approximate the distribution of the audience of the online media based on the geopolitical and socioeconomic characteristics of an area. Multiple aspects of a given population can become factors in the news outlets’ audience-targeting strategy. To better understand the media coverage, we first study how this is correlated to geographic elements as predicted by the Gravity Model. To further improve our predictive and explanatory capability, in a second step, we fit a regression model that, besides the geographic feature, also takes into account the political leaning and income level of the communes in the most populated region of the country.

### 5.1 Gravity model

With the gravity model we want to characterize news attention as a function of population size and distance from the media source. As mentioned before, we take the source to be the commune where the headquarters of the news outlet is located, or we infer it from the name of the outlet.
We represent each outlet as a vector $$F^{j}$$ with the expected number of followers in each commune obtained from the Gravity Model. We also create a vector $$T^{j}$$ for each outlet with the actual number of followers in each commune i, obtained from our ground truth. Using the two vectors that represent each outlet, we calculate the Pearson correlation coefficient. Figure 2(a) shows the distribution of the correlation coefficients. The coverage bias of a large number of outlets can be almost entirely explained by geography alone: half of the outlets have a correlation above 0.7. However, there is also a substantial number of outlets for which the geographic bias explains little or none of the observed coverage.
Table 2 shows the characteristics of the news outlets with the lowest and highest correlation. Not surprisingly, the group that falls farthest from the predicted coverage is dominated by newspapers located in the capital city (i.e., Santiago) and with a national scope. These are expected to be the ones with the most prominent political and socioeconomic bias, given that they are the most influential and the ones that dominate news production. Their leading position also ensures that they receive the biggest share of advertisers’ investment. Hence, these outlets are the most exposed to external pressures. On the other hand, news outlets with a local scope behave as described by the Gravity Model, at least on average. Figures 2(b) and 2(c) show the distribution of the correlation for the outlets with a local and national scope, respectively. The figures illustrate the behavioral difference between these two classes.
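The comparison between $$F^{j}$$ and $$T^{j}$$ can be illustrated with a minimal sketch. It assumes a textbook gravity form in which the expected share of followers in a commune is proportional to its population divided by a power of its distance to the source; the exact functional form and exponent used in this work are defined in the methods section and may differ. The function names, the `exponent` parameter and all numbers below are hypothetical.

```python
import math


def gravity_expected(populations, distances_km, exponent=2.0):
    """Expected follower share per commune under an assumed gravity form.

    Each commune gets a weight population / distance**exponent, and the
    weights are normalised to sum to 1. Distances must be positive, so the
    source commune needs a non-zero internal distance (an assumption here).
    """
    weights = [p / (d ** exponent) for p, d in zip(populations, distances_km)]
    total = sum(weights)
    return [w / total for w in weights]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): four communes for a single outlet j.
populations = [100_000, 250_000, 50_000, 400_000]
distances_km = [2.0, 5.0, 10.0, 20.0]  # distance from the outlet's source
observed = [900, 400, 20, 60]          # T^j: observed followers per commune

expected = gravity_expected(populations, distances_km)  # F^j
rho = pearson(expected, observed)
print(f"correlation between F^j and T^j: {rho:.3f}")
```

Because the Pearson coefficient is invariant to rescaling, it does not matter whether $$F^{j}$$ is expressed as shares or as absolute follower counts.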

Table 2. Stats about the news outlets’ correlation coefficients

| Category | Total | ρ>0.7 | ρ<0.2 |
|---|---|---|---|
| Outlets | 402 | 203 | 141 |
| w/ national scope | 133 | 23 | 94 |
| w/ local scope | 269 | 180 | 47 |
| located in Santiago | 156 | 29 | 108 |
| w/ national scope & located in Santiago | 126 | 18 | 93 |

From the previous results, we can conclude that geographic bias is not enough to describe the nature of the news media. Consider, for example, the communes Lo Prado [49] and San Miguel [50]: they have a similar population and are situated at a similar distance from the center of Santiago, where a large number of news outlets (local and national) are located. If we take only the news outlets located in the center of Santiago, the average difference in the expected number of followers between the two communes according to the Gravity Model is just over 1%. In other words, based only on geographic factors these communes should be virtually indistinguishable. But if we look at the actual number of followers for the same set of outlets, the average difference is almost 250%, with an overwhelming dominance of followers from San Miguel. Moreover, a general query to the Twitter API for tweets geo-located near “Lo Prado, Chile” during August 2017 returns almost 100,000 unique users, while the same query for tweets near “San Miguel, Chile” returns only 61,165 unique users. Thus, the difference in news outlets’ followers cannot be attributed to a disparity in Twitter penetration. One possible factor behind this striking contrast is the gap in socioeconomic conditions and deprivation levels between the two communes. Lo Prado, despite being located within the capital city, is ranked among the ten poorest communes of Chile [51]. San Miguel, on the other hand, although predominantly residential, is also an important economic and industrial pole of the city. In fact, San Miguel ranks 40th out of 93 communes in the Index of Urban Quality of Life for Chile [51]. This contrast is consistent with the hypothesis of our theoretical models of the political economy of the mass media [1, 2], namely that the socioeconomic characteristics of an area can make its population more or less attractive to the media.

### 5.2 Focusing on the Santiago metropolitan region

Given that we are analyzing geographic coverage and its relation to socioeconomic and political factors, we have to take into account the specific characteristics of Chile. From every point of view, Chile is a heavily centralized country. The previous results detailed in Table 2 give evidence of this. According to a study from 2013 [52], in proportion to its size, population, and economic development, Chile is the most centralized country in Latin America. The data obtained from the INE [33] give a total estimated population of 17.9 million people for the entire country, of which 7.4 million (41%) are located in the Metropolitan Region (where the capital, Santiago, is located). Since this is also the smallest (in area) of the 15 regions that compose Chile, it is a very densely populated area. On its geographic and demographic characteristics alone, Santiago is already a desirable market for the media according to Proposition 4 of Prat and Strömberg [2]. On the political side, each region in Chile is headed by an Intendente (equiv. Mayor), who is appointed by, and answers directly to, the president. Moreover, members of the House of Representatives who legislate on behalf of the different districts of the country reside in Santiago. This organization concentrates almost all the political power in this one region. In the same way, according to the annual report published by the Central Bank of Chile for 2016 [53], the Metropolitan Region contributed 46% of the Gross Domestic Product (GDP), five times the next highest contribution. With power so heavily concentrated in all spheres, and based on our set of hypotheses, the capital of Chile meets all the conditions a population needs to receive extensive media coverage.
Consequently, based on our results for the Gravity Model, we focus on the community of outlets identified as the least influenced by the geographic bias. That is, we filtered our database to keep only those news outlets (local and national) with headquarters in the capital. The centralization of the Chilean population is also visible in our collection of followers: out of 15 regions, 36.9% of our geolocated followers are in the Metropolitan Region. To minimize the noise in our model, we decided to limit the study of the coverage to the communes in Santiago. With 51 communes and a wide range of socioeconomic conditions, the Metropolitan Region offers a good case study on its own. To further strengthen the signal, we also limited the analysis to the 25 news outlets with the highest number of followers.
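The selection step can be pictured with a short sketch. The record layout (`Outlet`, `hq_region`, `followers`), the helper `select_outlets` and the sample records are hypothetical and serve only to illustrate filtering by headquarters and keeping the most followed outlets; they do not reproduce the actual database.

```python
from dataclasses import dataclass


@dataclass
class Outlet:
    name: str
    hq_region: str   # region of the outlet's headquarters
    followers: int   # number of geolocated followers


def select_outlets(outlets, region, top_n):
    """Keep outlets headquartered in `region`, then the `top_n` by followers."""
    in_region = [o for o in outlets if o.hq_region == region]
    in_region.sort(key=lambda o: o.followers, reverse=True)
    return in_region[:top_n]


# Hypothetical records; names and counts are made up for illustration.
outlets = [
    Outlet("outlet_a", "Metropolitana", 120_000),
    Outlet("outlet_b", "Valparaíso", 300_000),
    Outlet("outlet_c", "Metropolitana", 45_000),
    Outlet("outlet_d", "Metropolitana", 500_000),
]

selected = select_outlets(outlets, region="Metropolitana", top_n=2)
print([o.name for o in selected])
```

In the setting described above, `top_n` would be 25 and the followers would additionally be restricted to those geolocated in the communes of the region.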

### 5.3 Modeling news outlets audience

To extend our model and study the influence of other factors, such as political and socioeconomic characteristics, on the distribution of news media followers, we use a regression model. As mentioned before, our target variable is, given a news outlet and a commune, the ranking position of that commune based on its number of followers for that outlet.
We include three features in our model: right-leaning, representing the political dimension; income, representing the socioeconomic dimension; and distance, representing the geographic dimension. Table 3 shows the Pearson correlation between our three features (using the filtered data). One thing to notice is the relatively high correlation between the expected income of an average household and the political leaning of the area where it is located. In Chile (and Latin America in general), right-conservative political parties are popularly associated with wealthy people. At least in the last few years, left-leaning parties have tended to be more populist.
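As an illustration of the target variable, the sketch below ranks the communes of a single outlet by follower count. The convention that rank 1 is the commune with the most followers, the alphabetical tie-break, the function name `rank_communes` and the counts are assumptions of this sketch.

```python
def rank_communes(followers_by_commune):
    """Map each commune to its rank (1 = most followers) for one outlet.

    Ties are broken alphabetically by commune name; this tie rule is an
    assumption of the sketch, not something stated in the text.
    """
    ordered = sorted(followers_by_commune.items(), key=lambda kv: (-kv[1], kv[0]))
    return {commune: pos for pos, (commune, _) in enumerate(ordered, start=1)}


# Hypothetical follower counts for a single outlet.
followers = {"Providencia": 5200, "San Miguel": 1800, "Lo Prado": 700, "Maipú": 1800}
ranks = rank_communes(followers)
print(ranks)
```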

Table 3. Correlation between features

| Feature | right-leaning | income | distance |
|---|---|---|---|
| right-leaning | 1.00 | 0.59 | −0.48 |
| income | 0.59 | 1.00 | −0.35 |
| distance | −0.48 | −0.35 | 1.00 |

Using these features, our trained model is able to represent the mass media behavior with high precision. The results of the regression indicate that the three predictors explain on average 96.3% ($$\mathit{SD} = 0.005$$) of the variance in cross-validation. Figure 3 shows the learning curve for the selected model.
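The notion of variance explained in cross-validation can be made concrete with a small sketch. This section does not name the regressor, so the sketch uses ordinary least squares purely as a stand-in, with contiguous folds and synthetic data. All function names, the coefficients and the noise level are hypothetical, and the resulting score is not comparable to the figure reported above.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Linear prediction: intercept plus weighted sum of the features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(y_true, y_pred):
    """Fraction of the variance of y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(rows, targets, k=5):
    """R^2 on each of k held-out contiguous folds (shuffle the data beforehand)."""
    n = len(rows)
    scores = []
    for fold in range(k):
        held = list(range(fold * n // k, (fold + 1) * n // k))
        train = [i for i in range(n) if i not in set(held)]
        coefs = fit_ols([rows[i] for i in train], [targets[i] for i in train])
        preds = [predict(coefs, rows[i]) for i in held]
        scores.append(r_squared([targets[i] for i in held], preds))
    return scores


# Synthetic data (hypothetical): features are (right_leaning, income, distance).
rng = random.Random(0)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(60)]
targets = [10 - 4 * r - 3 * i + 2 * d + rng.gauss(0, 0.2) for r, i, d in rows]

scores = cross_val_r2(rows, targets)
mean_r2 = sum(scores) / len(scores)
print(f"mean R^2 over {len(scores)} folds = {mean_r2:.3f}")
```

Reporting the mean and standard deviation of the per-fold scores corresponds to the "average (SD)" format used in the text.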
We were also interested in modeling the coverage behavior of each individual outlet to see how they fit with respect to these three dimensions. To do this, we used the selected features to create a regression model for each news outlet. This model is then used in the same way. That is, we predict their audience-based ranking of the communes in Santiago but using data related only to the selected outlet. The results, shown in Fig. 4, confirm that with the selected features, the regressors are able to approximate the distribution of followers ($$M = 0.82$$, $$\mathit{SD} = 0.03$$).
We also studied the ranking of the communes in relation to each feature. We used the KT correlations as an indication of how strong the influence of each factor is in the prediction. Figures 5(a), 5(b) and 5(c) show the distribution of the KT correlation for the top 25 news outlets in Santiago with respect to the communes’ political leaning, expected income and distance to the origin, respectively. Results are shown as absolute values because the direction of impact is not important for our model. For example, if a news outlet favors a commune because the area is right-leaning, for our model this is as telling as another news outlet disregarding the commune for the same reason. In both cases, the outlets are biased based on political factors. The results show that the behavior of news outlets is very similar in terms of the discriminating influence of these three dimensions on the news coverage, at least within this group.
In Fig. 6 we show a comparison of the KT correlation coefficient for all three dimensions for each of the top 25 news outlets. This comparison can be used as a characterization/profiling of each outlet’s coverage behavior. For example, the coverage of Radio Cooperativa (cooperativa) [54] seems to be driven by political and economic factors, with practically no attention to the location of the commune. This is a radio station with a national scope. According to a survey conducted in 2015, it is second in audience in the region of Santiago [55] and first among people with the highest income (last quantile). Moreover, its editorial line is “popularly perceived” to be associated with the Christian Democratic Party [16, 54]. Actually, from the early 70’s until the late 90’s the radio was directly owned by this party (it currently belongs to El Mercurio Group). This profile coincides with the characterization reflected by our model. On the other hand, El Quinto Poder (elquintopoder) [56] is an online news website/community where any member can contribute their own column. This newspaper follows the concept of citizen journalism popularized by sites like http://www.ohmynews.com/. Its editorial line and community rules explicitly prohibit any content that is aimed at a personal or institutional gain. In the same way, political opinions can only be expressed through personal profiles (rather than an organization profile). In our model, the influence of the political and economic factors is about equal for this newspaper, while the influence of the geographic dimension is the highest among these top 25 outlets.
Just as a comparison, we repeat the analysis, this time filtering the dataset to keep only the top 25 “newspapers” in Santiago, i.e., excluding radio stations, TV channels, etc. (see Fig. 7). Here, for example, it is easy to distinguish newspapers with a local scope, such as portaldemeli or betazeta. For those, the influence of the geographic factor is higher than that of the economic and even the political features. Actually, in the case of portaldemeli (a small commune’s local digital newspaper), the economic factor is almost non-existent.
To confirm that the results are specific to the news media ecosystem and do not mimic a behavior common to the social media sphere as a whole, we repeat the experiments on a different topical domain, namely the followers of a group of football players (see Sect. 4.3). We filtered the data to keep only football players that were born, play or live in Santiago (this condition matched six players). We also kept only the followers that were geolocated in one of the 51 communes of the capital region. The regression model trained with the three selected features is able to explain, on average, 84% ($$\mathit{SD} = 0.06$$) of the variance in cross-validation. Although the model also gives a good fit for this data, it is less explanatory than for the news outlets (over 10% loss of precision compared to the news outlets). Notice that it is very difficult, if not impossible, to find a public/popular figure whose followers are not influenced by any of these three factors. So, the results must be evaluated relative to each other.
Another way to differentiate the two datasets is by comparing the individual influence of each dimension. We calculated the KT correlation coefficient for all three dimensions for each of the top 6 players (see Fig. 8). We found that, compared with the news outlets, the difference in the average correlation is statistically significant for all three features (right-leaning: $$t=8.31$$, $$p<0.001$$; income: $$t=7.93$$, $$p<0.001$$; distance: $$t=2.39$$, $$p=0.03$$).
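A minimal sketch of this per-feature comparison follows. It implements the simple tau-a variant of the rank correlation and takes absolute values afterwards; whether the analysis used tau-a or a tie-corrected variant is not stated here, so this choice is an assumption, as are the function name and the sample values.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either sequence count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical values for five communes and one outlet.
audience_rank = [1, 2, 3, 4, 5]        # 1 = commune with the most followers
income = [9.0, 7.5, 8.0, 4.0, 3.0]     # richer communes tend to rank better
distance = [2.0, 6.0, 3.0, 9.0, 12.0]  # farther communes tend to rank worse

tau_income = kendall_tau(audience_rank, income)
tau_distance = kendall_tau(audience_rank, distance)

# The sign differs, but the strength of the association is the same.
print(abs(tau_income), abs(tau_distance))
```

In this toy case the two features have opposite signs but the same absolute correlation, which is the sense in which the direction of impact is set aside.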
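For readers who want to see how such a t statistic arises, the sketch below computes Welch's unequal-variance t statistic and its approximate degrees of freedom for two groups of correlation values. The text does not say which t-test variant was used, so Welch's test is an assumption, and the sample values are made up. The p-value is omitted because it requires the t-distribution CDF, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical absolute KT correlations for one feature.
outlets_kt = [0.52, 0.55, 0.49, 0.58, 0.51, 0.54]
players_kt = [0.31, 0.28, 0.35, 0.30]

t_stat, dof = welch_t(outlets_kt, players_kt)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```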
We can also see that the profile obtained for each individual player (shown in the shaded area in Fig. 6) differs from that of the news outlets. In this case, football players tend to have a comparatively stronger influence from the geographic factor and politics plays a lesser role.
The differences found between the two datasets indicate that the distribution of followers for the news outlets is not entirely determined by the social media substrate but is shaped by the characteristics of the entities themselves.

## 6 Discussion

In general, our results support the idea of a media system motivated by economic interest, as described by the theoretical models that prompted the current study. This profit-driven media system seems to promote selective coverage that targets specific segments of the population based on the “quality” of the readership. Notice that a relationship between these features (geographic, social and economic) and media reach is, of course, not a direct proof of a causal relation. However, we assume that there is a natural order of information demand and supply: news media models usually presume that readers get some value from the news they read (e.g., entertainment or arguments to decide on a private action) [2, 57, 58]. In other words, we consider that people following a newspaper account are interested in the “editorial line” of that newspaper. This means that the news outlet is creating content that is attractive to a specific audience.
The above notwithstanding, the news media ecosystem has changed, and many blogs and “independent” media sources have emerged. People can now create a page on Facebook and become a news source. So, it is possible that extremely politically biased news outlets of this kind have their content driven by ideological inclinations rather than economic aspects, especially around political elections. Here, however, we are testing how influential the three dimensions (geography, economy and political leaning) are in the average mass media sources of the country. In general, we couldn’t find evidence to disprove the hypothesis from the theoretical model (the Propaganda Model), which was our main objective here (see footnote 8).
The proposed methods can indeed be applied to other countries. The methods’ effectiveness will largely depend on the level of adoption of the social networks by the mass media as a medium to reach out to their audience and the platform penetration at population level. The only peculiarity of Chile could be its heavy centralization in Santiago. Still, we don’t think that this is a factor (at least not one that favors the methods). Actually, by limiting our analysis to the capital region we have removed, to a large degree, the influence of the centralization factor.
Although our model seems to generalize quite well and lends evidence to the hypotheses, we recognize that the methods have some limitations. First, we are restricted to users that we are able to geo-locate using the location field from the Twitter profile. A more sophisticated method of location could increase the number of valid users and maybe increase the precision of the model for other regions. A second limitation comes from the fact that the political and economic dimensions seem to be closely related. This prevents us from creating a characterization of the outlets that better reflects the actual preference for a population with either a certain political profile or a socioeconomic range, but not both. The entanglement of these two dimensions may be due to the reality of the studied country.
Besides the limitations in the location of the users, the choice to use Twitter followers as a proxy for the audience of the news outlets may introduce some bias and noise into our study. For example, it is difficult to determine the demographics of the population on social media [59]. An alternative method to effectively define the actual audience of the news outlets (e.g., monitoring the passive and active traffic on the selected Twitter accounts or websites) could complement our method and improve the predictive capability of our model in areas with a weaker signal (e.g., beyond the Metropolitan Region). This is left for future work.

## 7 Conclusions

This work presents a method to characterize the news outlets in the media system based on the geographic, socioeconomic and political profile of their audiences. Under the assumption of a natural order of information demand and supply (i.e., readers get some value from the news [58]), this modeling of the media can imply a conscious targeting of some specific public by catering to their preferences.
Using data from multiple sources, we found that news outlets systematically prefer followers from densely populated areas with a specific socioeconomic profile. The political leaning of the commune proved to be the most discriminating feature in predicting the level of readership. These findings support the theoretical claim that describes the news media outlets as profit-driven companies [1, 2].
In summary, the results seem to support the hypothesis that outlets focus on reaching and acquiring an audience of higher “quality” that can later be sold to advertisers. This type of media system neglects areas of low population (e.g., rural communes) and high deprivation levels, causing these to be underserved and underrepresented in the news coverage. In turn, this closes the cycle: public policies and politicians overlook sectors of the population that are less informed and, hence, less likely to influence the status quo of the political elite.
We often say that outlets are “biased”, but seldom discuss how or towards what. Thus, it is important to define what it means for these companies to be biased. One answer, and the assumption of our work, is that news outlets are businesses and, as such, are motivated by economic interests. If we think of news outlets as profit-oriented businesses, we notice that they are perfectly rational in their effort to select content that generates a greater return (a generalization of the “sex sells” maxim). It is only when we try to hold on to the traditional view of news outlets as servers of the common good and advocates of democracy that the concept of bias becomes relevant again. Whatever the case, given the influence that news outlets have in society, we should be well aware of their behavior to be able to appropriately process the information they distribute.

## Availability of data and materials

Twitter’s terms of service do not allow redistribution of Twitter content, but the query terms used are included in the manuscript. Using this information, interested researchers can recreate the underlying dataset. Additionally, in the repository https://github.com/eelejalde/News-Outlets-Audience-Targeting-Patterns we have made available the list of news outlets and the scripts needed to download the followers’ profiles.

### Competing interests

The authors declare that they have no competing interests.

## Footnotes

2
We take the action of “liking” in this case to be equivalent to retweeting, that is, a sign that the content was actually read.

3
The script we used to find the profiles can be found at https://github.com/eelejalde/News-Outlets-Audience-Targeting-Patterns.

4
There was another census in 2012, but it was methodologically flawed, with problems in coverage, and a supposed manipulation of some of the key indices [60].

5
There is a CASEN survey from 2015 but the expansion factor for the communes is not complete (not even for the communes in Santiago).

7
“Soccer”, in the US dialect.

8
We thank one of the anonymous reviewers for crystallizing this point for us.
