Business strategy and firm location decisions: testing traditional and modern methods

Published in Business Economics

Abstract

For nearly a century, economists have relied upon the neoclassical principle of a “profit-maximizing firm.” Two modern challenges to this principle have arisen: the theory of the value-maximizing firm, and machine learning. In this article, we empirically compare the predictive power of both traditional and modern approaches to business decisions. To do so, we make use of an unusual natural experiment, and extensive data, as follows: (1) Outline competing models of business decision making from both traditional and modern approaches: expert judgement; an income model of a profit-maximizing firm; a suite of machine learning models; and a recursive model of a value-maximizing firm. (2) Assemble data on costs, productivity, workforce, transit, and other factors for over 50 large North American cities. (3) Empirically compare these models to determine which best explains the selection of 20 cities by Amazon Inc. for its “HQ2.” We observe first that expert judgement, of the type traditionally performed by business economists, outperformed all other approaches. Second, we observe that “supervised learning” machine learning models performed poorly, with results that were often worse than a coin flip. Third, we find that the model of a value-maximizing firm slightly outperformed an income model using the same underlying data, and handily outperformed machine learning. Based on these results, we conclude that expert human judgement remains superior to machine learning methods, and we warn against naive reliance on such models when the penalty for an incorrect decision is high. We also recommend that business economists consider value methods for business strategy decisions.

Notes

  1. In economics, at least since the first edition of Alfred Marshall’s Principles of Economics (Marshall 1890), the neoclassical school has been the dominant approach to microeconomics. The same holds in related disciplines. In finance, the famous Modigliani–Miller principles (Modigliani and Miller 1958) rely upon the supposition of profit-maximizing firms. The wide-ranging intellectual history by Rubinstein (2006) identifies Milton Friedman’s influential “survivor” argument in Friedman (1953), which asserts that any other approach to business management is doomed to fail.

    Friedman’s argument finds much support in standard business management methods. Inventory management methods (such as “economic order quantity” models), advertising management methods (such as the classic “customer lifetime value” models), and production management methods (particularly in the objective function for the enterprise) commonly rely upon the presumption that businesses maximize profits.

  2. The seminal works include Principles of Political Economy and Taxation (Ricardo 1817) and Von Thünen's Isolated State (von Thünen 1826; translated 1966). Both Ricardo and von Thünen modeled the decision of a landowner in an agricultural society as a profit maximization problem. No less a luminary than Paul Samuelson observed the debt modern economics owes to von Thünen, in his bicentennial essay (Samuelson 1983).

  3. The new economic geography is associated with Krugman (1991) and Fujita et al. (1999). Before the “new” economic geography, Hotelling (1929) described the classic location selection method for a profit-maximizing firm.

  4. The seminal case supporting the principle of the profit-maximizing firm is Dodge v. Ford Motor Company, Michigan Supreme Court, 1919, in which Henry Ford faced a shareholder revolt led by the Dodge brothers. That court concluded as follows:

    A business corporation is organized and carried on primarily for the profit of the stockholders. The powers of the directors are to be employed for that end. The discretion of directors is to be exercised in the choice of means to attain that end, and does not extend to a change in the end itself, to the reduction of profits, or to the non-distribution of profits among stockholders in order to devote them to other purposes.

    Almost a century later, in the Burwell v. Hobby Lobby (2014) case, the US Supreme Court confirmed that a shift had taken place:

    While it is certainly true that a central objective of for-profit corporations is to make money, modern corporate law does not require for-profit corporations to pursue profit at the expense of everything else, and many do not do so.

  5. The term “real options” began with Myers (1977); the intellectual groundwork for their importance was laid by Dixit and Pindyck (1994).

  6. This controversy is epitomized by the differences between Thomas Friedman’s book The World is Flat (Friedman 2005) and Pankaj Ghemawat’s rejoinder journal article in Foreign Affairs, “Why the World Isn’t Flat” (Ghemawat 2007).

  7. Examples include, at various times, Microsoft, Google, and Amazon. Alphabet (now the parent company of Google) states its rationale for such a policy explicitly in the annual report for the year 2017:

    Dividend Policy.

    We have never declared or paid any cash dividend on our common or capital stock. We intend to retain any future earnings and do not expect to pay any cash dividends in the future (Alphabet 2018, p. 21, emphasis in original).

  8. While profitability is obviously directly related to shareholder value, it is not identical—as will be demonstrated in the differing results for income and value models in this article, even though they use the same underlying data.

    In addition, as is now recognized in both the “real options” literature and in formal guidance for banks in most countries, prudent managers will often deliberately and rationally reduce profits to achieve other objectives.

  9. Mark Zandi, “Where Amazon's Next Headquarters Should Go,” and “Metro Analysts on Amazon's Top Cities,” Moody’s Analytics, October 12, 2017.

  10. Joseph Parilla, “Who is best positioned to land Amazon’s HQ2?” Brookings Metropolitan Policy Council, September 2017.

  11. Anderson Economic Group, “The HQ2 Index,” October 2017; and “Updated: The Anderson Economic Group HQ2 Index,” February 2018.

  12. Among the top ten in all three experts’ rankings were the New York, Boston, Philadelphia, and Atlanta MSAs. There were differences, of course; Moody’s picked Rochester NY as fourth on their list, while that city did not make the top 20 for Anderson or Brookings; only Brookings initially considered Canadian cities, and included both Vancouver and Toronto; the Anderson rankings when augmented later included Toronto in the top 20, but not Vancouver.

  13. The RFP stated “an international airport with daily direct flights to Seattle” and other specific cities were “an important consideration.” Apparently, metropolitan economic development officials considered the scheduling of such flights to be an easy decision should Amazon locate their HQ2 in their areas.

  14. These are derived from annual and quarterly reports (Amazon 2016, 2017).

  15. The “Lucas critique” of Keynesian macroeconomics (Lucas 1976) is one motivation for the use of recursive techniques in modeling economic decisions.

    The textbook Recursive Macroeconomic Theory (Ljungqvist and Sargent 2012) describes both the theoretical underpinnings and numerous uses in academic economics.

  16. We will note here that, for some of the machine learning models for which results are presented below, the economist tuned the model by adjusting parameters and selecting variables.

    These we call “machine learning with professional judgement” in Sect. 3.

  17. The essay by Kopf (2015) collects some of the letters of Gauss and notes the controversy over who invented the method commonly called regression. It appears Gauss and Legendre both used the method of least squares, but Galton was the one who publicized the term “regression,” including in his 1886 article “Regression towards Mediocrity in Hereditary Stature.”

    The term “mediocrity” is often deleted from contemporary discussions of this technique, in favor of “mean.” As Kopf notes, “unfortunately,” Galton sometimes used statistical techniques in support of the “science” of “eugenics,” which is a term he coined.

  18. Amazon blog, “Amazon selects New York City and Northern Virginia for new headquarters,” Amazon, November 13, 2018.

  19. See Anderson Economic Group (2017, 2018).

  20. These software packages are available from Microsoft, Tableau, The MathWorks, and Supported Intelligence.

References

  • Alphabet, Inc. 2018. Form 10-K, 2017. Alphabet investor relations website.

  • Amazon. 2018. “Amazon Announces Candidate Cities for HQ2.”  January 18, retrieved from www.amazon.com.

  • Amazon, Inc. 2017. Amazon HQ2 RFP. (September 7) www.images-na.ssl-images-amazon.com.

  • Amazon, Inc. 2016-2017. Form 10-K (annual results). Securities and Exchange Commission.

  • Amazon, Inc. 2016-2017. Form 10-Q (quarterly results). Securities and Exchange Commission.

  • Anderson Economic Group. 2017. The HQ2 Index (October); and Updated: The Anderson Economic Group HQ2 Index (February 2018). Both retrieved from www.andersoneconomicgroup.com.

  • Anderson Economic Group. 2017. 2017 State Business Tax Burden Rankings Report.  www.andersoneconomicgroup.com.

  • Anderson Economic Group. 2018. 2018 State Business Tax Burden Rankings Report. Retrieved from www.andersoneconomicgroup.com.

  • Anderson Economic Group. 2018. Supplemental Information on Professional Sports Teams Acquired from Websites of NFL, NBA, MLB, and NHL Professional Sports Teams; Supplemental Information on Female Mayors and Female City Council Membership Acquired from Websites of Relevant Cities; County Government Election Official Websites of the Central Cities for HQ2 Analysis Regions; Latitude and Longitude for Cities.

  • Anderson, Patrick. 2012. Economics of Business Valuation. Stanford: Stanford University Press.

  • Anderson, Patrick. 2014. Policy Uncertainty and Persistent Unemployment: Numerical Evidence from a New Approach. Business Economics 49 (1): 2–20.

  • Burwell v. Hobby Lobby, et al. 2014. No. 13-354, 573 U.S. 723 F. 3d 1114.

  • Corporation for National and Community Service. 2015. Volunteering and Civic Life in America. www.nationalservice.gov/vcla.

  • Cover, T.M., and P.E. Hart. 1967. Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory 13 (1): 21–27.

  • Dixit, Avinash, and Robert S. Pindyck. 1994. Investment Under Uncertainty. Princeton: Princeton University Press.

  • Dodge v. Ford Motor Company. 1919. 204 Mich. 459, 170 N.W. 668.

  • Federal Transit Administration. 2017. National Transit Database Monthly Module, August 2016-July 2017. www.transit.dot.gov.

  • Flightview. 2018. Airport information for HQ2 analysis regions, acquired from Flightview.com, 2018.

  • Fraser Institute. 2017.  Economic Freedom of North America Index.  www.fraserinstitute.org/studies/economic-freedom-of-north-america-2017.

  • Friedman, Milton. 1953. The Methodology of Positive Economics, in Essays in Positive Economics, 3–43. Chicago: University of Chicago Press.

  • Friedman, Thomas. 2005. The World is Flat. New York: Farrar, Straus and Giroux.

  • Fujita, Masahisa, Paul Krugman, and Anthony Venables. 1999. The Spatial Economy: Cities, Regions and International Trade. Cambridge: MIT Press.

  • Gallup Organization. 2017 Gallup-Sharecare Well-Being Index. www.wellbeingindex.sharecare.com.

  • Galton, Francis. 1886. Anthropological Miscellanea: Regression towards Mediocrity in Hereditary Stature. The Journal of the Anthropological Institute of Great Britain and Ireland 15: 246–263.

  • Ghemawat, Pankaj. 2007. Why the World Isn’t Flat. Foreign Affairs.

  • Harvard Business School Institute for Strategy and Competitiveness. 2015. Business Services Cluster Employment. U.S. Cluster Mapping Project 2015. www.clustermapping.us.

  • Hotelling, Harold. 1929. Stability in Competition. Economic Journal 39 (153): 41–57.

  • Jones Lang LaSalle. 2017. Office Outlook Q4 2016. www.us.jll.com/united-states/en-us.

  • Jones Lang LaSalle.  New Jersey Office Statistics Q4 2016. www.us.jll.com/united-states/en-us.

  • Jones Lang LaSalle. 2017. Washington, D.C. Office Statistics Q4 2016. www.us.jll.com/united-states/en-us.

  • Kaldor, Nicholas C. 1966. Marginal Productivity and the Macro-Economic Theories of Distribution: Comment on Samuelson and Modigliani. The Review of Economic Studies 33 (4): 309–319.

  • Kauffman Foundation. 2016.  Index of Startup Activity Metropolitan Area Rankings.  www.kauffman.org/kauffman-index/reporting/startup-activity.

  • Krugman, Paul. 1991. Increasing Returns and Economic Geography. Journal of Political Economy 99 (3): 483–499.

  • Ljungqvist, Lars, and Thomas J. Sargent. 2012. Recursive Macroeconomic Theory, 3rd ed. Cambridge: MIT Press.

  • Lucas, Robert. 1976. Econometric Policy Evaluation: A Critique. In The Phillips Curve and Labor Markets (ed. K. Brunner and A. Meltzer), 19–46. New York: American Elsevier.

  • Modigliani, Franco, and M. Miller. 1958. The Cost of Capital, Corporation Finance and the Theory of Investment. American Economic Review 48 (3): 261–297.

  • Marshall, Alfred. 1890. Principles of Economics. London: MacMillan.

  • National Center for Education Statistics. Integrated Postsecondary Education Data System: Degrees Granted 2015–2016. www.nces.ed.gov/ipeds.

  • Parilla, Joseph. 2017. Who is best positioned to land Amazon’s HQ2? Brookings Metropolitan Policy Council. www.brookings.edu/blog/the-avenue/2017/09/08/which-cities-are-well-positioned-to-land-amazons-hq2. Accessed July 2018.

  • Ricardo, David. 1817. Principles of Political Economy and Taxation. London: J.M. Dent & Sons.

  • Rubinstein, Mark. 2006. A History of the Theory of Investments: My Annotated Bibliography. New Jersey: Wiley.

  • Rutgers University Center for American Women and Politics. 2018. Women Mayors in U.S. Cities, 2018. www.cawp.rutgers.edu/levels_of_office/women-mayors-us-cities-2018.

  • Samuelson, Paul A. 1983. Thünen at Two Hundred. Journal of Economic Literature 21: 1468–1488.

  • Stansel, Dean. 2013. An Economic Freedom Index for U.S. Metropolitan Areas. Journal of Regional Analysis and Policy 43 (1): 3–20.

  • Stokey, Nancy, Robert Lucas, and Edward Prescott. 1989. Recursive Methods in Economic Dynamics. Cambridge: Harvard University Press.

  • Texas Transportation Institute. 2015. 2015 Urban Mobility Scorecard. www.mobility.tamu.edu/ums.

  • Tobin, James, and W.C. Brainard. 1977. Asset Markets and the Cost of Capital. Cowles Foundation  Discussion paper no. 427. www.cowles.yale.edu.

  • Thünen, Johann Heinrich. 1826. Von Thünen’s Isolated State. Oxford: Pergamon Press.

  • Trust for Public Land. 2018. Park Score. www.tpl.org/10minutewalk.

  • United Nations World Meteorological Organization. 2010. Standard Normals. www.wmo.int/pages/prog/wcp/wcdmp/GCDS_1.php.

  • U.S. Bureau of Labor Statistics. 2016. Occupational Employment Statistics. www.bls.gov.

  • U.S. Bureau of Economic Analysis. 2015.  GDP & Personal Income data by State, Retrieved from www.bls.gov.

  • U.S. Census Bureau. 2010-2014 Business Dynamics Statistics. www.census.gov/ces/dataproducts/bds.

  • U.S. Census Bureau. 2016. American Community Survey 1-Year Table B01003: Total Population. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016. American Community Survey 5-Year Table B01003: Total Population, 2012–2016. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016. American Community Survey 1-Year Table S0701: Geographic Mobility by Selected Characteristics. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016. American Community Survey 5-Year Table B25105: Median Monthly Housing Costs. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016. American Community Survey. 5-Year Table S0501: Selected Characteristics of the Native and Foreign-Born Populations. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016.  American Community Survey 5-Year Table S1601: Language Spoken at Home. www.factfinder.census.gov.

  • U.S. Census Bureau. 2016. American Community Survey 5-Year Table S2403: Industry by Sex for the Civilian Employed Population 16 Years and Over.  www.factfinder.census.gov.

  • U.S. Census Bureau.  Table PEPANNRES: Population Estimates, 2010–2017. www.factfinder.census.gov.

  • U.S. Centers for Disease Control and Prevention. 2017. Local Data for Better Health. www.cdc.gov/500cities.

  • U.S. Department of Justice. 2014. Uniform Crime Reporting Statistics. www.ucrdatatool.gov.

  • U.S. Environmental Protection Agency. 2017. Air Quality Index Report. www.epa.gov/outdoor-air-quality-data/air-quality-index-report.

  • U.S. National Oceanic and Atmospheric Administration. 2018. Temperature Data for HQ2 Analysis Regions. www.ncdc.noaa.gov.

  • Urban Institute. 2015. Economic Inclusion Index. www.apps.urban.org/features/inclusion.

  • Urban Institute. 2015. Racial Inclusion Index. www.apps.urban.org/features/inclusion.

  • Walkscore. 2018. Walk Score, Transit Score, and Bike Score for Various Cities and Regions. www.walkscore.com.

  • Zandi, Mark. 2017. Where Amazon’s Next Headquarters Should Go, and Metro Analysts on Amazon’s Top Cities.  www.economy.com/dismal/analysis/commentary/298321/Where-Amazons-Next-Headquarters-Should-Go. Accessed 12 Oct 2017.

Author information

Correspondence to Patrick L. Anderson.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Methodology appendix

1.1 HQ2 index of economic attributes

The HQ2 Index was compiled by the consulting firm of Anderson Economic Group and originally published in advance of the selection date (see Note 19). It is described by the firm as follows:

In Amazon’s request for proposals (RFP), they emphasized the following items, among others:

–Metropolitan area with more than one million people

–A stable and business-friendly environment and tax structure

–Potential to attract and retain strong technical talent

–A highly educated labor pool and a strong university system

–Proximity to international airport, major highways, and mass transit

–Competitive incentives

–Recreational opportunities, educational opportunities, and high quality of life

Using measurable factors from the lists above, we compiled the AEG HQ2 index, which captured a city’s measurable advantage in attracting Amazon’s HQ2. For the 35 cities in the United States that met specific requirements from the RFP, we estimated their performance using 11 total metrics across three broad categories:

Access to Labor and Services, including four indicators: degrees granted in relevant fields of study by colleges; employment of workers in specific occupational categories; size of the business services industry; and the number of migrants with bachelor’s degrees from other counties, states, and countries.

Ease of Transportation, including two indicators: hours of delay due to traffic congestion, and per-capita use of public transit systems.

Cost of Doing Business, including five indicators: state and local business taxes (using Anderson Economic Group’s Business Tax Burden studies); rental costs for commercial real estate; and unit cost of labor in 3 occupations important to Amazon. Note that this measure of labor costs takes into account worker productivity, meaning that more productive workers can be ranked higher even if their wages are also higher.

The AEG HQ2 index is the average of the values for each category. For each category, a higher value is better than a lower value (e.g., a lower cost of doing business translates into a higher index).

1.2 Income model

We estimate the distributable profit for the operations of the proposed HQ2 facility, in each city or metro, and for each stage of operation. We use the following method:

  1. To facilitate comparisons among cities and metropolitan areas, the output of the hypothetical facility is standardized across all areas, while the costs and productivity vary. This allows for direct comparisons of the likely profitability of a standardized facility located in multiple cities.

  2. We use the assumptions described above regarding the underlying business, and our analysis of Amazon Inc. business data, to estimate the revenue. We use these data and assumptions, plus the data for costs and productivity for each metro and city, to estimate costs in the following categories:

    • COGS (cost of goods sold), which combines Amazon’s “cost of sales” and “fulfillment” categories; we assume the COGS related to the products and services generated or otherwise arising from the HQ2 operations will be largely the same across all metro areas, and at all stages of operation.

    • Operating expenses, which combines “marketing,” “technology and content,” and “general and admin” costs. We assume this stays stable (as a share of revenue) at all stages of operation.

    • Other expenses, which include mainly amortization of intangible assets.

    • Facility costs related to this specific facility, which includes cost of building, permitting, moving employees, training and hiring, and the rent or rent equivalents as well as property taxes, utility, and service costs for the facility itself. We assume this changes (increases as a share of revenue) by stage of operation, for two reasons: larger facilities require more land acquisition and longer construction lead time; and we expect incentives from state and local governments to be focused on the first stage (and possibly the second).

Because the operating costs (including wages and taxes) vary by place, and the facility costs (including rent and construction costs) vary by both place and stage of operations, each of these income statements will be different.
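A minimal Python sketch of an income statement of this shape follows. The cost shares, local cost factors, revenue figures, and all names are hypothetical; the article's actual parameters are not reproduced here. The sketch only mirrors the stated structure: COGS is the same everywhere, operating expenses are a stable share of revenue but vary by place, and facility costs vary by both place and stage.

```python
from dataclasses import dataclass


@dataclass
class IncomeAssumptions:
    cogs_share: float               # same across metros and stages
    opex_share: float               # stable share of revenue across stages
    other_share: float              # mainly amortization of intangibles
    facility_share_by_stage: tuple  # rises with the stage of operation


@dataclass
class MetroCosts:
    operating_cost_factor: float    # local wages and taxes relative to baseline
    facility_cost_factor: float     # local rent and construction relative to baseline


def distributable_profit(revenue, stage, metro, a):
    """Revenue less the four cost categories for one metro and stage."""
    cogs = a.cogs_share * revenue
    opex = a.opex_share * revenue * metro.operating_cost_factor
    other = a.other_share * revenue
    facility = a.facility_share_by_stage[stage] * revenue * metro.facility_cost_factor
    return revenue - cogs - opex - other - facility


# Hypothetical, standardized revenue by stage and illustrative cost shares.
stage_revenue = (100.0, 200.0, 300.0)
assumptions = IncomeAssumptions(0.60, 0.25, 0.01, (0.02, 0.03, 0.04))
metros = {
    "Metro X": MetroCosts(1.00, 1.00),
    "Metro Y": MetroCosts(0.90, 1.20),
}
profits = {
    name: [
        distributable_profit(rev, stage, metro, assumptions)
        for stage, rev in enumerate(stage_revenue)
    ]
    for name, metro in metros.items()
}
```

Because revenue is standardized, any difference between the two metros' profit figures comes entirely from the local cost factors.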

1.3 Value model

The value model uses the same input data as the income model, and the same income statement for the proposed facility at each stage of operation. However, it also includes a set of decisions (actions) that a manager of the firm could take, including expanding and shutting down the facility. There is a cost associated with expansion, and that cost is related to the size of the workforce in the respective city.

We compose and solve a decision model for the proposed facility in each of the respective cities, again using the same data as the income model.

The value model requires a “reward matrix” that captures the current income for each stage of production, and with each business decision. We create this matrix from the income statements for each city and stage, and costs for expansion or closure.

In addition to these elements, the model requires a time index, discount rate, growth factor, and transition matrix. We use a yearly time index, a small growth rate (g = 0.02), and a reasonable discount rate for a corporation (d = 0.15). The transition matrix is largely determined by the business decision itself.

The solution method can be expected to solve the resulting problem, as it meets the conditions specified by Anderson (2012) for a business value problem as composed in Eq. 1. A value function iteration solution algorithm is used. The results of the optimization problem include, for each city, a value in that stage of operation, and a suggested business decision to achieve that value.
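Value function iteration for a problem of this shape can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's model or the interface of the toolbox it used: the state and action names, the reward figures, the deterministic transitions, and the discount factor form beta = (1 + g) / (1 + d) are all hypothetical choices.

```python
def value_iteration(rewards, transitions, beta, tol=1e-9, max_iter=10_000):
    """Iterate V(s) = max_a [ R(s, a) + beta * V(next(s, a)) ] to convergence.

    rewards[s][a] is the current-period reward; transitions[s][a] is the next
    state (deterministic here, as an assumption).
    """
    values = {s: 0.0 for s in rewards}
    for _ in range(max_iter):
        new_values = {
            s: max(r + beta * values[transitions[s][a]] for a, r in acts.items())
            for s, acts in rewards.items()
        }
        diff = max(abs(new_values[s] - values[s]) for s in values)
        values = new_values
        if diff < tol:
            break
    # Recover the decision that attains the value in each state.
    policy = {
        s: max(acts, key=lambda a: acts[a] + beta * values[transitions[s][a]])
        for s, acts in rewards.items()
    }
    return values, policy


g, d = 0.02, 0.15             # growth and discount rates stated in the text
beta = (1.0 + g) / (1.0 + d)  # assumed way of combining them

# Hypothetical reward matrix: income by stage, less expansion or closure costs.
rewards = {
    "stage1": {"operate": 10.0, "expand": -20.0, "close": -5.0},
    "stage2": {"operate": 18.0, "close": -8.0},
    "closed": {"stay_closed": 0.0},
}
transitions = {
    "stage1": {"operate": "stage1", "expand": "stage2", "close": "closed"},
    "stage2": {"operate": "stage2", "close": "closed"},
    "closed": {"stay_closed": "closed"},
}
values, policy = value_iteration(rewards, transitions, beta)
```

In this toy example, expanding lowers current income in the first stage but raises the value of the facility, which is the kind of divergence between income and value that the two models are designed to expose.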

1.4 Machine learning models

The Machine Learning models used include:

  • K nearest neighbors: This classifier is a method for associating, clustering, or classifying data. This method selects the k “nearest” neighbors using a distance metric. The standard method is Euclidean distance. There are numerous variations among KNN routines, including on the distance metric, the number of neighbors, weighting of observations, and other factors. The KNN method is often a benchmark for other methods and has been used widely. For classification purposes, the method seeks a classification scheme that minimizes the distance among similarly classified data points.

  • Naïve Bayes classifier: A classifier which applies Bayes’ theorem on a collection of independent variables to classify features (or in this case, decisions).

  • Ensemble trees: This model creates an “ensemble” of classification learners, where each learner is a classification tree. The trees are “boosted” in this model by using a logistic function to evaluate the deviance (difference) between actual and predicted classifications. There are many alternative functions. To make more use of the data, this learner also “bags” trees by using a bootstrap (re-sampling) method.

  • Fine classification tree: This classifier uses Machine Learning algorithms to construct flexible, fast-to-estimate binary separations of the data into “decision trees.” These “trees” may then be traversed with variable data to determine the corresponding prediction.

  • Linear discriminant: Discriminant classification analysis assumes that different classes generate data based on different Gaussian distributions. The fitting function estimates the parameters of a Gaussian distribution for each class, and then minimizes the classification error.

  • Support vector machine classifier: This method creates “support vectors” in hyperspace to attempt to separate the data. The use of a “kernel” that statistically transforms the data allows it to be arrayed in a manner such that a separating hyperplane can be constructed from the support vectors.

  • Logistic regression: This classifier models probabilities as a function of the linear combination of predictors using a logarithmic cost function to determine the classifications.
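As an illustration of the simplest method in this list, the following Python sketch implements a plain k-nearest-neighbors classifier with Euclidean distance and majority voting. It is not the Matlab routine used in the article; the feature values and labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data: (labor score, cost score) for six metros.
train_X = [
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.70),
    (0.20, 0.30), (0.10, 0.20), (0.30, 0.10),
]
train_y = ["selected"] * 3 + ["not selected"] * 3

prediction_high = knn_predict(train_X, train_y, (0.70, 0.75))
prediction_low = knn_predict(train_X, train_y, (0.15, 0.25))
```

The variations mentioned above (distance metric, number of neighbors, weighting of observations) correspond to changing `dist`, `k`, and the vote counting, respectively.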

Multiple versions of most models were run, and many such versions were run multiple times with different-sized datasets. The reported results contain the best-fitting models in each category where we had at least one version of the model that produced results (even if poor). Some models did not run at all.

A dimension-reduction technique (principal component analysis, or PCA) was attempted with a number of models; it is reported for only one. In general, PCA did not improve the results, and made them even more difficult to interpret.

1.5 Software used for all models

The following software was used, which allowed for inputting the same data to all models, direct comparisons of the models, and for the use of the intermediate results from the income models in the value models (see Note 20).

  • Data collection and reporting: Microsoft Excel and Tableau were used to collect data from multiple sources, and to report and visualize the data. These same products were used for calculation in the HQ2 Index (including with augmented data).

  • Exploratory Data Analysis: Matlab with Statistics and Machine Learning Toolbox.

  • Income, Machine Learning, and Value Models: Matlab with the Statistics and Machine Learning Toolbox.

  • Value Models: The Rapid Recursive toolbox was used to compose and solve the value functional models.

1.6 Availability of extended results and base data

We intend to make available to all members of NABE (and subscribers to Business Economics) extended results in the following form:

  • The base data used in the analysis, including all the data listed in the “Data Appendix,” with the limitation noted below.

  • A printout of the computer run for the income, value, and machine learning models, which include notes on all techniques, notes on the data, extensive intermediate results, additional EDA graphics, explicit parameter selections, and other information.

  • Any written corrections or clarifications to the journal article and any revised versions of the dataset the author prepares for the purpose of documenting the work presented in the article.

This information will be made available at the Anderson Economic Group website (https://www.andersoneconomicgroup.com) for at least 1 year after publication. The author cautions that, with over 50 variables from dozens of sources, some of the data used in this analysis will have been revised by the time of publication. The author wishes to acknowledge the extensive work by Brian Peterson of Anderson Economic Group, and Ervin Batka of Supported Intelligence, in collecting the data and running the machine learning and value maximization models.

Data appendix

2.1 Data sources

Table 4 lists the data used in the machine learning, value, and income models presented in this article. The table also includes the variables used in the HQ2 Index that was one of the expert predictions.

Table 4 Data variables and sources

The table shows the data short names, full names, and sources for each variable. These include outcome variables and three groups of explanatory variables: HQ2 Index variables, conventional economic indicators considered important for site selection, and quality of life and other variables. Additional data notes are included in “Special Data Notes.”

2.2 Special data notes

2.2.1 Airport qualified

If a region has an international airport, it was coded as a “2.” If a region has an international airport with direct daily flights to Seattle, New York (JFK or LGA), San Francisco (SFO or OAK), and Washington DC (BWI or DCA), it was coded as a “3.” This coding considers airports within 45 min of the MSA as per Amazon’s RFP when determining access to direct daily flights.
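The coding rule can be written as a small Python function. This is a sketch under stated assumptions: it reads the rule as requiring direct daily flights to all four destinations, the Seattle airport code (SEA) is supplied here because the source names only the city, and regions without an international airport return None because the source does not state how they were coded.

```python
# Airports accepted for each destination. SEA is an assumption; the other
# codes are the ones listed in the text.
REQUIRED_DESTINATIONS = {
    "Seattle": {"SEA"},
    "New York": {"JFK", "LGA"},
    "San Francisco": {"SFO", "OAK"},
    "Washington DC": {"BWI", "DCA"},
}


def airport_code(has_international_airport, daily_direct_airports):
    """Return the 'airport qualified' code for a region.

    daily_direct_airports: airport codes reachable by direct daily flights
    from an international airport within 45 minutes of the MSA.
    """
    if not has_international_airport:
        return None  # coding for this case is not stated in the source
    served = set(daily_direct_airports)
    if all(served & codes for codes in REQUIRED_DESTINATIONS.values()):
        return 3
    return 2


full_access = airport_code(True, {"SEA", "JFK", "SFO", "DCA"})
partial_access = airport_code(True, {"SEA", "JFK"})
no_airport = airport_code(False, set())
```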

2.2.2 New establishments 100+ and new establishments 500+

This variable shows the total number of new establishments emerging in a given MSA from 2010 to 2014 from new firms with greater than 100 employees. A derivative of this variable looks at the total number of new establishments emerging in a given MSA from 2010 to 2014 from new firms with greater than 500 employees.

2.2.3 Volunteer hours per capita, volunteer rate, and the share of population active in their neighborhood

These three metrics come from 2015 Volunteering and Civic Life in America, a dataset produced by the Corporation for National and Community Service (CNCS). CNCS is an independent federal agency that is dedicated to supporting the American culture of citizenship, service, and responsibility. The data were collected through two supplements to the U.S. Census Bureau’s Current Population Survey (CPS)—the Volunteer Supplement (2015) and the Civic Supplement (2013). The data are reported by MSA before metro definitions were revised in 2015.

Volunteer rate is the percentage of respondents to the Current Population Survey’s Volunteer Supplement who reported performing unpaid volunteer activities for or through an organization at any point during the 12 months preceding the survey.

2.2.4 Walk score, transit score, and bike score

The scores are calculated by Walk Score for the 141 largest core cities in the U.S. and Canada. Walk Score is a private tech company that originated in Seattle and is now owned by Redfin, a real estate agency.

Walk Score is designed to assess walkability in the area. The score analyzes walking routes to nearby amenities. Points are awarded based on the distance to amenities in each category. Amenities within a 5-min walk (0.25 miles) are given maximum points. A decay function is used to give points to more distant amenities, with no points given after a 30-min walk. The score also measures pedestrian friendliness by analyzing population density and road metrics such as block length and intersection density.

Transit Score aims to reflect how well an area is served by public transport. Points are assigned to nearby transit routes based on the frequency, type of route, and distance to the nearest stop on the route.

Bike Score aims to reflect how convenient an area is for biking. Points are awarded based on availability of bike infrastructure (e.g., lanes, trails), hills, road connectivity, and the number of bike commuters. For each score, the points are summed and normalized to a score between 0 and 100. For details, see https://www.walkscore.com/.

2.2.5 Average hours of sunshine per year

The data for this metric come from the World Meteorological Organization Standard Normals dataset, accessed through the United Nations Data portal. It measures the mean number of hours of sunshine per year for cities all over the world, including in the U.S. For most of the cities used in our analysis, the reported values are 30-year averages computed over the period from 1961 to 1990. For details, see http://data.un.org.

2.2.6 Number of good air quality days per year

This metric is designed by the U.S. Environmental Protection Agency and measures how clean or polluted the air is by MSA, and whether the associated health effects might be a concern. The data are based on the Air Quality Index (AQI), which focuses on measuring ground-level ozone and particle pollution. Days are evaluated as good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, or hazardous. We use the number of days evaluated as ‘good.’ During ‘good’ days, air quality is considered satisfactory and poses little or no risk. For details, see https://www.epa.gov/outdoor-air-quality-data/air-quality-index-report.

2.2.7 Economic inclusion index and racial inclusion index

These indices are calculated by the Urban Institute for the 274 largest cities across the U.S. The Urban Institute is a non-profit research organization based in Washington, DC.

The Economic Inclusion Index measures the ability of residents with lower incomes to contribute to and benefit from economic prosperity. The indicators used to calculate the index are income segregation rank; the share of renters who pay 35% or more of their income in rent; the share of 16- to 19-year-olds who are not in school and have not graduated; and the share of families below the poverty line with a householder working full-time. Income segregation is computed by estimating the segregation between families above and below each income distribution bucket at the census tract level. The indicator values are then averaged (weighted by income relative to the median income) to construct the city-level measure. The index is computed as an average of z-scores of these four indicators.
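As an illustration of an index built as "an average of z-scores," the following minimal sketch standardizes each indicator across cities and then averages the standardized values city by city. The function names and input layout are hypothetical. Two details are assumptions not stated in the text: the population standard deviation is used, and no sign adjustment is made for indicators where a higher value means less inclusion.

```python
import statistics


def z_scores(values):
    """Standardize a list of values to mean 0 and unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]


def composite_index(indicators):
    """Average the z-scores of several indicators, city by city.

    indicators: dict mapping an indicator name to a list of values,
    one per city, with cities in the same order in every list.
    """
    standardized = [z_scores(vals) for vals in indicators.values()]
    n_cities = len(standardized[0])
    return [
        statistics.fmean([col[i] for col in standardized])
        for i in range(n_cities)
    ]
```

A city that sits one standard deviation above the mean on every indicator would receive a composite value of 1 under this sketch.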

The Racial Inclusion Index measures the ability of residents of color to contribute to and benefit from economic prosperity. The indicators used to calculate the index are racial segregation, homeownership gap, educational attainment gap, poverty rate gap, and share of people of color. Racial segregation is calculated as (1/2) * ((# people of color in census tract / # people of color in city) − (# non-Hispanic white in census tract / # non-Hispanic white in city)). The homeownership gap is calculated as the difference between the share of white non-Hispanic households that own a home and the share of households of color that own a home. The educational attainment gap is calculated as the difference between the share of the white non-Hispanic population over 25 with a high school degree or higher and the share of the population of color over 25 with a high school degree or higher. The poverty rate gap is calculated as the difference between the poverty rate for the white non-Hispanic population and the poverty rate for the population of color. The index is computed as an average of z-scores of these five indicators. For details, see https://apps.urban.org/features/inclusion/.
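The racial segregation expression above is written for a single census tract. A city-level figure is commonly obtained with the standard dissimilarity index, which sums the absolute value of that tract-level difference over all tracts before halving. The sketch below assumes that this is the intended aggregation; the text itself does not show the summation or the absolute value, and the function name and input layout are hypothetical.

```python
def dissimilarity_index(tracts):
    """Dissimilarity index over census tracts.

    tracts: list of (people_of_color, non_hispanic_white) counts,
    one pair per census tract in the city.
    Returns a value between 0 (even distribution) and 1 (full separation).
    """
    total_poc = sum(p for p, _ in tracts)
    total_white = sum(w for _, w in tracts)
    # Assumption: absolute tract-level differences are summed city-wide.
    return 0.5 * sum(
        abs(p / total_poc - w / total_white) for p, w in tracts
    )
```

For example, two tracts with counts (75, 25) and (25, 75) give 0.5, while two identical tracts give 0.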

2.2.8 Share of the foreign-born population

This metric is based on the 2012–2016 American Community Survey 5-Year Estimates, Table S0501, accessed through the American FactFinder. The share of foreign-born population is calculated by dividing the estimated number of foreign-born persons by the total population for each area in the analysis. For details, see https://factfinder.census.gov/.

2.2.9 Share of the population who speak only English at home

This metric is based on the 2012–2016 American Community Survey 5-Year Estimates, Table S0601, accessed through the American FactFinder. The share of the population that speaks only English at home is calculated by dividing the number of persons over 5 years old who speak only English at home by the total population over 5 years old. For details, see https://factfinder.census.gov/.

The share of the population who speak only English at home may be a more effective measure of the ethnic and cultural diversity of the population than the share of foreign-born residents alone. The share of the foreign-born population tells us only about first-generation migrants, but the share of the population who speak languages other than (or in addition to) English at home also captures the children of earlier generations of migrants who are likely to have preserved their cultural identity. The share of the population who speak only English at home is thus a proximate measure of the level of cultural homogeneity in a given area.

2.2.10 Rate of population growth 2010–2017

This metric is based on data from the Population Estimates Program of the U.S. Census Bureau, accessed through American FactFinder. To find the rate of population growth, we divided the 2017 population estimate by the 2010 population estimate and subtracted one from the result. For details, see https://factfinder.census.gov/.
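The growth-rate calculation described above amounts to a single ratio. A minimal sketch follows; the function name and the population figures in the usage note are hypothetical and are not taken from the dataset.

```python
def growth_rate(pop_start, pop_end):
    """Cumulative growth rate: end population over start population, minus one."""
    return pop_end / pop_start - 1
```

With a hypothetical 2010 estimate of 1,000,000 and a 2017 estimate of 1,100,000, `growth_rate(1_000_000, 1_100_000)` is approximately 0.10, that is, 10% growth over the whole period rather than per year.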

2.2.11 Violent and property crime rates

For this metric, we use 2014 crime rates as reported by Uniform Crime Reporting Statistics (UCR), U.S. Department of Justice. Crime rate is defined as the number of crimes per 100,000 residents. The crime rates are reported by UCR at the core city level and come from the respective city agencies. Violent crime includes murder, rape, robbery, and aggravated assault. Property crime includes burglary, larceny-theft, and motor vehicle theft. For details, see https://www.bjs.gov/ucrdata/Search/Crime/Crime.cfm.

2.2.12 Median housing cost per month

This metric is based on the 2012–2016 American Community Survey 5-Year Estimates, Table B25105, accessed through American FactFinder.

2.2.13 Share of the population who reported mental distress and share of the population who reported bad physical health in the last 30 days

These metrics report the age-adjusted 2015 estimates from the Local Data for Better Health dataset produced by the Centers for Disease Control and Prevention, a U.S. federal agency under the Department of Health and Human Services. The dataset contains information for the 500 largest core cities and was released in 2017.

The share of population who reported mental distress 14 or more days in the last 30 days was computed by dividing the number of respondents age 18 years or older who report 14 or more days during the past 30 days during which their mental health was not good, by the total number of respondents. The share of population who reported bad physical health 14 or more days in the last 30 days was computed by dividing the number of respondents aged 18 years or older who report 14 or more days during the past 30 days during which their physical health was not good by the total number of respondents. For details, see https://chronicdata.cdc.gov/.

2.2.14 Park score

Park Score is calculated by the Trust for Public Land, a U.S.-based non-governmental organization dedicated to creating and improving neighborhood parks. The score assesses the quality and accessibility of parks in the 100 most populous core cities in the U.S.

The indicators behind the score are grouped into four areas: acreage, investment, amenities, and access. For acreage, the indicators include median park size and parkland as a share of city area. For investment, the indicators include public spending, non-profit spending, and monetized volunteer hours worked for public parks and recreation agencies. For amenities, the indicators include the number of park amenities per capita, with amenities defined as playgrounds, rest rooms, dog parks, splash pads, recreation and senior centers, and basketball hoops. For access, the indicator is the share of the population living within a 10-min walk of a park. Cities can earn a maximum score of 100. For details, see http://parkscore.tpl.org.

2.2.15 Female mayor

The data for this metric come from the Center for American Women and Politics (CAWP) at Rutgers Eagleton Institute of Politics. A value of “1” indicates the central city of the MSA in question had a female mayor as of March 2018, and “0” indicates that the central city had a male mayor. For details, see http://www.cawp.rutgers.edu/levels_of_office/women-mayors-us-cities-2018.

2.2.16 Share of female members in the city council

The data for this metric were collected from individual city council websites for central cities of the MSAs. We divided the number of female members by the total number of members for that council. Mayors were excluded from the calculations.

2.2.17 Share of employees in arts, entertainment, and culture

This metric is based on the 2012–2016 American Community Survey 5-Year Estimates, accessed through American FactFinder. The share of the employees who work in arts, entertainment, and culture industries was calculated by dividing the number of persons who reported employment in these industries by the total population who reported employment. For details, see https://factfinder.census.gov/.

2.2.18 Government spending, taxation, and labor market freedom scores by state

The three scores are the components of the Economic Freedom of North America (EFNA) Index reported by the Fraser Institute. The Fraser Institute is a think tank headquartered in Vancouver, British Columbia, that produces research about government actions in areas such as taxation, health care, aboriginal issues, education, economic freedom, energy, natural resources, and the environment.

Government Spending scores are designed to reflect the size of the government. Each score is calculated based on the following indicators: general consumption expenditures by government as a percentage of income, transfers and subsidies as a percentage of income, and insurance and retirement payments as a percentage of income. Taxation scores are aimed at assessing the tax burden. The score is calculated based on income and payroll tax revenue as a percentage of income, top marginal income tax rate and the income threshold at which it applies, property tax and other taxes as a percentage of income, and sales taxes as a percentage of income. Labor Market Freedom scores are based on minimum wage legislation, government employment as a percentage of total state/provincial employment, and union density. For each score, states/provinces in the U.S., Canada, and Mexico are included in the analysis, and are awarded points on a scale of 0–10. For details, see https://www.fraserinstitute.org/studies/economic-freedom-of-north-america-2017.

2.2.19 Economic freedom index by MSA

This index comes from a 2013 article by Dean Stansel in the Journal of Regional Analysis and Policy. To calculate each score, Stansel used the model of the 2011 Economic Freedom of North America (EFNA) Index by the Fraser Institute. As in the EFNA scoring system, points are awarded to MSAs on a scale of 0–10.

While EFNA reports scores only for states/provinces, Stansel’s study uses the model to assess economic freedom at a more granular level. In contrast to EFNA, this was a one-time study and it only includes areas within the United States. While most of the scores in the article are reported by Metropolitan Statistical Area, some are reported by metropolitan statistical division instead. We used the data for MSAs but, when information was available only for metropolitan statistical divisions, we selected the divisions where the central city of the relevant MSA was located. For details, see http://www.jrap-journal.org/pastvolumes/2010/v43/index431.html.

2.2.20 Rate of new entrepreneurs, opportunity share of new entrepreneurs, and startup density

The data for these three variables come from the 2016 Kauffman Index for Startup Activity produced by the Kauffman Foundation. The Kauffman Foundation focuses its work on education and entrepreneurship.

The rate of new entrepreneurs measures the share of the adult population that became entrepreneurs in a given month. The opportunity share of new entrepreneurs measures the share of new entrepreneurs who were not unemployed or in school prior to becoming entrepreneurs. The startup density measures the number of startups per 1,000 firms, where startups are defined as businesses less than 1 year old that employ at least one person besides the owner. For details, see https://www.kauffman.org/kauffman-index.

2.2.21 Number of days with pleasant temperatures per year

For this metric, we used the Global Surface Summary of the Day (GSOD) database of the U.S. National Oceanic and Atmospheric Administration, accessing it through NCEI Climate Data Online Data Search. The days were counted as pleasant if the daily mean temperature was between 59 and 77 F, the maximum temperature did not exceed 85 F, and the minimum temperature did not fall below 45 F. We looked at the time period from 1/1/2007 to 12/31/2017 or smaller periods if data from a single station were unavailable. We then divided the total number of pleasant days by the number of years within the time period considered.
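The pleasant-day rule above translates directly into a filter over daily records. The sketch below is a minimal illustration rather than the procedure actually used: the function names and the tuple layout are hypothetical, and the temperature bounds are treated as inclusive, which the text does not specify.

```python
def is_pleasant(mean_f, max_f, min_f):
    """Apply the three temperature conditions (degrees Fahrenheit).

    Assumption: all bounds are inclusive.
    """
    return 59 <= mean_f <= 77 and max_f <= 85 and min_f >= 45


def pleasant_days_per_year(daily_records, n_years):
    """Average number of pleasant days per year.

    daily_records: iterable of (mean_f, max_f, min_f) tuples, one per day.
    n_years: number of years covered by the records.
    """
    total = sum(1 for rec in daily_records if is_pleasant(*rec))
    return total / n_years
```

When a station covers less than the full 2007 to 2017 window, `n_years` would be the length of the shorter period actually observed.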

The values recorded by weather stations in or close to the central cities of relevant MSAs were chosen for analysis. Most of the stations chosen were located in international airports to maximize completeness and reliability of data. For details, see https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd.

2.2.22 Republican and Democrat votes in the 2016 presidential election

This metric is based on the final official 2016 presidential election results reported by the state and county authorities (e.g., boards of elections, county clerks) on their websites. The data are reported by county. For each MSA, we only include the county where the central city is located. We calculated vote percentages by dividing the counts of Republican and Democrat votes by the total number of votes cast. In cases where the necessary data were not easily accessible on a government entity website, we used information from the NPR Election 2016 Results special series.
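The vote-share calculation can be sketched as follows. The function name and the counts in the usage note are hypothetical and are not taken from the election data.

```python
def vote_shares(republican_votes, democrat_votes, total_votes):
    """Return (Republican share, Democrat share) of all votes cast in a county."""
    return republican_votes / total_votes, democrat_votes / total_votes
```

For a hypothetical county with 400 Republican votes, 500 Democrat votes, and 1,000 total votes, the shares are 0.4 and 0.5. Because the denominator is all votes cast, the two shares need not sum to one when other candidates received votes.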

2.2.23 Well-being index

This index has been produced annually by Gallup-Sharecare since 2008, based on a survey of 175,000+ respondents. The scores and ranks are reported by MSA.

Each survey question used to calculate the index is associated with one of five elements of well-being. The five elements that Gallup-Sharecare chose to include are purpose (liking what one does, being motivated to achieve one’s goals), social (having supportive relationships, love), financial (managing one’s economic life to minimize stress and increase security), community (liking where one lives, feeling safe and proud of one’s community), and physical condition. Gallup categorizes the respondents as thriving, struggling, or suffering for each of the five elements. For details, see https://wellbeingindex.sharecare.com/.

2.2.24 Number of major professional sports league teams

This metric reflects the number of professional football, basketball, baseball, and hockey teams headquartered in each MSA included in the analysis that belong to NFL, NBA, MLB, and NHL, respectively. We found the number of teams by consulting NFL, NBA, MLB, and NHL websites.

2.2.25 Data estimates for Toronto

Since Toronto is not included in most data, we made manual estimates. We describe our methodology for the estimates below.

Average hours of sunshine per year. The measure is reported as “Total Hours of Bright Sunshine.” It is calculated by the Government of Canada using 1981–2010 station data from Toronto. This source was used because Toronto is missing from the World Meteorological Organization Standard Normals dataset for hours of sunshine. Both sources were last updated in 2010.

Number of good air quality days per year. This variable counts the number of “low risk” days (values of 1–3) on the Ontario Ministry of the Environment and Climate Change’s Air Quality Health Index as reported at the Toronto Downtown station. All values are from 2017, as are the EPA data.

Rate of population growth. This measure reflects the percentage change in population from 2011 to 2016. The period differs slightly from that used for U.S. MSAs (2010–2017) because Statistics Canada’s population counts by census subdivision exist only for census years (2016, 2011, etc.). The quarterly data that could be used to compute a 2010–2017 growth rate are available only at the country, territory, and province level.

Share of foreign-born population. Data on foreign-born individuals come from Statistics Canada’s 2016 Census, which reports the total number of immigrants in Toronto who were born outside of Canada. Toronto’s population is also from Statistics Canada’s 2016 Census.

Share of employees in arts, entertainment, and culture. This measure is calculated by dividing the number of employees in arts, entertainment, and culture by the total number of employees in Toronto. The former value comes from Statistics Canada’s 2016 Census, which counts the number of workers in “Occupations in art, culture, recreation and sport” under its “National Occupational Classification.” The total number of employees in Toronto comes from Statistics Canada’s count of the “Total labor force population aged 15 years and over.”

Median housing costs per month. This value is the “median monthly shelter costs for rented dwellings” as reported in dollars in Statistics Canada’s 2016 Census. It is for Toronto only. Rented dwellings were used because combined data for rented and owned dwellings did not exist.

Share of population who speak only English at home. This percentage is calculated by dividing the number of individuals with knowledge of only English in Toronto by the total population of Toronto excluding institutional residents (as data on their knowledge of languages were not collected). Data are from Statistics Canada’s 2016 Census.

Cite this article

Anderson, P.L. Business strategy and firm location decisions: testing traditional and modern methods. Bus Econ 54, 35–60 (2019). https://doi.org/10.1057/s11369-018-00111-6
