More attention should be dedicated to the intra-urban localisation decisions of technological startups. While the general trend of innovative companies being attracted to metropolitan areas is well known and thoroughly researched, much less is understood about the micro-geographical patterns emerging within cities. Given the growing number of papers reporting that agglomeration externalities attenuate sharply with distance, an analysis of micro-scale localisation patterns is crucial for understanding whether these effects matter for technological startups. Using a sample of startups from Warsaw, Poland, an up-and-coming market in Central-East Europe, their spatial organisation will be tracked across the years to investigate whether there is a defined pattern consistent with highly localised externalities operating within cities and how this pattern evolves over time. Additionally, the paper will show how recurrent neural networks may help predict the locations of technological startup clusters. It will demonstrate how to include the spatial dimension in the model in a computationally effective way and how this augmentation improves the results by allowing the network to “understand” the spatial relations between neighbouring observations.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
Technological startups are now increasingly locating in cities (Duvivier and Polèse 2018). A growing literature on the geography of startup locations shows that diverse urban environments are becoming increasingly attractive for innovative businesses (Arauzo-Carod 2021). The idea of urban attractiveness dates back to the work of Jacobs (1960, 1969), as well as the Marshall-Arrow-Romer (MAR) framework (Arrow 1962; Marshall 1920; Romer 1986), which stands today as the classic approach to understanding business agglomeration. However, little is known about the specifics of this process for technological startups. The literature suggests that agglomerative economies are the most important drivers of this towards-city switch (Jang et al. 2017; van Oort and Atzema 2004). Yet more recent papers indicate that the positive effects of agglomeration attenuate strongly with distance, and their most significant influence is experienced only at fine geographical scales (Andersson et al. 2016; Ferretti et al. 2022; Jang et al. 2017; Rammer et al. 2019). This may mean that companies seeking to utilise these effects need to co-localise, creating dense intra-urban clusters of business activity.
The identification of intra-urban startup localisation patterns is important from at least two perspectives. On the one hand, it allows one to revisit the question of agglomerative economies’ importance for innovative businesses, tracing the possible influence of such effects at the micro-scale, which is rare in the literature. On the other hand, understanding the forces driving startup localisation decisions can provide a strong argument for policy-making, by showing where new startups tend to emerge, which may be used as an indicator for city planning. This brings us to the central question of this article. Do we see startup localisation patterns consistent with highly localised externalities operating at fine spatial scales within cities?
In order to investigate highly localised externalities for technological startups, their localisation decisions need to be tracked at a microgeographic scale, no larger than a city neighbourhood. Ideally, startup localisation decisions should be observed at their exact georeferenced addresses and compared over time with newly created companies to track the stability of the spatial pattern. Such analysis can be successfully conducted using machine learning (ML) methods on point data on startup locations. Techniques such as DBSCAN or recurrent neural networks (RNN) can be effectively utilised to identify dense clusters of startup activity and to track and recognise nonlinear, nuanced patterns of business formation across space and time at a micro-geographic scale. Moreover, the use of deep learning tools like RNNs enables making predictions about the spatial patterns of business location and forecasting future hotspots of startup formation within cities—a goal not so easily achieved with spatial econometric models. Such forecasts can be highly relevant for creating targeted startup support policies or land-use agendas for city planning.
The contribution of this paper is threefold. First, it adds to the literature on the attenuation of agglomeration externalities by verifying whether the spatiotemporal startup localisation patterns are consistent with the small-scale of impact of these effects. Second, methodological contributions are made, showing how modern ML techniques can be used in micro-geographical analyses, extending the possibility to quantify co-localisation tendencies and predict spatiotemporal trends. Starting with an in-depth introduction to neural networks in Sect. 3.3 and moving on to the applications discussed in Sect. 4.3, it is shown how deep learning techniques can be used to enrich analyses in regional science. Finally, the empirical analysis is conducted on a sample of technological startups in Warsaw, Poland, expanding the knowledge of startup activity in emerging markets of Central-East Europe.
2 Motivation and related literature
Despite growing globalisation forces and declining communication costs, today’s knowledge-driven economy continuously benefits from agglomeration externalities and clustering (de Groot et al. 2009). These effects are particularly important for innovative industries for which knowledge spillovers and information flows—forces hugely reliant on proximity—are key to developing ideas and creating innovative products (Jang et al. 2017; Rammer et al. 2019).
The importance of agglomerative economies may be observed in the location decisions of technological startups. The literature reports that there is a growing number of companies that choose to locate in urban spaces rather than suburban areas (e.g. Arauzo-Carod 2021; Duvivier and Polèse 2018). The shift in location trends has been noticed by the popular press, popularising terms like “Silicon Alley” for the new technological cluster in Manhattan and “Silicon Roundabout” for a similar structure created in London (Duvivier and Polèse 2018). While the general towards-city tendency has been well-recognised, little is still known about the intra-urban startup location patterns.
The concentration of economic activity (both in industrial clusters and in metropolitan areas) has positive effects for nearby actors. These agglomeration externalities have been intensively researched in the literature, investigating how they may impact the companies located within their range (e.g. Devereux et al. 2007; Jang et al. 2017; Neffke et al. 2010). However, a growing body of literature indicates that the spatial range of agglomeration externalities is smaller than previously expected, forcing a switch in analytical approach from the metropolitan to the intra-urban level. Among the micro-foundations of urban agglomeration economies, as defined by Duranton and Puga (2004), the most limited range of influence is related to the learning effects. The literature suggests that the most effective impact of these effects may be restricted to areas as small as city neighbourhoods (Andersson et al. 2016; Ferretti et al. 2022), or even lesser range of 50–250 m (Rammer et al. 2019).
Learning effects are particularly important for technological companies. Stimulating knowledge flows and information sharing between creative individuals increases productivity and raises the chances of creating an innovative product (Boschma 2005; de Groot et al. 2009; Isaksen 2004; Neffke et al. 2010). While it may seem that the rising popularity of remote communication techniques should diminish the role of physical proximity, especially for tech-oriented companies, the literature proves otherwise. As Isaksen (2004) has shown, the knowledge specific to technological companies is tacit and complex, and face-to-face interactions are needed to share it effectively. This may induce these companies to co-locate, creating clusters of innovative activity at the intra-urban level.
Despite the expected positive effect on productivity and innovativeness, localising within a cluster may be a hazardous decision for a startup. Dense co-location with similar companies creates supply and demand pressures, as the companies compete for production factors, skilled employees, investors, and customers. Moreover, especially in urban spaces, the greater presence of companies in an area generates upward pressure on office rental costs, creating an additional financial burden for startups (Huynh 2014; Jennen and Brounen 2009). When these negative forces prevail, startups may be driven to exit the market prematurely. Thus, anticipating the negative consequences of concentration, some technological startups may choose to locate outside of clusters, opting for remote city areas. For companies that do take this risk, however, the unique quality of human capital available in the densest and most central locations appears to outweigh all the other possible disadvantages of congestion.
The previously mentioned reasons were purely rational considerations. However, as has been repeatedly shown, no human being is a perfect homo oeconomicus, and our decisions are often dictated by less rational factors. In this context, it is important to mention the imitation mechanism, which can induce new companies to locate near their founders’ university or technical school, or near the company from which they (possibly) spun off (Berg 2010; Golman and Klepper 2016). Such a choice is usually dictated by an attempt to follow in the footsteps of active players in the market or by locating in areas known for their high human capital and access to skilled workers. Regardless of whether these factors are rational or not, co-location patterns are what we can expect from emerging technological startups.
While cities as a whole offer a wide range of opportunities and amenities, neighbourhoods themselves are much more limited in what they have to offer and therefore vary in terms of available services, access to amenities or public transport, and quality of life. The importance of a city's neighbourhood, after all much smaller than a district, has been highlighted in concepts such as the 15-minute city (Moreno et al. 2021). The idea of walking distance to all essential services has been widely popularised and is now the gold standard for vibrant, good neighbourhoods. This concept emphasises the importance of the micro-geographical characteristics of inner-city areas. Yet what is commonly understood by urban and social planners somehow escapes the general discourse of regional researchers. There is a tendency to focus on the overall spatial pattern and thus overlook micro-scale processes. The reason for this is unclear; there really should be no doubt that the same person who chooses their residential address based on micro-geographical factors will also think at a similar scale when choosing the location of their office.
Interestingly, the 15-minute city radius and the ranges of highly localised agglomeration externalities found in the aforementioned works (Andersson et al. 2016; Ferretti et al. 2022; Rammer et al. 2019) seem to overlap. After all, what is a city neighbourhood or a 50–250 m radius if not the distance that office workers are willing to travel when looking for a lunch spot or a good café? This intuition is supported by the fact that the most spatially constrained element of agglomeration effects is the learning mechanism: learning that, in the case of technology companies, must take place in face-to-face interaction (Isaksen 2004), perhaps over coffee or a business lunch. People’s comfort has positive effects for the companies they start(up) or where they are employed. The micro-geographical effects determining how people perceive their neighbourhood will be evident in the choices their businesses make. These insights provide an even stronger rationale for micro-geographical research on the location of startups.
Acknowledging the importance of the agglomeration externalities for innovation creation and their sharp attenuation with distance, I expect that technological startups will co-locate and create intra-urban clusters of innovative activity. If such a pattern can be observed at the micro-geographic level, one might suspect that technological startups utilise the highly localised externalities operating at fine spatial scales within cities. Simultaneously, it may be that some startups choose remote parts of the city, localising further away from the areas of business concentration. These contradictory trends of concentration and dispersion in city space leave us with an open-ended question: which of these patterns will be identified and which will prove to be relatively stronger for technological startups? Answering this question will be the main objective of the empirical part of this study.
3 Study design and dataset
3.1 Sample
Although there have been many attempts in the literature to standardise the definition of “startup” (e.g. Nauman and Edison 2010; Reisdorfer-Leite et al. 2020), there is still no consensus about the description of this term. In their mapping study, Paternoster et al. (2014) identified several themes which dominate the literature on technological startups. According to the papers they screened, startups are usually defined as new companies that lack resources, have a small, inexperienced team of employees, develop mainly one innovative product and evolve rapidly in a very uncertain environment (Paternoster et al. 2014). However, depending on the focus of a particular study, different factors were taken into consideration when deciding whether a company is a startup or not. Thus, this study adopts the following definition of a technological startup—it is a newly founded business entity (operating for up to 5 years) whose main specialisation is in the domain of computer technology and software.
Specifically, the total sample consists of 11,100 business entities established between 2010 and 2018 in the capital city of Warsaw, Poland. The period of analysis was chosen to fit between the global financial crisis and the COVID shock, providing a long window of stable spatial pattern development. The companies were selected based on their declared specialisation. The firms included in the study specialise in manufacturing computers and their peripherals, producing electronic and optical products, publishing computer games, computer programming, software-related consultancy, or other information technology and computer service activities. Company information was obtained from the ORBIS database, and address information was geocoded using the MapQuest API (MapQuest 2018).
The choice of city for this analysis was driven by several factors. Firstly, while there is an abundance of research on innovative businesses in developed economies such as the USA, the United Kingdom or Germany (e.g. Banal-Estañol et al. 2019; Florida and Mellander 2017; Geibel and Manickam 2016; Pisoni and Onetti 2018), a very limited number of papers have been devoted to the emerging markets of Central-East Europe. Secondly, the Warsaw startup community is growing rapidly and attracting larger sources of funding from both domestic and foreign capital. After much anticipation, the first “unicorn”, a startup estimated to be worth more than 1 billion USD, was announced in 2021 (Bełcik 2021). The presence of “unicorns” is a signal to investors that the market is highly attractive and profitable (Suwarni et al. 2020). Thirdly, Warsaw, with a population of 1,794,200 and an area of 517.24 km² (Urząd Statystyczny w Warszawie, 2021), is one of Europe’s metropolises. Being a highly heterogeneous city, with areas of business concentration and less populated peripheries, it creates an interesting case study for different location opportunities and the consequences of such choices for startups. Fourthly, Warsaw has special circumstances of its own, which make it a great area for location studies. Unlike most American cities, Warsaw does not have strict plans that dictate where and where not to locate a business. This makes business location truly dictated by the factors unique to startup founders. Additionally, Warsaw’s urban organisation has gone through dramatic changes over the last century. After being almost completely destroyed in 1944 and rebuilt by the communist government, Warsaw’s urban planning has undergone a transformation over the past decades that is unusual among cities in Europe and beyond.
Moving forward in time, a new, modern Warsaw with a vibrant business centre has only emerged in the last few years, after the financial crisis and even more so after the Brexit shock (Fig. 1). Inspiring instances of regeneration and rapid construction of new high-rise buildings continue, creating new attractive business areas that attract companies and change the fabric of the city as a whole. These specific conditions of Warsaw make it an even more interesting case for spatial research, showing how forces previously studied for already developed markets operate in a dynamic, almost unlimited environment. As a result, the conclusions drawn for Warsaw can be successfully applied to other cities around the world.
Fig. 1
Towarowa Street in Warsaw in 2011 vs in 2021—a previously empty area turned into part of the CBD, containing e.g. Google and Meta offices.
Following the call for analysis of entrepreneurial activity at a “very low level of aggregation” suggested by Guzman and Stern (2016), the localisation choices of technological startups will be investigated here from the micro-geographic perspective. Individual localisation points will be considered, created by geocoding the business addresses with an accuracy of 0.1 m; such precision of geocoding allows for accurate inference based on the methods used later. The highest aggregation level that will be used is a 1 km × 1 km grid, the size of which is dictated by the availability of a population grid derived from the census (Portal Geostatystyczny 2021).
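The aggregation step can be sketched as follows: geocoded points (here with hypothetical planar coordinates in metres) are binned into 1 km × 1 km cells by integer division of their coordinates. This is a minimal stand-in; the actual grid follows the census grid referenced above.

```python
from collections import Counter

def grid_counts(points, cell_size=1000.0):
    """Count points per grid cell; points are (x, y) in metres."""
    counts = Counter()
    for x, y in points:
        # Integer division maps a coordinate to its grid-cell index
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

# Toy example: three startups in one cell, one in a neighbouring cell
pts = [(120.0, 340.0), (450.0, 900.0), (980.0, 10.0), (1500.0, 200.0)]
counts = grid_counts(pts)
print(counts[(0, 0)])  # → 3
print(counts[(1, 0)])  # → 1
```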
3.2 Methods
The empirical strategy of this paper is organised in three stages: recognition of startup concentration in urban space, identification of clusters based on the density of startups at the single-company level, and prediction of future cluster localisations with 1 km precision. The main objective, which is to verify whether localisation patterns consistent with sharply attenuating agglomeration externalities can be observed, will be achieved with a mixture of ML and deep learning methods (Fig. 2).
Fig. 2
Structure of the research methods, data formats and expected results.
Source Own work in diagrams.net software
3.2.1 Preliminary analysis of the sample and its spatial context
To build the reader’s intuition about the considered situation, the results of the visual exploratory data analysis will be presented first. Figure 3A presents the total sample of startups and the administrative borders of Warsaw against the background map published by the OpenStreetMap association (OpenStreetMap 2022). The city is divided into 18 districts, which differ in urban organisation and business concentration. This difference is well visualised by the raw presentation of the sample plotted on the city map (Fig. 3A). It can be seen that there are some areas commonly chosen for startup localisation (mainly downtown), while other neighbourhoods are less preferred and probably economically disadvantaged, especially on the eastern and southern side. These districts were only incorporated into the administrative boundaries of Warsaw in 2002 and are now on a convergence path to the city centre. There is also a visible division of the city created by the Vistula River. This division stems from path dependency in Warsaw’s past development: the presence of the Old Town and municipal government in the west vs poorer residential areas in the east. There is a higher density of innovative companies on the west side of the city, while the east side has a much lower concentration. Despite this lower startup density, the eastern side has shown increasing potential to attract innovative companies in recent years. Looking at the grid representation of the sample in Fig. 3B, it can be seen that the distribution of companies is uneven in space, with a much higher concentration of startups in the city centre and the south-west than in the remaining area.
Fig. 3
Administrative boundaries of Warsaw and distribution of the total sample of startups founded between 2010 and 2018. Details: A shows the administrative boundaries of Warsaw, including the division into districts, and the point distribution of the total sample (all startups established between 2010 and 2018 in Warsaw). The saturation of an individual point is set to 50%. The more points located in an area, the more intensive the overall colour will be. B shows the distribution of companies from the total sample, captured as counts of founded companies per grid cell. The grid distribution follows the structure of the 1 km x 1 km census grid from the INSPIRE project, which is also utilised in Sect. 4.3.
Source Own work in R, utilising the OpenStreetMap background map for Warsaw, Poland
3.2.2 First stage—concentration tracking with kernel density estimation
In the first stage, areas of concentration of startup activity in the city space will be recognised. This will be achieved by applying a two-dimensional kernel density estimation with 25 bins on geocoded business locations (Fig. 2). This method recognises the concentration of points on a two-dimensional plane (Wand and Jones 1994). This means that it can be effectively utilised to identify hotspots of startup activity, identifying the city areas that attract the most companies and form the most concentrated business clusters. Applied to the total sample, this method can show the overall pattern of startup concentration in Warsaw, identifying the areas which are relatively the most attractive for technological companies. By repeating the analysis on annual subsamples of newly created companies, it is possible to discern potential changes in hotspot locations.
Technically, kernel density smoothing is a nonparametric estimation of a density function in which points are assigned relative importance depending on the number of points around them (Wand and Jones 1994). The results obtained for individual points are then smoothed to produce a density function estimate (Wand and Jones 1994). In the literature, this method has been successfully applied in cases of hotspot identification and multimodality recognition (see e.g. Hart and Zandbergen 2014; Hu et al. 2018; Lin et al. 2011; Silverman 1981; Wand and Jones 1994).
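The mechanics can be illustrated with a minimal sketch assuming a plain Gaussian kernel and toy coordinates (the actual analysis uses 25 bins and a tuned bandwidth): each point contributes a Gaussian bump, and the summed, normalised surface reveals the hotspot.

```python
import numpy as np

def kde2d(points, xs, ys, bandwidth=1.0):
    """Evaluate a 2-D Gaussian kernel density estimate on a grid."""
    gx, gy = np.meshgrid(xs, ys)              # evaluation grid
    density = np.zeros_like(gx, dtype=float)
    for px, py in points:                     # one Gaussian bump per point
        sq_dist = (gx - px) ** 2 + (gy - py) ** 2
        density += np.exp(-sq_dist / (2 * bandwidth ** 2))
    # Normalise so the estimate integrates (approximately) to one
    return density / (2 * np.pi * bandwidth ** 2 * len(points))

# Toy sample: a tight grouping near (2, 2) and one outlier
points = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1), (8.0, 8.0)]
xs = ys = np.linspace(0, 10, 101)
d = kde2d(points, xs, ys, bandwidth=0.5)
iy, ix = np.unravel_index(np.argmax(d), d.shape)
print(round(xs[ix], 1), round(ys[iy], 1))  # hotspot lies near (2.0, 2.0)
```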
3.2.3 Second stage—cluster identification with DBSCAN
While results from the previous stage can show general trends in the intra-urban startup location pattern, especially from the concentration perspective, the next stage of the analysis will focus on recognising and quantifying clusters of startup activity (Fig. 2). In general, it can be said that for a cluster to form, a big enough number of companies must locate closely next to each other. This working definition points directly to the fact that clusters depend on the density of business locations. Automatic identification of such density-based clusters can be carried out with the DBSCAN method, for which “big enough” and “closely” are just parameters to be tuned.
Specifically, DBSCAN is a density-based clustering method that can identify clusters of arbitrary shapes (Ester et al. 1996). The algorithm attempts to find a concentration of a certain minimum number of points (“big enough” parameter: minPts), which are located within a particular distance from each other (“closely” parameter: eps, interpreted as reachability distance radius). If such a concentration is found, a cluster is recognised (Ester et al. 1996; Schubert et al. 2017).
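The algorithm can be sketched in pure Python on toy points, with eps and minPts playing exactly the roles described above. This is a didactic sketch of the density-based expansion logic, not the tuned implementation used in the study.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per point (-1 = noise)."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)       # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:         # not a core point: mark as noise
            labels[i] = -1
            continue
        cluster += 1                    # start a new cluster at a core point
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                    # expand the cluster from its seeds
            j = seeds.pop()
            if labels[j] == -1:         # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_nbrs)
    return labels

# Toy data: one dense grouping and one isolated point
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, -1]
```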
The results from this method will show how many clusters were created by technological startups, how large they were, where exactly they were located, and, finally, how many companies were not assigned to any groupings. These results will shed light on the co-location tendencies among startups, while comparing the relative strength of concentration and dispersion trends in the urban space.
3.2.4 Third stage—recurrent neural networks
Once the existing trends and patterns of startup localisation have been recognised, one may wish to look ahead and try to predict the areas with the highest startup attraction potential in coming years at an intra-urban level. Such knowledge can be particularly beneficial from a policy-making perspective and can have a significant impact on urban planning.
To achieve this goal, a model will be built that is able to predict whether a city area will be part of a startup cluster (Fig. 2). This part will use aggregated data, where the total city area is divided into 1 km² grid cells. The model will take information about the attractiveness history of an area (considered as a 3-year sequence of the share of startups located in a given grid cell in consecutive periods) and predict whether a cluster will be formed there in the upcoming period (the exact cluster localisations are taken from the DBSCAN results).
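The construction of such training samples can be sketched as follows, with hypothetical per-cell shares and, for simplicity, the next-period share as target (the paper's actual target is a binary cluster indicator derived from DBSCAN).

```python
def make_sequences(shares, seq_len=3):
    """Turn a per-cell time series into (input sequence, next value) pairs."""
    samples = []
    for t in range(len(shares) - seq_len):
        samples.append((shares[t:t + seq_len], shares[t + seq_len]))
    return samples

# Hypothetical share of new startups locating in one grid cell, 2010-2018
cell_shares = [0.01, 0.02, 0.02, 0.03, 0.05, 0.04, 0.06, 0.07, 0.08]
pairs = make_sequences(cell_shares, seq_len=3)
print(len(pairs))   # → 6
print(pairs[0])     # → ([0.01, 0.02, 0.02], 0.03)
```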
To find such connections, the model used here must be capable of learning (fuzzy) patterns within the dataset and providing accurate predictions—a problem that neural networks can effectively solve. In this case, a recurrent neural network (RNN) will be the most suitable, as it allows for processing time series and text data, discovering patterns hidden in data sequences. It is an extension of the standard neural network, allowing the network to retain a “memory” of previously processed inputs (Medsker and Jain 1999). In recurrent neural networks, the same weights are recursively applied to structured, sequence-sensitive inputs, combining previous time steps with information drawn from the next input (Medsker and Jain 1999). RNNs have been successfully utilised in numerous use cases, including short panel data (Fan et al. 2017; Gu et al. 2019; Zhang and Man 1998).
Utilising RNN in this paper’s scenario will allow the model to learn information stemming from the attractiveness sequence, remembering how the startup concentration has evolved over time, and connect it to the probability of cluster formation. Adding spatial lag as a second variable will allow the model to further account for spatial relations within the sample. The results from this step will provide information about the stability of concentration patterns and provide an accurate prediction of future cluster localisations with 1 km precision.
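The spatial lag mentioned above, understood here as the mean value over a cell's immediate (rook) neighbours, can be sketched as follows; the exact lag specification used in the paper may differ.

```python
import numpy as np

def rook_spatial_lag(grid):
    """Spatial lag: mean of the rook (up/down/left/right) neighbours."""
    padded = np.pad(grid.astype(float), 1, constant_values=np.nan)
    stacked = np.stack([padded[:-2, 1:-1],   # value of the cell above
                        padded[2:, 1:-1],    # below
                        padded[1:-1, :-2],   # left
                        padded[1:-1, 2:]])   # right
    # nanmean ignores the padded NaNs at the grid's edges
    return np.nanmean(stacked, axis=0)

shares = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
lag = rook_spatial_lag(shares)
print(lag[1, 1])  # → 1.0  (all four neighbours equal 1)
```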
3.3 Neural networks—a method for modelling nonlinear data patterns
Before moving on to the empirical results, it would be useful to provide some basic description of how neural networks work and why they can be a beneficial tool for regionalists. Therefore, this section will outline the most important concepts related to neural networks (NNs).
When describing how a neural network works, it is important to start with the very core of this model: the single neuron, developed in 1943 from the inspiration of the brain's neuronal cells (McCulloch and Pitts 1943). A neuron, in mathematical terms, is a function that takes an input, transforms it with an appropriate weight, applies an (activation) function to the resulting output, and then passes this output on. The goal of this transformation is to match the final output of the network to the true labels of the input data (like the y-variable in OLS). A single neuron (a so-called perceptron) forms the most basic structure of a neural network model (Fig. 4). Note that by setting the activation function g as the sigmoid (logistic) function \(\sigma \left(z\right)= \frac{1}{1+{e}^{-z}}\), the model is equivalent to a logistic regression (Lee and Almond 2003).
Fig. 4
Possible structures of neural networks—a single neuron, b NN with one layer, and c NN with two layers.
Source Own work in diagrams.net software
The output of a single neuron is represented by the following equation:
\(y=g\left({w}_{0}+\sum _{i=1}^{d}{w}_{i}{x}_{i}\right)\)
where g is the activation function, w0 is a constant, d is the number of input variables, wi is the weight applied to a given input, and xi is the value of a given input. The neuron takes d input variables \({x}_{i}\), which are transformed with weights \({w}_{i}\) to match the learned pattern. The weights are calibrated during model training. The weight \({w}_{0}\), or constant, is added to account for possible additional influences on the data that are not explained directly by the inputs. The activation function g is applied to the result of input weighting. Various forms of activation function are used in practice, e.g. binary, linear, sigmoid (logistic), tanh (hyperbolic tangent), softmax, ReLU, etc. (Sharma et al. 2017). The form of the activation function for each network layer is set according to previous use cases of similar network structures in similar contexts. The activation functions can also be treated as hyperparameters and tuned, together with the number of neurons, in the final fine-tuning of the model.
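The neuron's transformation, with the symbols defined above (inputs xi, weights wi, constant w0, activation g), can be sketched in a few lines; the weights here are illustrative, not trained.

```python
import math

def neuron(x, w, w0, g):
    """Single neuron: weighted sum of inputs plus constant, then activation."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return g(z)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# With a sigmoid activation the neuron is a logistic-regression unit
print(neuron([0.0, 0.0], [0.5, -0.3], 0.0, sigmoid))           # → 0.5
print(round(neuron([1.0, 2.0], [0.5, -0.3], 0.2, sigmoid), 3))  # → 0.525
```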
Since a single neuron only recognises linear patterns in the data, neurons can be combined to form layers (Fig. 4). A layer in a neural network can be understood as running a set of simultaneous regressions, the individual results of which are then combined (flattened) to provide the final output of the network. In more complex settings, where patterns in the data are more nuanced, it is common to stack several layers together, creating a deep neural network (Fig. 4C). Usually the layers are fully connected, meaning that the outputs of each neuron in the previous layer are passed as inputs to each neuron in the next layer. This mechanism allows the model to capture complex nonlinear patterns in the data with maximum efficiency.
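A fully connected layer, and the stacking of two such layers, can be sketched as follows (weights are illustrative, not trained):

```python
def dense(x, W, b, g):
    """Fully connected layer: every output neuron sees every input."""
    return [g(bj + sum(wij * xi for wij, xi in zip(row, x)))
            for row, bj in zip(W, b)]

relu = lambda z: max(0.0, z)
identity = lambda z: z

# Two inputs -> hidden layer of two neurons -> one output neuron
x = [1.0, 1.0]
hidden = dense(x, W=[[1.0, 0.5], [-1.0, 1.0]], b=[0.0, 0.0], g=relu)
out = dense(hidden, W=[[1.0, 1.0]], b=[0.5], g=identity)
print(hidden)  # → [1.5, 0.0]
print(out)     # → [2.0]
```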
A neural network “learns” the patterns in the data by shifting its weights (similar to fitting coefficients in OLS). Weight shifting is an iterative process in which an error measure (loss) is reported at each stage. The error measure provides feedback to the model about how distant its predictions are from the true data labels (or from the true values of the dependent variable). Error measures are matched to the characteristics of the data: for predictions of continuous values, the root-mean-squared error (RMSE, commonly used in OLS evaluation) can be used, while for predictions of categorical variables, an entropy-based measure such as binary cross-entropy is usually used.
The pace at which the model reshuffles its weights is dictated by the learning rate parameter. The higher the learning rate, the greater the changes in weights with each iteration. With each learning iteration (epoch), the learning rate should decrease, making the model more cautious about future changes in the weights. This behaviour is ensured by learning rate annealing, a mechanism that reduces the learning rate as changes in the loss function become flatter.
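One simple annealing rule, sketched below under assumed factor, patience, and tolerance values: halve the learning rate whenever the loss has improved by less than a tolerance over the last few epochs.

```python
def anneal_on_plateau(lr, losses, factor=0.5, patience=2, tol=1e-3):
    """Halve lr when the loss improved by less than tol over `patience` epochs."""
    if len(losses) > patience and losses[-patience - 1] - losses[-1] < tol:
        return lr * factor   # loss has flattened: be more cautious
    return lr

print(anneal_on_plateau(0.1, [1.0, 0.5, 0.4999, 0.4998]))  # flattened → 0.05
print(anneal_on_plateau(0.1, [1.0, 0.5, 0.3]))             # improving → 0.1
```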
Time series data provides its own set of challenges for NN modelling. In the standard setting, the NN model assumes that each input provided is independent of the others. Under this assumption, the model attempts to understand the pattern for each input individually. For time series data, it is important to capture the connections between data points that result from their sequential nature. This challenge is solved by recurrent neurons, which do not simply pass learned information to the next layer (a feedforward mechanism), but train on the output of one time step to find a connection to the next. By separating the information from the sequence and examining the pattern at each time step, the network is able to "retain memory" of previous events while considering subsequent ones. Variations of this “memory” mechanism are used in recurrent neural networks (RNN), long short-term memory networks (LSTM), and many others. The use of recurrent neurons allows the model to learn temporal patterns even on short time series. This ability, combined with the recognition of nonlinear data patterns, can make RNNs more effective in time series forecasting than “traditional” econometric approaches.
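The recurrence can be sketched with a single scalar recurrent neuron: the same weights are applied at every time step, and the hidden state h carries the “memory” of earlier elements forward. The weights below are illustrative, not trained; note that reordering the sequence changes the output, which a feedforward neuron averaging its inputs could not detect.

```python
import math

def rnn_forward(sequence, w_in, w_rec, b):
    """Scalar recurrent neuron: hidden state h carries sequence 'memory'."""
    h = 0.0
    for x in sequence:
        # The same weights are reused at every time step
        h = math.tanh(w_in * x + w_rec * h + b)
    return h

# Same weights, same values, different orderings: different outputs
a = rnn_forward([0.1, 0.5, 0.9], w_in=1.0, w_rec=0.5, b=0.0)
b = rnn_forward([0.9, 0.5, 0.1], w_in=1.0, w_rec=0.5, b=0.0)
print(round(a, 4), round(b, 4))
```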
Neural networks, like many other ML techniques, are often described as black-box models. This means that it is not possible to easily inspect their final structure and understand the (fuzzy) relationships that have been discovered in the data. Compared to models such as OLS—the best-known white box—black-box models do not readily explain the phenomenon they model. This situation is slowly changing with the emergence of new developments in explainable artificial intelligence (XAI), but there is still much room for improvement in this area. Thus, ML techniques are not competitors to econometric models; they are simply another statistical approach that focuses more on predictive power than on explanatory power. Keeping in mind this divide—prediction versus interpretability, linear relationships versus fuzzy patterns—one can successfully draw from both worlds and improve one's research.
Accounting for the spatial dimension in ML or NNs is a growing area of research that is still quite underdeveloped. An excellent overview of the current state of the art in spatial machine learning can be found in Kopczewska (2022). Fitting into the same theme, this paper aims to show the usefulness of NNs in regional science and presents a fresh perspective on machine learning methods in spatial analysis.
4 Results
4.1 Results of the hotspot analysis
The first part of the formal analysis considers startup concentration in general. It will be investigated whether any significant hotspots have emerged and whether such trends are evident in annual samples or only in the whole sample.
Figure 5 shows the result of kernel density estimation for the total sample. It appears that startups, in general, locate densely on the left side of the Vistula. There is some concentration on the eastern side of the map, but it is much smaller than that identified on the western side. One very visible hotspot can be observed in the city centre, where startup concentration is significantly higher than in other parts of the city. However, three other concentration areas in the southern part of the city can also be observed. These are less concentrated than the one in the centre, but are clearly present. The startup localisation structure in Warsaw is polycentric and not homogeneously distributed in the urban space.
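The kernel density estimation underlying such hotspot maps can be sketched in a few lines. This is a minimal pure-Python 2-D Gaussian KDE (illustrative only; the paper's maps were produced in R): each startup location contributes a Gaussian “bump”, and summing the bumps over a grid reveals where points pile up.

```python
import math

def kde2d(points, grid, bandwidth=1.0):
    """Evaluate a 2-D Gaussian kernel density estimate at grid locations."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    out = []
    for gx, gy in grid:
        s = sum(math.exp(-((gx - px) ** 2 + (gy - py) ** 2)
                         / (2 * bandwidth ** 2)) for px, py in points)
        out.append(norm * s)
    return out

# Density is far higher at the centre of a tight cluster than far away.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dens = kde2d(pts, [(0.0, 0.0), (10.0, 10.0)], bandwidth=0.5)
```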
Fig. 5
Kernel density estimation results for the whole sample of startups.
Source Own work in R, utilising the OpenStreetMap background map for Warsaw, Poland
Tracking the yearly subsamples of newly created startups, it can be observed that the concentration pattern changed over time (Fig. 6). The most visible tendency observed in the yearly subsamples is the consolidation of central hotspots. While in 2010, only some general concentration of startups on the left side of the Vistula River could be observed, in the following years, startups more intensively chose central hotspots rather than the western part of the city. Between 2016 and 2018, the importance of the central hotspot grew, until much of the startup concentration was localised in the most central, prestigious city area.
Fig. 6
Kernel density estimation results for yearly subsamples of newly created startups.
Source Own work in R
The kernel density estimation results show that technological startups tend to co-locate. The strength of the concentration trend is changing over time, but the conclusion remains constant—the localisation pattern in each year showed significant hotspots of startup concentration. In these hotspots, the companies are located densely, and their concentration is much higher than in the rest of the city. These results suggest that technological startups may follow spatial patterns consistent with highly localised externalities operating in cities. However, to assess the strength of the concentration trend, it is necessary to compare the number of co-located and dispersed companies within the city area. The DBSCAN method will be used for this purpose, and the results will be discussed in the next section.
4.2 Results of DBSCAN
DBSCAN is a method that can track even small groupings of companies formed within a target distance from each other. Such an analysis can show how many clusters are created, their sizes and how many companies locate outside of the business groupings. It allows tracking the information omitted by the previous method and verifying the relative strength of the concentration and dispersion trends.
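As a sketch of how the method works, here is a minimal pure-Python DBSCAN (illustrative only; the paper used an R implementation with the parameters reported in this section). Points with at least `min_pts` neighbours within `eps` become cluster cores, clusters grow through chains of core points, and everything unreachable stays noise:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n
    neighbours = lambda i: [j for j in range(n)
                            if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbs = neighbours(i)
        if len(nbs) < min_pts:           # not a core point
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbs = neighbours(j)
            if len(j_nbs) >= min_pts:    # expand only through core points
                queue.extend(j_nbs)
    return labels

# A dense grouping becomes one cluster; the distant outlier stays noise (-1).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Note that, unlike k-means, the number of clusters is not set in advance; it emerges from the density thresholds `eps` and `min_pts`.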
First, it will be checked how many clusters can be identified in the entire sample of startups. With the parameters eps = 0.0035 and minPts = 20, the algorithm recognised sixty clusters (Fig. 7). Their sizes varied between 9 and 4625 observations. The largest cluster (dark red points located centrally in Fig. 7) was situated in the city centre, on the western bank of the Vistula, and coincided with the hotspot identified by the previously discussed method. Despite the identification of so many groupings, a large share of companies locate outside clusters (Fig. 7). In the total sample, the method recognised 8288 such firms, which represent 74.7% of the analysed sample.
Fig. 7
Density-based clusters identified in the total sample. DBSCAN parameters: eps = 0.0035 (approx. 238 m) and minPts = 20. Clusters are ordered and coloured according to their size, e.g. cluster of order = 1 is the biggest grouping with 4625 observations, cluster of order = 60 is the smallest grouping with nine observations. Grey points represent startups that were not allocated to any cluster.
Source Own work in R
The number and size of the identified clusters differed throughout the years (Fig. 8). In 2010, of the 821 startups founded, only 30% were located in nine clusters (Table 1). The remaining companies were dispersed in the urban space, with a slight preference for locations in the western part of the city. In 2011, due to the lower number of founded companies, only two clusters were identified—one of which was located directly in the city centre. In the 2012–2016 samples, a growing number of groupings were recognised, with an increasing size and significance of the central cluster. The share of startups located outside of any grouping was decreasing—from 52.9% in 2012 to 21.2% in 2016 (Table 1). However, in 2017 and 2018, this pattern seemed to change. Fewer clusters were identified, and most of the grouped companies were located in the city centre, concentrating in much smaller areas than before. Simultaneously, the share of dispersed companies increased to the levels identified in 2014 (35.1% in 2017 and 36.4% in 2018). However, while more companies followed the dispersion trend, the concentration of the remaining businesses was stronger than ever. Clustering tendencies were much more localised, following a pattern consistent with agglomeration externalities operating at fine spatial scales.
Fig. 8
Density-based clusters identified for yearly subsamples of startups. Yearly subsamples include only technological startups which were founded in a given year, while the results of the total sample show clusters recognised for the aggregated group of all startups founded between 2010 and 2018. DBSCAN parameters: eps = 0.006 (approx. 408 m) and minPts = 10. Grey points represent startups that were not allocated to any cluster.
Source Own work in R
Table 1
DBSCAN results in detail
Source own work
| Sample | Companies in the sample | Number of clusters | Companies in clusters | Companies outside clusters | Share of companies in clusters (%) |
|---|---|---|---|---|---|
| Total sample | 11,100 | 60 | 2812 | 8288 | 25.3 |
| 2010 | 821 | 9 | 248 | 573 | 30.2 |
| 2011 | 304 | 2 | 81 | 223 | 26.6 |
| 2012 | 978 | 16 | 461 | 517 | 47.1 |
| 2013 | 1360 | 17 | 767 | 613 | 56.4 |
| 2014 | 1538 | 28 | 1002 | 536 | 65.1 |
| 2015 | 2105 | 28 | 1615 | 490 | 76.7 |
| 2016 | 2324 | 30 | 1831 | 493 | 78.8 |
| 2017 | 832 | 7 | 540 | 292 | 64.9 |
| 2018 | 838 | 6 | 534 | 304 | 63.7 |
Yearly subsamples include only technological startups which were founded in a given year, while the results of the total sample show clusters recognised for the aggregated group of all startups founded between 2010 and 2018
Some changes have been observed in the spatiotemporal localisation pattern of technological startups. Over the years, the concentration trend has strengthened. Initially, this tendency was evident in the increasing share of clustering startups, and later in the increasing density of the newly formed groupings. While the share of clustering startups fluctuates, the concentration trend remains relatively stronger than the dispersion tendency. Clusters created by startups are dense and concentrated in similar localisations—following a polycentric pattern with hotspots located on the western side of the city. Following this insight, it can be concluded that most technological startups form a spatiotemporal localisation pattern based on concentration and co-location. These two elements, binding companies together in small, distinct city areas, are consistent with highly localised agglomeration effects. Thus, it can be suspected that small-scale agglomeration externalities may be important forces influencing the intra-urban location decisions of most technological startups.
4.3 Recurrent neural networks in predicting the startup clusters
Knowledge of trends in startup localisation patterns can be very beneficial. Being able to predict future hotspots can be helpful in accurately planning urban infrastructure, locating startup support facilities or proposing local programmes promoting entrepreneurship in less preferred areas. Although the theoretical roots of intra-urban startup location decisions are not yet fully understood, it is possible to utilise deep learning methods such as NNs to “learn” the nonlinear patterns available in the data.
4.3.1 Model structure
The aim is to build a model that can predict the locations of startup clusters identified using the DBSCAN method. As the goal is to track a strongly spatially autocorrelated process, it is crucial to incorporate the spatial dimension in the model. It is common practice to use convolutional neural networks for spatial data modelling. However, this approach works best with large samples, preferably with long time series. As our data form a short panel (601 grid cells observed over 9 years), this approach was not feasible. Therefore, it was decided to use RNNs, which can follow short-term relationships between observations, and to include the spatial dimension as an additional feature.
Specifically, the model will be fed with time series data, containing information on the fraction of companies localised in a particular grid cell over the last three years.5 In the second specification, the model will be augmented with an additional spatial feature that will store information about the first-order spatial lag of the former variable (spatial average of the fraction of companies founded in a given year from adjacent cells).
Inputs for the model were created as follows. First, counts were made of the number of companies founded in each grid cell. Then, the counts were transformed to fractions—each number assigned to a cell was divided by the yearly sum of established companies. The data were then rescaled to 0–1 to keep a consistent scale across the samples, which is a standard procedure for NNs (Beck 2018). Additionally, the first-order spatial lag of these values was calculated with a contiguity-based queen spatial weight matrix. The data were then reorganised into 4-year sequences. The 3-year sequences of companies' fractions are used as model input, while the information on the presence of a cluster in period T + 1 is treated as model output.
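The feature-engineering steps just described (counts to fractions, min-max rescaling, queen-contiguity spatial lag) can be sketched in pure Python on a toy grid. This is an illustrative reconstruction, not the authors' R code; the function name and grid are hypothetical.

```python
def prepare_inputs(counts):
    """counts: 2-D grid of yearly startup counts per cell. Returns
    min-max-rescaled fractions and their queen-contiguity spatial lag."""
    total = sum(sum(row) for row in counts)
    frac = [[c / total for c in row] for row in counts]
    lo = min(min(row) for row in frac)
    hi = max(max(row) for row in frac)
    scaled = [[(f - lo) / (hi - lo) for f in row] for row in frac]

    rows, cols = len(scaled), len(scaled[0])
    lag = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # queen contiguity: all up-to-8 adjacent cells
            nbs = [scaled[r + dr][c + dc]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]
            lag[r][c] = sum(nbs) / len(nbs)   # row-standardised average
    return scaled, lag

counts = [[0, 2, 0],
          [1, 4, 1],
          [0, 2, 0]]
scaled, lag = prepare_inputs(counts)
```

Stacking such grids for three consecutive years then yields the 3-year input sequences, with the cluster indicator for year T + 1 as the label.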
A four-layer neural network structure is used, where the first three layers are fully connected RNNs and the last layer is a perceptron with a single output. The general specification is shown in Fig. 9. The RNN layers are fed with T-dimensional inputs, returning full sequences to their successors. The output is eventually flattened in the last RNN layer to fit into simpler neurons. After each RNN layer, a random dropout is applied, which controls the fraction of weights learned in the previous training iteration (epoch) that will be removed (forgotten by the network) and tuned again (Gal and Ghahramani 2016). This approach forces the model to keep adjusting its parameters, finding a solution suitable for the whole phenomenon and not just the training dataset. Particularly in RNNs, where many neurons are repeated numerous times, measures such as random dropout are crucial to alleviate the overfitting problem (Gal and Ghahramani 2016). A learning rate annealing parameter is additionally used to optimise the learning pace as the model reaches a plateau. The first model specification uses only one feature—the scaled fraction of companies in a given grid cell provided in 3-year sequences. The second specification uses two features, adding the spatial lag of the first variable. The specific number of neurons in each layer, the dropout ratios, the learning rate annealing parameter and the optimiser algorithm were tuned with a hyperparametric search.6
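The size of such a stack can be sketched with a standard parameter count: a simple (Elman) RNN layer with u units and i input features has u·(u + i + 1) trainable weights (input kernel, recurrent kernel and bias). The following illustrative calculation (assumed formula for plain RNN cells, not taken from the paper) traces the two tuned specifications reported later and shows that the bivariate model is indeed the smaller one:

```python
def rnn_params(units, inputs):
    """Trainable weights of a simple (Elman) RNN layer:
    input kernel + recurrent kernel + bias = units*(units + inputs + 1)."""
    return units * (units + inputs + 1)

def stack_params(layer_units, n_features, timesteps):
    total, inputs = 0, n_features
    for u in layer_units:
        total += rnn_params(u, inputs)
        inputs = u                       # each layer returns full sequences
    total += timesteps * inputs + 1      # flatten -> single sigmoid output
    return total

# Tuned univariate specification: 128/64/16 units, 1 feature, T = 3.
uni = stack_params([128, 64, 16], n_features=1, timesteps=3)
# Tuned bivariate specification: 32/16/16 units, 2 features, T = 3.
bi = stack_params([32, 16, 16], n_features=2, timesteps=3)
```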
Fig. 9
General structure of the neural network.
Source Own work in diagrams.net software
Using annual data from 2010 to 2018, six sequences were created for each grid cell (with T covering 2012–2017). The sequences for T = 2017 were set aside for final model testing (with T + 1 = 2018 as the prediction period). The remainder of the sample was divided with a 70%/15%/15% split. The sequences were randomly split into training, testing and validation samples. The split was done according to the grid ID to avoid data leakage; this ensured that if a grid cell was chosen for testing, so were all sequences associated with it. The final samples consisted of the training dataset (2100 observations from 420 grid cells), the testing dataset (455 observations from 91 cells) and the validation dataset (450 observations from 90 cells).
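The leakage-avoiding split can be sketched as follows: cells, not individual sequences, are shuffled and partitioned, so every sequence of a given cell lands in the same sample. This is an illustrative pure-Python sketch (function name and seed are hypothetical) reproducing the reported 420/90/91 cell split of the 601 grid cells:

```python
import random

def split_by_grid(grid_ids, shares=(0.7, 0.15, 0.15), seed=42):
    """Split grid-cell IDs (not individual sequences) into
    train/validation/test groups, avoiding data leakage between samples."""
    ids = list(grid_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(n * shares[0])
    n_val = int(n * shares[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_by_grid(range(601))
```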
For both specifications (univariate and bivariate models), optimal parameters were sought within the ranges shown in Table 2. Due to computational restrictions, the hyperparametric search was done on a 5% sample of all combinations. Models were trained over 50 epochs with a batch size of 50. They were evaluated using binary cross-entropy as the loss function, a metric based on Kullback–Leibler information theory commonly used for optimising binary classification problems (Ramos et al. 2018). In the simplest terms, it can be interpreted as the distance between the distribution of the true event and the probability distribution estimated from the empirical data—the lower the cross-entropy, the better the model explains the phenomenon. Additionally, accuracy was reported for a better understanding of the explanatory power of a model. Choosing the model with the minimum loss value on the validation sample gave the following sets of parameters: {N1 = 128, N2 = 64, N3 = 16, d1 = 0.2, d2 = 0.3, d3 = 0.3, lr = 0.1, “rmsprop”} for the univariate model and {N1 = 32, N2 = 16, N3 = 16, d1 = 0.2, d2 = 0.2, d3 = 0.3, lr = 0.05, “rmsprop”} for the bivariate model. Quality metrics for the final models are shown in Table 3. The statistical distribution of the obtained predictions for the test sample and the hold-out sample is presented in Fig. 10 and summarised in Table 4.
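The 5% random search over the Table 2 search space can be sketched in a few lines (illustrative Python; the seed and variable names are hypothetical). The full grid has 3⁶ · 2 · 2 = 2916 combinations, of which roughly 146 are sampled and trained:

```python
import itertools
import random

# Search space as reported in Table 2.
space = {
    "N1": [128, 64, 32], "N2": [64, 32, 16], "N3": [32, 16, 8],
    "d1": [0.2, 0.3, 0.4], "d2": [0.2, 0.3, 0.4], "d3": [0.2, 0.3, 0.4],
    "lr": [0.1, 0.05], "opt": ["rmsprop", "adam"],
}
keys = list(space)
grid = [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

# Evaluating every combination is too costly, so only a 5% random
# sample of the grid is drawn and those configurations are trained.
sample = random.Random(0).sample(grid, k=round(0.05 * len(grid)))
```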
Table 2
Parameters tested in the hyperparametric search.
Source own work
| Parameters | Options |
|---|---|
| Neurons in layer one (N1) | {128, 64, 32} |
| Neurons in layer two (N2) | {64, 32, 16} |
| Neurons in layer three (N3) | {32, 16, 8} |
| Dropout ratios for RNN layers (d's) | {0.2, 0.3, 0.4} |
| Learning rate annealing (lr) | {0.1, 0.05} |
| Optimiser | {“rmsprop”, “adam”} |
Table 3
Results of the final models.
Source own work
| Measure | One-variable model | Two-variable model |
|---|---|---|
| Binary cross-entropy, training | 0.3351 | 0.2519 |
| Binary cross-entropy, validation | 0.2976 | 0.2298 |
| Binary cross-entropy, test | 0.2981 | 0.2358 |
| Binary cross-entropy, hold-out T = 2017 | 0.1294 | 0.0965 |
| Accuracy, training | 0.8690 | 0.8967 |
| Accuracy, validation | 0.8711 | 0.8844 |
| Accuracy, test | 0.8615 | 0.8857 |
| Accuracy, hold-out T = 2017 | 0.9733 | 0.9750 |
Fig. 10
Distribution of the predictions from both models on test sample and hold-out sample.
Source Own work in R, using the vioplot package
Table 4
Statistical distribution of the predictions from both models on hold-out and test samples.
Source own work
| Sample | Variables in a model | min | mean | var | sd | q25 | median | q75 | max |
|---|---|---|---|---|---|---|---|---|---|
| Test | 1 | 0.068 | 0.193 | 0.043 | 0.206 | 0.071 | 0.088 | 0.231 | 0.987 |
| Hold-out | 1 | 0.069 | 0.127 | 0.016 | 0.126 | 0.071 | 0.071 | 0.118 | 0.989 |
| Test | 2 | 0.013 | 0.177 | 0.060 | 0.245 | 0.026 | 0.034 | 0.253 | 0.994 |
| Hold-out | 2 | 0.013 | 0.084 | 0.023 | 0.150 | 0.021 | 0.026 | 0.099 | 0.995 |
4.3.2 Results and predictions from RNN
Using RNNs, two models were built that successfully predict the locations of future startup clusters (accuracy on test data of 0.86 and 0.88, respectively; meaning that 86–88% of observations were correctly classified for the test set).
The statistical distribution of the predictions from both models is quite similar, suggesting stable predictions across the competing specifications (Table 4). However, the bivariate model provides a wider range of results, predicting unattractive areas with higher efficiency (lower minimum scores and more probability mass at the lowest values). The prediction distribution from this model shifts more smoothly between the lowest and highest probabilities (Fig. 10), suggesting higher efficiency in recognising patterns in the middle range of observations. This means that the spatial model is much more sensitive to nuanced patterns in the data. That insight is confirmed when comparing the model metrics.
The bivariate model has better results for the loss function (binary cross-entropy) as well as for accuracy. Its results are also more stable across samples—similar accuracy scores are obtained for the training, validation and testing datasets. This means that the bivariate model provides more precise predictions, even for previously unseen samples. Once the spatial dimension was included in the model, the RNN performed better in recognising spatiotemporal patterns. Notably, these better results were achieved with a simpler model specification (fewer neurons than in the univariate model). By enriching the a-spatial RNN model with spatial features, more robust results are obtained for spatial panel data.
Considering the results for the hold-out sample T = 2017, it can be seen that the models perform much better than in the testing scenario (exceptionally low scores of cross-entropy measure and accuracy with values as high as 0.973 and 0.975). Such good results are probably because temporal patterns present at the grid cell level have already been learned by RNNs in previous stages. In this case, only one time step has been added—relations from T = 2017. Having learned the patterns for past time sequences, the model easily recognised the next outcome for each grid cell. With information for 2015, 2016 and 2017, the spatial model correctly classified 97.5% of the grid cells. Looking only at the numerical results (Table 3), it seems that this one-step-ahead forecast works extremely well for both specifications, with a slight advantage for the bivariate model.
The Moran's I statistic on the final predictions of both models clearly shows the advantage of including the spatial lag in the RNN specification. Moving from the univariate to the bivariate model, the Moran's I coefficient changes from 0.626 to 0.815 (both results are significant at the 0.001 level). This means that including a spatial lag in the training sample helps the model to capture spatial dependencies in the data with higher accuracy. This allows the RNN model—successfully used for time series analysis—to be transformed into an effective spatiotemporal prediction tool.
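The global Moran's I used for this comparison can be sketched in pure Python (illustrative only; the paper's statistics were computed in R). Given values on a lattice and a binary contiguity matrix, the statistic is the spatially weighted cross-product of deviations from the mean, normalised by the variance:

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a binary
    contiguity matrix `weights` (weights[i][j] = 1 if i, j are neighbours)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)            # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four cells on a line; similar neighbouring values give a positive I.
vals = [1.0, 1.0, 0.0, 0.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
moran = morans_i(vals, w)
```

A value near +1 indicates strong positive spatial autocorrelation, which is why the jump from 0.626 to 0.815 signals smoother, more spatially coherent predictions.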
This observation can be further verified by visually examining the results—by comparing the predictions of the two models, it is easy to see that the bivariate model gives more stable results in space (Fig. 11). Adding a spatial lag increases the model’s ability to navigate the spatial correlation between the results. The obtained predictions are smoother and more accurately match the true cluster localisations in 2018. There is also much less noise in the predictions (compared to the scattered pattern resulting from the univariate model). The bivariate model can account for spatial dependencies in the sample and can produce more robust results, identifying preferred urban areas for startup clustering.
Fig. 11
Cluster prediction probabilities and presence of true clusters in T + 1 = 2018—models tested for the hold-out sample of 2017 on the Warsaw 1kmx1km grid.
Source Own work in R
An assessment of the predictive power of the RNN model can only be made by focusing on the quality metrics reported for the test and hold-out samples. These metrics show how well the model has achieved its goal—how close its predictions are to the true values of the modelled phenomenon. While for simpler NN structures it is sometimes possible to compare an ML model with econometric methods, in the case presented in this paper, such an approach is simply not feasible; the differences in the modelling approach are too significant. While traditional econometric models typically deal with a single time series for analysis and forecasting, here the RNN operates on 2100 short data sequences to unravel the data generation process. A further fundamental difference is that the NN imposes no assumptions on the functional form of relationships within the data or on the statistical properties of the data generation process. This “blind” approach allows the RNN to capture hidden relationships within the data; any fuzzy pattern identified is then mimicked in the weights learned by the model. While this approach offers little interpretability of the process, in terms of prediction it performs remarkably well (the bivariate model achieved a 97.5% classification efficiency on the hold-out sample). For spatial predictions, visual assessment can also be used to further ensure that the predictions from the model are a good representation of reality. This concept is explored in the next section.
4.3.3 Further insights from the RNN predictions
Using the bivariate model, it can be predicted that in 2018, there will be two main areas that will attract most startups (Fig. 11B). There is a great chance that a large cluster will be present in the city centre (bright yellow area), but one may expect clustering towards the south of the city centre (Mokotów region–light green squares). It can be expected that although companies are likely to locate in other city areas, they will be rather distant from one another, and no other innovation cluster will emerge (dark blue colours in most parts of the city). A greater number of startups are expected on the left side of the Vistula (lighter colours in the western city area). However, the area in Praga Północ (east of the city centre) may become more popular (lighter blue squares just outside the bright yellow hotspot). It is likely that this notion could be reinforced by some targeted policies that would increase the attractiveness of this urban area.
Those predictions can be easily verified. Following the empirical data from 2018 (Fig. 12), it can be seen that most of the information extracted from the model’s prediction holds true. Indeed, in the 2018 sample, a large cluster emerged in the city centre, followed by another bigger grouping south of this cluster (Fig. 12B). The remaining companies were spread loosely across the city, with a preference for the left side of the Vistula (Fig. 12B). While an increasing number of companies located in the Praga Północ region can be observed (a growth in concentration east of the city centre, the city district marked with a dark blue asterisk in Fig. 12A), there are still no startup clusters there (Fig. 12B).
Fig. 12
Spatial organisation of the startups founded in 2018. Figure A shows the results of kernel density estimation for the 2018 sample. Figure B shows DBSCAN results for the 2018 sample with parameters: eps = 0.006 (approx. 408 m) and minPts = 10. Grey points represent startups that were not allocated to any cluster. The dark blue asterisk marks the Praga Północ region in Fig. 12A.
Source Own work in R
4.4 Discussion of the results
The neural network model presented in this paper works very well in predicting the localisation of startup clusters in Warsaw, Poland. It can also provide valuable information about general trends in the popularity of specific city areas at the very detailed scale of a 1 km² grid cell. Powered by current data, the model can be a very valuable tool for policy-making. However, these results are not limited to this one specific city. Following this successful model structure, it can easily be recalibrated to make predictions for another metropolis. As shown in the article, even short panel data can be used in this approach. In this case, having data on the addresses of newly founded startups across nine years was enough to build a model that predicts innovation clusters with more than 95% accuracy.
In addition to their methodological value, the results from this section also contribute to the main thesis of the paper. The persistence of the concentration trend of technological startups allows for accurate predictions even with a considerably simple modelling structure. While there have been changes in the size of clusters and their exact location over the years, a 3-year sequence of an area's attractiveness turns out to be sufficient to decide whether a new cluster will be formed there in the upcoming year. Moreover, the spatial feature is so effective because of the co-localising trend. When startups locate densely in one place (to utilise small-scale agglomeration externalities), the density of startups is likely to be high in the directly neighbouring areas as well (because externalities spread across space). The co-localising tendency discovered in the previous sections improves the stability of the model results, showing that the clustering process is highly dependent on micro-geographical trends appearing in the neighbouring area. Accurately predicting future hotspots of technological startup activity is possible because these companies benefit from highly localised externalities that require them to locate near areas that have already proven attractive to startups in previous years.
5 Conclusions
Technological startups create intriguing localisation patterns at the intra-urban level. Tracking their localisation choices at a micro-geographical scale allows one to see that innovative business activity is not evenly distributed across urban space. This paper shows that technological startups tend to co-locate and create dense clusters of business activity in urban space. Such a pattern is consistent with highly localised agglomeration externalities operating at fine scales within cities.
What has been observed at the macro-scale—the towards-city switch of startup location—turns out to be only a starting point for describing the actual localisation patterns of innovative business observed from a micro-perspective. The intra-urban localisation patterns of technological startups suggest that different parts of the city are valued differently by entrepreneurs. Companies are not attracted only to the metropolitan area, but rather to the dense business clusters located within it. The size of these groupings and the density of business activity within them seem to correspond to agglomeration effects operating at small scales within cities.
This paper also contributes to the literature from a methodological perspective. It shows how machine learning tools can be used in spatial research, advancing the field of spatial data science. The most significant input from this perspective is the example of how neural networks can be helpful in regional science, enabling the prediction of spatial patterns in data. With only a few historical data records for a given location, future occurrences of a given spatial process can be predicted with RNNs. This type of model can be easily improved by using a spatial feature which stores the spatial lags of the main input variable. Such a simple operation allows the use of a non-spatial model in the spatial context at low computational cost. The model presented in this paper can be easily utilised for future data points, helping city authorities to target their future startup support programmes. Although there is still much to be discovered about the dynamics of the startup location process, machine learning tools can be built to assist in decision support.
There is still a large gap in the literature regarding the impact of intra-urban localisation decisions on the lifecycle of technological startups. Tracking location-dependent survival rates and exploring the impact of cluster dynamics on innovation activity are only first ideas for future research. The role of this paper was to shed first light on the intra-urban organisation of technological startups and to link it to the growing literature on the attenuation of agglomeration externalities. The patterns uncovered here pave the way for new micro-geographical research on startup location.
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The input preparation procedure used for the model in part 4.3 is shown in Fig. 13. The steps are visualised for a simplified example of a square “city space” in a given year T, in which 10 startups are assumed (N = 10). The point pattern of startups founded in year T (step 1) is overlapped with the 1 km x 1 km census grid (step 2). Companies within a given grid cell are then counted (step 3). In the next step, each count is divided by N, i.e. the number of startups founded in year T (step 4). The fractions calculated in the previous step are rescaled to the range 0–1 to keep a consistent scale of the input, as required by neural networks (step 5); the rescaling is performed for each yearly sample separately. Then, for the second model structure, spatial averages of the rescaled fractions are calculated (step 6). A neighbourhood is defined here as the complete set of adjacent cells (following the structure of the queen's contiguity matrix).
Fig. 13
Visualisation of the input data preparation for the neural network model (part 4.3).
Source Own work in diagrams.net software
The data preparation for the output variable of the neural network model (a label indicating whether or not there is a cluster in a cell) is shown in Fig. 14. The results from Sect. 4.2 are used here: the DBSCAN algorithm (step 2) is run on the point data of startups founded in year T (step 1). The results from DBSCAN are then overlaid with the 1 km × 1 km grid (step 3). In the last step, a binary variable is created from the DBSCAN results: if at least one of the points belonging to a given grid cell was considered part of a density-based cluster, the variable takes the value 1; otherwise, it takes the value 0.
Fig. 14
Visualisation of the output data preparation for the neural network model (part 4.3).
Source Own work in diagrams.net software
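The labelling rule in the last step can be sketched as below. The function consumes per-point cluster labels in scikit-learn's DBSCAN convention (label −1 marks noise); the assumption that coordinates index grid cells directly is illustrative.

```python
import numpy as np

def make_labels(points, dbscan_labels, grid_size):
    """Step 4: a cell is labelled 1 if at least one point inside it
    belongs to a density-based cluster (DBSCAN label != -1)."""
    labels = np.zeros((grid_size, grid_size), dtype=int)
    for (x, y), lab in zip(points, dbscan_labels):
        if lab != -1:  # -1 marks noise points in DBSCAN output
            labels[int(y), int(x)] = 1
    return labels
```

A cell containing only noise points keeps the label 0, even if it contains many startups.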
The model was fed with 4-year data sequences: the first three years were used as input, and the last year's data formed the basis of the prediction. The detailed structure is presented in Fig. 17. The number of neurons in each layer, dropout ratios, learning rates and optimisers were chosen based on a hyperparameter search: to obtain the best-tuned specification, the general model structure was run with different combinations of parameters. After each run, the loss value (binary cross-entropy) was calculated on the validation sample and saved to an output file. Given the full set of results from the different models, the specification with the best score (minimum binary cross-entropy) was selected.
Fig. 17
Final structure of the second model specification (part 4.3—model with two variables).
Source Own work in diagrams.net software
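The 4-year windowing can be illustrated with a short sketch; the function name and array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def make_sequences(yearly_grids, seq_len=4):
    """Slide a seq_len-year window over the yearly grids: the first
    seq_len - 1 years form the input, the last year is the target."""
    X, y = [], []
    for t in range(len(yearly_grids) - seq_len + 1):
        window = yearly_grids[t : t + seq_len]
        X.append(np.stack(window[:-1]))  # e.g. three input years
        y.append(window[-1])             # the year to be predicted
    return np.array(X), np.array(y)
```

Six yearly grids thus yield three overlapping training sequences, each pairing three input years with one target year.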
GPS coordinates with six decimal places allow for such precision. In our case, we may additionally consider possible errors in the assignment of a particular address to a geocoded location; however, in urban areas this is rarely an issue (Davis and Fonseca 2007; Goldberg and Wilson 2007).
Eps in this paper is measured in degrees (as the longitude–latitude coordinates are). Around 52.25°N, a distance of 0.001 degrees equals roughly 68 m (Morse 2008). Accordingly, eps = 0.0035 corresponds to approximately 238 m, while eps = 0.006 corresponds to approximately 408 m. Because there is no theory supporting the choice of appropriate values for eps and minPts (Lai et al. 2019), values for both parameters were initially defined using a knee plot and then tuned to the sample following the adaptive approach (Sawant 2014).
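These figures follow from the standard cosine-of-latitude approximation for east–west distances; the sketch below uses the usual spherical-Earth constant of about 111,320 m per degree of longitude at the equator.

```python
import math

def lon_degrees_to_metres(deg, latitude_deg):
    """Approximate east-west ground distance of a longitude offset:
    one degree of longitude spans ~111,320 m at the equator and
    shrinks with the cosine of the latitude."""
    return deg * 111_320 * math.cos(math.radians(latitude_deg))
```

At 52.25°N this yields roughly 68 m for 0.001 degrees, about 238 m for eps = 0.0035 and about 409 m for eps = 0.006.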
Hyperparameter search is a procedure for testing competing model structures in ML (models with different parameters, different numbers of neurons, etc.). A set of model specifications covering permutations of the selected parameters is run, and their quality measures are compared; the specification with the best quality measure is selected as the final model.
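As a minimal sketch of this procedure, the loop below exhaustively evaluates a parameter grid and keeps the combination with the lowest validation loss (binary cross-entropy in the paper's case); the scoring callback and parameter names are placeholders, not the paper's actual search space.

```python
import itertools

def grid_search(build_and_score, param_grid):
    """Run every combination of the parameter grid and return the one
    with the lowest validation loss (minimised, e.g. cross-entropy)."""
    best_params, best_loss = None, float("inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # build_and_score trains a model with these parameters and
        # returns its loss on the validation sample
        loss = build_and_score(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

In practice the callback would build, train and evaluate one network per call; here any function mapping a parameter dictionary to a scalar loss can stand in for it.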