1 Introduction
-
Estimate population density according to dimensions such as migration background, age group, gender, and socio-economics.
-
Dynamically determine the number of clusters over time according to dimensions such as migration background, age group, gender, and those reflecting the socio-economic conditions.
-
Identify structural as well as intra-cluster changes over time.
-
Determine the variables that are over- or under-represented for each cluster in a given year.
2 Literature review
Work | Migration background | Age-group | Gender | Socio-economic | Digital |
---|---|---|---|---|---|
[1] | ✓ | ✗ | ✗ | ✗ | ✗ |
[3] | ✓ | ✗ | ✗ | ✗ | ✗ |
[4] | ✗ | ✓ | ✗ | ✗ | ✗ |
[5] | ✓ | ✗ | ✗ | ✗ | ✗ |
[6] | ✓ | ✓ | ✗ | ✗ | ✗ |
[7] | ✓ | ✓ | ✗ | ✗ | ✗ |
[8] | ✗ | ✓ | ✓ | ✗ | ✗ |
[9] | ✓ | ✗ | ✗ | ✗ | ✗ |
[2] | ✓ | ✗ | ✗ | ✓ | ✗ |
[10] | ✓ | ✗ | ✗ | ✓ | ✗ |
[11] | ✓ | ✗ | ✗ | ✗ | ✗ |
[12] | ✓ | ✓ | ✗ | ✓ | ✗ |
[13] | ✓ | ✗ | ✗ | ✓ | ✗ |
[14] | ✓ | ✗ | ✗ | ✓ | ✗ |
[15] | ✗ | ✓ | ✓ | ✓ | ✗ |
[16] | ✓ | ✗ | ✗ | ✗ | ✗ |
[17] | ✓ | ✗ | ✗ | ✓ | ✗ |
[18] | ✓ | ✗ | ✗ | ✓ | ✗ |
[19] | ✓ | ✗ | ✗ | ✗ | ✗ |
[20] | ✓ | ✓ | ✓ | ✓ | ✗ |
[21] | ✓ | ✓ | ✗ | ✗ | ✗ |
[22] | ✓ | ✗ | ✗ | ✗ | ✗ |
[23] | ✓ | ✓ | ✓ | ✓ | ✗ |
[24] | ✓ | ✓ | ✗ | ✓ | ✓ |
[25] | ✓ | ✗ | ✗ | ✗ | ✗ |
[26] | ✓ | ✗ | ✗ | ✗ | ✗ |
[27] | ✓ | ✗ | ✗ | ✓ | ✗ |
[28] | ✓ | ✗ | ✗ | ✗ | ✗ |
[29] | ✓ | ✓ | ✓ | ✓ | ✗ |
-
Although Helbig’s [23] contribution does not consider the demographic spatial dimension of residential areas (i.e. estimates of population density across residential areas), his work emphasizes the temporal dimension of residential segregation.
-
Marcińczak [28] conducted a cluster analysis of residential segregation in Berlin, applying hierarchical cluster analysis to several years of demographic data. However, because this clustering method is static, no dynamic analysis of cluster formation in Berlin was carried out. Furthermore, this work is guided by a predefined interpretation of the clusters.
-
Kurtenbach’s key study [24] explored digital segregation in Berlin using data from a social media service designed to organise community life in neighbourhoods.
-
Finally, innovative techniques for estimating population densities in residential areas in Berlin have been developed. For example, Groß [21] has developed methods for estimating anonymized spatial densities at a higher resolution. Building on this work, Masías et al. [29] have used non-negative matrix factorization to study different facets of residential segregation.
3 Methods
3.1 Methodological approach
3.1.1 Data source
3.1.2 Multivariate kernel density estimation in the presence of measurement error
-
\(|\cdot |\) denotes the determinant.
-
\(K(\cdot )\) is the kernel, a symmetric multivariate density function. This function assigns weights to the observed data points based on their distance from the point where we want to estimate the density. We use the standard multivariate normal kernel, i.e., \(K(x)=(2\pi )^{-\frac{d}{2}}e^{-\frac{1}{2}x^{T}H^{-1}x}\).
-
H is the \(d\times d\) bandwidth matrix, which is symmetric and positive definite. It controls the window size in each dimension over which the kernel function operates. A small bandwidth yields a density estimate that is very sensitive to the data (potentially too sensitive, resulting in over-fitting). In contrast, a large bandwidth may smooth out important features of the data (under-fitting). The choice of H is therefore critical for the accuracy of the kernel density estimates. The selection of the bandwidth matrix is discussed extensively in the literature. Here, we use the approach of Wand and Jones, as in [21].
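To make the roles of \(K(\cdot)\), H and the determinant concrete, the following is a minimal sketch of a multivariate Gaussian kernel density estimate with a full bandwidth matrix. It is an illustration only: the function name `gaussian_kde_full_bandwidth` is hypothetical, H is supplied by hand rather than selected with the Wand and Jones approach, and the measurement-error correction named in the section title is not modelled.

```python
import numpy as np


def gaussian_kde_full_bandwidth(points, data, H):
    """Evaluate a Gaussian KDE with bandwidth matrix H at `points`.

    points: (m, d) evaluation locations
    data:   (n, d) observed data points
    H:      (d, d) symmetric positive-definite bandwidth matrix (assumed given)
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    d = data.shape[1]
    H_inv = np.linalg.inv(H)
    # Normalising constant: (2*pi)^(-d/2) * |H|^(-1/2), with |.| the determinant.
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(H) ** (-0.5)
    diff = points[:, None, :] - data[None, :, :]            # shape (m, n, d)
    # Quadratic form (x - x_i)^T H^{-1} (x - x_i) for every pair (x, x_i).
    quad = np.einsum("mnd,de,mne->mn", diff, H_inv, diff)
    # Average the kernel contributions of all n data points.
    return norm * np.exp(-0.5 * quad).mean(axis=1)


# Illustrative comparison of a small and a large (hand-picked) bandwidth.
rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 2))
grid = np.array([[0.0, 0.0], [2.0, 2.0]])
print(gaussian_kde_full_bandwidth(grid, sample, 0.05 * np.eye(2)))
print(gaussian_kde_full_bandwidth(grid, sample, 2.0 * np.eye(2)))
```

With the smaller bandwidth the estimate follows individual data points closely, while the larger one flattens the surface, which is the over-fitting versus under-fitting trade-off described above.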
3.1.3 Dynamic fuzzy c-means
-
1. Run the fuzzy c-means algorithm using the initial data set.
-
2. Receive new data and merge it with the current data.
-
3. Look for relevant changes in the structure of clusters.
-
4. If relevant changes exist, update the structure of clusters.
-
5. Repeat until no new data arrive.
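The five steps above can be sketched as a simple loop around a standard fuzzy c-means routine. This is a hedged illustration rather than the procedure used in this work: the fuzzifier `m = 2`, the Euclidean distance, the random initialisation, and in particular the change-detection rule `far_from_centers_rule` (with its `radius` parameter) are assumptions introduced only to make the example runnable. Refitting on the merged data at every step is also a simplification.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers V, memberships U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]     # membership-weighted centers
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                      # guard against zero distances
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def far_from_centers_rule(batch, V, radius=5.0):
    """Hypothetical change detector (not from the paper): propose one extra
    cluster when most new points lie farther than `radius` from every center."""
    D = np.linalg.norm(batch[:, None, :] - V[None, :, :], axis=2)
    far = D.min(axis=1) > radius
    return V.shape[0] + 1 if far.mean() > 0.5 else V.shape[0]


def dynamic_fcm(initial, batches, c0, choose_c=far_from_centers_rule):
    """Run steps 1-5 and return (V, U, history of cluster counts)."""
    X = np.asarray(initial, dtype=float)
    V, U = fuzzy_c_means(X, c0)                    # step 1: initial clustering
    history = [V.shape[0]]
    for batch in batches:                          # step 5: loop until data ends
        batch = np.asarray(batch, dtype=float)
        X = np.vstack([X, batch])                  # step 2: merge new data
        c_new = choose_c(batch, V)                 # step 3: look for changes
        V, U = fuzzy_c_means(X, c_new)             # step 4: update the structure
        history.append(c_new)
    return V, U, history
```

Passing the detector as a callback keeps the loop independent of any particular definition of a "relevant change", so a different criterion can be substituted without touching the rest of the sketch.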
-
pair-wise distance \(d(\mathbf{v}_{j},\mathbf{v}_{k})\) between each pair of the current centers \(\mathbf{v}_{j}\) and \(\mathbf{v}_{k}\), for all \(j,k = 1,2, \dots ,c_{t}\).
-
distance \(d(\mathbf{x}_{i},\mathbf{v}_{j})\) between the new data point \(\mathbf{x}_{i}\in X_{t}\) and the current centers \(\mathbf{v}_{j}\), for all \(i = 1,\dots ,n_{t}\) and \(j = 1,2,\dots ,c_{t}\).
-
the membership degree \(\hat{\mu}_{i,j}\) of the new object \(\mathbf{x}_{i}\in X_{t}\) to the cluster j, for all \(i = 1,\dots ,n_{t}\) and \(j = 1,2,\dots ,c_{t}\).
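The three quantities listed above can be computed in a few lines. The sketch below assumes the Euclidean distance for \(d(\cdot,\cdot)\) and the standard fuzzy c-means membership formula \(\hat{\mu}_{i,j} = 1 / \sum_{k} (d(\mathbf{x}_{i},\mathbf{v}_{j}) / d(\mathbf{x}_{i},\mathbf{v}_{k}))^{2/(m-1)}\) with fuzzifier \(m = 2\); both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np


def center_distances(V):
    """Pair-wise distances d(v_j, v_k) between current centers, shape (c_t, c_t)."""
    return np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)


def point_center_distances(X_t, V):
    """Distances d(x_i, v_j) from each new point to each center, shape (n_t, c_t)."""
    return np.linalg.norm(X_t[:, None, :] - V[None, :, :], axis=2)


def membership_degrees(X_t, V, m=2.0):
    """Membership of each new point in each current cluster; rows sum to 1."""
    D = np.fmax(point_center_distances(X_t, V), 1e-12)  # avoid division by zero
    inv = D ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


# Two illustrative centers and two new points.
V = np.array([[0.0, 0.0], [3.0, 4.0]])
X_t = np.array([[0.0, 0.0], [1.5, 2.0]])
print(center_distances(V))
print(membership_degrees(X_t, V))
```

A new point that is far from all current centers has memberships close to 1/c_t in every cluster, which is one signal that the current structure may not describe it well.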
3.1.4 Cluster interpretation
4 Results
4.1 Results of the multivariate kernel density analysis
4.2 Results based on the dynamic fuzzy c-means
4.2.1 Cluster validation
4.2.2 Clustering results
-
If the v-test value is greater than 1.96, a variable is considered to be over-represented in a given cluster.
-
If the v-test value is less than −1.96, a variable is considered to be underrepresented.
-
If the v-test value is between −1.96 and 1.96, then a variable is not considered significant.
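The decision rule above can be written as a small helper. The sketch below also includes the commonly used v-test for a quantitative variable (cluster mean against overall mean, with a finite-population correction) and a two-sided normal p-value. Reading MIC as the mean of the variable in the cluster, and assuming that this particular v-test definition is the one behind the reported values, are assumptions; the function names are hypothetical.

```python
import math


def v_test(cluster_mean, overall_mean, overall_var, n_cluster, n_total):
    """v-test of a quantitative variable for one cluster (assumed definition)."""
    se = math.sqrt(
        overall_var / n_cluster * (n_total - n_cluster) / (n_total - 1)
    )
    return (cluster_mean - overall_mean) / se


def two_sided_p(v):
    """Two-sided p-value of v under the standard normal distribution."""
    return math.erfc(abs(v) / math.sqrt(2.0))


def classify(v, threshold=1.96):
    """Apply the +/-1.96 rule to a v-test value."""
    if v > threshold:
        return "over-represented"
    if v < -threshold:
        return "under-represented"
    return "not significant"


for v in (123.307, -96.753, 0.491):
    print(v, classify(v), round(two_sided_p(v), 3))
```

The threshold 1.96 corresponds to a two-sided 5% significance level under the normal approximation.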
Cluster 7. From a qualitative point of view, it can be seen that the change in the cluster structure occurred in the same year as the so-called European migration crisis.
Cluster 0 is characterized by all migrant-related variables being underrepresented (see Fig. 4a). In the year 2009, the three most under-represented variables correspond to Germans without a background of migration (MIC = 7.165; v-test = −96.753; p = 0.000), Poland (MIC = 2.181; v-test = −96.19; p = 0.000) and other subpopulations, while for the year 2020, the most underrepresented variables correspond to Poland (MIC = 2.052; v-test = −313.965; p = 0.000), Germans with no migration history (MIC = 6.301; v-test = −311.816; p = 0.000) and Syria (MIC = 1.702; v-test = −283.379; p = 0.000).
Cluster 1 is also characterized by the under-representation of all variables related to the migrant background (see Fig. 4b). In 2009, the three most underrepresented variables corresponded to other minorities (MIC = 6.332; v-test = −30.513; p = 0.000), Syrians (MIC = 5.136; v-test = −29.968; p = 0.000) and Ukrainian subpopulations (MIC = 5.981; v-test = −29.255; p = 0.000). The least underrepresented are the USA (MIC = 12.166; v-test = −3.303; p = 0.001), Iran (MIC = 9.331; v-test = −10.596; p = 0.000) and China (MIC = 6.855; v-test = −16.39; p = 0.000). For 2020, the most underrepresented variables are the other minorities category (MIC = 5.132; v-test = −129.563; p = 0.000), Poland (MIC = 6.504; v-test = −122.126; p = 0.000), and Italy (MIC = 4.154; v-test = −121.158; p = 0.000).
In Cluster 2, all variables are overrepresented, except for Kazakhstan, which ranks last and is underrepresented for all measured years (see Fig. 4c). The most overrepresented subpopulations in 2009 are Iran (MIC = 76.504; v-test = 123.307; p = 0.000), Ukraine (MIC = 56.573; v-test = 104.875; p = 0.000), China (MIC = 54.082; v-test = 84.846; p = 0.000), USA (MIC = 60.44; v-test = 84.445; p = 0.000) and Austria (MIC = 51.335; v-test = 83.014; p = 0.000), and other subpopulations. Similarly, the most overrepresented variables for the year 2020 correspond to those of Iran (MIC = 74.076; v-test = 394.988; p = 0.000), Ukraine (MIC = 62.733; v-test = 365.308; p = 0.000), China (MIC = 56.408; v-test = 291.015; p = 0.000), Greece (MIC = 54.289; v-test = 243.657; p = 0.000) and Austria (MIC = 52.133; v-test = 241.398; p = 0.000), among other variables that characterize this cluster.
In Cluster 3, as all variables are statistically significant, all variables characterize this cluster (see Fig. 4d). In 2009, all the variables of the migratory background were over-represented, as in the case of Poland (MIC = 35.064; v-test = 73.191; p = 0.000), Croatia (MIC = 31.789; v-test = 56.777; p = 0.000), Syria (MIC = 27.023; v-test = 49.38; p = 0.000), RU (MIC = 32.033; v-test = 49.372; p = 0.000) and Serbia (MIC = 32.033; v-test = 49.372; p = 0.000). From 2015 to 2020, a change was observed as the USA, France, and Spain subpopulations became underrepresented. It is also observed that between 2015 and 2020, the United Kingdom no longer represents this cluster. For the year 2020, it is observed that the subpopulations of Poland (MIC = 32.751; v-test = 251.125; p = 0.000), Croatia (MIC = 27.117; v-test = 171.742; p = 0.000), Serbia (MIC = 25.932; v-test = 144.585; p = 0.000), Syria (MIC = 20.281; v-test = 121.066; p = 0.000), and BA (MIC = 23.835; v-test = 120.809; p = 0.000) are the five most overrepresented subpopulations in this cluster.
In Cluster 4, most variables are under-represented, although there are a few over-represented variables (see Fig. 4e). In 2009, the variables Kazakhstan (MIC = 19.642; v-test = 21.459; p = 0.000), Germans without a migration background (MIC = 18.297; v-test = 14.523; p = 0.000), and Vietnam (MIC = 16.58; v-test = 14.282; p = 0.000) are over-represented. For the same year, the most underrepresented variables are the subpopulations of France (MIC = 6.218; v-test = −17.707; p = 0.000), Italy (MIC = 6.797; v-test = −17.258; p = 0.000), USA (MIC = 7.334; v-test = −16.322; p = 0.000), Spain (MIC = 5.953; v-test = −16.258; p = 0.000) and UK (MIC = 7.218; v-test = −15.911; p = 0.000), among others. By 2020, the only over-represented subpopulation is the German subpopulation without a migration background (MIC = 15.596; v-test = 8.687; p = 0.000), while the Spanish (MIC = 5.938; v-test = −60.339; p = 0.000), French (MIC = 5.938; v-test = −58.254; p = 0.000) and Italian subpopulations (MIC = 7.178; v-test = −56.994; p = 0.000) are the most underrepresented.
For Cluster 5, see Fig. 4f. The most overrepresented subpopulations in 2009 are Kazakhstan (MIC = 46.609; v-test = 91.192; p = 0.000), Vietnam (MIC = 35.34; v-test = 62.19; p = 0.000) and Germans without a migration background (MIC = 23.93; v-test = 41.097; p = 0.000), among others. In the same year, the most underrepresented subpopulations are the USA (MIC = 8.197; v-test = −13.493; p = 0.000), France (MIC = 8.265; v-test = −11.823; p = 0.000) and Spain (MIC = 8.126; v-test = −10.444; p = 0.000), among other subpopulations. For the year 2020, the three most overrepresented subpopulations are Kazakhstan (MIC = 48.736; v-test = 346.537; p = 0.000), Vietnam (MIC = 45.111; v-test = 269.945; p = 0.000) and RU (MIC = 31.451; v-test = 244.861; p = 0.000), while the USA is the most underrepresented (MIC = 5.524; v-test = −67.792; p = 0.000), followed by France (MIC = 5.78; v-test = −58.401; p = 0.000) and UK (MIC = 6.468; v-test = −55.138; p = 0.000). Finally, the variables of Romania, Croatia, Iran, Bulgaria, and other minorities are not always characteristic of this cluster over time.
Cluster 6 was characterized by almost all subpopulations being overrepresented, except for Kazakhstan (MIC = 10.477; v-test = −7.066; p = 0.000), which was the only one underrepresented (see Fig. 4g). In 2020, the three most overrepresented subpopulations were Spain (MIC = 79.323; v-test = 479.471; p = 0.000), France (MIC = 80.279; v-test = 479.408; p = 0.000), and Italy (MIC = 71.934; v-test = 470.824; p = 0.000), along with other subpopulations.
Cluster 7 in 2015 was revealed by the dynamic cluster analysis. In this cluster, all subpopulations are representative and overrepresented. In 2015, the most overrepresented variable in this cluster is Syria (MIC = 30.966; v-test = 142.006; p = 0.000), the second is China (MIC = 31.461; v-test = 131.278; p = 0.000), and the third is other minorities (MIC = 37.489; v-test = 130.72; p = 0.000). The least overrepresented variable is Kazakhstan (MIC = 13.077; v-test = 7.84; p = 0.000). In the year 2020, this cluster is characterized by China as the most overrepresented variable (MIC = 32.429; v-test = 193.024; p = 0.000). The second most overrepresented variable is Syria (MIC = 28.793; v-test = 185.59; p = 0.000), and the third is Croatia (MIC = 32.348; v-test = 185.12; p = 0.000). The least overrepresented variable is again Kazakhstan (MIC = 13.718; v-test = 16.089; p = 0.000).
Cluster 3 is located in the city centre, Cluster 2 is located around the city centre, surrounded by Cluster 0. Finally, Cluster 1 is located on the city’s outskirts.
Cluster 0 has only a few overrepresented variables and several underrepresented ones (see Fig. 6a). Analysis using the value test shows that in 2009, subpopulations in Cluster 0 aged 80 to 85 are the most overrepresented (MIC = 14.578; v-test = 31.307; p = 0.000), subpopulations aged 30 to 35 are the most underrepresented (MIC = 9.79; v-test = −25.043; p = 0.000), and subpopulations aged 60 to 65 are not significant (MIC = 13.201; v-test = 0.491; p = 0.623). For the year 2020, subpopulations between 85 and 90 years are the most overrepresented (MIC = 13.909; v-test = 104.682; p = 0.000), and subpopulations between 30 and 35 years are the most underrepresented (MIC = 9.327; v-test = −92.725; p = 0.000), showing ageing of the cluster compared to 2009.
Cluster 1 has all variables underrepresented (see Fig. 6b). For the year 2009 in Cluster 1, the subpopulations between 30 and 35 are the least underrepresented (MIC = 2.804; v-test = −96.097; p = 0.000), and the most underrepresented are the subpopulations between 80 and 85 (MIC = 5.244; v-test = −130.039; p = 0.000). By 2020, the least underrepresented subpopulations in Cluster 1 are those between 30 and 35 (MIC = 2.541; v-test = −334.159; p = 0.000), and the most underrepresented are those between 85 and 90 (MIC = 5.062; v-test = −437.36; p = 0.000).
Cluster 2 has all variables overrepresented during 2009 (see Fig. 6c). For the year 2020, the most overrepresented age groups are the 80 to 85-year-olds (MIC = 22.479; v-test = 325.474; p = 0.000), and the least overrepresented groups are the 30 to 35-year-olds (MIC = 23.894; v-test = 161.203; p = 0.000).
In Cluster 3, all variables are overrepresented (see Fig. 6d). For the year 2009, the subpopulations from 35 to 40 are the most overrepresented (MIC = 51.23; v-test = 142.501; p = 0.000), and the subpopulations from 85 to 90 are the least overrepresented (MIC = 18.939; v-test = 49.795; p = 0.000), showing that it is a representative cluster of adults. Similarly, in the year 2020, the subpopulations ranging from 35 to 40 are the most overrepresented (MIC = 51.42; v-test = 492.401; p = 0.000), and subpopulations over 90 are the least overrepresented (MIC = 17.28; v-test = 150.838; p = 0.000).
In Cluster 0, both variables are overrepresented (see Fig. 8a). For the year 2009, both male (MIC = 14.92; v-test = 139.279; p = 0.000) and female (MIC = 14.92; v-test = 139.279; p = 0.000) subpopulations are equally overrepresented in this cluster. For the year 2020, the male population (MIC = 45.058; v-test = 471.478; p = 0.000) is more overrepresented than the female population (MIC = 461.205; v-test = 461.205; p = 0.000).
Cluster 1 has both variables underrepresented (see Fig. 8b). Both male and female residential population densities reached the same values in 2009 (MIC = 6.55; v-test = −137.675; p = 0.000). However, for 2020, the male population (MIC = 6.316; v-test = −471.675; p = 0.000) is only slightly more underrepresented than the female population (MIC = 6.233; v-test = −456.146; p = 0.000).
In Cluster 2, both variables are overrepresented (see Fig. 8c). Both male and female populations had the same spatial density for 2009 (MIC = 21.696; v-test = 50.164; p = 0.000). However, for 2020, male populations (MIC = 21.66; v-test = 179.541; p = 0.000) are more overrepresented than female populations (MIC = 21.458; v-test = 169.838; p = 0.000).
Cluster 3 represents the places with the most significant socio-economic problems. The areas corresponding to clusters 2 and 3 are larger in 2009, after the global subprime crisis, and at the onset of the COVID-19 pandemic in 2020 these clusters have slightly different shapes.
In Cluster 0, both variables are overrepresented and statistically significant (see Fig. 10a). In 2009, SGB III was the most overrepresented (MIC = 13.564; v-test = 15.813; p = 0.000), and SGB II was the least overrepresented (MIC = 12.404; v-test = 6.638; p = 0.000). For the year 2020, SGB III is the most overrepresented (MIC = 12.727; v-test = 68.251; p = 0.000), and SGB II is the least overrepresented (MIC = 12.423; v-test = 54.825; p = 0.000), with a decrease in the former group, and an increase in the latter since 2009.
In Cluster 1, the variables are underrepresented and statistically significant (see Fig. 10b). For 2009, Cluster 1 had SGB II as the least underrepresented socio-economic variable (MIC = 2.388; v-test = −123.634; p = 0.000), and SGB III as the most underrepresented (MIC = 3.107; v-test = −130.744; p = 0.000). Similarly, for 2020, SGB II was the least underrepresented (MIC = 3.012; v-test = −411.076; p = 0.000), and SGB III was the most underrepresented (MIC = 3.227; v-test = −424.48; p = 0.000).
In Cluster 2, both variables are overrepresented and statistically significant (see Fig. 10c). For 2009, SGB III was the most overrepresented (MIC = 25.641; v-test = 89.369; p = 0.000), and SGB II was the least overrepresented (MIC = 26.481; v-test = 83.642; p = 0.000). Similarly, in 2020, SGB III was the most overrepresented (MIC = 25.985; v-test = 318.287; p = 0.000), and SGB II was the least overrepresented (MIC = 26.169; v-test = 304.748; p = 0.000).
In Cluster 3, both variables are statistically significant and overrepresented (see Fig. 10d). In 2009, SGB II was the most overrepresented (MIC = 55.375; v-test = 125.056; p = 0.000) and SGB III was the least overrepresented (MIC = 46.651; v-test = 113.809; p = 0.000). The same situation occurred in 2020, where SGB II was the most overrepresented (MIC = 54.557; v-test = 419.17; p = 0.000), and SGB III was the least overrepresented (MIC = 49.921; v-test = 396.516; p = 0.000). The map of Berlin is shown in Fig. 11, and Fig. 3d shows the normalized size of clusters over time.
5 Discussion and conclusion
5.1 Comparison with previous research
Author | Cohort | Method | Dimension explored | Main findings |
---|---|---|---|---|
Yamamoto [5] | {1973, 1975, 1990} | Plotting segregation indexes (location quotient) | Ethnic segregation of Turkish inhabitants in Berlin West (7 color-mapped areas) | In 1973, more than half of all Turks in West Berlin lived in Kreuzberg and Wedding. The research reports that in 1975 Turks were the most segregated compared to Germans, Italians, Greeks, Yugoslavs and other groups. By 1990, segregation between Turks and Germans had largely decreased. |
Nakagawa [49] | {1965, 1970, 1975, 1980, 1985} | Hierarchical cluster analysis | Age-group segregation (the area was divided into a two-zone concentric model of West Berlin) | The age groups 0-19 and 35 and over are more densely distributed in Outer Berlin than in Inner Berlin, and the age groups 20 to 34 tend to be more densely distributed in Inner Berlin. |
Kemper [7] | {1991, 1995} | Plotting segregation indexes | • Ethnic segregation (2 zones, West and East, classified in a total of 6 colored areas) • Age-group segregation (Comparison of West and East Berlin, classified in a total of 7 colored areas) | The study notes that age segregation was more pronounced in East Berlin before unification, while socio-economic segregation was more pronounced in West Berlin. After unification, there was a decrease in the age group of children under 6 in the former East Berlin. Also, segregation rates of the foreign population decreased in both former West and East Berlin. |
Kröhnert and Vollmer [15] | {1992, 1994, 1996, 1998, 2000, 2002, 2004} | Cluster analysis | • Gender segregation (5 clusters, at country level) | Berlin is part of a cluster of German geographical areas segregated by gender in which “the sex ratio is above average (…) and the share of students is the second highest among all clusters. The cities have strong service and tourism sectors. Unemployment among young people is low. The proportion of people employed in service sectors is among the highest of all clusters” [15, p. 9]. |
Blokland and Vief [27] | {2007, 2012, 2016} | Plotting segregation indexes (location quotient) | • Ethnic segregation (5 color-mapped areas) • Socio-economic (5 color-mapped areas) | Ethnic indicators: • Foreigners (strong decrease) • Persons with migration background (fair decrease) • Migration background: Turkey and Arabic states (strong decrease) • Migration background: European Union (stable) Socio-economic indicators: • Unemployed persons (stable) • Long-term unemployed persons (stable) • Non-unemployed persons receiving state subsidies (slight increase) • Child poverty (slight increase) |
Marcińczak and Bernt [28] | {2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019} | Regression trees; Hierarchical cluster analysis | • Ethnic segregation (7 clusters) | This research found the following clusters: • Rising pluralist enclaves • Non-isolated host communities I • Stable pluralist areas • Non-isolated host communities II • Established and increasingly pluralist areas • Stable non-isolated host communities • Persistent host communities |
Masías et al. [29] | {2020} | Multivariate Kernel Density Estimation; Non-Negative Matrix Factorization. Maps are provided for each dimension. | • Ethnic segregation (4 clusters) • Age-group segregation (3 clusters) • Socio-economic segregation (3 clusters) | Using a data science approach, it was possible to reveal highly interpretable patterns in the data, confirming the existence of the phenomena of ethnic segregation, age-group segregation and socio-economic segregation. |
Present work | {2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020} | Multivariate Kernel Density Estimation; Dynamic Fuzzy C-Means; Maps and Bump charts are provided for each dimension | • Ethnic segregation (Changes from 7 to 8 clusters) • Age-group segregation (4 clusters) • Socio-economic segregation (3 clusters) • Gender segregation (3 clusters) | Macro dynamics • A new cluster was identified. Micro dynamics Migration background: • Cluster 0: Lebanon and Turkey have been the most overrepresented in this cluster since 2010 • Cluster 1: The most overrepresented is USA • Cluster 2: the subpopulation from Iran is the most overrepresented • Cluster 3: the subpopulation from Poland is the most overrepresented • Cluster 4: Only Germans are overrepresented and the remaining groups become underrepresented • Cluster 5: Kazakhstan, Vietnam, former Soviet Republic, among others, are overrepresented. • Cluster 6: Spain, France and Italy, among others, remain overrepresented over time. • Cluster 7: subpopulations with migratory backgrounds from Syria and China become the most overrepresented subpopulations in the emergent cluster. Age-group segregation: • Cluster 0 only over-represents the subpopulations aged between 65 and 90. • Cluster 1 over-represents young adults, adolescents and teenagers • Cluster 2 over-represents all groups, especially those aged 65-90 • Cluster 3 over-represents young adults and children. Gender segregation: • There is no residential segregation by gender. The clusters appear to mirror changes in population density across the city. Socio-economic segregation: • The clusters can identify areas where there is a higher density of people applying for unemployment benefits. Qualitatively, it can also be observed that there was a change in the distribution of these residential densities across the city in 2009 and 2020. |
5.1.1 Macro dynamics
5.1.2 Micro dynamics
-
Concerning ethnic residential segregation: In terms of micro-dynamic changes, the proposed method allows us to study the changes within each cluster. The richness of the results lets us observe the overrepresented subpopulations in each cluster and the changes in the classification of each cluster, and thus the dynamics over time. The results are consistent and continue to reflect the now long-past migration waves of “temporary” guest workers (i.e. the so-called Gastarbeiter) from Turkey and Lebanon. However, it is only in the present work that we can observe the positioning of the Syrian and Chinese migrant subpopulations as the most over-represented subpopulations in Cluster 7. The arrival of both Syrian refugees and asylum seekers from China is a known reality of recent immigration to Berlin. For example, Kate Martyr [48], an editor and video producer at DW’s Asia desk, reports on the surge in asylum applications from China to Germany, particularly from the oppressed Uighur minority. Finally, we observe the increase or decrease of the spatial areas occupied by the clusters in Berlin as the normalized cluster size changes, which was noticeable in 2015 due to the structural change of the clusters.
-
Concerning age-group segregation: In general, the bump charts show slight changes in the ranking of the categories of variables describing age segregation. Age segregation is a demographic phenomenon characterized in detail by Yamamoto, Kemper, and Nakagawa, who used data available before and after the fall of the Berlin Wall. Nakagawa found two clusters in West Berlin, characterized by higher adult densities in outer Berlin than in inner Berlin. Kemper compared East and West Berlin before and after reunification and found different degrees of segregation in these two areas. Finally, Masías et al. [29] find four clusters with different age group distributions in the city. Through the application of dynamic analysis, our study confirms the existence of age-group segregation, which materialises in the four clusters we have found. The maps we present do not show idealized concentric zones, as suggested by earlier studies such as Nakagawa’s, but more complex-shaped clusters. We find that older people are concentrated in the peripheral areas of Berlin, spatially surrounding the other groups within the city, as seen in the maps provided. We also identified areas where young adults are found and clusters where children are over-represented. The standardized size of the clusters does not change significantly over time, which suggests that the spatial areas these clusters occupy remain relatively stable. This is highly consistent with the observation of Nakagawa, who stated that “residential segregation by age group is a very real phenomenon” [49, p. 134]. Our results show in greater detail that residential age segregation is present in Berlin.
-
On socio-economic residential segregation: We observed that the ranking of the variables remained stable, i.e. in the same ranking position in all clusters during the years studied. Compared to previous research, some similarities can be observed in the locations with the highest rate of people claiming state subsidies (see the maps published by Blokland [27], Fig. 13.3 on p. 257). We also observed a qualitative change in the 2020 map, where cluster areas take on new shapes. The results of the method show that, at least visually, there are socio-economically disadvantaged areas that expanded only in the years 2009 and 2020, which is reflected in the size of the clusters. We should bear in mind that 2009 fell within the subprime financial crisis and that in 2020 the economy was under the stress of the COVID-19 outbreak. We believe that the change in cluster shapes may be related to the global COVID-19 pandemic, when many individuals in Berlin started to apply for social welfare. However, more research is needed to link this qualitative observation to a cause-effect relationship.
-
On residential segregation by gender: We found changes in the variables describing population densities by gender. The data analysis shows 3 clusters representing different densities of male and female individuals. However, we observe cluster densities that reflect a slight imbalance between females and males. Finally, the normalized cluster size does not vary significantly over time, which means that the spatial areas of the clusters have neither shrunk nor expanded over the observation period. Kröhnert and Vollmer [15] have argued that women from rural areas in Germany migrate to large cities such as Berlin more than men, who remain in rural areas. Under this hypothesis, one might expect the emergence of clusters formed by female internal migrants, a phenomenon that is still unknown to us. However, the variation in high, medium and low population density described by the clusters seems to reflect the variation in population density as a whole. Some changes are numerically small, but qualitatively significant for monitoring the expansion of gender residential segregation observed in other geographical regions (e.g. for examining population sex ratios in China and Saudi Arabia over time and space). Perhaps because the sex ratios in Germany are mostly balanced, the phenomenon can be observed when comparing rural areas with urban areas or between eastern and western Germany, i.e. when looking at data at the country level.