Introduction

Landslides pose a major hazard to human activities and often cause substantial economic losses and property damage (Hong et al. 2007; Nadim et al. 2006). In a strict sense, a landslide is the movement of a mass of rock, debris or soil down a slope under gravity. A variety of movements is associated with landslides, such as flowing, sliding (translational and rotational), toppling or falling. Many landslides exhibit a combination of two or more types of movement, resulting in a complex type (Varnes 1984). They are triggered by a number of external factors, such as intense rainfall, earthquake shaking, water level change, storm waves and rapid stream erosion (Dai et al. 2002). In addition, extensive human interference in hill slope areas, such as road construction, urban expansion along the hill slopes, deforestation and rapid change in land use, contributes to instability. This makes it difficult, if not impossible, to define a single methodology to identify and map landslides, to ascertain landslide hazards, and to evaluate the associated risk (Guzzetti et al. 2005). It thus necessitates a detailed understanding of the physical process of landslides, including historical information on their occurrence. Growing environmental concern in recent years has resulted in a range of quantitative landslide hazard and risk assessment studies (Alexander 2008; Carrara and Pike 2008). The assessment of landslide hazard has become an important task for various interest groups comprising technocrats, planners and others, mainly due to an increased awareness of the socio-economic significance of landslides (Devoli et al. 2007). So far, a number of methods have been proposed for quantitative landslide spatial probability mapping, e.g. discriminant analysis (Baeza and Corominas 2001), likelihood ratio (Chung and Fabbri 2003), artificial neural networks (ANN; Kanungo et al. 2006) and logistic regression (Das et al. 2010; Lee and Pradhan 2007).
However, actual methodological developments for quantitative hazard analysis have been scarce, particularly at medium scales (Guzzetti et al. 1999; Van Westen et al. 2006). Except for a limited number of studies (Guzzetti et al. 2005; Hong et al. 2007; Zezere et al. 2004), most of the methods proposed as landslide hazard models can best be classified as susceptibility models, as they only provide an estimate of where landslides are expected (Guzzetti et al. 2005).

Varnes (1984) was the first to propose the definition of landslide hazard as ‘the probability of occurrence within a specified period of time and within a given area of a potentially damaging phenomenon’. This definition includes two parameters: the geographical locations (where) and the recurrence between events (when) of the landslides. Later the magnitude of the event was added to the definition of landslide hazard by Aleotti and Chowdhury (1999) and Guzzetti et al. (1999). Quantifying landslide hazard thus necessitates the determination of magnitude probability, along with spatial probability (susceptibility) and temporal probability. We notice that estimating where and when landslides will occur is comparatively straightforward, whereas estimating magnitude is difficult. This is because, unlike other natural hazards such as floods, cyclones and earthquakes, which are controlled by rainfall, wind speed and ground motion respectively, landslides lack a spatially continuous magnitude measurement parameter. It may be argued that landslide magnitude is a function of the momentum, which includes both mass (volume and density) of the landslide material and the expected velocity. However, volume and velocity are difficult to evaluate in a medium scale for large areas. Nevertheless, landslide area can be precisely determined from a multi-temporal inventory map (Guzzetti et al. 2005). Therefore, landslide area can act as a good approximation for landslide magnitude.

Landslide susceptibility mapping aims to differentiate a land surface into homogeneous areas according to their probability of failure caused by mass movements (Varnes 1984). To achieve this objective at medium scales, terrain mapping units (TMU) are generated to evaluate the potential for landslide occurrence in an area on the basis of such homogeneous conditions (Carrara et al. 1995; Pasuto and Soldati 1999; Soeters and van Westen 1996; Carrara et al. 1991). Mapping units are also commonly used in landslide hazard assessment. They are, in principle, homogeneous internally and heterogeneous externally. With the increasing sophistication of GIS, TMU are derived either automatically from a combination of geo-environmental factors or semi-automatically using expert knowledge (van Westen et al. 1997). As they are generated independently, i.e. without incorporation of landslide occurrences, TMU fall short of representing actual homogeneous susceptible areas. Instead, they can at best represent homogeneous terrain conditions with respect to certain geo-environmental factors that control landslides. Furthermore, combining multiple geo-environmental factors to generate TMU can result in an uncontrolled number of units. Therefore, segmentation-based homogeneous susceptible units (HSU), which can be generated automatically from a susceptibility map using a region-growing algorithm, may be considered as an alternative to TMU. The HSU can address the inherent homogeneity conditions of factors with respect to landslides, and can suitably replace the TMU for hazard assessment.

The aim of this study is to develop and apply a quantitative methodology for landslide hazard assessment using HSU. We derive the HSU automatically from a grid-based susceptibility map using a region-growing algorithm and an optimal size factor. The temporal and size probabilities are multiplied with spatial probability to obtain a quantitative estimate of landslide hazard for each HSU. We test the methodology using a multi-temporal landslide inventory in a national highway corridor in the Himalayan region.

Methods

A probabilistic landslide hazard assessment procedure demands the determination of three distinct components of hazard, namely spatial, temporal and size probability (Guzzetti et al. 2005). The spatial probability is generally calculated by considering landslide locations and their spatial interaction with geo-environmental factors. It is a measure of the spatial locations where landslides may occur in the future. Similarly, temporal probability expresses the frequency of occurrence of landslides in a given period. The size probability of landslides is determined from a sufficiently complete landslide record, and indicates the probability that a landslide of a particular size will occur. Mathematically, if the probability of the size of a landslide is denoted by \( {P_{{A_L}}} \) and the probability of occurrence of a landslide in a period t by \( {P_t} \), in an HSU at a spatial location i with spatial probability \( {P_i} \), then the joint probability of landslide hazard (H) can be represented as

$$ H = {P_i} \times {P_t} \times {P_{{A_L}}} $$
(1)
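
As a minimal sketch of Eq. 1, the product can be evaluated per HSU as shown below. The function name and the input values are hypothetical and are not taken from the study; the product form treats the three components as independent.

```python
def landslide_hazard(p_spatial, p_temporal, p_size):
    """Joint hazard H = P_i * P_t * P_AL for one HSU (Eq. 1)."""
    for name, p in (("p_spatial", p_spatial),
                    ("p_temporal", p_temporal),
                    ("p_size", p_size)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return p_spatial * p_temporal * p_size


# Hypothetical probabilities for a single unit (illustrative only)
h = landslide_hazard(0.8, 0.5, 0.25)
```

Because each factor lies between 0 and 1, H can never exceed the smallest of the three probabilities.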

Spatial probability (susceptibility) modelling and generation of HSU

The spatial probability of landslide occurrence can be modelled as the probability that a particular area will be affected by landslides, given a set of environmental conditions. In statistical techniques, such as logistic regression models, the occurrence of landslides is considered as a discrete and dichotomous response variable, and the geo-environmental factors that influence it as explanatory variables; for more details see Das et al. (2010). A logistic regression in a Bayesian framework uses three key components, the prior distribution, the likelihood function and the posterior distribution, for estimating the regression parameters. The Bayesian logistic regression model used for the calculation of regression coefficients takes the following form:

$$ \begin{array}{*{20}{c}} {{P_i} = P({y_i} = 1)} \\{{y_i}\sim {\hbox{Bernoulli}}({\hbox{logi}}{{\hbox{t}}^{ - 1}}({\eta_i}))} \\{{\eta_i} = {\beta_0} + \sum\nolimits_{j = 1}^k {{\beta_j}{x_{ij}}} } \\{{\beta_j}\sim N(0,0.00001),j = 0, \ldots, k} \\\end{array} $$
(2)

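where \( {y_i} \) represents the response variable, the \( {\beta_j} \) are coefficients having independent normal prior distributions with a very high variance (in WinBUGS notation the second parameter of the normal distribution is the precision, so a value of 0.00001 corresponds to a very large variance), \( {x_{ij}} \) represents the value of the jth variable at the ith location and \( {\eta_i} \) is the linear predictor.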

Using the Bayes formula, the posterior distribution of the parameters β under this model is given by:

$$ \pi \left( {\beta |y,X} \right)\, \propto \,\prod\limits_{j = 0}^k {\Pr \left( {{\beta_j}} \right)} \times \prod\limits_{i = 1}^n {\Pr \left( {{y_i}|{\eta_i}} \right)} $$
(3)

where \( \beta = ({\beta_0},{\beta_1}, \ldots ,{\beta_k}) \), \( y = ({y_1},{y_2}, \ldots ,{y_n}) \), and \( X = \left[ {{x_{ij}}} \right],i = 1,2, \ldots, n,j = 1,2, \ldots, k \).

This is an extension of the Bayesian formula \( f\left( {\theta |y} \right)\, \propto \,g(\theta ) \times L\left( {y|\theta } \right) \), which states that the posterior distribution is proportional to the product of the prior distribution and the likelihood function. The regression coefficients \( {\beta_j} \) are taken as the means of the parameter posterior estimates and are interpreted, for each variable, as in an ordinary logistic regression. The analysis was carried out in WinBUGS programme 3.0.3. The GLM programme with the logit link function in WinBUGS was used for the Bayesian logistic regression analysis of the data. To assess the prediction rate of the model, a receiver operating characteristic (ROC) curve analysis was carried out. The ROC curve represents the trade-off between sensitivity and specificity (Gorsevski et al. 2000). Sensitivity is the probability that a landslide cell is correctly classified, i.e. the true-positive rate, whereas 1-specificity is the false-positive rate.
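
The area under the ROC curve can be computed directly from predicted probabilities and observed labels. The sketch below uses the rank-sum (Mann-Whitney) identity rather than tracing the curve point by point; it is an illustrative stand-in, not the software used in the study, and the function name and sample values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    scores: predicted probabilities; labels: 1 = landslide, 0 = non-landslide.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = 0.5 * (i + j) + 1.0  # average 1-based rank of the tie group
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j + 1])
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical cells: two landslide and two non-landslide
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The value equals the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell, which is why 0.5 corresponds to a model with no discriminating power.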

Generation of HSU through segmentation

Segmentation is the process of dividing a raster image or map into objects or regions based on the homogeneity of adjacent pixels. It can be done using various techniques, such as density slicing, region growing and split and merge (Kerle and de Leeuw 2009). We carried out multiresolution segmentation using a region-growing algorithm in Definiens Developer, which is guided by scale and shape parameters (Definiens 2009). Dividing a susceptibility map into an optimal number of HSU is a challenge, since such optimisation should satisfy homogeneity conditions that are in practice highly variable. In a strict sense, true homogeneity in nature is almost impossible. Multiresolution segmentation generates segments of different sizes and lets the user choose the optimal segment size (Baatz and Schape 2000). To choose such an optimal segment size objectively, Espindola et al. (2006) proposed an objective function for measuring the quality of the resulting segments. We therefore created segments with scale parameters ranging from 10 to 50. The scale parameter controls the maximum allowed heterogeneity of the objects during segmentation, so that a lower scale results in a higher number of segments. To assess the quality of the segments and decide upon the optimal segment size, an independence test can be carried out using Moran’s I autocorrelation index and intrasegment variance analysis. The objective function aims at maximizing intrasegment homogeneity and intersegment heterogeneity.

The intrasegment homogeneity is calculated by a weighted average variance formula:

$$ v = \frac{{\sum\limits_{i = 1}^n {{a_i} \cdot {v_i}} }}{{\sum\limits_{i = 1}^n {{a_i}} }} $$
(4)

where \( {v_i} \) is the variance and \( {a_i} \) is the area of region i. The intrasegment variance v is a weighted average, where the weights are the areas of the regions.

To calculate the intersegment heterogeneity, Moran’s I autocorrelation index was used to calculate the spatial autocorrelation of a segment with adjacent segments. For each region, the algorithm calculates its mean grey value and the relationship with adjacent regions. Moran’s I can be expressed as:

$$ I = \frac{N}{{\sum\limits_i {\sum\limits_j {{w_{ij}}} } }}\frac{{\sum\limits_i {\sum\limits_j {{w_{ij}}({X_i} - \overline X )({X_j} - \overline X )} } }}{{\sum\limits_i {{{({X_i} - \overline X )}^2}} }} $$
(5)

where N is the total number of regions, \( {w_{ij}} \) is the measure of spatial adjacency between segments i and j, \( {X_i} \) and \( {X_j} \) are the index values of segments i and j, and \( \overline X \) is the mean of these values over all segments. Therefore, Moran’s I represents how, on average, the mean value of each region differs from the mean values of its neighbours. The objective function thus combines the variance measure and the autocorrelation measure using a normalisation procedure (Espindola et al. 2006):

$$ F(v,I) = F(v) + F(I) $$
(6)
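
The selection of the optimal scale parameter from Eqs. 4 to 6 can be sketched as follows. The normalisation used here, \( F(x) = ({x_{\max }} - x)/({x_{\max }} - {x_{\min }}) \), is an assumption about the form of the procedure of Espindola et al. (2006); the function names and the sample numbers are hypothetical.

```python
def weighted_variance(areas, variances):
    """Area-weighted mean of the per-segment variances (Eq. 4)."""
    return sum(a * v for a, v in zip(areas, variances)) / sum(areas)


def morans_i(values, weights):
    """Moran's I (Eq. 5); `weights` maps (i, j) index pairs to w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / sum(weights.values())) * cross / sum(d * d for d in dev)


def normalise(x, x_min, x_max):
    """Assumed normalisation: the lowest raw value scores 1, the highest 0."""
    return (x_max - x) / (x_max - x_min)


def best_scale(variance_by_scale, moran_by_scale):
    """Return (scale, F) maximising F(v, I) = F(v) + F(I) (Eq. 6)."""
    v_min, v_max = min(variance_by_scale.values()), max(variance_by_scale.values())
    i_min, i_max = min(moran_by_scale.values()), max(moran_by_scale.values())
    scores = {
        s: normalise(variance_by_scale[s], v_min, v_max)
        + normalise(moran_by_scale[s], i_min, i_max)
        for s in variance_by_scale
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical v and I values for three candidate scale parameters
scale, score = best_scale({10: 1.0, 20: 2.0, 30: 4.0},
                          {10: 0.6, 20: 0.2, 30: 0.1})
```

Under this normalisation each term lies between 0 and 1, so the objective function cannot exceed 2. Small scales favour the variance term and large scales favour the autocorrelation term, so the maximum usually falls at an intermediate scale.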

Temporal probability of landslides

Provided no significant changes occurred to a natural system, the past is the key to the present. Historical landslide inventories in a time series can give insight into hidden trends in the probability scale for the occurrence of a hazardous event in a particular time frame. Landslides, being highly discrete, can be considered as independent random point events that occur in time. A Poisson model is commonly used to investigate the occurrence of naturally occurring random point events in time (Corner and Hill 1995; Crovelli 2000; Coe et al. 2004). Considering landslides as such, this model has been used to determine the exceedance probability of landslides in time (Coe et al. 2004). Assuming landslide frequency to follow a Poisson model, the probability of experiencing n landslides during time t is given by:

$$ P\left[ {N(t) = n} \right] = {e^{( - \lambda t)}}\frac{{{{(\lambda t)}^n}}}{{n!}}\quad n = 0,1,2, \ldots . $$
(7)

where \( N(t) \) is the total number of landslides that occur during a time t, and λ is the average rate of occurrence of landslides.

Here, time t is specified, whereas the rate λ is to be estimated from empirical records. In fact, λ can be estimated from a historical catalogue of landslide events, or from a multi-temporal landslide inventory.

Hence, Guzzetti et al. (2005) derived the probability of experiencing one or more landslides during time t (i.e. the exceedance probability) as:

$$ {P_t} = P\left[ {N(t) \geqslant 1} \right] = 1 - \exp ( - \lambda t) $$
(8)

In our study, the temporal probability was calculated for each individual HSU on the basis of the frequency of landslide occurrences over the 28-year period (1982–2009) in the landslide inventory. For each HSU, we obtained landslide recurrence values, i.e. the expected time between successive failures, based on past events.
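
A minimal sketch of Eqs. 7 and 8 for a single unit is given below. It assumes that λ is estimated as the number of recorded events divided by the length of the record; the event count used here is hypothetical and not taken from the inventory.

```python
import math


def poisson_probability(n, rate, t):
    """P[N(t) = n] for a Poisson process with mean rate `rate` (Eq. 7)."""
    return math.exp(-rate * t) * (rate * t) ** n / math.factorial(n)


def exceedance_probability(rate, t):
    """P[N(t) >= 1] = 1 - exp(-rate * t) (Eq. 8)."""
    return 1.0 - math.exp(-rate * t)


def rate_from_inventory(n_events, record_years):
    """Assumed estimator: mean number of events per year in the record."""
    return n_events / record_years


# Hypothetical HSU with 4 recorded events in the 28-year record
rate = rate_from_inventory(4, 28)
recurrence_years = 1.0 / rate            # mean time between failures
p_next_7_years = exceedance_probability(rate, 7.0)
```

With a recurrence interval equal to the time horizon, the exceedance probability is 1 − 1/e, or roughly 63%, which shows that a landslide within one recurrence interval is likely but far from certain.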

Size probability of landslides

The probability that a landslide of a given size occurs can be estimated using frequency–area relationships. Several recent studies have determined the probability of landslide magnitude (area or volume) using frequency–area or frequency–volume statistics of landslides (Malamud et al. 2004). The probability of landslide size, in terms of a landslide affecting an area greater than or equal to a given size, can be modelled using probability density functions, as suggested by Malamud et al. (2004). They showed that the mean area of landslides triggered by an event is approximately independent of the event size. Guzzetti et al. (2005) used the same method for a multi-temporal inventory map covering 45 years of landslide data to calculate the probability of landslide size. For this study, a similar method was used to estimate the probability of landslide area in each class, considering the 28-year historical landslide record. This is expressed as:

$$ p({A_L}) = \frac{1}{{{N_{LT}}}}\frac{{\delta {N_L}}}{{\delta {A_L}}} $$
(9)

where \( p({A_L}) \) is the probability density of landslide area, \( \delta {N_L} \) is the number of landslides with an area between \( {A_L} \) and \( {A_L} + \delta {A_L} \), and \( {N_{LT}} \) is the total number of landslides. A scatter plot with landslide area in square kilometres on the x-axis and probability density on the y-axis gives an empirical estimate of the probability distribution on the basis of an existing dataset. The probability density function of the landslide area correlates strongly with a power-law distribution of the type:

$$ p({A_L}) = k{({A_L})^{ - \beta }} $$
(10)

where k and β are constants and β is the power-law scaling exponent. Using Eq. 10, the probability that a landslide has an area exceeding \( {a_L} \), i.e. \( {P_{{A_L}}} = P\left[ {{A_L} \geqslant {a_L}} \right] \), is given by:

$$ {P_{{A_L}}} = \int\limits_{{a_L}}^\infty {p({A_L})\delta {A_L}} = \int\limits_{{a_L}}^\infty {\left[ {k{{({A_L})}^{ - \beta }}} \right]} \delta {A_L} $$
(11)

Use of Eq. 9 requires the catalogue of landslide areas from which the distributions are derived to be substantially complete.
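
Eqs. 9 to 11 can be illustrated with the sketch below. For β > 1 the integral in Eq. 11 has the closed form \( {P_{{A_L}}} = k\,a_L^{1 - \beta }/(\beta - 1) \), which is what the second function evaluates. The binning scheme, function names and sample values are hypothetical; the fitting of k and β to the empirical densities is not shown.

```python
def empirical_density(areas, bin_edges):
    """Probability density per area bin (Eq. 9): count / (N_LT * bin width)."""
    n_total = len(areas)
    densities = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        count = sum(1 for a in areas if lo <= a < hi)
        densities.append(count / (n_total * (hi - lo)))
    return densities


def size_exceedance(a_l, k, beta):
    """Closed form of Eq. 11 for p(A_L) = k * A_L**(-beta), valid for beta > 1."""
    if beta <= 1.0:
        raise ValueError("the integral only converges for beta > 1")
    if a_l <= 0.0:
        raise ValueError("a_l must be positive")
    return k * a_l ** (1.0 - beta) / (beta - 1.0)


# Hypothetical landslide areas and bins, in arbitrary area units
dens = empirical_density([1.0, 2.0, 2.0, 3.0], [0.0, 2.0, 4.0])
# Hypothetical power-law parameters
p_exceed = size_exceedance(2.0, k=1.0, beta=2.0)
```

A pure power law is not a proper density down to zero area, so the closed form can return values above 1 for very small \( {a_L} \). It is therefore meaningful only above the lower cutoff of the fitted range.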

Study area and landslide characterisation

The study area lies between 30°47′29″ and 30°54′45″ N latitude and 78°37′41″ and 78°44′03″ E longitude in the northern Himalayas, India, in the catchment of the river Bhagirathi, a tributary of the river Ganges (Fig. 1). The area is traversed by a national highway corridor leading to the famous Gangotri shrine in the interior Himalaya (Agarwal and Kumar 1973). The study area, a 12-km long road corridor with a total area of 8.88 km², was selected so that any landslide occurring in it affects the road. The area experiences a subtropical temperate climate throughout the year because of its high altitude. Average temperature ranges between 30°C in summer and below 5°C in winter, with December and January being the coldest months with occasional snowfall. Elevation in the area ranges between 1,550 and 2,100 m with a high relative relief; the average elevation is around 1,900 m.

Fig. 1 Location and extent of the study area, which lies between 30°47′29″ N and 30°54′45″ N latitude and 78°37′41″ E and 78°44′03″ E longitude, as depicted on a hill shade image generated using a DTM derived from a Cartosat-1 satellite image. The highway runs along the Bhagirathi River in the Himalayas, India. Heights of the places are measured in metres above mean sea level using DGPS survey

Rainfall information for the last three decades, between 1982 and 2009, shows that the highest (1,900 mm) and lowest (600 mm) annual rainfall occurred in 2003 and 1991, respectively, with an annual average of approximately 1,200 mm (Vinod Kumar et al. 2008). The area receives heavy precipitation during the summer months from mid-June to mid-October and moderate rainfall during the winter months from January to March (Fig. 2). In the Himalayan region, landslides recur annually and are prominent during the summer months between June and October, when the seasonal monsoon occurs. Landslides in this area are the result of a combination of geotectonics, adverse natural topography, such as steep slopes, weathered rocks and soils, human influences on the topography and high rainfall (Choubey and Ramola 1997; Saha et al. 2005).

Fig. 2 Mean monthly rainfall values (left y-axis) and percentages (right y-axis) for the period between 1982 and 2009 for the Bhatwari rain gauge station, 1,550 m above mean sea level

Site characteristics

Detailed mapping of the study area was carried out using satellite images and multiple field surveys to ascertain the nature of the terrain and the factors influencing landsliding, which vary strongly throughout the world (Ayalew and Yamagishi 2005; Karsli et al. 2009). In the Himalayan terrain, rock strength and geological structures play a major role in landslide activity. The rock types in the study area include low to high grade metamorphics (green-schist to upper amphibolite facies) which have been deformed repeatedly (Naithani et al. 2009). The dominant rock types in the area include low grade metamorphic rocks such as chlorite schist, schistose quartzite and quartz mica schist, along with high grade migmatites and gneisses. Geotechnical investigations were carried out in the area using the slope stability probability classification (SSPC) method (Das et al. 2010). The entire road stretch was divided into 32 uniform slope sections based on the attitude of bedding, slope angle and rock types, and the geotechnical data were collected quantitatively for determining the rock mass parameters required in the SSPC system (Das et al. 2010). Rock mass properties, such as intact rock strength (IRS), discontinuity spacing and condition, were tabulated in the field. The IRS computed for the entire slope section varies between 50 and 200 MPa and the corresponding cohesion of the rock mass varies between 9 and 29 kPa (Table 1). Gneisses of different kinds constitute 87% of the total study area. Twenty field measurements taken in the gneissic areas showed that the IRS varies between 50 and 200 MPa, and the cohesion of the rock mass for the same locations varies between 18 and 29 kPa. Detailed assessment showed that the IRS varies due to compositional changes and is higher in migmatite and biotite gneisses compared with the calc-silicate and augen gneisses.
This may be because of the spacing and orientation of the joints present in these rocks and the degree of weathering in each rock type. The rocks are jointed, and four sets of joints are present in the gneisses, with dominant dip directions of 30°, 120°, 140° and 210° from north. Six measurements each were taken in the schist and quartzite areas. The quartzites, white to buff grey/green in colour, are dominantly thinly bedded and contain three to five sets of joints (Das et al. 2010). Their IRS varies between 50 and 150 MPa and the cohesion is between 15 and 27 kPa. Similarly, in schists, IRS varies between 10 and 100 MPa, with cohesion between 10 and 20 kPa. Based on the intact rock strength and other geotechnical parameters, rocks in the area were divided into eight sub-types, namely augen gneiss, biotite gneiss, calc-silicate gneiss, migmatite gneiss, quartzite, chlorite schist, quartz mica schist and schistose quartzite (Fig. 3). Table 2 presents the landslide densities computed for all the geo-environmental variables. For the factor lithology, the class representing calc-silicate gneiss has the highest landslide density. Landslide density is also high in classes such as quartz mica schist and schistose quartzite. This is one indication that the rock mass parameters of these lithologies may be favourable for landsliding.

Table 1 Intact rock strength measured for 32 slope sections in the field and the corresponding rock mass cohesion derived using the SSPC method

Fig. 3 Geological map of the study area showing the lineaments and the eight categories of litho-types identified through rock mass characterization

Table 2 Landslide densities computed for the geo-environmental factor maps used for landslide susceptibility assessment

Terrain derivatives, such as slope gradient and slope aspect, are frequently calculated from elevation information contained in digital elevation models, which has been well documented (Ohlmacher and Davis 2003; Ayalew and Yamagishi 2005; Moore et al. 1991; Guzzetti et al. 2005). We used the photogrammetric software SAT-PP (Zhang and Gruen 2006) to extract for the study area a digital surface model, i.e. a model that includes also above-ground features, from Cartosat-1 stereo data, which has recently been shown to be an accurate source of elevation information (Martha et al. 2010b). This was converted into a digital terrain model (DTM), which in our case only meant removal of vegetation clusters, using the procedure described by Martha et al. (2010a). We derived slope and aspect maps from a topographically corrected 10 × 10 m DTM, using standard ArcGIS functions. Slope angles and aspect values were divided into six and eight classes, respectively, following slope classifications used in other studies (Anbalagan 1992; Kanungo et al. 2006; Das et al. 2010). The slope class (>35–45°) has the highest landslide density in the study area (Table 2). The highest landslide density was observed on slopes with southern aspect, followed by south-west aspect (Table 2).

Road construction severely alters slope stability in hilly areas, increasing the susceptibility to slope instability and landslides (Chakraborty and Anbalagan 2008). The best way to include the effect of a road section in a slope stability study is to create a buffer around it (Ayalew and Yamagishi 2005; Larsen and Parks 1997). The extent of slope cuttings due to road construction was mapped through field investigation. Landslide frequency in the study area was observed to be highest within about 100 m around the narrow road (average width of 7 m). Thus, a road buffer of 50 m was placed on either side of the road centre, marking the area likely to be influenced by cutting-related slope instability. Other landslide-influencing topographic parameters in the area, such as soil depth, terrain geomorphic units, land cover, drainage density and weathering conditions, were derived using ground surveys and interpretation of multi-temporal satellite images, as detailed in Das et al. (2010).

Landslide identification and mapping

A correct landslide database is the prerequisite for any kind of landslide study (Varnes 1984). A combination of various sources, means and methods has been suggested for landslide inventory mapping, as no single best method for landslide inventorization exists (Galli et al. 2008; van Westen et al. 2008). A detailed landslide inventory for susceptibility assessment mainly requires the following input data: the location of a landslide, its frequency, its potential causes and its type. Accurate landslide mapping and the collection of landslide data from reliable sources play an important role in precise landslide identification. The major organizations which keep updated records of landslides in the Indian Himalayan terrain are the Border Road Organization (BRO) and the Geological Survey of India. The landslide records of the BRO, compiled as digital catalogues between 1982 and 2009, were used in this study for preparing the inventory. The BRO catalogue consists of three types of records: (1) registry of landslides, a decadal report on each landslide hitting the road, (2) history of landslides, a quarterly report on significant landslides and (3) daily road stirrup, a report on the reasons of road blockage. All three types of records were checked simultaneously to compile the landslide database for the last 28 years, which consists of 380 records of landslide occurrences. The technical records of the BRO provided a detailed description of landslide location, morphometry, volume and date of occurrence. This helped us to identify the morphological imprints left by the landslide scars on the road corridor, leading to the detection of landslides. Compilation of the records also helped us to identify reactivated landslides with little difficulty, as well as the frequency of landslides occurring at a particular location.

Extensive field verification was carried out in consultation with the BRO to map the landslides in the study area. A total of 178 active landslides were mapped at the 1:10,000 scale. These were correlated with the BRO records in their digital catalogue of landslides for the road corridor, occurring along the cut slopes as well as on the natural slopes of the road corridor (Fig. 4). Slide events along these active sites were reported 332 times in the last 28 years, with a maximum of 60 occurrences in 1994 (Fig. 5). Landslides in the study area are mostly triggered by monsoon rainfall during July to October every year. The landslides were characterized according to their types of movement, the materials involved and the states or activities of the failed slopes (Cruden and Varnes 1996). This was done to understand the different geo-environmental factors that control different slope movement types. Field observations revealed that the area is dominated by rock and debris slides (Fig. 6). Accordingly, the landslides considered in this study are mainly translational rock slides and debris slides that are prominent in this area (Das et al. 2010). The materials involved in the majority of landslides are a mixture of rocks, pebbles, gravels and cobbles. Landslide bodies were mapped from crown to toe of rupture, leaving aside the runout zones, as the detachment zones (zones of depletion) are the true susceptible areas. The annual summer monsoon in the area during June to October triggers both fresh and reactivated landslides. Changes in the water level of the main stream, the Bhagirathi river, also influence toe cutting, resulting in a few landslides in the road corridor. Landslides on the cut slopes of the road corridor are smaller in size but occur frequently. A record of every landslide affecting the road corridor is logged by the BRO.
The mapped landslides cover an area of 0.45 km², corresponding to 5.6% of the total area (min, 125 m²; max, 40,500 m²; and mean, 3,967 m²). As the overall landslide density was low in the area, we considered all landslide types together for the susceptibility modelling.

Fig. 4 Landslide inventory map of the study area showing dominance of rock slides. The 178 landslide bodies experienced a total of 322 landslide events between 1982 and 2009

Fig. 5 Histogram showing the frequency of landslide occurrence (left y-axis) and percentages (right y-axis) for the period between 1982 and 2009

Fig. 6 Landslides occurring along the road corridor: (a) a large rock-cum-debris slide damaging the houses and the road; (b) a rock slide partially blocking the road

Hazard assessment

Landslide spatial probability (susceptibility) assessment

The landslide susceptibility map was created with a Bayesian logistic regression model using a grid-cell-based method. The geo-environmental variable and landslide maps were first rasterized into 10 × 10 m grids, and converted to ASCII format for input into the WinBUGS programme to create dummy variables. The landslide map was binarized (1 = ‘landslide’ and 0 = ‘non-landslide’) for model development. Prior to the implementation of the model, the landslide data were divided equally into training and testing samples using a manual, rationalized selection method. The regression model was run with landslides as the response variable and the geo-environmental factors as explanatory variables. The model yielded mean parameter posterior estimates, standard deviations and quantiles for the intercept and coefficients (Table 3).

Table 3 Posterior distribution summaries of parameter estimates of Bayesian logistic regression model for landslide occurrence with reference to significant geo-environmental variables

Analysis of the results indicated that several, but not all, of the categories of explanatory variables were significant contributors to the model. Out of the total of 53 categorical variable classes considered in the model (slope, 6; terrain units, 7; land cover, 9; soil, 4; aspect, 8; lithology, 8; lineament density, 3; weathering, 3; road buffer, 2; and drainage density, 3), 17 were found to contribute significantly (Table 3). Using the intercept and coefficients obtained from the Bayesian logistic regression model, a logit formula was created for the linear predictor \( {\eta_i} \), as detailed below, to calculate the landslide probability for each pixel, resulting in a landslide spatial probability map.

$$ \begin{gathered} {\eta_i} = - 1.531 + 0.239{\hbox{ slope }}( > 35^\circ - 45^\circ ) + 0.454{\hbox{ calc - silicate gneiss}} + ( - 1.182){\hbox{ biotite gneiss}} + ( - 0.453) \hfill \\{\hbox{migmatite gneiss}} + 0.436{\hbox{ schistose quartzite}} + 1.205{\hbox{ hogback type highly dissected structural hills}} + \hfill \\1.215{\hbox{ cuesta type moderately dissected denudational hills}} + (0.894){\hbox{ scrubland}} + ( - 0.763){\hbox{ river channel}} + \hfill \\( - 0.762){\hbox{ deep soil}} + ( - 2.546){\hbox{ shallow soil}} - 2.053{\hbox{ high weathering}} + 2.83{\hbox{ aspect (SE)}} + 2.013{\hbox{ aspect (S) + }} \hfill \\0.7{\hbox{ high drainage density}} + 0.529{\hbox{ high lineament density}} + 0.445{\hbox{ road buffer}} \hfill \\\end{gathered} $$
(12)
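
As a minimal sketch of how Eq. 12 turns into a per-cell probability, the snippet below applies the standard inverse-logit transform to the linear predictor. The coefficients are copied from Eq. 12; the dictionary keys and function names are illustrative labels chosen here and are not taken from the original WinBUGS analysis.

```python
import math

# Intercept and coefficients copied from Eq. 12. The keys are illustrative
# labels (assumptions), not identifiers from the original analysis.
INTERCEPT = -1.531
COEFFICIENTS = {
    "slope_35_45": 0.239,
    "calc_silicate_gneiss": 0.454,
    "biotite_gneiss": -1.182,
    "migmatite_gneiss": -0.453,
    "schistose_quartzite": 0.436,
    "hogback_structural_hills": 1.205,
    "cuesta_denudational_hills": 1.215,
    "scrubland": 0.894,
    "river_channel": -0.763,
    "deep_soil": -0.762,
    "shallow_soil": -2.546,
    "high_weathering": -2.053,
    "aspect_se": 2.83,
    "aspect_s": 2.013,
    "high_drainage_density": 0.7,
    "high_lineament_density": 0.529,
    "road_buffer": 0.445,
}


def linear_predictor(active_classes):
    """Sum the intercept and the coefficients of the classes present in a cell."""
    return INTERCEPT + sum(COEFFICIENTS[name] for name in active_classes)


def landslide_probability(active_classes):
    """Inverse-logit transform of the linear predictor for one grid cell."""
    eta = linear_predictor(active_classes)
    return 1.0 / (1.0 + math.exp(-eta))


# Example: a south-east facing cell inside the road buffer.
print(landslide_probability(["aspect_se", "road_buffer"]))
```

A cell in which none of the 17 significant classes is present is left with the intercept only, which corresponds to a baseline probability of roughly 0.18.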

Accuracy assessment of the model used and validation

The ROC curve (Fig. 7) shows that the area under the curve (AUC) is 0.86, which corresponds to an accuracy of 86% for the model developed using Bayesian logistic regression (BLR). The BLR model was validated with the 50% of the landslide cells that had been set aside for validation. The 2,254 landslide cells that were not used in developing the model, along with an equal number of randomly selected non-landslide cells, were used for the ROC curve analysis. This gives an AUC of 0.839, i.e. an accuracy of 83.9% (Fig. 7). The standard error of the ROC curve is less than 0.005 in all cases. The close agreement between the AUC values for the training and testing data supports the sampling procedure.

Fig. 7

ROC curves representing true positive rates (sensitivity) and false-positive rates (1-specificity) for the Bayesian Logistic regression (BLR) model. The area under the curve (AUC) is 0.860 and 0.839 for training and testing data, respectively
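
The AUC values quoted for Fig. 7 can be understood through the rank interpretation of the statistic: the AUC is the probability that a randomly chosen landslide cell receives a higher predicted probability than a randomly chosen non-landslide cell. The sketch below computes it that way in plain Python. It is an illustration only; the function name and the toy data are assumptions, and the software used for the original ROC analysis is not stated in this section.

```python
def roc_auc(labels, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    labels: 1 for landslide cells, 0 for non-landslide cells.
    scores: predicted landslide probabilities for the same cells.
    Tied scores count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with two landslide and two non-landslide cells.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of cells, which is acceptable for a balanced validation sample of a few thousand cells per class; a sort-based variant would be preferable for full rasters.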

Generation of HSU

The grid-cell-based susceptibility map, which holds a probability value for each cell, was used to generate the HSU by multiresolution segmentation in eCognition software (Definiens 2009). In eCognition, segmentation is controlled by scale (size), colour and shape, with shape being further divided into compactness and smoothness (Definiens 2009). Our primary aim was to generate segments that are internally homogeneous and distinguishable from their neighbours. In our study, scale parameters 10 and 50 resulted in 1191 and 56 segments, respectively. The optimal scale parameter was determined from Moran’s I autocorrelation index and variance analysis, which were calculated for each of the scale parameters. The normalised values of the objective function were plotted to identify the optimal scale factor that controls segment size (Fig. 8). The highest objective function value (1.24) corresponded to segments with scale parameter 21. This indicates the optimal combination of intrasegment homogeneity and intersegment heterogeneity (Espindola et al. 2006). Using this optimal scale parameter, the susceptibility map was divided into 315 statistically independent HSU (Fig. 9).

Fig. 8

Objective function derived from Moran’s I autocorrelation index and weighted average variance method. The optimal size factor was found to be 21

Fig. 9

Landslide susceptibility map segmented into 315 homogenous susceptible units (HSU) depicting the probability values in the range of 0.0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, and 0.8–1.0
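
The scale-selection step can be sketched as follows. The code assumes the formulation commonly attributed to Espindola et al. (2006): an area-weighted variance measures intrasegment homogeneity, Moran’s I of the segment means measures intersegment autocorrelation, and both are normalised across the candidate scales and summed. The exact normalisation used in this study is not given in the text, so the `normalise` function, all names and the toy numbers are assumptions.

```python
def weighted_variance(areas, variances):
    """Area-weighted mean of the within-segment variances."""
    return sum(a * v for a, v in zip(areas, variances)) / sum(areas)


def morans_i(means, neighbour_pairs):
    """Moran's I of segment means with binary adjacency weights.

    neighbour_pairs lists each pair of adjacent segments once, as index tuples.
    """
    n = len(means)
    mean_all = sum(means) / n
    dev = [m - mean_all for m in means]
    # Each listed pair contributes twice because the weights are symmetric.
    cross = 2.0 * sum(dev[i] * dev[j] for i, j in neighbour_pairs)
    weight_sum = 2.0 * len(neighbour_pairs)
    return n * cross / (weight_sum * sum(d * d for d in dev))


def normalise(values):
    """Rescale so that the lowest raw value scores 1 and the highest scores 0."""
    lo, hi = min(values), max(values)
    return [(hi - v) / (hi - lo) for v in values]


def objective(variances_by_scale, morans_by_scale):
    """Sum of the two normalised terms for each candidate scale parameter."""
    return [
        fv + fi
        for fv, fi in zip(normalise(variances_by_scale), normalise(morans_by_scale))
    ]


# Toy example: three candidate scales with hypothetical statistics.
scores = objective([1.0, 2.0, 3.0], [0.5, 0.2, 0.1])
print(scores.index(max(scores)))  # position of the best candidate scale
```

Low variance and low Moran’s I both raise the score, so the maximum marks the scale at which segments are internally uniform yet differ from their neighbours.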

Temporal probability of landslides

From the mean recurrence interval of landslides in each HSU between 1982 and 2007, assuming that the rate of future slope failures will remain unchanged and adopting a Poisson probability model (Eq. 8), we computed the exceedance probability of having one or more landslides in each mapping unit for 1-, 5- and 10-year return periods (Fig. 10a–c). Similar maps can be prepared for any period. The highest values of temporal probability for the 1-, 5- and 10-year return periods are 0.826, 0.998 and 0.9999, respectively. As expected, the probability of having one or more slope failures increases with time.

Fig. 10

Temporal probability of landslides in each homogeneous unit for a 1-year (a), 5-year (b) and 10-year (c) recurrence period
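
The temporal probabilities mapped in Fig. 10 follow from the standard Poisson exceedance formula, P = 1 − exp(−t/μ), where μ is the mean recurrence interval in a unit and t is the period considered. The sketch below assumes that Eq. 8 has this form, which is not reproduced in this section; the function name and the example counts are hypothetical.

```python
import math


def exceedance_probability(n_landslides, record_years, horizon_years):
    """Probability of one or more landslides within horizon_years (Poisson model)."""
    # Mean annual rate, i.e. the reciprocal of the mean recurrence interval.
    rate = n_landslides / record_years
    return 1.0 - math.exp(-rate * horizon_years)


# Hypothetical unit with 13 recorded landslides in a 26-year record.
for years in (1, 5, 10):
    print(years, round(exceedance_probability(13, 26, years), 4))
```

A unit with no recorded landslides has a rate of zero and therefore a probability of zero for every period, which is the limitation of frequency-based temporal models noted in the discussion.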

Poisson model validation

The temporal model was checked for consistency in predicting landslide occurrence. To validate the results, the probability values were compared with field data on landslide occurrence for the years 2008 and 2009. For the 1-year scenario, 81% of the failures in 2008 occurred in the higher probability zones (0.6–1.0), whereas 60% of the failures in 2009 occurred in these units. For the 5-year scenario, however, 91.5% of the slope failures fell in the high to very high temporal probability zones.

Probability of landslide size (area)

To calculate the probability of landslide area, the same 28-year inventory database was used. We obtained the area of each landslide polygon. Care was taken to calculate the exact size of each landslide, avoiding topological and graphical problems related to the presence of smaller landslides inside larger mass movements (Guzzetti et al. 2005). Figure 11a shows the probability density function of landslide areas in the Himalayan road corridor. We obtained the estimate using the inverse-gamma function of Malamud et al. (2004) (Eq. 9) and found the rollover of the distribution at 800 m², which represents the smaller landslides (Fig. 11a). Analysis of our landslide database also showed that landslides with an area of more than 5,000 m² make up less than 20% of the total number. To calculate the hazard, we derived exemplary size probabilities for landslides of 800 m² and 5,000 m². Figure 11b shows the probability that a landslide in the study area will exceed a given area; the probabilities of exceeding 800 m² and 5,000 m² were calculated as 0.78 and 0.21, respectively. We also considered a third scenario covering landslides of all sizes, for which the probability is 1.0. We demonstrate the hazard calculation for these three size probabilities, but it can be carried out for any particular size.

Fig. 11

Probability density (a) and probability (b) of landslide area in the Himalayan road corridor, using an inverse-gamma function (Malamud et al. 2004). The probabilities that a landslide will have an area exceeding 800 m² and 5,000 m² are 0.78 and 0.21, respectively
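
The size-probability step can be illustrated with the three-parameter inverse-gamma density of Malamud et al. (2004), with shape ρ, scale a and shift s. The fitted parameters of this study are not listed in this section, so the values below are hypothetical and were chosen only so that the rollover falls at 800 m²; the resulting exceedance probabilities are therefore not the 0.78 and 0.21 reported above. Function names are likewise assumptions.

```python
import math


def inverse_gamma_pdf(area, rho, a, s):
    """Three-parameter inverse-gamma density; zero at or below the shift s."""
    x = area - s
    if x <= 0.0:
        return 0.0
    return (1.0 / (a * math.gamma(rho))) * (a / x) ** (rho + 1.0) * math.exp(-a / x)


def rollover_area(rho, a, s):
    """Area at which the density peaks (the mode of the distribution)."""
    return a / (rho + 1.0) + s


def exceedance(area, rho, a, s, upper=1e12, steps=20000):
    """P(landslide area >= area), by trapezoidal integration in log-area."""
    lo, hi = math.log(area), math.log(upper)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        # Change of variable A = exp(u), so dA = A du.
        f = inverse_gamma_pdf(math.exp(u), rho, a, s) * math.exp(u)
        total += f if 0 < k < steps else 0.5 * f
    return total * h


# Hypothetical parameters (not the fitted values of this study), areas in m².
RHO, A, S = 1.4, 1920.0, 0.0
print(rollover_area(RHO, A, S))
print(exceedance(800.0, RHO, A, S), exceedance(5000.0, RHO, A, S))
```

Integrating in log-area keeps the grid dense around the rollover and sparse in the long power-law tail, which is why a fixed number of steps suffices over many orders of magnitude.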

Hazard assessment

Figure 12 shows examples of the landslide hazard assessment obtained by multiplying the spatial, temporal and size probabilities for each HSU. The figure portrays landslide hazard for the Himalayan road corridor for nine different conditions, i.e. for three return periods (1, 5 and 10 years) and three landslide sizes: (1) ≥5,000 m², (2) ≥800 m² and (3) all sizes. The maps are arranged by size probability for each of the three return periods. Overall, the results show that the hazard probability of larger landslides (≥5,000 m²) is very low (0.0–0.2) for the 1-year as well as the 5- and 10-year recurrence periods, whereas it can be moderate (0.4–0.6) for a landslide area of ≥800 m². When all landslide sizes are considered together, the hazard probability can be higher for the 5- and 10-year recurrence periods.

Fig. 12

Landslide hazard maps for three different periods (a) 1 year, (b) 5 years and (c) 10 years, and for three probable sizes: 1, more than 5,000 m²; 2, more than 800 m²; and 3, all sizes. Five classes show different joint probabilities of landslide size, of landslide temporal occurrence and of landslide spatial occurrences
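
The nine scenarios in Fig. 12 all come from the same product of three probabilities per HSU. A minimal sketch is given below, using the size probabilities reported above (0.21, 0.78 and 1.0) together with hypothetical spatial and temporal values for a single unit. The function names and the equal-width binning into five classes are assumptions based on the class ranges shown in the figures.

```python
def hazard(p_spatial, p_temporal, p_size):
    """Joint probability for one HSU, assuming the three terms are independent."""
    for p in (p_spatial, p_temporal, p_size):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_spatial * p_temporal * p_size


def hazard_class(value):
    """Index (0 to 4) of the five equal-width classes 0.0-0.2, ..., 0.8-1.0."""
    return min(int(value * 5), 4)


# Size probabilities from the text; spatial and temporal values are hypothetical.
SIZE_PROBABILITY = {">=5000 m2": 0.21, ">=800 m2": 0.78, "all sizes": 1.0}
p_spatial, p_temporal = 0.7, 0.9
for label, p_size in SIZE_PROBABILITY.items():
    h = hazard(p_spatial, p_temporal, p_size)
    print(label, round(h, 3), hazard_class(h))
```

Because the size probability for large landslides is only 0.21, the product cannot exceed that value for any unit or period, so hazard for that size class is capped near the top of the lowest map class.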

Discussion and conclusions

Landsliding, in general, is a geomorphic slope failure process that is triggered by natural as well as anthropogenic factors and controlled by favourable terrain conditions that act as causal factors (Das et al. 2010). Determining landslide hazard is always a challenge. The problem lies in generating the data and in integrating them in a conceptual framework. With the improved sophistication of GIS programmes, the data integration itself has become easier. Nevertheless, many methods proposed for the quantitative evaluation of landslide hazard can best be classified as susceptibility models, because they provide an estimate of spatial probability only (Chung and Fabbri 1999; Soeters and van Westen 1996; Chen and Wang 2007). Guzzetti et al. (2005) proposed a quantitative hazard model using spatial, temporal and size probability. They used geomorpho-hydrological units as terrain mapping units (TMU) to characterize the landslides and to facilitate the calculation of the spatial, temporal and size probabilities of landslide hazard. We argue that TMU, being generated independently, without integration of landslide occurrences, fall short of representing actual homogenous susceptible areas. The HSU, on the other hand, can address the inherent homogeneity conditions of geo-environmental factors with respect to landslides. This is because the HSU can be automatically derived from a susceptibility map generated by combining landslides with geo-environmental variables through data-driven models. For this study, we prepared a multi-temporal landslide inventory map based on the landslide records collected from the Border Roads Organisation for 28 years (1982–2009). We generated the geo-environmental factor maps by visual interpretation of remote sensing satellite data. We obtained the susceptibility map using a Bayesian logistic regression analysis of ten thematic variables, including morphological, lithological and structural parameters. 
We calculated susceptibility on a grid-cell basis and derived the homogeneity conditions from the data-driven output of the susceptibility map. The susceptibility map was divided into 315 HSU using a region-growing algorithm, optimized through an objective function resulting in segments that are statistically independent. To assess the intersegment heterogeneity we used Moran’s I autocorrelation index, and to assess the intrasegment homogeneity we used a weighted average variance method.

Temporal probability for each HSU was calculated using historical landslide records. For three periods (1, 5 and 10 years), we calculated the probability of one or more landslides occurring in a given HSU from the landslide frequency in that unit. We obtained minimum and maximum probability values for the different periods. One limitation of the temporal probability calculation is that it depends on the frequency of landslide occurrences in each unit. Therefore, no probabilities can be obtained for those units that have not experienced landslides in the past 28 years but are in principle susceptible. In the present study, temporal landslide records of 28 years gave a trend of annual landslide recurrences and, more precisely, the multiple landslide occurrences in the spatio-temporal domain during the rainy months.

Two landslides of different sizes can result in different types of damage, depending on the geo-environmental conditions of the area, such as topography and land use, on human activity in the area and on people’s perception of landslide hazard. Landslide area, therefore, is a good approximation of landslide magnitude (Guzzetti et al. 2005). A commonly used approach for size probability is based on the landslide area or volume (Guthrie and Evans 2004; Malamud et al. 2004; Stark and Hovius 2001). This notion holds true for our study in the sense that, in a road corridor, bigger landslides are likely to damage a longer stretch of road and are more likely to damage moving vehicles than small landslides. During the fieldwork, we noticed a small live landslide on the order of a few m² that was part of a bigger active landslide. Such landslides, if occurring more than once a day at a particular location, are aggregated in the Border Roads Organisation (BRO) records as a single landslide for that day. Furthermore, the inverse-gamma distribution of size probability may not properly predict such small landslides.

For this study, nine landslide hazard maps were generated by multiplying spatial, temporal and size probabilities. Each map represents a specific scenario. Scenarios were developed for three landslide area classes (all sizes, >800 m² and >5,000 m²) and three recurrence periods (1, 5 and 10 years). Hazard from large landslides (>5,000 m²) is low in the study area, with the probability rarely exceeding 0.2. The probability of landslides smaller than 5,000 m², however, is relatively higher. In the 1-year scenario, the probability of landslides recurring in exactly the same place is generally low. In the 5-year scenario, however, the probability of occurrence of a landslide of any size is higher in the northern stretch of the study area, mainly because of the favourable rock types and slope conditions. In the 10-year scenario, the occurrence of a landslide is almost certain in any part of the road section, though the probability is low away from the road. Our study highlights the dynamic nature of landslide hazard mapping and the factors associated with it. The hazard maps presented in Fig. 12 give the 1-, 5- and 10-year probabilities of experiencing one or more landslides of a given size in a particular HSU. Similar maps can be prepared for any period and any size to provide quantitative information on future slope failures to planners, decision makers, road maintenance authorities and hazard mitigating agencies.

Conditional independence of spatial, temporal and size probabilities was demonstrated for the final hazard assessment (Guzzetti et al. 2005). It is generally difficult to demonstrate the conditional independence of spatial and temporal probabilities. However, this is not the case in our study, because the temporal model was constructed by calculating the landslide frequency in each HSU, and the HSU are statistically independent. Hence, the spatial and temporal probabilities in our study area can be considered independent of each other. Spatial probability was calculated with a grid-cell-based model and later upgraded to HSU, on which temporal probability was calculated. In addition, the temporal probability calculation is based on the frequency of landslides, which depends mainly on the monsoonal rainfall pattern, and rainfall is not one of the covariates of the susceptibility mapping. In the present study, the frequency–area statistics of the multi-temporal data were found to follow the three-parameter inverse-gamma distribution of Malamud et al. (2004), which has been sufficiently demonstrated to be independent of the physical setting and geo-environmental conditions. Thus, it can be concluded that the size probability is independent of susceptibility. In addition, the multi-temporal inventory reveals that landslides of all sizes occurred with different frequencies, indicating that the rate of failure is independent of landslide size. Hence, our study sufficiently demonstrates that the three probabilities are conditionally independent.

To understand the landslide mechanism in an area and to identify the unknown factors affecting landslide occurrence, several geo-environmental variables are generally included in the model, and the significant ones are retained for generating the susceptibility map. Analysis of the significant variables revealed several interesting facts. The positive contribution of the variable ‘slope gradient >35–45°’ indicates that moderate slopes in the study area are prone to landslides. Rock types such as calc-silicate gneiss and schistose quartzite are prone to landslides, whereas biotite gneiss and migmatite gneiss resist landsliding in the area. Two significant geomorphology classes, ‘hogback type highly dissected structural hills’ (HTHDSH) and ‘cuesta type moderately dissected denudational hills’ (CTMDDH), contribute positively to the landslide occurrence probability. The land cover class ‘scrubland’ contributes positively, whereas ‘river channel’ contributes negatively, implying opposite effects on landsliding. Contributions from the significant aspect classes (south-east and south) are positive, indicating that these aspects favour landsliding. This is because the sun-facing slopes in the Himalayas are less vegetated and more prone to landslides. Lineament density contributes positively to landslide occurrence, mainly because the majority of the rockslides are controlled by lineaments. Soil classes contribute negatively, which may be because of the rocky nature of the terrain. Drainage density and road buffer, however, contribute positively, indicating their close association with the landsliding process. These factors invariably control the landslides occurring along natural slopes. Small landslides occurring exclusively along the cut slopes, however, might be controlled more by anthropogenic factors than by natural terrain factors. 
Sensitivity analysis of the landslide controlling geo-environmental factors is important in landslide susceptibility mapping for two main reasons: (1) landslides are highly discrete events and (2) the landslide controlling factors are not entirely independent. A global sensitivity assessment of the susceptibility model was carried out using ROC curve analysis to ascertain the landslide controlling geo-environmental factors. However, a local assessment along the cut slopes of the road corridor through field investigation suggests that geological factors, such as rock structures and exposed rock-cut surfaces, are more crucial to failure. Modification of the slope along the road section exposes the weak planes of the rocks, aggravating the slope failure process. In general, the output is sensitive to the chosen input data, and choices such as the 10-m grid, the types of data used, the accuracy of the DTM and the road buffer width all carry a certain amount of uncertainty.

The present study is an attempt to generate landslide hazard maps quantitatively using homogenous susceptible units. We propose to replace terrain mapping units with a more logical unit, the HSU, for calculating hazard. With the increasing sophistication of GIS programmes, high-resolution grid-based landslide susceptibility modelling and the subsequent transformation of the susceptibility map into HSU are readily possible. Care needs to be taken to build a sufficiently robust data-driven susceptibility model, because it underpins the generation of the HSU.