We assume a cluster
\({\mathcal {C}}\) is composed of a set of racks
Racks. Each rack
\(Rack_n \in Racks\) contains a set of machines
\(m\in {\mathcal {M}}\) such that
Rack(
m) =
\(Rack_n\). A set of files
\({\mathcal {F}}\) is stored in the cluster
\({\mathcal {C}}\) over the machines
\(m\in {\mathcal {M}}\). Every file
\(F_i \in {\mathcal {F}}\) is divided into fixed-sized blocks
\(B_{i}\) (128 MB by default) as stated in Eq. (
1) and stored in a hierarchical file organisation.
$$\begin{aligned} |B_i| = \left\lceil \frac{\text {File Size}}{\text {Block Size}}\right\rceil \end{aligned}$$
(1)
Root path,
Root, is the ‘highest’ level of the hierarchy and every file
\(F_i \in {\mathcal {F}}\) is placed into a certain path
P which is a branch of
Root. Each block
\(b_{ij} \in B_{i}\) is replicated
\({RF}_{i} \in \mathbb {N}^{*}\) times (i.e., the replication factor of file
\(F_i\)). We denote
\(b_{ij}^{u}\) the replica number
u of the block
\(b_{ij}\) where
\(0< u =< {RF}_{i}\). Each replica
\(b_{ij}^u\) is stored a particular machine
\(M(b_{ij}^u)\).
We want to reduce the replication factor from
\(RF_i\) to
\(RF'_i\) such that
\(RF_i > RF'_i\). We introduce a binary variable
\(x_{ij}^{u}\) which takes the value 1 if the replica
u of the block
\(b_{ij}\) exists after reducing the replication factor, or 0 otherwise.
$$\begin{aligned} \sum _{u=1}^{RF_i} x_{ij}^{u} = RF'_{i},~~ \forall ~ F_i \in P, ~~\forall ~j \in \{1,...,|B_i|\} \end{aligned}$$
(2)
To strengthen fault-tolerance and promote data availability, replicas are distributed over different racks according to a rack-awareness condition in the default block placement policy as expressed in Eq. (
3). The proposed algorithm continues the rack-awareness block placement after a successful deletion for
\(\forall ~ F_i \in P, ~~\forall ~j \in \{1,...,|B_i|\}\):
$$\begin{aligned} \left| \left\{ Rack(M(b_{ij}^u) )~|~u \in \{1,...,RF'_{i}\}~and~x_{ij}^u = 1~\right\} \right| \ge 2 \end{aligned}$$
(3)
We defined a variable for the partial block count
\(PBC_m \in \mathbb {N}\) for each machine
\(m \in {\mathcal {M}}\). For a given path,
\(PBC_m\) is computed as a sum of all replicas for
\(\forall ~ F_i \in P\) that are stored on the machine
m as expressed in Eq. (
4):
$$\begin{aligned} PBC_m = ~\sum _{i \in \{1,..., |F_i|\}}~\sum _{j \in \{1,..., |B_i|\}}\sum _{ \begin{array}{c} u \in \{1,...,RF_i\} \\ \wedge ~M(b_{ij}^u)=m \end{array} } x_{ij}^{u} \end{aligned}$$
(4)
Each
m has finite resources: (e.g., CPU and RAM) denoted by
vCore(
m),
RAM(
m) respectively. The network connection between machines
\(m_i, m_j\in {\mathcal {M}}\) in the same rack (i.e.,
\(Rack(m_i) = Rack(m_j)\)) is faster compare to machines are in the different rack (i.e.,
\(Rack(m_i) \ne Rack(m_j)\)). Any submitted job (a.k.a., task) runs on a container allocated to particular node
m with resource requirements
\(vCore_{Cont}\) and
\(RAM_{Cont}\). Note that
\(vCore_{Cont}\) and
\(RAM_{Cont}\) both are global properties of the scheduler [
29]. A machine
\(m\in {\mathcal {M}}\) can run a number of containers
\({\mathcal {K}}_m\) concurrently.
\({{\mathcal {K}}_m}\) is determined by composition of available machines’ resources (i.e.,
vCore(
m) and
RAM(
m)) and containers’ resource requirements (i.e.,
\(vCore_{Cont}\) and
\(RAM_{Cont}\)) as expressed in Eq. (
5):
$$\begin{aligned} {\mathcal {K}}_m = \min \left( \left\lfloor \frac{\textit{RAM}(m)}{\textit{RAM}_{Cont}}\right\rfloor , \left\lfloor \frac{\textit{vCore}(m)}{\textit{vCore}_{Cont}}\right\rfloor \right) \end{aligned}$$
(5)
While deleting replicas, our main objective is to minimise the maximum ratio of
\({PBC_m}\) to
\({{\mathcal {K}}_m}\) for a given path
\(P \subseteq {\mathcal {P}}\) as shown in Eq. (
6). Therefore, in every deletion iteration our algorithm will select a replica that has the biggest division value. Hence, we expect to see the number of replica stored on a node become dependent on
\({{\mathcal {K}}_m}\) after our replication deletion algorithm.
$$\begin{aligned} minimise~ \max _{m\in {\mathcal {M}}}\left( \frac{PBC_m}{{\mathcal {K}}_m}\right) \end{aligned}$$
(6)