
Operational optimization of wastewater treatment plants: a CMDP based decomposition approach

S.I.: Avi-Itzhak-Sobel: Probability

Annals of Operations Research

Abstract

This work considers operational management optimization of wastewater treatment plants. We present a new CMDP-based optimization model for this problem, as well as a decomposition algorithm for solving it. From a theoretical perspective, we show that the algorithm's solution is near-optimal and derive an upper bound for its result. On the practical side, we applied our approach to the plant of a medium-sized European city, achieving a significant reduction in costs as well as improved compliance with local regulations.

Fig. 1


Acknowledgments

The authors thank Dr. Michael Masin for helpful conversations.

Author information

Corresponding author

Correspondence to Alexander Zadorojniy.

Appendix

1.1 Transition probability matrix generation

In CMDP problems, transition probability matrices are typically assumed to be known in advance. In our WWTP application, however, their statistical estimation constitutes a significant challenge. First, the state space is very large: a typical LL problem can involve more than 1000 states and more than 100 actions, so the model demands estimation of more than \(1000^2 \times 100=100\) million transition probabilities. Second, it is hard to obtain reliable estimates from real data, whose size and quality are typically insufficient. In addition, we must estimate the transition probabilities of “undesirable states” that a reasonably managed plant should never reach.

Therefore, simulation plays a central part in transition probability estimation. In this section, we discuss two issues:

  • Effective coverage of state space via simulation.

  • Interpolation to states that were not visited enough during simulation.

1.1.1 Notation

  • Number of action variables: \(N_a\) .

  • Number of action values for each action variable: \(N_a^i, \ 1 \le i \le N_a\).

  • Overall number of actions: \(|U| = \prod _{i=1}^{N_a} N_a^i\).

  • Number of state variables: \(N_s\).

  • Number of states for each state variable: \(N_s^j, \ 1 \le j \le N_s\).

  • Overall number of states: \(|X| = \prod _{j=1}^{N_s} N_s^j\).

  • Number of visits to state j that were followed by action i: \(V_j^i, \ i \in U, \ j \in X\).

  • Number of transitions to state k from state j after action i has been performed: \(V_{jk}^{i}, \ i \in U, \ j \in X, \ k \in X\).

  • Basic estimate of transition probability matrices:

    $$\begin{aligned} \hat{P}_{jk}^{i} = V_{jk}^{i} / V_j^i, \ i \in U, \ j \in X, \ k \in X. \end{aligned}$$
    (3)
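As an illustration, the basic estimate (3) can be computed from simulation counts as follows. This is a minimal numpy sketch; the array names and shapes are our own assumptions, not from the paper:

```python
import numpy as np

def estimate_transition_matrices(V_jk, V_j):
    """Basic estimate (3): P_hat[i, j, k] = V_jk[i, j, k] / V_j[i, j].

    V_jk -- (|U|, |X|, |X|) array of transition counts V_jk^i.
    V_j  -- (|U|, |X|) array of visit counts V_j^i; pairs (i, j) with
            zero visits get an all-zero row, to be repaired later by
            the interpolation methods below.
    """
    P_hat = np.zeros(V_jk.shape, dtype=float)
    visited = V_j > 0
    P_hat[visited] = V_jk[visited] / V_j[visited][:, None]
    return P_hat
```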

Remark

In the “Appendix”, we consider statistical estimates of transition probabilities. Note that the notation used here differs from the notation for the actual transition probabilities in the main body of the paper.

State variables can be divided into two sets: controllable and non-controllable. Examples of non-controllable state variables are influent flow, influent chemical load, or the type of time period with respect to electricity costs. Actions and controllable state variables do not affect transitions between non-controllable state variables. Controllable state variables describe internal or effluent characteristics of the system and can be affected by past actions and by state variables of either type. We introduce additional notation for these two types of state variables.

  • Number of non-controllable state variables is \(I, \ 0 \le I < N_s\). Assume that state variables with indices \(1,\ldots ,I\) are the non-controllable ones.

  • Let \(X_N\) and \(X_C\) be state spaces that correspond to non-controllable and controllable state variables, respectively. The full state space is a Cartesian product of these two spaces: \(X = X_N \times X_C\). Their dimensions are \(|X_N|=\prod _{i=1}^I N_s^i\) and \(|X_C|=\prod _{i=I+1}^{N_s} N_s^i\), respectively.

  • Define \(NS(j), \ j \in X_N\), to be the set of states from the full state space that correspond to state j from the non-controllable space \(X_N\). In other words, \(NS(j)=\{j\} \times X_C\).

  • Define \(CS(j), \ j \in X_C\), to be the set of states from the full state space that correspond to state j from the controllable space \(X_C\). In other words, \(CS(j) = X_N \times \{j\}\).

  • Let \(k \in X\). Denote by \(H(k) \in X_N\) the non-controllable part of state k and by \(E(k) \in X_C\) the controllable part of state k. In other words, state k is the concatenation of states H(k) and E(k).
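Assuming states are indexed so that the flat index is \(j = H(j)\cdot|X_C| + E(j)\) (an indexing convention we introduce here for illustration only, not part of the paper), the maps H, E, NS, and CS reduce to simple index arithmetic:

```python
def split_state(j, n_C):
    """Return (H(j), E(j)) for flat index j = h * n_C + e."""
    return divmod(j, n_C)

def NS(h, n_C):
    """States of the full space whose non-controllable part is h."""
    return range(h * n_C, (h + 1) * n_C)

def CS(e, n_N, n_C):
    """States of the full space whose controllable part is e."""
    return range(e, n_N * n_C, n_C)
```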

1.1.2 State-space and action-space coverage

When running a simulation, we choose the actions taken at each hour (the basic time interval of the Markov process). The simulation algorithm switches between two regimes: action-driven coverage and state-driven coverage. In the first regime, we seek to cover the action space in a homogeneous way. In the second regime, the goal is to reach “rare states”. Note that the first task is simpler, since the simulation directly controls actions but not the state the system is in.

Action-driven coverage:

  1. Assume that the current state is \(j, j \in X\).

  2. Choose the least-chosen action so far: \(i^* = \arg \min_i V_j^i\). If there are several such actions, choose among them uniformly at random.

State-driven coverage:

  1. Assume that the current state is \(j, j \in X\).

  2. Let \(\bar{V}_k = \sum _{i=1}^{|U|} V_k^{i}\) denote the overall number of visits to state k up to the current time.

  3. Let \(W^i=\sum _{k=1}^{|X|} \hat{P}_{jk}^i / \bar{V}_k\), where \(\hat{P}_{jk}^i\) is the current estimate (3) of the transition probabilities.

  4. Choose action \(i^* = \arg \max_i W^i\). If there are several such actions, choose among them uniformly at random.

State-driven coverage chooses actions that are likely to lead to states with high \(1/\bar{V}_k\) or, in other words, “rarely visited states”.

First, action-driven coverage is run for a relatively long period to get a reasonable initial estimate of transition probabilities. Then, we alternate between action-driven and state-driven coverage.
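The two coverage regimes can be sketched as follows (numpy, with our own array layout; the guard against division by zero for states never visited is our addition):

```python
import numpy as np

def action_driven(j, V_j, rng):
    """Action-driven coverage: pick the least-chosen action in state j,
    breaking ties uniformly at random."""
    counts = V_j[:, j]
    candidates = np.flatnonzero(counts == counts.min())
    return rng.choice(candidates)

def state_driven(j, P_hat, V_j, rng):
    """State-driven coverage: maximize W^i = sum_k P_hat[i, j, k] / V_bar_k,
    which favors actions likely to reach rarely visited states."""
    V_bar = V_j.sum(axis=0)                        # total visits per state
    weights = P_hat[:, j, :] @ (1.0 / np.maximum(V_bar, 1))
    candidates = np.flatnonzero(np.isclose(weights, weights.max()))
    return rng.choice(candidates)
```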

1.1.3 Interpolation of transition probability matrices

Using matrix (3) “as is”, i.e., computed directly from the simulation trace, has several drawbacks. We implemented several methods to interpolate the transition probability matrices and address these challenges.

Method 1. Interpolation based on neighboring states. The WWTP state and control spaces are large. For example, let \(|X|=1000, |U|=100\) and assume that we need at least 10 visits to every state followed by every action. These considerations imply at least \(10^6\) simulated WWTP hours. (In fact, more: some states will be visited more than 10 times.) Since WWTP simulators are not especially fast, the coverage algorithms above do not enable full coverage of the state space under all actions: some states are either not visited or are rarely visited. This can affect the optimization algorithm in undesirable ways, e.g., through absorbing states or transition probabilities based on a very small number of observations.

The neighboring states of state \(j, \ j \in X\), are the states that coincide with j in all state variables except one and differ in that remaining variable by one adjacent interval. \(L_j\) denotes the set consisting of the neighboring states of j together with j itself. Let \(\tilde{L}_j^i \subseteq L_j\) denote the subset of states in which action i has been taken at least once, and \(|\tilde{L}_j^i|\) its size.

Method 1 uses a parameter \(M \ge 0\). Interpolation is performed for state–action pairs with \(V_j^i \le M\); we typically use \(M=10\). The interpolated probabilities are computed via the following formula:

$$\begin{aligned} \tilde{P}_{jk}^{i} = \frac{\sum _{m \in \tilde{L}_j^i} \hat{P}_{mk}^{i}}{|\tilde{L}_j^i|}, \ k \in X. \end{aligned}$$
(4)

Method 1 can be applied iteratively if the number of visits to adjacent states, \(\sum _{m \in \tilde{L}_j^i} V_m^i\), replaces \(V_j^i\) in the threshold criterion for state j and action i.
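A sketch of Method 1, formula (4). The neighbor lists are assumed precomputed from the state-variable grid, and leaving the row unchanged when \(\tilde{L}_j^i\) is empty is our own fallback:

```python
import numpy as np

def interpolate_neighbors(P_hat, V_j, neighbors, M=10):
    """Method 1, formula (4): for each pair (i, j) with V_j[i, j] <= M,
    replace the row P_hat[i, j] by the average of the rows of the states
    in L_tilde_j^i (neighbors of j, plus j itself, in which action i was
    taken at least once)."""
    P_tilde = P_hat.copy()
    n_actions, n_states = V_j.shape
    for j in range(n_states):
        L_j = [j] + list(neighbors[j])
        for i in range(n_actions):
            if V_j[i, j] <= M:
                L_tilde = [m for m in L_j if V_j[i, m] >= 1]
                if L_tilde:          # no data at all: leave the row as is
                    P_tilde[i, j] = P_hat[i, L_tilde].mean(axis=0)
    return P_tilde
```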

Method 2. Interpolation to guarantee matrix irreducibility. If the transition probability matrix of some action i is not irreducible, the optimization solution can imply that some states are never visited under this action, which is wrong for a WWTP (and many other physical systems). Method 2 addresses this problem. It uses a parameter \(\varepsilon , \ 0 \le \varepsilon \le 1\) (typically \(\varepsilon =0.01\)). Given an initial state \(S_0\), we perform the following steps:

  1. For all actions \(i \in U\) and states \(k \in X\): \(\tilde{P}_{S_0,k}^{i}=\max (\hat{P}_{S_0,k}^{i},\varepsilon /|X|)\).

  2. For all actions \(i \in U\) and states \(j \in X\): \(\tilde{P}_{j,S_0}^{i}=\max (\hat{P}_{j,S_0}^{i},\varepsilon /|X|)\).

  3. A straightforward algorithm is used to normalize the transition probability matrices (each row should sum to one).

This method is applied at the last stage of interpolation.
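Steps 1–3 above can be written compactly (a minimal numpy sketch):

```python
import numpy as np

def ensure_irreducible(P_hat, s0, eps=0.01):
    """Method 2: put a floor of eps/|X| on the transitions out of the
    initial state s0 (step 1) and into s0 (step 2), then renormalize
    every row to sum to one (step 3)."""
    n_states = P_hat.shape[-1]
    floor = eps / n_states
    P = np.asarray(P_hat, dtype=float).copy()
    P[:, s0, :] = np.maximum(P[:, s0, :], floor)   # step 1
    P[:, :, s0] = np.maximum(P[:, :, s0], floor)   # step 2
    P /= P.sum(axis=-1, keepdims=True)             # step 3
    return P
```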

Method 3. Interpolation for non-controllable state variables. Statistical noise and Method 1 can significantly distort the matrix with respect to non-controllable variables: for example, the distribution of a non-controllable state variable under one policy can differ significantly from its distribution under another policy. To correct this potential distortion, we perform the following steps:

Step 1 Define and compute the following matrix:

$$\begin{aligned} \hat{P}_{jk}^{N} = \frac{\sum _{i=1}^{|U|} \sum _{s \in NS(j)} \sum _{w \in NS(k)} V_{sw}^i}{\sum _{i=1}^{|U|} \sum _{s \in NS(j)} V_{s}^i},\quad j \in X_N,\quad k \in X_N. \end{aligned}$$
(5)

The matrix elements are estimates of the action-independent transition probabilities between non-controllable states. This computation should be based on the non-interpolated matrix \(\hat{P}\) from (3).

Step 2 For each action \(i \in U\), compute the following matrix:

$$\begin{aligned} \hat{P}_{jk}^{i,C} = \frac{\sum _{w \in CS(k)} V_{jw}^i}{V_{j}^i},\quad j \in X, \quad k \in X_C. \end{aligned}$$
(6)

The matrix elements are estimates of the transition probabilities from states in the full space X to states in the controllable space \(X_C\). If several interpolation iterations are done, this step should be performed on the last interpolated matrix.

Step 3 Computation of interpolated matrix for estimates of the transition probabilities:

$$\begin{aligned} \tilde{P}_{jk}^{i} = \hat{P}_{H(j),H(k)}^{N} \cdot \hat{P}_{j,E(k)}^{i,C}, \quad j \in X, \quad k \in X. \end{aligned}$$
(7)

Formulae (5)–(7) are based on the following calculation:

$$\begin{aligned} P(x_{t+1}=k | x_t = j, u_t=i) &= P(H(x_{t+1}) = H(k), E(x_{t+1}) = E(k) | x_t = j, u_t=i) \\ &= P(E(x_{t+1}) = E(k) | H(x_{t+1}) = H(k), x_t = j, u_t=i) \\ &\quad \cdot P(H(x_{t+1}) = H(k) | x_t = j, u_t=i). \end{aligned}$$

Since the future non-controllable variables are independent of the current action and of the controllable variables,

$$\begin{aligned} P(H(x_{t+1}) = H(k) | x_t = j, u_t=i ) \ = \ P(H(x_{t+1}) = H(k)| H(x_t) = H(j)) . \end{aligned}$$

Since the new values of the non-controllable variables do not affect the new values of the controllable variables,

$$\begin{aligned} P(E(x_{t+1}) = E(k) | H(x_{t+1}) = H(k), x_t = j, u_t=i) = P(E(x_{t+1}) = E(k) | x_t = j, u_t=i). \end{aligned}$$
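Under the flat-indexing convention \(j = H(j)\cdot|X_C| + E(j)\) (our assumption, not the paper's), formulas (5)–(7) can be sketched with array reshapes; the guards against division by zero are our addition:

```python
import numpy as np

def interpolate_noncontrollable(V_jk, V_j, n_N, n_C):
    """Method 3: build P_tilde[i, j, k] = P_N[H(j), H(k)] * P_C[i, j, E(k)]
    from the raw counts, per formulas (5)-(7)."""
    n_actions = V_j.shape[0]
    n_states = n_N * n_C
    V5 = V_jk.reshape(n_actions, n_N, n_C, n_N, n_C)
    # (5): action-independent transitions between non-controllable states
    num = V5.sum(axis=(0, 2, 4))
    den = V_j.reshape(n_actions, n_N, n_C).sum(axis=(0, 2))
    P_N = num / np.maximum(den, 1)[:, None]
    # (6): transitions from full states to controllable states
    V_C = V5.reshape(n_actions, n_states, n_N, n_C).sum(axis=2)
    P_C = V_C / np.maximum(V_j, 1)[:, :, None]
    # (7): product form, then flatten k = (H(k), E(k)) back to one index
    H = np.repeat(np.arange(n_N), n_C)        # H(j) for every flat j
    P_tilde = P_N[H][None, :, :, None] * P_C[:, :, None, :]
    return P_tilde.reshape(n_actions, n_states, n_states)
```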

Joint use of interpolation methods: practical guidelines. We start with Method 1, which can be applied iteratively. Method 1 is followed by Method 3, and Method 2 is typically applied at the last stage of interpolation.


Cite this article

Zadorojniy, A., Shwartz, A., Wasserkrug, S. et al. Operational optimization of wastewater treatment plants: a CMDP based decomposition approach. Ann Oper Res 317, 313–330 (2022). https://doi.org/10.1007/s10479-016-2146-z
