In online gradient descent learning, the local property of the derivative of the output function can cause slow convergence. This phenomenon, called a plateau, occurs in the learning process of multilayer networks. To improve the derivative term, we propose a simple method that replaces it with a truncated Gaussian function, which greatly increases the convergence speed. We then analyze a soft committee machine trained by the proposed method and show how the method breaks a plateau. The results show that the proposed method eventually breaks the symmetry between hidden units.
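
To make the idea concrete, here is a minimal sketch of one online update step, not the paper's exact formulation: the erf activation (conventional for soft committee machines), the clipping form of the truncation, the threshold `c`, and the learning rate `eta` are all illustrative assumptions. The point is that the derivative term no longer vanishes for large pre-activations, where plateaus keep the gradient small.

```python
import numpy as np
from scipy.special import erf

def deriv_erf(u):
    """Derivative of the erf output function g(u) = erf(u / sqrt(2))."""
    return np.sqrt(2.0 / np.pi) * np.exp(-u**2 / 2.0)

def deriv_truncated(u, c=2.0):
    """Truncated-Gaussian replacement for the derivative term: the
    argument is clipped at |u| = c, so the term is floored at
    deriv_erf(c) instead of decaying to zero for large |u|.
    (c = 2.0 is an illustrative choice, not taken from the paper.)"""
    return deriv_erf(np.clip(u, -c, c))

def online_step(W, x, y, eta=0.1, truncate=True):
    """One online gradient-descent step for a soft committee machine
    f(x) = sum_k g(w_k . x), optionally with the truncated derivative."""
    u = W @ x                                  # hidden pre-activations
    delta = y - np.sum(erf(u / np.sqrt(2.0)))  # output error
    d = deriv_truncated(u) if truncate else deriv_erf(u)
    return W + eta * delta * np.outer(d, x)    # update every hidden row
```

With `truncate=False` this reduces to the standard online gradient descent update, so the sketch also shows exactly where the proposed replacement enters the learning rule.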