Feature extraction from time-frequency image based on GLCM
GLCM is a texture feature extraction method that operates on image pixels^{13}. Its basic principle is to count the number of pixel pairs separated by a specific direction and distance. The direction is generally one of four: 0°, 45°, 90° and 135°. The distance is a positive integer that is set manually. These counts form the gray-level co-occurrence matrix. To reduce the amount of computation, the time-frequency images are preprocessed by reducing the image size and the number of gray levels.
STFT is used to convert the bearing vibration signal of each sample into the corresponding time-frequency image, and Fig. 1 shows the time-frequency images of a set of samples representing four types of bearing states. Because the time-frequency image generated by STFT^{14} is large (875×656), the image size is reduced to 128×128. In addition, since a gray image with 256 gray levels requires a lot of computation, the number of gray levels of the time-frequency image is set to 4, and the 4×4 gray-level co-occurrence matrix is obtained by counting the number of pixel pairs whose direction is 90° and whose distance is 2.
The four features calculated from the gray-level co-occurrence matrix of the time-frequency image, namely contrast, correlation, homogeneity and energy, did not achieve a good classification effect in the experiments. Therefore, in this study, the 4×4 co-occurrence matrix is converted into a 16-dimensional vector, which is the preliminary feature of each time-frequency image. Because the numerical range of the preliminary features in the vector is large, the preliminary features are normalized to avoid the impact of this range on the classification effect. The mapminmax function in MATLAB, shown in Eq. (1), is used to normalize the preliminary features in the training set. Because mapminmax normalizes each row of data, the matrix composed of the preliminary features of the training set is reshaped into a row vector before normalization and reshaped back into a matrix afterwards. When Eq. (1) is used to normalize the preliminary features in the test set, the values of x_{max} and x_{min} are the maximum and minimum values among the preliminary features of the training set rather than the test set.
$$y = (y_{\text{max}} - y_{\text{min}}) \times (x - x_{\text{min}})/(x_{\text{max}} - x_{\text{min}}) + y_{\text{min}}$$
(1)
where x_{max} and x_{min} are the maximum and minimum values of each row before normalization, respectively, and y_{max} and y_{min} are the maximum and minimum values of each row after normalization, respectively.
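As an illustration of Eq. (1), the following is a minimal Python sketch rather than the MATLAB code used in the study. The function names `fit_minmax` and `apply_minmax` are hypothetical, and the output range [-1, 1] is the mapminmax default, assumed here because the text does not state y_{min} and y_{max}. The minimum and maximum are taken over the whole flattened training matrix and then reused for the test set.

```python
import numpy as np

def fit_minmax(train_features):
    """Global min and max over the flattened training matrix (x_min, x_max in Eq. 1)."""
    flat = np.asarray(train_features, dtype=float).ravel()
    return flat.min(), flat.max()

def apply_minmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Eq. (1); y_min and y_max default to the assumed range [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

# Toy preliminary features (made-up numbers, not data from the paper).
train = np.array([[0.0, 10.0], [5.0, 20.0]])
test = np.array([[10.0, 30.0]])

x_min, x_max = fit_minmax(train)                 # statistics from the training set only
train_norm = apply_minmax(train, x_min, x_max)
test_norm = apply_minmax(test, x_min, x_max)     # same x_min, x_max reused for the test set
```

Because the test set is scaled with the training statistics, a test value larger than the training maximum maps outside the target range, as the value 30 does in this toy example.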
The specific steps of GLCM-based feature extraction from the time-frequency image are as follows:
 Step 1:

Reduce the size of the time-frequency image from 875×656 to 128×128.
 Step 2:

Convert the color image to a gray image. At this point, the pixel values of the time-frequency image range from 0 to 255.
 Step 3:

Convert the gray image to a four-level gray image. At this point, the pixel values of the time-frequency image range from 1 to 4.
 Step 4:

Count the number of pixel pairs whose direction is 90° and whose distance is 2, and generate the 4×4 gray-level co-occurrence matrix.
 Step 5:

Convert the 4×4 matrix to a 16-dimensional vector, and take this vector as the preliminary feature of the time-frequency image. The preliminary features of the training set and the test set are then normalized, respectively, to obtain the GLCM-based features of the time-frequency image.
The features of a set of samples in different states, obtained with the GLCM-based time-frequency image feature extraction method, are given in Fig. 2.
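Steps 1 to 4 and the vectorization in Step 5 can be sketched in Python with NumPy as follows. This is an illustrative sketch under several assumptions that the text does not specify: nearest-neighbor resizing, the common 0.299/0.587/0.114 luminance weights for the gray conversion, uniform quantization into levels 0 to 3 (corresponding to levels 1 to 4 in the text), and a 90° pair defined as a pixel and the pixel two rows above it, counted non-symmetrically. All function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h=128, out_w=128):
    """Step 1: nearest-neighbor resize (the interpolation method is an assumption)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def to_gray(rgb):
    """Step 2: color to gray in [0, 255] using assumed luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def quantize(gray, levels=4):
    """Step 3: map gray values 0..255 to integer levels 0..levels-1."""
    q = (np.asarray(gray, dtype=np.float64) * levels / 256.0).astype(np.int64)
    return np.clip(q, 0, levels - 1)

def glcm_vertical(q, distance=2, levels=4):
    """Step 4: count pairs (pixel, pixel `distance` rows above it)."""
    ref = q[distance:, :]       # pixel of interest
    nbr = q[:-distance, :]      # neighbor in the 90-degree direction
    glcm = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(glcm, (ref.ravel(), nbr.ravel()), 1)
    return glcm

def glcm_features(rgb):
    """Step 5 (before normalization): 4x4 matrix flattened to a 16-dimensional vector."""
    q = quantize(to_gray(resize_nearest(rgb)))
    return glcm_vertical(q).ravel().astype(np.float64)

# Random stand-in for an 875x656 time-frequency image (not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(656, 875, 3), dtype=np.uint8)
features = glcm_features(image)

# Small hand-checkable case: rows 2 and 3 are paired with rows 0 and 1.
q_small = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
glcm_small = glcm_vertical(q_small)
```

With a 128×128 image and a distance of 2, each of the 128 columns contributes 126 pairs, so the entries of the co-occurrence matrix sum to 16128.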
KELM based on MBASA
KELM
A detailed description of ELM can be found in the related literature^{15,16}. ELM can be expressed as follows:
$$f(x) = g(x)\omega = g(x)G^{T} \left( \frac{I}{C} + GG^{T} \right)^{-1} T$$
(2)
where \(\omega = G^{T} \left( \frac{I}{C} + GG^{T} \right)^{-1} T\) denotes the weight connecting the hidden layer to the output layer, and \(C\) denotes the regularization parameter.
KELM replaces the feature mapping \(g(x)\) of ELM with a kernel function, which gives KELM better convergence and generalization performance than ELM^{17}. KELM can be expressed as follows:
$$f(x) = \left[ \begin{array}{*{20}c} K(x,x_{1}) \\ \vdots \\ K(x,x_{N}) \end{array} \right] \left( \frac{I}{C} + GG^{T} \right)^{-1} T$$
$$= \left[ \begin{array}{*{20}c} K(x,x_{1}) \\ \vdots \\ K(x,x_{N}) \end{array} \right] \left( \frac{I}{C} + \Delta_{KELM} \right)^{-1} T$$
(3)
where \(\Delta_{KELM}\) denotes the kernel matrix,
$$\Delta_{KELM_{i,j}} = K(x_{i}, x_{j}) = g(x_{i}) \cdot g(x_{j})$$
(4)
The Cauchy kernel is an excellent alternative kernel function, which is employed in this study and can be expressed in the following form:
$$K_{Cauchy}(\mathbf{x}_{i}, \mathbf{x}_{j}) = \frac{1}{1 + \frac{\left\| \mathbf{x}_{i} - \mathbf{x}_{j} \right\|^{2}}{\eta}}$$
(5)
where \(\eta\) is the Cauchy kernel parameter.
The regularization parameter and the kernel parameter of the KELM model affect its performance, so they should be carefully selected.
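To make Eqs. (3) to (5) concrete, the following Python sketch trains and applies a KELM with the Cauchy kernel. It is a minimal illustration, not the authors' implementation. The one-hot encoding of the target matrix T, the argmax decision rule, the toy data and the function names are all assumptions. The output weights are obtained by solving the linear system rather than forming the inverse explicitly.

```python
import numpy as np

def cauchy_kernel(A, B, eta):
    """Eq. (5) for every pair of rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return 1.0 / (1.0 + np.maximum(sq, 0.0) / eta)

def kelm_train(X, T, C, eta):
    """Solve (I/C + Delta_KELM) alpha = T, the factor shared by all predictions in Eq. (3)."""
    kernel_matrix = cauchy_kernel(X, X, eta)          # Delta_KELM, Eq. (4)
    return np.linalg.solve(np.eye(len(X)) / C + kernel_matrix, T)

def kelm_predict(X_new, X_train, alpha, eta):
    """Eq. (3): kernel values between new and training samples times alpha."""
    return cauchy_kernel(X_new, X_train, eta) @ alpha

# Toy two-class problem (made-up data).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
T = np.eye(2)[labels]                                 # one-hot targets (assumption)

alpha = kelm_train(X_train, T, C=100.0, eta=1.0)
scores = kelm_predict(np.array([[0.0, 0.5], [5.0, 5.5]]), X_train, alpha, eta=1.0)
predicted = np.argmax(scores, axis=1)                 # class with the largest output
```

In this sketch, C and η are fixed by hand; they are the two quantities that the parameter search in the next subsection is meant to select.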
Optimization of KELM parameters based on MBASA
BASA mimics the activities of beetle antennae in nature^{18}. However, the traditional beetle antennae search algorithm employs only one beetle, so it has difficulty finding the optimal parameters when the ranges of the parameters to be optimized are wide. Thus, MBASA, which employs several beetles, is presented in this article. Searching with multiple beetles increases the possibility of obtaining the optimal parameters.
The MBASA-based KELM parameter selection process can be described as follows:
 Step 1:

Denote the beetle positions at the tth moment (\(t = 1,2, \cdots\)) by the vector \(\mathbf{x}^{t}\). Initialize the MBASA settings, including the beetle positions \(\mathbf{x}^{0}\), the antennae length \(d^{0}\) and the step size \(\delta^{0}\).
 Step 2:

Assess the fitness of each beetle. A fivefold cross-validation method is used to assess the fitness of the beetles. In fivefold cross-validation, the training samples are equally divided into 5 sample subsets, of which 4 are used to train the KELM model and the remaining one is used to test it. Each subset is used as the test subset in turn. Then, the total diagnostic accuracy \(A_{i}\) over the 5 sample subsets can be obtained as follows:
$$A_{i} = \frac{N_{correct,i}}{N_{total}}$$
(6)
The fitness of the ith beetle is defined as follows:
$$f(\mathbf{x}_{i}) = 1 - A_{i}$$
(7)
 Step 3:

Obtain the right-side and left-side searching behaviors.
To model the searching behavior, a random search direction of the beetles is described as follows:
$$\vec{\mathbf{b}} = \frac{\mathbf{rv}\left( m,1 \right)}{eps + \left\| \mathbf{rv}\left( m,1 \right) \right\|}$$
(8)
where \(\mathbf{rv}\left( m,1 \right)\) is an m-dimensional vector with random values between -1 and 1, m is the dimension of the position (here, m is set to 2), and \(eps = 2^{-52}\).
The activities of beetle antennae are mimicked by right-side and left-side searching behaviors, which are expressed as:
$$\left\{ \begin{gathered} \mathbf{x}_{i,r} = \mathbf{x}_{i}^{t} + d^{t} \vec{\mathbf{b}} \hfill \\ \mathbf{x}_{i,l} = \mathbf{x}_{i}^{t} - d^{t} \vec{\mathbf{b}} \hfill \end{gathered} \right.$$
(9)
where \(\mathbf{x}_{i,r}\) is the position in the search area on the right side of the ith beetle, \(\mathbf{x}_{i,l}\) is the position in the search area on the left side of the ith beetle, and \(d^{t}\) is the sensing length of the antennae, which corresponds to the exploitation ability, at the tth moment.
 Step 4:

Update the beetle positions.
The iterative model in Eq. (10) associates the position update with odor detection by considering the searching behavior:
$$\mathbf{x}_{i}^{t} = \mathbf{x}_{i}^{t-1} + \delta^{t} \vec{\mathbf{b}}\, sign\left( f\left( \mathbf{x}_{i,r} \right) - f\left( \mathbf{x}_{i,l} \right) \right)$$
(10)
where \(\delta^{t}\) is the search step size, and \(sign\left( \cdot \right)\) is the sign function.
 Step 5:

Compare the fitness of \(\mathbf{x}_{i}^{t}\) with the fitness of the current best position of the ith beetle. If \(f\left( \mathbf{x}_{i}^{t} \right) < f_{i,best}\), then \(f_{i,best} = f\left( \mathbf{x}_{i}^{t} \right)\) and \(\mathbf{x}_{i,best} = \mathbf{x}_{i}^{t}\), where \(\mathbf{x}_{i,best}\) is the current best position of the ith beetle, and \(f_{i,best}\) is the fitness of the current best position of the ith beetle.
 Step 6:

Update the antennae length \(d\) and the step size \(\delta\) as follows:
$$d^{t} = 0.95d^{t-1} + r0$$
(11)
$$\delta^{t} = 0.95\delta^{t-1}$$
(12)
where \(r0\) is a constant.
 Step 7:

Repeat steps 2 through 6 until the stop condition is met.
 Step 8:

Obtain the position with the best fitness among all the beetles; this position gives the optimal parameters of KELM.
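Steps 1 to 8 can be sketched in Python as follows. This is a hedged illustration, not the authors' code. The number of beetles, the iteration count, the initial values of d and δ, the value of r0, the uniform random initialization and the clipping of positions to the search bounds are all assumptions. In addition, because the fitness f = 1 - A is to be minimized, the sketch moves each beetle toward the antenna with the lower fitness value, which is an assumed sign convention for applying Eq. (10). The quadratic `toy_fitness` is a hypothetical stand-in for the cross-validation fitness.

```python
import numpy as np

def mbasa(fitness, lower, upper, n_beetles=5, n_iter=100,
          d0=1.0, delta0=1.0, r0=0.01, seed=0):
    """Multi-beetle antennae search that minimizes `fitness` within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    m = lower.size
    eps = 2.0 ** -52

    # Step 1: initial positions, antennae length and step size.
    x = rng.uniform(lower, upper, size=(n_beetles, m))
    d, delta = d0, delta0

    # Step 2 (initial): fitness of every beetle.
    best_x = x.copy()
    best_f = np.array([fitness(p) for p in x])

    for _ in range(n_iter):                      # Step 7: fixed iteration budget
        for i in range(n_beetles):
            # Step 3: random unit direction (Eq. 8) and antenna positions (Eq. 9).
            rv = rng.uniform(-1.0, 1.0, size=m)
            b = rv / (eps + np.linalg.norm(rv))
            x_r = np.clip(x[i] + d * b, lower, upper)
            x_l = np.clip(x[i] - d * b, lower, upper)

            # Step 4: step toward the side with lower fitness (assumed sign).
            direction = np.sign(fitness(x_r) - fitness(x_l))
            x[i] = np.clip(x[i] - delta * b * direction, lower, upper)

            # Step 5: keep the best position of beetle i.
            f_new = fitness(x[i])
            if f_new < best_f[i]:
                best_f[i] = f_new
                best_x[i] = x[i]

        # Step 6: shrink the antennae length (Eq. 11) and step size (Eq. 12).
        d = 0.95 * d + r0
        delta = 0.95 * delta

    # Step 8: best position among all beetles.
    k = int(np.argmin(best_f))
    return best_x[k].copy(), float(best_f[k])

def toy_fitness(p):
    """Hypothetical stand-in objective with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

best_pos, best_val = mbasa(toy_fitness, lower=[-10.0, -10.0], upper=[10.0, 10.0])
```

For KELM parameter selection, the two coordinates of a position would hold the regularization parameter and the Cauchy kernel parameter, and `fitness` would return one minus the fivefold cross-validation accuracy.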
Bearing fault diagnosis process based on GLCM and KELM based on MBASA
The bearing fault diagnosis process based on GLCM and KELM based on MBASA is shown in Fig. GLCM. Then, based on the training samples, create the fitness function and optimize the regularization parameter and the kernel parameter of KELM using MBASA. Next, establish the MBASA-KELM model with the optimized regularization parameter and kernel parameter. Finally, test the proposed diagnostic model.
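The fitness function built from the training samples can be sketched in Python as follows, combining Eqs. (6) and (7) with a compact Cauchy-kernel KELM. This is a minimal sketch under stated assumptions: the shuffling of samples before splitting, the one-hot targets, the argmax decision rule, the synthetic two-class data and all function names are illustrative and not taken from the study.

```python
import numpy as np

def cauchy_kernel(A, B, eta):
    """Cauchy kernel between every pair of rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return 1.0 / (1.0 + np.maximum(sq, 0.0) / eta)

def kelm_fit_predict(X_tr, y_tr, X_te, C, eta, n_classes):
    """Train a KELM on (X_tr, y_tr) and return predicted class indices for X_te."""
    T = np.eye(n_classes)[y_tr]                       # one-hot targets (assumption)
    alpha = np.linalg.solve(np.eye(len(X_tr)) / C + cauchy_kernel(X_tr, X_tr, eta), T)
    return np.argmax(cauchy_kernel(X_te, X_tr, eta) @ alpha, axis=1)

def cv_fitness(params, X, y, n_classes, n_folds=5, seed=0):
    """f = 1 - A, where A is the total accuracy over the test folds (Eqs. 6 and 7)."""
    C, eta = params
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)              # 5 subsets of (nearly) equal size
    n_correct = 0
    for k in range(n_folds):
        test_idx = folds[k]                           # each subset is the test subset in turn
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = kelm_fit_predict(X[train_idx], y[train_idx], X[test_idx], C, eta, n_classes)
        n_correct += int((pred == y[test_idx]).sum())
    return 1.0 - n_correct / len(X)

# Synthetic two-class training set (not data from the paper).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)), rng.normal(10.0, 0.5, size=(10, 2))])
y = np.repeat([0, 1], 10)

fitness_value = cv_fitness((10.0, 1.0), X, y, n_classes=2)
```

A parameter search would call `cv_fitness` once per candidate pair (C, η) and keep the pair with the smallest returned value. Fixing the seed keeps the fold assignment identical across candidates, so their fitness values are comparable.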