2014 Biomedical devices and Ukraine.

Optimal Bin Number Selection for Mutual Information Calculation Between EEG and Cardiorhythmogram Signals

In the present work, the problem of selecting the optimal bin number for the equidistant mutual information (MI) estimator between the electroencephalogram (EEG) and the cardiorhythmogram (CRG) is addressed. In the previously developed method, the bin number is selected by examining the MI values over a range of bin numbers. When the method was applied to real raw EEG and CRG signals, it was found to work for closely placed or symmetrical EEG channels, for which the true MI value can be found. For MI calculation between raw EEG and CRG signals that are not significantly coupled, the true MI value cannot be estimated with the proposed method when the sample size is small. References 12.


Abstract
Mutual information (MI) is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other [1]. Calculating the MI value between datasets requires knowledge of their probability distributions. The most frequently used estimators of the probability density function are based on histograms (with fixed or adaptive bin size [6], [5]), k-nearest neighbors, and kernels [2], [8], [11], [10].
Histograms are used extensively as nonparametric density estimators, both to visualize data and to obtain various parameters and characteristics of the underlying density, such as entropy. To estimate the probability distribution, the variable values are partitioned into discrete bins to build a histogram. Since a histogram-based estimator is used in MI calculation for finite discrete data series, the calculated MI value can be expected to depend on the binning. The general idea is "to choose a number of bins sufficiently large to capture the major features in the data while ignoring fine details due to random sampling fluctuations" [4]. A problem arises, however, because the accuracy and precision of the probability distribution approximation strongly influence the resulting MI value and may produce spurious results.
In [12], the MI between two jointly correlated Gaussian datasets was estimated for a wide range of sample size/bin number pairs. The goal of this paper is to apply the proposed method to real EEG and CRG signals and to determine how the MI value depends on bin size for real signals that are either supposedly uncorrelated or coupled to some degree.
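For jointly Gaussian variables with correlation coefficient $\rho$, the true MI is known in closed form, which is what makes correlated Gaussian data a convenient benchmark for histogram estimators. This standard result is added here for reference and is not quoted from [12]:

$$I(X;Y) = -\frac{1}{2}\ln\left(1-\rho^{2}\right)$$

For example, $\rho = 0.8$ gives $I = -\frac{1}{2}\ln 0.36 \approx 0.511$ nats, while $\rho = 0$ gives $I = 0$.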

Mutual information
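Mutual information is given by the formula:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},$$

where $p(x,y)$ is the joint probability distribution of $X$ and $Y$, and $p(x)$ and $p(y)$ are their marginal distributions.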
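A minimal sketch of the equidistant (equal-width) histogram estimator, assuming the plug-in approach in which bin frequencies replace the probabilities in the MI sum and natural logarithms are used, so MI is in nats. The function names `equal_width_bin_indices` and `histogram_mutual_information` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def equal_width_bin_indices(values, bins):
    """Map each value to one of `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, so everything falls into bin 0.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land at index `bins`; clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def histogram_mutual_information(x, y, bins):
    """Plug-in MI estimate (in nats) from an equidistant 2-D histogram."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    ix = equal_width_bin_indices(x, bins)
    iy = equal_width_bin_indices(y, bins)
    joint = Counter(zip(ix, iy))  # counts n_ij of the 2-D histogram
    px = Counter(ix)              # marginal counts n_i
    py = Counter(iy)              # marginal counts n_j
    mi = 0.0
    for (i, j), n_ij in joint.items():
        # p_ij * log(p_ij / (p_i * p_j)) with p = count / n; empty cells contribute 0.
        mi += (n_ij / n) * math.log(n_ij * n / (px[i] * py[j]))
    return mi
```

For identical inputs, `histogram_mutual_information(x, x, k)` reduces to the entropy of the binned `x`, which is at most ln k; for example, four distinct values in four bins give exactly ln 4. Because the number of cells grows as k² while the sample size stays fixed, the finite-sample bias of this plug-in estimate tends to grow with the bin number, which is one reason the result depends on binning.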

Method for bin size selection
There are various methods, as well as rules of thumb [9][3] [7], that attempt to give the optimal number of bins, but they usually require the shape of the underlying distribution of the data to be known, or they have other limitations that restrict their use.
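Three widely used rules of thumb are sketched below for illustration. They are standard textbook rules (Sturges, Scott, Freedman-Diaconis) and may or may not be the ones cited as [9][3] [7]; the function names are hypothetical. Sturges' and Scott's rules are derived under a normality assumption, which illustrates the limitation noted above.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: k = ceil(log2(n) + 1); derived for roughly normal data."""
    return math.ceil(math.log2(n) + 1)


def _bins_from_width(data, h):
    """Convert a bin width h into a bin count covering [min, max]."""
    if h <= 0:
        return 1  # zero spread: a single bin is the only sensible choice
    return max(1, math.ceil((max(data) - min(data)) / h))


def scott_bins(data):
    """Scott's rule: width h = 3.49 * s * n^(-1/3), optimal for normal data."""
    n = len(data)
    h = 3.49 * statistics.stdev(data) * n ** (-1 / 3)
    return _bins_from_width(data, h)


def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: width h = 2 * IQR * n^(-1/3), robust to outliers."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    h = 2 * (q3 - q1) * n ** (-1 / 3)
    return _bins_from_width(data, h)
```

None of these rules looks at the joint distribution of two signals, so they say nothing directly about how the MI estimate reacts to the bin number; this motivates selecting the bin number from the MI curve itself.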
The MI value depends heavily on both the bin size and the sample size [12], both of which may be arbitrary in real-life applications. A natural question therefore arises: what is the optimal choice of one parameter given the value of the other?
To find the bin number that gives the most accurate MI value, we propose to use the behavior of MI as a function of the bin number. The main idea of the proposed technique is to choose the bin number from the range in which MI does not change much, i.e., where a change in the bin number does not cause a significant change in MI. The algorithm for finding the optimal bin number is:
− calculate the difference quotient $D(k) = \frac{I(k+1) - I(k)}{(k+1) - k}$, where $I(k)$ is the MI estimated with $k$ bins;
− take the bin number found in this way as the lower boundary of the range of suitable bin numbers.
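The two steps above can be sketched as follows, assuming MI has already been computed for each bin number. The source does not state the exact criterion for "does not change much", so the fixed `tolerance` on the absolute difference quotient, and the function names, are hypothetical choices for illustration.

```python
def difference_quotients(mi_by_bins):
    """Forward difference quotient D(k) = (I(k_next) - I(k)) / (k_next - k)."""
    ks = sorted(mi_by_bins)
    return {
        k: (mi_by_bins[k_next] - mi_by_bins[k]) / (k_next - k)
        for k, k_next in zip(ks, ks[1:])
    }


def select_bin_number(mi_by_bins, tolerance):
    """Return the smallest k with |D(k)| < tolerance, or None if MI never levels off.

    The returned k is treated as the lower boundary of the bin-number range.
    The fixed-tolerance rule is an assumption, not the paper's stated criterion.
    """
    for k, d in sorted(difference_quotients(mi_by_bins).items()):
        if abs(d) < tolerance:
            return k
    return None


# Synthetic MI curve that levels off: I(k) = 1 - 1/k for k = 2..80.
curve = {k: 1 - 1 / k for k in range(2, 81)}
print(select_bin_number(curve, tolerance=0.01))
```

For this curve D(k) = 1/(k(k+1)), which first drops below 0.01 at k = 10, so the call prints 10. A curve that keeps rising, such as I(k) = k, yields None. With noisy real estimates, a single crossing of the tolerance may occur by chance, so smoothing the curve or requiring several consecutive small quotients would be a reasonable refinement.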
MI values were calculated with the proposed method for bin numbers from 2 to 80 and for a range of sample sizes. The behavior of the MI values is illustrated for representative signals in

Discussion
As can be seen from Fig. 4, for supposedly uncoupled signals (raw CRG and the Fz channel of the EEG), the dependence of the MI value on the selected bin number differs from that of coupled signals (raw EEG channels Fz-Fp4 and Fp1-Fp2). For the coupled signals (Fp1-Fp2 owing to the symmetry of brain activity, and Fz-Fp4 owing to the closely placed electrodes), the true MI value can be found with the proposed method even for small sample sizes, as was shown earlier for correlated Gaussian signals [12].
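One plausible mechanism behind the missing plateau for uncoupled signals can be illustrated with synthetic data. The sketch below is an assumption-based experiment, not the EEG/CRG data of the paper: it applies an equidistant histogram estimator to two independent uniform samples, whose true MI is zero, and sweeps the bin number over the same 2 to 80 range. All names are hypothetical.

```python
import math
import random
from collections import Counter


def bin_indices(values, bins):
    """Assign each value to one of `bins` equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def histogram_mi(x, y, bins):
    """Plug-in MI estimate (nats) from an equidistant 2-D histogram."""
    n = len(x)
    ix, iy = bin_indices(x, bins), bin_indices(y, bins)
    joint, px, py = Counter(zip(ix, iy)), Counter(ix), Counter(iy)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def mi_versus_bins(n, bin_numbers, seed=0):
    """MI estimates for two independent uniform samples of size n (true MI = 0)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [rng.random() for _ in range(n)]
    return {k: histogram_mi(x, y, k) for k in bin_numbers}


small = mi_versus_bins(200, range(2, 81))
large = mi_versus_bins(5000, range(2, 81))
for k in (2, 10, 40, 80):
    print(f"bins={k:2d}  n=200: {small[k]:.3f}  n=5000: {large[k]:.3f}")
```

With only 200 samples the estimate climbs steadily with the bin number even though the variables are independent, so the difference quotient never settles and no stable range appears. At a given bin number the spurious MI shrinks as the sample size grows, which is consistent with the small-sample limitation stated in the conclusions.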

Conclusions
In this paper, the method of bin number selection for mutual information calculation was tested on real EEG and CRG signals. It was shown that for coupled EEG signals the proposed method can be used with small sample sizes. For uncorrelated, supposedly uncoupled signals, however, the dependence of the MI value on the bin number at small sample sizes does not allow the proposed method to be used.