Simulation and Analysis of Bionanopore DNA Sequencing Signals for Genetic Mutations Detection
Abstract
This paper applies genomic signal processing methods to the modeling and analysis of nanopore DNA sequencing signals. From nucleotide sequences in the normal state and with mutations, 1200 signals were simulated, representing four classes: normal, missense mutation, insertion mutation, and deletion mutation. Correlation analysis was used to assess the similarity of nanopore DNA sequencing signals by computing the cross-correlation function between two current signals through the protein nanopore: the signal for the normal sequence and the signal for the mutated sequence. The location of the correlation peak indicates the type of mutation (insertion or deletion) and gives the signal shift needed to align the identical nucleotide sequences.
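The cross-correlation step can be illustrated with a minimal sketch. It is not the authors' simulation: the current levels in `LEVELS`, the value of `SAMPLES_PER_BASE`, the random reference sequence, the inserted fragment `GCA`, and the mutation position are all assumptions made for illustration. A step-like current trace is generated per nucleotide, Gaussian noise is added, and the lag of the cross-correlation peak between the normal and the mutated trace is reported.

```python
import random

# Hypothetical blockade current levels in pA, chosen only for illustration.
LEVELS = {"G": 60.0, "T": 55.0, "A": 50.0, "C": 45.0}
SAMPLES_PER_BASE = 4  # assumed number of current samples per nucleotide


def simulate_signal(seq, noise_sd, rng):
    """Step-like current trace: one noisy level per nucleotide."""
    return [LEVELS[base] + rng.gauss(0.0, noise_sd)
            for base in seq for _ in range(SAMPLES_PER_BASE)]


def peak_lag(x, y, max_lag):
    """Lag maximizing c[lag] = sum_n x[n] * y[n + lag] after mean removal."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)                # keep n + lag >= 0
        stop = min(len(xc), len(yc) - lag)  # keep n + lag < len(y)
        val = sum(xc[n] * yc[n + lag] for n in range(start, stop))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Assumed reference sequence; mutations are placed near the start so that
# most of the trace is shifted relative to the normal signal.
normal = "".join(random.Random(1).choices("ACGT", k=200))
substitute = "A" if normal[10] != "A" else "C"
mutants = {
    "missense": normal[:10] + substitute + normal[11:],
    "insertion": normal[:10] + "GCA" + normal[10:],
    "deletion": normal[:10] + normal[13:],
}

noise_rng = random.Random(2)
reference = simulate_signal(normal, 1.0, noise_rng)
lags = {
    name: peak_lag(reference, simulate_signal(seq, 1.0, noise_rng), max_lag=40)
    for name, seq in mutants.items()
}
for name, lag in lags.items():
    print(f"{name}: peak lag {lag} samples ({lag / SAMPLES_PER_BASE:+.0f} bases)")
```

With this sign convention, a positive peak lag means the mutated trace is delayed relative to the normal one (insertion), a negative lag means it is advanced (deletion), and a peak at zero lag means no shift, as with a single-nucleotide substitution. Shifting one trace by the peak lag aligns the identical parts of the two sequences.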
The results of applying machine learning methods to the classification of nanopore DNA sequencing signals depend strongly on the noise level of the recorded current through the protein nanopore and on the type of mutation. At a relatively low noise level, when the ion current values for different nucleotides do not overlap, the classification accuracy reaches 100%. As the standard deviation of the noise distribution increases, the current levels produced when the nanopore is blocked by nucleotides of similar size begin to overlap. As a result, errors in distinguishing normal signals from single-nucleotide mutations (missense or nonsense) occur often, especially when the current step levels for two nucleotides are similar (for example, guanine and thymine, thymine and adenine, adenine and cytosine) and noise masks their contribution to the reduction in nanopore current. Insertions and deletions of a nucleotide sequence are often classified without errors, because these mutations shift the pathological signal by several nucleotides relative to the normal signal, which increases the distance between the two signals. The machine learning methods that classified nanopore DNA sequencing signals with high accuracy include linear discriminant analysis, the k-nearest neighbors classifier (with Euclidean distance and a sufficient number of nearest neighbors), and support vector machines. Support vector machines gave the best results: with linear, quadratic, and cubic kernel functions, 93% to 100% of signals were classified correctly.
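The dependence of classification accuracy on noise can be sketched with a small k-nearest neighbors classifier using Euclidean distance. This is an illustrative toy under stated assumptions, not the authors' experiment: the current levels, the similar-nucleotide substitution table `SIMILAR`, the dataset sizes, the two noise levels, and `k=5` are all hypothetical, and a real study would more likely rely on an established machine learning library.

```python
import random
from collections import Counter

# Hypothetical current levels in pA; neighboring levels are close on purpose.
LEVELS = {"G": 60.0, "T": 55.0, "A": 50.0, "C": 45.0}
# Assumed substitutions between nucleotides with similar current levels.
SIMILAR = {"G": "T", "T": "A", "A": "C", "C": "A"}
SAMPLES_PER_BASE = 4
WINDOW = 30 * SAMPLES_PER_BASE  # fixed feature length for all classes


def make_templates(seed=3):
    """One assumed nucleotide sequence per class."""
    normal = "".join(random.Random(seed).choices("ACGT", k=40))
    return {
        "norm": normal,
        "missense": normal[:10] + SIMILAR[normal[10]] + normal[11:],
        "insertion": normal[:10] + "GCA" + normal[10:],
        "deletion": normal[:10] + normal[13:],
    }


def noisy_signal(seq, noise_sd, rng):
    """Noisy step-like trace, truncated to the common window."""
    samples = [LEVELS[b] + rng.gauss(0.0, noise_sd)
               for b in seq for _ in range(SAMPLES_PER_BASE)]
    return samples[:WINDOW]


def make_dataset(templates, n_per_class, noise_sd, rng):
    return [(noisy_signal(seq, noise_sd, rng), label)
            for label, seq in templates.items() for _ in range(n_per_class)]


def knn_predict(train, x, k=5):
    """Majority vote among the k training signals closest to x."""
    def sq_dist(item):
        # Squared Euclidean distance ranks neighbors like the distance itself.
        return sum((a - b) ** 2 for a, b in zip(item[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(noise_sd, seed=0):
    rng = random.Random(seed)
    templates = make_templates()
    train = make_dataset(templates, 30, noise_sd, rng)
    test = make_dataset(templates, 10, noise_sd, rng)
    hits = sum(knn_predict(train, x) == label for x, label in test)
    return hits / len(test)


acc_low = accuracy(noise_sd=0.5)   # levels of different nucleotides stay apart
acc_high = accuracy(noise_sd=8.0)  # neighboring levels overlap
print(f"low noise accuracy:  {acc_low:.2f}")
print(f"high noise accuracy: {acc_high:.2f}")
```

In this sketch the missense class differs from the normal class in a single nucleotide with a neighboring current level, so it is the pair most exposed to noise, while the insertion and deletion classes differ from the normal trace over most of the window.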

This work is licensed under a Creative Commons Attribution 4.0 International License.