Simulation and Analysis of Bionanopore DNA Sequencing Signals for Genetic Mutations Detection


Iryna M. Ievdoshchenko
https://orcid.org/0000-0003-0049-2159
PhD Assoc.Prof. Kateryna Olehivna Ivanko
https://orcid.org/0000-0002-3842-2423
PhD Assoc.Prof. Nataliia Heorhiivna Ivanushkina
https://orcid.org/0000-0001-8389-7906
Vishwesh Kulkarni
https://orcid.org/0000-0002-2285-8652

Abstract

This paper applies genomic signal processing methods to the modeling and analysis of nanopore DNA sequencing signals. From nucleotide sequences in the normal state and with mutations, 1200 signals were simulated, representing four classes: normal, missense mutation, insertion mutation, and deletion mutation. Correlation analysis was used to assess the similarity of nanopore sequencing signals by computing the cross-correlation function between two current signals through the protein nanopore: the signal for the normal sequence and the signal for the mutated one. The location of the correlation peak indicates the type of mutation (insertion or deletion) and gives the signal shift needed to align the matching portions of the nucleotide sequences.
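The cross-correlation step can be illustrated with a minimal sketch. It is not the authors' implementation: the current levels in `LEVELS_PA`, the number of samples per nucleotide, the test sequence, and all function names are assumptions chosen for illustration. With the lag defined so that a positive value means the mutated signal is delayed, a peak at a positive lag suggests an insertion, a peak at a negative lag suggests a deletion, and a peak at zero lag means no shift (normal signal or a substitution).

```python
import random

# Hypothetical blockade current levels in pA (illustrative, not measured values).
LEVELS_PA = {"G": 40.0, "T": 50.0, "A": 60.0, "C": 70.0}
SAMPLES_PER_NT = 10  # assumed number of samples per nucleotide step


def simulate_signal(sequence, noise_sd, rng):
    """Piecewise-constant current trace with additive Gaussian noise."""
    return [
        LEVELS_PA[nt] + rng.gauss(0.0, noise_sd)
        for nt in sequence
        for _ in range(SAMPLES_PER_NT)
    ]


def cross_correlation_peak(x, y, max_lag):
    """Return the lag L (in samples) that maximises sum_n x[n] * y[n + L]."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum only over indices where both signals are defined.
        value = sum(
            xc[n] * yc[n + lag]
            for n in range(max(0, -lag), min(len(xc), len(yc) - lag))
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def shift_in_nucleotides(normal, mutated, noise_sd=2.0, seed=0):
    """Estimate the shift of the mutated signal relative to the normal one."""
    rng = random.Random(seed)
    x = simulate_signal(normal, noise_sd, rng)
    y = simulate_signal(mutated, noise_sd, rng)
    lag = cross_correlation_peak(x, y, max_lag=5 * SAMPLES_PER_NT)
    return round(lag / SAMPLES_PER_NT)


normal = "ACGTTGCAAGCTCATGGTCA"                 # arbitrary example sequence
insertion = normal[:2] + "GC" + normal[2:]      # two nucleotides inserted
deletion = normal[:1] + normal[3:]              # two nucleotides deleted
missense = normal[:10] + "A" + normal[11:]      # single substitution C -> A

for name, seq in [("insertion", insertion), ("deletion", deletion),
                  ("missense", missense)]:
    print(name, shift_in_nucleotides(normal, seq))
```

In this sketch the mutation is placed near the start of the sequence, so most of the signal is shifted and the peak is unambiguous; a mutation in the middle would split the correlation energy between zero lag and the shifted lag.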


The results of applying machine learning methods to the classification of nanopore DNA sequencing signals depend strongly on the noise level of the recorded current through the protein nanopore and on the type of mutation. At relatively low noise, when the ion current values for different nucleotides do not overlap, classification accuracy reaches 100%. As the standard deviation of the noise distribution increases, the current levels produced by nucleotides of similar size begin to overlap. As a result, normal signals and single-nucleotide mutations (missense or nonsense) are often confused, especially when the current steps for the two nucleotides are close (for example, guanine and thymine, thymine and adenine, or adenine and cytosine) and noise masks the difference in their contribution to the reduction of current through the nanopore. Insertions and deletions of a nucleotide sequence are usually classified without errors, because these mutations shift the mutated signal by several nucleotides relative to the normal signal, which increases the distance between the two. The machine learning methods that demonstrated high accuracy in classifying nanopore DNA sequencing signals were linear discriminant analysis, the k-nearest neighbors classifier (with Euclidean distance and a sufficient number of nearest neighbors), and support vector machines. The best results were obtained with support vector machines: linear, quadratic, and cubic kernel functions yield between 93% and 100% correctly classified signals.
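The dependence of classification accuracy on noise can be explored with a small sketch of a k-nearest neighbors classifier using Euclidean distance. This is an illustration under stated assumptions, not the authors' experiment: the current levels, sequence, mutation positions, dataset sizes, and noise values are hypothetical, and all function names are invented for this example.

```python
import random
from collections import Counter

# Hypothetical blockade current levels in pA (illustrative, not measured values).
LEVELS_PA = {"G": 40.0, "T": 50.0, "A": 60.0, "C": 70.0}
SAMPLES_PER_NT = 10   # assumed number of samples per nucleotide step
SIGNAL_LENGTH = 180   # traces are truncated to a common length

NORMAL = "ACGTTGCAAGCTCATGGTCA"  # arbitrary example sequence
TEMPLATES = {
    "norm": NORMAL,
    "missense": NORMAL[:10] + "A" + NORMAL[11:],  # C -> A, adjacent levels
    "insertion": NORMAL[:2] + "GC" + NORMAL[2:],
    "deletion": NORMAL[:1] + NORMAL[3:],
}


def simulate(sequence, noise_sd, rng):
    """Noisy piecewise-constant current trace, truncated to SIGNAL_LENGTH."""
    trace = [LEVELS_PA[nt] + rng.gauss(0.0, noise_sd)
             for nt in sequence for _ in range(SAMPLES_PER_NT)]
    return trace[:SIGNAL_LENGTH]


def make_dataset(n_per_class, noise_sd, rng):
    """List of (signal, label) pairs, n_per_class noisy traces per class."""
    return [(simulate(seq, noise_sd, rng), label)
            for label, seq in TEMPLATES.items() for _ in range(n_per_class)]


def knn_predict(train, signal, k=3):
    """Majority vote among the k training signals closest in Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(train, key=lambda item: sq_dist(item[0], signal))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def per_class_accuracy(noise_sd, seed=0, n_train=30, n_test=10):
    """Fraction of correctly classified test signals for each class."""
    rng = random.Random(seed)
    train = make_dataset(n_train, noise_sd, rng)
    test = make_dataset(n_test, noise_sd, rng)
    hits = Counter()
    for signal, label in test:
        hits[label] += knn_predict(train, signal) == label
    return {label: hits[label] / n_test for label in TEMPLATES}


for noise_sd in (1.0, 10.0):
    print(noise_sd, per_class_accuracy(noise_sd))
```

With the assumed 10 pA spacing between adjacent levels, a noise standard deviation of 1 pA keeps the classes well separated, while 10 pA makes the normal and missense traces hard to tell apart; the shifted insertion and deletion traces remain much farther from the normal class.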

Article Details

How to Cite
[1]
I. M. Ievdoshchenko, K. O. Ivanko, N. H. Ivanushkina, and V. Kulkarni, “Simulation and Analysis of Bionanopore DNA Sequencing Signals for Genetic Mutations Detection”, Мікросист., Електрон. та Акуст., vol. 26, no. 1, pp. 217265–1, Apr. 2021.
Section
Electronic Systems and Signals
Author Biographies

Iryna M. Ievdoshchenko, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Master's student

PhD Assoc.Prof. Kateryna Olehivna Ivanko, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Department of Electronic Engineering, Associate Professor

PhD Assoc.Prof. Nataliia Heorhiivna Ivanushkina, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Department of Electronic Engineering, Associate Professor

Vishwesh Kulkarni, School of Engineering, University of Warwick

School of Engineering, Associate Professor

