Comparison of WGS-based HLA typing bioinformatic tools

Abstract

Introduction. Sequencing platforms are now widely available in research and medical institutions, sequencing costs are falling, and sequencing is increasingly incorporated into clinical practice, so it is reasonable to derive additional parameters from a patient's sequencing data. Whole-exome and whole-genome sequencing data provide information not only on single-nucleotide polymorphisms, small insertions and deletions, and some types of structural variants, but also on the HLA genotype. Large-scale typing of HLA alleles can therefore be performed on NGS data using modern bioinformatics tools. With appropriately chosen tools, NGS-based HLA typing can contribute to an accurate description of HLA allele frequencies in populations, to updating the Allele Frequency Net Database, to studying the distribution of HLA alleles across ethnic groups, and to the search for associations with autoimmune diseases.

Aim. To identify the optimal whole-genome-based HLA typing tool for inclusion in a bioinformatics data processing pipeline.

Material and methods. Whole-genome sequencing followed by bioinformatic processing was performed on 150 frozen blood samples. HLA typing from the WGS data was performed with six tools: xHLA, POLYSOLVER, HLA-LA, HLAscan, OptiType, and Kourami. For the same 150 samples, libraries for targeted sequencing of the HLA region were prepared using the NGSgo-MX6-1 primer pool and the NGSgo-LibrX kit (both GenDX, The Netherlands). HLA alleles were typed from the targeted sequencing data using the NGSengine software.

Results. The HLA typing accuracy of xHLA, OptiType, HLAscan, POLYSOLVER, HLA-LA, and Kourami was assessed on human whole-genome sequencing data with ≥ 30x coverage. Typing results obtained with the NGSgo-MX6-1 kit (GenDX, The Netherlands) were taken as the reference. POLYSOLVER showed the highest accuracy for HLA class I typing, and xHLA with IMGT/HLA database version 3.22.0 showed the highest accuracy for HLA class II. POLYSOLVER and OptiType require significant time and computing resources; therefore, Kourami and HLAscan are more suitable for large-scale HLA typing. All of the tools made more errors when typing HLA class II loci than HLA class I loci, even though the diversity of HLA class II alleles is significantly lower than that of class I. The highest number of incorrectly called alleles was observed for DQB1.
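The comparison against reference typing described above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the abstract does not state how accuracy was computed or at which resolution alleles were compared. The sketch assumes two-field resolution, counts matches without regard to allele order, and uses hypothetical names (`to_two_field`, `count_matches`, `locus_accuracy`) and invented sample data.

```python
from collections import Counter


def to_two_field(allele):
    """Trim an allele name such as 'A*02:01:01:01' to two fields ('A*02:01').

    Assumption: names follow the 'GENE*field:field:...' pattern.
    """
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def count_matches(called, reference):
    """Count alleles shared by two genotypes, ignoring allele order."""
    called_counts = Counter(to_two_field(a) for a in called)
    reference_counts = Counter(to_two_field(a) for a in reference)
    # Multiset intersection handles homozygous genotypes correctly.
    return sum((called_counts & reference_counts).values())


def locus_accuracy(calls, truth):
    """Return the fraction of reference alleles reproduced at each locus.

    Both arguments map sample -> locus -> (allele1, allele2).
    A missing call counts as zero matches.
    """
    matched = Counter()
    total = Counter()
    for sample, loci in truth.items():
        for locus, ref_pair in loci.items():
            called_pair = calls.get(sample, {}).get(locus, ())
            matched[locus] += count_matches(called_pair, ref_pair)
            total[locus] += len(ref_pair)
    return {locus: matched[locus] / total[locus] for locus in total}


# Invented example data (not taken from the study).
truth = {
    "S1": {"A": ("A*02:01:01:01", "A*24:02:01"),
           "DQB1": ("DQB1*03:01", "DQB1*06:02")},
    "S2": {"A": ("A*01:01", "A*01:01"),
           "DQB1": ("DQB1*02:01", "DQB1*03:02")},
}
calls = {
    "S1": {"A": ("A*24:02", "A*02:01"),
           "DQB1": ("DQB1*03:01", "DQB1*06:03")},
    "S2": {"A": ("A*01:01", "A*01:01"),
           "DQB1": ("DQB1*02:01", "DQB1*03:02")},
}

print(locus_accuracy(calls, truth))
```

In this toy input, one of the four DQB1 reference alleles is miscalled, so the DQB1 accuracy is 0.75 while locus A is fully concordant. Running the same function on each tool's output would give per-locus figures that can be compared across tools.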

Conclusion. The results and conclusions obtained in this study provide the basis for a methodological approach to selecting the optimal HLA typing tool for use in bioinformatic pipelines for processing whole genome and/or whole exome sequencing data.

Keywords: bioinformatics tools; whole genome sequencing; genotyping; precision of HLA typing

For citation: Kazakova P.G., Mitrofanov S.I., Akhmerova Yu.N., Varlamova O.V., Zemsky P.U., Mkrtchian A.A., Sergeev A.P., Snigir E.A., Feliz N.V., Frolova L.V., Shpakova T.A., Yudin V.S., Keskinov A.A., Yudin S.M., Skvortsova V.I. Comparison of WGS-based HLA typing bioinformatic tools. Immunologiya. 2023; 44 (2): 219–30. DOI: https://doi.org/10.33029/0206-4952-2023-44-2-219-230 (in Russian)

Funding. The study had no sponsor support.

Conflict of interest. The authors declare no conflict of interest.

Authors’ contribution. The concept and design of the study – Kazakova P.G., Yudin V.S., Keskinov A.A., Yudin S.M., Skvortsova V.I.; DNA extraction, preparation of libraries, whole genome and targeted sequencing – Snigir E.A., Varlamova O.V.; monitoring and eliminating errors in the sequencing process – Sergeev A.P.; bioinformatic processing – Kazakova P.G., Mitrofanov S.I.; data analysis and visualization – Kazakova P.G.; text production – Kazakova P.G., Mitrofanov S.I.; text editing and structuring – Akhmerova Yu.N., Zemsky P.U., Mkrtchian A.A., Feliz N.V., Frolova L.V., Shpakova T.A.

All articles in our journal are distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0 license)

