Graphical Data Representation and Analytics to Link the Potential Interaction for Lung Cancer Genes
Graph data representation is an efficient technique for highlighting the relationships within large, highly linked biological datasets. With the advent of next-generation sequencing, a large volume of such data is being generated. Many graph tools now exist for extracting semantically associated data; Neo4j, Titan, and OrientDB are well-known examples of graph database tools. In this paper, we present a graph-based data storage and interaction retrieval model for lung cancer genes, built from data collected from several databases, including UniProt, COSMIC, and NCBI. The model captures interactions between genes, proteins, and protein domains, together with their expression in various tissues, their involvement and role in other diseases, disorder type, mutation location, disease description, and related attributes. By applying different types of queries, we uncovered relationships that had not been well studied earlier; for example, the three proteins KRAS, NRAS, and RIT1 have a common domain. Similarly, some groups of genes show expression only in the lungs and in no other organ. Some groups of genes share types of somatic disorders, suggesting that they may have a related genetic basis. Deeper analysis revealed groups of genes that have the same disorder type or the same mutation location across different diseases, as well as genes that play a crucial role in the development of various diseases, including lung cancer. This type of analysis can help in designing drugs that target the highlighted causal factors of diseases.
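The shared-domain relationship mentioned above can be illustrated with a minimal sketch. This is not the model described in the paper: it uses a plain in-memory edge list rather than a graph database, and the relation name `HAS_DOMAIN`, the labels `DOMAIN_A` and `DOMAIN_B`, the protein `PROTEIN_X`, and the function `proteins_by_shared_domain` are all hypothetical placeholders. Only the fact that KRAS, NRAS, and RIT1 have a common domain is taken from the text.

```python
from collections import defaultdict

# Hypothetical edge list of (protein, relation, target) triples.
# DOMAIN_A, DOMAIN_B, and PROTEIN_X are placeholders, not real annotations.
EDGES = [
    ("KRAS", "HAS_DOMAIN", "DOMAIN_A"),
    ("NRAS", "HAS_DOMAIN", "DOMAIN_A"),
    ("RIT1", "HAS_DOMAIN", "DOMAIN_A"),
    ("PROTEIN_X", "HAS_DOMAIN", "DOMAIN_B"),
]


def proteins_by_shared_domain(edges, min_size=2):
    """Group proteins by domain and keep domains linked to at least min_size proteins."""
    by_domain = defaultdict(set)
    for source, relation, target in edges:
        if relation == "HAS_DOMAIN":
            by_domain[target].add(source)
    return {
        domain: sorted(proteins)
        for domain, proteins in by_domain.items()
        if len(proteins) >= min_size
    }


shared = proteins_by_shared_domain(EDGES)
print(shared)  # {'DOMAIN_A': ['KRAS', 'NRAS', 'RIT1']}
```

In a graph database the same question would be expressed as a pattern query over protein and domain nodes; the grouping step here plays the role of that pattern match.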