Indiana University Purdue University Indianapolis

Research Projects

Statistical Energy Functions: a Fragment-Based Approach

The long-term goal of this proposal is to predict protein structures and protein-protein complex structures to facilitate the development of therapeutic drugs. This proposal addresses the urgent need for a more accurate energy function for high-resolution protein-structure prediction and protein-protein interaction prediction. Currently, there are three complementary approaches to this problem, based on physical principles (physics-based), known protein structures (knowledge-based or statistical), or empirical methods. Among the three, establishing a statistical energy function at an “all-atom” level of detail is the least explored approach. Here we propose a statistical energy function built on a mixture of atoms and molecular fragments, rather than on atoms alone. Including molecular fragments accounts for many interactions that commonly used atom-based approaches miss partially or wholly. Preliminary studies have shown a multi-fold improvement in the accuracy and specificity of refolding completely unfolded segments with secondary structure elements. This success is a preview of the potential of the proposed fragment-based approach to statistical energy functions.
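To make the idea of a knowledge-based energy function concrete, the sketch below uses the inverse Boltzmann relation, E = -kT ln(P_obs / P_ref), which is a common way to turn observed distance statistics into an energy. This is an assumption made for illustration: the proposal does not state the functional form it uses, and the function name, bin width, distance cutoff, and pseudocount are all hypothetical. The "pair type" being scored could be an atom-atom pair or an atom-fragment pair.

```python
import math


def statistical_potential(observed_distances, reference_distances,
                          bin_width=1.0, max_dist=10.0, kT=1.0,
                          pseudocount=1.0):
    """Return one energy per distance bin for a single pair type.

    Hypothetical illustration of the inverse Boltzmann relation:
        E(bin) = -kT * ln(P_observed(bin) / P_reference(bin))
    A pseudocount keeps empty bins from producing log(0).
    """
    n_bins = int(max_dist / bin_width)

    def binned_probabilities(distances):
        counts = [pseudocount] * n_bins
        for d in distances:
            if 0.0 <= d < max_dist:
                index = min(int(d / bin_width), n_bins - 1)
                counts[index] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p_obs = binned_probabilities(observed_distances)
    p_ref = binned_probabilities(reference_distances)
    return [-kT * math.log(po / pr) for po, pr in zip(p_obs, p_ref)]
```

Bins where the pair type is seen more often in known structures than in the reference state receive negative (favorable) energies; bins where it is seen less often receive positive energies.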


Structure-Based Prediction of Folding Mechanism

The long-term goals of this proposal are to uncover the fundamental mechanisms of protein folding and/or misfolding, and, eventually, to predict protein structures from sequences using bioinformatic, theoretical, and computational methods. This proposal addresses the challenge of developing a computational model that is detailed enough to capture the specific folding behavior of a given protein, but simple enough to permit efficient calculations. We propose to develop an all-atom model based on simplified potentials. Preliminary studies show that this new all-atom model allows practical folding simulations using regular PCs, and that it yields unprecedented accuracy in predicting the folding pathway(s) of a given protein. This initial success and productivity provide strong incentives for the further development and validation of this method. The new model will be used to examine the extent to which the tertiary structure encodes the varieties of folding mechanisms, and the effect of non-native hydrogen bonding and non-native hydrophobic interactions. In this regard, various all-atom models of the Pin WW domain will be tested. Results will be compared against available experimental data. The knowledge gained from the proposed studies will not only advance our understanding of how specific proteins fold, but should also be useful for designing mutants that are optimized for folding and stability. The analytic tools and computational methods developed in the proposal have the potential to be widely used by those who are interested in determining the preferred folding pathways.


Warfare Cancer Care Engineering Project

Background: Cancer is now the number one killer in the United States and the emotional and financial costs to American society are tremendous. As the largest integrated health care delivery system in the United States, the Veterans Health Administration (VHA) will greatly benefit from ways to predict cancer susceptibility and treatment response. The ability to accurately identify individuals at risk for cancer and to implement strategies to prevent its development will profoundly impact the lives of not only veterans, active duty military personnel and their families, but all Americans. The ability to identify those patients whose cancers will not respond to treatment will greatly reduce the number of patients needed to achieve statistical significance in clinical trials with a concomitant drastic reduction in costs.

Objective/Hypothesis: The WCCE project applies principles of systems engineering to the cancer problem by viewing cancer, from its development to the delivery of care, as a “system” that can be mathematically modeled for predictive behavior. The ability to predict cancer susceptibility, treatment response, and optimal care delivery will result in more successful prevention strategies, more effective treatments, and more efficient care delivery.

Study Design: Using OMIC (glycoproteomic, pharmacogenomic, metabolomic, and lipidomic) and other biological data derived from patient tissues and from patient and non-cancer control blood samples, iterative, integrated mathematical models will be generated that identify molecular signatures useful for predicting cancer susceptibility and treatment response. The models will be rapidly tested in the clinic for their predictive power and continually refined as more biological and clinical data are added. Focusing on patients served by the Roudebush Veterans Affairs Medical Center in Indianapolis, the delivery of cancer care to Indiana Veterans will be evaluated. Underlying the entire WCCE project is the creation of a “Cancer War Room” where the models, information garnered from the literature, and information regarding the delivery of cancer care will be continually visualized and evaluated for actionable strategies that can be rapidly validated in the clinic.

Relevance: Cancer touches almost every American. The ultimate goal of the national cancer research effort is to prevent the development of cancer in the first place, which would have a tremendous impact on the well-being of the American public. Where prevention strategies fail, the goal is to detect cancer early enough to effect a cure, using predictive molecular patterns present in biofluids and tissues. Finally, and perhaps most realistic in the short term, is the goal of treating cancer as a manageable, non-life-threatening disease. The WCCE project views the entire spectrum of cancer as a “system” that can be modeled for predictability to significantly improve the chances of success in achieving all of the above national cancer goals.


APT: the Analytical Proteomics Team

The Informatics and Statistical Team led by Dr. Jake Chen at Indiana University will provide computational support for the Indiana Clinical Proteomics program led by Fred Regnier from Purdue University. In particular, our team will provide expertise in proteomics data tracking, data integration, data analysis, data mining, and data sharing to support the multi-lab and multi-disciplinary analytical needs arising from analyzing clinical samples on multiple Mass Spectrometry (MS) platforms. The team will leverage existing IU computing hardware and software resources, develop new databases, customize statistical and analytical software, and gain computational insights. The innovation and new knowledge gained by the team will be directly applicable to assessing the variance of data quality, and to incorporating knowledge of platform-dependent MS result biases into a unified MS analytical framework for biomarker studies at the national level across different National Cancer Institute designated cancer centers.


Protein Packing Defects as Functional Markers and Drug Targets

Our preliminary structure-based investigations show that water exclusion from deficiently packed hydrogen bonds and other pre-formed electrostatic interactions constitutes a driving factor conferring high specificity to protein association. Thus, an evolutionarily conserved feature, the under-dehydrated hydrogen bond, termed a dehydron, appears to be a structural marker for interactivity. Dehydrons were experimentally and statistically shown to constitute sticky spots on the protein surface and to be abundant at protein-protein interfaces, especially at those that cannot be understood in terms of standard interactions. The dehydron distribution on the surface of soluble proteins constitutes a determinant of the propensity for association and aberrant aggregation. The identification of dehydrons has so far relied on detailed structural information, a limitation that precludes a proteomic analysis. This proposal is geared toward introducing a sequence-based predictive method to establish the biological relevance of dehydrons and their potential as markers for druggable targets. Thus, we intend to introduce a powerful unsupervised scanning technology to detect signals of interactivity and druggability at a genomic scale. This goal requires constructing a machine-learning discriminator trained on a structural database. The overall aim is to develop a sequence-based multi-purpose tool to expand the universe of druggable targets, diagnose propensity for aberrant aggregation, and make interactomic inferences.
The efficacy of our predictor will be tested on five grounds: a) assaying for amino-acid variability and determining whether residues predicted solely from sequence to be engaged in dehydrons are actually conserved; b) determining the accuracy and precision of the sequence-based predictor, using a redundancy-free curated PDB sample as the training set and a nonhomologous PDB complement set and annotated SwissProt entries as testing sets; c) contrasting our results with an alternative dehydron predictor built on a reliable sequence-based predictor of native disorder (PONDR®), which relies on a correlation found between the extent of hydrogen-bond packing and the score of structural disorder; d) contrasting sequence-based diagnosis of amyloidogenic aggregation with SwissProt annotations and other annotated disease-related sequence repositories; and e) contrasting compiled drug-target quality assessments, structural data, and screening profiles for protein-ligand associations with the predicted dehydron patterns. Thus, the novel design concept of "drug inhibitor as a wrapper of functional packing defects" will be explored and validated.
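Test (b) above calls for the accuracy and precision of the predictor on held-out testing sets. As a minimal sketch, assuming the standard binary-classification definitions (the proposal does not define the terms further), the two quantities could be computed from per-residue labels as follows. The function name and the boolean encoding (True meaning "residue participates in a dehydron") are hypothetical.

```python
def accuracy_and_precision(predicted, actual):
    """Compute (accuracy, precision) for paired boolean labels.

    accuracy  = (TP + TN) / all residues
    precision = TP / (TP + FP), reported as 0.0 when nothing is predicted positive
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one label is required")

    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    true_neg = sum(1 for p, a in zip(predicted, actual) if not p and not a)

    accuracy = (true_pos + true_neg) / len(predicted)
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision
```

Reporting both numbers matters when dehydron residues are a small minority of all residues, because a predictor that labels nothing as positive can still score a high accuracy.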


Dynamical Network Modeling for Identifying Systems-level Breast Cancer Biomarkers

Known breast cancer (BRCA) susceptibility genes, e.g. P53, BRCA1, BRCA2, ERBB2 and PTEN, account for only 15-20% of the familial risk. From the viewpoint of network biology, these genes and gene products never function in isolation, and one of the emerging themes today is to re-characterize them within their molecular interaction network. In addition, both biological shapes and physiological signals have been found to have chaotic and/or fractal characteristics, which indicates that many biological systems and networks could be analyzed effectively by applying nonlinear dynamical approaches.

Objective/Rationale: A novel computational framework, called dynamical network modeling, is proposed to reveal, interpret, utilize and validate the systems-level dynamical properties of BRCA-specific molecular networks. Based on the relationship between features of complex networks (e.g. scale-free topology) and nonlinear dynamical properties (e.g. fractals), the concept of systems-level biomarkers is proposed here by combining two newly introduced concepts: network biomarkers and dynamical biomarkers. The aim is to assess whether combining protein interaction databases and literature mining with nonlinear dynamical techniques is sufficient to identify systems-level biomarkers for BRCA.

Specific Aims: 1) Characterize the dynamical properties of the BRCA-specific protein interaction network using modeling and analysis based on nonlinear dynamics. 2) Validate the above network model using gene expression data from BRCA patients and healthy individuals, downloaded from public websites.

Study Design: 1) Reconstruct the BRCA-specific protein interaction network. 2) Build and analyze a dynamical model of this network. 3) Find the relationship between the network’s dynamical properties and the biological significance of its component proteins in their molecular pathway context. 4) Evaluate differences in dynamical properties between BRCA patients and healthy individuals by mapping their gene expression data onto the BRCA-specific protein interaction network.
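Steps 1 and 4 of the study design can be illustrated with a toy sketch: building an undirected interaction network from an edge list, tabulating its degree distribution (the quantity usually inspected when asking whether a network is scale-free), and overlaying expression values on the network to compare two groups. Everything here is a hypothetical illustration; the neighbor-expression score, the function names, and the toy edges and expression values are assumptions, not the dynamical model the proposal describes.

```python
from collections import Counter, defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions in this sketch
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def degree_distribution(network):
    """Count how many proteins have each degree (number of partners)."""
    return Counter(len(partners) for partners in network.values())


def neighbor_expression(network, expression):
    """Hypothetical score: summed expression of each protein's partners."""
    return {
        protein: sum(expression.get(partner, 0.0) for partner in partners)
        for protein, partners in network.items()
    }


def differential_score(network, patient_expression, control_expression):
    """Per-protein difference in the hypothetical score, patient minus control."""
    patient = neighbor_expression(network, patient_expression)
    control = neighbor_expression(network, control_expression)
    return {protein: patient[protein] - control[protein] for protein in network}
```

A real analysis would replace the simple neighbor sum with properties derived from the nonlinear dynamical model, but the data flow (network first, expression overlaid second, group comparison last) would be similar.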


Lung Tumor Motion Behavior Analysis Using 4DCT

Image-guided radiation therapy (IGRT) can potentially increase treatment effectiveness for tumors of the lower abdomen and lungs that undergo respiration-induced motion. However, its success largely depends upon an adequate understanding of tumor motion characteristics and accurate prediction of tumor position at some point in the future. Such prediction is challenging because respiration-induced tumor motion is complicated and patient-specific. Particularly challenging are patients who have advanced lung disease or highly compromised breathing. Their tumors' motion may be highly erratic, with non-uniform period and amplitude.

The overall objective of the work proposed in this application is to improve the radiation treatment of moving lung tumors. There are two specific aims under this proposal. The first specific aim covers the behavior analysis of tumor motion and patient breathing. The motion of a tumor will be mathematically characterized by defining parameters that categorize its movement with time during a treatment fraction and also cumulatively over the course of the treatment. Individual patient breathing behavior will be modeled by defining motion properties (e.g., amplitude, frequency, velocity) and their relationships across various breathing states (e.g., exhale, inhale, end of exhale) under various patient biomedical conditions, such as anatomical and physiological ones. The second specific aim covers the development of a predictive model for tumor motion. A statistical model for predicting the future movement of a tumor based on previous motion patterns will be built and dynamically adjusted during real-time radiation treatment. A Hidden Markov Model with weighted probabilities will be explored. The model is expected to accurately predict respiration-induced tumor motion to allow for true real-time IGRT. The proposed research is innovative since respiration-induced tumor motion has not been fully characterized and the prediction of tumor motion in various parts of the lung is difficult. Our interdisciplinary team of investigators uniquely combines the diverse range of data management, physics support, and clinical expertise needed to reach a definitive outcome for this research.
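As a rough illustration of how a Hidden Markov Model could predict the next breathing state from previous motion, the sketch below runs the standard forward (filtering) recursion over the three breathing states named above and then steps the belief forward once. It is a plain HMM under stated assumptions: the "weighted probabilities" variant the proposal mentions is not specified and is not attempted here, and the observation symbols ("up", "down", "flat") and every probability value are invented for the example.

```python
STATES = ("inhale", "exhale", "end_of_exhale")


def _normalize(weights):
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observation sequence has zero probability under the model")
    return {state: w / total for state, w in weights.items()}


def filter_states(observations, initial, transition, emission):
    """Return P(current state | all observations so far) via the forward recursion."""
    belief = _normalize(
        {s: initial[s] * emission[s][observations[0]] for s in STATES}
    )
    for obs in observations[1:]:
        predicted = {
            s: sum(belief[r] * transition[r][s] for r in STATES) for s in STATES
        }
        belief = _normalize({s: predicted[s] * emission[s][obs] for s in STATES})
    return belief


def predict_next_state(belief, transition):
    """Propagate the current belief one step ahead through the transition model."""
    return {s: sum(belief[r] * transition[r][s] for r in STATES) for s in STATES}


# Invented example parameters (not taken from patient data).
INITIAL = {s: 1.0 / 3.0 for s in STATES}
TRANSITION = {
    "inhale": {"inhale": 0.1, "exhale": 0.9, "end_of_exhale": 0.0},
    "exhale": {"inhale": 0.0, "exhale": 0.2, "end_of_exhale": 0.8},
    "end_of_exhale": {"inhale": 0.7, "exhale": 0.0, "end_of_exhale": 0.3},
}
EMISSION = {
    "inhale": {"up": 0.8, "down": 0.1, "flat": 0.1},
    "exhale": {"up": 0.1, "down": 0.8, "flat": 0.1},
    "end_of_exhale": {"up": 0.1, "down": 0.1, "flat": 0.8},
}
```

With these toy parameters, observing "up" followed by "down" leaves most of the belief on the exhale state, so the one-step prediction favors end of exhale. In a treatment setting the transition and emission tables would have to be estimated per patient and updated as new motion data arrive.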