David Oliver, PhD



At the Genomics Center

  • Developing, implementing, and maintaining RNA-seq, ChIP-seq, and ATAC-seq data processing workflows on the high-performance computing cluster hosted at Weill Cornell Medicine.
  • Consulting with basic and clinical scientists on experimental and statistical design, data analysis, and the choice of appropriate sequencing technologies.
  • Performing statistical analysis of genomics datasets, including integration of multiple datasets from different sequencing technologies.
  • Training basic and clinical scientists and students in bioinformatics and statistical methods related to next-generation sequencing techniques.

Research Interests

The genetic code for cellular life is stored in long molecules known as DNA. Human cells contain about 3 billion base pairs of DNA organized into ~40,000 genes. The human body consists of ~37 trillion cells, and each cell contains a full copy of the information stored as DNA. Errors in copying DNA, or damage caused by environmental factors, can result in an array of malfunctions at the cellular level, which can in turn influence the function of organs and tissues. While errors in DNA can be the direct cause of disease, not all diseases stem from DNA errors. Many autoimmune diseases, for example, cannot be directly linked to errors in DNA. While DNA is the core information required for life, additional information is required to regulate how and when specific portions of the DNA are read and transcribed in each cell. This regulation of which information within DNA is accessible and active or inactive is called epigenetic information. There are many methods of epigenetic regulation; the best studied of these are DNA methylation, histone modifications, and DNA accessibility.

While errors in DNA often occur within a gene, resulting in malfunction of the protein product, other errors occur in regions that are important for the control of a gene’s expression. For complex diseases such as systemic lupus erythematosus (SLE), it is clear that the majority of disease-associated DNA errors are located in regions important for controlling gene behavior rather than within regions coding for gene products. For this reason, genomic studies focusing on epigenetic control of genes have become foundational to the field.

RNA-seq, ChIP-seq, and ATAC-seq are the primary tools used for studying epigenetics in normal and diseased cells. These tools provide a genome-wide look at gene expression (RNA-seq), transcription factor binding (ChIP-seq), histone modifications (ChIP-seq), and DNA accessibility (ATAC-seq). Individually these tools are very powerful and provide massive amounts of information regarding the cellular system in question. Together these tools can provide insight into the mechanisms of complex diseases such as rheumatoid arthritis (RA) and SLE.

Improving management, visualization, and distribution of genomics data.

Genomics data are complex, noisy, and generated on an ever larger scale as the cost of sequencing decreases. Handling data of such size and complexity is impossible without efficient information-processing methods. Additionally, analyzing, visualizing, and communicating the results from these studies is of pivotal importance to research progress. My research interests are focused on the application and development of methods that improve management, visualization, and distribution of large-scale genomics data and convert unstructured data into actionable information for basic and clinical scientists.

Motif discovery in ChIP and ATAC-seq data.

Whole genome-scale assays such as ChIP-seq and ATAC-seq produce a large amount of information that is inherently noisy. These techniques produce genomic sequences that are associated with transcription factors, epigenetic marks, or open chromatin. The challenge in analyzing these types of data is inferring what they mean in terms of biological function. One method for making such biological inferences is motif analysis. This approach uses information about how protein complexes interact with DNA via specific DNA sequences called motifs. My work focuses on the application of machine learning, specifically natural language processing, to predict important motifs from ChIP and ATAC-seq datasets.
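
To make the idea of motif analysis concrete, here is a minimal sketch in Python. It is not the natural language processing approach described above. Instead it uses a much simpler stand-in, k-mer enrichment, which ranks short DNA words by how much more often they occur in peak sequences than in background sequences. The function names, the toy sequences, and the pseudocount value are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer made up only of A, C, G, T across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enriched_kmers(peak_seqs, background_seqs, k=6, pseudocount=1.0):
    """Rank k-mers by their frequency ratio in peaks versus background.

    Hypothetical helper: a pseudocount is added to every k-mer so that
    words absent from the background do not cause division by zero.
    """
    peak = count_kmers(peak_seqs, k)
    background = count_kmers(background_seqs, k)
    n_possible = 4 ** k
    peak_total = sum(peak.values()) + pseudocount * n_possible
    background_total = sum(background.values()) + pseudocount * n_possible

    scores = {}
    for kmer, count in peak.items():
        peak_freq = (count + pseudocount) / peak_total
        background_freq = (background.get(kmer, 0) + pseudocount) / background_total
        scores[kmer] = peak_freq / background_freq
    # Highest enrichment first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented): each "peak" contains the word GATAAG.
peaks = ["AAGATAAGTT", "CCGATAAGCC", "TTGATAAGGA"]
background = ["ACGTACGTAC", "TTTTCCCCGG", "GGCATGCATG"]
top_kmer, top_score = enriched_kmers(peaks, background, k=6)[0]
```

With this toy input, the shared word `GATAAG` ranks first because it is the only 6-mer that recurs across the peak sequences. Real ChIP-seq and ATAC-seq analyses typically rely on position weight matrices and statistical testing rather than raw frequency ratios.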



Bioinformatics Scientist, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery



Publications by

Selected Journal Articles

Google Scholar

Industry Relationships

One of the goals of HSS is to advance the science of orthopedic surgery, rheumatology, and related disciplines for the benefit of patients. Research staff at HSS may collaborate with outside companies for education, research and medical advances. HSS supports this collaboration in order to foster medical breakthroughs; however, HSS also believes that these collaborations must be disclosed.

As part of the disclosure process, this website lists Research staff collaborations with outside companies if the Research staff member received any payment during the prior year or expects to receive any payment in the next year. The disclosures are based on information provided by the Research staff and other sources and are updated regularly. Current ownership interests and leadership positions are also listed. Further information may be available on individual company websites.

As of May 11, 2024, Dr. Oliver reported no relationships with the healthcare industry.

By disclosing the collaborations of HSS Research staff with industry on this website, HSS and its Research staff make this information available to patients and the public, thus creating a transparent environment for those who are interested in this information. Further, the HSS Conflicts of Interest Policy does not permit payment of royalties on products developed by him/her that are used on patients at HSS.