Data analysis

I have experience with “-omic” and epidemiological data, including next-generation sequencing (whole exome, whole genome and RNA-sequencing), microarrays, mass spectrometry-based proteomic studies, genetic screens, and chemical-genetic studies. The analysis of these datasets has been directed towards different research and clinical end-points, including markers for risk, prognosis and patient benefit from therapy.


I offer services for

  • statistics for data quality control, normalization and preprocessing (next-generation sequencing, microarrays or other screens),
  • supervised class distinction-based analyses, e.g. comparisons of two or more groups (parametric or non-parametric tests, multiple testing corrections, etc.),
  • unsupervised class discovery analyses, e.g. clustering techniques, probabilistic approaches, and downstream cluster reliability indices,
  • patient subtyping approaches,
  • time-series analyses,
  • clinical trials, experimental or observational studies,
  • biomarker discovery and measurement of performance/applicability (e.g. accuracy, hazard ratios),
  • survival analyses,
  • machine learning for the development of multivariate classifiers (e.g. naive Bayes classifiers, Bayesian networks, support vector machines, etc.),
  • approaches based on expectation maximization, such as hidden Markov models,
  • pathway analyses (e.g. via GO, GSEA, globaltest or other approaches),
  • network and systems biology analyses,
  • development of data analysis pipelines (e.g. via Galaxy),
  • development of web applications for interactive exploration of results,
  • data visualization, and
  • production of study reports as well as statistical summaries.

This list is not exhaustive. In general, I can provide support for most bioinformatics tools and databases, or help you understand specific tools of interest.