In conjunction with ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD '15)

BIOKDD'15 Workshop




Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Rapid advances in high-throughput technologies, such as microarrays, mass spectrometry and new/next-generation sequencing, make it possible to quantitatively monitor the presence or activity of thousands of genes, RNAs, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the pressing need to address complex biomedical challenges, and the gap between the two have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory network mapping, have not been convincingly addressed. Besides these, new technologies such as next-generation sequencing are now producing massive amounts of sequence data; managing, mining and compressing these data raise challenging issues. Finally, there is a pressing need to use these data coupled with efficient and effective computational techniques to build models of complex biological processes and disease phenotypes. Data mining will play an essential role in addressing these fundamental problems and in the development of novel therapeutic/diagnostic/prognostic solutions in the post-genomics era of medicine.

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. This year, the workshop will feature the theme of “Knowledge Discovery on Complex Biological and Medical Data”. This field focuses on the use of data mining and machine learning approaches for the analysis of the large amount of heterogeneous complex biological and medical data being generated. The goal here is to build accurate predictive or descriptive models from these data enabling novel discoveries in basic biology and medicine.

We encourage papers that propose novel data mining techniques for areas including, but not limited to:

  • Building predictive models for complex phenotypes from large-scale biological data
  • Discovering biological networks and pathways underlying biological processes and diseases
  • Processing of new/next-generation sequencing (NGS) data for genome structural variation analysis, discovery of biomarkers and mutations, and disease risk assessment
  • Discovery of genotype-phenotype associations
  • Novel methods and frameworks for mining and integrating big biological data
  • Comparative genomics
  • Metagenome analysis using sequencing data
  • RNA-seq and microarray-based gene expression analysis
  • Genome-wide analysis of non-coding RNAs
  • Genome-wide regulatory motif discovery
  • Structural bioinformatics
  • Correlating NGS with proteomics data analysis
  • Functional annotation of genes and proteins
  • Cheminformatics
  • Special biological data management techniques
  • Information visualization techniques for biological data
  • Semantic web and ontology-driven data integration methods
  • Privacy and security issues in mining genomic databases


Invited Speakers

  • Prof. Joydeep Ghosh, University of Texas at Austin, USA
  • Prof. Sean O'Donoghue, Garvan Institute of Medical Research, Australia
  • Prof. Wei Wang, University of California - Los Angeles, USA
  • Prof. Eric Xing, Carnegie Mellon University, USA
  • Prof. Yaoqi Zhou, Griffith University, Australia

Program Overview

  • 7:30-8:00am, Arrival Coffee, Level 2 & Level 4 Pre-Function Areas

  • 8:00-9:40am, Session I

    • 8:00-8:10am, Introduction

    • 8:10-8:55am, Invited Talk I, Visual Data Mining in Bioinformatics
      Prof. Sean O'Donoghue, Garvan Institute of Medical Research, Australia

    • 8:55-9:40am, Invited Talk II, Data Driven Approaches to High-throughput Phenotype Generation from Heterogeneous Health Records
      Prof. Joydeep Ghosh, University of Texas at Austin, USA

  • 9:40-10:30am, Session II

    • 9:40-10:05am, Selected Talk 1: A fast PC algorithm for high dimensional causal discovery for multi-core PCs
      T. Duy Le, T. Hoang, J. Li, Lin Liu and H. Liu

    • 10:05-10:30am, Selected Talk 2: Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing
      F. Llinares-López, M. Sugiyama, L. Papaxanthos and K. M. Borgwardt

  • 10:30-11:00am, Morning Break, Level 2 & Level 4 Pre-Function Areas

  • 11:00am-12:30pm, Session III

    • 11:00-11:45am, Invited Talk III, Personalized Pan-Omic Association Analysis under Complex Structures
      Prof. Eric Xing, Carnegie Mellon University, USA

    • 11:45am-12:30pm, Invited Talk IV, Algorithm acceleration for high throughput biology
      Prof. Wei Wang, University of California - Los Angeles, USA

  • 12:30-1:30pm, Working Lunch, Level 2 & Level 4 Pre-Function Areas

  • 1:30-3:05pm, Session IV

    • 1:30-2:15pm, Invited Talk V, Detecting disease-causing genetic variations due to nonsense mutations and micro-insertions and deletions
      Prof. Yaoqi Zhou, Griffith University, Australia

    • 2:15-2:40pm, Selected Talk 3: Transfer String Kernel for Cross-Context Transcription Factor Binding Prediction
      R. Singh, G. Robins and Y. Qi

    • 2:40-3:05pm, Selected Talk 4: Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning
      S. Mirzaei, T. Sidi, C. Keasar and S. Crivelli

  • 3:05-3:30pm, Afternoon Break, Level 2 & Level 4 Pre-Function Areas

  • 3:30-5:10pm, Session V

    • 3:30-3:55pm, Selected Talk 5: Discovery of Significantly Enriched Subgraphs Associated with Selected Vertices in a Single Graph
      P. Meysman, Y. Saeys, E. Sabaghian, W. Bittremieux, Y. Van de Peer, B. Goethals and K. Laukens

    • 3:55-4:20pm, Selected Talk 6: EigenBiomarker: A Method for Composite Biomarker Detection with Applications in Type 1 Diabetes (T1D)
      M. Lu, S. Huang, J. Odegard, C. Speake, J. Huang and X. Qian

    • 4:20-4:45pm, Selected Talk 7: Multi-site Meta-analysis of Morphometry
      J. Neda et al.

    • 4:45-5:10pm, Selected Talk 8: Learning Representative Features from EMG Data via Deep Non-Negative Tensor Factorization
      P. Yang and J. He

Important Dates

June 15th, 2015: Deadline for Submission of Papers
June 30th, 2015: Notification of Acceptance; Workshop Registration Open
July 10th, 2015: Submission of Camera-ready Papers
August 10th, 2015: Workshop Presentation

All deadlines are at 11:59 PM Pacific Standard Time.


Papers should be at most 10 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. Using the ACM Proceedings Format is highly recommended. Papers should be submitted in PDF format through EasyChair at the following link:

Accepted papers will be published on the workshop webpage. Authors of a selection of accepted papers will also be invited to submit to a special section of IEEE Transactions on Computational Biology and Bioinformatics.


To be available.

Workshop Organizers

Program Chairs

Sara C. Madeira
Department of Computer Science and Engineering
Instituto Superior Técnico, Universidade de Lisboa
Av. Rovisco Pais, 1
1049-001 Lisbon, Portugal

Web site:

Jieping Ye
Department of Computational Medicine and Bioinformatics
Department of Electrical Engineering and Computer Science
University of Michigan
2035C Palmer Commons Bldg., 100 Washtenaw Avenue
Ann Arbor, MI 48109-2218

Web site:

General Chairs

Mohammed J. Zaki, Ph.D.
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY 12180-3590

Web site:


Jake Y. Chen, Ph.D.
Indiana University School of Informatics
Indiana University - Purdue University Indianapolis
535 W. Michigan St, #493
Indianapolis, IN 46202

Web site:

Program Committee

Francisco Azuaje, Luxembourg Institute of Health, Luxembourg
Asa Ben-Hur, Colorado State University, USA
Rui Camacho, Universidade do Porto, Portugal
João Carriço, Universidade de Lisboa, Portugal
Rita Casadio, University of Bologna, Italy
Rui Chang, Mount Sinai School of Medicine, USA
Luis Pedro Coelho, European Molecular Biology Laboratory, Germany
Tijl De Bie, University of Bristol, UK
Minghua Deng, Peking University, China
Joaquin Dopazo, Centro de Investigación Príncipe Felipe, Spain
Gang Fang, Mount Sinai School of Medicine, USA
Piero Fariselli, University of Bologna, Italy
Florentino Fdez-Riverola, University of Vigo, Spain
Olivier Gevaert, Stanford University, USA
Jun Huan, University of Kansas, USA
Shuiwang Ji, Old Dominion University, USA
Rui Kuang, University of Minnesota, USA
Kris Laukens, University of Antwerp, Belgium
Alexandra M. Carvalho, Instituto Superior Técnico, Portugal
Pier Luigi Martelli, University of Bologna, Italy
Yves Moreau, Katholieke Universiteit Leuven, Belgium
T. M. Murali, Virginia Tech, USA
Xia Ning, Indiana University - Purdue University Indianapolis, USA
José Luís Oliveira, Universidade de Aveiro, Portugal
Alexandre P. Francisco, Universidade de Lisboa, Portugal
Joana P. Gonçalves, Delft University of Technology & Netherlands Cancer Institute, Netherlands
Predrag Radivojac, Indiana University, USA
Naren Ramakrishnan, Virginia Tech, USA
Huzefa Rangwala, George Mason University, USA
Chandan K. Reddy, Wayne State University, USA
Miguel Rocha, Universidade do Minho, Portugal
Saeed Salem, North Dakota State University, USA
Min Song, Yonsei University, South Korea
Pedro T. Monteiro, INESC-ID - Lisboa, Portugal
Andrea Tagarelli, University of Calabria, Italy
Raf Van de Plas, Delft University of Technology, Netherlands & Vanderbilt University, USA
Susana Vinga, IDMEC - Instituto de Engenharia Mecânica, Portugal
Jinbo Xu, Toyota Technological Institute at Chicago, USA
Jie Zheng, Nanyang Technological University, Singapore

Workshop History

Information on past workshops is available at:

Data Mining

For more information on data mining, see SIGKDD and KDnuggets.