Bioinformatics is the science of managing, mining, and interpreting information from biological data. Various genome projects have contributed to an exponential growth in DNA and protein sequence databases. Rapid advances in high-throughput technologies, such as microarrays, mass spectrometry and new/next-generation sequencing, make it possible to quantitatively monitor the presence or activity of thousands of genes, RNAs, proteins, metabolites, and compounds in a given biological state. The ongoing influx of these data, the pressing need to address complex biomedical challenges, and the gap between the two have collectively created exciting opportunities for data mining researchers.

While tremendous progress has been made over the years, many of the fundamental problems in bioinformatics, such as protein structure prediction, gene-environment interaction, and regulatory network mapping, have not been convincingly addressed. Besides these, new technologies such as next-generation sequencing are now producing massive amounts of sequence data; managing, mining and compressing these data raise challenging issues. Finally, there is a pressing need to use these data coupled with efficient and effective computational techniques to build models of complex biological processes and disease phenotypes. Data mining will play an essential role in addressing these fundamental problems and in the development of novel therapeutic/diagnostic/prognostic solutions in the post-genomics era of medicine.

Workshop History (2001-present)

Data Mining approaches seem ideally suited for Bioinformatics, since the field is data-rich but lacks a comprehensive theory of life's organization at the molecular level. The extensive databases of biological information create both challenges and opportunities for developing novel KDD methods. To highlight these avenues, we organized the Workshops on Data Mining in Bioinformatics (BIOKDD 2001-2016), held annually in conjunction with the ACM SIGKDD Conference. This will be the 16th year for the workshop. Past workshops attracted 50-100 participants from academia, industry and government labs, underscoring the surge of interest in this exciting and rapidly expanding field. The program of each workshop included 10-11 contributed papers and 1-2 invited talks.

Information on past workshops is available at:  Past Workshops Page

General Call for Papers

The goal of this workshop is to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers. This year, the workshop will feature the theme of “Multiscale and Multimodal Analysis for Computational Biology”. This theme focuses on the use of data mining and machine learning approaches to analyze the large amount of heterogeneous, complex biological and medical data being generated. Submissions on deep learning methods are particularly encouraged. The goal here is to build accurate predictive or descriptive models from these data, enabling novel discoveries in basic biology and medicine.

We encourage papers that propose novel data mining techniques for areas including but not limited to:

Development of deep learning methods for biological and clinical data.

Building predictive models for complex phenotypes from large-scale biological data.

Discovering biological networks and pathways underlying biological processes and diseases.

Processing of new/next-generation sequencing (NGS) data for genome structural variation.

Analysis, discovery of biomarkers and mutations, and disease risk assessment.

Discovery of genotype-phenotype associations.

Novel methods and frameworks for mining and integrating big biological data.

Comparative genomics.

Metagenome analysis using sequencing data.

RNA-seq and microarray-based gene expression analysis.

Genome-wide analysis of non-coding RNAs.

Genome-wide regulatory motif discovery.

Structural bioinformatics.

Correlating NGS data with proteomics data.

Functional annotation of genes and proteins.


Special biological data management techniques.

Information visualization techniques for biological data.

Semantic web and ontology-driven data integration methods.

Privacy and security issues in mining genomic databases.

Papers should be at most 9 pages long, single-spaced, in font size 10 or larger with one-inch margins on all sides. For examples of camera-ready formatting, refer to papers in previous BIOKDD proceedings.

Submission of accepted papers: Each accepted workshop paper must be submitted in camera-ready form, formatted strictly according to the official ACM Proceedings Format. Please submit a PDF file only. To prepare the camera-ready PDF file, you may use either the Microsoft Word template or the LaTeX file preparation instructions found at the ACM website.

Submission URL:


May 21st, 2017: Deadline for submission of papers

Jun 16th, 2017: Notification of acceptance

Aug 14th, 2017: Workshop date