José Santos, Pedro Barahona, Ludwig Krippahl:
Mining Protein Structure Data.

Complete Text [.pdf, 85KB]
In: Proceedings of 13th Portuguese Conference on Artificial Intelligence (EPIA 2007), Guimarães, Portugal (3rd - 7th December 2007), 527-540, December 2007
© Universidade do Minho

This paper describes the application of machine learning algorithms to the discovery of knowledge in a protein structure database. The problem addressed is determining the solvent exposure of each amino acid residue, using different levels of exposed surface to define exposure. First, we introduce the baseline classifier, which achieves good prediction results despite taking only the amino acid type into account. Then we explain how we gathered and processed the data and built our classifier to improve on the baseline prediction. Finally, we test and compare several classifiers (e.g. Neural Networks, C5.0, CART and CHAID) and parameters (level of information per amino acid, SCOP class of the protein, sliding window around the current amino acid) that might influence prediction accuracy. We conclude by showing that our models achieve a modest but statistically significant improvement in accuracy over the baseline classifier.
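
The two ideas named in the abstract, a baseline that looks only at the amino acid type and a sliding window around the current residue, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two-class labels ("buried" / "exposed"), the padding character and the toy data are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def fit_baseline(residues, labels):
    """Learn the most frequent exposure label for each amino acid type."""
    counts = defaultdict(Counter)
    for aa, label in zip(residues, labels):
        counts[aa][label] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict_baseline(model, residues, default="buried"):
    """Predict exposure from the residue type alone (default is an assumption)."""
    return [model.get(aa, default) for aa in residues]


def sliding_windows(sequence, half_width, pad="-"):
    """For each position, return the residue plus half_width neighbours per side."""
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy training data, invented for this example only.
train_seq = "LKLKAKLL"
train_lab = ["buried", "exposed", "buried", "exposed",
             "buried", "exposed", "buried", "exposed"]

model = fit_baseline(train_seq, train_lab)
predictions = predict_baseline(model, "KLA")
windows = sliding_windows("ACD", half_width=1)
```

In a setup like this, the window strings would be encoded as features for a richer classifier, whose accuracy is then compared against the per-residue majority vote.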



	author = {Jos\'{e} Santos and Pedro Barahona and Ludwig Krippahl},
	title = {Mining Protein Structure Data},
	booktitle = {Proceedings of 13th Portuguese Conference on Artificial Intelligence, Guimar\~{a}es, Portugal (3rd--7th December 2007)},
	year = {2007},
	pages = {527--540},
	url = {}