References:
[1] J. Zhao, "Multivariate Statistical Analysis of Protein Variation", Ph.D.
dissertation, available at
http://www.lib.ncsu.edu/theses/available/etd-12092005-003538/unrestricted/etd.pdf
[2] A. Murzin, S. Brenner, T. Hubbard, and C. Chothia, "SCOP: A
Structural Classification of Proteins Database for the Investigation of
Sequences and Structures," Journal of Molecular Biology, vol. 247, no. 4,
pp. 536-540, 1995.
[3] C. Orengo, A. Michie, S. Jones, D. Jones, M. Swindells, and J.
Thornton, "CATH - A Hierarchic Classification of Protein Domain
Structures," Structure, vol. 5, no. 8, pp. 1093-1108, 1997.
[4] A. Bateman, L. Coin, R. Durbin, R. Finn, V. Hollich, S. Griffiths-Jones,
A. Khanna, M. Marshall, S. Moxon, E. Sonnhammer, D. Studholme, C.
Yeats, and S. Eddy, "The Pfam Protein Families Database," Nucleic Acids
Res., vol. 32, Database Issue, pp. D138-D141, 2004.
[5] O. Camoglu, T. Can, A. Singh, and Y. Wang, "Decision Tree Based
Information Integration for Automated Protein Classification," Journal of
Bioinformatics and Computational Biology (JBCB), vol. 3, no. 3,
pp. 717-742, 2005.
[6] O. André, F. Daniel, and F. António, "Peptide programs: applying fragment
programs to protein classification", Proceedings of the 2nd International
Workshop on Data and Text Mining in Bioinformatics, pp. 37-44, 2008.
[7] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W.
Miller, and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic Acids Res.,
vol. 25, no. 17, pp. 3389-3402, 1997.
[8] W. Tian and J. Skolnick, "How well is enzyme function conserved as a
function of pairwise sequence identity?", Journal of Molecular Biology,
vol. 333, no. 4, pp. 863-882, 2003.
[9] D. Devos and A. Valencia, "Intrinsic errors in genome annotation",
Trends in Genetics, vol. 17, no. 8, pp. 429-431, 2001.
[10] E. N. Baker, V. L. Arcus, and J. S. Lott, "Protein structure prediction
and analysis as a tool for functional genomics", Appl. Bioinformatics,
vol. 2, no. 3, pp. 3-10, 2003.
[11] M. Grotthuss, D. Plewczynski, K. Ginalski, L. Rychlewski, and E. I.
Shakhnovich, "PDB-UF: database of predicted enzymatic functions for
unannotated protein structures from structural genomics", BMC
Bioinformatics, vol. 7, no. 1, pp. 53-56, 2006.
[12] J. C. Whisstock, and A. M. Lesk, "Prediction of protein function from
protein sequence and structure", Q Rev Biophys., vol. 36, no. 3, pp. 307-
340, 2003.
[13] I. Friedberg, "Automated protein function prediction - the genomic
challenge", Briefings in Bioinformatics, vol. 7, no. 3, pp. 225-242, 2006.
[14] I. Melvin, E. Ie, J. Weston, W. S. Noble, and C. Leslie, "Multi-class
protein classification using adaptive codes", J. Mach. Learn. Res., vol. 8,
pp. 1557-1581, 2007.
[15] L. Y. Han, C. Z. Cai, Z. L. Ji, Z. W. Cao, J. Cui, and Y. Z. Chen,
"Predicting functional family of novel enzymes irrespective of sequence
similarity: a statistical learning approach", Nucleic Acids Res., vol. 32,
no. 21, pp. 6437-6444, 2004.
[16] R. E. Langlois, M. B. Carson, N. Bhardwaj, and H. Lu, "Learning to
translate sequence and structure to function: Identifying DNA binding
and membrane binding proteins", Annals of Biomedical Engineering,
vol. 35, no. 6, pp. 1043-1052, 2007.
[17] Z. R. Yang, and R. Hamer, "Bio-basis function neural networks in
protein data mining", Current Pharmaceutical Design, vol. 13, no. 14,
pp. 1403-1413, 2007.
[18] J. Busch, P. Ferrari, A. Flesia, S. P. Grynberg, and F. Leonardi, "Testing
statistical hypothesis on random trees and applications to the protein
classification problem", Annals of Applied Statistics, vol. 3, no. 2,
pp. 542-563, 2009.
[19] M. Q. Yang, J. Y. Yang, and O. K. Ersoy, "Classification of proteins
multiple-labelled and single-labelled with protein functional classes",
Int. J. Gen. Syst., vol. 36, no. 1, pp. 91-109, 2007.
[20] C. Pasquier, V. Promponas, and S. J. Hamodrakas, "PRED-CLASS:
Cascading Neural networks for generalized protein classification and
genome wide applications", Proteins: Structure, Function,
and Genetics, vol. 44, no. 1, pp. 361-369, 2001.
[21] B. J. Webb-Robertson, C. Oehmen, and M. Matzke, "SVM-BALSA:
Remote homology detection based on Bayesian sequence alignment",
Computational Biology and Chemistry, vol. 29, no. 6, pp. 440-443, 2005.
[22] Z. D. Zhang, S. Kochhar, and M. G. Grigorov, "Descriptor-based
protein remote homology identification", Protein Science, vol. 14, no. 2,
pp. 431-444, 2005.
[23] N. Bhardwaj, R. E. Langlois, G. J. Zhao, and H. Lu, "Kernel-based
machine learning protocol for predicting DNA binding proteins",
Nucleic Acids Res., vol. 33, no. 20, pp. 6486-6493, 2005.
[24] P. D. Dobson, and A. J. Doig, "Predicting enzyme class from protein
structure without alignments", Journal of Molecular Biology, vol. 345,
no. 1, pp. 187-199, 2005.
[25] Y. D. Cai, and A. J. Doig, "Prediction of Saccharomyces cerevisiae
protein functional class from functional domain composition",
Bioinformatics, vol. 20, no. 8, pp. 1292-1300, 2004.
[26] Q. W. Dong, X. L. Wang, and L. Lin, "Application of latent semantic
analysis to protein remote homology detection", Bioinformatics, vol. 22,
no. 3, pp. 285-290, 2005.
[27] R. Kuang, E. Ie, K. Wang, K. Wang, M. Siddiqi, Y. Freund, and C.
Leslie, "Profile-based string kernels for remote homology detection and
motif extraction", Journal of Bioinformatics and Computational
Biology, vol. 3, no. 3, pp. 527-550, 2005.
[28] H. Rangwala and G. Karypis, "Profile-based direct kernels for remote
homology detection and fold recognition", Bioinformatics, vol. 21, no. 23,
pp. 4239-4247, 2005.
[29] L. Nanni, S. Mazzara, L. Pattini, and A. Lumini, "Protein classification
combining surface analysis and primary structure", Protein Engineering:
Design and Selection, vol. 22, no. 4, pp. 267-272, 2009.
[30] D. Eisenberg, R. Weiss, and T. Terwilliger, "The Helical Hydrophobic
Moment: A Measure of the Amphiphilicity of a Helix", Nature, vol. 299,
pp. 371-374, 1982.
[31] D. Eisenberg, E. Schwarz, M. Komaromy, and R. Wall, "Analysis of
Membrane and Surface Protein Sequences with the Hydrophobic
Moment Plot", Journal of Molecular Biology, vol. 179, no. 1, pp. 125-142,
1984.
[32] L. Pattini, L. Riva, and S. Cerutti, "A wavelet based method to predict
the alpha helix content in the secondary structure of globular proteins",
Proceedings of the IEEE-EMBS, pp. 132-133, 2002.
[33] A. Shepherd, G. Gorse, and J. Thornton, "A novel approach to the
recognition of protein architecture from sequence using Fourier analysis
and neural networks", Proteins, vol. 50, no. 2, pp. 290-302, 2003.
[34] A. Andreeva, D. Howorth, J.-M. Chandonia, S. E. Brenner, et al., "Data
growth and its impact on the SCOP database: new developments", Nucleic
Acids Res., vol. 36, Database Issue, pp. D419-D425, 2008.
[35] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H.
Weissig, I.N. Shindyalov, and P.E. Bourne, "The Protein Data Bank",
Nucleic Acids Res., vol. 28, no. 1, pp. 235-242, 2000.
[36] L. Lo Conte, S.E. Brenner, T.J.P. Hubbard, C. Chothia, and A.G.
Murzin, "SCOP database in 2002: refinements accommodate structural
genomics", Nucleic Acids Res., vol. 30, no. 1, pp. 264-267, 2002.
[37] J. M. Chandonia, G. Hon, N.S. Walker, L. Lo Conte, P. Koehl, M.
Levitt, and S.E. Brenner, "The ASTRAL compendium in 2004",
Nucleic Acids Res., vol. 32, no. 1, pp. 189-192, 2004.
[38] D. Wilson, M. Madera, C. Vogel, C. Chothia, and J. Gough, "The
SUPERFAMILY database in 2007: families and functions", Nucleic Acids
Res., vol. 35, Database Issue, pp. 308-313, 2007.