1 Gkm-SVM

A baseline method is needed for the performance comparison. In the DeepSEA paper, gkm-SVM is chosen as the baseline classifier. There are several issues with making such a comparison:

  • gkm-SVM does not support joint (multi-task) learning, so an independent gkm-SVM has to be trained for each annotation.
  • gkm-SVM can only handle roughly 20,000 training sequences, so the training set has to be subsampled substantially.
  • gkm-SVM is reported to work best with a 300-bp window, and the DeepSEA paper confirms this behavior, so a shorter window size may be needed for gkm-SVM training (see the sketch at the end of this section).

Therefore, the comparison cannot be made under rigorously identical conditions, and it is unclear how to make a fair comparison in this case.
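
To illustrate the last two points, a minimal sketch of the kind of preprocessing gkm-SVM would require: subsample the training set to about 20,000 sequences and trim each one to a 300-bp window centered on the original bin. The function name, the random subsampling, and the centering choice are assumptions for illustration, not the exact preprocessing used here.

```python
import numpy as np

def prepare_gkmsvm_set(sequences, labels, n_max=20000, window=300, seed=0):
    """Subsample sequences and trim them to a centered window for gkm-SVM.

    sequences : list of equal-length DNA strings (e.g. 1000-bp DeepSEA bins)
    labels    : binary labels for a single annotation (gkm-SVM is trained
                one annotation at a time)
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(sequences))[:n_max]     # random subsample
    trimmed, kept_labels = [], []
    for i in idx:
        seq = sequences[i]
        start = (len(seq) - window) // 2              # center the 300-bp window
        trimmed.append(seq[start:start + window])
        kept_labels.append(labels[i])
    return trimmed, kept_labels
```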

2 Alternative choice

Instead of using gkm-SVM, another option is to take the JASPAR CORE vertebrates motifs (link) and use features generated from the motif scores (519 motifs in total).
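
One way to turn each sequence into a 519-dimensional feature vector is to scan it against every JASPAR position weight matrix and keep the best log-odds score per motif. The sketch below uses Biopython for the scanning; the file name, the pseudocount value, and the max-pooling over positions and strands are assumptions, not necessarily how the features here were built.

```python
import numpy as np
from Bio import motifs
from Bio.Seq import Seq

# Load the JASPAR CORE vertebrates matrices (file name is an assumption).
with open("JASPAR_CORE_vertebrates.jaspar") as handle:
    jaspar_motifs = list(motifs.parse(handle, "jaspar"))

# Precompute log-odds matrices for both strands; pseudocounts avoid -inf
# scores at positions with zero counts.
pssms = []
for m in jaspar_motifs:
    m.pseudocounts = 0.5
    rc = m.reverse_complement()
    rc.pseudocounts = 0.5
    pssms.append((m.pssm, rc.pssm))

def motif_features(seq_str):
    """One feature per motif: the best log-odds hit on either strand."""
    seq = Seq(seq_str.upper())
    feats = np.empty(len(pssms))
    for j, (fwd, rev) in enumerate(pssms):
        scores = np.concatenate([fwd.calculate(seq), rev.calculate(seq)])
        feats[j] = np.nanmax(scores)   # max-pool over positions and strands
    return feats
```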

3 Implementation

The classifier that uses motif scores as features is implemented here, and it is trained on the DeepSEA training and validation sets (four labels are trained: E081, E082, E129, and Noonan). The performance of the best model (in terms of validation loss) is evaluated on the DeepSEA test sequences and on sequences extracted directly from the bed files, link.
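
The actual implementation is at the link above; the sketch below only illustrates one plausible setup, assuming a small multi-label network over the 519 motif-score features, trained with binary cross-entropy, with the checkpoint chosen by validation loss. The layer sizes, optimizer settings, and hyperparameters are assumptions.

```python
import copy
import torch
import torch.nn as nn

N_MOTIFS, N_LABELS = 519, 4   # motif-score features; labels: E081, E082, E129, Noonan

class MotifClassifier(nn.Module):
    """Small MLP over motif-score features, one logit per label."""
    def __init__(self, n_in=N_MOTIFS, n_hidden=128, n_out=N_LABELS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_out),   # raw logits
        )

    def forward(self, x):
        return self.net(x)

def train(model, train_loader, valid_loader, n_epochs=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()          # independent binary labels
    best_loss, best_state = float("inf"), None
    for epoch in range(n_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # Keep the checkpoint with the lowest validation loss.
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in valid_loader)
        if val < best_loss:
            best_loss, best_state = val, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```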

4 Update

To avoid overfitting, an SVM head is implemented as well. Here, to obtain a probabilistic interpretation, the sigmoid function is applied on top of the output margins at test time.
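
A minimal sketch of that idea, assuming the same PyTorch setup as above: the head is a linear layer trained with a multi-label hinge loss (0/1 labels mapped to ±1), and at test time the raw margins are passed through a sigmoid to produce pseudo-probabilities. The regularization weight and the use of a plain sigmoid rather than a fitted calibration (e.g. Platt scaling) are assumptions.

```python
import torch
import torch.nn as nn

class SVMHead(nn.Module):
    """Linear SVM-style head: hinge loss at training time, sigmoid at test time."""
    def __init__(self, n_in=519, n_out=4, c=1e-4):
        super().__init__()
        self.linear = nn.Linear(n_in, n_out)   # one margin per label
        self.c = c                             # L2 regularization weight

    def hinge_loss(self, x, y01):
        """Multi-label hinge loss; y01 holds 0/1 labels."""
        y = 2.0 * y01 - 1.0                    # map {0, 1} -> {-1, +1}
        margins = self.linear(x)
        hinge = torch.clamp(1.0 - y * margins, min=0.0).mean()
        l2 = self.c * self.linear.weight.pow(2).sum()
        return hinge + l2

    def predict_proba(self, x):
        # At test time, squash the raw margins with a sigmoid to get a
        # probabilistic interpretation of the SVM outputs.
        with torch.no_grad():
            return torch.sigmoid(self.linear(x))
```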