1 Motivation

The aim is to investigate whether DeepSEA captures features related to known TFs in fetal brain. We focus on the first convolutional layer because it can be interpreted as a motif detection layer.

2 Strategies

Our goal can be formalized as follows: for a set of input sequences, which k-mer patterns contribute most to a particular prediction task? Three strategies for answering this are listed below:

  1. For each sequence, perform in silico mutagenesis at each site and extract the k-mer (or something looser, such as a (k+m)-mer) surrounding the peak. Aggregate all collected k-mers and extract the common pattern from them.
  2. For each sequence, calculate the derivative of the output w.r.t. each motif neuron, collapsing the spatial information appropriately. Aggregate the motif neuron contribution patterns across sequences to find the neuron or combination of neurons that contributes most.
  3. For each sequence, calculate the activation of each motif neuron (again collapsing the spatial information appropriately). Aggregate the motif activation patterns across sequences to find the neuron or combination of neurons that is activated most.
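
Strategies 1 and 3 can be sketched with plain NumPy as below. This is a minimal illustration, not the code used for the analyses: the filter array layout `(n_filters, k, 4)`, the max over positions as the way to collapse spatial information, and every function name are assumptions made for the example, and `score_fn` stands in for whatever model output is being explained. Strategy 2 needs gradients from the trained model and is not shown.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases stay all-zero."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def motif_activations(x, filters):
    """Strategy 3: ReLU activation of each filter at each position.

    filters has the assumed shape (n_filters, k, 4); the result has shape
    (length - k + 1, n_filters).
    """
    k = filters.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    scores = np.einsum("pkc,fkc->pf", windows, filters)
    return np.maximum(scores, 0.0)

def collapse_spatial(act):
    """Collapse the position axis by keeping the maximum activation per filter."""
    return act.max(axis=0)

def in_silico_mutagenesis(seq, score_fn):
    """Strategy 1: score(mutant) - score(reference) for every single-base change.

    score_fn is a hypothetical callable mapping a one-hot array to a scalar.
    """
    x = one_hot(seq)
    ref = score_fn(x)
    effects = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j in range(4):
            if x[i, j] == 1.0:
                continue  # reference base: effect stays zero
            mutant = x.copy()
            mutant[i] = 0.0
            mutant[i, j] = 1.0
            effects[i, j] = score_fn(mutant) - ref
    return effects

def kmer_around_peak(seq, effects, k):
    """Extract the k-mer around the position with the largest absolute effect."""
    peak = int(np.abs(effects).max(axis=1).argmax())
    start = max(0, min(peak - k // 2, len(seq) - k))
    return seq[start:start + k]
```

Aggregation across sequences would then amount to stacking the `collapse_spatial` vectors (one per sequence) and ranking filters, or collecting the `kmer_around_peak` outputs for a pattern search.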

3 Analysis

  1. Sequence selection:
    • We would like to focus on sequences that are correctly predicted by our model, namely low-score negative instances and high-score positive instances.
    • GC content and score distribution: matched_gc
  2. Motif visualization: see here, PWM visualization
  3. Mutagenesis: TODO
  4. Sliding window: newseq.E081; newseq.Noonan; motif analysis for E081 and Noonan; motif analysis for E081, Noonan, hNSC-50; motif analysis for E081, positive critical windows vs. randomly selected signal regions
  5. Motif gradient: E081, E082
  6. Motif activation: see analysis on E081, E082, Noonan, newseq.E081, newseq.Noonan, matched_gc.E081, matched_gc.Noonan
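
The sequence selection step (item 1) can be illustrated with the following sketch. It keeps confidently and correctly predicted sequences, then subsamples positives and negatives so that both sets have the same GC-content histogram. The score thresholds, the bin count, and all function names are hypothetical choices for the example; matching the score distribution as well is not covered here.

```python
import numpy as np

def gc_content(seq):
    """Fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / max(len(seq), 1)

def select_confident(seqs, scores, labels, pos_min=0.9, neg_max=0.1):
    """Keep high-score positives and low-score negatives.

    pos_min and neg_max are assumed thresholds, not values from the analysis.
    """
    pos = [s for s, p, y in zip(seqs, scores, labels) if y == 1 and p >= pos_min]
    neg = [s for s, p, y in zip(seqs, scores, labels) if y == 0 and p <= neg_max]
    return pos, neg

def match_gc(pos, neg, n_bins=10, seed=0):
    """Subsample both sets so each GC bin holds equally many positives and negatives."""
    rng = np.random.default_rng(seed)

    def bin_of(seq):
        return min(int(gc_content(seq) * n_bins), n_bins - 1)

    matched_pos, matched_neg = [], []
    for b in range(n_bins):
        pos_b = [s for s in pos if bin_of(s) == b]
        neg_b = [s for s in neg if bin_of(s) == b]
        m = min(len(pos_b), len(neg_b))
        if m == 0:
            continue  # no counterpart in this GC bin, so drop the bin
        matched_pos += [pos_b[i] for i in rng.choice(len(pos_b), size=m, replace=False)]
        matched_neg += [neg_b[i] for i in rng.choice(len(neg_b), size=m, replace=False)]
    return matched_pos, matched_neg
```

Dropping bins without a counterpart keeps the two histograms identical at the cost of discarding some sequences; a finer or coarser `n_bins` trades match quality against sample size.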

4 Aligning Extracted Motifs

Database: db/CIS-BP/Homo_sapiens.meme (see here).

  1. 5/4/17: top 10 motifs in motif activation of E081, link
  2. 5/5/17: motifs selected by combining motif activation and motif gradient results, link
  3. 5/26/17: motifs manually selected from the motif activation patterns in new sequences, newseq.E081 and newseq.Noonan
  4. 5/31/17: top 20 motifs in motif activation analysis using GC-matched sequences, newseq.matchgc.E081, newseq.matchgc.Noonan
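
Because the reference database is a MEME-format file, extracted motifs have to be written in the same format before they can be compared against it. The sketch below shows one way this could be done, assuming each extracted motif is available as a set of aligned, equal-length k-mers; the pseudocount, the uniform background, the `nsites` value, and the function names are illustrative assumptions rather than the settings used in the runs listed above.

```python
import numpy as np

def kmers_to_pfm(kmers, pseudocount=1.0):
    """Turn aligned, equal-length k-mers into a (width, 4) probability matrix."""
    index = {base: i for i, base in enumerate("ACGT")}
    width = len(kmers[0])
    counts = np.full((width, 4), float(pseudocount))
    for kmer in kmers:
        for pos, base in enumerate(kmer.upper()):
            if base in index:
                counts[pos, index[base]] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def to_meme(motifs, nsites=20):
    """Serialize {name: pfm} as MEME minimal motif format text.

    A uniform background is assumed; nsites is a placeholder value.
    """
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
    ]
    for name, pfm in motifs.items():
        lines.append(f"MOTIF {name}")
        lines.append(
            f"letter-probability matrix: alength= 4 w= {len(pfm)} nsites= {nsites} E= 0"
        )
        for row in pfm:
            lines.append(" " + " ".join(f"{p:.6f}" for p in row))
        lines.append("")
    return "\n".join(lines) + "\n"
```

For example, `to_meme({"filter_0": kmers_to_pfm(["ACGT", "ACGA"])})` returns text that can be saved to a file and passed to a motif comparison tool alongside the database.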