Differences

This shows you the differences between two versions of the page.

--- oxfordfeb162012:start [2015/11/09 15:36] – magiero
+++ oxfordfeb162012:start [2015/11/09 19:59] (current) – magiero
@@ Line 238: / Line 238: @@
   - All strands contain sequence at beginning and end common to all strands and not part of the larger genome (these are artificially attached on?!?!?!?)
   - Library feature vectors constructed for mean current (5-mer model)
+• [0329] Candidate molecules are reduced to feature vectors (by measurement).\\
+• [0330] Candidates are compared against the library using an **alignment algorithm**.  The output score from the alignment is used to measure similarity to each library member.\\
+• [0331] The comparison by alignment score used a **gap penalty** of -1 and a scoring function of reciprocal absolute difference (i.e. closer matches score higher).  See the match to library component 13.\\
+{{alignscore.png?500}}
+• [0333] All test molecules used in this experiment were molecule 13.  Occasionally (< 10% of the time by my guess) these are incorrectly identified as molecule 12 from the library.\\
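The scoring in [0330]-[0331] can be sketched as a Needleman-Wunsch style global alignment over two vectors of mean currents. This is only an illustration under assumptions: the function names, the `eps` and `cap` parameters (needed so that identical values give a finite score) and the toy current values are not from the notes; only the gap penalty of -1 and the reciprocal absolute difference score are.

```python
def match_score(a, b, eps=1e-6, cap=10.0):
    """Reciprocal absolute difference; eps and cap are assumed safeguards."""
    return min(cap, 1.0 / (abs(a - b) + eps))


def align_score(candidate, reference, gap=-1.0):
    """Global alignment score of two feature vectors (gap penalty -1)."""
    n, m = len(candidate), len(reference)
    # prev[j]: best score aligning the candidate prefix so far with reference[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = max(
                prev[j - 1] + match_score(candidate[i - 1], reference[j - 1]),
                prev[j] + gap,      # candidate value left unaligned
                curr[j - 1] + gap,  # reference value left unaligned
            )
        prev = curr
    return prev[m]


def identify(candidate, library):
    """Return the key of the library member with the highest score."""
    return max(library, key=lambda name: align_score(candidate, library[name]))


# Toy values in pA, made up for illustration only.
library = {"lib12": [50.0, 60.0, 55.0, 70.0], "lib13": [50.0, 62.0, 58.0, 71.0]}
best = identify([50.2, 61.8, 58.1, 70.9], library)
```

With real data the vectors would be hundreds of values long, and the margin between the best and second-best score would indicate how often a molecule could be confused with a neighbouring library member.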
 === - Measurement of SNPs [0334]-[0339] ===
+• [0335] The candidate molecule is a 19th pattern, identical to the library 13 molecule except for 3 SNPs (notation [old][position][new]): T335A, G357T, C385A.  The effect of these SNPs on the measured current is shown below.\\
+{{snpcurrent.png?500}}
+• [0336] The alignment-based identification method used previously was repeated.  The majority of molecules were still correctly identified as library 13 (erroneous correspondence to 12 rose to maybe just above 10%).\\
+• [0337] For SNP calling:
+  * HMM and Viterbi path used for alignment
+  * This has better path constraint (i.e. will align better through mismatched SNP regions) than Needleman-Wunsch
+Alignments shown below compare well with idealized library mutations shown above.\\
+{{snpcurrent2.png?500}}
+• [0338] 176 molecules were aligned and their SNPs clearly identified.  Figure below shows the difference in current between **Viterbi aligned** library and candidate feature vectors.  The 3 SNPs are visible.\\
+• [0338] In the case of 335 and 357, changes are seen at **several positions** (i.e. not just 1 as intended).  This is because **several measured features** are affected by each single change (i.e. a single change to a sequence affects several adjacent k-mers).\\
+{{snpcompare.png?500}}
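The point in [0338] that one base change touches several adjacent k-mers can be checked directly. A minimal sketch, assuming a 5-mer model as mentioned for the library feature vectors; the helper names and the short toy sequence are made up for illustration.

```python
def kmers(seq, k=5):
    """All overlapping k-mers of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def affected_kmer_positions(ref, alt, k=5):
    """Start positions of k-mers that differ between ref and alt."""
    pairs = zip(kmers(ref, k), kmers(alt, k))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


ref = "ACGTACGTACGT"
alt = ref[:6] + "T" + ref[7:]  # single substitution G->T at index 6
changed = affected_kmer_positions(ref, alt)  # the 5 k-mers covering index 6
```

A single substitution away from the sequence ends changes k consecutive k-mers, so up to 5 consecutive mean-current features can shift for one SNP under a 5-mer model.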
 === - Identification of Population and Sub-Population [0340]-[0343] ===
+• [0341] This example is worked through with simulated data.\\
+• [0341] A set of 60 mean current feature vectors of library component 13 is simulated.\\
+• [0341] 10 of these contain a SNP.\\
+• [0341] Gaussian noise with a standard deviation of 1 pA is added to each value, and 5% of the values within each vector are deleted at random.\\
+• [0342] Using this dataset a consensus is constructed using the **landmark process** outlined earlier.  The difference at SNP location 337 is clear.\\
+{{consensus.png?500}}
+• [0343] The SNP in population members 51-60, gathered from the consensus, is clear in the figure below.\\
+{{snppop.png?500}}
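The simulated dataset described in [0341] could be generated along these lines. This is a sketch under assumptions: the function and parameter names are hypothetical, the SNP is simplified to a fixed shift of a single feature (a real SNP would shift several adjacent k-mer features), and the size of the shift is arbitrary. The 60 vectors, 10 variants, 1 pA noise and 5% deletions come from the notes.

```python
import random


def simulate_population(template, n_total=60, n_variant=10, snp_index=0,
                        snp_shift=5.0, noise_sd=1.0, drop_frac=0.05, seed=0):
    """Return n_total noisy copies of template; the last n_variant carry a SNP."""
    rng = random.Random(seed)
    population = []
    for member in range(n_total):
        values = list(template)
        if member >= n_total - n_variant:
            # Assumed simplification: the SNP shifts one feature by snp_shift pA.
            values[snp_index] += snp_shift
        # Gaussian noise with the given standard deviation on every value.
        noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
        # Delete a fixed fraction of values at random positions.
        n_drop = round(drop_frac * len(noisy))
        dropped = set(rng.sample(range(len(noisy)), n_drop))
        population.append([v for i, v in enumerate(noisy) if i not in dropped])
    return population


template = [50.0] * 100  # placeholder for the library 13 feature vector
population = simulate_population(template, snp_index=40)
```

Because deletions shift the positions of the remaining values, the vectors have to be re-aligned before a consensus can be built, which is where the landmark process comes in.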
 === - Identification of a Number of Populations [0344]-[0347] ===
+• [0345] Experiments on 2 and 3 simulated species were done.  That is, populations of 2 and 3 different DNA samples were fed into the machine.  Pairwise alignment is used to obtain similarity scores, and from these a **similarity tree** is made by **neighbour joining**.\\
+{{tree2.png?300 }} {{ tree3.png?300}}
+• [0347] Identifications (as in example 2) were run for both experiments identifying the population distributions (see histograms).
+{{hist2.png?250 }} {{ hist3.png?250}}
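The tree-building step in [0345] can be sketched with a small neighbour-joining routine. It works on distances, so the pairwise similarity scores would first have to be converted (for example by subtracting each score from the maximum score; that conversion is an assumption, the notes do not say how it was done). The function name, the input layout and the toy distances are hypothetical, and branch lengths are omitted.

```python
def neighbour_joining(names, dist):
    """Return the tree topology as nested tuples.

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
                 for a in nodes}
        # Pick the pair minimising the neighbour-joining Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[frozenset((a, b))] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        merged = (a, b)
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c != a and c != b:
                d[frozenset((merged, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))]
                    - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [merged]
    return tuple(nodes)


# Toy distances: A and B are close, C and D are close.
labels = ["A", "B", "C", "D"]
toy = {frozenset(p): v for p, v in [
    (("A", "B"), 2.0), (("C", "D"), 2.0),
    (("A", "C"), 6.0), (("A", "D"), 6.0),
    (("B", "C"), 6.0), (("B", "D"), 6.0),
]}
tree = neighbour_joining(labels, toy)
```

In the experiments above, molecules from the same species would be expected to end up in the same subtree, which is what allows the population sizes in the histograms to be counted.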
 === - Assembly of a Library [0348]-[0351] ===
+• [0349] This example uses the overlapping strands 1-18 mentioned above (but without the extra common pattern tacked on to both ends of each strand).\\
+• [0350] A tree by **neighbour joining** on **pairwise alignment** scores was constructed.\\
+• [0350] Since relatively large non-similar regions were expected, a scoring function was used that does not penalize gaps at the beginning or end of the alignment as strongly as gaps within the alignment (see result below).\\
+{{libtree.png?300}}
+• [0350] Each sequence shows a similar relation to two other sequences, reflecting the ~100 base overlap on either side.\\
+• [0351] Progressing through the tree in order of relatedness:
+  - consensus landmarks for the aligned sequences are constructed
+  - the landmark from that alignment then serves as the feature vector for alignment with the next sequence
+  - output of the process is a fully assembled feature vector
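The scoring function from [0350], with cheaper gaps at the ends of the alignment, can be sketched as a variant of global alignment in which leading and trailing gaps use their own penalty. This is an illustration only: the function name, the `end_gap` default of 0, and the `eps` and `cap` safeguards are assumptions, and the notes do not give the actual end-gap penalty used.

```python
def overlap_score(x, y, gap=-1.0, end_gap=0.0, eps=1e-6, cap=10.0):
    """Alignment score where gaps at either end cost end_gap instead of gap."""
    n, m = len(x), len(y)

    def match(a, b):
        # Reciprocal absolute difference, capped so equal values stay finite.
        return min(cap, 1.0 / (abs(a - b) + eps))

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps: one vector starts before the other.
    for i in range(1, n + 1):
        score[i][0] = i * end_gap
    for j in range(1, m + 1):
        score[0][j] = j * end_gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A gap is trailing when the other vector is already used up.
            skip_x = score[i - 1][j] + (end_gap if j == m else gap)
            skip_y = score[i][j - 1] + (end_gap if i == n else gap)
            diag = score[i - 1][j - 1] + match(x[i - 1], y[j - 1])
            score[i][j] = max(diag, skip_x, skip_y)
    return score[n][m]


# Two toy vectors whose last and first three values overlap.
left = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
right = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
free_ends = overlap_score(left, right)                   # overhangs cost nothing
global_like = overlap_score(left, right, end_gap=-1.0)   # overhangs cost -1 each
```

With `end_gap` equal to `gap` this reduces to ordinary global alignment, so the non-overlapping overhangs lower the score and would blur the neighbour relations in the tree.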