oxfordfeb162012:start

Current revision: 2015/11/09 19:59 by magiero (compared with revision 2015/11/07 15:38 by magiero)
==== - Present Invention [0013]-[0020] ====
• [0013] The **present invention** describes a method for analyzing measurements dependent on the identity of k-mers consisting of...\\
• [0014] 1.) deriving a **feature vector** from the measurements.\\
• [0015] 2.) determining the similarity to at least one other feature vector.\\
• [0017] The present invention **does not try to extract the exact sequence**.  Many applications don't need it.\\
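A minimal sketch of steps 1.) and 2.), assuming the raw current trace has already been split into events (the boundaries list is a hypothetical input; event detection is not shown) and using a toy similarity measure of our own choosing rather than the one from the source:

```python
from statistics import fmean


def derive_feature_vector(trace, boundaries):
    """Step 1: one feature (the mean current) per event.

    `boundaries` holds the start index of each event plus the end of the
    trace, e.g. [0, 3, 5, 8] describes three events.
    """
    return [fmean(trace[a:b]) for a, b in zip(boundaries, boundaries[1:])]


def similarity(u, v):
    """Step 2 (toy version): 1 / (1 + mean absolute difference).

    Only defined for equal-length vectors; real measurements need an
    alignment step first (see the use examples below).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    mad = fmean(abs(x - y) for x, y in zip(u, v))
    return 1.0 / (1.0 + mad)


trace = [50.0, 51.0, 49.0, 70.0, 71.0, 60.0, 60.0, 60.0]  # made-up values, pA
fv = derive_feature_vector(trace, [0, 3, 5, 8])
print(fv)                                  # [50.0, 70.5, 60.0]
print(similarity(fv, [50.0, 70.5, 60.0]))  # 1.0
```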
  * //abundance of biomarker miRNA//: 20-25-mer RNA circulating in blood; expression associated with disease/cancers
  * //foetal copy number variation//: fragmented foetal DNA circulates in maternal blood; some have additional chromosome copies ([[https://en.wikipedia.org/wiki/Aneuploidy | aneuploidy]]); use capture probes to exons of chromosomes of interest and then sequence these; foetal and maternal DNA could be distinguished by looking for methylation features
  * //comparative genomic hybridization (**CGH**)//: profile **copy number changes**; copy number of various genomic regions can be altered in tumour cells (and in foetuses as above)
  * //viral or bacterial load//: measure of infection severity; the number of pathogen RNA/DNA molecules per ml of blood is measured (possibly with enrichment); does not have to be done on the whole pathogen genome; early/late state measurements could identify [[https://en.wikipedia.org/wiki/Antigenic_drift | antigenic drift]] and/or [[https://en.wikipedia.org/wiki/Antigenic_variation | antigenic variation]]; applications in //epidemiology// like identification (**strain typing**), disease spread/evolution
  * //probes//: e.g. aptamers to a biomarker panel, some of which attach to a target molecule; count those that did/didn't bind
  
=== - Measurement of Differences [0302]-[0305] ===
-• [0303] Example 1: Calling a [[https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism | SNP]].  This may enable identification of new [[https://en.wikipedia.org/wiki/Locus_(genetics) | loci]] (i.e. location of a gene). This may enable identification of paralog-specific variants in [[https://en.wikipedia.org/wiki/Non-allelic_homologous_recombination | NAHR]].\\ +• [0303] __Example 1__: Calling a [[https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism | SNP]].  This may enable identification of new [[https://en.wikipedia.org/wiki/Locus_(genetics) | loci]] (i.e. location of a gene). This may enable identification of paralog-specific variants (**PSVs**) in [[https://en.wikipedia.org/wiki/Non-allelic_homologous_recombination | NAHR]].\\ 
-• [0304] Example 2: **Methylation**.  Can do estimation of **bulk methylation state** of molecules (e.g. whether 100% of the population is 50% modified or whether 50% of the population is 100% modified).  Methylation state of certain genes can be used as a **biomarker for cancer**.\\+• [0304] __Example 2__: **Methylation**.  Can do estimation of **bulk methylation state** of molecules (e.g. whether 100% of the population is 50% modified or whether 50% of the population is 100% modified).  Methylation state of certain genes can be used as a **biomarker for cancer**.\\ 
• [0305] __Example 3__: **Splice variants** and/or **translocation breakpoints**: identify the position where feature vectors stop matching, or where half of a feature vector maps to locus A and the other half maps to locus B.\\
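For Example 3, the point where a feature vector switches from matching locus A to matching locus B can be found by trying every split point. A minimal sketch, assuming both loci's reference feature vectors have already been brought into the candidate's coordinates (a real implementation would align each part, e.g. with the alignment methods in the use examples below); the absolute-difference cost is our choice:

```python
def find_breakpoint(query, ref_a, ref_b):
    """Return (k, cost): query[:k] is best explained by ref_a, query[k:] by ref_b."""
    if not len(query) == len(ref_a) == len(ref_b):
        raise ValueError("vectors must be in the same coordinates (equal length)")
    cost_a = [abs(q - a) for q, a in zip(query, ref_a)]
    cost_b = [abs(q - b) for q, b in zip(query, ref_b)]
    prefix_a, suffix_b = 0.0, float(sum(cost_b))  # split at k = 0: everything is B
    best_k, best_cost = 0, suffix_b
    for k in range(1, len(query) + 1):            # move position k-1 from B to A
        prefix_a += cost_a[k - 1]
        suffix_b -= cost_b[k - 1]
        if prefix_a + suffix_b < best_cost:
            best_k, best_cost = k, prefix_a + suffix_b
    return best_k, best_cost


ref_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # made-up levels for locus A
ref_b = [90.0, 80.0, 70.0, 60.0, 50.0, 40.0]  # made-up levels for locus B
query = ref_a[:3] + ref_b[3:]                 # switches from A to B after 3 features
print(find_breakpoint(query, ref_a, ref_b))   # (3, 0.0)
```

Running prefix/suffix sums make the scan linear in the vector length.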
  
=== - Identification of Presence (with Confidence) [0306]-[0311] ===
• [0308] __Example 1__: Identify populations that are related to, but not identical to, a known molecule, e.g. homologous DNA or protein in rapidly mutating pathogens.\\
• [0309] __Example 2__: [[https://en.wikipedia.org/wiki/Fusion_transcript | Fusion transcripts]] (as with **splice variants**); detection of fusion transcripts is used in cancer diagnosis, e.g. presence of the BCR-ABL fusion transcript indicates leukemia.\\
• [0310] __Example 3__: Diagnosis of [[https://en.wikipedia.org/wiki/Non-allelic_homologous_recombination | NAHR]]; faulty recombination between non-allelic loci can be detected as a change in **copy number** (see **CGH** above).\\
• [0311] __Example 4__: Comparing multiple parts of a DNA feature vector to multiple stored features.  E.g. DNA sequences for known protein domains may be stored in a library; if part of the derived feature vector matches a catalytic domain and another part matches a DNA-binding domain, the function of the protein may be deduced.\\
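Example 4 can be pictured as scanning the derived feature vector for each stored domain feature vector. A minimal ungapped sketch; the library contents, domain names and the mean-absolute-difference threshold are hypothetical, and real data would need a gapped alignment:

```python
def scan_domains(query, domains, max_mad):
    """Best ungapped placement of each stored domain vector along the query.

    Returns {name: (start, mean_abs_diff)} for domains whose best placement
    has a mean absolute difference of at most max_mad.
    """
    hits = {}
    for name, motif in domains.items():
        m = len(motif)
        best = None
        for start in range(len(query) - m + 1):
            window = query[start:start + m]
            mad = sum(abs(x - y) for x, y in zip(window, motif)) / m
            if best is None or mad < best[1]:
                best = (start, mad)
        if best is not None and best[1] <= max_mad:
            hits[name] = best
    return hits


query = [31.0, 32.0, 33.0, 50.0, 60.0, 70.0, 39.0, 39.0, 20.0, 30.0]  # made-up, pA
domains = {"catalytic": [50.0, 60.0, 70.0],  # hypothetical stored domains
           "dna_binding": [20.0, 30.0],
           "other": [100.0, 100.0]}
print(scan_domains(query, domains, max_mad=1.0))
# {'catalytic': (3, 0.0), 'dna_binding': (8, 0.0)}
```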
  
=== - Assembly Application [0312]-[0317] ===
  
==== - Use Examples [0318]-[0351] ====

=== - Data Acquisition [0318]-[0325] ===
• [0323] Single-channel currents were measured on an Axopatch 200B amplifier equipped with Digidata 1440A digitizers.\\
• [0323] Measurements were made via Pt electrodes: the cis electrode was connected to the ground of the Axopatch headstage and the trans electrode to the active electrode of the headstage.\\
  
=== - Identification and Quantification of DNA [0326]-[0333] ===
• [0326] This example describes the identification of DNA molecules in a solution against a pre-determined library of feature vectors.\\
• [0327] Library construction was performed as follows:
  - Take the ~5-kb [[https://en.wikipedia.org/wiki/Phi_X_174 | PhiX174]] genome
  - Cut it into 18 400-mer strands, each overlapping adjacent strands by 100 bases
  - All strands carry a sequence at the beginning and end that is common to all strands and not part of the genome (presumably attached artificially; not confirmed)
  - Construct library feature vectors of mean current (5-mer model)
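The library construction steps above can be sketched as follows. The k-mer model is a hypothetical random table standing in for a measured 5-mer current model, the common end sequences are left out, and the test sequence is random; the source does not say how the genome ends were handled when cutting 18 strands:

```python
import itertools
import random


def tile(sequence, length=400, overlap=100):
    """Cut a sequence into strands of `length` bases; adjacent strands share `overlap` bases."""
    step = length - overlap
    return [sequence[s:s + length] for s in range(0, len(sequence) - length + 1, step)]


def toy_kmer_model(k=5, seed=0):
    """Hypothetical k-mer -> mean current (pA) table; a real model is measured."""
    rng = random.Random(seed)
    return {"".join(p): rng.uniform(30.0, 80.0) for p in itertools.product("ACGT", repeat=k)}


def expected_feature_vector(strand, model, k=5):
    """One expected mean-current level per overlapping k-mer of the strand."""
    return [model[strand[i:i + k]] for i in range(len(strand) - k + 1)]


genome = "".join(random.Random(1).choices("ACGT", k=5500))  # stand-in sequence
strands = tile(genome)
model = toy_kmer_model()
library = {i + 1: expected_feature_vector(s, model) for i, s in enumerate(strands)}
print(len(strands), len(library[1]))  # 18 396
```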
• [0329] Candidate molecules are reduced to feature vectors (by measurement).\\
• [0330] Candidates are compared against the library using an **alignment algorithm**.  The output score from the alignment is used to measure similarity to each library member.\\
• [0331] Comparison by alignment score used a **gap penalty** of -1 and a scoring function of reciprocal absolute difference (i.e. closer matches give higher scores).  The figure below shows the match to library component 13.\\
{{alignscore.png?500}}
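The comparison in [0330]-[0331] might look like the following sketch: a global, Needleman-Wunsch-style alignment (the global variant is our assumption; the source only gives the gap penalty and the score) with a linear gap penalty of -1 and match score 1/(|x - y| + eps). The eps term, which keeps exact matches finite, and the toy library values are hypothetical:

```python
def align_score(u, v, gap=-1.0, eps=0.1):
    """Global alignment score of two feature vectors (linear gap penalty)."""
    n, m = len(u), len(v)
    prev = [j * gap for j in range(m + 1)]  # row 0: v[:j] aligned to gaps
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m         # column 0: u[:i] aligned to gaps
        for j in range(1, m + 1):
            match = prev[j - 1] + 1.0 / (abs(u[i - 1] - v[j - 1]) + eps)
            cur[j] = max(match, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


def identify(candidate, library):
    """Score the candidate against every library member; best = highest score."""
    scores = {name: align_score(candidate, fv) for name, fv in library.items()}
    return max(scores, key=scores.get), scores


library = {"12": [40.0, 55.0, 62.0, 48.0, 70.0],  # made-up mean currents, pA
           "13": [45.0, 60.0, 52.0, 66.0, 58.0]}
candidate = [45.5, 60.0, 66.0, 58.0]               # like 13, one event missing
best, scores = identify(candidate, library)
print(best)  # 13
```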
• [0333] All test molecules used in this experiment were copies of molecule 13.  Occasionally (< 10% of the time, by my estimate) they are incorrectly identified as molecule 12 from the library.\\
  
=== - Measurement of SNPs [0334]-[0339] ===
• [0335] The candidate molecule is a 19th pattern, identical to library molecule 13 except for 3 SNPs (notation [old][position][new]): T335A, G357T, C385A.  The effect of these SNPs on the measured current is shown below.\\
{{snpcurrent.png?500}}
• [0336] The alignment-based identification method used previously was repeated.  The majority of molecules were still correctly identified as library molecule 13 (erroneous assignment to 12 rose to perhaps just above 10%).\\
• [0337] For SNP calling:
  * an HMM with the Viterbi path was used for alignment
  * this has better path constraints (i.e. it aligns better through mismatched SNP regions) than Needleman-Wunsch
The alignments shown below compare well with the idealized library mutations shown above.\\
{{snpcurrent2.png?500}}
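The source only says that an HMM with a Viterbi path was used. The sketch below is one plausible layout, entirely our assumption: one hidden state per reference k-mer position; transitions that stay (an extra event), step to the next k-mer, or skip one k-mer (a missed event); Gaussian emissions around each reference level; and made-up values for sigma and the transition probabilities. The helper per_position_difference then gives the kind of candidate-minus-library difference described in [0338]:

```python
import math


def viterbi_align(events, ref_levels, sigma=1.0, p_step=0.8, p_stay=0.1, p_skip=0.1):
    """Assign each measured event to a reference k-mer position (global path)."""
    n, m = len(events), len(ref_levels)
    log_move = {0: math.log(p_stay), 1: math.log(p_step), 2: math.log(p_skip)}
    neg_inf = float("-inf")

    def log_emit(x, mu):
        # Gaussian log-density up to a constant shared by all states
        return -0.5 * ((x - mu) / sigma) ** 2

    score = [[neg_inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    score[0][0] = log_emit(events[0], ref_levels[0])  # path starts at position 0
    for t in range(1, n):
        for j in range(m):
            for d, lp in log_move.items():  # predecessor j - d: stay, step or skip
                i = j - d
                if i >= 0 and score[t - 1][i] + lp > score[t][j]:
                    score[t][j], back[t][j] = score[t - 1][i] + lp, i
            if score[t][j] > neg_inf:
                score[t][j] += log_emit(events[t], ref_levels[j])
    if score[n - 1][m - 1] == neg_inf:
        raise ValueError("no admissible path ends at the last reference position")
    path = [m - 1]                          # path ends at the last position
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def per_position_difference(events, ref_levels, path):
    """Mean measured current minus reference level per reference position
    (None where no event was assigned, i.e. a skipped k-mer)."""
    sums, counts = [0.0] * len(ref_levels), [0] * len(ref_levels)
    for x, j in zip(events, path):
        sums[j] += x
        counts[j] += 1
    return [s / c - r if c else None for s, c, r in zip(sums, counts, ref_levels)]


ref = [50.0, 60.0, 70.0, 80.0, 90.0]     # made-up library levels, pA
events = [50.2, 60.1, 59.9, 70.3, 90.1]  # extra event at 60, missed 80
path = viterbi_align(events, ref)
print(path)  # [0, 1, 1, 2, 4]
print(per_position_difference(events, ref, path))
```

Time and memory grow as n·m for n events and m reference positions.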
• [0338] 176 molecules were aligned and their SNPs clearly identified.  The figure below shows the difference in current between the **Viterbi-aligned** library and candidate feature vectors.  The 3 SNPs are visible.\\
• [0338] For 335 and 357, changes are seen at **several positions** (i.e. not just at the single position that was changed).  This is because **several measured features** are affected by each single change (i.e. a single change to a sequence affects several adjacent k-mers).\\
{{snpcompare.png?500}}
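The reason one substitution shows up at several positions: with a k-mer model, every k-mer that contains the changed base is altered, so up to k consecutive features can change (fewer near the ends of a strand, and some of the altered k-mers may happen to give similar currents). A small check with k = 5 on a made-up sequence:

```python
def affected_kmer_indices(reference, mutated, k=5):
    """Indices of the k-mers (features) that differ between two equal-length sequences."""
    if len(reference) != len(mutated):
        raise ValueError("substitutions only: sequences must have equal length")
    return [i for i in range(len(reference) - k + 1)
            if reference[i:i + k] != mutated[i:i + k]]


ref = "ACGTACGTACGT"
snp = ref[:6] + "T" + ref[7:]  # G -> T at (0-based) position 6
print(affected_kmer_indices(ref, snp))  # [2, 3, 4, 5, 6]
```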
  
=== - Identification of Population and Sub-Population [0340]-[0343] ===
• [0341] This example is worked through with simulated data.\\
• [0341] A set of 60 mean-current feature vectors of library component 13 is simulated.\\
• [0341] 10 of these contain a SNP.\\
• [0341] Gaussian noise with a standard deviation of 1 pA is added to each value, and 5% of the values within each vector are deleted at random.\\
• [0342] From this dataset a consensus is constructed using the **landmark process** outlined earlier.  The difference at SNP location 337 is clear.\\
{{consensus.png?500}}
• [0343] The SNP carried by molecules 51-60, recovered from the consensus, is clear in the figure below.\\
{{snppop.png?500}}
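The simulated data set in [0341] can be reproduced along these lines. The reference levels, the vector length and the size of the SNP's current shift are hypothetical (the source does not give them); the SNP is placed at index 337 only to mirror the text. Because the landmark consensus process is not described in this section, the sketch uses a simple stand-in: per-position averaging using the true positions of the retained values, which is only available in simulation:

```python
import random


def simulate_population(reference, n_total=60, n_snp=10, snp_index=337,
                        snp_shift=5.0, noise_sd=1.0, delete_frac=0.05, seed=0):
    """Return (kept_positions, values) per molecule; the last n_snp carry the SNP."""
    rng = random.Random(seed)
    molecules = []
    for mol in range(n_total):
        truth = list(reference)
        if mol >= n_total - n_snp:
            truth[snp_index] += snp_shift
        noisy = [x + rng.gauss(0.0, noise_sd) for x in truth]  # 1 pA Gaussian noise
        n_del = round(delete_frac * len(noisy))
        deleted = set(rng.sample(range(len(noisy)), n_del))     # 5% dropped at random
        kept = [j for j in range(len(noisy)) if j not in deleted]
        molecules.append((kept, [noisy[j] for j in kept]))
    return molecules


def naive_consensus(molecules, length, members):
    """Per-position mean over chosen molecules (stand-in for the landmark process)."""
    sums, counts = [0.0] * length, [0] * length
    for idx in members:
        for j, x in zip(*molecules[idx]):
            sums[j] += x
            counts[j] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


reference = [50.0 + 3.0 * (j % 7) for j in range(396)]  # made-up levels, pA
mols = simulate_population(reference)
common = naive_consensus(mols, len(reference), range(0, 50))    # molecules 1-50
variant = naive_consensus(mols, len(reference), range(50, 60))  # molecules 51-60
diff = [v - c for v, c in zip(variant, common)]
print(max(range(len(diff)), key=lambda j: abs(diff[j])))  # 337
```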
  
=== - Identification of a Number of Populations [0344]-[0347] ===
• [0345] Experiments on 2 and 3 simulated species were done, i.e. populations made up of 2 and 3 different DNA samples.  Pairwise alignment is used to obtain similarity scores, from which a **similarity tree** is built by **neighbour joining**.\\
{{tree2.png?300 }} {{ tree3.png?300}}
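Neighbour joining works on distances, so the pairwise alignment scores have to be turned into distances first; how this was done is not stated (distance = highest score minus score is one hypothetical option). A compact sketch of the standard neighbour-joining algorithm, checked here on a small additive distance matrix rather than on alignment scores:

```python
def neighbour_joining(labels, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns (node, node, branch_length) edges; internal nodes are named
    "node0", "node1", ... (assumes no leaf label uses that form).
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes, edges, next_id = list(labels), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y)
        _, a, b = min(((n - 2) * d[x][y] - r[x] - r[y], x, y)
                      for i, x in enumerate(nodes) for y in nodes[i + 1:])
        u = f"node{next_id}"
        next_id += 1
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, d[a][b] - la)]
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
for edge in neighbour_joining(list("abcde"), D):
    print(edge)
```

For this matrix the leaf branch lengths come out as a 2, b 3, c 4, d 2, e 1.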
• [0347] Identifications (as in example 2) were run for both experiments, identifying the population distributions (see histograms).
{{hist2.png?250 }} {{ hist3.png?250}}
  
=== - Assembly of a Library [0348]-[0351] ===
• [0349] Uses the 18 overlapping strands (1-18) described above, but without the common sequence attached to both ends of each strand.\\
• [0350] A tree was constructed by **neighbour joining** on **pairwise alignment** scores.\\
• [0350] Since relatively large non-similar regions were expected, the scoring function penalized gaps at the beginning or end of the alignment less strongly than gaps within the alignment (see result below).\\
{{libtree.png?300}}
• [0350] Each sequence is similarly related to two other sequences, reflecting the ~100-base overlap on either side.\\
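A minimal sketch of an end-gap-tolerant (overlap or semi-global) alignment score, reusing the -1 gap penalty and reciprocal-difference match score from [0331]. The end_gap value (0 here, i.e. end gaps free) and eps are assumptions, since the source only says end gaps were penalized less strongly:

```python
def overlap_align_score(u, v, gap=-1.0, end_gap=0.0, eps=0.1):
    """Alignment score in which leading/trailing gaps cost end_gap per element."""
    n, m = len(u), len(v)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * end_gap  # leading elements of u left unaligned
    for j in range(1, m + 1):
        H[0][j] = j * end_gap  # leading elements of v left unaligned
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = H[i - 1][j - 1] + 1.0 / (abs(u[i - 1] - v[j - 1]) + eps)
            H[i][j] = max(match, H[i - 1][j] + gap, H[i][j - 1] + gap)
    # trailing elements left unaligned also cost end_gap each
    last_row = max(H[n][j] + (m - j) * end_gap for j in range(m + 1))
    last_col = max(H[i][m] + (n - i) * end_gap for i in range(n + 1))
    return max(last_row, last_col)


left = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]   # made-up; its end overlaps the start of right
right = [10.0, 20.0, 30.0, 7.0, 8.0, 9.0]
print(overlap_align_score(left, right))                # 30.0
print(overlap_align_score(left, right, end_gap=-1.0))  # 24.0
```

With free end gaps the three-value overlap scores fully; charging end gaps like interior gaps subtracts 6.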
• [0351] Progressing through the tree in order of relatedness:
  - consensus landmarks for the aligned sequences are constructed
  - the landmark from that alignment then serves as the feature vector for alignment with the next sequence
  - the output of the process is a fully assembled feature vector
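The landmark consensus itself is not described in this section, so the sketch below uses a simple stand-in: two feature vectors with a known, ungapped overlap are merged by averaging the overlapping values, and vectors are folded in one at a time in the order read off the tree. The overlap lengths are assumed to be known already (e.g. from an end-gap-tolerant alignment):

```python
def merge_pair(left, right, overlap):
    """Join two vectors whose last/first `overlap` values correspond; average the overlap."""
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap longer than a vector")
    head = left[:len(left) - overlap]
    shared = [(a + b) / 2 for a, b in zip(left[len(left) - overlap:], right[:overlap])]
    return head + shared + right[overlap:]


def assemble(vectors, overlaps):
    """Fold vectors in the given order (e.g. order of relatedness in the tree)."""
    contig = list(vectors[0])
    for vec, ov in zip(vectors[1:], overlaps):
        contig = merge_pair(contig, list(vec), ov)
    return contig


pieces = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [4.0, 5.0, 6.0, 7.0, 8.0], [7.0, 8.0, 9.0]]
print(assemble(pieces, [2, 2]))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```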
oxfordfeb162012/start.1446910681.txt.gz · Last modified: 2015/11/07 15:38 by magiero
