Exercise 9A

Restriction fragment lengths

Once again, let us use the tetrahedral die model to generate a random DNA sequence a million bases long, with the probabilities of A, C, G, and T being 0.3, 0.2, 0.25, and 0.25, respectively. This DNA molecule is digested to completion with the enzymes AsuI, BglII, EcoRI, and Sau3AI. What is the expected restriction fragment length?
(Hint: Find the recognition sequence and cut site for each enzyme listed. Then find the probability that a given position in the random sequence starts a site recognized by that enzyme, using the fact that each base occurs independently of the others. Do this for each enzyme. If two enzymes cut at the same place, should we count both cuts? If not, which one should we count? Once you have the expected number of cuts for each enzyme, combine them to get the total number of distinct cuts. Then P(get a cut) = total number of successes (cuts) / total number of trials (bases). To get the expected restriction fragment length, model the fragment length as a geometric distribution with p = probability of getting a cut.)
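The steps in the hint can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the recognition sequences below (AsuI GGNCC, BglII AGATCT, EcoRI GAATTC, Sau3AI GATC, with N meaning any base) are taken from standard enzyme tables and are not given in the exercise, and the names site_probability, SITES, and distinct are made up for illustration. It also assumes that BglII and Sau3AI cut at the same position inside AGATCT, so BglII contributes no additional cuts.

```python
# Base probabilities from the exercise statement.
BASE_PROB = {"A": 0.3, "C": 0.2, "G": 0.25, "T": 0.25}

# Recognition sequences: an assumption taken from standard enzyme tables,
# not from the exercise text. N stands for any base.
SITES = {
    "AsuI": "GGNCC",
    "BglII": "AGATCT",
    "EcoRI": "GAATTC",
    "Sau3AI": "GATC",
}


def site_probability(site, base_prob=BASE_PROB):
    """Probability that a site starts at a given position (independent bases)."""
    p = 1.0
    for base in site:
        # N matches any base, so it contributes a factor of 1.
        p *= 1.0 if base == "N" else base_prob[base]
    return p


site_probs = {name: site_probability(seq) for name, seq in SITES.items()}

# Every AGATCT contains GATC, and (by assumption) both enzymes cut just
# before that G, so BglII cuts are already counted under Sau3AI.
distinct = ["AsuI", "EcoRI", "Sau3AI"]
p_cut = sum(site_probs[name] for name in distinct)

# Mean of a geometric distribution with success probability p_cut.
expected_length = 1.0 / p_cut

for name, p in site_probs.items():
    print(f"{name:7s} P(site) = {p:.8f}")
print(f"P(cut) = {p_cut:.8f}, expected fragment length = {expected_length:.2f}")
```

Summing the three probabilities treats the cuts of the three remaining enzymes as mutually exclusive at any one position, which holds here because their sites require different bases immediately after the cut point.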

For the mathematicians/statisticians who have taken stochastic processes before, can you think of some probabilistic models to approximate the variance of the restriction fragment length?
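Whatever model is proposed for the variance, it can be checked against a simulated digest. The sketch below is one possible check, not a prescribed method: it draws a random sequence with the stated base probabilities, cuts it at every site, and compares the empirical mean and variance of the fragment lengths with the geometric values 1/p and (1 - p)/p^2. The recognition patterns, the cut offsets, the seed, and the value of p are assumptions carried over from standard enzyme tables and the product rule in the hint; the function names are hypothetical.

```python
import random
import re
import statistics

BASES = "ACGT"
WEIGHTS = [0.3, 0.2, 0.25, 0.25]  # P(A), P(C), P(G), P(T) from the exercise

# (regex for the recognition site, offset of the cut from the site start).
# Assumed from standard enzyme tables: G^GNCC, A^GATCT, G^AATTC, ^GATC.
CUTTERS = {
    "AsuI": ("GG[ACGT]CC", 1),
    "BglII": ("AGATCT", 1),
    "EcoRI": ("GAATTC", 1),
    "Sau3AI": ("GATC", 0),
}


def random_sequence(n, seed=0):
    """Draw n independent bases (the tetrahedral die model)."""
    rng = random.Random(seed)
    return "".join(rng.choices(BASES, weights=WEIGHTS, k=n))


def fragment_lengths(seq):
    """Cut seq at every site and return the fragment lengths in order."""
    cuts = set()  # a set, so coinciding cuts are counted once
    for pattern, offset in CUTTERS.values():
        # The lookahead finds overlapping occurrences as well.
        for match in re.finditer(f"(?={pattern})", seq):
            cuts.add(match.start() + offset)
    cuts.discard(0)  # a cut at the very start produces no fragment
    boundaries = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


n = 1_000_000
lengths = fragment_lengths(random_sequence(n))
mean_len = statistics.mean(lengths)
var_len = statistics.pvariance(lengths)

# Per-position cut probability for AsuI + EcoRI + Sau3AI (assumed sites).
p = 0.0025 + 0.00028125 + 0.00375

print(f"fragments: {len(lengths)}")
print(f"empirical mean = {mean_len:.1f}, geometric mean = {1 / p:.1f}")
print(f"empirical var  = {var_len:.1f}, geometric var  = {(1 - p) / p**2:.1f}")
```

Any gap between the empirical and geometric variance is a hint that successive positions are not independent trials, since overlapping sites share bases.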