DNA Testing: An Introduction For Non-Scientists
An Illustrated Explanation
by DONALD E. RILEY, Ph.D.
University of Washington
(Copyright 1997-2005)
Revised Edition Posted April 6, 2005
| The
explanation of DNA testing that follows
is intended as an introduction to the subject
for those who may have limited backgrounds
in biological science. While basically accurate,
this explanation involves liberal use of
illustration and, in some cases, over-simplification.
Although intended to be informative, this
is a brief and incomplete explanation of a
complex subject. The author suggests consulting
the scientific literature for more rigorous
details and alternative views. |
DNA
EXPLAINED IN EASY TERMS |
DNA
is material that governs inheritance of
eye color, hair color, stature, bone density
and many other human and animal traits.
DNA is a long, but narrow string-like object.
A one foot long string or strand of DNA
is normally packed into a space roughly
equal to a cube 1/millionth of an inch on
a side. This is possible only because DNA
is a very thin string.
Our body's cells
each contain a complete sample of our DNA.
One cell is roughly equal in size to the
cube described in the previous paragraph.
There are muscle cells, brain cells, liver
cells, blood cells, sperm cells and others.
Basically, every part of the body is made
up of these tiny cells and each contains
a sample or complement of DNA identical
to that of every other cell within a given
person. There are a few exceptions. For
example, our red blood cells lack DNA. Blood
itself can be typed because of the DNA contained
in our white blood cells. |
| Not
only does the human body rely on DNA but
so do most living things including plants,
animals and bacteria. |
| A
strand of DNA is made up of tiny building blocks.
There are only four different basic building blocks.
Scientists usually refer to these using
four letters: A, T, G and C. These four
letters are short nicknames for more complicated
chemical names, but the letters are used much
more commonly than the chemical names, so
the latter will not be mentioned here. Another
term for DNA's building blocks is "bases." A,
T, G and C are bases. |
| For
example, to refer to a particular piece
of DNA, we might write: AATTGCCTTTTAAAAA.
This is a perfectly acceptable way of describing
a piece of DNA. Someone with a machine called
a DNA synthesizer could actually synthesize
the same piece of DNA from the information
AATTGCCTTTTAAAAA alone. |
| The
sequence of bases
(letters) can code for many properties of
the body's cells. The cells can read this
code. Some DNA sequences encode important
information for the cell. Such DNA is called,
not surprisingly, "coding
DNA." Our cells also contain much
DNA that doesn't
encode anything that we know about. If the
DNA doesn't encode anything, it is called
non-coding DNA or sometimes, "junk
DNA." [1] |
| The
DNA code, or genetic code as it is called,
is passed through the sperm and egg to the
offspring. A single sperm cell contains
about three billion bases consisting of
A, T, G and C that follow each other in
a well defined sequence along the strand
of DNA. Each egg cell also contains about three
billion bases arranged in a well-defined
sequence very similar, but not identical,
to that of the sperm. |
| Both
coding and non-coding DNAs may vary from
one individual to another. These DNA variations
can be used to identify people or at least
distinguish one person from another. |
What
is a Locus? |
| A
locus (with a hard "c", LOW-KUS)
is simply a location in the DNA. The plural
of locus is loci (with a soft "c",
pronounced LOW-S-EYE). Again, the DNA is
a long, string-like object as illustrated
below. Loci reside at
specific places on chromosomes. |
What
is a Chromosome? |
| When
a cell is getting ready to divide creating
two daughter cells, it packs its DNA into
bundles called chromosomes. Chromosomes
are just bundles of DNA. For humans, there
are consistently 23 pairs of chromosomes,
each with a consistent size and shape. Chromosomes
are numbered. Chromosome number 1 is the
largest chromosome; chromosome number 2 is
a little smaller, and so on. Among the 23
pairs of chromosomes there is a pair called
the sex chromosomes. This is something of
a misnomer, since there are many functions
on the "sex" chromosomes that
have nothing to do with sex. In females,
the sex-chromosome pair consists of two
similar size chromosomes called X chromosomes.
Males have one X and one small Y chromosome. |
 |
| Unless
it has been purified, our DNA is actually
not a loosely tangled string as illustrated
but rather is well organized and packaged
into what are called chromosomes. A chromosome
is a tightly folded bundle of DNA. Chromosomes
are most visible when cells divide. In a
microscope, chromosomes look something like
this without the numbers and letters: |
 |
| The
illustration shows a pair of chromosomes
named chromosome number 4, one pair among
23 pairs of chromosomes. The illustration
also shows the position of a locus that
happens to be called "GYPA." In
this example, the chromosome on the left
has the variation called the B allele while
the chromosome on the right has the variation
called the A allele. |
What
are alleles? |
| Alleles
(ALL-EELS') are just variations at a particular
site on a chromosome. Since each chromosome
has a similar chromosome partner (except
for males with their X and Y chromosomes)
each locus is duplicated. Loci can vary
a bit. If a person has two identical versions
of the locus, they are said to be homozygous
(HOMO-Z-EYE'-GUS). If there is a difference,
they are said to be heterozygous (HETERO-Z-EYE'-GUS). |
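The homozygous/heterozygous distinction can be expressed in a few lines of Python. This is only an illustrative sketch: the function name `zygosity` and the idea of writing alleles as short text labels are assumptions made for the example, not part of any testing standard.

```python
def zygosity(allele_1: str, allele_2: str) -> str:
    """Classify one locus from the two alleles a person carries there."""
    # Identical versions on both chromosome partners -> homozygous.
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


# The GYPA illustration above: B allele on one chromosome 4, A allele on the other.
print(zygosity("B", "A"))  # heterozygous
print(zygosity("A", "A"))  # homozygous
```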
FORENSIC
DNA TESTING |
| There
have been two main types of forensic DNA
testing. They are often called RFLP and
PCR-based testing, although these terms
are not very descriptive. Generally, RFLP
testing requires larger amounts of DNA and
the DNA must be undegraded. Crime-scene
evidence that is old or that is present
in small amounts is often unsuitable for
RFLP testing. Warm moist conditions may
accelerate DNA degradation rendering it
unsuitable for RFLP in a relatively short
period of time. |
| PCR-based
testing often requires less DNA than RFLP
testing and the DNA may be partially degraded,
more so than is the case with RFLP. However,
PCR still has sample size and degradation
limitations that sometimes may be under-appreciated.
PCR-based tests are also extremely sensitive
to contaminating DNA at the crime scene
and within the test laboratory. During PCR,
contaminants may be amplified up to a billion
times their original concentration. Contamination
can influence PCR results, particularly
in the absence of proper handling techniques
and proper controls for contamination. |
| PCR
is less direct and somewhat more prone to
error than RFLP. However, PCR has tended
to replace RFLP in forensic testing primarily
because PCR based tests are faster and more
sensitive. |
RFLP
EXPLAINED IN EASY TERMS |
RFLP
has been almost entirely replaced by PCR-based
testing. The following description of RFLP
is included here primarily for historic
reasons (for more current formats, see below).
RFLP DNA testing has four basic steps:
- The DNA from crime-scene evidence or
from a reference sample is cut with something
called a restriction enzyme. The restriction
enzyme recognizes a particular short sequence
such as AATT that occurs many times in
a given cell's DNA. One enzyme commonly
used is called Hae III (pronounced: Hay
Three) but the choice of enzyme varies.
For RFLP to work, the analyst needs thousands
of cells. If thousands of cells are present
from a single individual, they will all
be cut in the same places along their DNA by
the enzyme because each cell's DNA is identical
to that of every other cell of that person.
- The cut DNA pieces are now sorted according
to size by a device called a gel. The
DNA is placed at one end of a slab of
gelatin and it is drawn through the gel
by an electric current. The gel acts like
a sieve allowing small DNA fragments to
move more rapidly than larger ones.
- After the gel has separated the DNA
pieces according to size, a blot or replica
of the gel is made to trap the DNA fragments in
the positions where they end up, with
small DNA fragments near one end of the
blot and large ones near the other end.
The blot is now treated with a piece of
DNA called a probe. The probe is simply
a piece of DNA that binds to the DNA on
the blot in the position where a similar
sequence (the target sequence) is located.
- The size or sizes of the target DNA
fragments recognized by the probe are
measured. Using the same probe and enzyme,
the test lab will perform these same steps
for many people. These sizes, and how they
are distributed among large groups of people,
form a database. The database gives a rough
idea of how common a given DNA size, as measured
by a given probe, is. The commonness
of a given size of DNA fragment is called
a population frequency.
|
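The cutting and size-sorting steps above can be mimicked on a short text string. The following is a minimal sketch under stated assumptions: the recognition site AATT is the illustrative one used in the text (not necessarily the site of Hae III), while the toy sequence, the function name `digest` and the position of the cut within the site are invented for the example.

```python
from typing import List


def digest(dna: str, site: str, cut_offset: int) -> List[str]:
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(dna[start:cut])
            start = cut
        pos = dna.find(site, pos + 1)
    if start < len(dna):
        fragments.append(dna[start:])  # whatever is left after the last cut
    return fragments


toy_dna = "GGAATTCCCCAATTG"              # invented sequence with two AATT sites
fragments = digest(toy_dna, "AATT", 2)   # assumed: the cut falls in the middle of the site
by_size = sorted(fragments, key=len)     # the gel sorts fragments by size
print(fragments)
print([len(f) for f in by_size])
```

A real digest produces thousands of fragments rather than three, which is why a probe is needed to pick out the few fragments of interest.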
 |
| The
restriction enzyme cuts the DNA into thousands
of fragments of nearly all possible sizes.
The sample is then electrophoretically separated.
The DNA at this point is invisible in the
gel unless the DNA is stained with a dye.
A replica of the gel's DNA is made on something
called a blot (also called a Southern blot)
or membrane. The blot is then probed with (mixed
with) a special preparation of DNA that
recognizes a specific DNA sequence or locus.
Often, the probe is a radioactively labeled
DNA sequence (represented by the *-labeled object
in the figure above). Excess probe is washed
off the blot, then the blot is laid onto
X-ray film. Development reveals bands indicating
the sizes of the alleles for the locus within
each sample. The film is now called an "autorad." The
band sizes are measured by comparing them
with a "ladder" of known DNA sizes
that is run next to the sample. A match
may be declared if two samples have RFLP
band sizes that are all within 5% of one
another in size. |
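The 5% match window can be written as a simple comparison. This is a sketch only: the text does not say which band the 5% is measured against, so the code assumes it is relative to the larger of the two bands, and the band sizes (in bases) are invented for illustration.

```python
from typing import Sequence


def bands_match(size_a: float, size_b: float, tolerance: float = 0.05) -> bool:
    """True if two band sizes differ by no more than tolerance (assumed: relative to the larger band)."""
    return abs(size_a - size_b) <= tolerance * max(size_a, size_b)


def profiles_match(bands_a: Sequence[float], bands_b: Sequence[float],
                   tolerance: float = 0.05) -> bool:
    """True if both samples have the same number of bands and every pair falls within the window."""
    if len(bands_a) != len(bands_b):
        return False
    return all(bands_match(a, b, tolerance)
               for a, b in zip(sorted(bands_a), sorted(bands_b)))


evidence = [2500, 1000]   # invented band sizes
reference = [2450, 1040]  # invented band sizes
print(profiles_match(evidence, reference))
print(bands_match(1000, 1100))
```

Note how wide the window is: under this assumption, a 1000-base band and a 1040-base band count as the same size.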
| For
RFLP analysis to be reliable, all complex
steps of the analysis must be carefully
controlled. Databases must be large, meaning
they include many people, and they must be representative
of the potential test subjects. Because
of the complexities of populations, databases
must be interpreted with extreme care. For
example, DNA fragment sizes rare in one
population may be very common in other populations.
Further, sub-populations or populations
within populations must be considered. |
PCR
EXPLAINED IN EASY TERMS |
| PCR
is an abbreviation for "polymerase
chain reaction" (polymerase is pronounced POLL'-IM-ER-ACE).
This term applies to a wide variety of different
DNA tests that differ in reliability and
effectiveness. The reliability of each kind
of PCR test needs independent verification.
PCR itself doesn't accomplish DNA typing;
it only increases the amount of DNA available
for typing. |
| PCR
uses constant regions of DNA sequence to
prime the copying of variable regions of
DNA sequence. |
 |
PCR
typically uses two short pieces of known
DNA called primers (small arrows below). These
serve as starting points for the copying
of a region of DNA. |
 |
| Many
forensic laboratories use commercially supplied
DNA testing kits that contain key components
for certain PCR-based tests. PM plus DQA1™,
Profiler Plus™, Cofiler™ and Identifiler™ are all test kits commercially supplied
by PE Applied Biosystems. PowerPlex™ is
another test kit, with variations, supplied
by Promega. PowerPlex kits have published
primers, an advantage if the precise DNA
targeted is to be recorded for posterity
or studied for research. As of 2005, Profiler
Plus™, Cofiler™ and PowerPlex™ are
probably the most commonly used test kits
in US forensic laboratories. |
PCR
Contamination |
| It
is worth considering contamination early
in this discussion since this is a well-recognized
limitation. Unfortunately, the importance
of contamination in PCR is often underestimated.
PCR copies DNA efficiently if the initial
DNA is in good condition. A single DNA entity
(molecule) can become millions or billions
of DNA molecules in about three hours. The
PCR process is sometimes compared to a Xerox
machine since many copies are made. While
initially, this is a useful comparison,
it doesn't communicate the true, chain-reaction
nature of PCR. In PCR, the original DNA
is copied, then the copies are copied, those
copies are copied and so on. This results
in dramatic increases in the amount of DNA
that couldn't be easily accomplished in
the Xeroxing analogy. The PCR process deserves
its classification as a "chain-reaction" because
it has much in common with other chain reactions
such as avalanches. |
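The difference between a photocopier and a chain reaction is easy to see with a little arithmetic. The sketch below assumes perfect doubling in every cycle (real reactions are less efficient) and a run of up to 30 cycles; the cycle count is chosen for illustration, since the text gives only the approximate three-hour duration.

```python
def pcr_copies(cycles: int, starting_molecules: int = 1) -> int:
    """Idealized PCR: every molecule present is copied once per cycle."""
    return starting_molecules * 2 ** cycles


def photocopier_copies(rounds: int, originals: int = 1) -> int:
    """Photocopier: only the originals are copied in each round."""
    return originals * (1 + rounds)


for n in (10, 20, 30):
    print(n, pcr_copies(n), photocopier_copies(n))
```

Under these assumptions one starting molecule becomes about a thousand after 10 cycles, about a million after 20 and about a billion after 30, whereas the photocopier has produced only 31 sheets. The same arithmetic applies to a single contaminating molecule.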
| PCR
is also very similar to what happens when
a clinical infection occurs. Clinicians
have known for many years that a single
germ (bacterial cell or virus) contaminating
a wound can produce a massive infection.
Similarly, a DNA molecule can contaminate
(infect) a PCR and become a significant
problem. The ability of small amounts of
DNA to produce false and misleading results
is well-known and well-documented within
the research community, where the technology
originated. Anyone who has caught a cold
from an unknown source, or who has a pollen
allergy should have some sense of how easily
PCRs are contaminated. Actually, it is probably
easier to contaminate a PCR than to catch
a cold since unlike our bodies, PCRs lack
immune systems. The only protection PCRs
have is the technique of the analyst, use
of control samples to monitor contaminants
and careful interpretation. |
| Prevention
of false results involves the use of carefully
applied controls and techniques. As described
later, such controls and techniques can
rarely guarantee that contamination hasn't
influenced the results. In forensic DNA
testing, some of the scientifically worst-case
scenarios can be prevented by keeping DNA
samples from known individuals well out
of range of other items of evidence at all
stages. Most forensic DNA laboratories perform
negative controls, blank samples that will
often detect contaminants in the laboratory.
The blanks detect contaminants by showing
partial or full DNA profiles representing
the contaminants. Alternatively, the blank
may show no profile, which is consistent with, but
does not prove, the absence of contamination.
Unfortunately, a few forensic DNA laboratories
omit their controls. A few favor the controls
by using special equipment on them, or by
not carrying them through the entire procedure.
Such practices are hazardous, especially
when an important evidentiary sample has
a low amount of DNA, degraded DNA, or otherwise
presents as a minimal or partial (see below)
sample. In short, while PCR is a useful
research tool, all applications require
extreme care and vigilance. |
| STRs |
| STRs
will be presented in some detail because
they are important in current forensic
DNA testing. The abbreviation STR stands
for Short Tandem Repeat. STRs are the type
of DNA used in most of the currently popular
forensic DNA tests. STR is a generic term
that describes any short, repeating DNA
sequence. For example, the DNA sequence
ATATATATATAT is an STR that has a repeating
motif consisting of two bases, A and T.
It turns out that our DNA has a variety
of STRs scattered among DNA sequences that
encode cellular functions. For reasons that
are not entirely understood, people vary
from one another in the number of repeats
they have, at least for some STR loci. For
example, person #1 may have ATATAT at a
particular locus while person #2 may have
ATATATATATAT. Thus, STRs are often variable
(polymorphic) and these variations are used
to try to distinguish people. The term
STR doesn't necessarily imply PCR. PCR is
one of many methods that might be used to
help analyze STRs. STRs have also been analyzed
by DNA sequencing, for example. To understand
PCR-assisted STR typing, it is useful to
briefly consider how such PCRs are designed. |
Suppose
that laboratory data revealed the following
DNA sequence:
--ATGCTAGTATTTGGATAGATAGATAGATAGATAGATAGATAAAAAAATTTTTTTT--
The STR, in the middle of the sequence shown, consists of the sequence
GATA repeated 7 times. The dashes at the beginning
and end of the overall sequence shown indicate
that there is more sequence available both upstream
and downstream of the region shown. Remember,
DNA is very long and linear and we
are just going to look at a small region of it. |
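Counting the repeats in a stretch of sequence can be sketched in Python. This is an illustration, not a forensic method: the function name `longest_tandem_run` is an assumption, and the example sequence is rebuilt here from a left flank, seven copies of GATA and a right flank so that the repeat count is explicit.

```python
import re


def longest_tandem_run(dna: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif found in dna."""
    runs = re.findall(f"(?:{re.escape(motif)})+", dna)
    return max((len(run) // len(motif) for run in runs), default=0)


sequence = "ATGCTAGTATTTG" + "GATA" * 7 + "AAAAAATTTTTTTT"
print(longest_tandem_run(sequence, "GATA"))

# Two people who differ in the number of AT repeats at the same locus.
person_1 = "AT" * 3
person_2 = "AT" * 6
print(longest_tandem_run(person_1, "AT"), longest_tandem_run(person_2, "AT"))
```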
| Now,
let's say we want to design a PCR to examine
this same locus in other people. To design
the PCR, we need two primers, short synthetic
DNA molecules that recognize the region.
One primer might be ATGCTAGTA (at the left
end of the above sequence), a sequence that would
recognize the DNA flanking the left side
of the STR. The second primer might be
AAAAAAAATTTTTT. This is called the downstream
primer and it might be difficult to recognize
in the sequence. The reason it is difficult
to recognize at first is that it is the
complement of the sequence AAAAAATTTTTTTT
(at the right end of the longer sequence
above). See "General Considerations"
for a more detailed discussion. |
What
is the complement of a DNA sequence? This
might be more information than you would
like, but to really understand PCR primers,
try to walk through this:
The complement of
a DNA sequence (more precisely, its reverse complement) is the sequence written backwards
exchanging all A's for T's, all T's for
A's, all G's for C's and all C's for G's.
For example, the complement of the sequence,
AGTA is TACT. An easy way to get the complement
of a DNA sequence is to write another line
below the original sequence remembering
that A replaces T and G replaces C. Then
read the lower line backwards:
So, for the sequence:
GATCTTAGCTTTAAAGCCC
write the complementary line below it giving:
GATCTTAGCTTTAAAGCCC
CTAGAATCGAAATTTCGGG
Then, just read the lower line backwards (from
right to left) giving the complement:
GGGCTTTAAAGCTAAGATC
In practical terms, the upstream (left) primer
can be a direct reading of the target sequence
while the downstream (right) primer must be the
complement of the directly read sequence.
If the above is confusing, it may suffice to think
of the primers as two arrows that point at one
another with the STR located between them. This
is how the PCR targets the locus and the STR. |
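The two-step recipe above (write the complementary line, then read it backwards) translates directly into Python. The function names are chosen for this sketch; the two worked sequences are the ones used in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_line(dna: str) -> str:
    """Step 1: the line written below the original (A<->T, G<->C)."""
    return "".join(COMPLEMENT[base] for base in dna)


def reverse_complement(dna: str) -> str:
    """Step 2: read the complementary line backwards (right to left)."""
    return complementary_line(dna)[::-1]


print(complementary_line("GATCTTAGCTTTAAAGCCC"))  # CTAGAATCGAAATTTCGGG
print(reverse_complement("GATCTTAGCTTTAAAGCCC"))  # GGGCTTTAAAGCTAAGATC
print(reverse_complement("AGTA"))                 # TACT
```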
| In
practice, PCR primers are usually at least
17 bases in length. The point here is that
to use PCR to target an STR, the primers
recognize constant, conserved sequences
that flank the actual STR. This means that
the actual length of the target sequence
depends on where the primers are placed
in the flanking sequence. The
Promega and PE Applied Biosystems test
kits, for instance, use mostly different primers. For example,
the upstream primer could be designed to
recognize DNA 100 bases upstream of the
sequence shown. Similarly, the downstream
primer could be designed to recognize DNA
further downstream. Such placement of the
primers by design, further upstream and
downstream, would make all alleles (variations)
of the STR appear to be larger than if the
primers are placed by design close to the
STR itself. Wherever the primers are placed,
that defines the region we will examine.
That region will then vary among individuals
due to changes in the STR itself as explained
above for the simple STR based on the repeating
AT motif. |
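The effect of primer placement on apparent allele size is simple addition, as the following sketch shows. The flanks and the GATA motif are taken from the example sequence earlier in this section; the function name and the idea of expressing primer placement as "extra" bases are assumptions made for the illustration.

```python
LEFT_FLANK = "ATGCTAGTATTTG"     # constant sequence upstream of the STR
RIGHT_FLANK = "AAAAAATTTTTTTT"   # constant sequence downstream of the STR
MOTIF = "GATA"


def product_length(repeats: int, extra_upstream: int = 0, extra_downstream: int = 0) -> int:
    """Length of the PCR product for an allele with the given number of repeats."""
    core = len(LEFT_FLANK) + repeats * len(MOTIF) + len(RIGHT_FLANK)
    return core + extra_upstream + extra_downstream


# Primers placed close to the STR versus an upstream primer placed 100 bases further out.
for repeats in (7, 9):
    print(repeats, product_length(repeats), product_length(repeats, extra_upstream=100))
```

Moving a primer outward makes every allele look larger by the same amount, but the size difference between two alleles (here 8 bases for a difference of two repeats) stays the same.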
 |
| After
PCR is used to provide many copies of a
given person's STR, the products (copies)
are separated according to size on an electrophoretic
gel (see RFLP above for more details about
gels). The gel can be flat, as for RFLP,
or it can be in a round tube, called a capillary
with a detector at the end of it. Typical
flat gel STR results look like this: |
| The
black bars are called bands. Each band is
made up of many identical-size DNA molecules
that were produced by PCR. The gel separates
smaller bands (DNA molecules) from larger
ones. The bands near the lower end of the
gel are smaller (i.e., the DNA fragments are
shorter in length) than those near the top.
For example, looking at the reference ladder,
the first band near the lower end of the
gel is the smallest STR. For simplicity,
let's say this smallest band contains a
single repeat such as CATG, flanked by other
DNA that the primers actually recognize
in everyone's DNA. The next higher band
in the ladder would then contain 2 repeats,
CATGCATG; the next 3 repeats and so on.
By comparing the positions of bands in the
unknown samples with the reference ladder,
the allele sizes are deduced. In this example,
Sample A had bands at the 2-repeat position
and the 5-repeat position. Common terminology
would call this sample a 2,5 type. Sample
B would be called, 2,4. For a single person,
each locus normally has two alleles and
these can be different (heterozygous) or
the same (homozygous). |
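Reading a type off the gel amounts to matching each sample band to the nearest rung of the ladder. The sketch below is hypothetical throughout: the ladder sizes assume 40 bases of flanking DNA plus a 4-base repeat such as CATG, the tolerance is arbitrary, and a single band is reported as two copies of the same allele.

```python
from typing import Dict, Sequence, Tuple


def call_genotype(band_sizes: Sequence[float], ladder: Dict[int, float],
                  tolerance: float = 1.0) -> Tuple[int, ...]:
    """Match each band to the nearest ladder rung and return the repeat counts."""
    calls = []
    for size in sorted(band_sizes):
        repeats, rung = min(ladder.items(), key=lambda item: abs(item[1] - size))
        if abs(rung - size) > tolerance:
            raise ValueError(f"band of size {size} does not line up with any ladder rung")
        calls.append(repeats)
    if len(calls) == 1:
        calls = calls * 2  # assumed: one band means the same allele on both chromosomes
    return tuple(calls)


# Hypothetical ladder: 1 to 6 repeats, 40 flanking bases plus 4 bases per repeat.
ladder = {n: 40 + 4 * n for n in range(1, 7)}
print(call_genotype([60, 48], ladder))  # Sample A in the text: 2,5
print(call_genotype([48, 56], ladder))  # Sample B in the text: 2,4
```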
Multiplex
STR |
| One
of the more commonly encountered STR test
designs in forensic testing is called multiplex
STR. Multiplex PCR reagent kits are
sold by both PE Applied Biosystems and
Promega. Such systems combine three or
more different PCRs in one reaction that
targets distinct STR loci at the same time.
Three of the commonly used loci are called
CSF1PO, TPOX and TH01. Again, the names
of the loci have historical significance,
but are of little importance as names. |
| Profiler
Plus™ and Cofiler™ (PE Applied Biosystems)
combine 13 different STR loci. PowerPlex™
(Promega) uses the same 13 loci but the
primers used are different. The Promega
kit incorporates published primer sequences,
a significant scientific advantage, since
without the primer sequences, it is unclear
which STRs at some loci are targeted. A
newer typing kit, Identifiler™ (PE Applied
Biosystems), incorporates the original 13
loci and adds 2 more. By design
(meaning where the primers were placed on
the DNA by the designers), the loci in a multiplex
have different, non-overlapping size
ranges, so that DNA fragments from the different
loci can be told apart by size. Or, if the sizes overlap, they
are tagged with differing dyes to help distinguish
the 13 loci. These test systems have boldly
ambitious designs and should be considered
fairly experimental, especially for samples
whose quantity and/or quality is outside
tested limits. |
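The way size ranges and dye colors keep the loci apart can be sketched as a lookup. Everything in this example is hypothetical: the locus names, dye colors and size ranges are invented and do not describe any commercial kit.

```python
from typing import Dict, Optional

# Hypothetical three-locus panel. LOCUS_1 and LOCUS_3 overlap in size but carry different dyes.
PANEL: Dict[str, Dict] = {
    "LOCUS_1": {"dye": "blue", "min_size": 100, "max_size": 150},
    "LOCUS_2": {"dye": "blue", "min_size": 160, "max_size": 220},
    "LOCUS_3": {"dye": "green", "min_size": 100, "max_size": 150},
}


def assign_locus(dye: str, size: float, panel: Dict[str, Dict] = PANEL) -> Optional[str]:
    """Return the one locus whose dye and size range fit the fragment, else None."""
    matches = [name for name, spec in panel.items()
               if spec["dye"] == dye and spec["min_size"] <= size <= spec["max_size"]]
    return matches[0] if len(matches) == 1 else None


print(assign_locus("blue", 120))   # LOCUS_1
print(assign_locus("green", 120))  # LOCUS_3: same size, different dye
print(assign_locus("blue", 155))   # None: falls between the designed ranges
```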
There have been
some discrepancies in profiles obtained
with test kits from the two manufacturers
when the same samples were analyzed. These
discrepancies are not extremely common but
are noticeable and fairly dramatic when
they occur. Any base within DNA can mutate
(i.e., change). For example, an A base at
a particular position can change to a G.
Such mutations usually first appear in a
sperm or egg cell. Each mutation then appears
throughout the body of the person who results
from such a sperm or egg. Discrepancies
in test kit results are thought to be due
to mutations in the sites that the primers
bind. These events are called primer binding
site mutations or PBS mutations. |
| Multiplex
STRs are often combined with PCR for another
locus called, amelogenin (pronunciation
varies, but usually it is AM'-EEL-O-GEN-IN).
Amelogenin adds little to the discriminating
power of the test. Its purpose is to help
distinguish male and female sources of DNA
by detecting the X and Y chromosomes. The
amelogenin products have sizes that place
them outside the size ranges of the other
loci. |
| Compared
to the PCR-based systems originally introduced,
such as PM plus DQA1 (PE Applied Biosystems),
multiplex STRs are technically simpler
and more direct at the allele detection stage.
On the other hand, multiplex STRs are slightly
more vulnerable to missing alleles. There
are two reasons for this. 1) Larger DNA fragments
are degraded before smaller ones. This is
simply due to the fact that larger DNA molecules
are bigger targets for degradative enzymes
than smaller DNA molecules. 2) PCR itself
favors (will produce more of) smaller DNA
targets compared to larger ones that take
more time to copy. The copying is done by
an enzyme (a type of protein), which can finish
copying smaller DNA fragments more rapidly
than larger ones. |
| Both
of these factors result in a tendency for
small DNA fragments to be seen more readily
than larger ones. This is not an overwhelming
tendency but certainly should be considered
when amounts of input DNA are low, when
DNA degradation is suspected, and particularly
when a single small STR allele is weakly
observed at a given STR locus. |
| STRs
are prone to artifacts called "stutter
bands" or "shadow bands." These
are thought to be due to the DNA repeats
slipping out of register during the PCR
process. They are spurious PCR products
that are usually one repeat length smaller
than the main band. The main problem that
these pose is that it may be difficult or
impossible to determine whether low-intensity
bands are due to stutter or to the presence
of a mixture. Although the stutter bands
are predictably below (shorter than) the
main band, they often align
with common alleles. |
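A simple screen for possible stutter looks for a small band sitting exactly one repeat below a much stronger band. The sketch below is an assumption-laden illustration: the 4-base repeat length, the 15% intensity ratio and the band sizes and intensities are all invented, and the ratio is not a recommended threshold.

```python
from typing import List, Sequence, Tuple


def flag_possible_stutter(bands: Sequence[Tuple[int, float]], repeat_length: int = 4,
                          max_ratio: float = 0.15) -> List[int]:
    """Return sizes of bands one repeat below a stronger band and no more than max_ratio of its intensity."""
    intensity_at = dict(bands)
    flagged = []
    for size, intensity in bands:
        parent = intensity_at.get(size + repeat_length)
        if parent is not None and intensity <= max_ratio * parent:
            flagged.append(size)
    return flagged


# Invented (size, intensity) pairs for one locus.
bands = [(116, 90), (120, 1000), (132, 950)]
print(flag_possible_stutter(bands))
```

A flag is not a conclusion. A genuine allele from a minor contributor to a mixture could sit at exactly the same position with a similar intensity, which is the difficulty described above.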
 |
| Most
forensic laboratories are aware of stutter
artifacts and many take extremely careful
and appropriate countermeasures. However
stutter artifacts conceivably could play
a role if inappropriate attempts are made
to interpret minor components of a mixture. |
| Some
of the current STR
detection/typing
schemes use thin
tubes called capillaries, instead of flat
gels. When a capillary is used, the results
are often displayed as tracings on a graph,
instead of the image display shown above.
On such tracings, each main STR product
will appear as a large peak while stutter
bands appear as smaller peaks (to the left).
The tracings are called, electropherograms
(ELECTRO-FERO-GRAMS). The tracing data should
be accompanied by numeric
data that reveal the measured size of
each PCR product, the intensity (peak height)
and the estimated allele size. The numeric
data can be important in determining the
quality of the results. |
 |
| The
two figures above show some alternative
ways in which STR results/data are presented.
Basically the peaks represent tracings of
bands that have come off the end of a gel,
or may represent tracings of the gel itself,
depending on the equipment used. Larger
DNA fragments are on the right and smaller
ones on the left. There are recommended
standards, called thresholds for how high
or low the peaks may be. |
| All
technology has limitations. For multiplex
PCRs, the most serious limitations are in
the areas of samples that are minimal, degraded,
mixed, over-interpreted, contaminated or
even potential combinations of these. Some
current practices lack support by the established
literature. Over-interpretation can also
occur when there are partial profiles. [1]
The scientific system recognizes the human
tendency toward over-interpretation and
offers the countermeasures: independent
review, independent verification, scientific
controls and demonstrations of reproducibility.
These reviews and controls are considered
integral parts of the scientific process. |
| PCR-based
testing is potentially useful since it is
currently the only quick method of amplifying
really minuscule amounts of DNA. However,
it is important to recognize that PCR based
methods are exquisitely sensitive to contamination
and need to be interpreted with extreme
caution. Match probabilities generated with
some STR typing systems may involve extreme
numbers perhaps giving the impression of
an infallible result. Scientific rigor often
requires that extreme numbers be placed
in a context that considers all aspects
of testing including laboratory error rates
and technical limitations. |
| Partial
Profiles |
Use
of "partial profiles" is a newly
emerging and fairly disturbing trend. A
partial profile is one in which not all
of the loci targeted show up in the sample.
For example, if 13 loci were targeted, and
only 9 could be reported, that would be
termed, a partial profile. Failure of all
targeted loci to show up demonstrates a
serious deficiency in the sample. Normally,
all human cells (except red blood cells
and cells called "platelets")
have all 13 loci.
Therefore, a partial profile represents
the equivalent of less than a single human
cell. This presents some important problems:
- A partial profile essentially proves
that one is operating outside of well-characterized
and recommended limits.
- Contaminating DNA usually presents
as a partial profile, although not always.
For this reason, the risk that the result
is a contaminant is greater than for samples
that present as full profiles.
- A partial profile is at risk of being
incomplete and misleading. The partial
nature of it proves that DNA molecules
have been missed. There is no way of firmly
determining what the complete profile
would have been, except by seeking other
samples that may present a full profile.
|
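Whether a result is a full or partial profile can be checked by comparing the loci targeted with the loci actually reported. This is a minimal sketch; the locus names are placeholders and the function name is an assumption made for the example.

```python
from typing import Iterable, List, Tuple


def profile_status(targeted: Iterable[str], reported: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ("full" or "partial", list of targeted loci that gave no result)."""
    reported_set = set(reported)
    missing = [locus for locus in targeted if locus not in reported_set]
    return ("partial" if missing else "full", missing)


# The example from the text: 13 loci targeted, only 9 reported.
targeted = [f"LOCUS_{i:02d}" for i in range(1, 14)]  # placeholder names
reported = targeted[:9]
status, missing = profile_status(targeted, reported)
print(status, missing)
```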
| Most
forensic laboratories will try to obtain
full profiles. Unfortunately, in an important
case, it may be tempting to use a partial
profile, especially if that is all that
one has. However, such profiles should be
viewed skeptically. Over-interpretation
of partial profiles can probably lead to
serious mistakes. Such mistakes could include
false inclusions and false exclusions, alike.
It could be said that, compared to the first
PCR-based tests introduced into the courts,
use of partial profiles represents a decline
in standards. This is because those earlier
tests, while less discriminating, had controls
(known as "control dots") that
helped prevent the use of partial profiles.
The earlier tests will be discussed below,
primarily for historic reasons, but also
because they do still appear on occasion. |
DQA1
(also known as DQ alpha) |
The
PM plus DQA1TM (PE Applied Biosystems) typing
kit targets six genetic loci. All six are
copied in the initial PCR. The products
from this reaction are then placed onto
two separate typing strips. One strip is
for DQ alpha and the other types the remaining
five loci.
There are several
steps in a DQ alpha PCR test:
- DNA from 50 or more cells is extracted.
Notice that this test requires fewer cells
than the RFLP test. Sensitivity (the number
of cells needed) is the main advantage
of PCR tests. However, the increased sensitivity
also makes PCR tests more vulnerable to
trace contaminants, in other words, DNA from unexpected
sources.
- The DNA from the sample is copied over
and over resulting in amplification of
the original target sequence. The copying
or amplification is accomplished in a
machine specially designed for this purpose.
This machine is called a thermal cycler.
- The amplified DNA is now treated with
a variety of probes that are bound to
a blot (see RFLP). Note: In RFLP, the target
DNA is bound to the blot and the probe
DNA is added. For the DQ alpha dot blot,
the probe DNAs are bound to a small blot
strip and the target DNA is added.
|
 |
| From
the pattern of probes that the amplified
DNA binds to, a potential DNA type, also
called a genotype, can be inferred. |
| DQ
alpha typing strips look like this before
any types are obtained: |
 |
| The
invisible dot to the right of the number
1, has a DNA probe for the 1-allele (variation)
for DQ alpha. The invisible dot to the right
of the 2 has a DNA probe for the 2-allele
and so on. The 1-allele itself has variations,
the 1.1,1.2 and 1.3 subtypes, also called
alleles. Notice that the typing strip has
no specific dot or probe for the 1.2 subtype.
Also, the typing strip can't distinguish
between the 4.2 and 4.3 subtypes and there
is a single dot for these. It is quite possible
that there exist DQ alpha alleles that would
be undetected by the typing strip and alleles
that may be further subtypes of the alleles
that the strip does detect. |
| Here
are some examples of how the strips are
read: |
 |
 |
| This
last example brings up an important issue
with DQ alpha typing. The 1.2 allele is
actually the second most common allele in
most populations. This means there will
be frequent situations where the 1.2 allele
may be present but undetected as in the
last example. An obvious question is: Why
not just have a specific probe for the 1.2
allele? The answer is that the typing strip
already maximizes the probing of a relatively
short stretch of DNA. That is, the DQ alpha
locus itself is only about 240 base pairs
long. The multiple probe typing strip was
probably about the best that could be done
in terms of detecting multiple alleles of
this small locus in a single typing step. |
| Historically,
DQ alpha was often the first PCR-based test
that forensic labs used. Actually, the DQ
alpha system is quite different from the
majority of PCR applications in the scientific
community. This will be explained in more
detail below. |
 |
 |
Polymarker
(PM) |
| The
PM portion of the PM plus DQA1™ kit involves
5 genetic loci in addition to DQ alpha.
(The manufacturer/distributor's name has
changed from Roche Molecular to PE Applied
Biosystems.) These additional loci are named
for historical reasons. The 5 loci are LDLR,
GYPA, HBGG, D7S8 and GC. Each of these represents
a distinct location or locus in the DNA.
The 5 non-DQ alpha loci have rather simple
allelic variations compared to DQ alpha.
For example, there are only two LDLR alleles
detected by the system, allele A and allele
B. The same is true for GYPA and D7S8, which
each have A and B alleles that can be detected.
The loci HBGG and GC each have A, B and
C alleles, three variations each in other
words. Thus, reading PM typing strips is
fairly simple at least on the surface. Here
are some examples: |
 |
| The
manufacturer recommends a lower limit of
input DNA for PM plus DQA1 typing. The reason
for this lower limit (2 nanograms, ng) is
the possibility of missing alleles if the
input DNA is too low. Missing alleles (related
terms are "allelic dropout" and
"differential amplification") are certainly
possible in the author's opinion, particularly
with low amounts of DNA or degraded DNA.
The phenomenon of failing to detect all
alleles present was discussed in the context
of DQ alpha in the original User's Guide
although the conditions under which this
may occur were not precisely defined with
respect to amount of input DNA and the condition
of input DNA. The potential for missing
an allele of course increases if the control
or S dot is absent or extremely weak, remembering
that the C and S dots test whether a threshold
has been reached at the PCR and later stages.
The following example illustrates how a
DNA profile of one person might change to
that of another due to failure to detect
an allele. |
 |
| Failure
to detect alleles under certain circumstances
is a theoretical possibility and was actually
demonstrated for DQ alpha in the original
User's Guide. The theory that addresses
this is called the "stochastic effect." In
addition to the stochastic effect, a PCR
phenomenon called "differential amplification" may
play a role when input DNA amounts are low,
when input DNA is extensively degraded and
possibly at other times. |
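The stochastic effect can be illustrated with a simple sampling model. The sketch below is an illustration only, not the manufacturer's validation model. It assumes a person with two different alleles (A and B) at a locus, assumes that each template molecule reaching the PCR is equally likely to carry either allele, and ignores degradation and differential amplification. The function name and the copy numbers are hypothetical.

```python
def dropout_probability(template_copies: int) -> float:
    """Chance that at least one of two alleles is entirely absent from the sample.

    Simplified model (an assumption, not a validated figure): each of the
    `template_copies` molecules independently carries allele A or allele B
    with probability 1/2.
    """
    if template_copies < 1:
        return 1.0  # no template at all: nothing can be detected
    # P(all copies are A) + P(all copies are B) = 2 * (1/2)^n
    return 2 * 0.5 ** template_copies


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        chance = dropout_probability(n)
        print(f"{n:>2} template copies -> chance of a missing allele: {chance:.6f}")
```

Under these assumptions the chance of losing an allele falls quickly as the number of template molecules rises, which is consistent with a lower limit on input DNA being recommended.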
| PM
plus DQA1 is frequently used on mixed DNA
samples from two or more people. The following
example illustrates some of the ambiguity
that can arise if interpretations are not
cautious: |
 |
| In
the example above, since two of the loci
(HBGG and GC) show three alleles, the sample
was a mixture of at least two people. The
problem here is that any two people can
be included as contributing to the mixture.
The typing strip is saturated, meaning every
dot that can be showing is showing. A poorly
recognized limitation of the PM strip is
that it is very easily saturated. For example,
two people of types AB/AA/AB/BB/BC (person
1) and AB/BB/AC/AA/AA (person 2) could,
when their DNAs are mixed produce the pattern
in the example. In fact, there are almost
limitless combinations of 2 types that could
produce the pattern. There are also many
combinations of two people that would lead
to a typing strip lacking one or two dots.
Finally, there are many mixtures that may
mimic a single source of DNA. For example: |
 |
| The
profile in this example could have come
from a single person whose profile was,
AB/AB/AC/AB/AB. Alternatively, two people
of types AA/AB/AC/AA/BB and BB/AB/AA/BB/AA
if mixed, could produce the profile. There
are many other possible combinations of
people who, when their DNAs are mixed, could
produce the profile. In fact, the only individuals
excluded are those possessing the HBGG,
B allele and the GC, C allele assuming that
the typing strip is reliably detecting all
the alleles present. Extreme caution should
be used when there is a possibility of a
DNA mixture. It is arguable whether the
system should be relied upon when there
is an unresolved mixture. The ease of saturation
may lead to false inclusions. False exclusions
are also possible when the amount of input
DNA is low, input DNA is degraded or the
S dot is weak or absent. |
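The mixture examples above can be checked with a short script. This is a minimal sketch that assumes the profiles are written in the order LDLR/GYPA/HBGG/D7S8/GC, that the strip shows a dot for every allele present in any contributor, and that no alleles are missed. The function names are hypothetical and are not part of any kit software.

```python
LOCI = ("LDLR", "GYPA", "HBGG", "D7S8", "GC")
# Alleles the strip can show at each locus, as described in the text.
DETECTABLE = {"LDLR": "AB", "GYPA": "AB", "HBGG": "ABC", "D7S8": "AB", "GC": "ABC"}


def parse_profile(text):
    """Turn 'AB/AA/AB/BB/BC' into {'LDLR': {'A', 'B'}, 'GYPA': {'A'}, ...}."""
    parts = text.split("/")
    if len(parts) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} loci, got {len(parts)}")
    return {locus: set(alleles) for locus, alleles in zip(LOCI, parts)}


def mix(*profiles):
    """Dots expected from a mixture: the union of alleles at every locus."""
    combined = {locus: set() for locus in LOCI}
    for text in profiles:
        for locus, alleles in parse_profile(text).items():
            combined[locus] |= alleles
    return combined


def is_saturated(pattern):
    """True when every dot that can show is showing."""
    return all(pattern[locus] == set(DETECTABLE[locus]) for locus in LOCI)


def is_included(person, pattern):
    """A person is not excluded if all of their alleles appear in the pattern."""
    return all(alleles <= pattern[locus]
               for locus, alleles in parse_profile(person).items())


if __name__ == "__main__":
    first = mix("AB/AA/AB/BB/BC", "AB/BB/AC/AA/AA")
    print("first mixture saturated:", is_saturated(first))

    second = mix("AA/AB/AC/AA/BB", "BB/AB/AA/BB/AA")
    single = parse_profile("AB/AB/AC/AB/AB")
    print("second mixture mimics a single source:", second == single)
    print("person with HBGG B excluded:", not is_included("AA/AA/AB/AA/AA", second))
```

Because a saturated pattern contains every detectable allele, `is_included` returns true for any person at all, which is the false-inclusion risk described above.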
GENERAL
CONSIDERATIONS |
| Native
or natural DNA usually has two complementary
strands. The G residues on one strand bind
C residues on the complementary strand and
A residues bind T's. |
 |
 |
| Notice
that the G-C pairs are depicted with three
lines, or bonds between them while A-T base
pairs have only 2 bonds. This property of
the DNA has been recognized since 1953.
The bonds between G-C base pairs and A-T
base pairs are called hydrogen bonds. It
is well known that the G-C base pairs are
stronger than A-T pairings because of the
extra hydrogen bond for G-C base pairs.
This means that the stability of the DNA
can be predicted based on the % G+C content.
For example, the sequence shown above has
12 G-C base pairs and a total of 25 base
pairs, for a G+C content of 48% (roughly
50%). The two strands of this sequence are
held together more tightly than a similar
length sequence with a 40% G+C content,
for example. Such
considerations are fairly important for
DNA testing since any use of PCR or probe
hybridization involves the disruption and
reformation of the two strands. |
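The G+C calculation is easy to reproduce. The sketch below computes the G+C fraction of one strand and a rough melting temperature using the Wallace rule of thumb (2 °C for each A or T, 4 °C for each G or C), which is generally applied only to short probes or primers of roughly 14 to 20 bases. The sequences are hypothetical stand-ins, not the sequence in the illustration and not the probes of any kit, and the rule is a swapped-in approximation rather than the formula any particular laboratory uses.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of bases in one strand that are G or C."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(sequence: str) -> int:
    """Rough melting temperature in degrees C for a short probe or primer.

    Wallace rule of thumb: 2 degrees per A or T, 4 degrees per G or C.
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical 25-base strand with 12 G or C bases.
    example = "ATGCGTACGTTAGCCATGCAATCGA"
    print(f"G+C content: {gc_fraction(example):.0%}")

    # Two hypothetical 18-base probes, one A+T rich and one G+C rich.
    for probe in ("ATTAGATTCATAGTTAAT", "GCCGTAGCGGCATGCCGA"):
        print(probe, f"G+C {gc_fraction(probe):.0%}", f"approx. Tm {wallace_tm(probe)} C")
```

Probes of the same length but different G+C content come out with different estimated melting temperatures, which is why a single hybridization temperature cannot be ideal for a set of probes that differ widely in composition.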
| For
example, each cycle of PCR involves heating
the DNA to separate the strands followed
by cooling to the appropriate temperature
to allow the primer DNAs to bind accurately
to their complementary sequences. This process
is also important in hybridizations involving
dot strips or Southern blots where the single-stranded
probes must bind accurately to their complementary
target DNA sequences. |
| If
the temperature of the cooling step is too
warm (warmer than optimal) the probe may
not bind to its target sequence. If the
temperature of the cooling step is too cool,
the probe may bind incorrect targets as
well as correct ones. The latter effect
is called cross-hybridization and has been
documented for some of the probes of PM
plus DQA1. Incorrect binding can also happen
to the primers of STR based PCR tests if
conditions are improper. |
| With
regard to accuracy of hybridization, the
binding of a DNA probe to its complementary
DNA target for example, there may be an
important difference between the common
research use of probes and primers and systems
like PM plus DQA1 and even multiplex STRs.
The common use is to target a single sequence
in each PCR using two primers to flank the
sequence. This is followed by some form
of analysis of the sole PCR product. Analysis
may involve a Southern blot to size and
probe the PCR product, or DNA sequencing
of the product to determine the precise
sequence of bases. |
| In
contrast, multiplex PCRs begin with the
simultaneous binding of many different primers
(two for each of the loci). If 14 loci are
targeted, there are at least 28 different
primers involved. For Polymarker, this is
followed by simultaneous probings of the
PCR products. The PM typing strip alone
has 14 different probes (one of the 13 dots
has two probes) while the DQA1 strip has
11 probes. Thus these systems are far more
complex than usual applications of PCR.
The complexity was added to speed the analysis.
All of the loci of PM plus DQA1 could be
analyzed one at a time. |
| One
would think that the sequences of the probes
for PM plus DQA1 would have been chosen
to all have roughly the same G + C content
so that they could all be used at the same
temperature with the same relative accuracy
of each probe. However, sequence inspection
reveals that these in fact were not designed
that way. Based on empirically tested formulas
for predicting best temperatures of probe
binding, the S and C dot probes in particular
appear to be as much as 20 °C away from their
temperature optima. It is possible that
this observation may account for some of
the known artifacts that have been observed.
There is some evidence that PM plus DQA1
can function consistently when provided
with relatively undegraded, unmixed DNA
samples available in ample amounts. However,
there is evidence that this system can be
fooled by aged or degraded DNA, mixtures
and low input amounts of DNA. For multiplex
systems with unpublished primers, it is
difficult for the scientific community to
evaluate the general, thermal equality of
the primers. |
| Multiplex
systems have the limitations of any PCR
system in terms of the influence of contamination.
Stray DNA molecules can contribute alleles
or complete DNA profiles. PCR is a replication
process similar to the replication of an
infectious agent. Contamination of a PCR
can occur as easily as the spread of the
common cold virus. In fact, it may be easier
to contaminate a PCR than it is to catch
a cold since PCRs have no immune system
to ward off the contaminating DNA. |
| PCR
is potentially useful since it is the only
method of amplifying really minuscule amounts
of DNA. However, it is important to recognize
that PCR methods are sometimes problematic,
exquisitely sensitive to contamination and
need to be interpreted with extreme caution. |
| Analysis
of Separated Sperm and Non-Sperm Fractions. |
| In
order to perform DNA typing on sperm DNA,
it is desirable to separate the sperm DNA
from any other DNA that may be present.
For example, in swabbed materials from a
rape evidence kit, the swabs may contain
non-sperm cells from the victim as well
as sperm and non-sperm cells from the rapist.
To accomplish separation of the sperm cells,
a process known as differential extraction
is often performed. This involves lysing
(breaking open) the non-sperm cells followed
by spinning (centrifugation) the mixture
to remove the still unbroken sperm cells.
To do this, chemicals are added to the
original mixture of sperm and non-sperm
cells: usually an enzyme called proteinase K
(pronounced PROTEIN-ACE-K), which breaks
down most proteins, and a mild detergent,
which breaks down cellular membranes.
The enzyme and the mild detergent
can lyse most cell types but not sperm because
the sperm cell membranes have cross-linking
chemical bonds called disulfides (pronounced
DI-SUL-FIDES). Actually, the illustration
below is slightly incorrect because the
proteinase K does remove most of the sperm
tails. These were left in the illustration
to assist in following what happens to the
sperm. |
 |
| When
the treated mixture is spun in a centrifuge,
the sperm are forced to the bottom of the
tube because they are dense. On the other
hand, the broken, non-sperm cells are not
very dense so they stay higher in the tube.
This higher portion is called the supernatant
and after the first spin in the centrifuge,
the supernatant can be removed. The supernatant
is referred to in various ways but usually
it is called the non-sperm or the E1 fraction.
The pellet (the material at the bottom of
the tube) is called the sperm or E2 fraction.
Usually, the pellet will be re-suspended
in fresh liquid and re-spun to help purify
it away from non-sperm DNA. |
| Finally,
the sperm fraction is lysed (the sperm cells
broken open) by adding a chemical called
DTT. The DTT breaks the disulfide bonds
releasing the sperm DNA. |
| The
description of this procedure so far is
quite ideal. It works pretty much as described
for fresh samples. Even with fresh samples
however, some of the non-sperm DNA will
be trapped in the sperm pellet. This can
be a major problem if the amount of sperm
is very low or if the samples are aged and
degraded. Often male cells, most likely
immature sperm or white cells, may end up
in the supernatant, variously called the “female” fraction
or “non-sperm” fraction. |
Y
chromosome STR testing. |
| Another
way of getting information on a male contributor
is to use PCR to target STRs on the Y chromosome.
Since females have two X chromosomes, instead
of an X and a Y, the male DNA can sometimes
be distinguished even if there is more female
than male DNA present. Such Y-chromosome
STR tests are in use, but they tend to be
used only after the other tests have failed
to give clear results. There are several
reasons: First, the Y-chromosome is a small
chromosome with no pairing partner. Pairing
of the other chromosomes promotes exchange
of DNA, effectively a scrambling event,
known as recombination. An effect of this
is that various loci, when far enough apart,
can become independent markers (they are
said to be in linkage equilibrium). This means
having allele type 1 at locus A on the chromosome
doesn't imply an increased or decreased
probability of having a particular allele
at locus B on the same chromosome. This
is idealized and requires the loci to be far
apart on the chromosome, and also that functional
products in the two regions don't interact.
The Y chromosome, on the other hand, only
recombines (with the X chromosome) at a
small region of the short end of the chromosome.
As far as we know, it doesn't recombine
in other regions of its length. One probable
result of this is that loci on the Y-chromosome
may be more dependent on one another than
loci on other chromosomes. Y-chromosome
STR testing is an active area of research,
since despite the limitations, there may
eventually prove to be some advantages. |
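The idea of independent versus dependent loci can be expressed numerically. If two loci are independent, the frequency of finding a particular pair of alleles together equals the product of the two individual allele frequencies. The difference between the observed and the expected frequency is often written D, and D is zero at equilibrium. The sketch below uses made-up data sets, and the function name is hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return D = freq(a and b together) - freq(a) * freq(b).

    `haplotypes` is a list of (allele at locus A, allele at locus B) pairs,
    one pair per chromosome sampled.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    freq_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    freq_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    freq_ab = sum(1 for a, b in haplotypes if (a, b) == (allele_a, allele_b)) / n
    return freq_ab - freq_a * freq_b


if __name__ == "__main__":
    # Made-up data: alleles at the two loci occur in every combination equally.
    shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)] * 25
    # Made-up data: no recombination, so allele 1 at A always travels with 1 at B.
    locked = [(1, 1)] * 50 + [(2, 2)] * 50

    print("D with free recombination:", linkage_disequilibrium(shuffled, 1, 1))
    print("D without recombination:  ", linkage_disequilibrium(locked, 1, 1))
```

When D is far from zero, as in the second data set, multiplying allele frequencies across loci would misstate how common the combined type is, so dependent loci have to be treated together as a single unit.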
Understanding
PCR Contamination |
| Early
in the history of PCR, its pioneers recommended
certain techniques and practices for preventing
and recognizing contamination. A parallel
with sterile technique in medical clinics
is often drawn. By definition, sterile means
the absence of all living organisms, including
bacteria and viruses. For example, sterile
technique is used when working in the vicinity
of an open wound. PCR technique is similar
to sterile technique and even borrows many
basic concepts from it. This includes the
use of sterile instruments and pipettes
that may contact the samples under analysis.
Similar sterile techniques are used by scientists
who grow cells in culture dishes which are
easily contaminated. |
| PCR
technique differs from sterile technique
in that a clinically sterile solution or
instrument may still harbor DNA. DNA usually
survives heat sterilization used to make
clinical solutions and instruments sterile.
Presence of a single bacterium or virus
would violate sterility. Doctors and nurses
think in terms of a sterile "field",
an area where everything present is sterile
and meticulous effort is expended to maintain
that condition of sterility. Once a non-sterile
object, or even one whose sterility can
be questioned, enters the area, the field
is no longer considered sterile. Sterile
technique training involves the development
of a mental image of the sterile field and
how to protect it. Finally, one does not
assume success, just because the mental
picture seems un-breached. Post-sterile
technique practices include monitoring patients
for fever and other signs of infection and
giving antibiotics in advance, actions that
basically assume that technique may well
have failed. As rigorous as sterile technique
concepts are, PCR technique involves the
same concepts and more since a properly
sterilized item of equipment, or a sterilized
solution, may contain DNA that would potentially
influence a PCR. For example, large pressure
cookers called autoclaves are effectively
used to sterilize instruments and some solutions
by heating to temperatures (slightly higher
than the temperature of boiling water) that
most infectious organisms can't survive.
However, such temperatures are insufficient
to destroy contaminating DNA. Thus, autoclaves,
while they achieve the condition of clinical
sterility by getting rid of all bacteria,
are not infallible for PCR. In short, PCR
technique needs to go beyond sterile technique.
Disposable instruments and pipettes and
proper design of PCR laboratories are helpful
considerations in this regard. |
| Good
PCR technique is no guarantee that contamination
didn't influence the results. Steps must
be taken to try and detect contamination.
Negative controls are blank PCRs that have
all the components of the evidentiary PCRs
but have no other DNA added intentionally.
Fortunately, there are often two negative
controls used, one when the DNA is extracted,
and another when the PCR is set up. Any
PCR signal in the negative control would
warn that contamination has occurred. Unfortunately,
the negative controls are virtually the
only warning of PCR contamination. Negative
controls may alert the analyst to general
contamination occurring within the lab or
the lab reagents. These controls don't offer
protection against contamination occurring
before the samples arrived at the PCR lab.
Negative controls also can't rule out contamination
of individual samples. The individual samples
lack individual signs of contamination if
it occurs. Unlike a human patient, a PCR
is incapable of showing signs of infection
(contamination) such as fever or undue pain.
PCRs also have no immune system to ward
off contaminants. |
| It
is often said that the most critical source
of PCR contamination is DNA from previous
PCRs. Again, a PCR produces many DNA copies
of the target DNA sequences. Due to sheer
numbers, these copies (called amplicons)
are a hazard for future PCRs. In terms of
DNA typing, stray amplicons could contribute
single or multiple alleles to a genetic
profile. This would manifest itself in the
form of producing, for example, an extra
dot on a DQA1 or PM typing strip or an extra
band in an STR profile. The fact that the
contaminating dot or band is in fact extra
may or may not reveal itself. Thus, amplicons
can lead to mistyping. |
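The sheer number of amplicons follows from the arithmetic of repeated copying. In an idealized PCR, every cycle doubles the number of target copies. The sketch below assumes that idealized model, with an optional per-cycle efficiency below 100%. The cycle count and starting amounts are hypothetical, and real reactions level off in later cycles, so the figures are upper bounds rather than measurements.

```python
def copies_after_pcr(starting_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after PCR: N = N0 * (1 + efficiency) ** cycles.

    An efficiency of 1.0 means perfect doubling in every cycle.
    """
    if starting_copies < 0 or cycles < 0:
        raise ValueError("starting_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical run: 30 cycles starting from a single target molecule.
    print(f"perfect doubling: {copies_after_pcr(1, 30):,.0f} copies")
    print(f"80% efficiency:   {copies_after_pcr(1, 30, 0.8):,.0f} copies")
```

Even a tiny droplet of a finished reaction can therefore carry far more copies of the target than an entire fresh evidence sample contains.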
| However,
a more dangerous source of contamination
is what is called genomic DNA. This is DNA
that hasn't yet been amplified. Genomic
DNA doesn't have the high concentration
of the target DNA copies but is a hazard
because genomic DNA could produce an entirely
false DNA profile. Full profile contaminants
have been documented on multiple occasions
and in multiple laboratories. Partial profile
contaminants are more common and sometimes
constitute a poorly recognized risk in using
partial profiles in evidentiary samples
as evidence. When contamination occurs there
is rarely any way to confirm how it happened. |
| For
example, suppose evidence item #1 has little
to no DNA or has DNA degraded beyond the
ability to function in a PCR. Suppose further
that item #2 is a defendant's reference blood
stain that would typically have a high concentration
of undegraded genomic DNA from the defendant.
If item #2 comes in close proximity with
item #1, or comes in contact with item #1,
the genomic DNA from item #2 may contaminate
item #1. Subsequent DNA typing of contaminated
item #1 will give the false impression that
the defendant contributed DNA to item #1
during a crime. Similarly, when there are
multiple items of evidence with some having
larger amounts of DNA and some much lower,
cross-contamination is an important consideration. |
| This
is not to say that all PCR-based results
are due to cross-contamination. However,
the ease of cross-contamination and its
potentially misleading effects may sometimes
be under-appreciated, especially in the
context of match probabilities reported
to be extremely rare. |
| Dealing
with contamination. |
| PCR-based
technology has an interesting history. In
PCR's history, contamination has often led
to false results, and erroneous actions
have been taken based on those results.
This has led some investigators to discontinue
and denounce the technology as being "too
sensitive." In research, PCR-based
results face routine, often vigorous scrutiny
for the possibility that contamination may
have influenced the results. |
| But,
PCR also has an unparalleled advantage of
powerfully increasing the amount of DNA
from small samples. This can be a great
advantage in both research and forensics.
For that reason, many investigators use
PCR. |
Fortunately,
there are ways of dealing with contamination,
or at least limiting its influence:
- It is extremely important to run negative
controls and background controls through
the entire procedure. Such controls are
virtually the only way of detecting low-level
contaminating DNA molecules.
- Once contamination has been detected,
it is important to discard all current
reagents and clean relevant equipment
and work surfaces. Bleach is useful for
cleaning. However, not all equipment can
be cleaned with bleach. Some laboratories
effectively use gas flames to rid metal
utensils of DNA.
- Thermal cyclers (where PCR is carried
out) need to be cleaned. It is not unusual
for sample tubes to leak DNA in the thermal
cycler. Such tubes become soft during
temperature extremes and they do not always
seal properly. It is not unusual for sample
tubes to have minuscule pin-holes. Sample
contamination due to contaminated thermal
cyclers has been documented. Hot soapy
water, a sponge and a round scrub brush
are useful for cleaning thermal cyclers
and their sample-tube wells.
- Of course the contamination event should
be discussed. However, discussion alone
is rarely, if ever, sufficient since it
may lead to rationalization of the event
and failure to correct it.
- It is critically important to store
samples in proper containers and keep
known samples well-segregated from other
evidence, particularly evidentiary samples
that have small amounts of DNA. Paper
envelopes or wax-paper folds are unsuitable
containers.
- The laboratory should be extremely
careful not to overstate the scientific
value of the evidence. For example, reports
that a profile occurs in 1 in a billion
randomly selected individuals greatly
overstate the reliability that the proven
error rate of the technology supports, since
false convictions based on DNA evidence
have been established.
Perhaps such rare match probabilities
could be reached if thoroughly independent
samples produced the same results in multiple,
independent, non-communicating laboratories.
But, for single laboratories, extremely
rare match probabilities misrepresent
the scientific value of the technology.
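The last point can be made concrete with a little arithmetic. The sketch below assumes a simple model in which a person who did not contribute the DNA can be reported as a match in two independent ways: by coincidentally sharing the profile, or through a laboratory error such as cross-contamination or a sample mix-up. The error rate used is purely hypothetical and is not a measured figure for any laboratory.

```python
def false_match_chance(random_match_probability: float, lab_error_rate: float) -> float:
    """Chance that a non-contributor is reported as matching.

    Assumes a coincidental match and a laboratory error are independent
    events, either of which produces a reported match.
    """
    for value in (random_match_probability, lab_error_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    # P(A or B) for independent events A and B.
    return (random_match_probability + lab_error_rate
            - random_match_probability * lab_error_rate)


if __name__ == "__main__":
    rmp = 1e-9          # "1 in a billion" reported match probability
    error_rate = 1e-3   # hypothetical: one erroneous result per thousand cases
    print(f"chance of a false reported match: {false_match_chance(rmp, error_rate):.9f}")
```

Under these assumptions the overall chance of a false reported match is governed almost entirely by the error rate, and the one-in-a-billion figure contributes next to nothing to it.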
|
Some
laboratories prefer to trace the contaminants.
But, many find it is more time-efficient
to perform a general cleaning and reagent
replacement. The latter makes sense because
contaminant sources often vary, and time
spent tracing the contaminant can be easily
wasted. It is important for laboratories
to have procedures that effectively detect,
acknowledge and deal with contamination.
|
| |