Goals
The goals of the original Human Genome Project (HGP) were not only to determine
the sequence of the more than 3 billion base pairs in the human genome with a
minimal error rate, but also to identify all the genes in this vast amount of
data. This part of the project is still ongoing, although a preliminary count
indicates about 30,000 genes in the human genome, fewer than many scientists had
predicted.
Another goal of the HGP was to develop faster, more efficient methods for DNA
sequencing and sequence analysis, and to transfer these technologies to
industry.
The human DNA sequence is stored in databases freely available to anyone on the
Internet. The U.S. National Center for Biotechnology Information (NCBI),
together with sister organizations in Europe and Japan, houses the genome
sequence in a database known as GenBank, along with sequences of known and
hypothetical genes and proteins. Other resources, such as those of the
University of California, Santa Cruz[1], and Ensembl[2], present additional data
and annotation, along with powerful tools for visualizing and searching them.
Computer programs have been developed to analyze the data, because the data
themselves are difficult to interpret without such programs.
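Sequence records from databases such as GenBank can be downloaded as plain-text FASTA files, in which each record starts with a `>` header line followed by one or more lines of sequence. As a rough illustration of the simplest kind of program involved, the sketch below reads FASTA text into a dictionary keyed by header. The function name `parse_fasta` and the sample record are hypothetical; for real work an established library such as Biopython's `Bio.SeqIO` module is the safer choice.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Hypothetical helper for illustration. Lines before the first '>' header
    are ignored, and a repeated header overwrites the earlier record.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record input; the first sequence is wrapped over two lines.
example = ">seq1 example\nACGT\nAC\n>seq2\nGGTT\n"
print(parse_fasta(example))  # {'seq1 example': 'ACGTAC', 'seq2': 'GGTT'}
```

Joining the wrapped lines with `"".join` avoids repeated string concatenation, which matters for chromosome-sized records.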
The process of identifying the boundaries between genes and other features in
raw DNA sequence is called genome annotation and falls within the domain of
bioinformatics. Although expert biologists make the best annotators, their work
proceeds slowly, and computer programs are increasingly used to meet the
high-throughput demands of genome sequencing projects. The best current
annotation technologies use statistical models that exploit parallels between
DNA sequences and human language, drawing on computer science concepts such as
formal grammars.
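One simple statistical model of this kind is a hidden Markov model (HMM), which corresponds to a stochastic regular grammar. The toy sketch below labels each base of a sequence as non-coding (`N`) or coding (`C`) using the Viterbi algorithm, which finds the single most probable state path. The two states and every probability are hypothetical values chosen for illustration (it simply assumes "coding" regions are GC-rich), not parameters from any real gene finder; practical annotators use much richer models, for example with separate states for exons, introns, and splice sites, trained on known genes.

```python
import math

# Toy two-state HMM. The states and all probabilities are HYPOTHETICAL,
# chosen only to illustrate the technique; they are not trained on real data.
STATES = ("N", "C")  # N = non-coding, C = coding
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # assumed AT-rich
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # assumed GC-rich
}


def viterbi(seq):
    """Return the most probable state label for each base of seq."""
    if not seq:
        return ""
    seq = seq.upper()
    # score[s]: best log-probability of any state path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []  # one dict per position after the first
    for base in seq[1:]:
        ptr, new_score = {}, {}
        for s in STATES:
            # Choose the predecessor state that maximizes the score into s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        backpointers.append(ptr)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCGCATATAT"))  # NNNNNNCCCCCCCCNNNNNN
```

With these made-up parameters, the eight GC bases are labeled coding because the gain from eight better-fitting emissions outweighs the penalty of two state switches. Working in log space avoids numeric underflow on long sequences.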
Another, often overlooked, goal of the HGP was the study of its ethical, legal,
and social implications. It is important to research these issues and find the
most appropriate solutions before they grow into serious dilemmas with major
political consequences.
All humans have unique gene sequences; therefore, the data published by the HGP
do not represent the exact sequence of any one individual's genome. Instead,
they form the combined genome of a small number of anonymous donors. The HGP
genome is a scaffold for future work in identifying differences among
individuals. Most current efforts to identify such differences involve single
nucleotide polymorphisms (SNPs) and the HapMap.
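Because the reference genome serves as a scaffold, the most basic step in looking for differences is to compare an individual's sequence with it position by position. The sketch below does this for two sequences that are assumed to be already aligned and of equal length; the function name `find_snps` and the example sequences are hypothetical. It reports every single-base mismatch as a candidate, whereas calling a difference an SNP normally also requires evidence that the variant occurs in a population, and real variant callers must handle sequencing errors, insertions, and deletions.

```python
def find_snps(reference, sample):
    """List single-base differences between two pre-aligned sequences.

    Hypothetical helper. Returns 0-based (position, reference_base,
    sample_base) tuples. Assumes an equal-length, gap-free alignment,
    i.e. no insertions or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt
    ]


# Hypothetical example: the sample differs from the reference at position 3.
print(find_snps("ACGTACGT", "ACGAACGT"))  # [(3, 'T', 'A')]
```

Upper-casing both inputs keeps soft-masked (lower-case) reference bases from being reported as spurious differences.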