"The sequence of the human genome," by J. Craig Venter and 284 others,
Science, 291(5507):1304-51, 16 February 2001.
[Authors' affiliations: 14 institutions worldwide]
Abstract: "A 2.91-billion base
pair (bp) consensus sequence of the euchromatic portion of the human genome
was generated by the whole-genome shotgun sequencing method. The 14.8-billion
bp DNA sequence was generated over 9 months from 27,271,853 high-quality
sequence reads (5.11-fold coverage of the genome) from both ends of plasmid
clones made from the DNA of five individuals. Two assembly strategies--a
whole-genome assembly and a regional chromosome assembly--were used, each
combining sequence data from Celera and the publicly funded genome effort. The
public data were shredded into 550-bp segments to create a 2.9-fold coverage
of those genome regions that had been sequenced, without including biases
inherent in the cloning and assembly procedure used by the publicly funded
group. This brought the effective coverage in the assemblies to eightfold,
reducing the number and size of gaps in the final assembly over what would be
obtained with 5.11-fold coverage. The two assembly strategies yielded very
similar results that largely agree with independent mapping data. The
assemblies effectively cover the euchromatic regions of the human chromosomes.
More than 90% of the genome is in scaffold assemblies of 100,000 bp or more,
and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of
the genome sequence revealed 26,588 protein encoding transcripts for which
there was strong corroborating evidence and an additional ~12,000
computationally derived genes with mouse matches or other weak supporting
evidence. Although gene-dense clusters are obvious, almost half the genes are
dispersed in low G+C sequence separated by large tracts of apparently
noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24%
is in introns, with 75% of the genome being intergenic DNA. Duplications of
segmental blocks, ranging in size up to chromosomal lengths, are abundant
throughout the genome and reveal a complex evolutionary history. Comparative
genomic analysis indicates vertebrate expansions of genes associated with
neuronal function, with tissue-specific developmental regulation, and with the
hemostasis and immune systems. DNA sequence comparisons between the consensus
sequence and publicly funded genome data provided locations of 2.1 million
single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes
differed at a rate of 1 bp per 1250 on average, but there was marked
heterogeneity in the level of polymorphism across the genome. Less than 1% of
all SNPs resulted in variation in proteins, but the task of determining which
SNPs have functional consequences remains an open challenge."
This early-2001 report from Science was cited 124 times in current journal
articles indexed in the ISI database during July-August 2002. The paper
represents the culmination of
the private-sector effort--led by first author J. Craig Venter--to sequence
the human genome. Its latest two-month citation total was second only to that
of the simultaneously published Nature paper that reported sequence
data from the publicly funded human-genome collaboration. (Now cited more than
1,500 times, the Nature paper also has an edge of a few hundred
citations in an overall tally.) Prior to the most recent bimonthly count,
citations to Venter et al. have accrued as follows:
May-June 2002: 146 citations
March-April 2002: 153
January-February 2002: 133
November-December 2001: 175
September-October 2001: 131
July-August 2001: 80
May-June 2001: 62
March-April 2001: 17
Total citations as of 11/2002: 1,210
SOURCE: Hot Papers Database (Included with a subscription to the ISI print
newsletter Science Watch®, available from the ISI Research Services Group.
Packaged on a CD-ROM that is mailed with each Science Watch issue, the Hot
Papers Database contains data on hundreds of highly cited papers published
during the last two years. User interface permits searching by author,
organization, journal, field, and more. Total citations, as well as citations
accrued during successive bimonthly periods, can be assessed and graphed. An
updated CD containing the most recent bimonthly data is mailed with every new
issue of Science Watch, six times a year. The CD also includes an electronic
version of the Science Watch issue in HTML format, for personal desktop access.)
