"The sequence of the human genome," by J. Craig Venter and 284 others,
Science, 291(5507):1304-51, 16 February 2001.
[Authors' affiliations: 14 institutions worldwide]
Abstract: "A 2.91-billion base
pair (bp) consensus sequence of the euchromatic portion of the human genome
was generated by the whole-genome shotgun sequencing method. The 14.8-billion
bp DNA sequence was generated over 9 months from 27,271,853 high-quality
sequence reads (5.11-fold coverage of the genome) from both ends of plasmid
clones made from the DNA of five individuals. Two assembly strategies--a
whole-genome assembly and a regional chromosome assembly--were used, each
combining sequence data from Celera and the publicly funded genome effort. The
public data were shredded into 550-bp segments to create a 2.9-fold coverage
of those genome regions that had been sequenced, without including biases
inherent in the cloning and assembly procedure used by the publicly funded
group. This brought the effective coverage in the assemblies to eightfold,
reducing the number and size of gaps in the final assembly over what would be
obtained with 5.11-fold coverage. The two assembly strategies yielded very
similar results that largely agree with independent mapping data. The
assemblies effectively cover the euchromatic regions of the human chromosomes.
More than 90% of the genome is in scaffold assemblies of 100,000 bp or more,
and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of
the genome sequence revealed 26,588 protein-encoding transcripts for which
there was strong corroborating evidence and an additional ~12,000
computationally derived genes with mouse matches or other weak supporting
evidence. Although gene-dense clusters are obvious, almost half the genes are
dispersed in low G+C sequence separated by large tracts of apparently
noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24%
is in introns, with 75% of the genome being intergenic DNA. Duplications of
segmental blocks, ranging in size up to chromosomal lengths, are abundant
throughout the genome and reveal a complex evolutionary history. Comparative
genomic analysis indicates vertebrate expansions of genes associated with
neuronal function, with tissue-specific developmental regulation, and with the
hemostasis and immune systems. DNA sequence comparisons between the consensus
sequence and publicly funded genome data provided locations of 2.1 million
single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes
differed at a rate of 1 bp per 1250 on average, but there was marked
heterogeneity in the level of polymorphism across the genome. Less than 1% of
all SNPs resulted in variation in proteins, but the task of determining which
SNPs have functional consequences remains an open challenge."
This early-2001 report from Science
was cited 165 times in current journal articles indexed by
Thomson ISI
during November-December 2002. The paper represents the culmination of the
private-sector effort--led by
first author J. Craig Venter--to sequence the human genome. Its latest
two-month citation total was second
only to that of the simultaneously published Nature paper that reported
sequence data from the publicly funded
human-genome collaboration. (Now cited more than 2,000 times, the Nature
paper also has an edge of a few
hundred in an overall citation tally.) Prior to the most recent bimonthly
count, citations to Venter et al. have
accrued as follows:
September-October 2002: 129 citations
July-August 2002: 124
May-June 2002: 146
March-April 2002: 153
January-February 2002: 133
November-December 2001: 175
September-October 2001: 131
July-August 2001: 80
May-June 2001: 62
March-April 2001: 17
Total citations as of 4/4/03: 1,585
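
As a quick check on the tally above, the following minimal Python sketch sums the bimonthly counts and compares them with the reported total. The variable names are illustrative, and reading the remainder as citations indexed after December 2002 is an assumption; the source does not break that figure down.

```python
# Bimonthly citation counts for Venter et al., copied from the report
# (the 165 for November-December 2002 comes from the paragraph above the list).
bimonthly = [
    ("March-April 2001", 17),
    ("May-June 2001", 62),
    ("July-August 2001", 80),
    ("September-October 2001", 131),
    ("November-December 2001", 175),
    ("January-February 2002", 133),
    ("March-April 2002", 153),
    ("May-June 2002", 146),
    ("July-August 2002", 124),
    ("September-October 2002", 129),
    ("November-December 2002", 165),
]

# Total reported in the source as of 4/4/03.
reported_total = 1585

# Sum of the periods that are itemized in the report.
listed_total = sum(count for _, count in bimonthly)

# Citations not covered by any itemized period (assumed, not stated,
# to be those indexed after December 2002).
unlisted = reported_total - listed_total

# Period with the highest bimonthly count.
peak_period, peak_count = max(bimonthly, key=lambda item: item[1])

print(f"Itemized citations: {listed_total}")
print(f"Not itemized:       {unlisted}")
print(f"Peak period:        {peak_period} ({peak_count})")
```

The itemized periods account for 1,315 of the 1,585 citations, leaving 270 that the report does not assign to a period.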
SOURCE: Hot Papers Database (Included with a subscription to the ISI print
newsletter Science Watch®, available from the ISI Research Services Group.
Packaged on a CD-ROM that is mailed with each Science Watch issue, the Hot
Papers Database contains data on hundreds of highly cited papers published
during the last two years. The user interface permits searching by author,
organization, journal, field, and more. Total citations, as well as citations
accrued during successive bimonthly periods, can be assessed and graphed. An
updated CD containing the most recent bimonthly data is mailed with every new
issue of Science Watch, six times a year. The CD also includes an electronic
version of the Science Watch issue in HTML format, for personal desktop
access.)
