01、The story of the human genome
In 1986, Nobel laureate Renato Dulbecco published an article in the journal Science highlighting the major role that sequencing the human genome could play in curing cancer, and called on the US government to support human genome sequencing to advance cancer research.
That same year, Dulbecco and other scientists jointly put forward the idea of the Human Genome Project. Also in 1986, the U.S. Department of Energy, which had previously investigated the effects of radiation on DNA, published a report stating: "Just as human anatomy contributed to the development of medicine, an understanding of the human genome will provide essential support for progress in medicine and the other health sciences."
In 1990, the Human Genome Project, known for its "$3 billion, 3 billion base pairs" scale, was officially launched by the U.S. Department of Energy and the National Institutes of Health (NIH). The goal of the program was to read all 3 billion base pairs that encode human genetic instructions and to map the human genome. The project was expected to take 15 years and cost $3 billion, or $1 per base pair. Britain, France, Germany, Japan and other countries later joined the project; China officially joined in September 1999 and undertook 1% of the sequencing work. Together the participants formed the International Human Genome Sequencing Consortium.
On June 26, 2000, President Clinton solemnly announced at the White House that a working draft of the human genome, "the most important, most wondrous map ever produced by humankind", had been completed.

In February 2001, the draft sequence of the human genome, together with the sequencing methods and the results of sequence analysis, was published in the journals Nature and Science.
What we have learned about the genome-wide features of the human genome:
It contains more than 3 billion DNA coding letters (bases);
DNA is written in four letters, A, T, C and G, and would fill more than 1.5 million pages if written out;
DNA is arranged in base pairs, like the steps of a spiral staircase (the double helix);
The total number of genes is about 20,687;
The genome comprises 23 pairs of chromosomes, 46 chromosomes in total;
Human complexity comes from gene networks rather than from the number of individual genes: genes can be turned on or off under particular circumstances (selective gene expression) and act in different combinations to produce an almost unlimited range of functions;
Genes make up only a small part of the genome (about 2%); most DNA either regulates genes, has unknown functions, or does nothing ("junk DNA");
Part of our evolutionary past is recorded in the genome: some DNA fragments no longer work and are relics of ancient organisms that have long been dormant, and these fragments far outnumber the genes;
Humans are 99.9% similar at the DNA level (roughly one base differs per 1,200 bases; such a difference is called a SNP, a single nucleotide polymorphism. So how many SNPs do humans have?).
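The question that closes the list can be answered with rough arithmetic. The sketch below is only an illustration: it assumes a genome rounded to 3 billion bases and takes the one-difference-per-1,200-bases figure from the list at face value. The function name estimate_snp_count is hypothetical.

```python
def estimate_snp_count(genome_size_bp: int, bases_per_snp: int) -> int:
    """Rough count of single-base differences between two human genomes."""
    return genome_size_bp // bases_per_snp


# Assumptions taken from the list above, rounded for illustration.
GENOME_SIZE_BP = 3_000_000_000  # "more than 3 billion" bases
BASES_PER_SNP = 1_200           # about one differing base per 1,200 bases

snps = estimate_snp_count(GENOME_SIZE_BP, BASES_PER_SNP)
print(f"about {snps:,} single-base differences")
```

Under these assumptions the answer is on the order of a few million differences between any two people.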
02、The development and principles of sequencing technology
Sequencing the human genome was a difficult task because a sequencer could read no more than about 1,000 bp at a time, so the genome had to be divided into fragments that the sequencer could process. Scientists first divided the entire genome into large fragments of about 150,000 base pairs, cloned each of them into thousands of copies in bacteria, and determined where each cloned fragment lay on the chromosome. Each cloned fragment was then broken down into many random small fragments, which were sequenced and pieced together according to their overlapping ends. Finally, the sequenced clones were placed at their corresponding chromosomal positions to form the complete genetic code. This technique is thorough, but it is too slow. By 1998, the scientists from the participating countries had spent half the money but had sequenced only 3% of the human genome.
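The step of piecing small fragments together by their overlapping ends can be illustrated with a toy greedy merge. This is a minimal sketch and not the software actually used by the project: it assumes error-free reads from a single strand and a sequence without repeats, and the function names overlap_length and greedy_assemble, the min_overlap parameter and the example reads are all hypothetical.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: List[str], min_overlap: int = 3) -> List[str]:
    """Repeatedly merge the two fragments that share the longest end overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: leave the pieces as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Four short, made-up reads that overlap end to end.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
```

Real genomes contain long repeats and reads contain errors, which is why knowing in advance where each clone sits on the chromosome makes the final assembly much more reliable, at the price of speed.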
In 1998, geneticist Craig Venter teamed up with PE (Perkin-Elmer), the parent company of DNA sequencer manufacturer ABI, to create Celera Genomics, which set out to sequence the human genome and planned to finish by 2001.
Venter had originally been a sequencing expert at the National Institutes of Health (NIH) and a member of the International Human Genome Sequencing Consortium. He felt that the traditional clone-by-clone strategy, built on "chain termination" sequencing, was too inefficient, and he proposed a simpler and faster approach called shotgun sequencing. This method skips the positioning step: it breaks the genome into millions of DNA fragments, sequences the ends of each fragment, and then uses a computer algorithm to merge fragments that share the same end sequences into the whole-genome sequence. The proposal, however, was unanimously opposed by the NIH researchers.
Although shotgun sequencing had previously been used to determine the DNA sequences of bacteria and viruses, many experts doubted that it would be accurate enough for a genome as complex as the human one.
Despite his efforts, Venter was unable to obtain public funding for his approach, much to his frustration.
In July 1992, Venter left the National Institutes of Health and became the founder and chairman of the board of The Institute for Genomic Research (TIGR), a nonprofit genomics institute in Rockville, Maryland. He served as its president until 1998, when he founded Celera.
At Celera, Venter sequenced with the "shotgun" method and soon surpassed what the International Human Genome Sequencing Consortium had achieved in eight years. Even James D. Watson, the Nobel laureate who co-discovered the double-helix structure of DNA, had to admit that Venter's result was "a great moment in science."
It would not have mattered much had the two teams disagreed only about sequencing methods, but they also held very different views on how sequencing data should be handled.
The International Human Genome Sequencing Consortium held from beginning to end that the human genome is the common property of all mankind, and it uploaded every result to the public database GenBank as soon as it was obtained. Celera, under Venter's leadership, aimed to become the authoritative source of genomics and related medical and biological information: it planned to build a powerful genetic database and sell access to it, so that other companies could find new genes and develop new drugs.
In an effort to stop Venter from privatizing genetic data, the Human Genome Project kept speeding up its work, hoping to make its results public before Venter did and so block any related ownership claims. But it was still one step behind. On April 6, 2000, Venter's team told the world that it had finished sequencing the human genome and had applied for patents on 6,500 human genes. Its value-added database proved very useful, and many public research institutions and pharmaceutical companies scrambled to buy access.
To keep human genome patents from falling into Celera's hands, Francis Collins, the de facto leader of the Human Genome Project, began secret contacts with Celera. Because he represented the Human Genome Project, Collins was under great pressure. The debate centered on three questions: To whom should the honor of this scientific milestone go? Whose genome sequence is more complete, accurate, and useful? Should this most important of human data sets be free to the whole world?
The intervention of Bill Clinton, then President of the United States, played a key role in the truce. In the end, Venter dropped his patent request, and the two sides agreed to announce the working draft of the human genome jointly.
The gene war was finally settled. The fierce competition itself was good for humanity: it pushed the sequencing of the human genome forward far faster than anyone had imagined a decade earlier.
Under the original plan, the project was to be completed in 2005. Before Celera joined the race, the scientists from the participating countries had completed only 3% by 1998, and at that pace nobody could say when 100% of the sequencing would be done. Spurred on by Celera, however, the researchers showed incredible cohesion, focus and speed. Although Celera finished sequencing the genome first, the international consortium followed close behind, five years ahead of the original plan. Is that not a miracle?
Besides Celera's contribution, automated sequencing technology also did a great deal to advance large-scale genome sequencing. Before automated sequencing, people had to rely on manual sequencing: a single DNA sequencing experiment took 2-3 days of work and could read only about 300-1,000 base pairs, while human DNA has 3 billion base pairs. (kb, nt and bp are units commonly used in biology to describe DNA length: kb = kilobase, nt = nucleotide, bp = base pair.)
Even at the maximum read length of 1,000 bp, 3,000,000,000 ÷ 1,000 = 3,000,000 experiments would be needed. At 2 days per experiment, that is 6,000,000 days, or about 16,438 years. Decrypting the human genome this way would have been a fantasy.
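The estimate can be reproduced in a few lines. This is a back-of-the-envelope sketch under simplifying assumptions: one experiment runs at a time, each takes 2 days (the lower bound quoted above) and reads the maximum 1,000 bp, reads never overlap, and a year has 365 days. The function name manual_sequencing_years is hypothetical.

```python
def manual_sequencing_years(genome_size_bp: int,
                            read_length_bp: int,
                            days_per_experiment: float) -> float:
    """Years needed if experiments run one after another with no overlap."""
    experiments = genome_size_bp / read_length_bp
    total_days = experiments * days_per_experiment
    return total_days / 365


# Figures from the text: 3 billion bp, 1,000 bp per experiment, 2 days each.
years = manual_sequencing_years(3_000_000_000, 1_000, 2)
print(f"about {years:,.0f} years")
```

Using 3 days per experiment or a shorter read length of 300 bp only makes the figure larger.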
The advent of automated sequencing technology made the Human Genome Project far more feasible.
Automation: the ultimate secret that made Sanger sequencing stand out
The road to the automation of "first-generation sequencing"
In 1987, Leroy Hood and Michael Hunkapiller developed the ABI 370, an instrument that automates the Sanger sequencing process. Its most important innovation was the automated labeling of DNA fragments with fluorescent dyes rather than radioactive molecules. This change not only made the experimental method safer but also allowed the acquired data to be analyzed by computer [Hood et al., 1987] (some reports give the year as 1986). Although it was only a semi-automatic sequencer, it still saved scientists a great deal of time. In 1998, ABI launched the ABI Prism 3700 capillary sequencer, which automated sample loading, data collection, quality control and preliminary analysis. It was the first truly fully automatic sequencer, marking the leap of sequencing technology from manual to automatic and making a historic contribution to the Human Genome Project.
It is worth mentioning that Venter was the first user of the automated gene sequencer. In 1986, the journal Nature reported an automated DNA sequence analysis technology invented by Smith and colleagues, and Venter immediately got in touch with the inventors. A few months later, Venter had the NIH's first automated gene sequencer and, armed with this advanced tool, became the geneticist who had discovered the most genes.
What is more, spurred by the Human Genome Project, other higher-throughput sequencing technologies began to emerge.
NGS: The rise of next-generation sequencing technology
In 2005, 454 Life Sciences launched the Genome Sequencer 20 System, a revolutionary ultra-high-throughput genome sequencing system based on pyrosequencing (a sequencing-by-synthesis technology), which opened the era of second-generation sequencing.
Second-generation sequencing technology, represented by sequencing by synthesis, can sequence hundreds of thousands to millions of DNA fragments at a time and is therefore called high-throughput sequencing or massively parallel sequencing (MPS). It was a revolutionary change from traditional sequencing, and because of its epoch-making significance this type of technology is called second-generation sequencing or next-generation sequencing (NGS). NGS is also called deep sequencing because it allows in-depth, detailed and comprehensive analysis of the transcriptome and genome of a species.
As mentioned earlier, Craig Venter used the "shotgun" strategy to sequence the human genome: the genome to be sequenced is cut into random fragments, the fragments are sequenced, and the results are then put together to obtain the complete genome sequence.
The principle of NGS is similar: genomic DNA must also be broken into fragments small enough for the sequencer (the size depends on the instrument), the fragments are sequenced simultaneously, and the resulting sequences are finally assembled. The difference is that Sanger sequencing has to process these fragments in batches or on many capillary sequencers at once, whereas NGS needs only a single instrument to sequence hundreds of thousands to millions of fragments at the same time. This significantly increases sequencing speed, greatly shortens the time needed to sequence a genome, and saves a great deal of cost.
Sequencing the first human genome took from 1990 to 2003 and cost over $3 billion. (Although the Human Genome Project announced its draft in 2000, only 28% of the genome had actually been finished at that point, with many gaps and an error rate of 1 in 1,000. The final version, published in 2003, covered 99% of the genome with an error rate of less than 1 in 10,000.) Even after 2003, sequencing one person's genome by the Sanger method still cost $30-50 million or more, which was very expensive.
Today, the sequencing cost on the high-throughput sequencer T20 is below 10 yuan per G (gigabase) of data. For a human whole genome of 3 G sequenced at 30x depth, the cost comes to within 1,000 yuan.
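The arithmetic behind this figure is simple enough to write down. The sketch below assumes that "G" means gigabases of sequencing output, that 10 yuan per G is an upper bound, and that only raw sequencing cost is counted (no sample preparation or analysis). The function name genome_sequencing_cost is hypothetical.

```python
def genome_sequencing_cost(genome_size_gb: float,
                           depth: float,
                           cost_per_gb: float) -> float:
    """Raw sequencing cost: data volume (genome size x depth) times unit price."""
    data_volume_gb = genome_size_gb * depth
    return data_volume_gb * cost_per_gb


# Figures from the text: 3 G genome, 30x depth, at most 10 yuan per G.
cost = genome_sequencing_cost(3, 30, 10)
print(f"data volume: {3 * 30} G, cost: at most {cost} yuan")
```

A 30x depth means every base is read about 30 times on average, so 90 G of data are produced for a 3 G genome, and at no more than 10 yuan per G the total stays under 1,000 yuan.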
Sequencing technology is still developing. Beyond second-generation sequencing, third-generation technologies have begun to appear on the market, represented by PacBio's SMRT (single-molecule real-time) sequencing and Oxford Nanopore Technologies' nanopore single-molecule sequencing.
Compared with the previous two generations, the significant advantages of third-generation sequencing are that it sequences single molecules, does not rely on PCR amplification, and achieves ultra-long read lengths.
These features give third-generation sequencing unique advantages in certain applications. First, sequencing without PCR amplification not only simplifies the experimental workflow but also reduces the sequence errors introduced by PCR amplification, improving the accuracy of the sequencing data. Second, ultra-long reads make third-generation sequencing perform better in genome assembly, structural variation detection, gene expression analysis and other areas; in the analysis of complex gene regions and repetitive sequences in particular, ultra-long reads provide more complete information. Third-generation sequencing has therefore shown broad application prospects in genomics research, clinical diagnosis, genetic disease screening, microbial detection and other fields.

The sequencing principle of each platform has its own strengths. We will introduce them one by one in later learning-and-sharing posts.
The company's product recommendation:
1. 73025-69-1 https://www.bicbiotech.com/product_detail.php?id=5468
2. 96363-20-1 https://www.bicbiotech.com/product_detail.php?id=5469
3. 1268520-70-2 https://www.bicbiotech.com/product_detail.php?id=5472
4. 1332527-03-3 https://www.bicbiotech.com/product_detail.php?id=5473
5. 2407-68-3 https://www.bicbiotech.com/product_detail.php?id=5474
