This blog post explores how third-generation sequencing, overcoming the limitations of existing technologies, enables rapid and precise personal genome sequencing and has driven the shift towards personalized and predictive medicine.
The life of a human individual begins the moment the fertilized egg starts dividing after sperm and egg unite. As the cells divide, their number gradually increases; each cell differentiates to perform a distinct function, ultimately growing into a complete organism capable of the life activities we carry out today. Which function each cell will perform is determined by genes. These genes are stored in DNA, the genetic material, which encodes information in the specific order of its four nucleotide bases. Scientists have long strived to map this sequence precisely, an effort that culminated in the Human Genome Project (HGP), which aimed to decode the entire human DNA sequence. Given the technology of the time, completing this massive project took a lengthy 13 years. Although the human genome consists of approximately 3 billion base pairs, the sequencing technology then available could read only short stretches of about 300 base pairs at a time. The work therefore required a complex process: cutting the DNA into countless fragments, replicating each fragment to determine its sequence, and then reassembling the fragments into a single continuous sequence.
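The cut-and-reassemble workflow described above can be sketched as a toy greedy overlap assembler. The fragment sequences and minimum-overlap threshold here are illustrative, and real assemblers must additionally cope with sequencing errors, repeated regions, and uneven coverage:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, 0, 1)  # (overlap length, index i, index j)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; stop merging
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return "".join(frags)

# Three overlapping fragments reassemble into one continuous sequence.
reads = ["ATGGCGT", "GCGTACG", "TACGGAT"]
print(greedy_assemble(reads))  # ATGGCGTACGGAT
```

With error-free fragments and unambiguous overlaps, as here, the greedy strategy recovers the original sequence; the HGP-era difficulty was doing this with millions of fragments rather than three.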
Throughout the HGP’s 13-year duration, the need for faster and more efficient sequencing methods was constantly raised, and research to meet this demand continued unabated. As a result, various improved techniques were developed. Efforts were made to increase speed by reducing the time required for DNA replication or enhancing equipment efficiency, while maintaining the basic experimental approach. However, these improvements alone had limitations in dramatically reducing the time needed for base sequence analysis. Ultimately, a fundamental change in the experimental method itself was necessary.
Amid this demand, the first technology introduced was Next-Generation Sequencing (NGS). While NGS reads bases on fundamentally the same principle as conventional methods, it dramatically shortened analysis time by dividing DNA into even shorter fragments and reading enormous numbers of them simultaneously through massively parallel processing. This approach became feasible thanks to dramatic improvements in computing power, but it still partially retained the drawbacks of earlier methods: lengthy preprocessing steps and high error rates. For this reason, technologies are now being developed that rely on entirely different principles, enabling long base sequences to be read rapidly and accurately in a single pass. These are referred to as third-generation sequencing methods and are collectively termed Single Molecule Real-Time (SMRT) sequencing.
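The massively parallel idea can be illustrated with a small sketch: chunks of short reads are located in a reference concurrently. The toy reference, the reads, and the exact-match "alignment" are all simplifications invented for this example; real NGS pipelines align billions of error-containing reads with specialized tools.

```python
from concurrent.futures import ThreadPoolExecutor

REFERENCE = "ATGGCGTACGGATCCGTA"  # toy reference sequence

def align_chunk(reads):
    """Map each short read to its first matching position in the reference."""
    return {r: REFERENCE.find(r) for r in reads}

def parallel_align(reads, workers=4):
    """Split the reads into chunks and align the chunks concurrently,
    mimicking the NGS strategy of processing many fragments at once."""
    chunks = [reads[i::workers] for i in range(workers)]
    positions = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(align_chunk, chunks):
            positions.update(result)
    return positions

reads = ["ATGG", "CGTA", "GGAT", "TCCG"]
print(parallel_align(reads))
```

The speedup in real NGS comes from running millions of such read-outs physically in parallel on the sequencing flow cell, with heavy computation afterwards to place the reads.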
The SMRT technologies currently under development fall into four broad approaches. The first exploits the minute flash of fluorescence emitted when a fluorescently labeled nucleotide is incorporated during DNA synthesis. Ordinarily, light from a chemical reaction scatters in all directions, making the weak signal from such a small-scale reaction difficult to detect. A zero-mode waveguide, however, confines the light: it cannot propagate along the waveguide and escapes only in a specific direction, so the same amount of light yields a far more efficiently captured signal. By immobilizing the DNA polymerase on the bottom surface of the waveguide and letting it incorporate fluorescently labeled nucleotides in real time, the base sequence can be read from the faint flashes generated as a single DNA strand is synthesized. Because the method depends on luminescence signals, detection errors are relatively frequent; however, these are not systematic errors with a particular direction but random errors that can be statistically corrected through repeated measurements. Moreover, the approach parallelizes easily, enabling large-scale simultaneous analysis and very fast sequencing speeds.
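The statistical correction of random errors can be sketched as a per-position majority vote over repeated passes of the same template, a simplification of real consensus calling; the passes below are hand-made examples, each carrying one misread at a different position:

```python
from collections import Counter

def consensus(reads):
    """Majority vote at each position: random errors, scattered across
    different positions in different passes, are outvoted by correct calls."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Four passes over the same template, each with a single random misread.
passes = [
    "ATGGCGTACGGAT",
    "ATGACGTACGGAT",  # error at position 3
    "ATGGCGTTCGGAT",  # error at position 7
    "CTGGCGTACGGAT",  # error at position 0
]
print(consensus(passes))  # ATGGCGTACGGAT
```

A systematic error, by contrast, would strike the same position the same way in every pass, and no amount of voting would remove it; this is why the random character of the luminescence errors matters.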
The second method fixes a DNA molecule in place and passes tunneling electrons through it, measuring the tunneling energy spectrum characteristic of each base, much as a scanning tunneling microscope does. Because no DNA replication is required, very long strands can be read at once in a single-molecule state, and almost no pretreatment is needed, which in principle promises major reductions in analysis cost. Full-scale development, however, remains premature: sufficient technical precision and stability have not yet been achieved, and numerous challenges, including electronic noise and device reproducibility, remain unresolved.
The third method reads the sequence from changes in the ionic current flowing through a microbial membrane-protein pore as a single strand of DNA passes through it. After the double strand is separated into single strands and threaded through a nanopore formed by the membrane protein, each base modulates the current in a subtly different way, and these variations are interpreted as electrical signals to reconstruct the sequence. The technology's simple structure makes the equipment easy to miniaturize, and, like the second method, it can read long DNA strands continuously because no replication is required. Active research is currently underway to improve analytical accuracy by using various media or catalysts to control the speed at which DNA passes through the protein nanopore.
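A minimal sketch of this signal-to-sequence step, assuming a purely hypothetical table of per-base current levels: each measured current is assigned the base whose reference level is nearest. Real nanopore signals depend on the several bases occupying the pore at once, so actual base callers work with k-mer models and machine learning rather than a single-base lookup.

```python
# Hypothetical mean current levels (pA) for each base -- illustrative only.
LEVELS = {"A": 70.0, "C": 52.0, "G": 44.0, "T": 61.0}

def call_base(current):
    """Assign the base whose reference level is closest to the measurement."""
    return min(LEVELS, key=lambda b: abs(LEVELS[b] - current))

def call_sequence(trace):
    """Decode one noisy current measurement per translocating base."""
    return "".join(call_base(c) for c in trace)

trace = [69.1, 60.3, 45.0, 52.8, 71.2]
print(call_sequence(trace))  # ATGCA
```

The sketch also hints at why translocation speed matters: if the strand moves too fast, the per-base current levels blur together and this nearest-level assignment breaks down.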
The final method involves measuring the unique electrical signals generated by each base as DNA passes through an extremely narrow pore composed of a conductor and a dielectric to determine the base sequence. This differs from the third method, which utilizes microbial membrane proteins, as it employs a semiconductor-based solid nanopore. Utilizing semiconductor devices is evaluated as a technology with the potential to offer the highest competitiveness in terms of speed and cost, as it ensures structural stability and facilitates mass production. However, the process of implementing this into actual equipment faces numerous technical challenges that must be resolved, such as increased signal noise, measurement deviations due to changes in pore size, and difficulties in controlling DNA movement speed. Consequently, significant hurdles remain before reaching the practical application stage.
Thus, through continuous technological development and advancement, the time and cost required for base sequence analysis have gradually decreased. This signifies that base sequence analysis technology, once used solely for research purposes, is now gradually becoming accessible at a level close to the everyday lives of ordinary people. Genetic testing in health screenings, widely implemented today, is a prime example, allowing for the approximate estimation of the probability of developing diseases like cancer by analyzing base sequences. The most widely known example is Angelina Jolie’s case in 2013, where she underwent preventive mastectomy after learning through genome analysis that she had a very high probability of developing breast cancer. Furthermore, rapid DNA sequencing technology is also crucial in the research of rare genetic diseases. To study genetic diseases at the DNA sequence level, it is necessary to obtain and analyze extensive DNA sequence information not only from the patient themselves but also from their close relatives. Had it taken over a decade to sequence a single person’s DNA, as it did during the Human Genome Project era, such research would have been impossible from the outset.
Furthermore, DNA sequencing holds potential for diverse applications beyond research, extending into everyday life. For instance, the startup 23andMe analyzes customers’ DNA from saliva samples, providing basic health information such as carrier status for rare diseases or the likelihood of developing specific genetic disorders, alongside data on how closely an individual’s ancestry is linked to various ethnic groups. In the racially diverse American society, such services attracted significant interest, fueling 23andMe’s rapid growth. The company accumulated DNA sequence information from countless customers, using the data for its own research or providing it to other research institutions.
Until now, research has focused primarily on developing technologies to read DNA sequences quickly and accurately, and on accumulating vast amounts of sequence data with them. Sufficient samples have now been secured, however, and we are approaching a level at which an individual’s genome can be sequenced and analyzed almost in real time. While faster and more precise analysis methods remain important, the more critical task going forward is working out how to make meaningful use of the human, animal, and pathogen sequence data that can now be obtained easily and rapidly, anytime and anywhere. One direction is developing services that appeal to people’s interests, as 23andMe has done. The potential in public health is also vast, for example analyzing disease characteristics and spread patterns from pathogen mutations. What is clear is that the future of sequencing will be driven not only by improvements in speed but also by new ideas for putting the accumulated data to innovative use.