What is the charge on oligonucleotide 5' pGpGpApCpT 3' at pH 7.00?

What is the charge on the oligonucleotide 5' pGpGpApCpT 3' at pH 7.00?

I thought adenine has one NH group, guanine has NH and OH groups, and cytosine and thymine have OH groups, with a phosphate at the 5' end and an OH at the 3' end. So at pH 2.00-3.00 the OH will give away an electron and become negatively charged. So is it +2 +2 -1 -1 -1, which is -1? Please explain.


I didn't quite understand where you're getting all your OH groups, maybe from the enol tautomers of the nucleotides. I tried to look up the pKa's for the keto tautomers of the nucleotides on Wikipedia.

In general, if pH is below a functional group's pKa, that group will be protonated, and if pH is above pKa, it will be deprotonated, though this is an equilibrium.

If we assume a pH of 7 and assume I have the pKas right (these values are for the free bases, with no deoxyribose attached), then the primary amines on A, G, and C will be protonated NH3+ groups with positive charges. The amides on G and T should remain deprotonated and uncharged.

So with the sequence GGACT, we have 4 primary amines, so 4 positive charges. However, we also have 5 phosphates, each carrying a negative charge, so +4 plus -5 is -1.

Of course, all of this goes out the window once we bind that oligonucleotide to another oligonucleotide; I assume the hydrogen bonding will alter the pKas and charges. For all I know, simply attaching the bases to a deoxyribose and stacking them together changes the pKas.


There is no positive charge on the bases at pH 7, but there are 4 phosphodiester bonds, each with a -1 charge on its phosphate, for a total of -4. In addition, the terminal phosphate at the 5' end is free, so it has a charge of -2. Thus the overall charge of the oligonucleotide is -4 - 2 = -6 (determined by the number of phosphate groups).
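
The phosphate bookkeeping in this answer can be sketched in Python. This is only a rough sketch under stated assumptions: the function name `phosphate_charge` is made up for illustration, each phosphodiester is treated as fully ionized, and the pKa of about 6.5 for the second ionization of the terminal phosphate is an assumed, approximate value that does not come from this thread.

```python
def phosphate_charge(n_nucleotides, five_prime_phosphate=True,
                     pH=7.0, terminal_pKa2=6.5, formal=True):
    """Charge contributed by the phosphates of a linear oligonucleotide.

    Assumptions (illustrative only):
    - each of the n-1 internal phosphodiester links carries -1;
    - a free 5' phosphate carries -1 plus a second charge that is either
      counted in full (formal=True) or weighted by the
      Henderson-Hasselbalch fraction for terminal_pKa2 (assumed ~6.5).
    """
    if n_nucleotides < 1:
        raise ValueError("need at least one nucleotide")
    charge = -(n_nucleotides - 1)  # internal phosphodiester links
    if five_prime_phosphate:
        if formal:
            second = 1.0
        else:
            # fraction of the second acidic proton that has been lost
            second = 1.0 / (1.0 + 10 ** (terminal_pKa2 - pH))
        charge -= 1.0 + second
    return charge


print(phosphate_charge(5))                              # pGpGpApCpT, formal count
print(phosphate_charge(5, five_prime_phosphate=False))  # no terminal phosphate
print(phosphate_charge(5, formal=False))                # fractional estimate
```

With the formal count, the five-residue oligonucleotide with a 5' phosphate comes out at -6, and the same sequence without the terminal phosphate at -4. The fractional estimate falls slightly short of -6 because, under the assumed pKa, the second ionization of the terminal phosphate is not quite complete at pH 7.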


There is no positive charge on the bases at pH 7, thus the overall charge of the oligonucleotide is -4 (determined by the number of phosphate groups). See details here: https://books.google.cz/books?id=v9HL5VyRmZcC&lpg=PA204&ots=DQYKZ9PxXg&dq=why%20amino%20group%20of%20nucleo%20bases%20protonated&pg=PA204#v=onepage&q=why%20amino%20group%20of%20nucleo%20bases%20protonated&f=false
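
Why the bases carry essentially no positive charge at pH 7 can be checked with the Henderson-Hasselbalch relation. The following is a minimal sketch, assuming pKa values of roughly 4 for the proton-accepting ring nitrogens of adenine and cytosine; these numbers are approximate illustrative values, not taken from this thread or from the linked book, and the function name is hypothetical.

```python
def protonated_fraction(pH, pKa):
    """Fraction of a group that is protonated at a given pH
    (Henderson-Hasselbalch, single ionizable site)."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


# Assumed, approximate pKa values for the protonated bases (illustrative).
ASSUMED_PKA = {"A": 3.8, "C": 4.2}

for base, pKa in ASSUMED_PKA.items():
    print(base, f"{protonated_fraction(7.0, pKa):.4f}")
```

Under these assumptions far less than 1% of either base is protonated at pH 7, whereas at pH 2 nearly all of it would be, which is why the phosphates dominate the net charge at neutral pH.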


Binding and biomimetic cleavage of the RNA poly(U) by synthetic polyimidazoles

Four polyimidazoles were used in the binding and cleavage studies with poly(U). The two polydisperse polyvinylimidazoles were previously described by others, while the other two new polymers of polyethyleneimines were prepared by cationic polymerization of oxazolines. The latter had imidazole units attached to each nitrogen of the polymers. They were characterized by gel permeation chromatography and had very low polydispersities. When they were partially protonated they bound to the poly(U) and catalyzed its cleavage by a process analogous to that used by the enzyme ribonuclease A. The kinetics of the cleavage were followed by an assay we had previously described using phosphodiesterase I from Crotalus venom after the cleavage processes. Cleavage of poly(U) with Zn2+ was also examined, with and without the polymers. A scheme is described in which these cleavages could be made sequence selective with various RNAs, particularly with important targets, such as viral RNAs.

The polymeric character of natural enzymes has stimulated us to expand the field of artificial enzymes to those based on synthetic polymers, as we have reported earlier (1). Particularly important would be artificial enzymes that could cleave RNA, preferably with sequence specificity (2, 3). They could potentially be used to destroy dangerous viral RNA molecules if we attached a short DNA to the polymers with the necessary sequence for selective binding. This would be an advance in molecular medicine—using an active catalyst rather than simply a modifier of biological systems, as is typical with other medicines.

We have described an analytical method to follow the rates of RNA cleavage (4) by various catalysts or reactants that imitate the mechanism used by ribonuclease A. In this enzyme, the C-2′ hydroxyl group of a ribose attacks the neighboring phosphate that links the C-3′ hydroxyl with a C-5′ hydroxyl of another ribose. The result is the formation of a C-2′/C-3′ cyclic phosphate and a liberated C-5′ hydroxyl group. Then we treat the product with the phosphodiesterase I from Crotalus venom, which cleaves RNA in a different manner, hydrolyzing the remaining C-3′/C-5′ phosphate link to remove the phosphate from C-3′ and leave it behind on C-5′. With RNA, all the products of the second step are nucleotide 5′-phosphates except those where the ribonuclease-like cleavage has left the 5′-position unphosphorylated. With poly(U) as the substrate, every initial ribonuclease-like cleavage site leaves an unphosphorylated uridine after the phosphodiesterase I treatment, that we assay with quantitative HPLC. We have shown that this assay for RNAse-like cleavage of RNA is quantitatively reliable (4).

In ribonuclease A, the cleavage is catalyzed by imidazole rings of two histidines, one less basic as the free imidazole base (His-12) and the other more basic as an imidazolium cation (His-119) (5). In the most likely mechanism (6), the imidazole of His-12 deprotonates the C-2′ hydroxyl group as the oxygen adds to the neighboring phosphate, forming a phosphorane intermediate with help from the positively charged imidazolium group on His-119, which adds a proton to an oxyanion of the phosphate. In subsequent steps, the phosphorane intermediate decomposes, expelling the C-5′ hydroxyl group of the next unit while leaving a C-2′/C-3′ cyclic phosphate. This is then hydrolyzed in subsequent steps to the final product with a phosphate group on C-3′ and a hydroxyl group back on C-2′. We have described biomimetic systems that mimic this process, using either concentrated imidazole buffer with RNA itself (6) or cyclodextrins carrying attached imidazole rings to cleave RNA analogs (11). We now describe the cleavages of poly(U) by two classes of synthetic polyimidazoles as ribonuclease A mimics, again forming a C-5′ hydroxyl group on the leaving RNA unit as with ribonuclease A. We also have evidence for binding of some of the polymers to the RNA prior to cleavage. The cleavage was enhanced by the binding characteristics of the polymer itself (compared with those without binding sites), rather than by the addition of exogenous metal cation cofactors (although addition of Zn2+ did improve the hydrolysis in our cases), which makes it more attractive for further cellular application.


The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jan. 18, 2010, is named 308410US.txt and is 462,420 bytes in size.

FIELD OF THE INVENTION

The invention is in the field of virology. More specifically, the invention is in the field of coronaviruses.

BACKGROUND OF THE INVENTION

Severe acute respiratory syndrome (SARS), a worldwide outbreak of atypical pneumonia with an overall mortality rate of about 3 to 6%, has been attributed to a coronavirus following tests of causation according to Koch's postulates, including monkey inoculation (R. Munch, Microbes Infect 5, 69-74, January 2003). The coronaviruses are members of a family of enveloped viruses that replicate in the cytoplasm of animal host cells (B. N. Fields et al., Fields virology, Lippincott Williams & Wilkins, Philadelphia, 4th ed., 2001). They are distinguished by the presence of a single-stranded plus sense RNA genome, approximately 30 kb in length, that has a 5′ cap structure and 3′ polyA tract. Hence the genome is essentially a very large mRNA. Upon infection of an appropriate host cell, the 5′-most open reading frame (ORF) of the viral genome is translated into a large polyprotein that is cleaved by viral-encoded proteases to release several nonstructural proteins including an RNA-dependent RNA polymerase (Pol) and an ATPase helicase (Hel). These proteins in turn are responsible for replicating the viral genome as well as generating nested transcripts that are used in the synthesis of the viral proteins. The mechanism by which these subgenomic mRNAs are made is not fully understood; however, transcription regulating sequences (TRSs) at the 5′ end of each gene may represent signals that regulate the discontinuous transcription of subgenomic mRNAs (sgmRNAs). The TRSs include a partially conserved core sequence (CS) that in some coronaviruses is 5′-CUAAAC-3′. Two major models have been proposed to explain the discontinuous transcription in coronaviruses and arterioviruses (M. M. C. Lai, D. Cavanagh, Adv Virus Res. 48, 1 (1997); S. G. Sawicki, D. L. Sawicki, Adv. Exp. Med Biol. 440, 215 (1998)). The discovery of transcriptionally active, subgenomic-size minus strands containing the antileader sequence and transcription intermediates active in the synthesis of mRNAs (D. L. Sawicki et al., J. Gen Virol 82, 386 (2001); S. G. Sawicki, D. L. Sawicki, J. Virol. 64, 1050 (1990); M. Schaad, R. S. J. Baric, J. Virol. 68, 8169 (1994); P. B. Sethna et al., Proc. Natl. Acad. Sci. U.S.A. 86, 5626 (1989)) favors the model of discontinuous transcription during the minus strand synthesis (S. G. Sawicki, D. L. Sawicki, Adv. Exp. Med Biol. 440, 215 (1998)).

The coronaviral membrane proteins, including the major proteins S (Spike) and M (Membrane), are inserted into the endoplasmic reticulum Golgi intermediate compartment (ERGIC) while full length replicated RNA (+ strands) assemble with the N (nucleocapsid) protein. This RNA-protein complex then associates with the M protein embedded in the membranes of the ER and virus particles form as the nucleocapsid complex buds into the ER. The virus then migrates through the Golgi complex and eventually exits the cell, likely by exocytosis (B. N. Fields et al., Fields virology, Lippincott Williams & Wilkins, Philadelphia, 4th ed., 2001). The site of viral attachment to the host cell resides within the S protein.

The coronaviruses include a large number of viruses that infect different animal species. The predominant diseases associated with these viruses are respiratory and enteric infections, although hepatic and neurological diseases also occur with some viruses. Coronaviruses are divided into three serotypes, Types I, II and III. Phylogenetic analysis of coronavirus sequences also identifies three main classes of these viruses, corresponding to each of the three serotypes. Type II coronaviruses contain a hemagglutinin esterase (HE) gene homologous to that of Influenza C virus. It is presumed that the precursor of the Type II coronaviruses acquired HE as a result of a recombination event within a doubly infected host cell.

In view of the rapid worldwide dissemination of SARS, which has the potential of creating a pandemic, along with its alarming morbidity and mortality rates, it would be useful to have a better understanding of this coronavirus agent at the molecular level to provide diagnostics, vaccines, and therapeutics, and to support public health control measures.

SUMMARY OF THE INVENTION

In general, the invention provides the genomic sequence of a novel coronavirus, the SARS virus, and provides novel nucleic acid molecules encoding novel proteins that may be used, for example, for the diagnosis or therapy of a variety of SARS virus-related disorders.

In one aspect, the invention provides a substantially pure SARS virus nucleic acid molecule or fragment thereof, for example, a genomic RNA or DNA, cDNA, synthetic DNA, or mRNA molecule. In some embodiments, the nucleic acid molecule includes a sequence substantially identical to any of the sequences of SEQ ID NOs: 1-13, 15-18, 20-30, 90-159, 208, 209. In some embodiments, the nucleic acid molecule includes a sequence from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 15 or a fragment of these sequences. In alternative embodiments, the nucleic acid molecule may include a sequence substantially identical to SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 15, or a fragment thereof. In alternative embodiments, the nucleic acid molecule may include a s2m motif (for example, a s2m sequence substantially identical to any of the sequences of SEQ ID NOs: 16, 17, and 18), a leader sequence (for example, a sequence substantially identical to the sequence of SEQ ID NO: 3), or a transcriptional regulatory sequence (for example, a sequence substantially identical to any of the sequences of SEQ ID NOs: 4-13 and 20-30). In alternative embodiments, the nucleic acid molecule includes a sequence substantially identical to any of the sequences of nucleotides 265-13,398; 13,398-21,485; 21,492-25,259; 25,268-26,092; 25,689-26,153; 26,117-26,347; 26,398-27,063; 27,074-27,265; 27,273-27,641; 27,638-27,772; 27,779-27,898; 27,864-28,118; 28,120-29,388; 28,130-28,426; 28,583-28,795; and 29,590-29,621 of SEQ ID NO: 15. In alternative embodiments, the nucleic acid molecule may encode a polyprotein or a polypeptide. In alternative embodiments, the invention provides a nucleic acid molecule including a sequence complementary to a SARS virus nucleotide sequence.

In an alternative aspect, the invention provides a substantially pure SARS virus polypeptide or fragment thereof, for example, a polyprotein, glycoprotein (for example, a matrix glycoprotein that may include a sequence substantially identical to the sequence of SEQ ID NO: 34), a transmembrane protein (for example, a multitransmembrane protein, a type I transmembrane protein, or a type II transmembrane protein), an RNA binding protein, or a viral envelope protein. In alternative embodiments, the invention provides a replicase 1a protein, replicase 1b protein, a spike glycoprotein, a small envelope protein, a matrix glycoprotein, or a nucleocapsid protein. In alternative embodiments, the invention provides a nucleic acid molecule encoding a SARS virus polypeptide. In alternative embodiments, the SARS virus polypeptide includes an identifiable signal sequence (for example, a signal sequence substantially identical to the sequence of SEQ ID NO: 76 or 85), a transmembrane domain (for example, a transmembrane domain substantially identical to any of the sequences of SEQ ID NOs: 77-86), a transmembrane anchor, a transmembrane helix, an ATP-binding domain, a nuclear localization signal, a hydrophilic domain (for example, a hydrophilic domain substantially identical to the sequence of SEQ ID NO: 87), or a lysine-rich sequence (for example, a sequence substantially identical to the sequence of SEQ ID NO: 14). In alternative embodiments, the SARS virus polypeptide may include a sequence substantially identical to any of the sequences of SEQ ID NOs: 14, 33-36, 64-74, and 76-87.

In alternative embodiments, the invention provides a vector (for example, a gene therapy vector or a cloning vector) including a SARS virus nucleic acid molecule (for example, a molecule including a sequence substantially identical to any of the sequences of SEQ ID NOs: 1-13, 15-18, 20-30, 90-159, 208, 209), or a host cell (for example, a mammalian cell, a yeast, a bacterium, or a nematode cell) including the vector.

In alternative embodiments, the invention provides a nucleic acid molecule having substantial nucleotide sequence identity (for example, 30%, 40%, 50%, 60%, 70%, 80%, 90% or 100% complementarity) to a sequence encoding a SARS virus polypeptide or fragment thereof, for example where the fragment includes at least six amino acids, and where the nucleic acid molecule hybridizes under high stringency conditions to at least a portion of a SARS virus nucleic acid molecule.

In alternative embodiments, the invention provides a nucleic acid molecule having substantial nucleotide sequence identity (for example, 30%, 40%, 50%, 60%, 70%, 80%, 90% or 100% complementarity) to a SARS virus nucleotide sequence, for example where the nucleic acid molecule includes at least ten nucleotides, and where the nucleic acid molecule hybridizes under high stringency conditions to at least a portion of a SARS virus nucleic acid molecule.

In alternative embodiments, the invention provides a nucleic acid molecule comprising a sequence that is antisense to a SARS virus nucleic acid molecule, or an antibody (for example, a neutralizing antibody) that specifically binds to a SARS virus polypeptide.

In alternative embodiments, the invention provides a method for detecting a SARS epitope, such as a virion or polypeptide in a sample, by contacting the sample with an antibody that specifically binds a SARS epitope, such as a virus polypeptide, and determining whether the antibody specifically binds to the polypeptide. In alternative embodiments, the invention provides a method for detecting a SARS virus genome, gene, or homolog or fragment thereof in a sample by contacting a SARS virus nucleic acid molecule, for example where the nucleic acid molecule includes at least ten nucleotides, with a preparation of genomic DNA from the sample, under hybridization conditions providing detection of DNA sequences having nucleotide sequence identity to a SARS virus nucleic acid molecule. In alternative embodiments, the invention provides a method of targeting a protein for secretion from a cell, by attaching a signal sequence from a SARS virus polypeptide to the protein, such that the protein is secreted from the cell.

In alternative aspects, the invention provides a method for eliciting an immune response in an animal, by identifying an animal infected with or at risk for infection with a SARS virus and administering a SARS virus polypeptide or fragment thereof, or administering a SARS virus nucleic acid molecule encoding a SARS virus polypeptide or fragment thereof, to the animal. In alternative embodiments, the administering results in the production of an antibody in the mammal, or results in the generation of cytotoxic or helper T-lymphocytes in the mammal.

In alternative embodiments, the invention provides a kit for detecting the presence of a SARS virus nucleic acid molecule or polypeptide in a sample, where the kit includes a SARS virus nucleic acid molecule, or an antibody that specifically binds a SARS virus polypeptide.

In alternative aspects the invention provides a method for treating or preventing a SARS virus infection by identifying an animal (e.g., a human) infected with or at risk for infection with a SARS virus, and administering a SARS virus nucleic acid molecule or polypeptide, or administering a compound that inhibits pathogenicity or replication of a SARS virus, to the animal. In alternative embodiments, the invention provides the use of a SARS virus nucleic acid molecule or polypeptide for treating or preventing a SARS virus infection.

In alternative aspects the invention provides a method of identifying a compound for treating or preventing a SARS virus infection, by contacting a sample including a SARS virus nucleic acid molecule or contacting a SARS virus polypeptide with the compound, where an increase or decrease in the expression or activity of the nucleic acid molecule or the polypeptide identifies a compound for treating or preventing a SARS virus infection.

In alternative aspects the invention provides a vaccine (e.g., a DNA vaccine) including a SARS virus nucleic acid molecule or polypeptide.

In alternative aspects the invention provides a microarray including a plurality of elements, wherein each element includes one or more distinct nucleic acid or amino acid sequences, and where the sequences are selected from a SARS virus nucleic acid molecule or polypeptide, or an antibody that specifically binds a SARS virus nucleic acid molecule or polypeptide.

In alternative aspects the invention provides a computer readable record (e.g., a database) including distinct SARS virus nucleic acid or amino acid sequences.

A “SARS virus” is a virus putatively belonging to the coronavirus family and identified as the causative agent for severe acute respiratory syndrome (SARS). A SARS virus nucleic acid molecule may include a sequence substantially identical to the nucleotide sequences described herein or fragments thereof. A SARS virus polypeptide may include a sequence substantially identical to a sequence encoded by a SARS virus nucleic acid molecule, or may include a sequence substantially identical to the polypeptide sequences described herein, or fragments thereof.

A compound is “substantially pure” when it is separated from the components that naturally accompany it. Typically, a compound is substantially pure when it is at least 60%, more generally 75% or over 90%, by weight, of the total material in a sample. Thus, for example, a polypeptide that is chemically synthesized or produced by recombinant technology will generally be substantially free from its naturally associated components. A nucleic acid molecule may be substantially pure when it is not immediately contiguous with (i.e., covalently linked to) the coding sequences with which it is normally contiguous in the naturally occurring genome of the organism from which the DNA of the invention is derived. A nucleic acid molecule may also be substantially pure when it is isolated from the organism in which it is normally found. A substantially pure compound can be obtained, for example, by extraction from a natural source, by expression of a recombinant nucleic acid molecule encoding a polypeptide compound, or by chemical synthesis. Purity can be measured using any appropriate method such as column chromatography, gel electrophoresis, HPLC, etc.

A “substantially identical” sequence is an amino acid or nucleotide sequence that differs from a reference sequence only by one or more conservative substitutions, as discussed herein, or by one or more non-conservative substitutions, deletions, or insertions located at positions of the sequence that do not destroy the biological function of the amino acid or nucleic acid molecule. Such a sequence can be at least 10%, 20%, 30%, 40%, 50%, 52.5%, 55% or 60% or 75%, or more generally at least 80%, 85%, 90%, or 95%, or as much as 99% or 100% identical at the amino acid or nucleotide level to the sequence used for comparison using, for example, the Align Program (Myers and Miller, CABIOS, 1989, 4:11-17) or FASTA. For polypeptides, the length of comparison sequences may be at least 4, 5, 10, or 15 amino acids, or at least 20, 25, or 30 amino acids. In alternate embodiments, the length of comparison sequences may be at least 35, 40, or 50 amino acids, or over 60, 80, or 100 amino acids. For nucleic acid molecules, the length of comparison sequences may be at least 15, 20, or 25 nucleotides, or at least 30, 40, or 50 nucleotides. In alternate embodiments, the length of comparison sequences may be at least 60, 70, 80, or 90 nucleotides, or over 100, 200, or 500 nucleotides. Sequence identity can be readily measured using publicly available sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, or BLAST software available from the National Library of Medicine, or as described herein). Examples of useful software include the programs Pile-up and PrettyBox. Such software matches similar sequences by assigning degrees of homology to various substitutions, deletions, insertions, and other modifications.
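
The notion of percent identity used above may be illustrated, purely as a sketch, by the following Python fragment. It is not the method of the Align Program, FASTA, or BLAST, which also perform the alignment and scoring; it assumes that the two sequences have already been aligned to equal length, with "-" marking gaps, and it counts gap columns as mismatches, which is only one of several possible conventions. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption (illustrative): gap columns ('-') count as mismatches and
    every aligned column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT", "ACGA"))  # three of four positions match
```

Under this convention, two four-residue sequences differing at one position are 75% identical.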

Alternatively, or additionally, two nucleic acid sequences may be “substantially identical” if they hybridize under high stringency conditions. In some embodiments, high stringency conditions are, for example, conditions that allow hybridization comparable with the hybridization that occurs using a DNA probe of at least 500 nucleotides in length, in a buffer containing 0.5 M NaHPO4, pH 7.2, 7% SDS, 1 mM EDTA, and 1% BSA (fraction V), at a temperature of 65° C., or a buffer containing 48% formamide, 4.8×SSC, 0.2 M Tris-Cl, pH 7.6, 1× Denhardt's solution, 10% dextran sulfate, and 0.1% SDS, at a temperature of 42° C. (These are typical conditions for high stringency northern or Southern hybridizations.) Hybridizations may be carried out over a period of about 20 to 30 minutes, or about 2 to 6 hours, or about 10 to 15 hours, or over 24 hours or more. High stringency hybridization is also relied upon for the success of numerous techniques routinely performed by molecular biologists, such as high stringency PCR, DNA sequencing, single strand conformational polymorphism analysis, and in situ hybridization. In contrast to northern and Southern hybridizations, these techniques are usually performed with relatively short probes (e.g., usually about 16 nucleotides or longer for PCR or sequencing and about 40 nucleotides or longer for in situ hybridization). The high stringency conditions used in these techniques are well known to those skilled in the art of molecular biology, and examples of them can be found, for example, in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y., 1998, which is hereby incorporated by reference.

The terms “nucleic acid” or “nucleic acid molecule” encompass both RNA (plus and minus strands) and DNA, including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. The nucleic acid may be double-stranded or single-stranded. Where single-stranded, the nucleic acid may be the sense strand or the antisense strand. A nucleic acid molecule may be any chain of two or more covalently bonded nucleotides, including naturally occurring or non-naturally occurring nucleotides, or nucleotide analogs or derivatives. By “RNA” is meant a sequence of two or more covalently bonded, naturally occurring or modified ribonucleotides. One example of a modified RNA included within this term is phosphorothioate RNA. By “DNA” is meant a sequence of two or more covalently bonded, naturally occurring or modified deoxyribonucleotides. By “cDNA” is meant complementary or copy DNA produced from an RNA template by the action of RNA-dependent DNA polymerase (reverse transcriptase). Thus a “cDNA clone” means a duplex DNA sequence complementary to an RNA molecule of interest, carried in a cloning vector.

An “isolated nucleic acid” is a nucleic acid molecule that is free of the nucleic acid molecules that normally flank it in the genome or that is free of the organism in which it is normally found. Therefore, an “isolated” gene or nucleic acid molecule is in some cases intended to mean a gene or nucleic acid molecule which is not flanked by nucleic acid molecules which normally (in nature) flank the gene or nucleic acid molecule (such as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (as in a cDNA or RNA library). In some cases, an isolated nucleic acid molecule is intended to mean the genome of an organism such as a virus. An isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC. The term therefore includes, e.g., a genome, a recombinant nucleic acid incorporated into a vector, such as an autonomously replicating plasmid or virus or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes a recombinant nucleic acid which is part of a hybrid gene encoding additional polypeptide sequences. Preferably, an isolated nucleic acid comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular species present. Thus, an isolated gene or nucleic acid molecule can include a gene or nucleic acid molecule which is synthesized chemically or by recombinant means. Recombinant DNA contained in a vector is included in the definition of “isolated” as used herein. Also, isolated nucleic acid molecules include recombinant DNA molecules in heterologous host cells, as well as partially or substantially purified DNA molecules in solution. In vivo and in vitro RNA transcripts of the DNA molecules of the present invention are also encompassed by “isolated” nucleic acid molecules. Such isolated nucleic acid molecules are useful in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g., from other species), for gene mapping (e.g., by in situ hybridization with chromosomes), or for detecting expression of the nucleic acid molecule in tissue (e.g., human tissue, such as peripheral blood), such as by Northern blot analysis.

Various genes and nucleic acid sequences of the invention may be recombinant sequences. The term “recombinant” means that something has been recombined, so that when made in reference to a nucleic acid construct the term refers to a molecule that is comprised of nucleic acid sequences that are joined together or produced by means of molecular biological techniques. The term “recombinant” when made in reference to a protein or a polypeptide refers to a protein or polypeptide molecule which is expressed using a recombinant nucleic acid construct created by means of molecular biological techniques. The term “recombinant” when made in reference to genetic composition refers to a gamete or progeny with new combinations of alleles that did not occur in the parental genomes. Recombinant nucleic acid constructs may include a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to which it is ligated at a different location in nature. Referring to a nucleic acid construct as “recombinant” therefore indicates that the nucleic acid molecule has been manipulated using genetic engineering, i.e. by human intervention. Recombinant nucleic acid constructs may for example be introduced into a host cell by transformation. Such recombinant nucleic acid constructs may include sequences derived from the same host cell species or from different host cell species, which have been isolated and reintroduced into cells of the host species. Recombinant nucleic acid construct sequences may become integrated into a host cell genome, either as a result of the original transformation of the host cells, or as the result of subsequent recombination and/or repair events.

As used herein, “heterologous” in reference to a nucleic acid or protein is a molecule that has been manipulated by human intervention so that it is located in a place other than the place in which it is naturally found. For example, a nucleic acid sequence from one species may be introduced into the genome of another species, or a nucleic acid sequence from one genomic locus may be moved to another genomic or extrachromosomal locus in the same species. A heterologous protein includes, for example, a protein expressed from a heterologous coding sequence or a protein expressed from a recombinant gene in a cell that would not naturally express the protein.

By “antisense,” as used herein in reference to nucleic acids, is meant a nucleic acid sequence that is complementary to one strand of a nucleic acid molecule. In some embodiments, an antisense sequence is complementary to the coding strand of a gene, preferably, a SARS virus gene. The preferred antisense nucleic acid molecule is one which is capable of lowering the level of polypeptide encoded by the complementary gene when both are expressed in a cell. In some embodiments, the polypeptide level is lowered by at least 10%, or at least 25%, or at least 50%, as compared to the polypeptide level in a cell expressing only the gene, and not the complementary antisense nucleic acid molecule.
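
By way of illustration only, the complementarity that defines an antisense sequence may be sketched in Python as a reverse complement of a DNA strand written 5′ to 3′. The function name `reverse_complement` is hypothetical, and the sketch assumes plain A, C, G, T input without modified or ambiguous bases; it says nothing about whether a given antisense molecule would lower polypeptide levels in a cell.

```python
# Translation table for Watson-Crick complements of unmodified DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence given 5' to 3'.

    Assumption (illustrative): input contains only A, C, G, T in either
    case; any other character is passed through unchanged.
    """
    return dna.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original sequence, reflecting that the two strands are complementary to each other.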

A “probe” or “primer” is a single-stranded DNA or RNA molecule of defined sequence that can base pair to a second DNA or RNA molecule that contains a complementary sequence (the target). The stability of the resulting hybrid molecule depends upon the extent of the base pairing that occurs, and is affected by parameters such as the degree of complementarity between the probe and target molecule, and the degree of stringency of the hybridization conditions. The degree of hybridization stringency is affected by parameters such as the temperature, salt concentration, and concentration of organic molecules, such as formamide, and is determined by methods that are known to those skilled in the art. Probes or primers specific for SARS virus nucleic acid sequences or molecules may vary in length from at least 8 nucleotides to over 500 nucleotides, including any value in between, depending on the purpose for which, and conditions under which, the probe or primer is used. For example, a probe or primer may be 8, 10, 15, 20, or 25 nucleotides in length, or may be at least 30, 40, 50, or 60 nucleotides in length, or may be over 100, 200, 500, or 1000 nucleotides in length. Probes or primers specific for SARS virus nucleic acid molecules may have greater than 20-30% sequence identity, or at least 55-75% sequence identity, or at least 75-85% sequence identity, or at least 85-99% sequence identity, or 100% sequence identity to the nucleic acid sequences described herein.
In various embodiments of the invention, probes having the sequences: 5′-ATg AAT TAC CAA gTC AAT ggT TAC-3′, SEQ ID NO: 160 5′-gAA gCT ATT CgT CAC gTT Cg-3′, SEQ ID NO: 161 5′-CTg TAg AAA ATC CTA gCT ggA g-3′, SEQ ID NO: 162 5′-CAT AAC CAg TCg gTA CAg CTA-3′, SEQ ID NO: 163 5′-TTA TCA CCC gCgAAg AAg CT-3′, SEQ ID NO: 164 5′-CTC TAg TTg CATGAC AgC CCT C-3′, SEQ ID NO: 165 5′-TCg TgC gTg gAT TggCTT TgA TgT-3′, SEQ ID NO: 166 5′-ggg TTg ggA CTA TCC TAA gTg TgA-3′, SEQ ID NO: 167 5′-TAA CAC ACA AAC ACC ATC ATC A-3′, SEQ ID NO: 168 5′-ggT Tgg gAC TAT CCT AAg TgT gA-3′, SEQ ID NO: 169 5′-CCA TCA TCA gAT AgA ATC ATC ATA-3′, SEQ ID NO: 170 5′-CCT CTC TTg TTC TTg CTC gCA-3′, SEQ ID NO: 171 5′-TAT AgT gAg CCg CCA CAC Atg-3′, SEQ ID NO: 172 5′-TAACACACAACICCATCATCA-3′, SEQ ID NO: 173 5′-CTAACATGCTTAGGATAATGG-3′, SEQ ID NO: 174 5′-GCCTCTCTTGTTCTTGCTCGC-3′, SEQ ID NO: 175 5′-CAGGTAAGCGTAAAACTCATC-3′, SEQ ID NO: 176 5′-TACACACCTCAGCGTTG-3′, SEQ ID NO: 177 5′-CACGAACGTGACGAAT-3′, SEQ ID NO: 178 5′-GCCGGAGCTCTGCAGAATTC-3′, SEQ ID NO: 179 5′-CAGGAAACAGCTATGAC TTGCATCACCACTAGTTGTGCCACCAGGTT-3′, SEQ ID NO: 180 5′-TGTAAAACGACGGCCAGTTGATGGGATGGGACTATCCTAAGTGTGA-3′, SEQ ID NO: 181 5′-GCATAGGCAGTAGTTGCATC-3′, SEQ ID NO: 182, as well as sequences amplified by specific combinations of these probes, may be excluded from specific uses according to the invention. Probes can be detectably-labeled, either radioactively or non-radioactively, by methods that are known to those skilled in the art. Probes can be used for methods involving nucleic acid hybridization, such as nucleic acid sequencing, nucleic acid amplification by the polymerase chain reaction, single stranded conformational polymorphism (SSCP) analysis, restriction fragment polymorphism (RFLP) analysis, Southern hybridization, northern hybridization, in situ hybridization, electrophoretic mobility shift assay (EMSA), and other methods that are known to those skilled in the art.

By “complementary” is meant that two nucleic acid molecules, e.g., DNA or RNA, contain a sufficient number of nucleotides that are capable of forming Watson-Crick base pairs to produce a region of double-strandedness between the two nucleic acids. Thus, adenine in one strand of DNA or RNA pairs with thymine in an opposing complementary DNA strand or with uracil in an opposing complementary RNA strand. It will be understood that each nucleotide in a nucleic acid molecule need not form a matched Watson-Crick base pair with a nucleotide in an opposing complementary strand to form a duplex.
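By way of illustration only, the Watson-Crick pairing described above can be sketched in a few lines of Python. The function names `reverse_complement` and `paired_fraction` are hypothetical helpers introduced here, not part of the disclosure; the probe used is SEQ ID NO: 178 as listed above, and only DNA bases are handled.

```python
# Watson-Crick partners for DNA bases (A pairs with T, G pairs with C).
_DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    cleaned = seq.replace(" ", "").upper()
    return "".join(_DNA_PAIRS[base] for base in reversed(cleaned))


def paired_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that form Watson-Crick pairs when two
    equal-length strands are aligned antiparallel (no gaps considered)."""
    a = strand_a.replace(" ", "").upper()
    b = strand_b.replace(" ", "").upper()[::-1]  # read strand_b 3' to 5'
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    matches = sum(1 for x, y in zip(a, b) if _DNA_PAIRS.get(x) == y)
    return matches / len(a)


probe = "CACGAACGTGACGAAT"           # SEQ ID NO: 178
target = reverse_complement(probe)   # perfectly complementary strand
print(target)
print(paired_fraction(probe, target))
```

A value of `paired_fraction` below 1.0 corresponds to the case noted above in which not every nucleotide forms a matched pair; whether such a duplex is stable depends on the hybridization conditions, which this sketch does not model.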

By “vector” is meant a DNA molecule derived, e.g., from a plasmid, bacteriophage, mammalian or insect virus, or artificial chromosome, that may be used to introduce a polypeptide, for example a SARS virus polypeptide, into a host cell by means of replication or expression of an operably linked heterologous nucleic acid molecule. By “operably linked” is meant that a nucleic acid molecule such as a gene and one or more regulatory sequences (e.g., promoters, ribosomal binding sites, and terminators in prokaryotes; promoters, terminators, and enhancers in eukaryotes; leader sequences; etc.) are connected in such a way as to permit the desired function, e.g., gene expression, when the appropriate molecules (e.g., transcriptional activator proteins) are bound to the regulatory sequences. A vector may contain one or more unique restriction sites and may be capable of autonomous replication in a defined host or vehicle organism such that the cloned sequence is reproducible. By “DNA expression vector” is meant any autonomous element capable of directing the synthesis of a recombinant peptide. Such DNA expression vectors include bacterial plasmids and phages and mammalian and insect plasmids and viruses. A “shuttle vector” is understood as meaning a vector which can be propagated in at least two different cell types or organisms, for example a vector which is first propagated or replicated in prokaryotes for subsequent transfection into eukaryotic cells. A “replicon” is a unit that is capable of autonomous replication in a cell and may include plasmids, chromosomes (e.g., mini-chromosomes), cosmids, viruses, etc. A replicon may be a vector.

A “host cell” is any cell, including a prokaryotic or eukaryotic cell, into which a replicon, such as a vector, has been introduced by for example transformation, transfection, or infection.

An “open reading frame” or “ORF” is a nucleic acid sequence that encodes a polypeptide. An ORF may include a coding sequence, i.e., a sequence that is capable of being transcribed into mRNA and/or translated into a protein when combined with the appropriate regulatory sequences. In general, a coding sequence includes a 5′ translation start codon and a 3′ translation stop codon.
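As a minimal sketch of the start-codon-to-stop-codon definition above, the following Python function scans the three forward reading frames of a DNA string. It assumes the standard ATG start codon and the TAA, TAG, and TGA stop codons; the function name `find_orfs` and the toy input are hypothetical and are not taken from the SARS genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end, frame) tuples for ATG...stop stretches.

    start and end are 1-based and inclusive; end includes the stop codon.
    Only the given strand is scanned (three forward frames)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


toy = "CCATGGCTTGATAA"  # hypothetical sequence, for illustration only
print(find_orfs(toy))
```

Real ORF prediction typically also scans the reverse strand and applies a minimum length; those choices are left out here to keep the sketch short.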

A “leader sequence” is a relatively short nucleotide sequence located at the 5′ end of an RNA molecule that acts as a primer for transcription.

A “transcriptional regulatory sequence,” “TRS,” or “intergenic sequence” is a nucleotide sequence that lies upstream of an open reading frame (ORF) and serves as a template for the reassociation of a nascent RNA strand-polymerase complex.

A “frameshift mutation” is caused by a shift in an open reading frame, generally due to a deletion or addition of at least one nucleotide, such that an alternative polypeptide is ultimately translated.

By “detectably labeled” is meant any means for marking and identifying the presence of a molecule, e.g., an oligonucleotide probe or primer, a gene or fragment thereof, a cDNA molecule, a polypeptide, or an antibody. Methods for detectably-labeling a molecule are well known in the art and include, without limitation, radioactive labeling (e.g., with an isotope such as ³²P or ³⁵S) and nonradioactive labeling such as enzymatic labeling (for example, using horseradish peroxidase or alkaline phosphatase), chemiluminescent labeling, fluorescent labeling (for example, using fluorescein), bioluminescent labeling, antibody detection of a ligand attached to the probe, or detection of double-stranded nucleic acid. Also included in this definition is a molecule that is detectably labeled by an indirect means, for example, a molecule that is bound with a first moiety (such as biotin) that is, in turn, bound to a second moiety that may be observed or assayed (such as fluorescein-labeled streptavidin). Labels also include digoxigenin, luciferases, and aequorin.

A “peptide,” “protein,” “polyprotein” or “polypeptide” is any chain of two or more amino acids, including naturally occurring or non-naturally occurring amino acids or amino acid analogues, regardless of post-translational modification (e.g., glycosylation or phosphorylation). A “polyprotein”, “polypeptide”, “peptide” or “protein” of the invention may include peptides or proteins that have abnormal linkages, cross links and end caps, non-peptidyl bonds or alternative modifying groups. Such modified peptides are also within the scope of the invention. The term “modifying group” is intended to include structures that are directly attached to the peptidic structure (e.g., by covalent coupling), as well as those that are indirectly attached to the peptidic structure (e.g., by a stable non-covalent association or by covalent coupling to additional amino acid residues, or mimetics, analogues or derivatives thereof, which may flank the core peptidic structure). For example, the modifying group can be coupled to the amino-terminus or carboxy-terminus of a peptidic structure, or to a peptidic or peptidomimetic region flanking the core domain. Alternatively, the modifying group can be coupled to a side chain of at least one amino acid residue of a peptidic structure, or to a peptidic or peptidomimetic region flanking the core domain (e.g., through the epsilon amino group of a lysyl residue(s), through the carboxyl group of an aspartic acid residue(s) or a glutamic acid residue(s), through a hydroxy group of a tyrosyl residue(s), a serine residue(s) or a threonine residue(s) or other suitable reactive group on an amino acid side chain). Modifying groups covalently coupled to the peptidic structure can be attached by means and using methods well known in the art for linking chemical structures, including, for example, amide, alkylamino, carbamate or urea bonds.

A “polyprotein” is the polypeptide that is initially translated from the genome of a plus-stranded RNA virus, for example, a SARS virus. Accordingly, a polyprotein has not been subjected to post-translational processing by proteolytic cleavage into its processed protein products, and therefore, retains its cleavage sites. In some embodiments of the invention, the protease cleavage sites of a polyprotein may be modified, for example, by amino acid substitution, to result in a polyprotein that is incapable of being cleaved into its processed protein products.

An antibody “specifically binds” or “selectively binds” an antigen when it recognizes and binds the antigen, but does not substantially recognize and bind other molecules in a sample, having for example an affinity for the antigen which is 10, 100, 1000 or 10000 times greater than the affinity of the antibody for another reference molecule in a sample. A “neutralizing antibody” is an antibody that selectively interferes with any of the biological activities of a SARS virus polypeptide or polyprotein, for example, replication of the SARS virus, or infection of host cells. A neutralizing antibody may reduce the ability of a SARS virus polypeptide to carry out its specific biological activity by about 50%, or by about 70%, or by about 90% or more, or may completely abolish the ability of a SARS virus polypeptide to carry out its specific biological activity. Any standard assay for the biological activity of any SARS virus polypeptide, for example, assays determining expression levels, ability to infect host cells, or ability to replicate DNA, including those assays described herein or known to those of skill in the art, may be used to assess potentially neutralizing antibodies that are specific for SARS virus polypeptides.

A “signal sequence” is a sequence of amino acids that may be identified, for example by homology or biological activity to a peptide sequence with the known function of targeting a polypeptide to a particular region of the cell. A signal sequence or signal peptide may be a peptide of any length, that is capable of targeting a polypeptide to a particular region of the cell. In some embodiments, the signal sequence may direct the polypeptide to the cellular membrane so that the polypeptide may be secreted. In alternate embodiments, the signal sequence may direct the polypeptide to an intracellular compartment or organelle, such as the Golgi apparatus, or to the surface of a virus, such as the SARS virus. In alternate embodiments, a signal sequence may range from about 13 or 15 amino acids in length to about 60 amino acids in length.

A “transmembrane protein” is an amphipathic protein having a hydrophobic region (“transmembrane domain”) that spans the lipid bilayer of the cell membrane from the cytoplasm to the cell surface, or spans the viral envelope, interspersed between hydrophilic regions on both sides of the membrane. The number of hydrophobic regions in an amphipathic protein is often proportional to the number of times that the protein spans the lipid bilayer. Thus, a single transmembrane protein spans the lipid bilayer once, and has a single transmembrane domain, while a multi-transmembrane protein spans the lipid bilayer multiple times. Multi-transmembrane proteins may enable virus entry into a host cell, or act to initiate transduction of a signal from the cell surface to the interior of the cell, for example, by a conformational change upon ligand binding. A “transmembrane anchor” is a transmembrane domain that maintains a polypeptide in its position in the cell membrane or viral envelope and is generally hydrophobic. A transmembrane anchor may generally be in the structure of an alpha helix, i.e., a “transmembrane helix”. Multi-transmembrane proteins may have multiple transmembrane alpha-helices.

A “nuclear localization signal” is an amino acid sequence that permits the entry of a polypeptide into the nucleus of a cell through nuclear pores. A nuclear localization signal generally has a cluster of positively charged residues, for example, lysines. A “lysine-rich sequence” is a sequence having at least two contiguous lysine residues, or at least three contiguous lysine residues. In some embodiments, a lysine-rich sequence may be a nuclear localization signal.
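The definition of a lysine-rich sequence above (at least two, or at least three, contiguous lysines) lends itself to a simple pattern search. The following is a minimal sketch; the helper name `lysine_runs` and the example peptide are hypothetical, and finding such a run does not by itself establish nuclear localization activity.

```python
import re


def lysine_runs(protein: str, min_run: int = 2):
    """Return (start, end, run) for each stretch of at least `min_run`
    contiguous lysines (K); positions are 1-based and inclusive."""
    pattern = re.compile("K{%d,}" % min_run)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(protein.upper())]


peptide = "MAKKRPKKKLV"  # hypothetical peptide, for illustration only
print(lysine_runs(peptide, min_run=2))
print(lysine_runs(peptide, min_run=3))
```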

An “ATP binding domain” is a consensus domain that is found in many ATP or GTP-binding proteins, and that forms a flexible loop (P-loop) between alpha-helical and beta pleated sheet domains. The general consensus for an ATP binding domain may be (A or G)-XXXXGK-(S or T).
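The consensus (A or G)-XXXXGK-(S or T) can be written as a regular expression, taking X to mean any residue. The sketch below is illustrative only: the name `find_p_loops` and the test peptide are hypothetical, and a pattern match indicates a candidate site rather than demonstrated ATP or GTP binding.

```python
import re

# (A or G), four residues of any kind, then G-K, then (S or T).
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (1-based start, matched segment) for each non-overlapping
    match of the P-loop consensus in a one-letter amino acid string."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein.upper())]


peptide = "MSGPAGSGKSTLL"  # hypothetical peptide, for illustration only
print(find_p_loops(peptide))
```

Because `finditer` reports non-overlapping matches, overlapping candidate sites would require a lookahead-based pattern instead.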

An “RNA binding protein” is a protein that is capable of binding to an RNA molecule (see, for example, “RNA Binding Proteins: New Concepts in Gene Regulation” 1st ed, eds. K. Sandberg and S. E. Mulroney, Kluwer Academic Publishers, 2001). RNA binding proteins may contain common structural features such as arginine-rich tracts, for example, arginines alternating with aspartates, serines, or glycines, or zinc finger regions. RNA binding proteins may also have a common ribonucleotide sequence domain. RNA binding proteins are believed to play diverse roles in modulating post-transcriptional gene expression.

An “immune response” includes, but is not limited to, one or more of the following responses in a mammal: induction of antibodies, B cells, T cells (including helper T cells, suppressor T cells, cytotoxic T cells, γδ T cells) directed specifically to the antigen(s) in a composition or vaccine, following administration of the composition or vaccine. An immune response to a composition or vaccine thus generally includes the development in the host mammal of a cellular and/or antibody-mediated response to the composition or vaccine of interest. In general, the immune response will result in prevention or reduction of infection by a SARS virus.

An “immunogenic fragment” of a polypeptide or nucleic acid molecule refers to an amino acid or nucleotide sequence that elicits an immune response. Thus, an immunogenic fragment may include, without limitation, any portion of any of the SARS virus sequences described herein, or a sequence substantially identical thereto, that includes one or more epitopes (the antigenic determinant, i.e., the site recognized by a specific immune system cell, such as a T cell or a B cell). An “epitope” may include amino acids in a spatial orientation such that they are non-contiguous in the amino acid sequence but are near each other due to the three dimensional conformation of the polypeptide. An epitope may include at least 3, 5, 8, or 10 or more amino acids. Immunogenic fragments or epitopes may be identified using standard methods known to those of skill in the art, such as epitope mapping techniques or antigenicity or hydropathy plots using, for example, the Omiga version 1.0 program from Oxford Molecular Group (see, for example, U.S. Pat. No. 4,708,871). Immunogenic fragments or epitopes may also be identified using methods for determining three dimensional molecule structure such as X-ray crystallography or nuclear magnetic resonance.

A “sample” may be a tissue biopsy, amniotic fluid, cell, blood, serum, plasma, urine, stool, sputum, conjunctiva, or any other specimen, or any extract thereof, obtained from a patient (human or animal), test subject, or experimental animal. A “sample” may also be a cell or cell line created under experimental conditions, and constituents thereof (such as cell culture supernatants, cell fractions, infected cells, etc.). The sample may be analyzed to detect the presence of a SARS virus gene, genome, polypeptide, nucleic acid molecule or virion, or to detect a mutation in a SARS virus gene, expression levels of a SARS virus gene or polypeptide, or the biological function of a SARS virus polypeptide, using methods that are known in the art. For example, methods such as sequencing, single-strand conformational polymorphism (SSCP) analysis, or restriction fragment length polymorphism (RFLP) analysis of PCR products derived from a sample can be used to detect a mutation in a SARS virus gene; ELISA or western blotting can be used to measure levels of SARS virus polypeptide or antibody affinity; northern blotting can be used to measure SARS mRNA levels; or PCR can be used to measure the level of a SARS virus nucleic acid molecule.

Other features and advantages of the invention will be apparent from the following description of the drawings and the invention, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-D show phylogenetic analyses of SARS proteins. Unrooted phylogenetic trees were generated by clustalw (Thompson, J. D. et al., Nucleic Acids Res 22, 4673-80, Nov. 11, 1994) bootstrap analysis using 1000 iterations. Genbank accessions for protein sequences are as follows: FIG. 1A: Replicase 1A: BoCov (Bovine Coronavirus): AAL40396, 229E (Human Coronavirus): NP07355, MHV (Mouse Hepatitis Virus): NP045298, AIBV (Avian Infectious bronchitis virus): CAC39113, TGEV (Transmissible Gastroenteritis Virus): NP058423. FIG. 1B: Matrix Glycoprotein: PHEV (Porcine hemagglutinating encephalomyelitis virus): AAL80035, BoCov (Bovine Coronavirus): NP150082, AIBV & AIBV2 (Avian infectious bronchitis virus): AAF35863 & AAK83027, MHV (Mouse hepatitis virus): AAF36439, TGEV (Transmissible gastroenteritis virus): NP058427, 229E & OC43 (Human Coronavirus): NP073555 & AAA45462, FCV (Feline coronavirus): BAC01160. FIG. 1C: Nucleocapsid: MHV (Mouse hepatitis virus): P18446, BoCov (Bovine coronavirus): NP150083, AIBV (Avian infectious bronchitis virus): AAK27162, FCV (Feline coronavirus): CAA74230, PTGV (Porcine transmissible gastroenteritis virus): AAM97563, 229E & OC43 (Human coronavirus): NP073556 & P33469, PHEV (porcine hemagglutinating encephalomyelitis virus): AAL80036, TCV (Turkey coronavirus): AAF23873. FIG. 1D: S (Spike) Protein: BoCov (Bovine coronavirus): AAL40400, MHV (Mouse hepatitis virus): P11225, OC43 & 229E (Human coronavirus): S44241 & AAK32191, PHEV (Porcine hemagglutinating encephalomyelitis virus): AAL80031, PRC (Porcine respiratory coronavirus): AAA46905, PEDV (Porcine epidemic diarrhea virus): CAA80971, CCov (Canine coronavirus): S41453, FICV (Feline infectious peritonitis virus): BAA06805, AIBV (Avian infectious bronchitis virus): AAO34396.

FIG. 2 shows a schematic representation of the ORFs and s2m motif in the 29,736-base SARS virus genome.

FIGS. 3A-P show nucleotide sequences of the 29,736-base genome of the SARS virus (SEQ ID NOs: 1 and 2).

FIG. 4 shows an alignment of the s2m regions from Avian infectious bronchitis virus (AIBV SEQ ID NO: 32) and equine rhinovirus serotype 2 (ERV-2 SEQ ID NO: 31) with the 3′ untranslated region (UTR SEQ ID NO: 18) of the SARS virus (TOR2). The conserved areas in the s2m region are indicated by asterisks.

FIG. 5 shows the amino acid sequence of the SARS virus S (Spike) Glycoprotein (SEQ ID NO: 33).

FIG. 6 shows the amino acid sequence of the SARS virus M (Matrix) Glycoprotein (residues 1-220 of SEQ ID NO: 34).

FIG. 7 shows the amino acid sequence of the SARS virus E (Small envelope) protein (SEQ ID NO: 35).

FIG. 8 shows the amino acid sequence of the SARS virus N (Nucleocapsid) Protein (SEQ ID NO: 36).

FIG. 9 shows an alignment of the matrix glycoprotein M from the SARS virus (Tor2_M or ORF5 SEQ ID NO: 34) and various other matrix glycoproteins (SEQ ID NOs: 37-43). Asterisks (*) indicate percentage identity to the SARS matrix protein as calculated by Align (Myers and Miller, CABIOS (1989) 4:11-17).

FIGS. 10A-B show an alignment of the nucleocapsid protein N from the SARS virus (Tor2_N SEQ ID NO: 36) and various other nucleocapsid proteins (SEQ ID NOs: 44-52 and SEQ ID NO: 199 of AIBV2 nucleocapsid protein [Avian infectious bronchitis virus 2]). Asterisks (*) indicate percentage identity to the SARS nucleocapsid protein calculated by Align (Myers and Miller, CABIOS (1989) 4:11-17).

FIGS. 11A-K show the nucleotide sequence of the 29,751-base genome of the SARS virus (SEQ ID NO: 15).

FIG. 12 shows a schematic representation of the ORFs and s2m motif in the 29,751-base SARS virus genome.

FIGS. 13A-D show phylogenetic analyses of SARS proteins. Unrooted phylogenetic trees were generated by clustalw 1.74 (J. D. Thompson, D. G. Higgins, T. J. Gibson, Nucleic Acids Res 22, 4673-80 (Nov. 11, 1994)) using the BLOSUM comparison matrix and a bootstrap analysis of 1000 iterations. Numbers indicate bootstrap replicates supporting each node. Phylogenetic trees were drawn with the Phylip Drawtree program 3.6a3 (Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle). Branch lengths indicate the number of substitutions per residue. Genbank accessions for protein sequences: A: Replicase 1A: BoCoV (Bovine Coronavirus): AAL40396, HCoV-229E (Human Coronavirus): NP07355, MHV (Mouse Hepatitis Virus): NP045298, IBV (Avian Infectious bronchitis virus): CAC39113, TGEV (Transmissible Gastroenteritis Virus): NP058423. B: Membrane Glycoprotein: PHEV (Porcine hemagglutinating encephalomyelitis virus): AAL80035, BoCoV (Bovine Coronavirus): NP150082, IBV & IBV2 (Avian infectious bronchitis virus): AAF35863 & AAK83027, MHV (Mouse hepatitis virus): AAF36439, TGEV (Transmissible gastroenteritis virus): NP058427, HCoV-229E & HCoV-OC43 (Human Coronavirus): NP073555 & AAA45462, FCoV (Feline coronavirus): BAC01160. C: Nucleocapsid: MHV (Mouse hepatitis virus): P18446, BoCoV (Bovine coronavirus): NP150083, IBV 1 & 2 (Avian infectious bronchitis virus): AAK27162 & NP040838, FCoV (Feline coronavirus): CAA74230, PTGV (Porcine transmissible gastroenteritis virus): AAM97563, HCoV-229E & HCoV-OC43 (Human coronavirus): NP073556 & P33469, PHEV (porcine hemagglutinating encephalomyelitis virus): AAL80036, TCV (Turkey coronavirus): AAF23873. D: S (Spike) Protein: BoCoV (Bovine coronavirus): AAL40400, MHV (Mouse hepatitis virus): P11225, HCoV-OC43 & HCoV-229E (Human coronavirus): S44241 & AAK32191, PHEV (Porcine hemagglutinating encephalomyelitis virus): AAL80031, PRCOV (Porcine respiratory coronavirus): AAA46905, PEDV (Porcine epidemic diarrhea virus): CAA80971, CCoV (Canine coronavirus): S41453, FIPV (Feline infectious peritonitis virus): BAA06805, IBV (Avian infectious bronchitis virus): AAO34396.

FIGS. 14A-F show an alignment of the spike glycoprotein S from the SARS virus (Tor2_S SEQ ID NO: 33) and various other spike glycoproteins (SEQ ID NOs: 53-62). Asterisks (*) indicate percentage identity to the SARS spike protein as calculated by Align (Myers and Miller, CABIOS (1989) 4:11-17).

FIG. 15 shows an alignment between the SARS virus Small envelope protein E (TOR2_E SEQ ID NO: 35) and the Envelope protein (Protein 4) (X1 protein) (ORF 3) from Porcine transmissible gastroenteritis coronavirus (strain Purdue), Swissprot accession number P09048 (PGV SEQ ID NO: 63), as calculated by FASTA (world wide web at ebi “dot” ac “dot” uk “forward slash” fasta33).

FIGS. 16A-B show the amino acid sequence of the SARS virus Replicase 1A protein (SEQ ID NO: 64).

FIG. 17 shows the amino acid sequence of the SARS virus Replicase 1B protein (SEQ ID NO: 65).

FIG. 18 shows the amino acid sequence of ORF3 of SARS virus (SEQ ID NO: 66).

FIG. 19 shows the amino acid sequence of ORF4 of SARS virus (SEQ ID NO: 67).

FIG. 20 shows the amino acid sequence (SEQ ID NO: 68) of ORF6 (nucleotides 27059-27247 of the 29,736-base genome sequence) or ORF 7 (nucleotides 27,074-27,265 of the 29,751-base genome sequence) of SARS virus.

FIG. 21 shows the amino acid sequence (SEQ ID NO: 69) of ORF7 (nucleotides 27258-27623 of the 29,736-base genome sequence) or ORF 8 (nucleotides 27,273-27,641 of the 29,751-base genome sequence), of SARS virus.

FIG. 22 shows the amino acid sequence (SEQ ID NO: 70) of ORF8 (nucleotides 27623-27754 of the 29,736-base genome sequence) or ORF 9 (nucleotides 27,638-27,772 of the 29,751-base genome sequence) of SARS virus.

FIG. 23 shows the amino acid sequence (SEQ ID NO: 71) of ORF9 (nucleotides 27764-27880 of the 29,736-base genome sequence) or ORF10 (nucleotides 27,779-27,898 of the 29,751-base genome sequence) of SARS virus.

FIG. 24 shows the amino acid sequence (SEQ ID NO: 72) of ORF10 (nucleotides 27849-28100 of the 29,736-base genome sequence) or ORF11 (nucleotides 27,864-28,118 of the 29,751-base genome sequence) of SARS virus.

FIG. 25 shows the amino acid sequence of ORF13 of SARS virus (SEQ ID NO: 73).

FIG. 26 shows the amino acid sequence of ORF14 of SARS virus (SEQ ID NO: 74).

FIG. 27 shows an alignment of the secreted region of the SARS virus ORF 10 (SEQ ID NO: 201) of the 29,751-base genome sequence (sars) with the conotoxin from Conus ventricosus (conotoxin) (SEQ ID NO: 200). Sequence identity is indicated by asterisks and sequence homology is indicated by dots.

DETAILED DESCRIPTION OF THE INVENTION

In general, the invention provides nucleic acid molecules, polypeptides, and other reagents derived from a SARS virus, as well as methods of using such nucleic acid molecules, polypeptides, and other reagents.

The genome sequence (FIGS. 3A-P and 11A-K; SEQ ID NOs: 1, 2, and 15) reveals that the SARS coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, OC43 and 229E. Thus, the SARS virus is a previously unknown virus. The 5′ end of the SARS genome contains a 5′ leader sequence (Table 1; SEQ ID NO: 3) with sequence similarity to the highly conserved coronavirus core leader sequence, 5′-CUAAAC-3′ (SEQ ID NO: 75; Sawicki, S. G., et al., Adv Exp Med Biol 440, 215-9, 1998; Lai, M. M. and D. Cavanagh, Adv Virus Res 48, 1-100, 1997). Transcriptional regulatory sequences (TRSs) were identified upstream of all open reading frames (ORFs) (Tables 1 and 2; SEQ ID NOs: 3-13 and 20-30). ORF9 and ORF10 of the 29,736-base SARS genome (ORF 10 and ORF 11 of the 29,751-base genome) overlap by 12 amino acids, and have matches to the TRS consensus in close proximity to their respective initiating methionine codons.

The 3′ UTR sequence (SEQ ID NO: 18) of SARS virus contains an s2m region having the sequence ACATTTTCATCGAGGCCACGCGGAGTACGATCGAGGGTACAGTGAAT (SEQ ID NO: 16) that includes a conserved, discontinuous 32 base-pair s2m motif. The conserved 32 base-pair motif is a universal feature of astroviruses that has also been identified in avian coronavirus (AIBV) and the ERV-2 equine rhinovirus. This motif has been identified by Jonassen C. M. et al. (J Gen Virol 1998 April 79 (Pt 4):715-8) as GCCGNGGCCACGC(G/C)GAGTA(C/G)GANCGAGGGTACAG(G/C) (SEQ ID NO: 19), where N is generally not part of the conserved motif, and can be any nucleotide. The region corresponding to the 32 base-pair motif in SARS virus includes the sequence CGAGGCCACGCGGAGTACGATCGAGGGTACAG (SEQ ID NO: 17), and spans positions 29590-29621 of the 29,751-base genome. FIG. 4 shows an alignment of the s2m regions from Avian infectious bronchitis virus (AIBV SEQ ID NO: 32) and equine rhinovirus serotype 2 (ERV-2 SEQ ID NO: 31), as defined in Jonassen C. M. et al. (J Gen Virol 1998 April 79 (Pt 4):715-8), with the entire 3′ untranslated region (UTR) of the SARS virus (TOR2) (SEQ ID NO: 18).
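The relationships among the sequences quoted above can be checked mechanically. The following Python sketch confirms that positions 29590-29621 span 32 bases, that SEQ ID NO: 17 lies within SEQ ID NO: 16, and that the interior of the Jonassen motif, written as a regular expression, matches the SARS s2m region. The constant `S2M_CORE` is an assumption made here for illustration: it is SEQ ID NO: 19 with the leading GCCGN and the trailing (G/C) position omitted, N written as `.`, and each (X/Y) written as a character class. It is not a motif definition given in the source.

```python
import re

S2M_REGION = "ACATTTTCATCGAGGCCACGCGGAGTACGATCGAGGGTACAGTGAAT"  # SEQ ID NO: 16
S2M_SARS = "CGAGGCCACGCGGAGTACGATCGAGGGTACAG"                    # SEQ ID NO: 17
START, END = 29590, 29621  # positions in the 29,751-base genome

# Assumed interior portion of SEQ ID NO: 19 (see the note above).
S2M_CORE = re.compile(r"GGCCACGC[GC]GAGTA[CG]GA.CGAGGGTACAG")

span = END - START + 1                # inclusive coordinate range
offset = S2M_REGION.find(S2M_SARS)    # 0-based offset, -1 if absent
core = S2M_CORE.search(S2M_REGION)

print("span length:", span, "sequence length:", len(S2M_SARS))
print("offset of SEQ ID NO: 17 within SEQ ID NO: 16:", offset)
print("core motif match:", core.group() if core else None)
```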

The coding potentials of the 29,736-base and 29,751-base genomes are depicted in FIGS. 2 and 12, respectively. Open reading frames (ORFs) include the Replicase 1a and 1b translation products, the Spike glycoprotein, the small Envelope protein, the Membrane protein, and the Nucleocapsid protein. Construction of unrooted phylogenetic trees using this set of known proteins from representatives of the three known coronaviral groups reveals that the proteins encoded by the SARS virus do not readily cluster more closely with any known group than with any other (FIGS. 1A-D and 13A-D). In addition, nine novel ORFs have been analyzed.

The Replicase 1a ORF, located at nucleotides 250-13395 of the 29,736-base genome and nucleotides 265-13,398 of the 29,751-base genome, and the Replicase 1b ORF, located at nucleotides 13395-21467 of the 29,736-base genome and nucleotides 13,398-21,485 of the 29,751-base genome, together occupy 21.2 kb of the SARS virus genome (FIGS. 2 and 12). These genes encode a number of proteins that are produced by proteolytic cleavage of a large polyprotein (Ziebuhr, J. et al., J Gen Virol 81, 853-79, April, 2000). A frame shift mutation interrupts the protein-coding region, separating the 1a and 1b open-reading frames. The proteins encoded by the Replicase 1a and 1b ORFs are depicted in FIGS. 16A-B and 17 (SEQ ID NOs: 64 and 65).
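The 21.2 kb figure follows from the coordinates given above, as the short Python sketch below shows. It assumes the nucleotide ranges are 1-based and inclusive and measures from the first base of Replicase 1a to the last base of Replicase 1b; the helper name `span_length` is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide range."""
    return end - start + 1


# First base of Replicase 1a to last base of Replicase 1b, per genome version.
REPLICASE_SPAN = {
    "29,736-base genome": (250, 21467),
    "29,751-base genome": (265, 21485),
}

for label, (start, end) in REPLICASE_SPAN.items():
    bases = span_length(start, end)
    print(f"{label}: {bases} bases = {bases / 1000:.1f} kb")
```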

The Spike glycoprotein (S) gene (E2 glycoprotein gene; FIGS. 2 and 12; nucleotides 21477 to 25241 of the 29,736-base genome, and nucleotides 21,492 to 25,259 of the 29,751-base genome) encodes a surface projection glycoprotein precursor of about 1,255 amino acids in length (FIG. 5; SEQ ID NO: 33), which may be significant in the virulence of the SARS virus. Mutations in this gene are correlated with altered pathogenesis and virulence in other coronaviruses (B. N. Fields et al., Fields Virology, 4th ed., Lippincott Williams & Wilkins, Philadelphia, 2001). In other coronaviruses, the mature spike protein is inserted in the viral envelope with the majority of the protein exposed on the surface of the particles. Three molecules of the Spike protein form the characteristic peplomers or corona-like structures of this virus family. Analysis of the spike glycoprotein with SignalP (Nielsen, H. et al., Prot Engineer. 10:1-6 (1997)) indicates a signal peptide (MFIFLLFLTLTSG; SEQ ID NO: 76) (probability 0.996) with cleavage between residues 13 and 14. TMHMM (Sonnhammer, E. L. et al., Proc Int Conf Intell Syst Mol Biol 6, 175-82 (1998)) indicates a transmembrane domain near the C-terminal end (WYVWLGFIAGLIAIVMVTILLCC; SEQ ID NO: 183). Together these data indicate a type I membrane protein with the N-terminus and the majority of the protein (residues 14-1195) on the outside of the cell surface or virus particle, which may be responsible for binding to a cellular receptor. The SARS virus Spike glycoprotein has limited sequence identity to other known Spike glycoproteins (FIGS. 14A-F).

ORF 3 (FIGS. 2 and 12; nucleotides 25253-26074 of the 29,736-base genome and nucleotides 25,268-26,092 of the 29,751-base genome) encodes a protein of 274 amino acids (FIG. 18; SEQ ID NO: 66) that lacks significant similarities to any known protein when analyzed with BLAST (Altschul, S. F. et al., Nucleic Acids Res 25, 3389-402, Sep. 1, 1997), FASTA (Pearson, W. R. and D. J. Lipman, Proc Natl Acad Sci USA 85, 2444-8, April, 1988) or PFAM (Bateman, A. et al., Nucleic Acids Res 30, 276-80, Jan. 1, 2002). Analysis of the N-terminal 70 amino acids with SignalP indicates the existence of a signal peptide (MDLFMRFFTLRSITAQ; SEQ ID NO: 184) and a cleavage site (probability 0.540). Both TMpred (Hofmann, K. and W. Stoffel, Biol. Chem. Hoppe-Seyler 374, 166 (1993)) and TMHMM indicate three transmembrane regions spanning approximately residues 34-56 (TIPLQASLPFGWLVIGVAFLAVF, SEQ ID NO: 77), 77-99 (FQFICNLLLLFVTIYSHLLLVAA, SEQ ID NO: 78), and 103-125 (AQFLYLYALIYFLQCINACRIIM, SEQ ID NO: 79). Both TMpred and TMHMM indicate that the C-terminus and a large 149 amino acid domain are located inside the viral or cellular membrane. The C-terminal (interior) region of the protein, corresponding to about amino acids 124-274 (MRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKLKEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDPPNVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL; SEQ ID NO: 185), may encode a protein domain with ATP-binding properties (PD037277).
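As a simple consistency check on the three predicted transmembrane regions of ORF 3, the Python sketch below compares each stated residue range with the length of the quoted peptide and computes the number of residues between consecutive regions. The residue ranges and peptides are those given above; the helper names `span_matches` and `loop_lengths` are hypothetical.

```python
# (first residue, last residue, peptide) for each predicted region of ORF 3.
ORF3_TM_REGIONS = [
    (34, 56, "TIPLQASLPFGWLVIGVAFLAVF"),    # SEQ ID NO: 77
    (77, 99, "FQFICNLLLLFVTIYSHLLLVAA"),    # SEQ ID NO: 78
    (103, 125, "AQFLYLYALIYFLQCINACRIIM"),  # SEQ ID NO: 79
]


def span_matches(start: int, end: int, peptide: str) -> bool:
    """True if the inclusive residue range has the same length as the peptide."""
    return end - start + 1 == len(peptide)


def loop_lengths(regions):
    """Number of residues lying between consecutive regions."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(regions, regions[1:])]


for start, end, peptide in ORF3_TM_REGIONS:
    print(f"{start}-{end}: {len(peptide)} residues, consistent={span_matches(start, end, peptide)}")
print("connecting loop lengths:", loop_lengths(ORF3_TM_REGIONS))
```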

ORF 4 ( FIG. 12 nucleotides 25,689-26,153 of the 29,751-base genome) encodes a predicted protein of 154 amino acids ( FIG. 19 SEQ ID NO: 67). This ORF overlaps entirely with ORF 3 and the E protein. ORF4 may be expressed from the ORF mRNA using an internal ribosomal entry site. BLAST analyses failed to identify matching sequences. Analysis with TMPred predicts a single transmembrane helix, amino acids 1-20 MMPTTLFAGTHITMTTVYHI, SEQ ID NO: 186.

The small envelope protein E ( FIGS. 2 and 12 nucleotides 26102-26329 of the 29,736-base genome and nucleotides 26,117-26,347, ORF 5, of the 29,751-base genome) encodes a protein of 76 amino acids ( FIG. 7 SEQ ID NO: 35). BLAST and FASTA comparisons indicate that the protein, while novel, is homologous to multiple envelope proteins (alternatively known as small membrane proteins) from several coronaviruses. An alignment of the SARS virus E protein with the envelope protein of Porcine transmissible gastroenteritis coronavirus indicates approximately 28% sequence identity between the two proteins over a 61 amino acid overlap, as calculated by FASTA ( FIG. 15 ). PFAM analysis of the protein indicates that the small envelope protein E is a member of the NS3_EnvE protein family. InterProScan (R. Apweiler et al., Nucleic Acids Res 29, 37-40, Jan. 1, 2001; Zdobnov, E. M. and R. Apweiler, Bioinformatics 17, 847-8, September, 2001) analysis indicates that the protein is a component of the viral envelope, and homologs of it are found in other viruses, including gastroenteritis virus and murine hepatitis virus. SignalP analysis indicates the presence of a transmembrane anchor (probability 0.939). TMpred analysis indicates a similar transmembrane anchor at positions 17-34 (VLLFLAFVVFLLVTLAIL, SEQ ID NO: 80), which is consistent with the known association of homologous proteins with the viral envelope. TMHMM indicates a type II membrane protein with the majority of the 46 residue C-terminal hydrophilic domain (TALRLCAYCCNIVNVSLVKPTVYVYSRVKNLNSSEGVPDLLV SEQ ID NO: 187) located on the surface of the viral particle. The E protein may be important for viral replication.

The Matrix glycoprotein M ( FIGS. 2 and 12 nucleotides 26383-27045 of the 29,736-base genome and nucleotides 26,398-27,063, ORF 6, of the 29,751-base genome) encodes a protein of 221 amino acids ( FIG. 6 SEQ ID NO: 34). BLAST and FASTA analyses indicate that the protein, while novel, is homologous to coronaviral matrix glycoproteins ( FIG. 9 ). The association of the spike glycoprotein (S) with the matrix glycoprotein (M) may be an essential step in the formation of the viral envelope and in the accumulation of both proteins at the site of virus assembly. Analysis of the amino acid sequence with SignalP indicates a signal sequence (probability 0.932), located at approximately residues 1-39 (MADNGTITVEELKQLLEQWNLVIGFLFLAWIMLLQFAYS SEQ ID NO: 188), that is unlikely to be cleaved. TMHMM and TMpred analyses both indicate the presence of three trans-membrane helices, located at approximately residues 15-37 (LLEQWNLVIGFLFLAWIMLLQFA SEQ ID NO: 81), 50-72 (LVFLWLLWPVTLACFVLAAVYRI SEQ ID NO: 82) and 77-99 (GGIAIAMACIVGLMWLSYFVASF SEQ ID NO: 83), with the 121 amino acid hydrophilic domain on the inside of the virus particle, where it may interact with nucleocapsid. The hydrophilic domain may comprise approximately amino acids PLRGTIVTRPLMESELVIGAVIIRGHLRMAGHSLGRCDIKDLPKEITVATSRTLS YYKLGASQRVGTDSGFAAYNRYRIGNYKLNTDHAGSNDNIALLVQ (SEQ ID NO: 189), i.e., approximately amino acids 95 or 99 to 221 of SEQ ID NO: 34. PFAM analysis reveals a match to PFAM domain PF01635, and alignments to 85 other sequences in the PFAM database bearing this domain, which is indicative of the coronavirus matrix glycoprotein.

ORF6 ( FIG. 2 nucleotides 27059-27247 of the 29,736-base genome sequence) or ORF 7 ( FIG. 12 nucleotides 27,074-27,265 of the 29,751-base genome sequence) encodes a protein of 63 amino acids ( FIG. 20 SEQ ID NO: 68). TMpred analysis indicates a trans-membrane helix located between residues 3 or 4 and 22 (HLVDFQVTIAEILIIIMRTF SEQ ID NO: 84), with the N-terminus located outside the viral particle.

Similarly, the gene encoding ORF7 ( FIG. 2 nucleotides 27258-27623 of the 29,736-base genome sequence) or ORF 8 ( FIG. 12 nucleotides 27,273-27,641 of the 29,751-base genome sequence), encoding a protein of 122 amino acids ( FIG. 21 SEQ ID NO: 69), has no significant BLAST or FASTA matches to known proteins. Analysis of this sequence with SignalP indicates a cleaved signal sequence (MKIILFLTLIVFTSC SEQ ID NO: 85) (probability 0.995), with the cleavage site located between residues 15 and 16. TMpred and TMHMM analyses also indicate a trans-membrane helix located approximately at residues 99-117 (SPLFLIVAALVFLILCFTI SEQ ID NO: 86). Together these data indicate that this protein is a type I membrane protein with the major hydrophilic domain of the protein (residues 16-98 ELYHYQECVRGTTVLLKEPCP SGTYEGNSPFHPLADNKFALTCTSTHFAFACADGTRHTYQLRARSVSPKLFIRQ EEVQQELY SEQ ID NO: 87) and the amino-terminus oriented inside the lumen of the ER/Golgi, or on the surface of the cell membrane or virus particle, depending on the membrane localization of the protein.

ORF8 ( FIG. 2 nucleotides 27623-27754 of the 29,736-base genome sequence) or ORF9 ( FIG. 12 nucleotides 27,638-27,772 of the 29,751-base genome sequence), encodes a protein of 44 amino acids ( FIG. 22 SEQ ID NO: 70). FASTA analysis of this sequence revealed some weak similarities (37% identity over a 35 amino acid overlap) to Swiss-Prot accession Q9M883, annotated as a putative sterol-C5 desaturase. A similarly weak match to a hypothetical Clostridium perfringens protein (Swiss-Prot accession CPE2366) was also detected. TMpred indicated a single strong trans-membrane helix FYLCFLAFLLFLVLIMLIIFWFS, SEQ ID NO: 190, with little preference for alternate models in which the N-terminus was located inside or outside the particle.

Similarly ORF9 ( FIG. 2 nucleotides 27764-27880 of the 29,736-base genome sequence) or ORF10 ( FIG. 12 nucleotides 27,779-27,898 of the 29,751-base genome sequence) encoding a protein of 39 amino acids ( FIG. 23 SEQ ID NO: 71), exhibited no significant matches in BLAST and FASTA searches but encodes a trans-membrane helix LLIVLTCISLCSCICTVVQ (SEQ ID NO: 191) by TMPred, with the N-terminus located within the viral particle. The region immediately upstream of this protein exhibits a strong match to the TRS consensus (Table 2), indicating that a transcript initiates from this site. The large number of cysteine residues (6) may result in cross linking of the amino acids. Amino acids ICTVVQRCASNKPHVLEDPCKVQH (SEQ ID NO: 192) of this protein may be secreted. The secreted amino acids exhibit homology to toxin proteins, for example, to the conotoxin of Conus ventricosus ( FIG. 27 ). Antigenic peptides from the hydrophilic (secreted) region, for example, CICTVVQRCASNKPHVLEDPCK (SEQ ID NO: 193), were used to generate monoclonal antibodies using standard techniques. Furthermore, the C terminal amino acids form a sequence that shares homology to farnesylation sites (CKQH), which generally require C terminal location to be functional. This protein may act as a virulence factor and/or may facilitate transmission to humans.

ORF10 ( FIG. 2 nucleotides 27849-28100 of the 29,736-base genome sequence) or ORF11 ( FIG. 12 nucleotides 27,864-28118 of the 29,751-base genome sequence) encoding a protein of 84 amino acids ( FIG. 24 SEQ ID NO: 72) exhibited only very short (9-10 residues) matches to a region of the human coronavirus E2 glycoprotein precursor (starting at residue 801). Analysis by SignalP and TMHMM predict a soluble protein. A detectable alignment to the TRS consensus sequence was also found (Table 2).

The protein (422 amino acids FIG. 8 SEQ ID NO: 36) encoded by the Nucleocapsid gene ( FIG. 2 nucleotides 28105-29370 of the 29,736-base genome sequence; FIG. 12 , nucleotides 28,120-29,388 of the 29,751-base genome sequence) aligns well with nucleocapsid proteins from other representative coronaviruses ( FIGS. 10A-B ), although a short lysine-rich region (KTFPPTEPKKDKKKKTDEAQ SEQ ID NO: 14) is unique to SARS. This region is suggestive of a nuclear localization signal. Since some coronaviruses are able to replicate in enucleated cells, the SARS virus nucleocapsid protein may have evolved a novel nuclear function, which may play a role in pathogenesis. In addition, the basic nature of this peptide suggests it may assist in RNA binding. The SARS nucleocapsid protein is also a good candidate for diagnostic tests.
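The basic character of the lysine-rich region can be illustrated by a simple residue count. The following Python sketch is illustrative only: the function name `residue_charge_summary` is hypothetical, histidine and the peptide termini are ignored, and the "net" value is a crude count of basic minus acidic side chains rather than a calculated charge at any particular pH.

```python
# Lysine-rich region of the nucleocapsid protein, as quoted in the text (SEQ ID NO: 14).
PEPTIDE = "KTFPPTEPKKDKKKKTDEAQ"


def residue_charge_summary(seq: str) -> dict:
    """Count basic (K, R) and acidic (D, E) residues in a peptide.

    Assumption for illustration: histidine and the free termini are ignored,
    so "net" is only a rough indicator of the peptide's basic character.
    """
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return {
        "length": len(seq),
        "lysine": seq.count("K"),
        "basic": basic,
        "acidic": acidic,
        "net": basic - acidic,
    }


if __name__ == "__main__":
    print(residue_charge_summary(PEPTIDE))
```

Under these assumptions, the 20-residue peptide contains seven lysines and four acidic residues, consistent with the description of the region as lysine-rich and basic.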

ORF13 ( FIG. 12 nucleotides 28,130-28,426 of the 29,751-base genome sequence) encodes a novel protein of 98 amino acids ( FIG. 25 SEQ ID NO: 73). ORF 14 ( FIG. 12 nucleotides 28,583-28,795 of the 29,751-base genome sequence) encodes a novel protein of 70 amino acids ( FIG. 26 SEQ ID NO: 74). TMPred predicts a single transmembrane helix VVAVIQEIQLLAAVGEILLLEW (SEQ ID NO: 194).
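The nucleotide ranges and protein lengths quoted above for the 29,751-base genome sequence can be cross-checked arithmetically. The following Python sketch is illustrative only: the function name `protein_length` and the table `ORFS_29751` are hypothetical, and the sketch assumes that each quoted range is inclusive and ends with a single stop codon, which is an inference from the quoted numbers rather than a statement in the text.

```python
def protein_length(start: int, end: int) -> int:
    """Amino acids encoded by an inclusive nucleotide range.

    Assumption for illustration: the range ends with one stop codon,
    so the protein length is the number of codons minus one.
    """
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return n_nt // 3 - 1


# (start, end, protein length) as quoted in the text for the 29,751-base genome.
ORFS_29751 = {
    "Spike": (21492, 25259, 1255),
    "ORF 3": (25268, 26092, 274),
    "ORF 4": (25689, 26153, 154),
    "E (ORF 5)": (26117, 26347, 76),
    "M (ORF 6)": (26398, 27063, 221),
    "ORF 7": (27074, 27265, 63),
    "ORF 8": (27273, 27641, 122),
    "ORF 9": (27638, 27772, 44),
    "ORF 10": (27779, 27898, 39),
    "ORF 11": (27864, 28118, 84),
    "Nucleocapsid": (28120, 29388, 422),
    "ORF 13": (28130, 28426, 98),
    "ORF 14": (28583, 28795, 70),
}


def mismatches() -> list:
    """Names of ORFs whose quoted length differs from the computed length."""
    return [
        name
        for name, (start, end, quoted) in ORFS_29751.items()
        if protein_length(start, end) != quoted
    ]


if __name__ == "__main__":
    print(mismatches())
```

For example, the Spike range 21,492-25,259 spans 3,768 nucleotides, i.e. 1,256 codons, which corresponds to 1,255 amino acids plus a stop codon.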

Various features of the SARS virus genome are summarised in Table 3. While Table 3 refers to the 29,751-base genome sequence, the features are also applicable to the 29,736-base genome sequence (SEQ ID NOs: 1 and 2).

Various polymorphisms may exist in the SARS virus. In the SARS 29,736-base genome sequences (SEQ ID NO: 1 or 2), for example, nucleotides 7904, 16607, 19168, 24857, or 26842 may be C or T, or nucleotides 19049, 23205, or 25283 may be G or A; and in the SARS 29,751-base genome sequence (SEQ ID NO: 15), for example, nucleotides 7919, 16622, 19183, 24872, or 26857 may be C or T, or nucleotides 19064, 23220, or 25298 may be G or A. In some embodiments, the nucleotide changes may result in no change in the encoded amino acid, or in a conservative or non-conservative change in the encoded amino acid. In some embodiments, a nucleotide change, as described herein, at position 7904 or 7919 may result in an A to V amino acid substitution in the Replicase 1A protein coding region; a change at position 19168 or 19183 may result in a V to A amino acid substitution in the Replicase 1B protein coding region; a change at position 23205 or 23220 may result in an A to S amino acid substitution (non-conservative change), affecting the Spike glycoprotein coding region; a change at position 25283 or 25298 may result in an R to G amino acid substitution (non-conservative change), affecting ORF3; or a change at position 26842 or 26857 may result in an S to P amino acid substitution (non-conservative change), affecting the Nucleocapsid protein coding region, in the SARS 29,736-base (SEQ ID NO: 1 or 2) and 29,751-base genome (SEQ ID NO: 15) sequences, respectively. In various embodiments, a nucleotide or amino acid sequence including a particular polymorphism may be selected, for example, for use in the methods of the invention, or may be excluded, for example, from a particular use according to the invention.
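The paired polymorphic positions quoted above differ by a constant offset between the two genome sequences. The following Python sketch is illustrative only: the names `OFFSET`, `SITES_29736`, and `to_29751` are hypothetical, and the offset of 15 nucleotides is inferred from the quoted position pairs; it is not asserted to hold for every coordinate in the genome.

```python
# Offset inferred from the quoted position pairs (e.g., 7904 -> 7919).
# Assumption for illustration: it applies only to the positions listed here.
OFFSET = 15

# Polymorphic positions in the 29,736-base sequence and the alternative bases quoted.
SITES_29736 = {
    7904: "C/T",
    16607: "C/T",
    19168: "C/T",
    24857: "C/T",
    26842: "C/T",
    19049: "G/A",
    23205: "G/A",
    25283: "G/A",
}

# Positions quoted in the text for the 29,751-base sequence.
QUOTED_29751 = {7919, 16622, 19183, 24872, 26857, 19064, 23220, 25298}


def to_29751(position_29736: int) -> int:
    """Map a listed position in the 29,736-base sequence to the 29,751-base sequence."""
    return position_29736 + OFFSET


if __name__ == "__main__":
    for pos, bases in sorted(SITES_29736.items()):
        print(pos, "->", to_29751(pos), bases)
```

Mapping each listed position in this way reproduces the set of positions quoted for the 29,751-base genome sequence.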

Various alternative embodiments of the invention are described below. These embodiments include, without limitation, identification and use of SARS virus nucleic acid and amino acid sequences for diagnostic or therapeutic uses.

Diagnosis of SARS Virus-Related Disorders

A SARS virus-related disorder is any disorder that is mediated by the SARS virus, or by a nucleic acid molecule or polypeptide derived from the SARS virus. Accordingly, SARS virus nucleic acid molecules and polypeptides may be used to diagnose and identify a SARS virus-related disorder in a mammal, for example, a human or a domestic, farm, wild, or experimental animal. In some embodiments, SARS virus nucleic acid molecules and polypeptides may be used to screen such animals, e.g., civet cats, for the presence of SARS virus. A SARS virus-related disorder may be a hepatic, enteric, respiratory, or neurological disorder, and may be accompanied by one or more symptoms or indications including, but not limited to, fever, cough, shortness of breath, headache, low blood oxygen concentration, liver damage, or reduced lymphocyte numbers. Accordingly, samples for diagnosis may be obtained from cells, blood, serum, plasma, urine, stool, conjunctiva, sputum, nasopharyngeal or oropharyngeal swabs, tracheal aspirates, bronchoalveolar lavage, pleural fluid, amniotic fluid, or any other specimen, or any extract thereof, or by tissue biopsy of, for example, lungs or major organs, obtained from a patient (human or animal), test subject, or experimental animal.

A SARS virus-related disorder may be diagnosed by amplifying a SARS nucleic acid molecule or fragment thereof from a sample. Probes or primers for use in amplification may be prepared using standard techniques. In some embodiments, probes or primers are selected from regions of a SARS virus genome as described herein that show limited sequence homology or identity (e.g., less than 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% identity) to other viruses or pathogens, or to host sequences.
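The selection of regions having limited sequence identity can be illustrated with a simple calculation. The following Python sketch is illustrative only: the function names `percent_identity` and `below_identity_threshold` are hypothetical, the compared sequences are assumed to be pre-aligned, gap-free, and of equal length, and the default 50% threshold is merely one of the example values listed above. In practice, an alignment tool such as BLAST or FASTA would be used to obtain the alignments.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def below_identity_threshold(candidate: str, others: list, max_identity: float = 50.0) -> bool:
    """True if the candidate region is less than max_identity percent identical
    to every comparison sequence (other viruses, pathogens, or host sequences)."""
    return all(percent_identity(candidate, other) < max_identity for other in others)


if __name__ == "__main__":
    # Hypothetical placeholder sequences, not taken from any genome.
    candidate = "ACGTACGTAC"
    others = ["ACGTTTTTTT", "TTTTTTTTTT"]
    print([percent_identity(candidate, o) for o in others])
    print(below_identity_threshold(candidate, others, max_identity=60.0))
```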

Nucleic acid sequences can be amplified as needed by methods known in the art. For example, amplification can be accomplished by polymerase chain reaction (“PCR”) of DNA, or of RNA by reverse transcriptase-PCR (“RT-PCR”) (see generally PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. No. 4,683,202 issued Jul. 28, 1987 to Mullis). Variations of standard PCR techniques, such as for example real time RT-PCR using internal as well as amplification primers, resulting in increased sensitivity and speed, and reduction of risk of sample contamination (see for example Higuchi, R., et al., “Kinetic PCR Analysis: Real-time Monitoring of DNA Amplification Reactions,” Bio/Technology, vol. 11, pp. 1026-1030 (1993); Heid et al., “Real Time Quantitative PCR”, Genome Research, 1996, pp. 986-994; Gibson U E et al., “A novel method for real time quantitative RT-PCR,” Genome Res. 1996 October 6(10):995-1001), or the “TaqMan” approach to PCR, described by for example Holland et al., Proc. Natl. Acad. Sci., 88: 7276-7280 (1991), may be performed.

Other suitable amplification and analytical methods include single base primer extension (see for example U.S. Pat. No. 6,004,744), mini-sequencing, ligase chain reaction (LCR) (see for example Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241, 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990)), and nucleic acid sequence-based amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.

A SARS virus-related disorder may also be diagnosed using an antibody directed against a SARS virus nucleic acid or amino acid sequence that specifically binds a nucleic acid molecule or polypeptide. In an alternative embodiment, the antibody may be directed against a SARS polypeptide, for example, the S polypeptide or fragment thereof that is located on the surface of the SARS virion. Methods for preparation of antibodies or for assaying antibody binding are well known in the art.

Serological diagnosis may include detection of antibodies against a SARS virus polypeptide or nucleic acid molecule, e.g., the Nucleocapsid protein, produced in response to infection, using techniques such as indirect fluorescent antibody testing or enzyme-linked immunosorbent assays (ELISA). A SARS virus-related disorder may also be diagnosed by, for example, performing in situ probe hybridization studies on tissue specimens.

In some aspects, diagnostic tests as described herein or known to those of skill in the art may be performed for SARS virus variants that exhibit increased pathogenicity, such as strains having redundant sequences.

In some embodiments, reagents for diagnosis (e.g., probes, primers, antibodies, etc.) may be provided in kits, which may optionally include instructions for using the reagent or may include other reagents for performing the appropriate assay, e.g., controls, standards, buffers, etc.

Therapy or Prophylaxis for SARS Virus-Related Disorders

Compounds according to the invention may also be used to provide therapeutics or prophylactics for SARS virus-related disorders. Accordingly, such compounds may be used to treat a mammal, for example, a human or a domestic, farm, wild, or experimental animal that has or is at risk for a SARS virus-related disorder. Such compounds may include, without limitation, compounds that interfere with SARS virus replication, expression of SARS virus proteins, or the ability of the SARS virus to infect a host cell. Accordingly, in some embodiments, compounds that act as antagonists to SARS virus polypeptides may be used as therapeutics or prophylactics for SARS virus related disorders. In some embodiments, purified SARS virus polypeptides may be used as for example competitive inhibitors to disrupt viral function. For example, a Spike protein lacking a functional domain, or having some other modification that maintains binding but reduces or eliminates pathogenicity, may be used to disrupt viral function. In some embodiments, antibodies that bind SARS virus polypeptides or nucleic acid molecules, for example, humanized antibodies, may be used as therapeutics or prophylactics.

In some embodiments, the SARS-virus compounds may be used as vaccines, or may be used to develop vaccines. For example, peptides derived from portions of SARS-virus proteins or polypeptides located on the outside of the virion or cell surface may be useful for vaccines or for generation of therapeutic or prophylactic antibodies.

A “vaccine” is a composition that includes materials that elicit a desired immune response. A vaccine may select, activate or expand memory B and T cells of the immune system to, for example, enable the elimination of infectious agents, such as a SARS virus, or a component thereof. In some embodiments, a vaccine includes a suitable carrier, such as an adjuvant, which is an agent that acts in a non-specific manner to increase the immune response to a specific antigen, or to a group of antigens, enabling the reduction of the quantity of antigen in any given vaccine dose, or the reduction of the frequency of dosage required to generate the desired immune response.

Vaccines according to the invention may include SARS virus polypeptides and nucleic acid molecules described herein, or immunogenic fragments thereof. In some embodiments, a SARS virus Spike polypeptide, Envelope polypeptide, or membrane glycoprotein or fragments thereof may be suitable for vaccine applications. In some embodiments, the vaccines may be multivalent and include one or more epitopes from a SARS virus polypeptide or fragment thereof.

In some embodiments of the invention, a vaccine may include a live or killed microorganism, e.g., a SARS virus or a component thereof. If a live SARS virus is used, which may be administered in the form of an oral vaccine, it may contain non-revertible genetic alterations (for example, large deletions or insertions in the genomic sequence) that reduce or eliminate the virulence of the virus (“attenuated virus”), but not its induction of an immune response. In some embodiments, a live vaccine may include an attenuated non-SARS microorganism (e.g., bacteria or virus such as vaccinia virus) that is capable of expressing a SARS virus polypeptide or immunogenic fragment thereof as described herein. In some embodiments, a vaccine may include SARS virus polypeptides or nucleic acid molecules having modifications that facilitate ease of administration. For example, an indigestible SARS virus polypeptide or nucleic acid molecule may be used for oral administration, and a modification that is suitable for inhalation may be used for administration to the lung.

A “nucleic acid vaccine” or “DNA vaccine” as used herein, is a nucleic acid construct comprising a polynucleotide encoding a polypeptide antigen, particularly an antigenic amino acid subsequence identified by methods described herein or known in the art. The nucleic acid construct can also include transcriptional promoter elements, enhancer elements, splicing signals, termination and polyadenylation signals, and other nucleic acid sequences. Thus, a nucleic acid vaccine is generally introduced into a subject animal using for example one or more DNA plasmids including one or more antigen-coding sequences (for example, a SARS virus Envelope polypeptide or membrane glycoprotein sequence) that are capable of transfecting cells in vivo and inducing an immune response (see for example Whalen R G et al. DNA-mediated immunization and the energetic immune response to hepatitis B surface antigen. Clin Immunol Immunopathol 1995 75:1-12; Wolff J A et al. Direct gene transfer into mouse muscle in vivo. Science 1990 247:1465-8; Fynan E F et al. DNA vaccines: protective immunizations by parenteral, mucosal, and gene-gun inoculations. Proc Natl Acad Sci USA 1993 90:11478-82). In some embodiments, a library of nucleic acid fragments may be prepared by cloning SARS virus genomic DNA into a plasmid expression vector using known techniques and the library then used as a nucleic acid vaccine (see for example Barry M A, et al. Protection against mycoplasma infection using expression-library immunization. Nature 1995 377:632-5).

The subject is administered the nucleic acid vaccine using standard methods. The vaccine can be administered to the vertebrate parenterally, subcutaneously, intravenously, intraperitoneally, intradermally, intramuscularly, topically, orally, rectally, nasally, buccally, vaginally, by inhalation spray, or via an implanted reservoir in dosage formulations containing conventional non-toxic, physiologically acceptable carriers or vehicles. Alternatively, the subject is administered the nucleic acid vaccine through the use of a particle acceleration or bombardment instrument (a “gene gun”). The form in which it is administered (e.g., capsule, tablet, solution, emulsion) will depend in part on the route by which it is administered. For example, for mucosal administration, nose drops, inhalants or suppositories can be used. The nucleic acid vaccine can be administered in conjunction with known adjuvants. The adjuvant is administered in a sufficient amount, which is that amount that is sufficient to generate an enhanced immune response to the nucleic acid vaccine. The adjuvant can be administered prior to (e.g., 1 or more days before) inoculation with the nucleic acid vaccine; concurrently with (e.g., within 24 hours of) inoculation with the nucleic acid vaccine; contemporaneously (simultaneously) with the nucleic acid vaccine (e.g., the adjuvant is mixed with the nucleic acid vaccine, and the mixture is administered to the vertebrate); or after (e.g., 1 or more days after) inoculation with the nucleic acid vaccine. The adjuvant can also be administered at more than one time (e.g., prior to inoculation with the nucleic acid vaccine and also after inoculation with the nucleic acid vaccine).
As used herein, the term “in conjunction with” encompasses any time period, including those specifically described herein and combinations of the time periods specifically described herein, during which the adjuvant can be administered so as to generate an enhanced immune response to the nucleic acid vaccine (e.g., an increased antibody titer to the antigen encoded by the nucleic acid vaccine, or an increased antibody titer to the pathogenic agent). The adjuvant and the nucleic acid vaccine can be administered at approximately the same location on the vertebrate; for example, both the adjuvant and the nucleic acid vaccine are administered at a marked site on a limb of the subject.

In some embodiments, expression of a SARS virus gene or coding or non-coding region of interest may be inhibited or prevented using RNA interference (RNAi) technology, a type of post-transcriptional gene silencing. RNAi may be used to create a functional “knockout”, i.e. a system in which the expression of a gene or coding or non-coding region of interest is reduced, resulting in an overall reduction of the encoded product. As such, RNAi may be performed to target a nucleic acid of interest or fragment or variant thereof, to in turn reduce its expression and the level of activity of the product which it encodes. Such a system may be used for therapy or prophylaxis, as well as for functional studies. RNAi is described in for example published US patent applications 20020173478 (Gewirtz published Nov. 21, 2002) and 20020132788 (Lewis et al. published Nov. 7, 2002). Reagents and kits for performing RNAi are available commercially from for example Ambion Inc. (Austin, Tex., USA) and New England Biolabs Inc. (Beverly, Mass., USA).

The initial agent for RNAi in some systems is thought to be a dsRNA molecule corresponding to a target nucleic acid. The dsRNA is then thought to be cleaved into short interfering RNAs (siRNAs) which are 21-23 nucleotides in length (19-21 bp duplexes, each with 2 nucleotide 3′ overhangs). The enzyme thought to effect this first cleavage step has been referred to as “Dicer” and is categorized as a member of the RNase III family of dsRNA-specific ribonucleases. Alternatively, RNAi may be effected via directly introducing into the cell, or generating within the cell by introducing into the cell a suitable precursor (e.g. vector, etc.) of, such an siRNA or siRNA-like molecule. An siRNA may then associate with other intracellular components to form an RNA-induced silencing complex (RISC). The RISC thus formed may subsequently target a transcript of interest via base-pairing interactions between its siRNA component and the target transcript by virtue of homology, resulting in the cleavage of the target transcript approximately 12 nucleotides from the 3′ end of the siRNA. Thus the target mRNA is cleaved and the level of protein product it encodes is reduced.

RNAi may be effected by the introduction of suitable in vitro synthesized siRNA or siRNA-like molecules into cells. RNAi may for example be performed using chemically-synthesized RNA, for which suitable RNA molecules may be chemically synthesized using known methods. Alternatively, suitable expression vectors may be used to transcribe such RNA either in vitro or in vivo. In vitro transcription of sense and antisense strands (encoded by sequences present on the same vector or on separate vectors) may be effected using for example T7 RNA polymerase, in which case the vector may comprise a suitable coding sequence operably-linked to a T7 promoter. The in vitro-transcribed RNA may in embodiments be processed (e.g. using E. coli RNase III) in vitro to a size conducive to RNAi. The sense and antisense transcripts are combined to form an RNA duplex which is introduced into a target cell of interest. Other vectors may be used, which express small hairpin RNAs (shRNAs) which can be processed into siRNA-like molecules. Various vector-based methods are known in the art. Various methods for introducing such vectors into cells, either in vitro or in vivo (e.g. gene therapy) are known in the art.

Accordingly, in an embodiment, expression of a polypeptide including an amino acid sequence substantially identical to a SARS virus sequence may be inhibited by introducing into or generating within a cell an siRNA or siRNA-like molecule corresponding to a nucleic acid molecule encoding the polypeptide or fragment thereof, or to a nucleic acid homologous thereto. In various embodiments such a method may entail the direct administration of the siRNA or siRNA-like molecule into a cell, or use of the vector-based methods described above. In an embodiment, the siRNA or siRNA-like molecule is less than about 30 nucleotides in length. In a further embodiment, the siRNA or siRNA-like molecules are about 21-23 nucleotides in length. In an embodiment, siRNA or siRNA-like molecules comprise a 19-21 bp duplex portion, each strand having a 2 nucleotide 3′ overhang. In embodiments, the siRNA or siRNA-like molecule is substantially identical to a nucleic acid encoding the polypeptide or a fragment or variant (or a fragment of a variant) thereof. Such a variant is capable of encoding a protein having the activity of a SARS virus polypeptide. In embodiments, the sense strand of the siRNA or siRNA-like molecule is substantially identical to a SARS virus nucleic acid molecule or a fragment thereof (RNA having U in place of T residues of the DNA sequence).
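The duplex geometry described above (21-nucleotide strands forming a 19 bp duplex, each strand having a 2 nucleotide 3′ overhang, with U in place of T) can be illustrated as follows. The Python sketch is illustrative only: the function names and the 23-nucleotide target window are hypothetical, the example sequence is a placeholder that is not taken from any SARS virus sequence, and no selection criteria for effective siRNAs are implied.

```python
# Complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(dna: str) -> str:
    """Convert a DNA sequence to RNA (U in place of T)."""
    return dna.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def sirna_from_window(window_dna: str) -> tuple:
    """Derive 21-nt sense and antisense strands from a 23-nt sense-strand window.

    Assumption for illustration: the duplex pairs window positions 3-21,
    giving a 19 bp duplex with a 2-nt 3' overhang on each strand.
    """
    rna = to_rna(window_dna)
    if len(rna) != 23 or set(rna) - set("ACGU"):
        raise ValueError("expected a 23-nt window containing only A, C, G, T/U")
    sense = rna[2:]                               # positions 3-23 (21 nt)
    antisense = reverse_complement_rna(rna[:21])  # complement of positions 1-21 (21 nt)
    return sense, antisense


if __name__ == "__main__":
    # Hypothetical placeholder window, not a SARS virus sequence.
    sense, antisense = sirna_from_window("AACGTACGTTAGCCTAGGATCTT")
    print("sense     5'-" + sense + "-3'")
    print("antisense 5'-" + antisense + "-3'")
```

In this sketch, the first 19 nucleotides of each strand are mutually complementary, and the final 2 nucleotides of each strand remain unpaired as 3′ overhangs.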

SARS Virus Protein Expression

In general, SARS virus polypeptides according to the invention may be produced by transformation of a suitable host cell with all or part of a SARS virus polypeptide-encoding genomic or cDNA molecule or fragment thereof (e.g., the genomic DNA or cDNAs described herein) in a suitable expression vehicle. Those skilled in the field of molecular biology will understand that any of a wide variety of expression systems may be used to provide the recombinant protein. The precise host cell used is not critical to the invention. The SARS virus polypeptide may be produced in a prokaryotic host (e.g., E. coli or a virus, for example, a coronavirus such as human OC43 or 229E, a bovine coronavirus, or a virus used for gene therapy, such as an adenovirus) or in a eukaryotic host (e.g., Saccharomyces cerevisiae, insect cells, e.g., Sf21 cells, or mammalian cells, e.g., COS 1, NIH 3T3, VeroE6, or HeLa cells). Such cells are available from a wide range of sources (e.g., the American Type Culture Collection, Rockland, Md.; also see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1994). The method of transformation or transfection and the choice of expression vehicle will depend on the host system selected. Transformation and transfection methods are described, e.g., in Ausubel et al. (supra); expression vehicles may be chosen from those provided, e.g., in Cloning Vectors: A Laboratory Manual (P. H. Pouwels et al., 1985, Supp. 1987), or from commercially available sources. Suitable animal models, e.g. a ferret animal model, or any other animal model suitable for analysis of SARS virus infection or expression of SARS virus nucleic acid molecules may be used.

In an alternative embodiment, the baculovirus expression system (using, for example, the vector pBacPAK9) available from Clontech (Palo Alto, Calif.) may be used. If desired, this system may be used in conjunction with other protein expression techniques, for example, the myc tag approach described by Evan et al. (Mol. Cell Biol. 5:3610-3616, 1985). In an alternative embodiment, a SARS virus polypeptide may be produced by a stably-transfected mammalian cell line. A number of vectors suitable for stable transfection of mammalian cells are available to the public, e.g., see Pouwels et al. (supra); methods for constructing such cell lines are also publicly available, e.g., in Ausubel et al. (supra). In one example, cDNA encoding the SARS virus polypeptide is cloned into an expression vector which includes the dihydrofolate reductase (DHFR) gene. Integration of the plasmid and, therefore, the SARS virus polypeptide-encoding gene into the host cell chromosome is selected for by inclusion of 0.01-300 μM methotrexate in the cell culture medium (as described in Ausubel et al., supra). This dominant selection can be accomplished in most cell types. Recombinant protein expression can be increased by DHFR-mediated amplification of the transfected gene. Methods for selecting cell lines bearing gene amplifications are described in Ausubel et al. (supra); such methods generally involve extended culture in medium containing gradually increasing levels of methotrexate. DHFR-containing expression vectors commonly used for this purpose include pCVSEII-DHFR and pAdD26SV(A) (described in Ausubel et al., supra). Any of the host cells described above or, preferably, a DHFR-deficient CHO cell line (e.g., CHO DHFR− cells, ATCC Accession No. CRL 9096) are among the host cells preferred for DHFR selection of a stably-transfected cell line or DHFR-mediated gene amplification.

Once the recombinant SARS virus polypeptide is expressed, it is isolated, e.g., using affinity chromatography. In one example, an anti-SARS virus polypeptide antibody (e.g., produced as described herein) may be attached to a column and used to isolate the SARS virus polypeptide. Lysis and fractionation of SARS virus polypeptide-harboring cells prior to affinity chromatography may be performed by standard methods (see, e.g., Ausubel et al., supra). In another example, SARS virus polypeptides may be purified or substantially purified from a mixture of compounds such as an extract or supernatant obtained from cells (Ausubel et al., supra). Standard purification techniques can be used to progressively eliminate undesirable compounds from the mixture until a single compound or minimal number of effective compounds has been isolated.

Once isolated, the recombinant protein can, if desired, be further purified, e.g., by high performance liquid chromatography (see, e.g., Fisher, Laboratory Techniques In Biochemistry And Molecular Biology, eds., Work and Burdon, Elsevier, 1980).

Polypeptides of the invention, particularly short SARS virus peptide fragments, can also be produced by chemical synthesis (e.g., by the methods described in Solid Phase Peptide Synthesis, 2nd ed., 1984 The Pierce Chemical Co., Rockford, Ill.).

These general techniques of polypeptide expression and purification can also be used to produce and isolate useful SARS virus protein fragments or analogs (described herein).

In certain alternative embodiments, the SARS polypeptide might have attached any one of a variety of tags. Tags can be amino acid tags or chemical tags and can be added for the purpose of purification (for example, a 6-histidine tag for purification over a nickel column). In other preferred embodiments, various labels can be used as means for detecting binding of a SARS polypeptide to another polypeptide, for example to a cell surface receptor. Alternatively, SARS DNA or RNA may be labeled for detection, for example in a hybridization assay. SARS virus nucleic acids or proteins, or derivatives thereof, may be directly or indirectly labeled, for example, with a radioisotope, a fluorescent compound, a bioluminescent compound, a chemiluminescent compound, a metal chelator or an enzyme. Those of ordinary skill in the art will know of other suitable labels or will be able to ascertain such, using routine experimentation. In yet another embodiment of the invention, the polypeptides disclosed herein, or derivatives thereof, are linked to toxins.

Isolation and Identification of Additional SARS Virus Molecules

Based on the SARS virus sequences described herein, the isolation and identification of additional SARS virus-related sequences such as SARS virus genes and of additional SARS virus strains or isolates is made possible using standard techniques. In addition, the SARS virus sequences provided herein also provide the basis for identification of homologous sequences from other species and genera from both prokaryotes and eukaryotes such as viruses, bacteria, fungi, parasites, yeast, and/or mammals. In some embodiments, the nucleic acid sequences described herein may be used to design probes or primers, including degenerate oligonucleotide probes or primers, based upon the sequence of either DNA strand. The probes or primers may then be used to screen genomic or cDNA libraries for sequences from for example naturally occurring variants or isolates of SARS viruses, using standard amplification or hybridization techniques.

In some embodiments, binding partners may be identified by tagging the polypeptides of the invention (e.g., those substantially identical to SARS virus polypeptides described herein) with an epitope sequence (e.g., FLAG or 2HA), and delivering them into host cells by transfection with a suitable vector containing a nucleic acid sequence encoding a polypeptide of the invention, followed by immunoprecipitation and identification of the binding partner. Cells may be infected with strains expressing the FLAG or 2HA fusions, followed by lysis and immunoprecipitation with anti-FLAG or anti-2HA antibodies. Binding partners may be identified by mass spectroscopy. If the polypeptide of the invention is not produced in sufficient quantities, such a method may not deliver enough tagged protein to identify its partner. As part of a complementary approach, each polypeptide of the invention may be cloned into a mammalian transfection vector fused to, for example, 2HA, GFP and/or FLAG. Following transfection, HeLa cells may be lysed and the tagged polypeptide immunoprecipitated. The binding partner may be identified by SDS PAGE followed by mass spectroscopy.

In some embodiments, polypeptides or antibodies of the invention may be tagged, produced, and used for example on affinity columns and/or in immunological assays to identify and/or confirm identified target compounds. FLAG, HA, and/or His tagged proteins can be used for such affinity columns to pull out host cell factors from cell extracts, and any hits may be validated by standard binding assays, saturation curves, and other methods as described herein or known to those of skill in the art.

In some embodiments, a two hybrid system may be used to study protein-protein interactions. The nucleic acid sequences described herein, or sequences substantially identical thereto, can be cloned into the pBT bait plasmid of the two hybrid system, and a commercially available murine spleen library of 5×10⁶ independent clones may be used as the target library for the baits. Potential hits may be further characterized by recovering the plasmids and retransforming to reduce false positives resulting from clonal bait variants and library target clones which activate the reporter genes independent of the cloned bait. Reproducible hits may be studied further as described herein.

Virulence may be assayed as described herein or as known to those of skill in the art. Once coding sequences have been identified, they may be isolated using standard cloning techniques, and inserted into any suitable vector or replicon for, for example, production of polypeptides. Such vectors and replicons include, without limitation, bacteriophage λ (E. coli), pBR322 (E. coli), pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), YCp19 (Saccharomyces) or bovine papilloma virus (mammalian cells). In general, the polypeptides of the invention may be produced in any suitable host cell transformed or transfected with a suitable vector. The method of transformation or transfection and the choice of expression vehicle will depend on the host system selected. A wide variety of expression systems may be used, and the precise host cell used is not critical to the invention. For example, a polypeptide according to the invention may be produced in a prokaryotic host (e.g., E. coli) or in a eukaryotic host (e.g., Saccharomyces cerevisiae, insect cells, e.g., Sf21 cells, or mammalian cells, e.g., NIH 3T3, HeLa, or COS cells). Such cells are available from a wide range of sources (e.g., the American Type Culture Collection, Manassas, Va.). Bacterial expression systems for polypeptide production include the E. coli pET expression system (Novagen, Inc., Madison, Wis.), and the pGEX expression system (Pharmacia).

In one aspect, compounds according to the invention include SARS virus nucleic acid molecules and polypeptides, such as the sequences disclosed in the Figures and Tables herein, and throughout the specification, and fragments thereof. In alternative embodiments, compounds according to the invention may be nucleic acid molecules that are at least 10 nucleotides in length, and that are derived from the sequences described herein. In alternative embodiments, compounds according to the invention may be peptides that are at least 5 amino acids in length, and that are derived from the sequences described herein.

In alternative embodiments, a compound according to the invention can be a non-peptide molecule as well as a peptide or peptide analogue. A peptide or peptide analogue will generally be as small as feasible while retaining full biological activity. A non-peptide molecule can be any molecule that exhibits biological activity as described herein or known in the art. Biological activity can, for example, be measured in terms of ability to elicit a cytotoxic response, to mediate DNA replication, or any other function of a SARS virus molecule.

Compounds can be prepared by, for example, replacing, deleting, or inserting an amino acid residue of SARS peptide or peptide analogue, as described herein, with other conservative amino acid residues, i.e., residues having similar physical, biological, or chemical properties, and screening for biological function.

It is well known in the art that some modifications and changes can be made in the structure of a polypeptide without substantially altering the biological function of that peptide, to obtain a biologically equivalent polypeptide. Such modifications may be made for the purpose of modifying function, or for facilitating administration or enhancing stability or inhibiting breakdown for, for example, therapeutic uses. For example, an indigestible SARS virus compound according to the invention may be used for oral administration; a modification that is suitable for inhalation may be used for administration to the lung; or addition of a leader sequence may increase protein expression levels.

In one aspect of the invention, SARS virus-derived peptides or epitopes may include peptides that differ from a portion of a native leader, protein or SARS virus sequence by conservative amino acid substitutions. The peptides and epitopes of the present invention also extend to biologically equivalent peptides that differ from a portion of the sequence of novel peptides of the present invention by conservative amino acid substitutions. As used herein, the term “conserved amino acid substitutions” refers to the substitution of one amino acid for another at a given location in the peptide, where the substitution can be made without substantial loss of the relevant function. In making such changes, substitutions of like amino acid residues can be made on the basis of relative similarity of side-chain substituents, for example, their size, charge, hydrophobicity, hydrophilicity, and the like, and such substitutions may be assayed for their effect on the function of the peptide by routine testing.

In some embodiments, conserved amino acid substitutions may be made where an amino acid residue is substituted for another having a similar hydrophilicity value (e.g., within a value of plus or minus 2.0), where the following hydrophilicity values are assigned to amino acid residues (as detailed in U.S. Pat. No. 4,554,101, incorporated herein by reference): Arg (+3.0); Lys (+3.0); Asp (+3.0); Glu (+3.0); Ser (+0.3); Asn (+0.2); Gln (+0.2); Gly (0); Pro (−0.5); Thr (−0.4); Ala (−0.5); His (−0.5); Cys (−1.0); Met (−1.3); Val (−1.5); Leu (−1.8); Ile (−1.8); Tyr (−2.3); Phe (−2.5); and Trp (−3.4).

In alternative embodiments, conserved amino acid substitutions may be made where an amino acid residue is substituted for another having a similar hydropathic index (e.g., within a value of plus or minus 2.0). In such embodiments, each amino acid residue may be assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics, as follows: Ile (+4.5); Val (+4.2); Leu (+3.8); Phe (+2.8); Cys (+2.5); Met (+1.9); Ala (+1.8); Gly (−0.4); Thr (−0.7); Ser (−0.8); Trp (−0.9); Tyr (−1.3); Pro (−1.6); His (−3.2); Glu (−3.5); Gln (−3.5); Asp (−3.5); Asn (−3.5); Lys (−3.9); and Arg (−4.5).
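
As an illustration of the hydropathic-index rule in the preceding paragraph, the following minimal Python sketch checks whether two residues fall within the plus-or-minus 2.0 window. The table reproduces the index values listed above; the names `is_conservative_by_hydropathy`, `candidate_substitutions`, and the `tolerance` parameter are hypothetical and are introduced here only for illustration. This is a sketch of one possible reading of the rule, not a prescribed implementation.

```python
# Hydropathic index values as listed in the paragraph above.
HYDROPATHIC_INDEX = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def is_conservative_by_hydropathy(original, substitute, tolerance=2.0):
    """Return True if the two indices differ by at most `tolerance`.

    `tolerance` defaults to the example window of 2.0 given in the text
    (hypothetical parameter name).
    """
    diff = abs(HYDROPATHIC_INDEX[original] - HYDROPATHIC_INDEX[substitute])
    return diff <= tolerance


def candidate_substitutions(original, tolerance=2.0):
    """List, alphabetically, the other residues inside the window."""
    return sorted(
        residue
        for residue in HYDROPATHIC_INDEX
        if residue != original
        and is_conservative_by_hydropathy(original, residue, tolerance)
    )
```

Under this reading, a substitution that passes the numerical check would still be assayed for its effect on function by routine testing, as the text describes.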

In alternative embodiments, conserved amino acid substitutions may be made where an amino acid residue is substituted for another in the same class, where the amino acids are divided into non-polar, acidic, basic and neutral classes, as follows: non-polar: Ala, Val, Leu, Ile, Phe, Trp, Pro, Met; acidic: Asp, Glu; basic: Lys, Arg, His; neutral: Gly, Ser, Thr, Cys, Asn, Gln, Tyr.
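
The class-based rule in the preceding paragraph can be sketched as a simple lookup. The four classes and their members are taken from the text; the names `AMINO_ACID_CLASSES`, `RESIDUE_TO_CLASS`, and `same_class` are hypothetical and used only for this illustration.

```python
# Class membership as given in the paragraph above.
AMINO_ACID_CLASSES = {
    "non-polar": {"Ala", "Val", "Leu", "Ile", "Phe", "Trp", "Pro", "Met"},
    "acidic": {"Asp", "Glu"},
    "basic": {"Lys", "Arg", "His"},
    "neutral": {"Gly", "Ser", "Thr", "Cys", "Asn", "Gln", "Tyr"},
}

# Invert the table so each residue maps to its single class name.
RESIDUE_TO_CLASS = {
    residue: class_name
    for class_name, members in AMINO_ACID_CLASSES.items()
    for residue in members
}


def same_class(original, substitute):
    """Return True if both residues belong to the same class."""
    return RESIDUE_TO_CLASS[original] == RESIDUE_TO_CLASS[substitute]
```

For example, `same_class("Asp", "Glu")` holds because both residues are in the acidic class, whereas `same_class("Lys", "Asp")` does not.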

Conservative amino acid changes can include the substitution of an L-amino acid by the corresponding D-amino acid, by a conservative D-amino acid, or by a naturally-occurring, non-genetically encoded form of amino acid, as well as a conservative substitution of an L-amino acid. Naturally-occurring non-genetically encoded amino acids include beta-alanine, 3-amino-propionic acid, 2,3-diamino propionic acid, alpha-aminoisobutyric acid, 4-amino-butyric acid, N-methylglycine (sarcosine), hydroxyproline, ornithine, citrulline, t-butylalanine, t-butylglycine, N-methylisoleucine, phenylglycine, cyclohexylalanine, norleucine, norvaline, 2-napthylalanine, pyridylalanine, 3-benzothienyl alanine, 4-chlorophenylalanine, 2-fluorophenylalanine, 3-fluorophenylalanine, 4-fluorophenylalanine, penicillamine, 1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid, beta-2-thienylalanine, methionine sulfoxide, homoarginine, N-acetyl lysine, 2-amino butyric acid, 2,4-diamino butyric acid, p-aminophenylalanine, N-methylvaline, homocysteine, homoserine, cysteic acid, epsilon-amino hexanoic acid, delta-amino valeric acid, or 2,3-diaminobutyric acid.

In alternative embodiments, conservative amino acid changes include changes based on considerations of hydrophilicity or hydrophobicity, size or volume, or charge. Amino acids can be generally characterized as hydrophobic or hydrophilic, depending primarily on the properties of the amino acid side chain. A hydrophobic amino acid exhibits a hydrophobicity of greater than zero, and a hydrophilic amino acid exhibits a hydrophilicity of less than zero, based on the normalized consensus hydrophobicity scale of Eisenberg et al. (J. Mol. Biol. 179:125-142, 1984). Genetically encoded hydrophobic amino acids include Gly, Ala, Phe, Val, Leu, Ile, Pro, Met and Trp, and genetically encoded hydrophilic amino acids include Thr, His, Glu, Gln, Asp, Arg, Ser, and Lys. Non-genetically encoded hydrophobic amino acids include t-butylalanine, while non-genetically encoded hydrophilic amino acids include citrulline and homocysteine.

Hydrophobic or hydrophilic amino acids can be further subdivided based on the characteristics of their side chains. For example, an aromatic amino acid is a hydrophobic amino acid with a side chain containing at least one aromatic or heteroaromatic ring, which may contain one or more substituents such as —OH, —SH, —CN, —F, —Cl, —Br, —I, —NO2, —NO, —NH2, —NHR, —NRR, —C(O)R, —C(O)OH, —C(O)OR, —C(O)NH2, —C(O)NHR, —C(O)NRR, etc., where R is independently (C1-C6) alkyl, substituted (C1-C6) alkyl, (C1-C6) alkenyl, substituted (C1-C6) alkenyl, (C1-C6) alkynyl, substituted (C1-C6) alkynyl, (C5-C20) aryl, substituted (C5-C20) aryl, (C6-C26) alkaryl, substituted (C6-C26) alkaryl, 5-20 membered heteroaryl, substituted 5-20 membered heteroaryl, 6-26 membered alkheteroaryl or substituted 6-26 membered alkheteroaryl. Genetically encoded aromatic amino acids include Phe, Tyr, and Trp, while non-genetically encoded aromatic amino acids include phenylglycine, 2-napthylalanine, beta-2-thienylalanine, 1,2,3,4-tetrahydro-isoquinoline-3-carboxylic acid, 4-chlorophenylalanine, 2-fluorophenylalanine, 3-fluorophenylalanine, and 4-fluorophenylalanine.

An apolar amino acid is a hydrophobic amino acid with a side chain that is uncharged at physiological pH and which has bonds in which a pair of electrons shared in common by two atoms is generally held equally by each of the two atoms (i.e., the side chain is not polar). Genetically encoded apolar amino acids include Gly, Leu, Val, Ile, Ala, and Met, while non-genetically encoded apolar amino acids include cyclohexylalanine. Apolar amino acids can be further subdivided to include aliphatic amino acids, which are hydrophobic amino acids having an aliphatic hydrocarbon side chain. Genetically encoded aliphatic amino acids include Ala, Leu, Val, and Ile, while non-genetically encoded aliphatic amino acids include norleucine.

A polar amino acid is a hydrophilic amino acid with a side chain that is uncharged at physiological pH, but which has one bond in which the pair of electrons shared in common by two atoms is held more closely by one of the atoms. Genetically encoded polar amino acids include Ser, Thr, Asn, and Gln, while non-genetically encoded polar amino acids include citrulline, N-acetyl lysine, and methionine sulfoxide.

An acidic amino acid is a hydrophilic amino acid with a side chain pKa value of less than 7. Acidic amino acids typically have negatively charged side chains at physiological pH due to loss of a hydrogen ion. Genetically encoded acidic amino acids include Asp and Glu. A basic amino acid is a hydrophilic amino acid with a side chain pKa value of greater than 7. Basic amino acids typically have positively charged side chains at physiological pH due to association with hydronium ion. Genetically encoded basic amino acids include Arg, Lys, and His, while non-genetically encoded basic amino acids include the non-cyclic amino acids ornithine, 2,3-diaminopropionic acid, 2,4-diaminobutyric acid, and homoarginine.

It will be appreciated by one skilled in the art that the above classifications are not absolute and that an amino acid may be classified in more than one category. In addition, amino acids can be classified based on known behaviour and/or characteristic chemical, physical, or biological properties based on specified assays or as compared with previously identified amino acids. Amino acids can also include bifunctional moieties having amino acid-like side chains.

Conservative changes can also include the substitution of a chemically derivatised moiety for a non-derivatised residue, by for example, reaction of a functional side group of an amino acid. Thus, these substitutions can include compounds whose free amino groups have been derivatised to amine hydrochlorides, p-toluene sulfonyl groups, carbobenzoxy groups, t-butyloxycarbonyl groups, chloroacetyl groups or formyl groups. Similarly, free carboxyl groups can be derivatized to form salts, methyl and ethyl esters or other types of esters or hydrazides, and side chains can be derivatized to form O-acyl or O-alkyl derivatives for free hydroxyl groups or N-im-benzylhistidine for the imidazole nitrogen of histidine. Peptide analogues also include amino acids that have been chemically altered, for example, by methylation, by amidation of the C-terminal amino acid by an alkylamine such as ethylamine, ethanolamine, or ethylene diamine, or acylation or methylation of an amino acid side chain (such as acylation of the epsilon amino group of lysine). Peptide analogues can also include replacement of the amide linkage in the peptide with a substituted amide (for example, groups of the formula —C(O)—NR, where R is (C1-C6) alkyl, (C1-C6) alkenyl, (C1-C6) alkynyl, substituted (C1-C6) alkyl, substituted (C1-C6) alkenyl, or substituted (C1-C6) alkynyl) or isostere of an amide linkage (for example, —CH2NH—, —CH2S, —CH2CH2—, —CH═CH— (cis and trans), —C(O)CH2—, —CH(OH)CH2—, or —CH2SO—).

The compound can be covalently linked, for example, by polymerisation or conjugation, to form homopolymers or heteropolymers. Spacers and linkers, typically composed of small neutral molecules, such as amino acids that are uncharged under physiological conditions, can be used. Linkages can be achieved in a number of ways. For example, cysteine residues can be added at the peptide termini, and multiple peptides can be covalently bonded by controlled oxidation. Alternatively, heterobifunctional agents, such as disulfide/amide forming agents or thioether/amide forming agents can be used. The compound can also be constrained, for example, by having cyclic portions.

In some embodiments, three dimensional molecular modeling techniques may be used to identify or generate compounds that may be useful as therapeutics or diagnostics. Standard molecular modeling tools may be used, for example, those described in L-H Hung and R. Samudrala, PROTINFO: secondary and tertiary protein structure prediction, Nucleic Acids Research, 2003, Vol. 31, No. 13, 3296-3299; A. Yamaguchi, et al., Enlarged FAMSBASE: protein 3D structure models of genome sequences for 41 species, Nucleic Acids Research, 2003, Vol. 31, No. 1, 463-468; J. Chen, et al., MMDB: Entrez's 3D-structure database, Nucleic Acids Research, 2003, Vol. 31, No. 1, 474-477; R. A. Chiang, et al., The Structure Superposition Database, Nucleic Acids Research, 2003, Vol. 31, No. 1, 505-510.

Peptides or peptide analogues can be synthesized by standard chemical techniques, for example, by automated synthesis using solution or solid phase synthesis methodology. Automated peptide synthesizers are commercially available and use techniques well known in the art. Peptides and peptide analogues can also be prepared using recombinant DNA technology using standard methods such as those described in, for example, Sambrook, et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) or Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, 1994).

Compounds, such as peptides (or analogues thereof), can be identified by routine experimentation by, for example, modifying residues within SARS peptides by introducing single or multiple amino acid substitutions, deletions, or insertions, and identifying those compounds that retain biological activity, e.g., those compounds that have cytotoxic ability.

In general, candidate compounds for prevention or treatment of SARS virus-mediated disorders are identified from large libraries of both natural product and synthetic (or semi-synthetic) extracts or chemical libraries according to methods known in the art. Candidate or test compounds may include, without limitation, peptides, polypeptides, synthesised organic molecules, naturally occurring organic molecules, and nucleic acid molecules. In some embodiments, such compounds are screened for the ability to inhibit SARS virus replication or pathogenicity, while maintaining the infected cell's ability to grow or survive.

Those skilled in the field of drug discovery and development will understand that the precise source of test extracts or compounds is not critical to the method(s) of the invention. Accordingly, virtually any number of chemical extracts or compounds can be screened using the exemplary methods described herein or using standard methods. Examples of such extracts or compounds include, but are not limited to, plant-, fungal-, prokaryotic- or animal-based extracts, fermentation broths, and synthetic compounds, as well as modification of existing compounds. Numerous methods are also available for generating random or directed synthesis (e.g., semi-synthesis or total synthesis) of any number of chemical compounds, including, but not limited to, saccharide-, lipid-, peptide-, and nucleic acid-based compounds. Synthetic compound libraries are commercially available. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant, and animal extracts are commercially available from a number of sources, including Biotics (Sussex, UK), Xenova (Slough, UK), Harbor Branch Oceanographic Institute (Ft. Pierce, Fla.), and PharmaMar, U.S.A. (Cambridge, Mass.). In addition, natural and synthetically produced libraries of, for example, SARS virus polypeptides containing leader sequences, are produced, if desired, according to methods known in the art, e.g., by standard extraction and fractionation methods. Furthermore, if desired, any library or compound is readily modified using standard chemical, physical, or biochemical methods.

When a crude extract is found to modulate cytotoxicity or viral infection, further fractionation of the positive lead extract is necessary to isolate chemical constituents responsible for the observed effect. Thus, the goal of the extraction, fractionation, and purification process is the careful characterization and identification of a chemical entity within the crude extract having, for example, anti-cytotoxicity or anti-viral properties. The same assays described herein for the detection of activities in mixtures of compounds can be used to purify the active component and to test derivatives thereof. Methods of fractionation and purification of such heterogeneous extracts are known in the art. If desired, compounds shown to be useful agents for treatment are chemically modified according to methods known in the art. Compounds identified as being of therapeutic, prophylactic, diagnostic, or other value in, for example, cell culture systems, such as a Vero E6 culture system, may be subsequently analyzed using a ferret animal model, or any other animal model suitable for analysis of SARS.

The compounds of the invention can be used to prepare antibodies to SARS virus peptides, proteins, polyproteins, or analogs thereof, or to SARS virus nucleic acid molecules or analogs thereof using standard techniques of preparation as, for example, described in Harlow and Lane (Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1988), or known to those skilled in the art. Antibodies may include polyclonal antibodies, monoclonal antibodies, hybrid antibodies (e.g., divalent antibodies having different pairs of heavy and light chains), chimeric antibodies (e.g., antibodies having constant and variable domains from different species and/or class), modified antibodies (e.g., antibodies in which the naturally occurring sequence has been altered by, for example, recombinant techniques), Fab antibodies, anti-idiotype antibodies, etc. Antibodies can be tailored to minimise adverse host immune response by, for example, using chimeric antibodies containing an antigen binding domain from one species and the Fc portion from another species, or by using antibodies made from hybridomas of the appropriate species. For example, “humanized” antibodies may be used for administration to humans.

To generate SARS virus polypeptide-specific antibodies, a SARS virus polypeptide coding sequence may be expressed, for example, as a C-terminal fusion with glutathione S-transferase (GST) (Smith et al., Gene 67:31-40, 1988). The fusion polypeptide may then be purified on glutathione-Sepharose beads, eluted with glutathione, cleaved with thrombin (at the engineered cleavage site), and purified to the degree necessary for immunization of rabbits. Primary immunizations are carried out with Freund's complete adjuvant and subsequent immunizations with Freund's incomplete adjuvant. Antibody titres are monitored by Western blot and immunoprecipitation analyses using the thrombin-cleaved SARS virus polypeptide fragment of the GST-SARS virus fusion polypeptide. Immune sera are affinity purified using CNBr-Sepharose-coupled SARS virus polypeptide. Antiserum specificity is determined using a panel of unrelated GST polypeptides.

As an alternate or adjunct immunogen to GST fusion polypeptides, peptides corresponding to relatively unique hydrophilic SARS virus polypeptides may be generated and coupled to keyhole limpet hemocyanin (KLH) through an introduced C-terminal lysine. Antiserum to each of these peptides is similarly affinity purified on peptides conjugated to BSA, and specificity tested in ELISA and Western blots using peptide conjugates, and by Western blot and immunoprecipitation using SARS virus polypeptide expressed as a GST fusion polypeptide.

Alternatively, monoclonal antibodies may be prepared using the SARS virus polypeptides described above and standard hybridoma technology (see, e.g., Kohler et al., Nature, 256:495, 1975; Kohler et al., Eur. J. Immunol. 6:511, 1976; Kohler et al., Eur. J. Immunol. 6:292, 1976; Hammerling et al., In Monoclonal Antibodies and T Cell Hybridomas, Elsevier, NY, 1981; Ausubel et al., supra). Once produced, monoclonal antibodies are also tested for specific SARS virus polypeptide recognition by Western blot or immunoprecipitation analysis (by the methods described in Ausubel et al., supra). Antibodies which specifically recognize SARS virus polypeptides are considered to be useful in the invention; such antibodies may be used, e.g., in an immunoassay to monitor the level of SARS virus polypeptides produced by a mammal (for example, to determine the amount or location of a SARS virus polypeptide).

In an alternative embodiment, antibodies of the invention are produced using not only the whole SARS virus polypeptide; fragments of the SARS virus polypeptide which are unique, or which lie outside highly conserved regions and appear likely to be antigenic by criteria such as a high frequency of charged residues, may also be used. In one specific example, such fragments are generated by standard techniques of PCR and cloned into the pGEX expression vector (Ausubel et al., supra). Fusion polypeptides are expressed in E. coli and purified using a glutathione agarose affinity matrix as described in Ausubel et al. (supra). To attempt to minimize the potential problems of low affinity or specificity of antisera, two or three such fusions are generated for each polypeptide, and each fusion is injected into at least two rabbits. Antisera are raised by injections in a series, preferably including at least three booster injections. SARS virus antibodies may also be prepared against SARS virus nucleic acid molecules.

Antibodies may be used as diagnostics, therapeutics, or prophylactics for SARS virus-related disorders. Antibodies may also be used to isolate SARS virus and compounds by for example affinity chromatography, or to identify SARS virus compounds isolated or generated by other techniques.

In some aspects, biological assays, such as diagnostic or other assays, using high density nucleic acid, polypeptide, or antibody arrays, for example high density miniaturized arrays or “microarrays,” of SARS virus nucleic acid molecules or polypeptides, or antibodies capable of specifically binding such nucleic acid molecules or polypeptides, may be performed. Macroarrays, performed for example by manual spotting techniques, may also be used. Arrays generally require a solid support (for example, nylon, glass, ceramic, plastic, silicon, nitrocellulose or PVDF membranes, microwells, microbeads, e.g., magnetic microbeads, etc.) to which the nucleic acid molecules or polypeptides or antibodies are attached in a specified two-dimensional arrangement, such that the pattern of hybridization is easily determinable. Suspension arrays (particles in suspension) that are coded to facilitate identification may also be used. SARS virus nucleic acid molecules or polypeptide probes or targets may be compounds as described herein.

In some embodiments, high density nucleic acid arrays may for example be used to monitor the presence or level of expression of a large number of SARS virus nucleic acid molecules or genes or for detecting or identifying SARS virus nucleic acid sequence variations, mutations or polymorphisms. For the purpose of such arrays, “nucleic acids” may include any polymer or oligomer of nucleosides or nucleotides (polynucleotides or oligonucleotides), which include pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively, or may include peptide nucleic acids (PNA). In an alternative aspect, the invention provides nucleic acid microarrays including a number of distinct nucleic acid sequence arrays of the invention, thus providing specific “sets” of sequences. The number of distinct sequences may for example be any integer between 2 and 1×10⁵, such as at least 10², 10³, 10⁴, or 10⁵.

The invention also provides gene knockout and expression libraries. Thus, nucleic acid molecules encoding SARS virus polypeptides or proteins (e.g., PCR products of ORFs or total mRNA) may for example be attached to a solid support, hybridized with single stranded detectably-labeled cDNAs (corresponding to an “antisense” orientation), and quantified using an appropriate method such that a signal is detected at each location at which hybridization has taken place. The intensity of the signal would then reflect the level of gene expression. Comparison of results from viruses, for example, of different strains or from different samples or subjects, would elucidate differing levels of expression of specified genes. Using similar techniques, homologous nucleic acids may be identified from different viruses if SARS virus nucleic acids are used in the microarray, and probed with nucleic acid molecules from different viruses or subjects. In some embodiments, this approach may involve constructing his-tagged ORF expression libraries of viral genomes in a bacterial host, similar to an expression library in yeast (Martzen M. R. et al., 1999. Science, 286:1153). ORF-encoded protein activities may for example be detected in purified his-tagged protein pools in cases where activities cannot be detected in extracts or cells. In one aspect of the invention, arrayed libraries may be constructed of viral strains each of which bears a plasmid expressing a different SARS virus ORF under control of an inducible promoter. ORFs are amplified using PCR and cloned into a vector that enables their expression as N-terminal his-tagged polypeptides. These amplicons are also used to construct hybridization microarrays and enable targeted gene disruption, reducing expenses. A suitable expression host is selected, and genes encoding particular biochemical activities are identified by screening arrayed pools of his-tagged proteins as described previously (Martzen M. R., McCraith S. M., Spinelli S. L., Torres F. M., Fields S., Grayhack E. J., and Phizicky E. M., 1999. Science, 286:1153).

In some embodiments, protein arrays (including antibody or antigen arrays) may be used for the analysis and identification of SARS virus polypeptides or host responses to such polypeptides. Thus, protein arrays may be used to detect SARS virus polypeptides in a patient; distinguish a SARS virus polypeptide from a host polypeptide; detect interactions between SARS virus polypeptides and, for example, host proteins; determine the efficacy of potential therapeutics, such as small molecules or ligands that may bind SARS virus polypeptides; determine protein-antibody interactions; and/or detect enzyme-substrate interactions. Protein arrays may also be used to detect SARS virus antigens and antibodies in samples; to profile expression of SARS virus polypeptides; to identify suitable antibodies or map epitopes; or for a variety of protein function analyses.

A variety of methods are known for making and using microarrays, as for example disclosed in Cheung V. G., et al., 1999. Nature Genetics Supplement, 21:15-19; Lipshutz R. J., et al., 1999. Nature Genetics Supplement, 21:20-24; Bowtell D. D. L., 1999. Nature Genetics Supplement, 21:25-32; Singh-Gasson S., et al., 1999. Nature Biotechnol., 17:974-978; and Schweitzer B., et al., 2002. Nature Biotechnol., 20:359-365. Thus, for example, microarrays may be designed by synthesizing oligonucleotides with sequence variations based on a reference sequence, such as any SARS virus sequence described herein. Methods for storing, querying and analyzing microarray data have been disclosed in, for example, U.S. Pat. No. 6,484,183; U.S. Pat. No. 6,188,783; and Holloway A. J., et al., 2002. Nature Genetics Supplement, 32:481-489. Protein arrays may be constructed, detected, and analysed using methods known in the art, for example mass spectrometric techniques, immunoassays such as ELISA and western (dot) blotting combined with for example fluorescence detection techniques, and adapted for high throughput analysis, as described in for example MacBeath, G. and Schreiber, S. L. Science 2000, 289, 1760-1763; Levit-Binnun N, et al. (2003) Quantitative detection of protein arrays. Anal Chem 75:1436-41; Kukar T, et al. (2002) Protein microarrays to detect protein-protein interactions using red and green fluorescent proteins. Anal Biochem 306:50-4; Borrebaeck C A, et al. (2001) Protein chips based on recombinant antibody fragments: a highly sensitive approach as detected by mass spectrometry. Biotechniques 30:1126-1132; Huang R P (2001) Detection of multiple proteins in an antibody-based protein microarray system. J Immunol Methods 255:1-13; Emili A Q and Cagney G (2000) Large-scale functional analysis using peptide or protein arrays. Nature Biotechnol 18:393-397; Zhu H, et al. (2000) Analysis of yeast protein kinases using protein chips. Nature Genet 26:283-9; Lueking A, et al. (1999) Protein Microarrays for Gene Expression and Antibody Screening. Anal. Biochem. 270:103-111; or Templin M F, et al. (2002) Protein microarray technology. Drug Discov Today 7:815-822. Tools for microarray techniques are available commercially from for example Affymetrix, Santa Clara, Calif.; Nanogen, San Diego, Calif.; or Sequenom, San Diego, Calif.

Computer Readable Records

Nucleic acid and polypeptide sequences, as described herein, or a fragment thereof, may be provided in a variety of media to facilitate access to these sequences and enable the use thereof. Accordingly, SARS virus nucleic acid and polypeptide sequences of the invention may be recorded or stored on computer readable media, using any technique and format that is appropriate for the particular medium.

In alternative embodiments, the invention provides computer readable media encoded with a number of distinct nucleic acid or amino acid data sequences of the invention. The number of distinct sequences may for example be any integer between 2 and 1×10⁵, such as at least 10², 10³, 10⁴ or 10⁵. In one embodiment, the invention features a computer medium having a plurality of digitally encoded data records. Each data record may include a value representing a nucleic acid or amino acid sequence of the invention. In some embodiments, the data record may further include values representing the level of expression, level or activity of a nucleic acid or amino acid sequence of the invention. The data record can be structured as a table, for example, a table that is part of a database such as a relational database (for example, a SQL database of the Oracle or Sybase database environments). The invention also includes a method of communicating information about a sample, for example by transmitting a computer readable record as described herein over a computer network. The polypeptide and nucleic acid sequences of the invention, and sequence information pertaining thereto, may be routinely accessed by one of ordinary skill in the art for a variety of purposes, including for the purposes of comparing substantially identical sequences, etc. Such access may be facilitated using publicly available software as described herein. By “computer readable media” is meant any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
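By way of illustration only, a data record of the kind described above might be laid out as a relational table along the following lines. This is a minimal sketch using Python's built-in sqlite3 module rather than the Oracle or Sybase environments named in the text; the table name, column names, and the expression-level value are hypothetical placeholders, and the two stored sequences are simply the linker and forward sequencing primer quoted elsewhere in this description.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative assumptions,
# not a layout prescribed by the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sequence_record (
        record_id        INTEGER PRIMARY KEY,
        seq_type         TEXT NOT NULL
                         CHECK (seq_type IN ('nucleic_acid', 'amino_acid')),
        sequence         TEXT NOT NULL,
        expression_level REAL            -- optional value; NULL when not measured
    )
    """
)

# Two example rows; the expression level 1.0 is a placeholder, not measured data.
rows = [
    ("nucleic_acid", "AATTCGCGGCCGCGTCGAC", None),
    ("nucleic_acid", "GGCCTCTTCGCTATTACGC", 1.0),
]
conn.executemany(
    "INSERT INTO sequence_record (seq_type, sequence, expression_level) "
    "VALUES (?, ?, ?)",
    rows,
)
conn.commit()


def find_by_subsequence(connection, motif):
    """Return (record_id, sequence) for every record containing `motif`."""
    return connection.execute(
        "SELECT record_id, sequence FROM sequence_record "
        "WHERE instr(sequence, ?) > 0 ORDER BY record_id",
        (motif,),
    ).fetchall()


matches = find_by_subsequence(conn, "GCGGCCGC")
print(matches)
```

A query of this kind is one simple way such records could be accessed for sequence comparison; a production system would more likely delegate similarity searches to dedicated alignment software.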

Pharmaceutical and Veterinary Compositions, Dosages, and Administration

Compounds of the invention can be provided alone or in combination with other compounds (for example, small molecules, peptides, or peptide analogues), in the presence of a liposome, an adjuvant, or any pharmaceutically acceptable carrier, in a form suitable for administration to humans or to animals.

Conventional pharmaceutical practice may be employed to provide suitable formulations or compositions to administer the compounds to patients suffering from or presymptomatic for SARS. Any appropriate route of administration may be employed, for example, parenteral, intravenous, subcutaneous, intramuscular, intracranial, intraorbital, ophthalmic, intraventricular, intracapsular, intraspinal, intracisternal, intraperitoneal, intranasal, aerosol, or oral administration. In some embodiments, compounds are delivered directly to the lung by, for example, formulations suitable for inhalation. In some embodiments, gene therapy techniques may be used for administration of SARS virus nucleic acid molecules, for example, as DNA vaccines. Formulations may be in the form of liquid solutions or suspensions; for oral administration, formulations may be in the form of tablets or capsules; and for intranasal formulations, in the form of powders, nasal drops, or aerosols.

Methods well known in the art for making formulations are found in, for example, “Remington's Pharmaceutical Sciences” (18th edition), ed. A. Gennaro, 1990, Mack Publishing Company, Easton, Pa. Formulations for parenteral administration may, for example, contain excipients, sterile water, or saline, polyalkylene glycols such as polyethylene glycol, oils of vegetable origin, or hydrogenated naphthalenes. Biocompatible, biodegradable lactide polymer, lactide/glycolide copolymer, or polyoxyethylene-polyoxypropylene copolymers may be used to control the release of the compounds. Other potentially useful parenteral delivery systems for modulatory compounds include ethylene-vinyl acetate copolymer particles, osmotic pumps, implantable infusion systems, and liposomes. Formulations for inhalation may contain excipients, for example, lactose, or may be aqueous solutions containing, for example, polyoxyethylene-9-lauryl ether, glycocholate and deoxycholate, or may be oily solutions for administration in the form of nasal drops, or as a gel.

If desired, treatment with a compound according to the invention may be combined with more traditional therapies for the disease.

For therapeutic or prophylactic compositions, the compounds are administered to an individual in an amount sufficient to stop or slow the replication of the SARS virus, or to confer protective immunity against future SARS virus infection. Amounts considered sufficient will vary according to the specific compound used, the mode of administration, the stage and severity of the disease, the age, sex, and health of the individual being treated, and concurrent treatments. As a general rule, however, dosages can range from about 1 μg to about 100 mg per kg body weight of a patient for an initial dosage, with subsequent adjustments depending on the patient's response, which can be measured, for example by determining the presence of SARS nucleic acid molecules, polypeptides, or virions in the patient's peripheral blood.

In the case of vaccine formulations, an immunogenically effective amount of a compound of the invention can be provided, alone or in combination with other compounds, with an adjuvant, for example, Freund's incomplete adjuvant or aluminum hydroxide. The compound may also be linked with a carrier molecule, such as bovine serum albumin or keyhole limpet hemocyanin, to enhance immunogenicity. In general, compounds of the invention should be used without causing substantial toxicity. Toxicity of the compounds of the invention can be determined using standard techniques, for example, by testing in cell cultures or experimental animals and determining the therapeutic index, i.e., the ratio between the LD50 (the dose lethal to 50% of the population) and the LD100 (the dose lethal to 100% of the population). In some circumstances however, such as in severe disease conditions, it may be necessary to administer substantial excesses of the compositions.

Virus isolation was performed on a bronchoalveolar lavage specimen of a fatal SARS case belonging to the original case cluster from Toronto, Canada. All work with the infectious agent was performed in a biosafety level 3 (BSL3) laboratory using an N100 mask for personal protection. Samples were removed from BSL3 after addition of the RNA extraction buffer. The virus isolate, named the “Tor2 isolate”, was grown in African Green Monkey Kidney (Vero E6) cells, the viral particles were purified, and the genetic material (RNA) was extracted from the Tor2 isolate (Poutanen, S. M. et al., N Engl J Med, Apr. 10, 2003). More specifically, one hundred microlitre specimens were used to inoculate Vero E6 cells (ATCC CRL 1586) on Dulbecco's Modified Eagle Medium supplemented with penicillin/streptomycin, glutamine and 2% fetal calf serum. The culture was incubated at 37° C. Cytopathogenic effect was observed 5 days post inoculation. The virus was passaged into newly seeded Vero E6 cells which showed a cytopathogenic effect as early as 2 days post infection (multiplicity of infection 10⁻²). A virus stock was prepared from passage 2 of these cells and preserved in liquid nitrogen. The titer of the virus stock was determined to be 1×10⁷ plaque forming units (p.f.u.) by plaque assay and 5×10⁶ by tissue culture infectious dose 50 (TCID50).

For virus propagation, ten T-162 flasks of Vero E6 cells were infected with a multiplicity of infection of 10⁻². When infected cells showed a cytopathogenic effect of ‘4+’ (48 hours post infection), the cultures were frozen and thawed to lyse the cells, and the supernatants were clarified from cell debris by centrifugation at 10,000 rpm in a Beckman high-speed centrifuge. The supernatants were treated with DNAse and RNAse for 3 hours at 37° C. to remove any cellular genomic nucleic acids and subsequently extracted with an equal volume of 1,1,2-trichloro-trifluoroethane. The top fraction was ultra-centrifuged through a 5%/40% glycerol step gradient at 151,000×g for 1 hour at 4° C. The virus pellet was resuspended in PBS. RNA was isolated using a commercial kit from QIAGEN and stored at −80° C. for further use.
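As a rough illustration of the arithmetic behind infecting flasks at a given multiplicity of infection, the volume of virus stock per flask can be estimated as MOI × cells ÷ titer. The sketch below rests on two assumptions that are not stated in the text: that the plaque-assay titer is expressed per millilitre, and that each T-162 flask holds about 2×10⁷ cells (a purely hypothetical figure chosen for the example).

```python
def inoculum_volume_ml(moi, cells_per_flask, titer_pfu_per_ml):
    """Volume of virus stock (mL) that delivers `moi` p.f.u. per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * cells_per_flask / titer_pfu_per_ml


MOI = 1e-2                 # multiplicity of infection given in the text
TITER_PFU_PER_ML = 1e7     # plaque-assay titer; the per-mL unit is an assumption
CELLS_PER_FLASK = 2e7      # hypothetical cell count for one T-162 flask
N_FLASKS = 10              # number of flasks given in the text

per_flask_ml = inoculum_volume_ml(MOI, CELLS_PER_FLASK, TITER_PFU_PER_ML)
total_ml = per_flask_ml * N_FLASKS

print(f"per flask: {per_flask_ml * 1000:.0f} microlitres")
print(f"all flasks: {total_ml * 1000:.0f} microlitres")
```

With a different cell count or titer unit the volumes scale linearly, so the function can be reused by changing only the three input constants.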

cDNA Library Construction

The RNA and subsequent products were handled under biosafety level 2 (BSL2) conditions. The RNA sample was converted to a cDNA library, using a combined random-priming and oligo-dT priming strategy, and resultant subgenomic clones were processed under level 1 biosafety conditions. More specifically, purified viral RNA (55 ng) was used in the construction of a random primed and oligo-dT primed cDNA library, using the SuperScript Choice System for cDNA synthesis (Invitrogen). Linkers 5′-AATTCGCGGCCGCGTCGAC-3′, SEQ ID NO: 195, and 5′-pGTCGACGCGGCCGCG-3′, SEQ ID NO: 196, were ligated following cDNA synthesis. The cDNA synthesis products were visualized on agarose gels, revealing the anticipated low-yield smear. To produce sufficient cDNA for cloning, the cDNA product was size fractionated on a low-melting point preparative agarose gel, followed by PCR amplification using a single PCR primer, 5′-AATTCGCGGCCGCGTCGAC-3′, SEQ ID NO: 197, specific to the linkers. This yielded sufficient material for cloning.
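The relationship between the two linker oligonucleotides can be checked with a short script. The sketch below computes the reverse complement of the shorter, 5′-phosphorylated linker and compares it with the longer one; the variable names are illustrative, and the reading of the result as a double-stranded adapter with a four-base 5′ overhang is an inference from the sequences rather than a statement made in the text.

```python
# Watson-Crick complement table for DNA (uppercase, unambiguous bases only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


LINKER_LONG = "AATTCGCGGCCGCGTCGAC"   # SEQ ID NO: 195
LINKER_SHORT = "GTCGACGCGGCCGCG"      # SEQ ID NO: 196 (5'-phosphorylated)

# Region of the long linker that would base-pair with the short linker.
paired_region = reverse_complement(LINKER_SHORT)
pairs_with_3prime_end = LINKER_LONG.endswith(paired_region)

# Bases of the long linker left single-stranded at its 5' end.
overhang = LINKER_LONG[: len(LINKER_LONG) - len(paired_region)]

# Not I recognition sequence (Not I digestion is used in the cloning step).
NOT_I_SITE = "GCGGCCGC"
has_not_i_site = NOT_I_SITE in LINKER_LONG

print(paired_region, pairs_with_3prime_end, overhang, has_not_i_site)
```

The check shows that the short linker is fully complementary to the 3′ portion of the long linker, leaving AATT unpaired, and that the duplex carries a Not I site.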

Size-selected cDNA products were cloned and single sequence reads were generated from each end of the insert from randomly picked clones. A list of the SARS virus clones is provided in the accompanying sequence listing, which is incorporated by reference herein (SEQ ID NOs: 92-159, 208 and 209).

More specifically, size-selected cDNAs were ligated into the pCR4-TOPO TA cloning vector (Invitrogen, CA), or after digestion with the restriction nuclease Not I into the pBR194c vector (The Institute for Genomic Research, Rockville, Md., USA). Ligated clones were then transformed by electroporation into DH10B T1 cells (Invitrogen), plated on 22 cm agar plates with the appropriate antibiotic and grown for 16 hours at 37° C. Colonies were picked into 384-well Axygen culture blocks containing 2×YT media and grown in a shaking incubator for 18 hours at 37° C. Cells were lysed and DNA purified using standard laboratory procedures. Sequencing primers for the 194c clones were 5′-GGCCTCTTCGCTATTACGC-3′ (forward primer) (SEQ ID NO: 159) and 5′-TGCAGGTCGACTCTAGAGGAT-3′ (reverse primer) (SEQ ID NO: 198).

DNA Sequencing and Assembly of Reads

Sequences were assembled and the assembly edited to produce the genomic sequence of the SARS virus. More specifically, DNA sequencing of both ends of the plasmid templates was achieved using Applied Biosystems BigDye terminator reagent (version 3), with electrophoresis and data collection on AB 3700 and 3730 XL instruments. DNA sequence reads were screened for non-viral contaminating sequences, trimmed for quality using PHRED (Ewing, B, and P. Green, Genome Res 8, 186-94, March, 1998) and assembled using PHRAP (Gordon, D. et al. Genome Res 8, 195-202, March, 1998). Simultaneously, sequences were used in BLAST searches of viral nucleotide and non-redundant protein datasets (NCBI, National Library of Medicine) to search for similarities. Sequence assemblies were visualized using CONSED (Gordon, D. et al. Genome Res 8, 195-202, March, 1998). Sequence mis-assemblies and contig joins were identified using Miropeats (Parsons, J. D., Comput Appl Biosci 11, 615-9 (December, 1995)). As sequence data accrued, the additional sequences were assembled until it became apparent that the additional depth of sampling was increasing depth of coverage but not extending the length of the contig. At this point, 3,080 sequencing reads had been generated, 2,634 of which were assembled into a single large contig.

The sequence information was imported into an ACEDB database (Durbin, R. and Thierry-Mieg, J. 1991. A C. elegans Database. Documentation, code and data available from anonymous FTP servers at lirmm “dot” lirmm “dot” fr, cele “dot” mrc-lmb “dot” cam “dot” ac “dot” uk and ncbi “dot” nlm “dot” nih “dot” gov) and subjected to biological analysis including the identification of open reading frames, detection of similar sequences by BLAST and searching for apparent frameshifts. When frameshifts were identified by this analysis, the sequence assembly was consulted for evidence of sequencing errors and, if found, they were corrected. The sequences were also searched for any that could extend the 5′ end of the sequence and these were incorporated when found. High quality sequence discrepancies between different sequence reads were identified and resolved. Sequence reads classified as deleted or chimeric were identified through manual inspection and removed from the assembly. The resulting sequence has an average PHRED consensus quality score of 89.96. The lowest quality bases in the assembly are in the immediate vicinity of the 5′ and 3′ ends of the viral genome, with the lowest quality base having a PHRED score of 35. Most of the bases (29,694 of 29,736, or 99.86%) have a consensus score of 90. Almost all regions of the genome are represented by reads derived from both strands of the plasmid sequencing templates, the exceptions being 50 bases at the 5′ end represented by a single sequencing read, and 5 bases at the 3′ end represented by a single read. The average base in the assembly is represented by 30 reads in the forward direction and 30 reads in the reverse direction, as determined by PHRED. RT-PCR products predicted from the sequence and spanning the entire genome yield PCR products of the anticipated size on agarose gels. To confirm the 5′ end of the viral genome, RACE was performed using the RLM-RACE kit from Ambion, and primers 5′-CAGGAAACAGCTATGACACCAAGAACAAGGCTCTCCA-3′ (SEQ ID NO: 90) and 5′-CAGGAAACAGCTATGACGATAGGGCCTCTTCCACAGA-3′ (SEQ ID NO: 91). Fourteen clones were recovered and sequenced. Analysis of these sequences confirmed the 5′ end of the coronavirus genome. The SARS genomic sequences have been deposited into Genbank (Accession Nos. AY274119.1, AY274119.2, and AY274119.3).
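To put the quality figures above in perspective, a PHRED score Q corresponds to an estimated error probability of 10^(−Q/10). The short sketch below applies that standard definition to the scores quoted in the text and recomputes the reported percentages from the stated counts; the function and variable names are illustrative.

```python
def phred_to_error_probability(q):
    """Estimated probability that a base call is wrong, given PHRED score q."""
    return 10 ** (-q / 10)


# Scores quoted in the text.
lowest_base_error = phred_to_error_probability(35)    # worst base in assembly
typical_base_error = phred_to_error_probability(90)   # consensus score of 90

# Proportions recomputed from the counts given in the text.
fraction_q90 = 29_694 / 29_736          # bases with consensus score 90
fraction_assembled = 2_634 / 3_080      # reads placed in the large contig

print(f"Q35 error probability: {lowest_base_error:.2e}")
print(f"Q90 error probability: {typical_base_error:.2e}")
print(f"bases at Q90: {fraction_q90:.2%}")
print(f"reads assembled: {fraction_assembled:.1%}")
```

On this scale even the lowest-quality base (Q35) has an estimated error rate of roughly 3 in 10,000, and the recomputed share of bases at Q90 agrees with the 99.86% stated above.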

While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure that come within known or customary practice within the art to which the invention pertains, and may be applied to the essential features set forth herein and in the scope of the appended claims.

All patents, patent applications, and publications referred to herein are hereby incorporated by reference in their entirety to the same extent as if each individual patent, patent application, or publication was specifically and individually indicated to be incorporated by reference in its entirety.


Crystallization of metallic nanoparticles on short DNA oligonucleotides in alkaline aqueous solution

Silver, zinc, and copper nanoparticles were reductively crystallized on thymine-containing single-stranded DNA decanucleotides, in aqueous solutions containing the corresponding metal ions and sodium borohydride, at 4 °C, pH 11.8, and 15 mM of NaCl. Iron nanoparticles were not formed under these conditions. Formation of aggregates was monitored by UV-Vis spectrophotometry, and characterized by scanning electron microscopy and dynamic light scattering. The absorption spectra for most Ag⁺-containing samples showed a major band centered around 425 nm and a shoulder between 500 and 600 nm, indicative of the presence of silver particles in a wide size distribution, in the 0.1–5 μm range. This was verified by micrographs and the size distribution observed in scattering profiles. Silver-particle morphology indicates formation of cubic structures instead of spherical geometries. For the case of zinc and copper nanoparticles, quasi-spherical structures of approximately 100 nm were crystallized along the DNA chains by the reduction reaction, and then released to the bulk solvent. Thymine residues show a higher effectiveness in nucleation of metallic cations in comparison to other DNA nucleobases, particularly in their deprotonated form, at high pH values. Adenines flanking central runs of deprotonable bases in the model oligonucleotide chain affect the production of silver crystals on the DNA decamers. pH value and NaCl concentration have a significant effect on the production of silver particles, as alkaline and moderate-ionic-strength conditions are required to crystallize such structures. The tuning of size and morphology of metallic nanoparticles by the proper choice of oligonucleotide sequence and physical conditions may have interesting applications in the field of nanobiotechnology.


Discussion

It has been proposed that CST is a telomere-specific RPA-like complex 12 . Recent structural studies by us and other groups demonstrated a close structural resemblance between Stn1-Ten1 and RPA32-RPA14 22, 23 . Although the solution structure of the DNA-binding OB fold of Cdc13 is available, the relationship between Cdc13 and RPA70 remains unclear due to the lack of structural information on other regions of Cdc13 and the lack of sequence similarity between Cdc13 and RPA. In this work, our bioinformatic and structural analyses provide the first direct evidence for the existence of multiple OB folds in Cdc13, which is characteristic of RPA70. The similarity between the Cdc13OB1-Pol1CBM and the RPA70N-p53 complexes further extends the parallel between Cdc13 and RPA70 (Figure 4C and 4D). However, despite these similarities, there are substantial differences between Cdc13 and RPA70. First, unlike Stn1-Ten1, neither of the two structurally defined OB folds of Cdc13 shows similarity to its counterpart in RPA70 outside the central β-barrel cores (Figure 2B) 25, 26 . Second, the two central OB folds of RPA70 are required for efficient DNA binding, whereas Cdc13 uses just its OB3 for binding 41 . These marked differences suggest that the resemblance between Cdc13 and RPA70 may be the result of convergent evolution. In other words, Cdc13 may not have evolved from the ancestral RPA70, but was instead recruited by the Stn1-Ten1 complex to provide single-stranded DNA-binding activity. In keeping with this idea, we found that Candida spp. Cdc13 proteins contain only two OB folds that correspond to the C-terminal half of Saccharomyces spp. proteins. In addition, the recently identified CTC1 proteins, the largest components in the human and plant CST complexes, are much larger proteins and show no sequence similarity to either Cdc13 or RPA70, supporting the disparate origins of these proteins 17, 18 . While we cannot rule out the possibility that a common origin for these proteins is obscured by extremely rapid evolutionary divergence, it seems clear that the structural and functional relationships between Cdc13/CTC1 and Stn1-Ten1 are quite distinct from those between RPA70 and RPA32–14.

One striking result of this study is that homodimerization appears to be a conserved feature of Cdc13. Except for CgCdc13, most Saccharomyces and Kluyveromyces Cdc13 proteins form dimers through their N-terminal OB1 domains. In contrast, homodimerization of Candida Cdc13 proteins and CgCdc13 is mediated by the C-terminal OB fold. The use of OB4 for dimerization by CgCdc13 is somewhat surprising, given the closer kinship of this yeast to Saccharomyces than to Candida spp. Perhaps this represents another case of convergent evolution. For example, an accidental loss of OB1 dimerization by CgCdc13 may have provided the selection pressure for the evolution of other dimerization mechanisms, resulting eventually in the utilization of OB4. The prevalence of Cdc13 dimerization suggests that this property may facilitate interaction of Cdc13 with multiple targets. For example, one established function of OB1 dimerization is to facilitate the interaction with Pol1; our mutagenesis data clearly showed that dimerization of the ScCdc13 OB1 domain is required for Pol1 binding. The significance of OB4 dimerization is less clear. A possible function for the dimerization of this domain is suggested by the homodimerization of many telomere-binding proteins such as fission yeast Taz1 and human TRF1 and TRF2 42, 43, 44, 45, 46 . Because of the low intrinsic affinity of individual DNA-binding domains, these proteins require dimerization for stable telomere DNA interaction 42, 44, 46 . Thus, even though the S. cerevisiae Cdc13 can clearly bind DNA as a monomer, it is possible that dimerization of the smaller Cdc13 proteins in Candida spp. may enhance their DNA-binding activity. Indeed, we found recently that the OBDBD of CtCdc13 interacts weakly with the cognate telomere repeat and requires the OB4 domain for high-affinity DNA binding (EYY and NL, manuscript in preparation). Yet another potential function for Cdc13 dimerization is suggested by the reported multimerization of the telomerase complex. Although the data are somewhat inconclusive, both yeast and human telomerase have been proposed to function as dimers 47, 48 . Because Cdc13 is known to interact with the Est1 component of yeast telomerase, dimerization of Cdc13 could help bring two telomerase complexes into close vicinity for proper function. Further studies are needed to test these possibilities and reveal the full functional significance of Cdc13 dimerization in regulating and maintaining budding yeast telomeres.


RESULTS

Validation of the WBS-PCR using endogenously expressed mRNAs

The WBS-PCR procedure (Supplementary Figure S1) makes use of 40-μm sagittal cross-sections of a mouse, typically generated for biodistribution studies of novel radiolabelled therapeutic compounds involving Quantitative Whole-Body Autoradiography (QWBA) (14), the spatial resolution of a 1536 round-well plate and a potent tissue lysis buffer. By placing the mouse section on a pre-filled 1536-well plate and extracting the target from the exposed tissues by simply inverting the plate, the entire section is deconvoluted into separate tissue lysates, which can be subjected to downstream analytical assays such as RT-qPCR. Typically, a mouse section covers ∼363 wells of the 1536-well plate which, when analyzed on a 384-well qPCR plate, reserves 20 wells for the inclusion of reference samples or standards. Subsequently, with the use of spreadsheets (e.g. Excel) and imaging software (e.g. Tissue View), the localization of the analyte is visualized by converting the qPCR signals into an image and overlaying them with a picture taken of the whole-body cross-section prior to the extraction procedure.
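The conversion of per-well qPCR signals into an image can be sketched in a few lines. The example below assumes the standard 32-row by 48-column layout of a 1536-well plate and places each measured signal at its well position, leaving wells not covered by the section empty; the function name and the three signal values are hypothetical placeholders, and the authors' actual spreadsheet and imaging workflow is not reproduced here.

```python
import math

# Standard 1536-well plate geometry: 32 rows x 48 columns.
ROWS, COLS = 32, 48


def signals_to_image(measurements):
    """Build a ROWS x COLS grid from {(row, col): signal}; empty wells are NaN."""
    image = [[math.nan] * COLS for _ in range(ROWS)]
    for (row, col), value in measurements.items():
        if not (0 <= row < ROWS and 0 <= col < COLS):
            raise ValueError(f"well ({row}, {col}) is outside the plate")
        image[row][col] = value
    return image


# Hypothetical normalized signals for three wells under the tissue section.
measurements = {(10, 20): 1.0, (10, 21): 0.5, (11, 20): 0.25}

image = signals_to_image(measurements)
covered_wells = sum(
    1 for row in image for value in row if not math.isnan(value)
)

print(f"{covered_wells} of {ROWS * COLS} wells carry a signal")
```

The resulting grid can be passed to any plotting tool as a heat map and overlaid on the photograph of the section, with NaN cells rendered as transparent.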

Because different endogenous RNAs and oligonucleotides (e.g. mRNA, miRNA, siRNA and antagomir) require different purification methods for their optimal isolation from tissues, we investigated the possibility of measuring their levels directly in tissue lysates without the need for extensive purification steps. This approach minimizes the loss of signal intensity owing to biased purification procedures and allows for high-throughput sample analysis using liquid handling robotics. Another advantage of this approach is that the lysates contain endogenous genomic DNA, which can be used to normalize the mRNA signals obtained in the RT-qPCR (15, 16) without the need to analyse large panels of housekeeping genes (17). A potential disadvantage of this approach, on the other hand, is that most reagents used for preparing tissue lysates contain potent protein denaturants, high salt concentrations and chelating agents, which could inhibit or reduce the efficiency of the enzymatic reactions required for quantification (18). However, by taking advantage of the sensitivity of typical RT-qPCR based assays, the use of these reagents could be permitted by incorporating a dilution step to reduce or eliminate the inhibitory elements introduced during the extraction procedure. We have tested several lysis buffers and found that the Clarity OTX lysis buffer was the most compatible with RT-qPCR. To demonstrate this compatibility, we established the expression profiles of a small panel of tissue-specific mRNAs by subjecting the diluted tissue lysates (Figure 1a, Supplementary Table S1) directly to RT-qPCR. We also measured the levels of genomic 18S in the lysates by qPCR and used these internal reference signals to normalize the obtained mRNA signals. The compatibility of the tissue lysis buffer with RT-qPCR was confirmed with several specific expression profiles: Insulin-like growth factor-binding protein 1 (IGFBP-1) mRNA was limited to liver and kidney (19), Myosin heavy chain 6 (Myh6) mRNA could only be detected in lysates prepared from heart and lung (20) and the mRNA encoding Myelin basic protein (Mbp) (21) could exclusively be detected in the brain. As expected, similar expression patterns were observed when RT-qPCR reactions were performed using purified total RNA (Supplementary Figure S2, Supplementary Table S2).
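One common way to carry out a normalization of this kind is the ΔCt approach, sketched below. It is shown only as an illustration and is not necessarily the exact calculation used in the study: it assumes roughly 100% amplification efficiency for both assays, expresses each mRNA signal relative to the genomic 18S signal from the same lysate, and then scales the highest tissue to 100%. The Ct values are invented placeholders, not data from the paper.

```python
def relative_expression(ct_target, ct_reference):
    """Target signal relative to reference, as 2^-(Ct_target - Ct_reference).

    Assumes both assays amplify with ~100% efficiency (an assumption).
    """
    return 2.0 ** -(ct_target - ct_reference)


def normalize_to_max(values):
    """Scale a {tissue: value} mapping so the largest value equals 100%."""
    peak = max(values.values())
    return {tissue: 100.0 * value / peak for tissue, value in values.items()}


# Placeholder Ct values: (mRNA target Ct, genomic 18S Ct) per tissue lysate.
ct_values = {
    "liver": (22.0, 18.0),
    "kidney": (24.0, 18.0),
    "brain": (30.0, 18.0),
}

relative = {
    tissue: relative_expression(target, reference)
    for tissue, (target, reference) in ct_values.items()
}
percent_of_max = normalize_to_max(relative)

for tissue, value in percent_of_max.items():
    print(f"{tissue}: {value:.2f}% of maximum")
```

With these placeholder values a two-cycle difference between liver and kidney translates into a fourfold difference in normalized signal, which is why small Ct shifts matter when comparing tissues.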

Biodistribution of endogenous and exogenous tissue-specific mRNAs. (a) The compatibility of the tissue lysis buffer was validated by characterizing the expression profiles of tissue-specific mRNAs: Insulin-like Growth Factor-Binding Protein 1 (IGFBP1), Myosin heavy chain 6 (Myh6) and Myelin Basic Protein (Mbp) in lysates prepared from Thymus (Th), Lung (Lu), Heart (H), Skeletal muscle (S.M), Kidney (K), Brain (B), Liver (Li) and Spleen (S). Signals were normalized against genomic 18S and the maximum averaged signal for each mRNA was set to 100%. (b) WBS-PCR biodistribution of the mRNAs characterized in (a). WBS (upper panel) represents the whole body section before lysis. Tissue annotation: Brain (B), Spinal Cord (Sc), Salivary gland (Sg), Heart (H), Liver (Li), Blood (Bd) and Gastro-Intestinal tract (GI). All signals were normalized against genomic 18S (Lower panel). (c) WBS-PCR biodistribution of human embryonic lethal, abnormal vision, Drosophila-like 1 (ELAVL1), human Actin Beta (Actb) and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) mRNAs in a HCT-116 human tumour-bearing mouse. Tissue annotation in the WBS: Eye (E), Brain (B), Liver (Li), Gastro-Intestinal tract (GI) and HCT-116 tumour (T). All signals were normalized against genomic 18S (Lower panel).

As a next step, the biodistribution of the same panel of mRNA targets was investigated using the WBS-PCR method ( Figure 1 b). The efficiency of the extraction procedure was monitored by measuring genomic 18S signals across the whole body section ( Figure 1 b, lower panel). As expected, genomic 18S displayed an overall homogeneous biodistribution pattern with some elevated levels observed in the brain. These elevated levels could be due to variations in tissue cell densities and extraction efficiency. To compensate for these potential biases, all WBS-PCR signals are corrected against genomic 18S. Unlike genomic 18S, the mRNA targets displayed more distinct biodistribution patterns that corroborated the expression profiles obtained for the tissue homogenates in all cases. The biodistribution pattern obtained for IGFBP-1 mRNA co-localized with the liver, whereas Myh6 mRNA signals co-localized only with the heart region. Interestingly, the Mbp biodistribution pattern confirmed the expression of the mRNA in the brain but also revealed expression in regions along the spinal cord. Intrigued by this observation, we repeated the RT-qPCR reaction on purified total RNA ( Supplementary Figure S2 and Supplementary Table S2 ) and confirmed the presence of Mbp mRNA in both brain and spinal cord. This observation illustrates the power of the WBS-PCR in visualizing the biodistribution of uncharacterized mRNAs.
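The correction against genomic 18S and the percent-of-maximum scaling used in the figures can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' analysis pipeline: the function names and the example numbers are hypothetical, and the inputs are assumed to be linear-scale relative quantities (already converted from quantification cycles), with one value per sample.

```python
def normalize_to_reference(target_signals, reference_signals):
    """Divide each target signal by the reference (e.g. genomic 18S) signal of the same sample."""
    if len(target_signals) != len(reference_signals):
        raise ValueError("target and reference must have one value per sample")
    return [t / r for t, r in zip(target_signals, reference_signals)]


def scale_to_percent_of_max(values):
    """Rescale so that the largest value becomes 100%."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("at least one value must be positive")
    return [100.0 * v / peak for v in values]


# Hypothetical linear-scale signals for three samples (not data from the study).
mrna = [8.0, 1.0, 0.5]
g18s = [2.0, 1.0, 1.0]

corrected = normalize_to_reference(mrna, g18s)   # [4.0, 1.0, 0.5]
percent = scale_to_percent_of_max(corrected)     # [100.0, 25.0, 12.5]
print(percent)
```

Dividing by the reference first means that a sample with twice as much extracted material does not appear to express twice as much mRNA.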

To further validate the optical resolution that could be achieved with the WBS-PCR, we investigated the biodistribution pattern of human mRNAs encoding ELAVL1 (embryonic lethal, abnormal vision, Drosophila-like 1), Actb (Beta actin) and GAPDH (glyceraldehyde-3-phosphate dehydrogenase) in a section obtained from a human tumour (HCT-116)-bearing mouse ( Figure 1 c) using human specific RT-qPCR primers. As expected, the biodistribution pattern for the human mRNAs co-localized exactly with the region of tumour growth. Because the 18S primers recognize both human and mouse sequences, genomic 18S could be detected across the section with elevated levels in the tumour region ( Figure 1 c, lower panel). Taken together, these data indicate that the deconvolution of a whole-body mouse section into 363 separate samples followed by the re-assembly of the RT-qPCR signals into an image can generate accurate interpretable biodistribution data and may provide more detailed expression information than can be derived from individually isolated tissues.
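The re-assembly step, in which the per-sample RT-qPCR signals are put back into an image, might look like the sketch below. The text gives only the total of 363 samples; the 11 x 33 grid, the row-major sample order and the function name are assumptions made purely for illustration.

```python
def reassemble(signals, n_rows, n_cols):
    """Arrange a flat, row-major list of per-sample signals into a 2D grid."""
    if len(signals) != n_rows * n_cols:
        raise ValueError("number of signals does not match the grid size")
    return [signals[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


N_ROWS, N_COLS = 11, 33   # assumed layout; 11 * 33 = 363 samples
signals = [float(i) for i in range(N_ROWS * N_COLS)]   # placeholder values, not data

image = reassemble(signals, N_ROWS, N_COLS)
print(len(image), len(image[0]))   # 11 33
```

Each cell of `image` would then hold the 18S-corrected signal of one sample, ready to be rendered as a heat map.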

Whole body scanning PCR for the localization of miRNAs and siRNAs

The RNA biodistribution data reported above were obtained using the highly specific TaqMan assay, which relies on the specific amplification of a target gene using two primers and the hydrolysis of a TaqMan probe. Although similar probe-based assays are described for the detection of siRNAs ( 22 , 23 ), we set out to develop an RT-qPCR assay with the aim of reducing the number of sample handling steps without compromising assay specificity or sensitivity ( Figure 2 a). Like most assays, the first step relies on the classical conversion of RNA into cDNA by reverse transcriptase-mediated primer extension ( 24 ). In the second step, a qPCR-mix is directly added to the heat-inactivated reverse-transcription reaction followed by an anti-primer quenching-based real-time PCR reaction ( 25 ). Real-time fluorescent quantification takes place during the extension cycle of the PCR reaction. During this cycle, the un-incorporated forward primers will be quenched by the quencher-labelled anti-primer whereas forward primers that constitute the double-stranded PCR products will not. An exponentially increasing fluorescent signal will be the result of successful target amplification. Because miRNAs and siRNAs are similar in size and hence pose similar challenges with respect to designing reliable primers, the specificity of the assay was evaluated in a typical matrix study whereby the level of cross-reactivity was measured against a panel of closely related Let-7 miRNA family members ( Figure 2 b). Taking into account that the perfectly matched target was arbitrarily set at 100%, the highest cross-reactivity (∼3%) was observed only when Let-7d and Let-7b levels were determined using primers designed against Let-7a or Let-7c, respectively. All other template/primer combinations resulted in minimal cross-reactivity ranging between 0 and 1%. Similar levels of specificity were reported using commercially available TaqMan miRNA assays ( 26 ).
In addition to this, all the Let-7 assays displayed a large dynamic range (8–10 logs) with a lower limit of detection ranging from 0.02 to 0.002 femtogram, highlighting the level of sensitivity that can be achieved using this RT-qPCR assay ( Supplementary Figure S3 ). The performance of the RT-qPCR assay was further characterized by measuring the relative expression levels of a panel of miRNAs in the same mouse organ homogenates described previously ( Figure 2 c and Supplementary Table S3 ). As expected, detection of the well characterized heart- and muscle-specific miRNAs (e.g. miR-208a, miR-1a, miR-133a), liver-specific (e.g. miR-122) and brain-enriched miRNAs (e.g. miR-124, miR-127, miR-128, miR-132, miR-137, miR-139) in the appropriate organs ( 27–29 ) confirms the compatibility of our method with measuring miRNAs directly in tissue lysates. In addition to this, RT-qPCR analysis of miRNA 122-5p, 208a-3p, 124-3p, 124-5p, 191-5p and 16-5p using total RNA as input in the reaction resulted in similar expression profiles ( Supplementary Figure S4 and Supplementary Table S4 ). Taken together, the data strongly suggest that the lysis procedure is indeed compatible with the specific detection of tissue-derived miRNAs.
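The cross-reactivity percentages of the Let-7 matrix study follow from expressing every template signal relative to the perfectly matched template of the same primer set. A minimal sketch, assuming linear-scale signals stored per primer set and template; the numbers and the function name are hypothetical and are not data from the study.

```python
def cross_reactivity_percent(signals):
    """signals[primer][template] -> percent of that primer's perfect-match signal."""
    result = {}
    for primer, by_template in signals.items():
        matched = by_template[primer]   # signal of the perfectly matched template
        result[primer] = {
            template: 100.0 * value / matched
            for template, value in by_template.items()
        }
    return result


# Hypothetical signals for two primer sets tested on two templates.
signals = {
    "let-7a": {"let-7a": 200.0, "let-7d": 6.0},
    "let-7d": {"let-7a": 1.0, "let-7d": 100.0},
}

table = cross_reactivity_percent(signals)
print(table["let-7a"]["let-7d"])   # 3.0
```

The diagonal of the resulting table is 100% by construction, so only the off-diagonal entries carry information about specificity.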

Figure 2. Characterization and validation of the miRNA RT-qPCR method. ( a ) RT-qPCR assay for miRNAs and siRNAs. The RT/Reverse primer is used to generate a cDNA copy of the RNA template during the Reverse Transcription step. Real-time fluorescent quantification occurs during the qPCR step with the help of a fluorescently labelled Forward primer hybridizing to the elongated RT/Reverse primer. Signals generated by non-incorporated forward primers are quenched by a quencher-labelled anti-primer. ( b ) The specificity of the RT-qPCR method was verified by analysing the background signal intensity generated by let-7 miRNA-specific primers using either Let-7a, -7b, -7c, -7d, -7e, -7f, -7g, -7i as template in the reaction. Average values for the perfectly matched targets were set at 100%. Error bars, STDEV ( n = 4). ( c ) miRNA expression profile in diluted tissue homogenates: Thymus (Th), Lung (Lu), Heart (H), Skeletal Muscle (S.M), Kidney (K), Brain (B), Liver (Li) and Spleen (S). Signals were normalized against genomic 18S, and the maximum averaged signal for each miRNA was set to 100%.

The specificity of the RT-qPCR assay was further supported by the biodistribution patterns obtained for the tissue-specific miRNAs miR-122 (liver), miR-208a (heart) and miR-124-3p/-5p (brain), and the ubiquitously expressed miRNAs miR-191 and miR-16, using the WBS-PCR method ( Figure 3 a). Again, genomic 18S was used to confirm extraction efficiency and to normalize all miRNA signals ( Figure 3 , lower panel). Interestingly, the expression of miR-124-3p and miR-124-5p, both processed from the same pre-miRNA, displayed very similar brain-specific biodistribution patterns.

Figure 3. Biodistribution of endogenous tissue-specific miRNAs and an exogenous tritium-labelled siRNA. ( a ) Biodistribution of tissue-specific miRNAs using WBS-PCR. Tissue annotation: Brain (B), Heart (H), Liver (Li), Stomach (St) and Gastro-Intestinal tract (GI). All signals were normalized against genomic 18S (Lower panel). ( b ) Biodistribution of tritium-labelled unformulated siRNA targeting Mrp4 by phosphor imaging (QWBA) and WBS-PCR (Mrp4 siRNA) at 10 min post dosing (the whiter the area in the autoradiograms using QWBA, the higher the concentration of radioactivity) as well as the WBS-PCR visualization of the biodistribution obtained for miR-191. All miRNA and siRNA signals were normalized against genomic 18S (Lower panel). ( c ) Detection limit of the RT-qPCR assay using 3′-truncated (upper panel), 5′-truncated (middle panel) and 3′/5′-truncated (lower panel) Mrp4 anti-sense strands as template in the reaction.

As a more stringent test, we compared the biodistribution pattern of the WBS-PCR method with the traditional quantitative whole-body autoradiograph (QWBA) obtained from a mouse intravenously dosed with an unformulated, tritium-labelled siRNA targeting the rat multidrug resistance protein 4 (Mrp4 siRNA) ( 10 ) ( Figure 3 b and Supplementary Figure S5 ). Analysis of a section from a mouse sacrificed 10 min after intravenous dosing revealed that the high levels of total radiolabelled components observed in kidney and liver, represented by the white regions in the autoradiograms using QWBA, were also present in the biodistribution pattern obtained using the WBS-PCR method. Despite the presence of radioactivity in the salivary gland, the WBS-PCR failed to confirm the presence of siRNA in this organ. The lack of siRNA RT-qPCR signal could not be ascribed to poor extraction efficiency, given the clear presence of both genomic 18S and miR-191 in this region. Indeed, LC-MS analysis confirmed that the radioactive signal observed in the salivary gland did not represent intact siRNA but rather the tritiated metabolite mono-Uridine (data not shown). To investigate the limitation of the RT-qPCR with respect to detecting metabolites, we generated various truncations of the anti-sense strand of the siRNA duplex, spiked them into rat total RNA and determined for each template the lowest amount of target that could still be detected by RT-qPCR ( Figure 3 c). As expected, deletion of more than two nucleotides at either the 5′ or 3′ end of the target sequence results in a rapid decrease in sensitivity of the RT-qPCR assay. Hence, metabolites such as mono-Uridine would also not be detected by RT-qPCR. Taken together, these data confirm that the signals obtained in the WBS-PCR originate from either full-length RNA or minimally truncated metabolites.
Additionally, because metabolites such as mono-Uridine are not detected, the WBS-PCR signal is specific for intact or minimally truncated oligonucleotide, whereas the classical QWBA approach detects total radioactivity from parent compounds and metabolites.

Whole body scanning PCR for the localization of chemically modified oligonucleotides

Most therapeutic oligonucleotides contain various degrees of chemical modifications to confer appropriate characteristics such as nuclease resistance, affinity, specificity, safety, distribution and cellular uptake. Although chemical modifications can improve the pharmacokinetic and pharmacodynamic properties of these oligonucleotides, they also pose analytical challenges. For example, a 2′-O-methyl-modified anti-miRNA-16 oligonucleotide (AMO-miR-16) is readily detectable by RT-qPCR ( 30 ) whereas a 2′-O-(2-methoxyethyl)-modified sequence is not ( Supplementary Figure S6 ). For this reason, we developed the CL-qPCR assay, a two-step assay that uses qPCR to quantify the amount of product formed in a self-directed chemical ligation of two oligodeoxynucleotides templated by the fully complementary analyte ( 11 , 31 ) ( Supplementary Figure S7a ), independently of the sequence ( Supplementary Figure S7b ) or the chemical modifications present in the template ( Supplementary Figure S8 ). During the first step, binding of the two DNA oligonucleotides, side-by-side, on the target initiates the reaction of the nucleophilic phosphorothioate group, located on the 3′-end of one oligonucleotide, with the electrophilic carbon at the 5′-end of the adjacent oligonucleotide. This proximity-dependent reaction results in displacement of the sulphonate leaving group and the formation of a carbon–sulphur bond, covalently linking the two oligonucleotides into a single unique DNA-oligonucleotide ( Figure 4 a), which can be quantified by PCR, despite the carbon–sulphur bond being somewhat longer than the carbon–oxygen bond of regular DNA.

Figure 4. Validation of the biodistribution obtained for AMO-miR-16 using CL-qPCR. ( a ) Chemical reaction that occurs during the side-by-side target hybridization of a DNA-oligo containing a 3′-phosphorothioate group and a DNA-oligo containing a 5′-biphenylsulphonyl group. The template-mediated, proximity-dependent reaction results in displacement of the biphenylsulphonyl leaving group and the formation of a carbon–sulphur bond. (BASE represents Adenosine, Guanosine, Thymidine or Cytidine). ( b ) Quantification of AMO-miR-16 in plasma isolated from AMO-miR-16- (black) and PBS-treated (grey) mice. Values are the average of triplicate measurements from five animals. Error bars, s.e.m. ( n = 15). * P < 0.05 (Mann–Whitney Rank Sum test). ( c ) The biodistribution of miR-16 and AMO-miR-16 was determined in mouse whole body sections, treated either with AMO-miR-16 (left panel) or with PBS (right panel) and normalized against genomic 18S. Tissue annotation: Eye (E), Brain (B), Lung (Lu), Heart (H), Liver (Li), Stomach (St), Kidney (K), Bone (Bo) and Gastro-Intestinal tract (GI). ( d–f ) Quantification of AMO-miR-16 (d), miR-16 (e) and miR-191 (f) in tissues isolated from mice treated either with AMO-miR-16 (black) or PBS (grey). Values are the average of triplicate measurements from four animals. Error bars, s.e.m. ( n = 12). * P < 0.05 [Mann–Whitney Rank Sum test (d and e), t -test (f), n.s.: not significant].

Next, we investigated the biodistribution of the MOE-modified AMO-miR-16 using CL-qPCR. For this, two groups of mice, each consisting of five animals, were dosed intravenously with either AMO-miR-16 (80 mg/kg) or PBS. Plasma samples were collected at various time points over a period of 24 h, and AMO-miR-16 levels were quantified using the CL-qPCR method. As expected, AMO-miR-16 was rapidly cleared from the plasma, reaching undetectable levels within 1 h post dosing ( Figure 4 b and Supplementary Figure S9 ). Despite the rapid plasma clearance, WBS-PCR on sections obtained 24 h post dosing revealed a broad biodistribution pattern of AMO-miR-16, suggesting extensive uptake of AMO-miR-16 in a wide variety of tissues ( Figure 4 c). The highest levels of AMO-miR-16 were observed in the kidney, whereas no signal could be detected in samples co-localizing with the brain. Interestingly, a 20-mer MOE-modified PO oligonucleotide targeting human intercellular adhesion molecule-1 mRNA dosed in rats displays a similar plasma pharmacokinetic profile and biodistribution pattern, confirming the performance of the CL-qPCR ( 32 ). In addition, dosing of AMO-miR-16 lowered the global biodistribution levels of miR-16 in the AMO-miR-16-treated animal compared with those in the PBS-treated animal. The biodistribution of genomic 18S, on the other hand, was very similar between the two animals and did not suggest that the lack of miR-16 could be due to poor extraction efficiencies. To confirm the WBS-PCR observation, AMO-miR-16 ( Figure 4 d and Supplementary Figure S10 ) and miR-16 levels ( Figure 4 e and Supplementary Figure S11 ) were measured in the organs isolated from the remaining four animals. Indeed, quantification of AMO-miR-16 by CL-qPCR confirmed the presence of compound in the kidney, liver, lung and spleen but not in the brain of the treated animals.
The presence of AMO-miR-16 also significantly reduced miR-16 levels in these tissues whereas miR-191 levels ( Figure 4 f and Supplementary Figure S12 ) remained unaffected, suggesting that the loss of miR-16 could indeed be the result of AMO binding. Taken together, these data confirm both the robustness of the CL-qPCR method in the quantification of heavily chemically modified oligonucleotides in biological samples and its application in biodistribution studies using WBS-PCR.


Effect of structure levels on surface-enhanced Raman scattering of human telomeric G-quadruplexes in diluted and crowded media

Human telomeric G-quadruplexes are emerging targets in anticancer drug discovery since they are able to efficiently inhibit telomerase, an enzyme which is greatly involved in telomere instability and immortalization process in malignant cells. G-quadruplex (G4) DNA is highly polymorphic and can adopt different topologies upon addition of electrolytes, additives, and ligands. The study of G-quadruplex forms under various conditions, however, might be quite challenging. In this work, surface-enhanced Raman scattering (SERS) spectroscopy has been applied to study G-quadruplexes formed by human telomeric sequences, d[A3G3(TTAGGG)3A2] (Tel26) and d[(TTAGGG)4T2] (wtTel26), under dilute and crowding conditions. The SERS spectra distinctive of hybrid-1 and hybrid-2 G-quadruplexes of Tel26 and wtTel26, respectively, were observed for the sequences folded in the presence of K⁺ ions (110 mM) in a buffered solution, representing the diluted medium. Polyethylene glycol (5, 10, 15, 20, and 40% v/v PEG) was used to create a molecular-crowded environment, resulting in the formation of the parallel G-quadruplexes of both studied human telomeric sequences. Despite extensive overlap by the crowding agent bands, the SERS spectral features indicative of parallel G4 form of Tel26 were recognized. The obtained results implied that SERS of G-quadruplexes reflected not only the primary structure of the studied human telomeric sequence, including its nucleobase composition and sequence, but also its secondary structure in the sense of Hoogsteen hydrogen bonds responsible for the guanine tetrad formation, and finally its tertiary structure, defining a three-dimensional DNA shape, positioned close to the enhancing metallic surface.


Open Journal of Applied Biosensor, Vol. 03, No. 02 (2014), Article ID: 46020, 8 pages
DOI: 10.4236/ojab.2014.32002

Detection of Breast Cancer 1 (BRCA1) Gene Using an Electrochemical DNA Biosensor Based on Immobilized ZnO Nanowires

Nur Azimah Mansor 1 , Zainiharyati Mohd Zain 1 , Hairul Hisham Hamzah 1 , Mohd Shihabuddin Ahmad Noorden 2,3 , Siti Safura Jaapar 2 , Valerio Beni 4 , Zafar Husain Ibupoto 5

1 Faculty of Applied Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia

2 Faculty of Pharmacy, Universiti Teknologi MARA, Puncak Alam, Malaysia

3 Atta-ur-Rahman Institute for Natural Product Discovery, Universiti Teknologi MARA, Puncak Alam, Malaysia

4 Biosensors and Bioelectronics Centre, Department of Physics, Chemistry, and Biology, Linköping University, Linköping, Sweden

5 Physical Electronics and Nanotechnology Division, Department of Science and Technology (ITN), Campus Norrköping, Linköping University, Norrköping, Sweden

Copyright © 2014 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

Received 14 February 2014; revised 1 April 2014; accepted 8 April 2014

Herein we report an electrochemical DNA biosensor for the rapid detection of a sequence (5’ AAT GGA TTT ATC TGC TCT TCG 3’) specific for the breast cancer 1 (BRCA1) gene. The proposed electrochemical genosensor is based on a short oligonucleotide DNA probe immobilized onto zinc oxide nanowires (ZnONWs) chemically synthesized on a gold electrode via a hydrothermal technique. The morphology studies of the ZnONWs, performed by field emission scanning electron microscopy (FESEM), showed that the ZnO nanowires are uniform, highly dense and oriented perpendicularly to the substrate. The recognition event between the DNA probe and the target was investigated by differential pulse voltammetry (DPV) in 0.1 M acetate buffer solution (ABS), pH 7.00; as a result of the hybridization, an oxidation signal was observed at +0.8 V. The influences of pH, target concentration, and non-complementary DNA on biosensor performance were examined. The proposed DNA biosensor is able to detect the target sequence in the concentration range between 10.0 and 100.0 µM with a detection limit of 3.32 µM. The experimental results demonstrated that the prepared ZnONWs/Au electrodes are a suitable platform for the immobilization of DNA.

Zinc Oxide Nanowires, DNA Biosensor, Breast Cancer Gene, BRCA1, DNA Hybridization, Differential

Breast cancer, which mainly affects the inner lining of the milk ducts, is one of the major causes of death of our times. Distant metastases are regarded as the main cause of death, so early-stage diagnosis, followed by treatment, is essential [1]. Breast cancer has been associated with various gene mutations; the most common, all associated with the activation of breast cancer cells, are mutations in BRCA1, BRCA2 and p53, which occur in the error-correcting mechanisms. These mutations are either inherited or acquired after birth, and they usually promote other mutations, which lead to faster cell division, loss of attachment and metastasis to distant organs [2] [3]. Breast cancer 1 (BRCA1) is a tumor suppressor gene whose protein product is involved in preventing fast and uncontrolled cell growth. Mutations in this gene make the BRCA1 protein unable to repair damaged DNA, so that cells grow uncontrollably and form tumors. The BRCA1 gene was identified in 1994; it is located in the region from base pair 38,449,840 to 38,530,994 on band 21 of chromosome 17 [4]. This gene accounts for 40% of the risk of breast cancer in carriers [5] [6]. Consequently, these point mutations have been proposed as possible biomarkers for the early detection and screening of breast cancer.

Ultimately, site-specific sensing platforms are strongly needed for early detection of the breast cancer gene during biopsy. This is because the existing nucleic acid (DNA) detection techniques, such as northern blotting, ribonuclease protection assays, reverse transcription-polymerase chain reaction (RT-PCR) and DNA sequencing, have several limitations, including low sensitivity, poor selectivity, high cost and nonlinearity with respect to the target strength [7] [8]. Therefore, highly specific, rapid, inexpensive and convenient breast cancer gene detection systems are required to help clinicians monitor early disease progression in suspected individuals. This will also allow a more appropriate cancer treatment to be planned, minimizing toxicity according to the specific tumor cell type [9]. DNA-based sensing can offer an alternative screening approach to guide treatment. In recent years, electrochemical DNA biosensors have received much attention because they provide simple, accurate and rapid responses, high sensitivity and inherent selectivity, and are an inexpensive platform for molecular detection [10].

A biosensor is a compact analytical device in which a biological recognition element is intimately integrated with a physicochemical transducer. The three main components of a biosensor are the biological recognition element (such as enzymes, DNA, antibodies and nucleic acids), the transducer that converts the biological recognition event into a quantifiable signal, and the signal processing system. The five principal transducer classes are optical, thermometric, electrochemical, piezoelectric, and magnetic. Electrochemical transduction has gained a lot of popularity in DNA biosensors owing to its better sensitivity, selectivity and reproducibility, easy maintenance and low cost [11] - [13]. Electrochemical biosensors have also played an important role in the transition towards point-of-care diagnostic devices, as demonstrated by the worldwide distribution of glucosensors.

The basic principle of a DNA biosensor usually relies on the immobilization of a single-stranded oligonucleotide (ssDNA) probe on the surface of an electrode, which provides specificity towards a particular target DNA. The recognition event, based on DNA hybridization, is then converted into a readable signal by the transducer.

The performance of an electrochemical genosensor relies heavily on the properties of the supporting materials: these should provide a good environment for DNA immobilization without compromising its biological activity, and should offer good transduction abilities [14]. The use of nanomaterials in the biosensor field has become one of the top research interests in the scientific community because these materials offer many advantages, such as a large surface-to-volume ratio, high surface reaction activity and fast electron communication. Consequently, electrochemical sensors designed around them have been shown to have low limits of detection and to allow detection of analytes in small volumes [15]. Moreover, the coupling of electrochemical devices and nanoscale materials offers a unique multiplexing capability for simultaneous measurements of multiple biomarkers [10]. Among these nanomaterials, zinc oxide (ZnO) nanostructures provide a solid platform for biosensing. ZnO exhibits excellent properties for the fabrication of biosensors, such as good biocompatibility, chemical stability, non-toxicity, and a fast electron transfer rate [16]. Most of all, ZnO behaves as an excellent transducer because of its high isoelectric point (IEP), approximately 9.5, which facilitates the immobilization of low-IEP DNA or proteins [17]. Among the many nanostructured forms of ZnO, ZnO nanowires (ZnONWs) have been widely studied for the immobilization of biomolecules [18]. These nanostructures present some unique chemical and physical properties caused by nanoscale effects, which provide new opportunities for the development of DNA biosensors.

Recently, electrochemical genosensors for the detection of breast cancer genes and cells have been reported [19] [20]. However, to date, the development of a DNA biosensor for the detection of the breast cancer gene (BRCA1) based on a modified gold electrode has not yet been reported. The main objective of the work reported herein was to demonstrate that ZnO nanowires, chemically grown on an Au substrate, could provide a suitable platform for the development of an electrochemical DNA biosensor for the detection of the BRCA1 gene. Furthermore, this concept could be extended to an array format for the multiplexed detection of other breast cancer genes, such as BRCA2 and p53, and could also be applied to real samples of breast cancer cells.

Electrochemical measurements were performed at room temperature using an AUTOLAB-PGSTAT 302N (Eco Chemie, The Netherlands). The three-electrode electrochemical system consisted of ZnONWs/Au as the working electrode, a platinum wire as the auxiliary electrode, and Ag/AgCl as the reference electrode. DPV measurements were performed in 15 mL of 0.1 M acetate buffer solution (ABS) (pH 7.00) from 0.4 V to 1.16 V with a modulation amplitude of 0.025 V, an interval time (t1) of 0.5 s, a modulation time (t2) of 0.05 s, a step potential of 0.005 V and a scan rate of 0.01 V s⁻¹. The acetate buffer solution was purged with nitrogen gas for 20 min prior to each experiment. The ZnONWs/Au electrode surface was characterized by field emission scanning electron microscopy (FESEM).
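The DPV settings quoted above are mutually consistent, which can be checked with a short calculation: in a staircase DPV scan, the scan rate equals the step potential divided by the interval time. The sketch below is only a sanity check on the reported numbers; the function name is hypothetical and nothing here reflects the instrument software.

```python
def dpv_scan_summary(e_start, e_end, step_potential, interval_time):
    """Return (scan rate in V/s, number of potential steps, scan duration in s)."""
    scan_rate = step_potential / interval_time
    # round() guards against floating-point error in the division
    n_steps = round(abs(e_end - e_start) / step_potential)
    duration = n_steps * interval_time
    return scan_rate, n_steps, duration


# Values reported in the text: 0.4 V to 1.16 V, 0.005 V steps, 0.5 s interval.
scan_rate, n_steps, duration = dpv_scan_summary(0.4, 1.16, 0.005, 0.5)
print(scan_rate, n_steps, duration)
```

With these inputs the scan rate comes out at 0.01 V/s, as stated, and one sweep covers 152 steps in 76 s.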

Zinc nitrate hexahydrate, Zn(NO3)2·6H2O, hexamethylenetetramine, C6H12N4, and zinc acetate dihydrate, Zn(CH3COO)2·2H2O, were purchased from Sigma Aldrich, Germany. The 21-base single-stranded DNAs were purchased from First BASE (IDT Inc, USA). The sequences of the different oligonucleotides are as follows (underlined are the mismatched bases):

Probe DNA: 5’ AAT GGA TTT ATC TGC TCT TCG 3’

Target DNA: 5’ CGA AGA GCA GAT AAA TCC ATT 3’

Three-base mismatch: 5’ CGA AGA GGA GAA AAA TCG ATT 3’

5’ CAA AGA GCA GAT AGA TCC GTT 3’
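The relationship between the listed sequences can be checked programmatically. The sketch below, in Python, confirms that the target is the reverse complement of the probe and counts the positions at which each of the two mismatch sequences differs from the target. The helper names are hypothetical and not part of the original work.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Strip spaces and upper-case a sequence written in triplets."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    a, b = clean(a), clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


probe = "AAT GGA TTT ATC TGC TCT TCG"
target = "CGA AGA GCA GAT AAA TCC ATT"
mismatch_1 = "CGA AGA GGA GAA AAA TCG ATT"
mismatch_2 = "CAA AGA GCA GAT AGA TCC GTT"

print(reverse_complement(probe) == clean(target))
print(count_mismatches(target, mismatch_1), count_mismatches(target, mismatch_2))
```

Both mismatch sequences differ from the target at three positions, in agreement with the "three-base mismatch" label.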

The oligonucleotide stock solutions (100 µM) were prepared with deionized water and stored at 4˚C when not in use. Acetate buffer solutions (ABS) of 0.1 M concentration with different pH values were prepared by mixing different volumes of 0.1 M acetic acid and 0.1 M sodium acetate with deionized water.

2.3. Synthesis of Zinc Oxide Nanowires (ZnONWs) onto the Surface of Gold Electrode

The growth of ZnO nanowires was performed by a hydrothermal method. First, a gold electrode/substrate on a silicon wafer, purchased from Sigma Aldrich, Stockholm, Sweden, was cleaned with isopropanol, washed with deionized water and finally dried at room temperature. As the first step of the synthesis, a seed layer of zinc acetate dihydrate was prepared by spin coating three times at 2500 r.p.m. for 30 s onto the gold substrate. Following spin coating, the substrate was annealed at 120˚C for 10 - 20 min. The electrodes carrying the seed particles were affixed to a Teflon sample holder and then immersed in the growth solution of zinc nitrate hexahydrate (0.075 M) and hexamethylenetetramine (0.075 M) at 98˚C for 9 hours. After the growth of ZnO was complete, the grown nanostructures were washed with deionized water to remove any residual particles. Finally, the ZnONWs/Au electrodes were left to dry at room temperature.
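As an illustration of how the equimolar growth solution could be made up, the Python sketch below computes the reagent masses for a given volume. The 100 mL batch volume is an assumption chosen for illustration, because the source does not state the volume used. The atomic masses are standard rounded values and the function names are hypothetical.

```python
# Standard atomic masses in g/mol, rounded.
ATOMIC_MASS = {"Zn": 65.38, "N": 14.007, "O": 15.999, "H": 1.008, "C": 12.011}

# Elemental compositions of the two growth reagents.
ZINC_NITRATE_HEXAHYDRATE = {"Zn": 1, "N": 2, "O": 12, "H": 12}  # Zn(NO3)2·6H2O
HEXAMETHYLENETETRAMINE = {"C": 6, "H": 12, "N": 4}              # C6H12N4


def molar_mass(composition):
    """Sum atomic masses weighted by the number of atoms of each element."""
    return sum(ATOMIC_MASS[el] * n for el, n in composition.items())


def grams_needed(molarity, volume_l, composition):
    """Mass of reagent needed for a solution of given molarity and volume."""
    return molarity * volume_l * molar_mass(composition)


volume_l = 0.100  # assumed batch volume; not given in the source
for name, comp in [("zinc nitrate hexahydrate", ZINC_NITRATE_HEXAHYDRATE),
                   ("hexamethylenetetramine", HEXAMETHYLENETETRAMINE)]:
    print(f"{name}: {grams_needed(0.075, volume_l, comp):.3f} g")
```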

2.4. Immobilization of Probe ssDNA onto the Surface of ZnONWs/Au Electrode

A 40 µL aliquot of an 80 µM solution of probe ssDNA was drop-cast onto the ZnONWs/Au electrode. The ssDNA molecules were left to immobilize on the ZnONWs/Au electrode for 2 hours. The prepared electrodes were then rinsed with ABS for 5 s to remove the unbound probes. The resulting electrode was labeled ssDNA/ZnONWs/Au.
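The amount of probe applied per electrode follows directly from the drop volume and concentration. The Python sketch below works out the moles of probe in the drop and the volume of the 100 µM stock needed to prepare it. The function names are hypothetical, and the dilution step is an assumption about how the 80 µM solution would be made from the stock.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def moles_in_drop(volume_ul, conc_um):
    """Moles of solute in a drop, from volume in µL and concentration in µM."""
    return (volume_ul * 1e-6) * (conc_um * 1e-6)


def stock_volume_ul(final_volume_ul, final_conc_um, stock_conc_um):
    """Volume of stock needed for a dilution (C1 * V1 = C2 * V2)."""
    if final_conc_um > stock_conc_um:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_volume_ul * final_conc_um / stock_conc_um


probe_mol = moles_in_drop(40, 80)
print(f"probe applied: {probe_mol * 1e9:.1f} nmol "
      f"({probe_mol * AVOGADRO:.2e} molecules)")
print(f"stock needed: {stock_volume_ul(40, 80, 100):.0f} µL, made up to 40 µL")
```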

2.5. Hybridization of ssDNA

The hybridization assay was performed by spotting onto the sensor surface 40 µL of a solution containing the desired concentration (80 µM) of the complementary ssDNA, the mismatched ssDNA or the non-complementary ssDNA, respectively. Hybridization was left to take place for 30 min. The electrode was then rinsed with ABS for 5 s to remove the non-hybridized target ssDNA.

3.1. Morphologies of ZnONWs Modified Gold Electrode and Immobilized ssDNA onto the Surface of ZnONWs Modified Gold Electrode

Figure 1(A) presents a typical FESEM image of the ZnONWs grown hydrothermally on the gold electrode surface. The ZnONWs are highly dense and vertically aligned, with their hexagonal faces oriented towards the surface of the gold electrode. The size of the nanohexagons was estimated to be in the range of 450 to 550 nm, with uniform density and spatial distribution. In principle, the diameter and length of the grown ZnONWs can be tuned by fine control of the growth process parameters, such as the concentration of the seed solution, the reagent stoichiometry, the temperature, and the pH of the growth solution [21] [22]. The FESEM image obtained after immobilization of the ssDNA probe is shown in Figure 1(B). As this figure shows, immobilization of the ssDNA on the ZnONWs-modified electrode resulted in a homogeneous coverage. It can be concluded that the ZnONWs structure provided a good platform for the immobilization of the ssDNA.

Figure 1. FESEM image of zinc oxide nanowires (ZnONWs) grown on the surface of the gold electrode (A), and of ssDNA immobilized on the surface of the ZnONWs-modified gold electrode (B).

3.2. Confirmation of Immobilization of Probe ssDNA and Hybridization of Target ssDNA onto the ZnONWs Modified Gold Electrode

The hybridization of the immobilized ssDNA probe with the complementary ssDNA (target) was studied by differential pulse voltammetry. Figure 2 shows the DPV responses obtained for different sensing surfaces (curves c, e and f) and after hybridization with different DNA targets (curves a, b and d). No visible peaks were recorded when DPV was conducted at the Au electrode (curve f). On the other hand, when voltammetric measurements were performed at the ZnONWs/Au electrode modified with the ssDNA (as described in section 2.4), a clear oxidation peak at 0.94 V and a shoulder at ca. 0.62 V were recorded. These two peaks were probably associated with the oxidation of the adenine and guanine (0.94 V) bases of the DNA probe. As can be observed from curve a, hybridization with the complementary target resulted in a large, broader peak at ca. 0.8 V [23] [24]. No peaks were recorded when hybridization was performed with mismatched sequences (curves b and d), confirming the specificity of the developed genosensor. Furthermore, curve c shows that the presence of the ssDNA probe is crucial for hybridization with the target, because no electrochemical response was observed in its absence. These results indicate that the hybridization process at the developed sensor surface is very selective, taking place only in the presence of the fully complementary target, and that a significant electrochemical response is recorded when complementary DNA is hybridized at the sensor surface.

3.3. Effect of DNA Target Loading on DNA Biosensor Response

Figure 3 shows the DNA biosensor response to various concentrations of DNA target, hybridized for 90 min at the sensor surface. Well-defined hybridization signals were observed over the concentration range of 10 - 100 µM of target DNA. Upon further increase of the target concentration (>100 µM), the hybridization signal decreased slightly, indicating that saturation of the surface had been reached. Thus, 100 µM was chosen as the optimum target concentration for the hybridization reaction of immobilized ssDNA at the surface of the ZnONWs/Au electrode. This is attributed to full surface coverage of the ZnONWs/Au electrode at this concentration.

Figure 2. Differential pulse voltammograms (DPVs) obtained from immobilization of 80 µM ssDNA on the surface of the ZnONWs-modified gold electrode after the probes were exposed to 100 µM complementary or target DNA (a), 3 mismatches or non-complementary ssDNA (b) and (d), without ssDNA immobilization (c), for the bare gold electrode (e) and for the ZnONWs-modified gold electrode with ssDNA (f). The DPVs were measured at a potential scanned between 0.4 and 1.16 V with a modulation amplitude = 0.025 V, interval time (t1) = 0.5 s, modulation time (t2) = 0.05 s and step potential = 0.005 V in 0.1 M acetate buffer solution at pH 7.

3.4. Dependence of Biosensor Response on pH

Figure 4 illustrates the effect of pH on the response of the sensor. In this set of experiments, the genosensor prepared according to the protocol described in section 2.4 was hybridized for 90 min with 80 µM target, and the DPV response was recorded at different pH values. The recorded response showed a maximum at pH 7, indicating the optimal pH for DNA detection. The increased response of the DNA biosensor at pH 7 can be ascribed to a more efficient hybridization process. Under more acidic conditions, protonation of the phosphodiester groups of the DNA can reduce the solubility of the DNA molecule, which eventually decreases DNA hybridization. In addition, in a more basic medium, DNA hybridization decreased, which tended to produce a lower DPV current.

3.5. Calibration Characteristic and the Stability of DNA Biosensor

Figure 5 shows the effect of different complementary DNA concentrations on the biosensor response. A linear response was obtained when the BRCA1 concentration was within the range of 10 - 100 µM. The current response increased linearly (R² = 0.994) with a sensitivity of 6.36 µA∙µM⁻¹, whereas the bare gold electrode showed a lower sensitivity (1.97 µA∙µM⁻¹). The limit of detection (LOD) for BRCA1 of the developed biosensor based on the ZnONWs-modified gold electrode, calculated as the signal of the blank plus three times the standard deviation [25], was found to be 3.32 µM. For comparison, Li and co-workers [3] reported an estimated LOD of 3.79 × 10⁻⁷ M for label-free DNA hybridization detection of BRCA1 based on single-walled carbon nanotube modified screen-printed graphite electrodes. These results suggest that the application of ZnONWs on the modified gold electrode improves the sensitivity of the developed biosensor for DNA hybridization with the BRCA1 target.
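The calibration and LOD procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the blank readings and calibration points are made-up example numbers, and converting the "blank plus three standard deviations" signal into a concentration through the calibration line is an assumption about how the criterion was applied.

```python
from statistics import mean, stdev


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


def lod_concentration(blank_signals, slope, intercept):
    """Concentration whose predicted signal equals blank mean + 3 * blank SD."""
    threshold = mean(blank_signals) + 3 * stdev(blank_signals)
    return (threshold - intercept) / slope


# Hypothetical example data (concentration in µM, current in µA).
conc = [10, 20, 40, 60, 80, 100]
current = [66.0, 129.5, 256.0, 384.1, 510.9, 637.8]
blanks = [2.1, 2.6, 1.9]

slope, intercept, r2 = linear_fit(conc, current)
print(f"sensitivity = {slope:.2f} µA/µM, R^2 = {r2:.4f}")
print(f"LOD = {lod_concentration(blanks, slope, intercept):.2f} µM")
```

The slope of the fitted line plays the role of the sensitivity quoted in the text (in µA∙µM⁻¹).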

The ZnONWs-modified gold electrode proved to be an excellent electrode substrate for the development of a sensitive DNA biosensor for the detection of the breast cancer gene BRCA1. The FESEM image showed that the nanostructure was successfully grown on the surface of the gold electrode, with an average diameter between 450 and 550 nm and a good orientation. Furthermore, we demonstrated that direct electrochemistry of DNA, measured by DPV, was suitable for the selective detection of a short DNA sequence associated with the BRCA1 gene when DNA probes were chemisorbed onto the ZnONWs/Au surface. The presence of the complementary DNA sequence resulted in a well-defined oxidation peak at ca. 0.8 V. The optimized condition for DNA biosensor operation was pH 7. The response of the developed DNA biosensor was linear for complementary target concentrations between 10 and 100 µM. In addition, the LOD, obtained from the linear response of the developed method, was found to be 3.32 µM.

Figure 3. Current produced from DNA hybridization of 80 µM immobilized ssDNA on the ZnONWs-modified gold electrode with its target (BRCA1) at various concentrations (10 to 120 µM). The DPV currents were measured at a potential scanned between 0.4 and 1.16 V with a modulation amplitude = 0.025 V, interval time (t1) = 0.5 s, modulation time (t2) = 0.05 s and step potential = 0.005 V in 0.1 M acetate buffer solution (pH 7).

Figure 4. The pH effect on the biosensor response as indicated by the hybridization current with complementary DNA in 0.1 M ABS. The DPV currents were measured at a potential scanned between 0.4 and 1.16 V with a modulation amplitude = 0.025 V, interval time (t1) = 0.5 s, modulation time (t2) = 0.05 s and step potential = 0.005 V in 0.1 M buffer solutions (pH 3-11).

Figure 5. Calibration plot presenting the changes of oxidation signals measured upon DNA hybridization between 80 µM BRCA1 probe and various concentrations of BRCA1 target. The DPV currents were measured at a potential scanned between 0.4 and 1.16 V with a modulation amplitude = 0.025 V, interval time (t1) = 0.5 s, modulation time (t2) = 0.05 s and step potential = 0.005 V in 0.1 M buffer solution (pH 7).

We would like to thank the Ministry of Higher Education Malaysia for the ERGS grant (600/RMI/st/ERGS/ 5/3/fst12/2011) and Universiti Teknologi MARA for financial support via postgraduate teaching assistant scheme (UPTA) to Nur Azimah Mansor for conducting this research.

Nur Azimah Mansor, Zainiharyati Mohd Zain, Hairul Hisham Hamzah, Mohd Shihabuddin Ahmad Noorden, Siti Safura Jaapar, Valerio Beni, Zafar Husain Ibupoto (2014) Detection of Breast Cancer 1 (BRCA1) Gene Using an Electrochemical DNA Biosensor Based on Immobilized ZnO Nanowires. Open Journal of Applied Biosensor, 03, 9-17. doi: 10.4236/ojab.2014.32002


UV absorption and emission spectroscopy

The UV-vis absorption and emission spectra were recorded on Agilent Technologies Cary series UV-vis-NIR absorbance and Cary Eclipse fluorescence spectrophotometers, respectively, using 10 mm path-length quartz cuvettes. The absorption spectra were scanned from 230 to 800 nm. Fluorescence emission titrations were carried out in 20 mM PBS (100 mM KCl/NaCl), pH 7.4, using 1 μM of TGP18 with incremental addition of different pre-annealed GQ and duplex DNAs until saturation was reached. The excitation wavelength for TGP18 was fixed at 560 nm and the emission wavelength was scanned from 570 nm to 800 nm. The slits for excitation and emission were set at 5 nm. The normalised changes in fluorescence of TGP18 were recorded and used to calculate the dissociation constant (KD) by plotting the change in fluorescence (∆F/∆Fmax) at 640 nm versus increasing concentration of GQ DNA. The experimental data points obtained were fitted to a one-site specific binding equation:

where ∆F/∆Fmax = change in fluorescence, L = GQ concentration and KD = dissociation constant. A similar experiment was performed with duplex DNA, where 1 μM TGP18 was titrated with increasing concentrations of duplex DNA (0-30 μM). The LOD value was calculated using the equation LOD = K × Sb/m, where K is a numerical factor chosen according to the confidence level desired, Sb is the standard deviation of the blank measurements (n = 3) and m is the sensitivity of the calibration curve [48].
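The fitting equation itself is not reproduced in the text. The Python sketch below therefore assumes the standard one-site specific binding form for normalised data, ∆F/∆Fmax = L / (KD + L), which matches the symbols defined above but may differ from the exact expression the authors used. KD is estimated by a simple log-spaced grid search rather than a dedicated curve-fitting library, and the LOD formula is implemented as stated. All function names and the example numbers are hypothetical.

```python
def fraction_bound(ligand_um, kd_um):
    """Assumed one-site specific binding model for normalised data."""
    return ligand_um / (kd_um + ligand_um)


def estimate_kd(ligand_um, observed, kd_min=0.01, kd_max=100.0, n_grid=4000):
    """Least-squares KD estimate over a log-spaced grid of candidate values."""
    best_kd, best_sse = None, float("inf")
    for i in range(n_grid + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / n_grid)
        sse = sum((fraction_bound(l, kd) - y) ** 2
                  for l, y in zip(ligand_um, observed))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


def lod(k, sb, m):
    """LOD = K * Sb / m, as defined in the text."""
    return k * sb / m


# Hypothetical titration: DNA concentrations in µM and normalised responses.
dna_um = [0.5, 1, 2, 5, 10, 20, 30]
response = [0.21, 0.33, 0.49, 0.72, 0.84, 0.90, 0.94]
print(f"estimated KD = {estimate_kd(dna_um, response):.2f} µM")
print(f"LOD = {lod(3, 0.02, 0.5):.2f} (units of concentration)")
```

A grid search is used here only to keep the sketch dependency-free; a non-linear least-squares routine would normally be preferred for real data.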


Materials and Methods

General Procedure for the Synthesis of Polymers 3.

To a solution of 2-ethyl-2-oxazoline (3.97 g, 40 mmol) in chlorobenzene (10 mL) at room temperature was quickly but carefully added methyl triflate (MeOTf, 197 μL, 1.8 mmol). The mixture was then heated at 80 °C for 1 h and then rapidly cooled by immersing the flask in dry ice. Then 4 mL of a 5% sodium carbonate solution was added to the polymer solution, and the mixture was stirred for 30 min. The aqueous layer was separated, and the organic layer was extracted with 5% sodium carbonate solution. The aqueous layers were separated, combined, and stirred overnight at room temperature to hydrolyze the terminal oxazolinium cation. The cloudy mixture was acidified with dilute HCl to give a clear solution of pH < 6 and was then extracted with methylene chloride. The combined organic layers were dried with anhydrous magnesium sulfate and then concentrated. A white solid was precipitated by dropwise addition of diethyl ether to the concentrated solution, and the solid was collected by filtration and dried overnight in a vacuum oven to afford the desired polymer 1 (n ≈ 23) (3.1 g, yield 78%). 1H NMR (400 MHz, CD3OD) δ 3.51–3.70 (m, 4H, N-CH2-CH2-), 2.96–3.10 (m, 0.127 H, H3C-N), 2.36–2.47 (m, 2H, C(O)-CH2-CH3), 1.10–1.11 (m, 3H, -CH2-CH3).

To a suspension of linear polyethylenimine derived from oxazoline polymerization (1, 1 g, n ≈ 23) in 10 mL of water was added 15 mL of concentrated (35%) HCl. The reaction mixture was heated at 100 °C for 48 h to remove the propionyl groups. Excess hydrogen chloride was removed under reduced pressure and the remaining white solid was dissolved in 10 mL of water. The resulting mixture was made alkaline by adding aqueous sodium hydroxide, and the white precipitate that formed was filtered and washed with water to afford the desired product, linear polyethylenimine 2 (230 mg, yield 53%). 1H NMR (400 MHz, CD3OD) δ 2.77 (s, 4H, NH-CH2-CH2-).

To a suspension of polymer 2 (200 mg, 4.65 mmol, n ≈ 23) in 10 mL of deionized water was added 4(5)-(hydroxymethyl)imidazole (1.04 g, 10.6 mmol) and potassium carbonate (4.4 g, 31.9 mmol). The reaction mixture was heated at 100 °C for 24 h and then cooled to room temperature. The residue was put into a dialysis tube and dialyzed against the following solvents (each for approximately 12 to 24 h): 50% methanol in water, 25% methanol in water, and water. The final residue was evaporated and dried to afford the desired product 3 as a hygroscopic brown solid (281 mg, yield 49%) (grafting ratio: > 95% of ethylenimine units, according to 1H integration). 1H NMR (400 MHz, CD3OD) δ 7.60–7.61 (m, 1H, imidazole), 6.69–6.96 (m, 1H, imidazole), 3.91 (m, 1H, N-CH2-C), 3.60 (m, 1H, N-CH2-C), 2.60–2.67 (m, 4H, NH-CH2-CH2-). A detailed description of materials and methods, the synthetic procedures, and characterization spectra are given in SI Text.
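The quoted degree of polymerization (n ≈ 23) for polymer 1 can be cross-checked from two numbers in the procedure: the monomer-to-initiator feed ratio and the 1H NMR integral of the N-methyl end group. The Python sketch below performs both estimates. It assumes one initiator-derived CH3-N group (3 H) per chain, integrals normalised to one repeat unit, and a theoretical yield equal to the monomer mass with end groups ignored. The function names are hypothetical.

```python
def dp_from_feed(monomer_mmol, initiator_mmol):
    """Degree of polymerization expected from the monomer/initiator ratio."""
    return monomer_mmol / initiator_mmol


def dp_from_end_group(end_group_integral, end_group_protons=3):
    """DP from an end-group integral normalised per repeat unit."""
    return end_group_protons / end_group_integral


def percent_yield(obtained_g, theoretical_g):
    """Isolated yield as a percentage of the theoretical mass."""
    return 100.0 * obtained_g / theoretical_g


print(f"DP from feed ratio:   {dp_from_feed(40, 1.8):.1f}")
print(f"DP from NMR integral: {dp_from_end_group(0.127):.1f}")
print(f"yield of polymer 1:   {percent_yield(3.1, 3.97):.0f}%")
```

Both estimates fall close to the n ≈ 23 stated in the text.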

Acid-Base Titration.

The buffering capacities of polymers 2 and 3 were determined by acid-base titration. Then 0.25 mmol of 2 or 3 (concentration of ethyleneimine units) were added to 6 mL of deionized water, and then 5 mL of 0.1 N HCl was added to the suspension in order to make a clear solution and adjust the pH to ca. 2. Then 200 μL portions of 0.05 M NaOH were sequentially added to the solution, and the pH was measured after each addition with an UltraBASIC Benchtop pH Meter (UB-5) with an UltraBasic glass body pH electrode.
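The quantities in this titration can be tallied with a few lines of Python. The sketch below computes the HCl added, the NaOH delivered per 200 μL addition, and the number of additions needed to neutralise all of the added acid. The estimate of excess acid assumes, for illustration only, that each ethyleneimine unit takes up one proton. The function name is hypothetical.

```python
def titration_budget(hcl_ml, hcl_molar, naoh_ul, naoh_molar, amine_mmol):
    """Return the acid/base bookkeeping for the titration, in mmol."""
    hcl_mmol = hcl_ml * hcl_molar                      # mL * mol/L = mmol
    naoh_mmol_per_addition = naoh_ul * 1e-3 * naoh_molar
    return {
        "hcl_mmol": hcl_mmol,
        "naoh_mmol_per_addition": naoh_mmol_per_addition,
        # Additions required to neutralise every equivalent of added HCl.
        "additions_to_neutralise": round(hcl_mmol / naoh_mmol_per_addition),
        # Assumes one proton per ethyleneimine unit (illustrative only).
        "excess_hcl_mmol": hcl_mmol - amine_mmol,
    }


budget = titration_budget(hcl_ml=5, hcl_molar=0.1, naoh_ul=200,
                          naoh_molar=0.05, amine_mmol=0.25)
for key, value in budget.items():
    print(f"{key}: {value}")
```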

Determination of Binding Activity.

UV spectra were recorded at room temperature, relative to poly(U) (180 nmol in U) and polymer 3 (n ≈ 23) dissolved in 5 mL of 10 mM phosphate buffer (Na2HPO4-NaH2PO4) at pH 7.55. CD spectra were recorded at room temperature, relative to poly(U) (375 nmol in U) and polymer 3 (n ≈ 23) directly dissolved in 5 mL of 10 mM phosphate buffer (Na2HPO4-NaH2PO4) at pH 7.5.

Enzymatic Kinetic Assay.

Buffer A refers to pH 7.00 (25 °C), 0.125 M Trizma® [tris-(hydroxymethyl)-aminomethane] base/succinic acid, 0.125 M NaCl, 18.8 mM MgCl2. Buffers for the cleavage reaction were made from Trizma® base/Trizma® hydrochloride. The ionic strength was adjusted to 0.1 M with sodium chloride. The pH of the reaction solution was measured before the reaction at 25 °C and extrapolated to 80 °C, the reaction temperature, with the aid of the known temperature dependence. All buffers were kept frozen before use. Reaction solutions were prepared by combining appropriate amounts of poly(U), p-nitrobenzenesulfonic acid sodium salt as internal standard, polymer catalysts, and buffer solution to give a reaction mixture (for polyvinylimidazoles, 30% ethanol in Trizma-HCl solution was used as the buffer). At appropriate intervals, samples were removed from the reaction mixture, cooled to room temperature, and immediately treated with venom phosphodiesterase I solution (approximately 25 units·mL⁻¹ in buffer A). The tube was capped, vigorously mixed, and submerged in a thermally equilibrated block heater at 40 °C. After incubating for 6 h, the samples were removed and placed in dry ice until ready for analysis by HPLC. Uridine concentrations in the reaction samples were calculated from the equation previously obtained using the initial concentration of the standard. The pseudo-first-order rate constant kobs for the cleavage was obtained by dividing the slope of a plot of [Uridine] vs. time by the initial concentration of poly(U). A representative procedure, processing of cleavage sample HPLC spectra, and the results are described in SI Text.
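The rate-constant calculation described above (slope of [Uridine] versus time divided by the initial poly(U) concentration) can be sketched in Python as follows. The data points, units and function names are hypothetical and serve only to illustrate the arithmetic.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def k_obs(times, uridine_conc, poly_u_initial):
    """Pseudo-first-order rate constant: initial-rate slope / [poly(U)]0."""
    return least_squares_slope(times, uridine_conc) / poly_u_initial


# Hypothetical data: time in h, concentrations in mM (in U units).
times_h = [0, 1, 2, 4]
uridine_mm = [0.000, 0.002, 0.004, 0.008]
print(f"k_obs = {k_obs(times_h, uridine_mm, poly_u_initial=0.4):.4f} per hour")
```

This initial-rate treatment is only valid while the plot of [Uridine] against time remains linear, that is, at low conversion.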

