UniProt release 2019_11

Published December 18, 2019

Headline

Thicker than water

We know about blood types and their incompatibility; transfusing someone who is O- with AB+ blood can be lethal. The ABO alleles present on chromosome 9 decide our blood type. The A and B antigens are a set of red blood cell surface carbohydrates ending in α-1,3-linked N-acetylgalactosamine and α-1,3-linked galactose respectively, while type O blood has neither of these cell surface sugars. Sequence variations in the ABO gene determine if the encoded protein has α-1,3-N-acetylgalactosaminyltransferase activity and makes type A blood, or if it has α-1,3-galactosyltransferase activity and makes type B blood. When both alleles are present, we make type AB blood. Deletion of a single G nucleotide in the ABO gene leads to a truncated inactive product and type O blood, which has the non-modified H antigen.

To improve the usability of blood, people have tried for years to find a way to enzymatically convert A or B blood to type O; it seems an obvious way to increase the supply of universal donor blood (which would still require Rhesus matching). While such enzymes have been found, they are not yet ideal, as they either work only at high concentration or have very specific buffer requirements that blood does not meet.

By screening human fecal metagenomic libraries, Rahfeld et al. have isolated a pair of enzymes from the obligate gut anaerobe Flavonifractor plautii that efficiently convert the A antigen to the H antigen (type O). The first enzyme (A type blood N-acetyl-alpha-D-galactosamine deacetylase, ADAC) deacetylates all A antigen subtypes tested (and there are many), while the second enzyme (A type blood alpha-D-galactosamine galactosaminidase, AGAL) removes the residual galactosamine moiety. This reaction can occur on red blood cells and in blood, as opposed to a buffer system, and at low enzyme concentration, and thus shows promise for use in blood production. Further testing is underway, and we still need a way to remove the B antigen, but this could well help increase the flexibility of our blood supply. It still won’t solve the world shortage of blood; only more donors can do that...

As of this release, ADAC and AGAL have been annotated and are available in UniProtKB/Swiss-Prot.

UniProtKB news

Change of FT and CC sections in UniProtKB text format

We have changed the format of the FT and CC sections of the UniProtKB text files. The changes to the FT section likely affect all parsers, and software will have to be adapted accordingly. The changes to the CC section are smaller, but may also require code adaptations depending on the CC annotation types that you parse.

The motivation for this change is described in the section “Functional annotation of different gene products in UniProtKB/Swiss-Prot” below, where you can also find the technical details and examples under the heading Text format.

Change of line length in UniProtKB text format

Historically, the lines of the UniProtKB text format have been wrapped at 75 characters for technical reasons (terminal screen size and data processing capabilities). When these technical restrictions vanished, we introduced exceptions for data like URLs, protein names and cross-references where line wrapping does not improve readability. These lines can be up to 255 characters long, but most lines are still wrapped at 75 characters for readability. We have now increased the maximum number of characters for wrapped lines to 80 in the context of the format change of the FT section of the UniProtKB text format for the functional annotation of different gene products in UniProtKB/Swiss-Prot described below.

Functional annotation of different gene products in UniProtKB/Swiss-Prot

To reduce database redundancy, the UniProtKB/Swiss-Prot policy is to describe, whenever possible, all protein products that are encoded by one gene in a given species in a single entry. This includes isoforms generated by alternative promoter usage, alternative splicing, alternative initiation and ribosomal frameshifting. We assign a name and a unique identifier to each isoform and choose one of them to be the canonical sequence that is shown in the UniProtKB text and XML format (the RDF format shows all sequences). All positional annotations in the entry referred to this canonical sequence until this release. Some gene products are precursors that are processed by proteolytic cleavage to generate the biologically active product(s). These products are described by their location on the sequence, a name and a unique identifier.

When isoforms, or products of proteolytic cleavage, are known to differ in their function or other characteristics, we generally describe this in the text of the respective annotations. To make this information also accessible to software applications, we adapted the UniProtKB text format to describe the product to which an annotation applies in a computer-processable way. The schemas of the XML and RDF format already supported this and required no changes. The following sections describe the changes for the text format and how the data is represented in the XML and RDF format.

Text format

Isoforms are described in ALTERNATIVE PRODUCTS annotations in the CC section. The products of proteolytic cleavage are described in PEPTIDE and CHAIN annotations in the FT section. All three annotation types provide a name (<ProductName>) and a unique ID (<ProductId>) for the product that they describe:

  • ALTERNATIVE PRODUCTS annotations show the name of an isoform in the Name field and its ID in the IsoId field.
    CC   -!- ALTERNATIVE PRODUCTS:
    ...
    CC       Name=<ProductName>;
    CC         IsoId=<ProductId>; Sequence=Displayed;
    
  • PEPTIDE and CHAIN annotations showed the name of a proteolytic cleavage product in the <Description> field and its ID in the FTId field in the previous text format:
    FT   CHAIN       <B>    <E>       <ProductName>.
    FT                                /FTId=<ProductId>.
    

    In the new text format, which is described in more detail in the FT section, they are shown in the /note= and /id= qualifiers, respectively:
    FT   CHAIN           <B>..<E>
    FT                   /note="<ProductName>"
    FT                   /id="<ProductId>"
    

Example: O60443

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=3;
CC       Name=1; Synonyms=Long;
CC         IsoId=O60443-1; Sequence=Displayed;
CC       Name=2; Synonyms=Short;
CC         IsoId=O60443-2; Sequence=VSP_004190;
CC         Note=No experimental confirmation available.;
CC       Name=3;
CC         IsoId=O60443-3; Sequence=VSP_044276;
...
FT   CHAIN         1    496       Gasdermin-E.
FT                                /FTId=PRO_0000148178.
FT   CHAIN         1    270       Gasdermin-E, N-terminal.
FT                                {ECO:0000269|PubMed:27281216,
FT                                ECO:0000305|PubMed:28459430}.
FT                                /FTId=PRO_0000442786.
FT   CHAIN       271    496       Gasdermin-E, C-terminal.
FT                                {ECO:0000305|PubMed:28459430}.
FT                                /FTId=PRO_0000442787.

CC section

The annotation types in the CC section describe a product by its name (isoform names are prefixed with the term “Isoform”). In the format descriptions below this name is represented by <ProductName>. Different products are described in separate annotations (see FUNCTION and BIOPHYSICOCHEMICAL PROPERTIES examples).

All annotation types of the CC section start with:

CC   -!- <TYPE>:

Where <TYPE> is a value from the controlled vocabulary of annotation types.

In some annotation types the content of the annotation used to directly follow the <TYPE>, and lines were wrapped at 75 chars:

CC   -!- <TYPE>: <Content>

In the new format a <ProductName> may be added between the <TYPE> and the <Content> and lines are wrapped at 80 chars (see Change of line length in UniProtKB text format):

CC   -!- <TYPE>: [<ProductName>]: <Content>

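The <ProductName> is surrounded by square brackets and separated by a colon from the <Content> to make it possible to parse it with a regular expression like this one (Perl-compatible syntax; the non-capturing group and the lazy quantifier are not part of POSIX ERE):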

^CC   -!- ([^:]+):(?: \[(.+?)\]:)? (.+)

Where $1=<TYPE>, $2=<ProductName>, $3=<Content>.
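
As an illustration, the pattern above can be used unchanged with Python's `re` module, which accepts the non-capturing group and lazy quantifier. This is only a minimal sketch: the helper name `parse_cc_first_line` is hypothetical, and it looks at the first line of an annotation only, not at wrapped continuation lines.

```python
import re

# Pattern copied from the format description above.
CC_INLINE = re.compile(r"^CC   -!- ([^:]+):(?: \[(.+?)\]:)? (.+)")


def parse_cc_first_line(line):
    """Return (type, product_name_or_None, content), or None if no match."""
    match = CC_INLINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


# First lines taken from the FUNCTION, SUBCELLULAR LOCATION and DISEASE examples.
with_product = parse_cc_first_line(
    "CC   -!- FUNCTION: [Isoform 1]: Suppresses cannabinoid receptor CNR1-mediated"
)
without_product = parse_cc_first_line(
    "CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor, GPI-anchor. Golgi"
)
old_disease = parse_cc_first_line(
    "CC   -!- DISEASE: Marfan lipodystrophy syndrome (MFLS) [MIM:616914]: A"
)
```

The product name is `None` whenever the bracketed field is absent, so annotations that apply to the entry as a whole and annotations for a specific product can be handled by the same code path. The `[MIM:616914]` reference in a DISEASE annotation is not mistaken for a product name, because the optional group must directly follow the annotation type.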

In annotation types where the content is structured as a list of different fields that are formatted according to custom rules for better readability, the annotation content starts on a new line:

CC   -!- <TYPE>:
CC       <Content>

In the new format a <ProductName> may be added after the <TYPE> and this line is not wrapped (i.e. it may in rare cases exceed 80 chars).

CC   -!- <TYPE>: [<ProductName>]:
CC       <Content>

The format of the <Content> remains unchanged.
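
For the structured annotation types, where the content starts on a new line, a parser could read the header line separately and then collect the continuation lines. The following is a sketch under assumptions: the function name `split_structured_cc` and the dictionary layout are hypothetical, and the header pattern is an adaptation of the regular expression given above (anchored at the end of the line instead of expecting content).

```python
import re

# Header of a structured annotation: the type, an optional bracketed
# product name, and nothing else on the line.
HEADER = re.compile(r"^CC   -!- ([^:]+):(?: \[(.+?)\]:)?$")
CONTINUATION = "CC" + " " * 7  # prefix of content lines


def split_structured_cc(lines):
    """Group structured CC annotations with their content lines."""
    annotations = []
    current = None
    for line in lines:
        if line.startswith("CC   -!- "):
            match = HEADER.match(line)
            current = None  # inline annotations are ignored by this sketch
            if match:
                current = {
                    "type": match.group(1),
                    "product": match.group(2),
                    "content": [],
                }
                annotations.append(current)
        elif current is not None and line.startswith(CONTINUATION):
            # Keep any extra indentation, since it carries field nesting.
            current["content"].append(line[len(CONTINUATION):])
    return annotations


# New-format COFACTOR lines for P26662, as shown in the examples.
sample = [
    "CC   -!- COFACTOR: [Serine protease NS3]:",
    "CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;",
    "CC         Evidence={ECO:0000269|PubMed:9060645};",
    "CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};",
    "CC   -!- COFACTOR: [Non-structural protein 5A]:",
    "CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};",
    "CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain. {ECO:0000250};",
]
annotations = split_structured_cc(sample)
```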

A <ProductName> cannot be added to ALTERNATIVE PRODUCTS and INTERACTION annotations. The INTERACTION format will be adapted in a different way to describe binary interactions that involve isoforms and/or products of proteolytic cleavage (see Change of annotation topic 'Interaction').

Please note that the previous text format of SUBCELLULAR LOCATION, COFACTOR and MASS SPECTROMETRY annotations already allowed a product name/ID to be specified, but we have adapted it to be consistent with all other annotation types.

Representative examples for different annotation types are shown here:

FUNCTION

Example: Q96F85

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
CC       Name=1; Synonyms=CRIP1a;
CC         IsoId=Q96F85-1; Sequence=Displayed;
CC       Name=2; Synonyms=CRIP1b;
CC         IsoId=Q96F85-2; Sequence=VSP_035598;

Previous format:

CC   -!- FUNCTION: Isoform 1 suppresses cannabinoid receptor CNR1-mediated
CC       tonic inhibition of voltage-gated calcium channels. Isoform 2 does
CC       not have this effect. {ECO:0000269|PubMed:17895407}.

New format:

CC   -!- FUNCTION: [Isoform 1]: Suppresses cannabinoid receptor CNR1-mediated
CC       tonic inhibition of voltage-gated calcium channels.
CC       {ECO:0000269|PubMed:17895407}.
CC   -!- FUNCTION: [Isoform 2]: Does not suppress cannabinoid receptor CNR1-
CC       mediated tonic inhibition of voltage-gated calcium channels.
CC       {ECO:0000269|PubMed:17895407}.

DISEASE

Example: P35555

FT   CHAIN      2732   2871       Asprosin. {ECO:0000305|PubMed:27087445,
FT                                ECO:0000305|PubMed:9817919}.
FT                                /FTId=PRO_0000436882.

Previous format:

CC   -!- DISEASE: Marfan lipodystrophy syndrome (MFLS) [MIM:616914]: A
CC       syndrome characterized by congenital ...
CC       Note=The disease is caused by mutations affecting the gene
CC       represented in this entry. Asprosin: Mutations specifically affect
CC       Asprosin, a hormone peptide present at the C-terminus of
CC       Fibrillin-1 chain, which is cleaved from Fibrillin-1 following
CC       secretion (PubMed:27087445). {ECO:0000269|PubMed:27087445}.

New format:

CC   -!- DISEASE: [Asprosin]: Marfan lipodystrophy syndrome (MFLS) [MIM:616914]:
CC       A syndrome characterized by congenital ...
CC       Note=The disease is caused by mutations affecting the gene represented
CC       in this entry. {ECO:0000269|PubMed:27087445}.

SUBCELLULAR LOCATION

Please note that the previous text format of SUBCELLULAR LOCATION annotations already allowed a product to be described by its name in the optional first field. To be consistent with all other annotation types, we have added square brackets around the product name.

Example: Q13421

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3; Synonyms=SMRP;
CC         IsoId=Q13421-2; Sequence=VSP_021059, VSP_021060;
...
FT   CHAIN        37    286       Megakaryocyte-potentiating factor.
FT                                /FTId=PRO_0000253560.

Previous format:

CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor, GPI-anchor.
CC       Golgi apparatus.
CC   -!- SUBCELLULAR LOCATION: Megakaryocyte-potentiating factor: Secreted.
CC   -!- SUBCELLULAR LOCATION: Isoform 3: Secreted.

New format:

CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor, GPI-anchor. Golgi
CC       apparatus.
CC   -!- SUBCELLULAR LOCATION: [Megakaryocyte-potentiating factor]: Secreted.
CC   -!- SUBCELLULAR LOCATION: [Isoform 3]: Secreted.

MASS SPECTROMETRY

Please note that the previous text format of MASS SPECTROMETRY annotations already allowed a product to be described (by its sequence range and an optional isoform ID) in the Range field. To be consistent with all other annotation types, we have replaced the Range field with a <ProductName> field.

Example: P09493

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3; Synonyms=Fibroblast, TM3;
CC         IsoId=P09493-3; Sequence=VSP_006577, VSP_006579;

Previous format:

CC   -!- MASS SPECTROMETRY: Mass=32875.93; Method=MALDI; Range=1-284
CC       (P09493-3); Evidence={ECO:0000269|PubMed:11840567};

New format:

CC   -!- MASS SPECTROMETRY: [Isoform 3]: Mass=32875.93; Method=MALDI;
CC       Evidence={ECO:0000269|PubMed:11840567};

RNA EDITING

Example: Q9P225

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3;
CC         IsoId=Q9P225-3; Sequence=VSP_031913, VSP_031914, VSP_031915;

Previous format:

CC   -!- RNA EDITING: Modified_positions=Not_applicable; Note=Exon 13
CC       included in isoform 3 is extensively edited in brain.
CC       {ECO:0000269|PubMed:20835228};

New format:

CC   -!- RNA EDITING: [Isoform 3]: Modified_positions=Not_applicable; Note=Exon
CC       13 is extensively edited in brain. {ECO:0000269|PubMed:20835228};

WEB RESOURCE

Example: P50570

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=P50570-1; Sequence=Displayed;

Previous format:

CC   -!- WEB RESOURCE: Name=The UMD-DNM2-isoform 1 mutations database;
CC       URL="http://www.umd.be/DNM2/";

New format:

CC   -!- WEB RESOURCE: [Isoform 1]: Name=The UMD-DNM2-isoform 1 mutations
CC       database;
CC       URL="http://www.umd.be/DNM2/";

CATALYTIC ACTIVITY

Example: Q2YHF0

FT   CHAIN      1475   2092       Serine protease NS3.
FT                                {ECO:0000250|UniProtKB:P29990}.
FT                                /FTId=PRO_0000268140.
...
FT   CHAIN      2488   3387       RNA-directed RNA polymerase NS5.
FT                                {ECO:0000250|UniProtKB:P29990}.
FT                                /FTId=PRO_0000268144.

Previous format:

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds in which
CC         each of the Xaa can be either Arg or Lys and Yaa can be either
CC         Ser or Ala.; EC=3.4.21.91;
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=a ribonucleoside 5'-triphosphate + RNA(n) = diphosphate +
CC         RNA(n+1); Xref=Rhea:RHEA:21248, Rhea:RHEA-COMP:11128, Rhea:RHEA-
CC         COMP:11129, ChEBI:CHEBI:33019, ChEBI:CHEBI:61557,
CC         ChEBI:CHEBI:83400; EC=2.7.7.48; Evidence={ECO:0000255|PROSITE-
CC         ProRule:PRU00539};

New format:

CC   -!- CATALYTIC ACTIVITY: [Serine protease NS3]:
CC       Reaction=Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds in which each of
CC         the Xaa can be either Arg or Lys and Yaa can be either Ser or Ala.;
CC         EC=3.4.21.91;
CC   -!- CATALYTIC ACTIVITY: [RNA-directed RNA polymerase NS5]:
CC       Reaction=a ribonucleoside 5'-triphosphate + RNA(n) = diphosphate +
CC         RNA(n+1); Xref=Rhea:RHEA:21248, Rhea:RHEA-COMP:11128, Rhea:RHEA-
CC         COMP:11129, ChEBI:CHEBI:33019, ChEBI:CHEBI:61557, ChEBI:CHEBI:83400;
CC         EC=2.7.7.48; Evidence={ECO:0000255|PROSITE-ProRule:PRU00539};

COFACTOR

Please note that the previous text format of COFACTOR annotations already allowed a product to be described by its name in the optional first field. To be consistent with all other annotation types, we have added square brackets around the product name.

Example: P26662

FT   CHAIN      1027   1657       Serine protease NS3. {ECO:0000255}.
FT                                /FTId=PRO_0000037644.
FT   CHAIN      1658   1711       Non-structural protein 4A. {ECO:0000255}.
FT                                /FTId=PRO_0000037645.

Previous format:

CC   -!- COFACTOR: Serine protease NS3:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
CC         Evidence={ECO:0000269|PubMed:9060645};
CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};
CC   -!- COFACTOR: Non-structural protein 5A:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};
CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain.
CC       {ECO:0000250};

New format:

CC   -!- COFACTOR: [Serine protease NS3]:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
CC         Evidence={ECO:0000269|PubMed:9060645};
CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};
CC   -!- COFACTOR: [Non-structural protein 5A]:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};
CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain. {ECO:0000250};

BIOPHYSICOCHEMICAL PROPERTIES

Example: Q9ULC5

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1; Synonyms=ACSL5b, ACSL5-fl;
CC         IsoId=Q9ULC5-1; Sequence=Displayed;
...
CC       Name=3; Synonyms=ACSL5delta20;
CC         IsoId=Q9ULC5-4; Sequence=VSP_038233;

Previous format:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Kinetic parameters:
CC         KM=0.11 uM for palmitic acid (isoform 1 at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.38 uM for palmitic acid (isoform 1 at pH 9.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.04 uM for palmitic acid (isoform 3 at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.15 uM for palmitic acid (isoform 3 at pH 8.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 9.5 (isoform 1), 7.5-8.5 (isoform 3).
CC         {ECO:0000269|PubMed:17681178};

New format:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES: [Isoform 1]:
CC       Kinetic parameters:
CC         KM=0.11 uM for palmitic acid (at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.38 uM for palmitic acid (at pH 9.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 9.5. {ECO:0000269|PubMed:17681178};
CC   -!- BIOPHYSICOCHEMICAL PROPERTIES: [Isoform 3]:
CC       Kinetic parameters:
CC         KM=0.04 uM for palmitic acid (at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.15 uM for palmitic acid (at pH 8.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 7.5-8.5. {ECO:0000269|PubMed:17681178};

SEQUENCE CAUTION

Example: Q9NQS3

Previous format:

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=Q9NQS3-1; Sequence=Displayed;
...
CC       Name=3;
CC         IsoId=Q9NQS3-3; Sequence=VSP_046893, VSP_046894;
CC         Note=Ref.2 (BAC11404) sequence differs from that shown due to
CC         erroneous termination (Truncated C-terminus). {ECO:0000305};
...
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAH17572.1; Type=Erroneous initiation; Note=Truncated N-terminus.; Evidence={ECO:0000305};

New format:

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=Q9NQS3-1; Sequence=Displayed;
...
CC       Name=3;
CC         IsoId=Q9NQS3-3; Sequence=VSP_046893, VSP_046894;
...
CC   -!- SEQUENCE CAUTION: [Isoform 1]:
CC       Sequence=AAH17572.1; Type=Erroneous initiation; Note=Truncated N-terminus.; Evidence={ECO:0000305};
CC   -!- SEQUENCE CAUTION: [Isoform 3]:
CC       Sequence=BAC11404.1; Type=Erroneous termination; Note=Truncated C-terminus.; Evidence={ECO:0000305};

FT section

Note: The format descriptions make use of POSIX ERE syntax.

All positional annotations in the FT section previously referred to the canonical sequence that is shown in the UniProtKB entry. This was the text format of these annotation types:

 FT   <TYPE>      <B>    <E>       (<Description>.)?( {<Evidences>}.)?
(FT                                /FTId=<Id>.)?

Where

  • <TYPE> is a value from the controlled vocabulary of positional annotation types.
  • <B> and <E> are amino acid positions on the canonical sequence. For most annotation types, they are the begin and end position of a sequence range, but they have other semantics for some types (e.g. CROSSLNK and DISULFID).
  • <Description> may provide information in addition to that conveyed by the <TYPE> and the location <B> and <E>. This field is mandatory for some annotation types and optional for others.
  • <Evidences> are optional and added between curly braces.
  • <Id> is a unique annotation identifier that is mandatory for some annotation types, including CHAIN and PEPTIDE where it corresponds to the <ProductId>.

We have modified this format in order to describe amino acid positions on isoform sequences. The new format is inspired by the INSDC’s feature table format to enable code reuse:

 FT   <TYPE>          <Location>
(FT                   /<Qualifier>(="<Value>")?)*

Where

  • <TYPE> is a value from the controlled vocabulary of positional annotation types.
  • <Location> is a sequence location on the canonical or an isoform sequence. For now, we will use only a subset of the INSDC Location types: a <Location> must be either a single <Position> or a range of <Position> values, and it may optionally be preceded by an isoform ID. The < and > symbols may be used with begin and end positions to indicate that the begin or end point is beyond the specified amino acid position. Please note that we have to extend the INSDC Location format with the ? symbol to allow us to represent all existing UniProtKB locations. This symbol may precede a <Position> to indicate that the exact position is uncertain, or it may substitute for the <Position> when the position is unknown.
    (<IsoformId>:)?((<|\?)?<Position>|\?)(..((>|\?)?<Position>|\?))?
    
  • /<Qualifier> may provide information in addition to that conveyed by the <TYPE> and <Location>. While we will follow the format of the INSDC Qualifiers, we will introduce our own <Qualifier> types where necessary. For this format change, we will represent the existing data with 3 qualifiers:
    • /note= will show the content of the current <Description> field.
    • /evidence= will show the content of the current <Evidences> field.
    • /id= will show the content of the current /FTId= field.

In a future format change, we may introduce more <Location> and <Qualifier> types to structure the description of positional annotations further.

Lines are wrapped at 80 chars (see section Change of line length in UniProtKB text format above).
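
The <Location> grammar above translates fairly directly into a regular expression. The sketch below is one possible Python rendering, not an official parser: the function name `parse_location` is hypothetical, and the isoform ID pattern (`[A-Z0-9]+-\d+`, i.e. an accession, a dash and a number) is an assumption based on the IDs shown in the examples.

```python
import re

# (<IsoformId>:)?((<|\?)?<Position>|\?)(..((>|\?)?<Position>|\?))?
LOCATION = re.compile(
    r"^(?:(?P<isoform>[A-Z0-9]+-\d+):)?"   # optional isoform ID (assumed shape)
    r"(?P<begin>[<?]?\d+|\?)"              # begin: 12, <12, ?12 or ?
    r"(?:\.\.(?P<end>[>?]?\d+|\?))?$"      # optional end: 34, >34, ?34 or ?
)


def parse_location(text):
    """Return (isoform_id_or_None, begin, end) as strings.

    For a single position, end is reported equal to begin.
    """
    match = LOCATION.match(text)
    if match is None:
        raise ValueError(f"not a recognised location: {text!r}")
    begin = match.group("begin")
    end = match.group("end") or begin
    return match.group("isoform"), begin, end


canonical_range = parse_location("2..181")
fuzzy_range = parse_location("<1..?")
isoform_position = parse_location("P12821-3:32")
```

Positions are kept as strings so that the <, > and ? markers survive; converting to integers is left to the caller, who has to decide how to treat uncertain or unknown positions.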

Example: P84077

This example illustrates the format change with a selection of representative positional annotation types that refer to the canonical sequence.

Previous format:

FT   INIT_MET      1      1       Removed. {ECO:0000244|PubMed:19413330,
FT                                ECO:0000244|PubMed:22223895,
FT                                ECO:0000269|PubMed:25255805,
FT                                ECO:0000269|PubMed:25807930}.
FT   CHAIN         2    181       ADP-ribosylation factor 1.
FT                                /FTId=PRO_0000207378.
...
FT   NP_BIND     126    129       GTP. {ECO:0000244|PDB:1HUR,
FT                                ECO:0000244|PDB:1RE0,
FT                                ECO:0000244|PDB:1U81,
FT                                ECO:0000244|PDB:3O47, ECO:0000305}.
...
FT   VARIANT      35     35       Y -> H (in PVNH8; decreased interaction
FT                                with GGA3; dbSNP:rs879036238).
FT                                {ECO:0000269|PubMed:28868155}.
FT                                /FTId=VAR_081272.
...
FT   HELIX         6      9       {ECO:0000244|PDB:1HUR}.

New format:

FT   INIT_MET        1
FT                   /note="Removed"
FT                   /evidence="ECO:0000244|PubMed:19413330,
FT                   ECO:0000244|PubMed:22223895, ECO:0000269|PubMed:25255805,
FT                   ECO:0000269|PubMed:25807930"
FT   CHAIN           2..181
FT                   /note="ADP-ribosylation factor 1"
FT                   /id="PRO_0000207378"
...
FT   NP_BIND         126..129
FT                   /note="GTP"
FT                   /evidence="ECO:0000244|PDB:1HUR, ECO:0000244|PDB:1RE0,
FT                   ECO:0000244|PDB:1U81, ECO:0000244|PDB:3O47, ECO:0000305"
...
FT   VARIANT         35
FT                   /note="Y -> H (in PVNH8; decreased interaction with GGA3;
FT                   dbSNP:rs879036238)"
FT                   /evidence="ECO:0000269|PubMed:28868155"
FT                   /id="VAR_081272"
...
FT   HELIX           6..9
FT                   /evidence="ECO:0000244|PDB:1HUR"
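
To give an idea of what adapting a parser to the new layout involves, here is a minimal Python sketch that reads feature lines and their qualifiers. It rests on several assumptions that are not stated in the format description: wrapped qualifier values are rejoined with a single space, a value ends at the first line that ends with a double quote, and quotes embedded in values are not handled. The function name `parse_ft` and the dictionary layout are hypothetical.

```python
import re

# Feature line: "FT", three spaces, the type, padding, the location.
FEATURE_LINE = re.compile(r"^FT   (\S+)\s+(\S+)$")
# Start of a qualifier: /name or /name="value...
QUALIFIER = re.compile(r'^/(\w+)(?:="(.*))?$')


def parse_ft(lines):
    """Parse new-format FT lines into a list of feature dictionaries."""
    features = []
    key = None  # qualifier whose quoted value is still open
    for line in lines:
        head = FEATURE_LINE.match(line)
        if head:
            features.append(
                {"type": head.group(1), "location": head.group(2), "qualifiers": {}}
            )
            key = None
            continue
        if not line.startswith("FT") or not features:
            continue
        text = line[2:].strip()
        qualifiers = features[-1]["qualifiers"]
        if key is None:
            match = QUALIFIER.match(text)
            if match is None:
                continue
            key, value = match.group(1), match.group(2)
            if value is None:  # qualifier without a value
                qualifiers[key] = True
                key = None
                continue
            qualifiers[key] = value
        else:
            # Continuation of a wrapped value (assumption: join with a space).
            qualifiers[key] += " " + text
        if qualifiers[key].endswith('"'):
            qualifiers[key] = qualifiers[key][:-1]
            key = None
    return features


sample = [
    "FT   INIT_MET        1",
    'FT                   /note="Removed"',
    'FT                   /evidence="ECO:0000244|PubMed:19413330,',
    "FT                   ECO:0000244|PubMed:22223895, ECO:0000269|PubMed:25255805,",
    'FT                   ECO:0000269|PubMed:25807930"',
    "FT   CHAIN           2..181",
    'FT                   /note="ADP-ribosylation factor 1"',
    'FT                   /id="PRO_0000207378"',
    "FT   VARIANT         35",
    'FT                   /note="Y -> H (in PVNH8; decreased interaction with GGA3;',
    'FT                   dbSNP:rs879036238)"',
    'FT                   /id="VAR_081272"',
]
features = parse_ft(sample)
```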

Example: P0C551

This example illustrates the use of the < and ? symbols in UniProtKB locations.

Previous format:

FT   SIGNAL       <1      ?       {ECO:0000250}.
FT   PROPEP        ?     17       {ECO:0000250}.
FT                                /FTId=PRO_0000293097.
FT   CHAIN        18    142       Acidic phospholipase A2 KBf-grIB.
FT                                /FTId=PRO_0000293098.

New format:

FT   SIGNAL          <1..?
FT                   /evidence="ECO:0000250"
FT   PROPEP          ?..17
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000293097"
FT   CHAIN           18..142
FT                   /note="Acidic phospholipase A2 KBf-grIB"
FT                   /id="PRO_0000293098"

Example: P12821

This example illustrates how positional annotations for isoforms are represented.

Previous format:

CC   -!- ALTERNATIVE PRODUCTS:
CC       ...
CC       Name=Testis-specific; Synonyms=ACE-T;
CC         IsoId=P12821-3, P22966-1;
CC         Sequence=VSP_035120, VSP_035121;
CC         Note=Variant in position: 32:S->P (in dbSNP:rs4317). Variant in
CC         position: 49:S->G (in dbSNP:rs4318).;
...
FT   VARIANT     154    154       A -> T (in dbSNP:rs13306087).
FT                                /FTId=VAR_029139.

New format:

CC   -!- ALTERNATIVE PRODUCTS:
CC       ...
CC       Name=Testis-specific; Synonyms=ACE-T;
CC         IsoId=P12821-3, P22966-1;
CC         Sequence=VSP_035120, VSP_035121;
...
FT   VARIANT         154
FT                   /note="A -> T (in dbSNP:rs13306087)"
FT                   /id="VAR_029139"
...
FT   VARIANT         P12821-3:32
FT                   /note="S -> P (in dbSNP:rs4317)"
FT                   /id="VAR_x"
FT   VARIANT         P12821-3:49
FT                   /note="S -> G (in dbSNP:rs4318)"
FT                   /id="VAR_y"

XML format

The UniProtKB XSD already allowed the product to which an annotation applies to be described, and it required no changes.

Isoforms are described in “alternative products” annotations. The products of proteolytic cleavage are described in “peptide” and “chain” annotations. All three annotation types provide a name (<ProductName>) and/or a unique ID (<ProductId>) for the product that they describe:

  • “alternative products” annotations describe each isoform by an isoform element of isoformType. The isoformType describes the product IDs and names with sequences of id and name elements (where the first element in each sequence is the main product ID/name).
    <comment type="alternative products">
      ...
      <isoform>
        <id><ProductId></id>
        <id><OldProductId></id>
        <name><ProductName></name>
        <name><AlternativeProductName></name>
        ...
      </isoform>
      ...
    </comment>
    
  • “peptide” and “chain” annotations show the name and ID of a proteolytic cleavage product in the description and id attributes of the featureType.
    <feature type="chain" description="<ProductName>" id="<ProductId>">
    ...
    </feature>
    
commentType

The commentType has two ways to indicate that the annotation applies to a specific product:

  • An optional molecule element of moleculeType allows a product to be described by its name and/or unique ID. It is currently only used for “subcellular location” and “cofactor” annotations (see examples below). In the future it may be used for all annotations that are represented by commentType.
  • An optional sequence of location elements of locationType allows the sequence coordinates of an annotation to be described. The locationType has an optional sequence attribute that is only set (to an isoform ID) when the coordinates are not for the canonical sequence. Sequence coordinates may currently be given for “rna editing”, “sequence caution” and “mass spectrometry” annotations. In the future, “sequence caution” and “mass spectrometry” annotations will no longer describe sequence coordinates.

subcellular location

Example: Q13421

<comment type="alternative products">
  ...
  <isoform>
    <id>Q13421-2</id>
    <name>3</name>
    <name>SMRP</name>
    <sequence type="described" ref="VSP_021059 VSP_021060"/>
    ...
  </isoform>
  ...
</comment>
...
<feature type="chain" description="Megakaryocyte-potentiating factor"
                      id="PRO_0000253560">
  <location>
    <begin position="37"/>
    <end position="286"/>
  </location>
</feature>
<comment type="subcellular location">
  <subcellularLocation>
    <location>Cell membrane</location>
    <topology>Lipid-anchor</topology>
    <topology>GPI-anchor</topology>
  </subcellularLocation>
  <subcellularLocation>
    <location>Golgi apparatus</location>
  </subcellularLocation>
</comment>
<comment type="subcellular location">
  <molecule>Megakaryocyte-potentiating factor</molecule>
  <subcellularLocation>
    <location>Secreted</location>
  </subcellularLocation>
</comment>
<comment type="subcellular location">
  <molecule>Isoform 3</molecule>
  <subcellularLocation>
    <location>Secreted</location>
  </subcellularLocation>
</comment>
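
Reading the molecule element from such comments could look like the following Python sketch, which uses the standard library's `xml.etree.ElementTree`. The fragment is a shortened copy of the example above wrapped in a bare `<entry>` element; real UniProtKB XML files declare a default namespace, which this sketch deliberately omits and which would have to be added to the element paths. The function name `locations_by_product` is hypothetical.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<entry>
  <comment type="subcellular location">
    <subcellularLocation>
      <location>Cell membrane</location>
      <topology>Lipid-anchor</topology>
      <topology>GPI-anchor</topology>
    </subcellularLocation>
    <subcellularLocation>
      <location>Golgi apparatus</location>
    </subcellularLocation>
  </comment>
  <comment type="subcellular location">
    <molecule>Megakaryocyte-potentiating factor</molecule>
    <subcellularLocation>
      <location>Secreted</location>
    </subcellularLocation>
  </comment>
  <comment type="subcellular location">
    <molecule>Isoform 3</molecule>
    <subcellularLocation>
      <location>Secreted</location>
    </subcellularLocation>
  </comment>
</entry>
"""


def locations_by_product(entry):
    """Map product name (None = no molecule element) to location names."""
    result = {}
    for comment in entry.findall("comment[@type='subcellular location']"):
        product = comment.findtext("molecule")  # None when the element is absent
        names = [loc.text for loc in comment.findall("subcellularLocation/location")]
        result.setdefault(product, []).extend(names)
    return result


by_product = locations_by_product(ET.fromstring(FRAGMENT))
```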

cofactor

Example: P26662

<feature type="chain" description="Serine protease NS3"
                      id="PRO_0000037644" evidence="4">
  <location>
    <begin position="1027"/>
    <end position="1657"/>
  </location>
</feature>
<feature type="chain" description="Non-structural protein 4A"
                      id="PRO_0000037645" evidence="4">
  <location>
    <begin position="1658"/>
    <end position="1711"/>
  </location>
</feature>
<comment type="cofactor">
  <molecule>Serine protease NS3</molecule>
  <cofactor evidence="14">
    <name>Zn(2+)</name>
    <dbReference type="ChEBI" id="CHEBI:29105"/>
  </cofactor>
  <text evidence="14">Binds 1 zinc ion.</text>
</comment>
<comment type="cofactor">
  <molecule>Non-structural protein 5A</molecule>
  <cofactor evidence="3">
    <name>Zn(2+)</name>
    <dbReference type="ChEBI" id="CHEBI:29105"/>
  </cofactor>
  <text evidence="3">Binds 1 zinc ion in the NS5A N-terminal domain.</text>
</comment>

featureType

The featureType has a mandatory location element of locationType to describe the sequence coordinates of an annotation.

Example: P84077

<feature type="initiator methionine" description="Removed" evidence="6 7 21 22">
  <location>
    <position position="1"/>
  </location>
</feature>
<feature type="chain" description="ADP-ribosylation factor 1" id="PRO_0000207378">
  <location>
    <begin position="2"/>
    <end position="181"/>
  </location>
</feature>
...
<feature type="nucleotide phosphate-binding region" description="GTP" evidence="1 2 3 4 25">
  <location>
    <begin position="126"/>
    <end position="129"/>
  </location>
</feature>
...
<feature type="sequence variant" description="In PVNH8; decreased interaction with GGA3; dbSNP:rs879036238." id="VAR_081272" evidence="23">
  <original>Y</original>
  <variation>H</variation>
  <location>
    <position position="35"/>
  </location>
</feature>
...
<feature type="helix" evidence="1">
  <location>
    <begin position="6"/>
    <end position="9"/>
  </location>
</feature>

The locationType has an optional sequence attribute that is only set (to an isoform ID) when the coordinates are not for the canonical sequence.

Example: P12821

Previous representation:

<comment type="alternative products">
  ...
  <isoform>
    <id>P12821-3</id>
    <id>P22966-1</id>
    <name>Testis-specific</name>
    <name>ACE-T</name>
    <sequence type="described" ref="VSP_035120 VSP_035121"/>
    <text>Variant in position: 32:S->P (in dbSNP:rs4317). Variant in position: 49:S->G (in dbSNP:rs4318).</text>
  </isoform>
  ...
</comment>
...
<feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
  <original>A</original>
  <variation>T</variation>
  <location>
    <position position="154"/>
  </location>
</feature>

New representation:

<comment type="alternative products">
  ...
  <isoform>
    <id>P12821-3</id>
    <id>P22966-1</id>
    <name>Testis-specific</name>
    <name>ACE-T</name>
    <sequence type="described" ref="VSP_035120 VSP_035121"/>
  </isoform>
  ...
</comment>
...
<feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
  <original>A</original>
  <variation>T</variation>
  <location>
    <position position="154"/>
  </location>
</feature>
...
<feature type="sequence variant" description="In dbSNP:rs4317." id="VAR_x">
  <original>S</original>
  <variation>P</variation>
  <location sequence="P12821-3">
    <position position="32"/>
  </location>
</feature>
<feature type="sequence variant" description="In dbSNP:rs4318." id="VAR_y">
  <original>S</original>
  <variation>G</variation>
  <location sequence="P12821-3">
    <position position="49"/>
  </location>
</feature>
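
One way a parser could take the new attribute into account is sketched below. It assumes a namespace-free fragment modelled on the P12821 example; the function name variants_by_sequence and the "canonical" label are illustrative choices, not part of the UniProt schema.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment modelled on the new representation above.
XML = """
<entry>
  <feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
    <original>A</original>
    <variation>T</variation>
    <location>
      <position position="154"/>
    </location>
  </feature>
  <feature type="sequence variant" description="In dbSNP:rs4317." id="VAR_x">
    <original>S</original>
    <variation>P</variation>
    <location sequence="P12821-3">
      <position position="32"/>
    </location>
  </feature>
</entry>
"""


def variants_by_sequence(entry, canonical_label="canonical"):
    """Group sequence variants by the sequence their coordinates refer to."""
    grouped = {}
    for feature in entry.iter("feature"):
        if feature.get("type") != "sequence variant":
            continue
        location = feature.find("location")
        # The sequence attribute is absent for canonical coordinates.
        key = location.get("sequence", canonical_label)
        pos = int(location.find("position").get("position"))
        record = (pos, feature.findtext("original"), feature.findtext("variation"))
        grouped.setdefault(key, []).append(record)
    return grouped


grouped = variants_by_sequence(ET.fromstring(XML))
print(grouped)
```

Code that ignores the sequence attribute would wrongly apply isoform coordinates to the canonical sequence, so a default such as the one used here is worth making explicit.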

RDF format

The UniProt RDF schema ontology could already describe the product to which an annotation applies, so no changes were required for this purpose.

The RDF format has a single hierarchy of Annotation classes with various intermediary classes. The subclass Sequence_Annotation groups all classes that refer to a location on a protein sequence. This location is represented with FALDO and always indicates the FALDO reference sequence for the location (the RDF format makes no special case for a canonical sequence). Annotations that do not refer to a specific location on a protein sequence, but that apply to a given product, describe the sequence of this product with a sequence property. The object of this property may be a Sequence or a Chain_Annotation / Peptide_Annotation that describes a sequence that is the product of proteolytic processing.

Please note that the change to mass spectrometry annotations required an adaptation of the hierarchy of Annotation classes: the Mass_Spectrometry_Annotation class is no longer an rdfs:subClassOf the Sequence_Annotation class, but a direct rdfs:subClassOf the Annotation class.

Example: Q13421

@prefix up: <http://purl.uniprot.org/core/> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
@prefix isoform: <http://purl.uniprot.org/isoforms/> .
@prefix annotation: <http://purl.uniprot.org/annotation/> .
@prefix faldo: <http://biohackathon.org/resource/faldo#> .

uniprot:Q13421
  up:annotation
    annotation:PRO_0000253560 ,
    <Q13421#SIPADAC7D651EFC09CC> ,
    <Q13421#SIP307BEB951103B073> ,
    <Q13421#SIPB6746E472B99B031> ,
    ...
  up:sequence
    isoform:Q13421-1 ,
    isoform:Q13421-3 ,
    isoform:Q13421-2 ,
    isoform:Q13421-4 .

annotation:PRO_0000253560
  rdf:type up:Chain_Annotation ;
  rdfs:comment "Megakaryocyte-potentiating factor" ;
  up:range range:22853569102360878tt37tt286 .
range:22853569102360878tt37tt286
  rdf:type faldo:Region ;
  faldo:begin position:22853569102360878tt37 ;
  faldo:end position:22853569102360878tt286 .
position:22853569102360878tt37
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 37 ;
  faldo:reference isoform:Q13421-1 .
position:22853569102360878tt286
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 286 ;
  faldo:reference isoform:Q13421-1 .

<Q13421#SIPADAC7D651EFC09CC>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIP04927440DF8EB941> ,
               <Q13421#SIP727DF431EB6C89EC> .

<Q13421#SIP307BEB951103B073>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIPD59D33F5047A94FD> ;
  up:sequence annotation:PRO_0000253560 .

<Q13421#SIPB6746E472B99B031>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIPD59D33F5047A94FD> ;
  up:sequence isoform:Q13421-2 .

isoform:Q13421-1
  rdf:type up:Simple_Sequence ;
  up:modified "2006-10-17"^^xsd:date ;
  up:version 2 ;
  up:precursor true ;
  up:mass 68986 ;
  up:crc64Checksum "FA17E3609B6CC9CA"^^xsd:token ;
  up:name "1" ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .
isoform:Q13421-3
  rdf:type up:Modified_Sequence ;
  up:name "2" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021059 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .
isoform:Q13421-2
  rdf:type up:Modified_Sequence ;
  up:name "3" , "SMRP" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021059 , annotation:VSP_021060 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LRAPLPC" .
isoform:Q13421-4
  rdf:type up:Modified_Sequence ;
  up:name "4" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021058 , annotation:VSP_021059 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .

Cross-references to RNAct

Cross-references have been added to RNAct, a database of protein–RNA interaction predictions for model organisms with supporting experimental data.

RNAct is available at https://rnact.crg.eu.

The format of the explicit links is:

Resource abbreviation: RNAct
Resource identifier: UniProtKB accession number
Optional information 1: Molecule type

Example: Q9Y2I1

Show all entries having a cross-reference to RNAct.

Text format

Example: Q9Y2I1

DR   RNAct; Q9Y2I1; protein.
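
As a rough sketch, a DR line of this shape could be split into its fields as follows. The function parse_dr_line is a hypothetical helper and only handles simple lines that end with a single period; it is not a complete flat-file parser.

```python
def parse_dr_line(line):
    """Split a simple DR line into (database, identifier, extra fields)."""
    if not line.startswith("DR "):
        raise ValueError("not a DR line: %r" % line)
    body = line[2:].strip()
    # Drop the terminating period, then split on the ';' field separator.
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1], fields[2:]


database, identifier, extra = parse_dr_line("DR   RNAct; Q9Y2I1; protein.")
print(database, identifier, extra)
```

Keeping the optional fields as a list avoids hard-coding how many a given resource provides.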

XML format

Example: Q9Y2I1

<dbReference type="RNAct" id="Q9Y2I1">
  <property type="molecule type" value="protein"/>
</dbReference>

RDF format

Example: Q9Y2I1

uniprot:Q9Y2I1
  rdfs:seeAlso <http://purl.uniprot.org/rnact/Q9Y2I1> .
<http://purl.uniprot.org/rnact/Q9Y2I1>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/RNAct> ;
  rdfs:comment "protein" .

Change of the cross-references to Pharos

We have introduced an additional field in the cross-references to the Pharos database to indicate the development status of a target. Targets are categorized into four development/druggability levels (TDLs), ranging from Tclin for approved drugs with known mechanisms of action, to Tdark for targets about which virtually nothing is known.

Text format

Example: P33151

DR   Pharos; P33151; Tbio.

XML format

Example: P33151

<dbReference type="Pharos" id="P33151">
  <property type="development level" value="Tbio"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
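
The kind of code change this may involve is sketched below, assuming a namespace-free fragment (real UniProt XML files declare a default namespace). The function name pharos_levels is hypothetical, and the RNAct cross-reference from another entry is included only to show the filtering; a Pharos reference without the new property is reported as None.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; combining these two cross-references in one entry
# is an assumption made purely for the example.
XML = """
<entry>
  <dbReference type="Pharos" id="P33151">
    <property type="development level" value="Tbio"/>
  </dbReference>
  <dbReference type="RNAct" id="Q9Y2I1">
    <property type="molecule type" value="protein"/>
  </dbReference>
</entry>
"""


def pharos_levels(entry):
    """Map each Pharos identifier to its development level, or None if absent."""
    levels = {}
    for ref in entry.iter("dbReference"):
        if ref.get("type") != "Pharos":
            continue
        level = None
        for prop in ref.findall("property"):
            if prop.get("type") == "development level":
                level = prop.get("value")
        levels[ref.get("id")] = level
    return levels


levels = pharos_levels(ET.fromstring(XML))
print(levels)
```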

RDF format

Example: P33151

uniprot:P33151
  rdfs:seeAlso <http://purl.uniprot.org/pharos/P33151> .
<http://purl.uniprot.org/pharos/P33151>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Pharos> ;
  rdfs:comment "Tbio" .

Removal of the cross-references to PMAP-CutDB

Cross-references to PMAP-CutDB have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • 6-(S-cysteinyl)-8alpha-(pros-histidyl)-FAD (Cys-His)

Changes to keywords

Deleted keyword:

  • Complete proteome

Proteomes changes

The UniProt Proteomes portal offers protein sequence sets obtained from the translation of completely sequenced genomes. Published genomes from NCBI Genome used to be brought into UniProt if they satisfied the following criteria:

  • The genome is annotated and a set of coding sequences is available.
  • The number of predicted coding sequences falls within a statistically significant range of published proteomes from neighbouring species.

We have changed these criteria and now publish all proteomes that can be derived from NCBI genomes that are not considered to be low-quality assemblies. To determine which proteomes to bring into UniProtKB, we use a subset of the reasons for which RefSeq excludes a genome assembly, and we state the reason(s) why a proteome is excluded from UniProtKB. We also provide two metrics to help users assess the quality of a proteome:

  • A score obtained with the BUSCO software.
  • A score based on the number of coding sequences expected from neighbouring species, the “Complete Proteome Detector (CPD)”.

The “Complete proteome” keyword was removed from all UniProtKB entries. Individual proteomes can be retrieved from the UniProt website by their unique proteome identifier, e.g. UP000005640.

UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
