Background


An increasing number of protein molecules have been identified as either allergens or non-allergens over the past 20 years. Currently, the Web-based databases do not agree with one another: each has gathered a different number of so-called “authentic” allergens, and some have incorrectly included non-allergens as “allergens”. Collecting, organizing, and displaying reliable data reported in the scientific literature, and serving them interactively so that consistent conclusions can be drawn, has become a major concern for the many investigators who rely on this knowledge to evaluate the allergenicity of their starting materials.




Motivation


The mission of this study is to facilitate research by providing the research community with an all-inclusive but non-redundant set of allergens and an analysis platform through the curated Allergenia database, website, and repository (http://allergenia.gzhmu.edu.cn).




Features


The Allergenia Allergen Database strives for convenient use and knowledge sharing. All resources of the database are open to the public and to scientific researchers without restriction. A single click on “Browse Database” retrieves all the allergens and displays them on the screen. For any given allergen, clicking its Allergen Code opens the source page, where more detailed information is available. Most allergen databases provide this kind of information.

What distinguishes the Allergenia database from the others is the following. Besides the sequence and recognized code of each allergen, Allergenia also reports its detailed status: whether it is labeled as an allergen in the UniProt database; whether it is listed as an allergen in the Allergome and IUIS databases; whether the IgE binding evidence is sufficient; whether the published literature has definitively recorded the sequence as an allergen; and so on. Each allergen is thus positioned individually along several dimensions of evidence.


After extensive and thorough investigation, we gathered 2108 authentic allergenic protein sequences and eliminated all redundant sequences. This collection is larger than those of other allergen databases such as IUIS, FARRP, and Allergome. IUIS and FARRP each contain fewer allergen sequences and include some redundant sequences. The Allergome database includes a large number of sequences, but many of them are redundant and a few are false “allergen” sequences.

At this time, FARRP (AllergenOnline), IUIS, and Allergome are the most widely used allergen website databases. Researchers encounter some faults and limitations when using them. Our Allergenia database substantially outperforms these comprehensive allergen databases and by far exceeds narrower databases such as the InformAll Allergenic Food Database.

The Allergenia database also has the following four important features: 1) it includes the highest number of genuine allergens, with the authenticity of each allergen sequence individually verified; 2) it includes no false allergens (non-allergens); 3) it lists only non-redundant allergenic sequences; 4) it lists no entry without an amino acid sequence.

To support assessment of the allergenicity of proteins that may be modified and/or directly introduced into foods or drugs by genetic engineering, the Allergenia database strives to provide a practical service that allows researchers to use the sequence search results efficiently.

The practical advantage of the Allergenia website lies in its ability to search multiple sequences with a single click. The results are returned in a vertical column format, with each exactly matched fragment sequence displayed to the left of the corresponding allergen accession number.
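The batch search described above can be illustrated with a short Python sketch. It is not the Allergenia implementation. It assumes that the reported fragment is the longest exact contiguous match between a query and an allergen, and that only fragments of at least `min_length` residues are shown. The function names, the `min_length` default, and the accession numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def longest_shared_fragment(query: str, allergen: str) -> str:
    """Return the longest contiguous fragment found in both sequences.

    Uses the classic longest-common-substring dynamic program, keeping
    only the previous row to limit memory use.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best fragment in query
    prev = [0] * (len(allergen) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(allergen) + 1)
        for j in range(1, len(allergen) + 1):
            if query[i - 1] == allergen[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return query[best_end - best_len:best_end]


def batch_search(
    queries: Dict[str, str],
    allergens: Dict[str, str],
    min_length: int = 6,  # assumed display cutoff, not taken from the source
) -> List[Tuple[str, str, str]]:
    """Search every query against every allergen in one call.

    Each row is (query name, matched fragment, allergen accession), with
    the fragment placed before the accession as in the described layout.
    """
    rows = []
    for q_name, q_seq in queries.items():
        for accession, a_seq in allergens.items():
            fragment = longest_shared_fragment(q_seq, a_seq)
            if len(fragment) >= min_length:
                rows.append((q_name, fragment, accession))
    return rows


if __name__ == "__main__":
    # Toy sequences with made-up accession numbers.
    queries = {"q1": "AAAMKTLLVCWAAA", "q2": "AAAAAAAA"}
    allergens = {"ALG0001": "GGGMKTLLVCWGGG", "ALG0002": "PPPPTLLVCWPP"}
    for q_name, fragment, accession in batch_search(queries, allergens):
        print(f"{q_name}\t{fragment}\t{accession}")
```

The quadratic dynamic program is adequate for a handful of protein sequences. A search over a full database would more likely rely on an index of short peptide words.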

In addition, the Allergenia website provides two kinds of sequence search services. One is a routine BLAST search against the allergen sequences that suppresses hits matching only short fragments. Users can choose the search threshold and thereby retrieve results at different matching levels: the higher the requested identity, the longer the exactly matched sequence must be. If no matching level is chosen, the search defaults to a threshold of 100% identity to the full-length sequence.
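The threshold behaviour can be pictured as a filter applied to a list of search hits. The Python sketch below is only an assumption about how such a filter might look, not the site's code: the `Hit` record, the `min_coverage` parameter, and the accession names are hypothetical, and the hits are taken as already computed by some alignment tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    accession: str           # allergen accession (made-up values below)
    percent_identity: float  # identity over the aligned region, 0 to 100
    aligned_length: int      # number of aligned residues


def filter_hits(
    hits: List[Hit],
    query_length: int,
    min_identity: float = 100.0,  # default: 100% identity
    min_coverage: float = 1.0,    # default: the full-length query
) -> List[Hit]:
    """Keep hits that meet the identity threshold and are long enough.

    Hits that cover only a short fragment of the query are dropped, so a
    perfect match over a few residues does not appear in the results.
    """
    kept = [
        h for h in hits
        if h.percent_identity >= min_identity
        and h.aligned_length >= min_coverage * query_length
    ]
    return sorted(kept, key=lambda h: h.percent_identity, reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("ALG0001", 100.0, 120),
        Hit("ALG0002", 100.0, 12),   # perfect, but only a short fragment
        Hit("ALG0003", 62.5, 110),
    ]
    for h in filter_hits(hits, query_length=120):
        print(h.accession, h.percent_identity)
```

With the defaults, only a full-length 100% match is kept. Lowering `min_identity` and `min_coverage` widens the result set in the way the selectable matching levels are described.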

The other service is specifically adapted to the FAO/WHO allergen rules. In this special search, the Allergenia website groups the results by identity level into 5 columns: Column 1, at least 35% sequence identity between an 80-amino-acid sliding window of the query protein and a subject allergen; Column 2, less than 35% identity over every 80-amino-acid sliding window, but at least one exact match of eight or more contiguous amino acids; Column 3, less than 35% identity over every 80-amino-acid sliding window, with a longest exact match of seven contiguous amino acids; Column 4, less than 35% identity over every 80-amino-acid sliding window, with a longest exact match of six contiguous amino acids; Column 5, less than 35% identity over every 80-amino-acid sliding window, with a longest exact match of no more than five contiguous amino acids. This design is based on recent studies demonstrating that matches of only 5 contiguous amino acids can lead to cross-reactivity between non-homologous allergens.
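The five-column grouping can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Allergenia implementation. It compares sequences without gaps along each diagonal, whereas a production search would typically use a gapped alignment, so identities may be underestimated. When the overlap is shorter than the window, matches are still divided by the full window length. The function names, parameters, and toy sequences are hypothetical.

```python
def max_window_matches(query: str, allergen: str, window: int = 80) -> int:
    """Highest count of identical residues in any ungapped window.

    Every diagonal (relative offset between the two sequences) is scanned
    with a sliding window of `window` residues.
    """
    best = 0
    for offset in range(-(len(query) - 1), len(allergen)):
        q_start = max(0, -offset)
        a_start = max(0, offset)
        length = min(len(query) - q_start, len(allergen) - a_start)
        matches = [
            query[q_start + k] == allergen[a_start + k] for k in range(length)
        ]
        running = sum(matches[:window])
        best = max(best, running)
        for k in range(window, length):
            running += matches[k] - matches[k - window]
            best = max(best, running)
    return best


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous stretch shared by both sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def fao_who_column(
    query: str, allergen: str, window: int = 80, identity_percent: int = 35
) -> int:
    """Assign a query/allergen pair to one of the five columns (1 to 5)."""
    # Integer arithmetic avoids floating-point trouble at exactly 35%.
    if max_window_matches(query, allergen, window) * 100 >= identity_percent * window:
        return 1
    run = longest_common_run(query, allergen)
    if run >= 8:
        return 2
    if run == 7:
        return 3
    if run == 6:
        return 4
    return 5


if __name__ == "__main__":
    # Toy pair sharing only an 8-residue motif: expected in Column 2.
    query = "A" * 50 + "MKTLLVCW" + "A" * 50
    allergen = "G" * 50 + "MKTLLVCW" + "G" * 50
    print(fao_who_column(query, allergen))
```

The window test is evaluated first, so a pair lands in Columns 2 to 5 only when no 80-residue window reaches the 35% cutoff, matching the order in which the columns are defined.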

Searches of this type show the range of potential cross-reactivity at each level of stringency. Users can select individual results and rely on them for further analysis. Currently, no other database or website provides such detailed results.

In conclusion, the Allergenia database provides the most abundant yet non-redundant resource for finding genuine allergen sequences, and it contains no non-allergen sequences. In addition, the Allergenia website provides a friendly interactive interface for the initial bioinformatic evaluation of the safety and allergenicity of proteins that might be modified for, or directly taken into, the human body. We therefore expect it to be preferred and used more often than other allergen databases by researchers working on proteins for human food and/or medicine, especially now, in an era in which safety considerations carry a one-vote veto.