In daily research, there is a frequent need to retrieve the genes related to a given term, such as the genes associated with a particular disease or biological function. Researchers typically combine several terms into a PubMed query; however, given the huge volume of literature and the many synonyms of each gene, manually retrieving the genes related to a topic from PubMed is laborious and time-consuming.
Here, we present CooLGeN, a text-mining server that identifies human genes related to various topics in PubMed/MEDLINE abstracts or GeneRIF sentences, as well as interactions among the resulting genes. It saves considerable time and retrieves more results than manual queries. The CooLGeN databases are updated daily to keep pace with PubMed. As in PubMed, researchers can perform complex Boolean searches for any terms within abstracts, within sentences, or a combination of both, while restricting results by publication year, journal impact factor, and a list of input genes.
The server automatically downloads the newest abstracts from the PubMed FTP site and updates its database daily. Human genes and molecular interactions mentioned in PubMed abstracts are recognized by our rule-based methods (GenCLiP 2.0, Bioinformatics, 2014). In GeneRIF sentences, gene names are identified from the assigned gene ID. CooLGeN has three main web interfaces: an input page, a result page and a gene network view page.
CooLGeN accepts any free-text query and retrieves co-occurrences of the search terms with genes (either all human genes or the input genes) in the literature. Users can filter PubMed abstracts by publication year and journal impact factor. In general, users can: (i) combine any terms with the Boolean operators AND, OR and NOT; (ii) tag terms with the search field “[A]” or “[S]” to search within a PubMed abstract or within a sentence, respectively; and (iii) tag a gene symbol with “[G]” to retrieve its curated PPIs (HPRD, CORUM, BioGRID and IntAct), text-mined interactions and co-occurring genes.
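The co-occurrence retrieval described above can be illustrated with a small Python sketch. It assumes that gene mentions have already been recognized in each sentence (in CooLGeN this is done by the rule-based GenCLiP 2.0 methods, which are not reproduced here) and uses a naive case-insensitive substring match for the search term. The classes `Sentence` and `Abstract`, the function `co_occurring_genes` and its parameters are hypothetical names chosen for illustration, not part of CooLGeN.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    genes: set  # gene symbols assumed to be recognized upstream (hypothetical)


@dataclass
class Abstract:
    pmid: str
    year: int
    impact_factor: float  # impact factor of the publishing journal
    sentences: list = field(default_factory=list)


def co_occurring_genes(abstracts, term, level="S", min_year=None, min_if=None):
    """Count, for each gene, the abstracts in which it co-occurs with `term`.

    level="S": the term and the gene must appear in the same sentence.
    level="A": the term and the gene may appear anywhere in the same abstract.
    min_year / min_if: optional publication-year and impact-factor filters.
    """
    term = term.lower()
    pmids_per_gene = {}
    for ab in abstracts:
        if min_year is not None and ab.year < min_year:
            continue
        if min_if is not None and ab.impact_factor < min_if:
            continue
        if level == "S":
            genes = set()
            for s in ab.sentences:
                if term in s.text.lower():  # naive substring match
                    genes |= s.genes
        elif level == "A":
            genes = set()
            if any(term in s.text.lower() for s in ab.sentences):
                for s in ab.sentences:
                    genes |= s.genes
        else:
            raise ValueError("level must be 'S' or 'A'")
        for gene in genes:
            pmids_per_gene.setdefault(gene, set()).add(ab.pmid)
    return {gene: len(pmids) for gene, pmids in pmids_per_gene.items()}
```

Counting distinct PMIDs per gene is only one possible ranking choice for this sketch; how CooLGeN itself scores or orders co-occurring genes is not described here.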
The result page lists all co-occurring genes and presents the corresponding literature with highlighted information. Users can select genes from the results and add other known related genes to build a comprehensive network.
- Boolean operators: AND, OR, NOT. Boolean operators must be entered in uppercase. A space between terms is treated as AND. Boolean logic defines the logical relationships among search terms. Operators are processed strictly from left to right; use parentheses to group terms so that they are processed as a unit.
- Search field tags: [S], [A], [G], [GS], [GA]. [S] and [A] search for any terms in the literature: [S] searches within a sentence and [A] within an abstract. [G], [GS] and [GA] mark the term as a gene symbol, so all synonyms of that gene are searched: [G] and [GS] search for the gene within a sentence, and [GA] within an abstract. A search field tag must directly follow the term it applies to. At most four tags can be used in one query.
Multiple free terms to be searched in the same field should preferably be grouped into one search string and tagged once, e.g. (glioma OR gliomas) [A]. An untagged term is searched as free text within sentences.
- When a single gene symbol is entered with the tag [G] or [GS], its curated PPIs, text-mined interactions and co-occurring genes are shown.
- When searching GeneRIF sentences, the [A] and [GA] tags do not apply and are ignored.
- Enclose a phrase in double quotes to search for it as a whole, such as “cancer stem cell”.
- The special characters [ ], ( ) and “ ” must be used in matched pairs.
- The search string must be 3 to 500 characters long.
- Results may not be shown for overly general terms such as “cell”, “protein” and “gene”.
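The query rules above (uppercase operators, a space acting as AND, strict left-to-right processing, parentheses, quoted phrases, paired special characters and the 3 to 500 character limit) can be sketched as a small evaluator that tests one query against one piece of text. This is a minimal illustration under stated assumptions, not CooLGeN's parser: the names `validate`, `tokenize`, `evaluate` and `matches` are hypothetical, search field tags are recognized only so they can be skipped, NOT is read as "AND NOT", and terms are matched by naive case-insensitive substring search.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# Field tags are recognized only so they can be skipped; this sketch
# evaluates every term against the same piece of text.
TAG_RE = re.compile(r"\[(?:S|A|G|GS|GA)\]")


def validate(query):
    """Check the length limit and that brackets/parentheses come in pairs."""
    if not 3 <= len(query) <= 500:
        raise ValueError("query must be 3 to 500 characters long")
    if query.count("(") != query.count(")") or query.count("[") != query.count("]"):
        raise ValueError("brackets and parentheses must be used in pairs")


def tokenize(query):
    """Split a query into operators, parentheses and ("TERM", text) tuples."""
    tokens, i, n = [], 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch in '"“':  # quoted phrase: keep it as a single term
            close = "”" if ch == "“" else '"'
            j = query.find(close, i + 1)
            if j == -1:
                raise ValueError("unpaired quotation mark")
            tokens.append(("TERM", query[i + 1:j]))
            i = j + 1
        elif ch == "”":
            raise ValueError("closing quotation mark without an opening one")
        else:
            j = i
            while j < n and not query[j].isspace() and query[j] not in '()"“”':
                j += 1
            word, i = query[i:j], j
            if TAG_RE.fullmatch(word):
                continue  # field tags are ignored in this sketch
            tokens.append(word if word in OPERATORS else ("TERM", word))
    return tokens


def evaluate(tokens, text):
    """Evaluate tokens strictly left to right, with no operator precedence."""
    text = text.lower()
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ends where a term was expected")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if isinstance(tok, tuple):
            return tok[1].lower() in text  # naive substring match
        raise ValueError(f"unexpected token {tok!r}")

    def expression():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            if op in OPERATORS:
                pos += 1
            else:
                op = "AND"  # a space between two operands acts as AND
            rhs = operand()
            if op == "AND":
                value = value and rhs
            elif op == "OR":
                value = value or rhs
            else:  # NOT is read as "AND NOT"
                value = value and not rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unpaired closing parenthesis")
    return result


def matches(query, text):
    validate(query)
    return evaluate(tokenize(query), text)
```

With strict left-to-right processing, `alpha OR beta AND gamma` is read as `(alpha OR beta) AND gamma`, so explicit parentheses are the safe way to express any other grouping.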
1. Single term search
- autophagy [S]
- Search for genes that co-occur with “autophagy” in the same sentence.
- autophagy [A]
- Search for genes that co-occur with “autophagy” in the same abstract (MEDLINE searches only).
- EZH2 [G]
- EZH2 [GS]
- Both queries search for genes that co-occur with EZH2 or any of its synonyms in the same sentence, and also return the text-mined interactions and curated PPIs of EZH2.
- EZH2 [GA]
- Search for genes that co-occur with EZH2 or any of its synonyms in the same PubMed abstract.
2. Phrase search
- “nasopharyngeal carcinoma”
- Search for genes that co-occur with the phrase “nasopharyngeal carcinoma” in the same sentence.
3. Boolean search
- metastasis “nasopharyngeal carcinoma”
- metastasis AND “nasopharyngeal carcinoma”
- Both queries return the same results: genes that co-occur with “metastasis” and the phrase “nasopharyngeal carcinoma” in the same sentence.
- NPC OR “nasopharyngeal carcinoma”
- Search for genes that co-occur with “NPC” or the phrase “nasopharyngeal carcinoma” in the same sentence.
- NPC NOT “Niemann Pick type C”
- Search for genes that co-occur with “NPC” but not with the phrase “Niemann Pick type C” in the same sentence.
4. Query grouped terms
- (“nasopharyngeal carcinoma” OR NPC) AND (“metastasis” OR “invasion”)
- Search for genes that co-occur with either “nasopharyngeal carcinoma” or “NPC”, and with either “metastasis” or “invasion”, in the same sentence.
5. Query tagged and grouped terms
(glioma OR gliomas) [A] (cancer OR tumor) AND (“stem cell” OR “stem cells”) [S]
- Search for genes that co-occur with “glioma” or “gliomas” in a PubMed abstract, and with “cancer” or “tumor” and “stem cell” or “stem cells” in a sentence.
- This is the default example shown in the input box; just click the search button to run it.
target [S] MIR27A [G]
- Search for genes that co-occur with “target” and with “MIR27A” or any of its synonyms in the same sentence.
- If the query is entered as target MIR27A without the [G] tag, the synonyms of MIR27A are not searched.
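The difference between a tagged and an untagged gene symbol in the last example can be sketched as a synonym-expansion step. This is a minimal illustration under assumptions: the synonym table is a toy entry (CooLGeN's own gene dictionary is not described here), the parser handles only single-word terms separated by spaces, `sentence_matches` checks only sentence-level terms and skips abstract-level ones, and the names `SYNONYMS`, `parse_tagged`, `expand` and `sentence_matches` are hypothetical.

```python
import re

# Toy synonym table for illustration only; not CooLGeN's gene dictionary.
SYNONYMS = {"MIR27A": ["MIR27A", "miR-27a", "MIRN27A"]}
TAG_RE = re.compile(r"\[(S|A|G|GS|GA)\]")


def parse_tagged(query):
    """Turn 'term [TAG] term ...' into (term, field) pairs; untagged means S."""
    pairs = []
    for tok in query.split():
        m = TAG_RE.fullmatch(tok)
        if m is None:
            pairs.append((tok, "S"))  # default: free text within a sentence
            continue
        if not pairs:
            raise ValueError("a search field tag must follow a term")
        tag = "GS" if m.group(1) == "G" else m.group(1)  # [G] behaves like [GS]
        pairs[-1] = (pairs[-1][0], tag)
    return pairs


def expand(pairs, synonyms=SYNONYMS):
    """Gene-tagged terms expand to all synonyms; other terms stay literal."""
    expanded = []
    for term, field in pairs:
        if field in ("GS", "GA"):
            # "GS" -> sentence level, "GA" -> abstract level
            expanded.append((field[1], synonyms.get(term.upper(), [term])))
        else:
            expanded.append((field, [term]))
    return expanded


def sentence_matches(expanded, sentence):
    """True if every sentence-level term (or one of its synonyms) occurs."""
    s = sentence.lower()
    return all(
        any(alt.lower() in s for alt in alts)
        for field, alts in expanded
        if field == "S"
    )
```

In this sketch, `target [S] MIR27A [G]` matches a sentence that mentions only “miR-27a”, whereas the untagged `target MIR27A` looks for the literal string “MIR27A” and misses it.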
- Use “Filter journals” to restrict results to papers in high-impact journals and to the most recent papers.
- When searching in sentences, select “Co-occur with interaction words in sentence” to obtain gene-term relationships that are likely to be stronger.
- Use “Input genes” to find related genes within your own gene list, or to exclude related genes that you already know.
- Known related genes that are not found in the literature can still be added to the gene network if they are connected to other selected genes.
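The rule in the last tip (a known gene that is absent from the literature results joins the network only if it connects to other selected genes) can be sketched as follows. The function name `build_network`, its inputs and the assumption that interactions arrive as plain gene pairs are hypothetical and for illustration only; the sketch does not reflect how CooLGeN stores its curated PPIs or text-mined interactions.

```python
def build_network(selected, known, interactions):
    """Assemble nodes and edges for a gene network view (hypothetical sketch).

    selected:     genes chosen from the result page (always shown).
    known:        extra genes added by the user; kept only if they interact
                  with at least one selected gene.
    interactions: (gene_a, gene_b) pairs, e.g. curated or text-mined.
    """
    selected = set(selected)
    # Store each undirected edge once, ignoring self-loops.
    edges = {tuple(sorted(pair)) for pair in interactions if pair[0] != pair[1]}

    partners = {}
    for a, b in edges:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    kept = {g for g in set(known) - selected if partners.get(g, set()) & selected}
    nodes = selected | kept
    net_edges = sorted(e for e in edges if e[0] in nodes and e[1] in nodes)
    return nodes, net_edges
```

Selected genes are always kept even without edges, while an added known gene with no partner among the selected genes is dropped; placeholder symbols such as GENE_A are enough to exercise this behavior.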
1. Main page
2. Search for topic-related genes
3. Search for gene-gene associations
4. Input genes
5. Result genes and literature page – topic-gene associations
6. Result genes and literature page – gene-gene associations
7. Download genes/PMIDs or create a gene network
8. Gene network
Interactive visualization: the nodes and edges can be moved, highlighted, and deleted.