Discussion:
[BlueObelisk-discuss] List of chemical names and identifiers
Peter Murray-Rust
2016-03-08 18:43:21 UTC
I am now enhancing contentmine.org to search the daily literature and need
dictionaries to match names. I don't mind false negatives so am simply
matching names, but they should have identifiers.

The format would simply be:
name="ethyl acetate" id="a123"
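
A minimal sketch of how entries in this format might be written out and matched against text, assuming plain case-insensitive string matching. The function names, the second dictionary entry and both id values are hypothetical placeholders, not part of any existing contentmine.org code:

```python
import re


def format_entry(name, identifier):
    """Render one dictionary entry in the name="..." id="..." form.

    Assumption: names and ids contain no double quotes, so no escaping is done.
    """
    return f'name="{name}" id="{identifier}"'


def find_mentions(text, dictionary):
    """Return (matched text, id) pairs for dictionary names found in text.

    dictionary maps a name to its identifier. Matching is case-insensitive
    and whole-word only, so some mentions will be missed (false negatives).
    """
    if not dictionary:
        return []
    # Try longer names first so "ethyl acetate" wins over a shorter overlap.
    names = sorted(dictionary, key=len, reverse=True)
    alternatives = "|".join(re.escape(n) for n in names)
    # Lookarounds instead of \b so names that start or end with punctuation,
    # such as "(R)-limonene", can still be matched.
    matcher = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    mentions = []
    for m in matcher.finditer(text):
        ident = lookup.get(m.group(0).lower())
        if ident is not None:
            mentions.append((m.group(0), ident))
    return mentions


# Placeholder entries; "a123" and "a456" are illustrative ids only.
example_dictionary = {"ethyl acetate": "a123", "acetone": "a456"}
example_lines = [format_entry(n, i) for n, i in example_dictionary.items()]
```

Names split across a line break or written with different spacing would not be found by this sketch, which fits the stated tolerance for false negatives.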

Peter Ertl has supplied a list of > 10000 Wikipedia entries + InChIs
(thanks).

I would ideally like classifications such as:
drugs (INNs)
pesticides
herbicides
etc.
as that will help to get non-chemists interested.

The size of dictionaries can be between 100 and 100K approx.

Is there a simple way to get such a list out of ChEBI?

P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Zaharevitz, Daniel (NIH/NCI) [E]
2016-03-08 18:47:24 UTC
I should be able to give you our list of approved (U.S.) oncology agents. I think I can also give you our list of oncology investigational agents.

DanZ


/***************************************
* Daniel Zaharevitz, PhD
* Chief, Information Technology Branch
* Developmental Therapeutics Program
* DCTD, NCI
* ***@mail.nih.gov
***************************************/


Peter Murray-Rust
2016-03-08 19:24:26 UTC
That's great. And in return we should be able to tell you papers in which
they appear on the day they are published.


Geoffrey Hutchison
2016-03-09 13:56:03 UTC
Post by Zaharevitz, Daniel (NIH/NCI) [E]
I should be able to give you our list of approved (U.S.) oncology agents. I think I can also give you our list of oncology investigational agents.
Aren't most of these in the MeSH database? (https://www.nlm.nih.gov/mesh/)

-Geoff
Peter Murray-Rust
2016-03-09 14:19:05 UTC
Post by Geoffrey Hutchison
Aren't most of these in the MeSH database? (https://www.nlm.nih.gov/mesh/)
Sure - I was wanting a simple linear list of chemicals. Would there be a
simple way of extracting these?
Egon Willighagen
2016-03-09 14:39:34 UTC
Peter,
Post by Peter Murray-Rust
I am now enhancing contentmine.org to search the daily literature and need
dictionaries to match names. I don't mind false negatives so am simply
matching names, but they should have identifiers.
name="ethyl acetate" id="a123"
For WikiPathways we use BridgeDb, and I take care of the metabolite
identifier mapping database, which includes names and synonyms. The
latest BridgeDb file is at Figshare [0], but I guess you want the above
format, right?

But which identifier do you want? Any? Does the name come from the
database matching the identifier?

I have been using HMDB, ChEBI, and, for almost a year now, Wikidata...

Egon

[0] https://figshare.com/articles/Metabolite_BridgeDb_ID_Mapping_Database_20160113_/3083842/1
--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen
Egon Willighagen
2016-03-09 14:45:17 UTC
Post by Egon Willighagen
I have been using HMDB, ChEBI, and since almost a year Wikidata...
So, awaiting some context for the identifier, this is the SPARQL to
get English labels:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?compound ?label WHERE {
  ?compound wdt:P31 wd:Q11173 ;
            rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 5

Run it at https://query.wikidata.org/, where you can save the results in several formats...

P31 is "instance of" and Q11173 is "chemical compound".
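
A rough Python sketch of running this query programmatically and turning the result into the name/id format requested earlier in the thread. It assumes the public endpoint at https://query.wikidata.org/sparql returns standard SPARQL JSON results when asked via the Accept header; the function names and the User-Agent string are hypothetical, and using the Wikidata Q-number as the id is only one possible choice:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?compound ?label WHERE {
  ?compound wdt:P31 wd:Q11173 ;
            rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 5
"""


def fetch_bindings(query=QUERY, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the list of result rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; replace with your own contact details.
            "User-Agent": "name-dictionary-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def bindings_to_lines(bindings):
    """Convert SPARQL JSON rows into name="..." id="..." lines.

    The id is the last path segment of the entity URI (the Q-number).
    Assumption: labels contain no double quotes, so nothing is escaped.
    """
    lines = []
    for row in bindings:
        qid = row["compound"]["value"].rsplit("/", 1)[-1]
        name = row["label"]["value"]
        lines.append(f'name="{name}" id="{qid}"')
    return lines
```

Calling bindings_to_lines(fetch_bindings()) would need network access; the LIMIT 5 in the query would have to be raised or removed to build a full dictionary.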

Egon