The Predictive Toxicology Evaluation Challenge
Can an AI program participate in scientific discovery?
Prevention of environmentally-induced cancers is a health issue of
unquestionable importance, and requires an
understanding of the mechanisms of chemical carcinogenesis.
Vital to this are the rodent carcinogenicity tests conducted
within the US National Toxicology Program
by the National Institute of Environmental Health Sciences (NIEHS). These tests have
produced a large database of compounds classified as
carcinogens or otherwise. The Predictive-Toxicology Evaluation
project of the NIEHS provides the opportunity to compare
carcinogenicity predictions on previously untested chemicals.
This has resulted in two blind trials: PTE-1 (now complete)
and PTE-2 (ongoing). Predicting the carcinogenic activity
of compounds in these trials presents a formidable challenge for
programs concerned with knowledge discovery. Desirable features
of this problem are:
- Involvement in genuine scientific discovery;
- Participation in true blind trials;
- Availability of a large database with expert-certified classifications;
- Strong competition from methods used by chemists; and
- Independent adjudication by a world-renowned scientist.
The Predictive Toxicology Evaluation Challenge has been devised by us
to give Machine Learning programs an opportunity to
participate in carcinogenesis prediction.
A Prolog representation of the data for the
carcinogenesis problem
is available. This site provides access to the
following.
- A consortium can submit up to 10 entries. Each
entry will be given a unique identifier by us.
Submissions received after August 29, 1997 will be entered
into the challenge.
- Entries submitted on or before
November 15, 1998 will be evaluated for chemical relevance
by Doug Bristol (US National Institute of Environmental Health Sciences).
- A consortium must be willing to provide the URL of a short
description of their entry. This
description should use the template
provided.
- We intend to submit results obtained to date to
IJCAI-99. Due acknowledgements will be made to entries
that participated.
- Classification is into one of two classes: carcinogenic (+)
or non-carcinogenic (-). Two test-sets are available: PTE-1
and PTE-2. Training-sets used for constructing theories must
not include compounds in the test-set chosen for prediction.
The compounds in a test set must not be used to select amongst
theories constructed with a training set.
- Theories will be evaluated along scales of accuracy
and explanatory power. Accuracy of a theory is
defined in the usual manner, i.e., (Tp+Tn)/Total, where Tp and Tn
are the numbers of True Positives and True Negatives predicted on the test-set
and Total is the total number of compounds in the test-set.
Explanatory power of a theory is initially a boolean property that
is true if some or all of the theory can be drawn as chemical
substructures. This will later be amended by us to incorporate the evaluation
of the expert chemist.
- We reserve the right to amend any errors that may be brought
to light either in these rules and conditions, or in any other
pages comprising this site.
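As an illustration of the accuracy rule and the training/test separation rule above, here is a minimal Python sketch. It is not part of the challenge software: the function names (accuracy, check_disjoint), the dictionary layout and the compound identifiers (c1, t1, ...) are all hypothetical and chosen only for this example. Labels use the "+" and "-" classes defined in the rules.

```python
from typing import Iterable, Mapping

POS, NEG = "+", "-"  # carcinogenic / non-carcinogenic, as in the rules


def accuracy(predicted: Mapping[str, str], actual: Mapping[str, str]) -> float:
    """Return (Tp + Tn) / Total over the compounds in the test-set `actual`.

    A compound with no prediction counts as neither Tp nor Tn.
    """
    if not actual:
        raise ValueError("test-set is empty")
    tp = sum(1 for c, label in actual.items()
             if label == POS and predicted.get(c) == POS)
    tn = sum(1 for c, label in actual.items()
             if label == NEG and predicted.get(c) == NEG)
    return (tp + tn) / len(actual)


def check_disjoint(training_ids: Iterable[str], test_ids: Iterable[str]) -> None:
    """Raise if any test-set compound also appears in the training-set."""
    overlap = set(training_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"training-set contains test compounds: {sorted(overlap)}")


# Hypothetical data: four test compounds and one theory's predictions.
actual = {"c1": "+", "c2": "-", "c3": "+", "c4": "-"}
predicted = {"c1": "+", "c2": "+", "c3": "+", "c4": "-"}

check_disjoint(["t1", "t2", "t3"], actual)  # passes: no shared compounds
print(accuracy(predicted, actual))  # 0.75 (Tp = 2, Tn = 1, Total = 4)
```

Treating a missing prediction as an error is an assumption of this sketch; the rules above do not say how unclassified compounds are scored.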
Summary of results so far: PTE-1
Summary of results so far: PTE-2
Machine Learning at the Computing Laboratory