Cancer Hallmarks Analytics Tool help

The Cancer Hallmarks Analytics Tool (CHAT) lets users quickly analyze the strength of association between terms of interest and the hallmarks of cancer (Hanahan and Weinberg 2011) in the literature, based on a text-mining analysis of 26 million PubMed abstracts.

Quickstart

To analyze the association of any term with the hallmarks of cancer, simply enter the term into the "Query" text box on the CHAT homepage and press "Submit".

CHAT query form

The resulting screen shows a chart summarizing the association of the query term ("p53" in the example below) with the various hallmarks of cancer.

CHAT query result

Comparison

To compare how strongly two terms are associated with the hallmarks of cancer, click on the "Compare" tab on the CHAT homepage, enter the two terms in the "Query" text boxes and press "Submit".

CHAT comparison form

CHAT will then show a chart comparing the strength of association of the two terms ("Mdm2" and "mTOR" in the example below) with the hallmarks of cancer.

CHAT comparison result

Metrics

CHAT supports the following metrics for assessing the strength of association between a hallmark (h) and a query (q): count, conditional probability (cprob), pointwise mutual information (pmi) and normalized pointwise mutual information (npmi).

To change which metric is applied, use the Metric selector found in both the Search and Compare tabs and at the top of their respective result pages. For interpretation of the metrics, see the following sections.

Count

The value c(h,q) of the count metric for a hallmark h and a query q is simply the number of contexts where both h and q occur. While the count metric is easy to understand, direct comparisons of count values can be misleading because raw co-occurrence counts do not normalize over the number of times that h and q occur overall.

For example, if hallmark h1 is discussed in the literature more often than h2 (e.g. Invasion and metastasis compared to Replicative immortality), then a query q may be more strongly (statistically) associated with h2 than h1 even if c(h1,q) is greater than c(h2,q): the larger co-occurrence count may simply reflect h1 being more frequent.

Similarly, for two queries q1 and q2, if q1 occurs more frequently in the literature than q2 (e.g. "TP53" vs. "BRCA2"), c(h,q1) > c(h,q2) does not necessarily indicate a stronger association of hallmark h with q1 than with q2.
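As a rough illustration of how co-occurrence counts arise, the following Python sketch counts the contexts in which a query and a hallmark both appear. The data layout (each context as a pair of a term set and a hallmark-label set), the function name cooccurrence_counts and the toy contexts are all assumptions made for this example; they are not CHAT's actual data model or implementation.

```python
from collections import Counter

# Hypothetical toy contexts: (terms mentioned, hallmark labels assigned).
# These are invented for illustration and are not real CHAT data.
CONTEXTS = [
    ({"TP53"}, {"Invasion and metastasis"}),
    ({"TP53"}, {"Invasion and metastasis", "Resisting cell death"}),
    ({"TP53", "BRCA2"}, {"Replicative immortality"}),
    ({"TP53"}, set()),
    ({"BRCA2"}, {"Replicative immortality"}),
]


def cooccurrence_counts(contexts, query):
    """Return c(h, q) for each hallmark h: contexts containing both h and q."""
    counts = Counter()
    for terms, hallmarks in contexts:
        if query in terms:
            counts.update(hallmarks)
    return counts


tp53 = cooccurrence_counts(CONTEXTS, "TP53")
brca2 = cooccurrence_counts(CONTEXTS, "BRCA2")
print(tp53["Invasion and metastasis"])   # 2
print(brca2["Replicative immortality"])  # 2
```

In this toy data both counts equal 2, but "TP53" occurs in four contexts and "BRCA2" in only two, so the equal raw counts hide a difference in how consistently each query appears with its hallmark.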

Conditional probability

The conditional probability p(h|q) of a hallmark h occurring in a context where a query term q has occurred is defined as the co-occurrence count c(h,q) divided by the overall count of the query term c(q).

Unlike count (see above), the cprob metric takes into account the overall frequency of the query term, normalizing by this value. This allows more meaningful comparison of cprob values across queries whose terms are discussed in the literature with different frequencies.

For example, a comparison of "autotaxin" and "Akt" shows that "autotaxin" has a much higher cprob value with Invasion and metastasis than "Akt" does, even though "Akt", the much more common term, has a higher co-occurrence count with this hallmark.
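The definition p(h|q) = c(h,q) / c(q) can be sketched in a few lines of Python. The function name cprob_value and the counts used are invented purely to show how a rare term can score higher than a common one; they are not values reported by CHAT.

```python
def cprob_value(c_hq, c_q):
    """Conditional probability p(h|q) = c(h,q) / c(q)."""
    if c_q <= 0:
        raise ValueError("the query term must occur at least once")
    return c_hq / c_q


# Invented counts for one hallmark: a rare query term and a common one.
rare_term = cprob_value(c_hq=30, c_q=100)
common_term = cprob_value(c_hq=500, c_q=10000)

# The common term has the larger raw count (500 vs. 30),
# but the rare term has the larger conditional probability.
print(rare_term > common_term)  # True
```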

Pointwise mutual information

Pointwise mutual information pmi(h,q) is an information-theoretic metric of association strength that measures how often a hallmark h and a query q occur together compared to how often they would be expected to co-occur if their occurrences were independent (i.e. if they only co-occurred at random).

pmi(h,q) is defined as the base-2 logarithm (measuring information in bits) of the probability of co-occurrence p(h,q) divided by the product of the occurrence probabilities p(h)p(q). These probabilities are derived directly from the counts (see above).

pmi accounts for the overall frequency of both the query and the hallmark, and its values are interpretable: pmi(h,q) = 0 when h and q are independent, > 0 when they co-occur more often than expected from random co-occurrence, and < 0 when they co-occur less often (negative association).
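The following Python sketch shows one way to compute pmi from counts. It assumes that each probability is the corresponding count divided by a total number of contexts n; the help text does not spell out this step, so n, the function name pmi_value and the example counts are assumptions for illustration only.

```python
import math


def pmi_value(c_hq, c_h, c_q, n):
    """pmi(h,q) = log2( p(h,q) / (p(h) * p(q)) ), with p(x) = c(x) / n (assumed).

    Algebraically, p(h,q) / (p(h) p(q)) = c(h,q) * n / (c(h) * c(q)),
    which is the form used here to limit floating-point rounding.
    """
    if c_hq == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2(c_hq * n / (c_h * c_q))


# Invented counts: n = 1000 contexts, c(h) = 100, c(q) = 50.
# Under independence the expected co-occurrence count is 100 * 50 / 1000 = 5.
print(pmi_value(5, 100, 50, 1000))   # 0.0  (matches independence)
print(pmi_value(20, 100, 50, 1000))  # 2.0  (four times more often than expected)
print(pmi_value(1, 100, 50, 1000) < 0)  # True (less often than expected)
```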

Normalized pointwise mutual information

While a pmi value of zero indicates no association (see above), values of the metric for positive associations can be difficult to interpret and compare as the maximum pmi(h,q) value increases when the query terms q and hallmarks h become less common (see e.g. Manning and Schütze 1999, Sec 5.4).

The npmi metric normalizes pmi by dividing by the negative log probability of co-occurrence -log2 p(h,q), which constrains its values to the range [-1, 1].

npmi maintains the benefits of pmi but is also more easily interpreted and compared. Regardless of the overall frequency of h and q, an npmi(h,q) value of -1 indicates no co-occurrence, 0 indicates a frequency of co-occurrence matching random (no association, as for pmi), and 1 indicates complete co-occurrence (perfect association).
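A minimal Python sketch of the normalization, again assuming probabilities are counts divided by a total number of contexts n. The function name npmi_value, the handling of the two boundary cases and the example counts are assumptions for this illustration, not a description of CHAT's implementation.

```python
import math


def npmi_value(c_hq, c_h, c_q, n):
    """npmi(h,q) = pmi(h,q) / -log2 p(h,q), with p(x) = c(x) / n (assumed)."""
    if c_hq == 0:
        return -1.0  # limiting value when the terms never co-occur
    if c_hq == n:
        return 1.0   # assumed convention: avoids 0/0 when p(h,q) = 1
    pmi = math.log2(c_hq * n / (c_h * c_q))
    return pmi / -math.log2(c_hq / n)


# Invented counts with n = 1000 contexts.
print(npmi_value(0, 100, 50, 1000))   # -1.0 (no co-occurrence)
print(npmi_value(5, 100, 50, 1000))   # 0.0  (co-occurrence matches random)
print(npmi_value(10, 10, 10, 1000))   # approximately 1 (h and q always occur together)
```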

Chart types

CHAT supports a variety of chart types for visualizing results, including bar, polar and radar charts. To change the chart type, use the Chart selector found in the Search tab and its result page. (For comparisons, only column charts are supported.)

Hallmark hierarchy

CHAT organizes cancer-related concepts using a taxonomy based on the hallmarks of cancer (Hanahan and Weinberg 2000; 2011) and supports two different granularities of analysis, the full hallmarks and the top hallmarks.

The full hallmarks provide finer granularity, while the top hallmarks offer greater reliability due to their higher frequency.

To change the granularity, use the Hallmarks selector found in the Search and Compare tabs and on the result pages.

Documents and annotations

To "drill down" from a chart view to the documents and annotations serving as its source data, click on the relevant part of the chart.

Drilling down to "p53" and "resisting cell death"

For example, after querying for "p53", clicking on the segment of the chart denoting "resisting cell death" shows summaries of PubMed documents relating to the association of p53 with resisting cell death.

Document summaries for "p53" and "resisting cell death"

Clicking further on any of the titles gives a visualization of all hallmark annotations for that PubMed document.

Annotation view for document (PMID 11574421)