48. Multi-Item Gamma Poisson Shrinker (MGPS) algorithm

Multi-Item Gamma Poisson Shrinker (MGPS) is based on the metaphor of the “market basket problem”, in which a database of “transactions” (adverse event reports) is mined for the occurrence of interesting (unexpectedly frequent) itemsets (e.g., simple drug-event pairings or more complex combinations of drugs and events representing interactions and/or syndromes). Interestingness is related to the factor by which the observed frequency of an itemset differs from a nominal baseline frequency. The baseline frequency is usually taken to be the frequency that would be expected under the full independence model, in which the likelihood of a given item showing up in a report is independent of what other items appear in the report.

For each itemset in the database, a relative reporting ratio RR is defined as the observed count N for that itemset divided by the expected count E. When using the independence model as the basis for computing the expected count, MGPS allows for the possibility that the database contains heterogeneous strata with significantly different item frequency distributions. To avoid concluding that an itemset is unusually frequent just because the items involved individually all tend to occur more frequently in a particular stratum, expected counts are computed within each stratum and then summed over the strata to give the database-wide expected count.
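As an illustration of the ratio described above, here is a minimal sketch of RR = N/E for simple drug-event pairs, with E computed under independence inside each stratum and summed over the strata. The function name, the record layout (one `(stratum, drug, event)` tuple per reported pair) and the toy data are assumptions made for this example, not part of any MGPS software.

```python
from collections import defaultdict

def relative_reporting_ratios(records):
    """Return {(drug, event): (N, E, RR)} from (stratum, drug, event) records.

    E is the expected count under independence of drug and event, computed
    within each stratum and then summed over the strata.
    """
    n_pair = defaultdict(int)    # observed count per (drug, event)
    n_drug = defaultdict(int)    # count per (stratum, drug)
    n_event = defaultdict(int)   # count per (stratum, event)
    n_total = defaultdict(int)   # count per stratum
    for stratum, drug, event in records:
        n_pair[(drug, event)] += 1
        n_drug[(stratum, drug)] += 1
        n_event[(stratum, event)] += 1
        n_total[stratum] += 1

    result = {}
    for (drug, event), observed in n_pair.items():
        expected = sum(
            n_drug.get((s, drug), 0) * n_event.get((s, event), 0) / total
            for s, total in n_total.items()
        )
        result[(drug, event)] = (observed, expected, observed / expected)
    return result

# Hypothetical toy data: drug A and event X are both common in stratum "F"
# and both rare in stratum "M", but independent within each stratum.
records = (
    [("F", "A", "X")] * 16 + [("F", "A", "Y")] * 4
    + [("F", "B", "X")] * 4 + [("F", "B", "Y")] * 1
    + [("M", "A", "X")] * 1 + [("M", "A", "Y")] * 4
    + [("M", "B", "X")] * 4 + [("M", "B", "Y")] * 16
)
stratified = relative_reporting_ratios(records)
# Pooling everything into one stratum gives the unstratified (crude) ratio.
crude = relative_reporting_ratios([("all", d, e) for _, d, e in records])
print("stratified (N, E, RR):", stratified[("A", "X")])
print("crude      (N, E, RR):", crude[("A", "X")])
```

In this toy data the crude ratio for the pair (A, X) is above 1 only because both items are concentrated in the same stratum. The stratified expected count removes that effect.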

MGPS produces Empirical Bayesian Geometric Mean (EBGM) scores. The EBGM calculation is conceptually similar to that of the RR, but incorporates Bayesian “shrinkage” and stratification to pull disproportionality scores toward the null, especially when data are limited and the number of cases is small.

One important difference between the proportional reporting ratio (PRR) and EBGM estimates is that in the case of PRR the adverse events from the product in question do not contribute to the number of “expected” cases, while all adverse events from the product contribute to the expectation when using EBGM.

The EBGM values are actually derived from the expectation value of the logarithm of RR under the posterior probability distributions for each true RR. 

EBGM is defined as the exponential of the posterior expectation of log(RR). EBGM is nearly identical to N/E when the counts are moderately large, but is “shrunk” towards the average value of N/E (typically ~1.0) when N/E is unreliable because the counts are small. The posterior probability distribution also supports the calculation of lower and upper one-sided 95% confidence limits (EB05 and EB95, the 5th and 95th percentiles of the posterior distribution) for the relative reporting ratio.
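The definition above can be sketched in a few lines, assuming the usual gamma-Poisson setup: N is Poisson with mean equal to the true ratio times E, and the prior on the true ratio is a mixture of two gamma distributions. The posterior is then again a gamma mixture, and the expectation of the logarithm has a closed form. The prior weights and gamma parameters below are hypothetical placeholders chosen for illustration. In practice they are estimated from the whole database. The function names and the hand-written `digamma` helper are likewise assumptions of this sketch.

```python
import math

def digamma(x):
    """Digamma function for x > 0: recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:               # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def log_marginal(n, e, alpha, beta):
    """Log negative-binomial probability of n when N ~ Poisson(ratio * e)
    and ratio ~ Gamma(shape=alpha, rate=beta)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e))
            + n * math.log(e / (beta + e)))

# HYPOTHETICAL prior: (mixture weight, alpha, beta) for each gamma component.
EXAMPLE_PRIOR = ((1 / 3, 0.2, 0.1), (2 / 3, 2.0, 4.0))

def ebgm(n, e, prior=EXAMPLE_PRIOR):
    """exp of the posterior expectation of log(ratio), given N = n and E = e > 0."""
    logs = [math.log(w) + log_marginal(n, e, a, b) for w, a, b in prior]
    top = max(logs)              # log-sum-exp trick for numerical stability
    norm = sum(math.exp(v - top) for v in logs)
    expected_log = 0.0
    for v, (_, a, b) in zip(logs, prior):
        q = math.exp(v - top) / norm        # posterior weight of this component
        # Posterior component is Gamma(a + n, b + e); its mean log is
        # digamma(a + n) - log(b + e).
        expected_log += q * (digamma(a + n) - math.log(b + e))
    return math.exp(expected_log)

for n, e in ((1, 0.0001), (2, 0.0002), (20, 2.0), (500, 50.0)):
    print(f"N={n:3d}  E={e:<8g} RR={n / e:<10.4g} EBGM={ebgm(n, e):.3g}")
```

With these placeholder values, a single case with a very small expected count yields a very large N/E but a much smaller EBGM, while for a few hundred cases EBGM stays close to N/E. EB05 and EB95 would require quantiles of the posterior gamma mixture, which this sketch does not compute.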

The statistical modifications used in the EBGM methodology diminish the effect of spuriously high RR values, thus reducing the number of false-positive safety signals. EBGM values therefore provide a more stable estimate of the relative reporting rate of an event for a particular product relative to all other events and products in the database being analyzed.

Various commercially available software programs generate PRR and/or EBGM scores (e.g., Empirica Signal™, PVAnalyser™, MASE™ and SAS™). 

The reason why product-event combinations with small numbers of reports must be “shrunk” is made apparent in the following plot of the log10 reporting ratio (RR) vs the number of reported cases for the product-event combination. The RR represents the ratio of the number of observed cases to expected cases under the assumption of independence between products and symptoms.

Figure 1: Variance of log(RR)

The variance of log(RR) decreases rapidly with increasing number of cases per product-event combination.
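A rough way to see why the spread narrows: if the observed count N is treated as Poisson and E as fixed, the delta method gives a variance of about 1/N for the natural log of N/E, so the standard deviation of log10(RR) is about 1/(ln(10) × sqrt(N)). The sketch below tabulates this approximation. The function name is an assumption of this example, and the approximation is crude for very small counts.

```python
import math

def approx_sd_log10_rr(n_cases):
    """Delta-method SD of log10(N/E) for a Poisson count N, treating E as
    fixed and using the observed count in place of the Poisson mean."""
    return 1.0 / (math.log(10) * math.sqrt(n_cases))

for n in (1, 2, 5, 10, 20, 100, 485):
    print(f"n={n:3d}  approx SD of log10(RR) = {approx_sd_log10_rr(n):.3f}")
```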

The open circles in the graph represent product-event combinations with EBGM scores greater than 5, which are clearly reporting signals. Note that with only two cases the first open circle appears where log(RR) equals 4, or RR equals 10,000. With 10 cases per product-event combination, open circles appear at log(RR) equals 2, or RR equals 100. With 20 cases, open circles appear at log(RR) equals 1, or RR equals 10. Thus, a much lower RR is required to generate a high EBGM score when the number of cases is greater.

The extreme shrinkage of EBGM with low numbers of cases is demonstrated in the “squid-like” plot below, where at n=1 a log(RR) of 4 (RR=10,000) is reduced to an EBGM score approaching 1. Slightly less shrinkage occurs at n=2; by n=4, it is much easier to generate clear signals in terms of EBGM scores. As n (the number of cases per product-event combination) increases in the plot from 5 to 485, the data can be adequately represented by a single cubic curve.

Figure 2: Extreme shrinkage of EBGM

For a statistical application of MGPS, refer to the following article: http://ideal.ece.utexas.edu/courses/ee379k/papers/drug-safety.pdf
