Fisher’s method vs. min (after multiple comparisons correction)#
Show code cell source
from pkg.utils import set_warnings

set_warnings()

import datetime
import time
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from myst_nb import glue as default_glue
from scipy.stats import binom, combine_pvalues
from tqdm import tqdm

from graspologic.simulations import sbm
from pkg.data import load_network_palette, load_node_palette, load_unmatched
from pkg.io import savefig
from pkg.plot import set_theme
from pkg.stats import binom_2samp, stochastic_block_test

DISPLAY_FIGS = False
FILENAME = "compare_sbm_methods_sim"


def gluefig(name, fig, **kwargs):
    savefig(name, foldername=FILENAME, **kwargs)
    glue(name, fig, prefix="fig")
    if not DISPLAY_FIGS:
        plt.close()


def glue(name, var, prefix=None):
    savename = f"{FILENAME}-{name}"
    if prefix is not None:
        savename = prefix + ":" + savename
    default_glue(savename, var, display=False)


t0 = time.time()
set_theme()
rng = np.random.default_rng(8888)

network_palette, NETWORK_KEY = load_network_palette()
node_palette, NODE_KEY = load_node_palette()
fisher_color = sns.color_palette("Set2")[2]
min_color = sns.color_palette("Set2")[3]
method_palette = {"fisher": fisher_color, "min": min_color}

GROUP_KEY = "simple_group"

left_adj, left_nodes = load_unmatched(side="left")
right_adj, right_nodes = load_unmatched(side="right")
left_labels = left_nodes[GROUP_KEY].values
right_labels = right_nodes[GROUP_KEY].values
Show code cell source
stat, pvalue, misc = stochastic_block_test(
    left_adj,
    right_adj,
    labels1=left_labels,
    labels2=right_labels,
    method="fisher",
    combine_method="fisher",
)
Model for simulations (alternative)#
We have fit a stochastic block model to the left and right hemispheres. Say the probabilities of group-to-group connections on the left are stored in the matrix \(B\), so that \(B_{kl}\) is the probability of an edge from group \(k\) to \(l\).
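As a reminder of the generative model, a network with a given block probability matrix can be sampled directly. Below is a minimal, self-contained sketch using graspologic’s sbm sampler (imported above) with a hypothetical 2-group \(B\); this is for intuition only and is not part of the analysis:

import numpy as np
from graspologic.simulations import sbm

# hypothetical block probability matrix for 2 groups
B = np.array([[0.3, 0.05], [0.02, 0.2]])
ns = [60, 40]  # number of nodes in each group

# sample a directed, loopless network where an edge from node i (in group k)
# to node j (in group l) occurs independently with probability B[k, l]
adj = sbm(ns, B, directed=True, loops=False)
print(adj.shape)  # (100, 100)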
Let \(\tilde{B}\) be a perturbed matrix of probabilities. We are interested in testing \(H_0: B = \tilde{B}\) against \(H_A: B \neq \tilde{B}\). To do so, we test each hypothesis \(H_0: B_{kl} = \tilde{B}_{kl}\) using Fisher’s exact test, yielding a p-value for each \((k,l)\) comparison, \(\{p_{1,1}, p_{1,2}, \dots, p_{K,K}\}\).
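For a single \((k, l)\), this test operates on a 2×2 contingency table of edges versus non-edges in that block for the two networks. A minimal sketch with hypothetical counts, using scipy’s fisher_exact rather than the pkg.stats wrapper used below:

from scipy.stats import fisher_exact

# hypothetical counts for one (k, l) block
edges1, possible1 = 40, 1000  # edges and possible edges under B
edges2, possible2 = 65, 1100  # edges and possible edges under B-tilde

table = [
    [edges1, possible1 - edges1],
    [edges2, possible2 - edges2],
]
odds_ratio, pvalue = fisher_exact(table)
print(f"p-value for this block: {pvalue:.4f}")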
Now, we still want an overall test of the equality \(B = \tilde{B}\). Thus, we need a way to combine the p-values \(\{p_{1,1}, p_{1,2}, \dots, p_{K,K}\}\) into a single p-value for our comparison of the stochastic block model probabilities. One way is Fisher’s method; another is to take the minimum p-value after correcting for multiple comparisons (say, via Bonferroni or Holm-Bonferroni).
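As a rough sketch of the two combining rules on a hypothetical collection of per-block p-values (the actual analysis uses the implementation in pkg.stats and scipy, as in the cells below):

import numpy as np
from scipy.stats import combine_pvalues

pvalues = np.array([0.8, 0.04, 0.31, 0.007, 0.55])  # hypothetical p-values

# Fisher's method: -2 * sum(log(p)) is compared to a chi-squared distribution
_, fisher_pvalue = combine_pvalues(pvalues, method="fisher")

# Min method: Bonferroni-correct, then take the smallest corrected p-value
min_pvalue = min(pvalues.min() * len(pvalues), 1)

print(f"Fisher's method:      {fisher_pvalue:.4f}")
print(f"Min after Bonferroni: {min_pvalue:.4f}")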
To compare how these two methods of combining p-values perform, we ran the following simulation:
Let \(t\) be the number of probabilities to perturb.
Let \(\delta\) represent the strength of the perturbation (see model below).
For each trial:
Randomly select \(t\) probabilities without replacement from the elements of \(B\).
For each selected element, draw \(\tilde{B}_{kl} \sim TN(B_{kl}, \delta B_{kl})\), where \(TN(\mu, \sigma)\) denotes a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) truncated to \([0, 1]\), so that perturbed probabilities remain valid (see the sketch below).
For each element not perturbed, set \(\tilde{B}_{kl} = B_{kl}\).
Sample the number of edges in each block under each model. In other words, let \(m_{kl}\) be the number of edges in the \((k,l)\)-th block, and let \(n_k, n_l\) be the number of nodes in the \(k\)-th and \(l\)-th groups, respectively. Then, we have
\[m_{kl} \sim Binomial(n_k n_l, B_{kl})\]
and likewise for \(\tilde{m}_{kl}\), but with \(\tilde{B}_{kl}\).
Run Fisher’s exact test to generate a \(p_{kl}\) for each \((k,l)\).
Run Fisher’s method for combining p-values, or take the minimum p-value after Bonferroni correction.
These trials were repeated for \(\delta \in \{0.1, 0.2, 0.3, 0.4, 0.5\}\) and \(t \in \{25, 50, 75, 100, 125\}\). For each \((\delta, t)\) we ran 100 replicates of the model/test above.
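The full experiment is run in the cells below; as a standalone illustration, a single trial of the perturb-and-sample step might look like the following (with small hypothetical arrays standing in for the fitted probabilities and group sizes):

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

# hypothetical stand-ins for the fitted probabilities and possible edge counts
base_probs = np.array([0.05, 0.1, 0.02, 0.2])
ns = np.array([900, 400, 2500, 100])  # n_k * n_l for each block

t, delta = 2, 0.3  # number of elements to perturb, perturbation size

perturb_probs = base_probs.copy()
for index in rng.choice(len(base_probs), size=t, replace=False):
    prob = base_probs[index]
    new_prob = -1.0
    # rejection sampling realizes the truncated normal TN(B_kl, delta * B_kl)
    while new_prob <= 0 or new_prob >= 1:
        new_prob = rng.normal(prob, scale=prob * delta)
    perturb_probs[index] = new_prob

# sample edge counts for each block under the two models
m = binom.rvs(ns, base_probs)
m_tilde = binom.rvs(ns, perturb_probs)
print(m, m_tilde)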
P-values under the null#
Show code cell source
B_base = misc["probabilities1"].values
inds = np.nonzero(B_base)
base_probs = B_base[inds]
n_possible_matrix = misc["possible1"].values
ns = n_possible_matrix[inds]

n_null_sims = 100
RERUN_NULL = False
save_path = Path(
    "/Users/bpedigo/JHU_code/bilateral/bilateral-connectome/results/"
    "outputs/compare_sbm_methods_sim/null_results.csv"
)

if RERUN_NULL:
    null_rows = []
    for sim in tqdm(range(n_null_sims)):
        base_samples = binom.rvs(ns, base_probs)
        perturb_samples = binom.rvs(ns, base_probs)

        # test on the new data
        def tester(cell):
            stat, pvalue = binom_2samp(
                base_samples[cell],
                ns[cell],
                perturb_samples[cell],
                ns[cell],
                null_odds=1,
                method="fisher",
            )
            return pvalue

        pvalue_collection = np.vectorize(tester)(np.arange(len(base_samples)))
        n_overall = len(pvalue_collection)
        pvalue_collection = pvalue_collection[~np.isnan(pvalue_collection)]
        n_tests = len(pvalue_collection)
        n_skipped = n_overall - n_tests

        row = {
            "sim": sim,
            "n_tests": n_tests,
            "n_skipped": n_skipped,
        }
        for method in ["fisher", "min"]:
            row = row.copy()
            if method == "min":
                overall_pvalue = min(pvalue_collection.min() * n_tests, 1)
                row["pvalue"] = overall_pvalue
            elif method == "fisher":
                stat, overall_pvalue = combine_pvalues(
                    pvalue_collection, method="fisher"
                )
                row["pvalue"] = overall_pvalue
            row["method"] = method
            null_rows.append(row)

    null_results = pd.DataFrame(null_rows)
    null_results.to_csv(save_path)
else:
    null_results = pd.read_csv(save_path, index_col=0)
Show code cell source
from giskard.plot import subuniformity_plot

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
for i, method in enumerate(["fisher", "min"]):
    ax = axs[i]
    method_null_results = null_results[null_results["method"] == method]
    subuniformity_plot(
        method_null_results["pvalue"],
        ax=ax,
        color=method_palette[method],
        element="step",
    )
    ax.set_title(method.capitalize())
gluefig("null_distributions", fig)

Distributions of p-values under the null for Fisher’s method (left) and the Min method (right) from a simulation with 100 resamples under the null. The dotted line indicates the CDF of a \(Uniform(0,1)\) random variable. The p-value in the upper left of each panel is for a one-sample KS test, where the null is that the p-values are distributed \(Uniform(0,1)\), against the alternative that their CDF is larger than that of a \(Uniform(0,1)\) random variable (i.e., that the p-values are stochastically smaller than uniform). Note that both methods appear empirically valid, but Fisher’s appears highly conservative.#
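The subuniformity check above boils down to a one-sample KS test against the uniform CDF; a minimal sketch of that test alone using scipy (with simulated uniform draws standing in for the null p-values):

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
pvalues = rng.uniform(size=100)  # stand-in for the null p-values above

# null: p-values are Uniform(0, 1); alternative: their CDF lies above the
# uniform CDF (p-values stochastically smaller, i.e. anticonservative)
stat, ks_pvalue = kstest(pvalues, "uniform", alternative="greater")
print(f"KS statistic: {stat:.3f}, p-value: {ks_pvalue:.3f}")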
P-values under the alternative#
Show code cell source
n_sims = 100
n_perturb_range = np.linspace(0, 125, 6, dtype=int)[1:]
perturb_size_range = np.round(np.linspace(0, 0.5, 6), decimals=3)[1:]
print(f"Perturb sizes: {perturb_size_range}")
print(f"Perturb number range: {n_perturb_range}")
n_runs = n_sims * len(n_perturb_range) * len(perturb_size_range)
print(f"Number of runs: {n_runs}")
Perturb sizes: [0.1 0.2 0.3 0.4 0.5]
Perturb number range: [ 25 50 75 100 125]
Number of runs: 2500
Show code cell source
RERUN_SIM = False
save_path = Path(
    "/Users/bpedigo/JHU_code/bilateral/bilateral-connectome/results/"
    "outputs/compare_sbm_methods_sim/results.csv"
)

if RERUN_SIM:
    t0 = time.time()
    mean_itertimes = 0
    n_time_first = 5
    progress_steps = 0.05
    progress_counter = 0
    last_progress = -0.05
    simple_rows = []
    example_perturb_probs = {}
    for perturb_size in perturb_size_range:
        for n_perturb in n_perturb_range:
            for sim in range(n_sims):
                itertime = time.time()

                # just a way to track progress
                progress_counter += 1
                progress_prop = progress_counter / n_runs
                if progress_prop - progress_steps > last_progress:
                    print(f"{progress_prop:.2f}")
                    last_progress = progress_prop

                # choose some elements to perturb
                currtime = time.time()
                perturb_probs = base_probs.copy()
                choice_indices = rng.choice(
                    len(perturb_probs), size=n_perturb, replace=False
                )

                # perturb them, redrawing until each probability lands in (0, 1)
                for index in choice_indices:
                    prob = base_probs[index]
                    new_prob = -1
                    while new_prob <= 0 or new_prob >= 1:
                        new_prob = rng.normal(prob, scale=prob * perturb_size)
                    perturb_probs[index] = new_prob

                if sim == 0:
                    example_perturb_probs[(perturb_size, n_perturb)] = perturb_probs

                perturb_elapsed = time.time() - currtime

                # sample some new binomial data
                currtime = time.time()
                base_samples = binom.rvs(ns, base_probs)
                perturb_samples = binom.rvs(ns, perturb_probs)
                sample_elapsed = time.time() - currtime

                # test on the new data
                currtime = time.time()

                def tester(cell):
                    stat, pvalue = binom_2samp(
                        base_samples[cell],
                        ns[cell],
                        perturb_samples[cell],
                        ns[cell],
                        null_odds=1,
                        method="fisher",
                    )
                    return pvalue

                pvalue_collection = np.vectorize(tester)(np.arange(len(base_samples)))
                pvalue_collection = np.array(pvalue_collection)
                n_overall = len(pvalue_collection)
                pvalue_collection = pvalue_collection[~np.isnan(pvalue_collection)]
                n_tests = len(pvalue_collection)
                n_skipped = n_overall - n_tests
                test_elapsed = time.time() - currtime

                # combine pvalues
                currtime = time.time()
                row = {
                    "perturb_size": perturb_size,
                    "n_perturb": n_perturb,
                    "sim": sim,
                    "n_tests": n_tests,
                    "n_skipped": n_skipped,
                }
                for method in ["fisher", "min"]:
                    row = row.copy()
                    if method == "min":
                        overall_pvalue = min(pvalue_collection.min() * n_tests, 1)
                        row["pvalue"] = overall_pvalue
                    elif method == "fisher":
                        stat, overall_pvalue = combine_pvalues(
                            pvalue_collection, method="fisher"
                        )
                        row["pvalue"] = overall_pvalue
                    row["method"] = method
                    simple_rows.append(row)
                combine_elapsed = time.time() - currtime

                # time the first few iterations, then project the total runtime
                if progress_counter < n_time_first:
                    print("-----")
                    print(f"Perturb took {perturb_elapsed:0.3f}s")
                    print(f"Sample took {sample_elapsed:0.3f}s")
                    print(f"Test took {test_elapsed:0.3f}s")
                    print(f"Combine took {combine_elapsed:0.3f}s")
                    print("-----")
                    iter_elapsed = time.time() - itertime
                    mean_itertimes += iter_elapsed / n_time_first
                elif progress_counter == n_time_first:
                    projected_time = mean_itertimes * n_runs
                    projected_time = datetime.timedelta(seconds=projected_time)
                    print("---")
                    print(f"Projected time: {projected_time}")
                    print("---")

    total_elapsed = time.time() - t0
    print("Done!")
    print(f"Total experiment took: {datetime.timedelta(seconds=total_elapsed)}")

    results = pd.DataFrame(simple_rows)
    results.to_csv(save_path)
else:
    results = pd.read_csv(save_path, index_col=0)
Show code cell source
if RERUN_SIM:
    fig, axs = plt.subplots(
        len(perturb_size_range), len(n_perturb_range), figsize=(20, 20), sharey=True
    )
    for i, perturb_size in enumerate(perturb_size_range):
        for j, n_perturb in enumerate(n_perturb_range):
            ax = axs[i, j]
            perturb_probs = example_perturb_probs[(perturb_size, n_perturb)]
            mask = base_probs != perturb_probs
            show_base_probs = base_probs[mask]
            show_perturb_probs = perturb_probs[mask]
            sort_inds = np.argsort(-show_base_probs)
            show_base_probs = show_base_probs[sort_inds]
            show_perturb_probs = show_perturb_probs[sort_inds]

            sns.scatterplot(
                x=np.arange(len(show_base_probs)), y=show_perturb_probs, ax=ax, s=10
            )
            sns.lineplot(
                x=np.arange(len(show_base_probs)),
                y=show_base_probs,
                ax=ax,
                linewidth=1,
                zorder=-1,
                color="orange",
            )
            ax.set(xticks=[])
            ax.set(yscale="log")

    gluefig("example-perturbations", fig)
Show code cell source
fisher_results = results[results["method"] == "fisher"]
min_results = results[results["method"] == "min"]
fisher_means = fisher_results.groupby(["perturb_size", "n_perturb"]).mean()
min_means = min_results.groupby(["perturb_size", "n_perturb"]).mean()
mean_diffs = fisher_means["pvalue"] - min_means["pvalue"]
mean_diffs = mean_diffs.to_frame().reset_index()
mean_diffs_square = mean_diffs.pivot(
    index="perturb_size", columns="n_perturb", values="pvalue"
)
Show code cell source
fig, axs = plt.subplots(2, 3, figsize=(15, 10))

for i, perturb_size in enumerate(perturb_size_range):
    ax = axs.flat[i]
    plot_results = results[results["perturb_size"] == perturb_size]
    sns.lineplot(
        data=plot_results,
        x="n_perturb",
        y="pvalue",
        hue="method",
        style="method",
        palette=method_palette,
        ax=ax,
    )
    ax.set(yscale="log")
    ax.get_legend().remove()
    ax.axhline(0.05, color="dimgrey", linestyle=":")
    ax.axhline(0.005, color="dimgrey", linestyle="--")
    ax.set(ylabel="", xlabel="", title=f"{perturb_size}")

    ylim = ax.get_ylim()
    if ylim[0] < 1e-25:
        ax.set_ylim((1e-25, ylim[1]))

handles, labels = ax.get_legend_handles_labels()

ax.annotate(
    0.05,
    xy=(ax.get_xlim()[1], 0.05),
    xytext=(30, 10),
    textcoords="offset points",
    arrowprops=dict(arrowstyle="-"),
)
ax.annotate(
    0.005,
    xy=(ax.get_xlim()[1], 0.005),
    xytext=(30, -40),
    textcoords="offset points",
    arrowprops=dict(arrowstyle="-"),
)
axs.flat[-1].axis("off")

[ax.set(ylabel="p-value") for ax in axs[:, 0]]
[ax.set(xlabel="Number perturbed") for ax in axs[1, :]]
axs[0, -1].set(xlabel="Number perturbed")
axs[0, 0].set_title(f"Perturbation size = {perturb_size_range[0]}")

for i, label in enumerate(labels):
    labels[i] = label.capitalize()
axs.flat[-1].legend(handles=handles, labels=labels, title="Method")

gluefig("perturbation_pvalues_lineplots", fig)

p-values under the alternative for two different methods of combining p-values: Fisher’s method (performed on the uncorrected p-values) and simply taking the minimum p-value after Bonferroni correction (here, called Min). The alternative is specified by changing the number of probabilities which are perturbed (x-axis in each panel) as well as the size of the perturbation applied to each probability (panels show increasing perturbation size). Dotted and dashed lines indicate significance thresholds for \(\alpha = \{0.05, 0.005\}\), respectively. Note that in this simulation, even for large numbers of small perturbations (i.e., upper left panel), the Min method has smaller p-values. Fisher’s method displays smaller p-values than Min only when there are many (>50) large perturbations, but by this point both methods yield extremely small p-values.#
Power under the alternative#
Show code cell source
alpha = 0.05
results["detected"] = 0
results.loc[results["pvalue"] < alpha, "detected"] = 1
Show code cell source
fisher_results = results[results["method"] == "fisher"]
min_results = results[results["method"] == "min"]

fisher_means = fisher_results.groupby(["perturb_size", "n_perturb"]).mean()
min_means = min_results.groupby(["perturb_size", "n_perturb"]).mean()

fisher_power_square = fisher_means.reset_index().pivot(
    index="perturb_size", columns="n_perturb", values="detected"
)
min_power_square = min_means.reset_index().pivot(
    index="perturb_size", columns="n_perturb", values="detected"
)

# ratio of empirical power (proportion of simulations detected) per condition
power_ratios = fisher_means["detected"] / min_means["detected"]
power_ratios = power_ratios.to_frame().reset_index()
ratios_square = power_ratios.pivot(
    index="perturb_size", columns="n_perturb", values="detected"
)
from matplotlib.transforms import Bbox

set_theme(font_scale=1.5)

# set up plot
pad = 0.5
width_ratios = [1, pad * 1.2, 10, pad, 10, 1.3 * pad, 10, 1]
fig, axs = plt.subplots(
    1,
    len(width_ratios),
    figsize=(30, 10),
    gridspec_kw=dict(
        width_ratios=width_ratios,
    ),
)
fisher_col = 2
min_col = 4
ratio_col = 6
def shrink_axis(ax, scale=0.7):
    # shrink an axis vertically about its center (used for the colorbars)
    pos = ax.get_position()
    mid = (pos.ymax + pos.ymin) / 2
    height = pos.ymax - pos.ymin
    new_pos = Bbox(
        [
            [pos.xmin, mid - scale * 0.5 * height],
            [pos.xmax, mid + scale * 0.5 * height],
        ]
    )
    ax.set_position(new_pos)


def power_heatmap(
    data, ax=None, center=0, vmin=0, vmax=1, cmap="RdBu_r", cbar=False, **kwargs
):
    # heatmap over the (perturbation size, number perturbed) grid
    out = sns.heatmap(
        data,
        ax=ax,
        yticklabels=perturb_size_range,
        xticklabels=n_perturb_range,
        square=True,
        center=center,
        vmin=vmin,
        vmax=vmax,
        cbar_kws=dict(shrink=0.7),
        cbar=cbar,
        cmap=cmap,
        **kwargs,
    )
    ax.invert_yaxis()
    return out
ax = axs[fisher_col]
im = power_heatmap(fisher_power_square, ax=ax)
ax.set_title("Fisher's method", fontsize="large")

ax = axs[0]
shrink_axis(ax, scale=0.5)
_ = fig.colorbar(
    im.get_children()[0],
    cax=ax,
    fraction=1,
    shrink=1,
    ticklocation="left",
)
ax.set_title("Power\n" + r"($\alpha=0.05$)", pad=25)

ax = axs[min_col]
power_heatmap(min_power_square, ax=ax)
ax.set_title("Min method", fontsize="large")
ax.set(yticks=[])

pal = sns.diverging_palette(145, 300, s=60, as_cmap=True)
ax = axs[ratio_col]
im = power_heatmap(np.log10(ratios_square), ax=ax, vmin=-2, vmax=2, center=0, cmap=pal)
ax.set(yticks=[])

ax = axs[-1]
shrink_axis(ax, scale=0.5)
_ = fig.colorbar(
    im.get_children()[0],
    cax=ax,
    fraction=1,
    shrink=1,
    ticklocation="right",
)
ax.text(2, 1, "Fisher more\nsensitive", transform=ax.transAxes, va="top")
ax.text(2, 0.5, "Equal power", transform=ax.transAxes, va="center")
ax.text(2, 0, "Min more\nsensitive", transform=ax.transAxes, va="bottom")
ax.set_title("Log10\npower\nratio", pad=20)

# remove dummy axes
for i in range(len(width_ratios)):
    if not axs[i].has_data():
        axs[i].set_visible(False)
xlabel = r"# perturbed blocks $\rightarrow$"
ylabel = r"Perturbation size $\rightarrow$"
axs[fisher_col].set(
    xlabel=xlabel,
    ylabel=ylabel,
)
axs[min_col].set(xlabel=xlabel, ylabel="")
axs[ratio_col].set(xlabel=xlabel, ylabel="")
fig.text(0.09, 0.86, "A)", fontweight="bold", fontsize=50)
fig.text(0.64, 0.86, "B)", fontweight="bold", fontsize=50)
gluefig("relative_power", fig)

Comparison of power for Fisher’s method and the Min method. A) The power under the alternative described in the text for both Fisher’s method and the Min method. In both heatmaps, the x-axis represents an increasing number of perturbed blocks, and the y-axis represents an increasing magnitude for each perturbation. B) The log of the ratio of powers (Fisher’s / Min) for each alternative. Positive (purple) values indicate that Fisher’s method is more powerful; negative (green) values indicate that the Min method is more powerful. Notice that the Min method appears to have more power for subtler alternatives (fewer or smaller perturbations), and nearly equal power for more obvious alternatives.#