Calculate p-values from z-scores and test statistics for hypothesis testing. Determine statistical significance for one-tailed and two-tailed tests instantly.
This reference table shows commonly used z-scores and their corresponding p-values for both one-tailed and two-tailed hypothesis tests. These values are derived from the standard normal distribution.
| Z-Score | One-Tailed P-Value | Two-Tailed P-Value | Significance |
|---|---|---|---|
| 0.00 | 0.5000 | 1.0000 | Not significant |
| 0.50 | 0.3085 | 0.6171 | Not significant |
| 0.84 | 0.2005 | 0.4009 | Not significant |
| 1.00 | 0.1587 | 0.3173 | Not significant |
| 1.28 | 0.1003 | 0.2005 | Not significant |
| 1.50 | 0.0668 | 0.1336 | Not significant |
| 1.645 | 0.0500 | 0.1000 | Significant (one-tailed at 0.05) |
| 1.96 | 0.0250 | 0.0500 | Significant (two-tailed at 0.05) |
| 2.00 | 0.0228 | 0.0455 | Significant at 0.05 |
| 2.326 | 0.0100 | 0.0200 | Significant at 0.01 (one-tailed) |
| 2.50 | 0.0062 | 0.0124 | Significant at 0.01 (one-tailed) |
| 2.576 | 0.0050 | 0.0100 | Significant at 0.01 (two-tailed) |
| 3.00 | 0.0013 | 0.0027 | Highly significant |
| 3.291 | 0.0005 | 0.0010 | Significant at 0.001 (two-tailed) |
| 3.50 | 0.0002 | 0.0005 | Highly significant |
| 3.891 | 0.00005 | 0.0001 | Extremely significant |
| 4.00 | 0.00003 | 0.00006 | Extremely significant |
| 5.00 | < 0.000001 | < 0.000001 | Extremely significant (≈ 5σ) |
A p-value (probability value) is one of the most widely used concepts in statistical hypothesis testing. It quantifies the strength of evidence against a null hypothesis by measuring the probability of observing data as extreme as, or more extreme than, the results actually obtained, assuming the null hypothesis is true. The null hypothesis typically represents a default position—for example, that a new drug has no effect, or that two groups have equal means.
When you perform a hypothesis test, you calculate a test statistic (such as a z-score or t-statistic) from your data. This test statistic measures how far your observed results deviate from what the null hypothesis predicts. The p-value then translates this test statistic into a probability using a known distribution, most commonly the standard normal distribution for z-tests. A z-score of 1.96, for instance, corresponds to a two-tailed p-value of approximately 0.05, meaning there is only a 5% chance of seeing results this extreme if the null hypothesis is true.
Researchers compare the p-value to a predetermined significance level, denoted as alpha. The most common alpha value is 0.05, though 0.01 and 0.001 are also widely used depending on the field and the consequences of making errors. If the p-value falls below alpha, the result is declared "statistically significant," and the null hypothesis is rejected. If the p-value exceeds alpha, the null hypothesis is not rejected—though this does not prove the null hypothesis is true.
It is critical to understand what a p-value is not. A p-value is not the probability that the null hypothesis is true. It is not the probability that the results occurred by chance. And a statistically significant p-value does not necessarily imply practical importance. A study with a very large sample size can produce a significant p-value for a trivially small effect. For this reason, modern statistical practice recommends reporting p-values alongside effect sizes and confidence intervals to give a complete picture of the findings.
Two-tailed: p = 2 × (1 − Φ(|z|))
Right-tailed: p = 1 − Φ(z)
Left-tailed: p = Φ(z)
Where Φ(z) is the standard normal cumulative distribution function (CDF), which gives the probability that a standard normal random variable is less than or equal to z.
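These three formulas can be sketched directly in Python using the standard library's `statistics.NormalDist`, whose `cdf` method is Φ. The helper function name `p_value` is my own, not from the source:

```python
from statistics import NormalDist

def p_value(z: float, tail: str = "two") -> float:
    """Convert a z-score to a p-value under the standard normal distribution.

    tail: "two" (two-tailed), "right" (right-tailed), or "left" (left-tailed).
    """
    cdf = NormalDist().cdf  # Φ(z), the standard normal CDF
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # p = 2 × (1 − Φ(|z|))
    if tail == "right":
        return 1 - cdf(z)             # p = 1 − Φ(z)
    if tail == "left":
        return cdf(z)                 # p = Φ(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Spot-check against the reference table
print(round(p_value(1.96, "two"), 4))    # 0.05
print(round(p_value(1.645, "right"), 4)) # 0.05
```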
Problem: A clinical trial comparing a new drug to placebo yields a z-score of 2.50. Is the result significant at the 0.05 level using a two-tailed test?
Problem: A manufacturing process improvement is tested. The hypothesis is that the new process increases yield. The z-score is 1.80. Is this significant at 0.05 using a right-tailed test?
Problem: A company tests whether a new supplier reduces costs. The z-score is −2.10. Is this significant at 0.01 using a left-tailed test?
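The three problems above can be worked through with a short script; this is a sketch using the standard library's `statistics.NormalDist` for the normal CDF:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, Φ(z)

# Clinical trial: z = 2.50, two-tailed test at α = 0.05
p1 = 2 * (1 - phi(2.50))
print(f"p = {p1:.4f}, significant at 0.05: {p1 < 0.05}")  # p = 0.0124, significant at 0.05: True

# Manufacturing yield: z = 1.80, right-tailed test at α = 0.05
p2 = 1 - phi(1.80)
print(f"p = {p2:.4f}, significant at 0.05: {p2 < 0.05}")  # p = 0.0359, significant at 0.05: True

# Supplier costs: z = -2.10, left-tailed test at α = 0.01
p3 = phi(-2.10)
print(f"p = {p3:.4f}, significant at 0.01: {p3 < 0.01}")  # p = 0.0179, significant at 0.01: False
```

Note the third result: z = −2.10 would be significant at the 0.05 level, but not at the stricter 0.01 level.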
For quick estimation: a z-score of 2 gives a two-tailed p-value near 0.05, a z-score of 2.6 gives roughly 0.01, and a z-score of 3.3 gives roughly 0.001. Each additional 0.6 to 0.7 in the z-score divides the two-tailed p-value by roughly 5 to 10.
| Significance Level (α) | One-Tailed Critical Z | Two-Tailed Critical Z | Confidence Level |
|---|---|---|---|
| 0.10 | 1.282 | 1.645 | 90% |
| 0.05 | 1.645 | 1.960 | 95% |
| 0.025 | 1.960 | 2.241 | 97.5% |
| 0.01 | 2.326 | 2.576 | 99% |
| 0.005 | 2.576 | 2.807 | 99.5% |
| 0.001 | 3.090 | 3.291 | 99.9% |
| 0.0001 | 3.719 | 3.891 | 99.99% |
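The critical values in this table can be reproduced with the inverse CDF (quantile function), available in the standard library as `NormalDist().inv_cdf`. A minimal sketch:

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf  # Φ⁻¹, the standard normal quantile function

for alpha in (0.10, 0.05, 0.01, 0.001):
    one_tailed = inv(1 - alpha)      # all of α in one tail
    two_tailed = inv(1 - alpha / 2)  # α split between two tails
    print(f"α = {alpha}: one-tailed z = {one_tailed:.3f}, two-tailed z = {two_tailed:.3f}")
```

Running this recovers the familiar pairs, e.g. α = 0.05 gives 1.645 (one-tailed) and 1.960 (two-tailed).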
| Z-Score | Two-Tailed P | Z-Score | Two-Tailed P |
|---|---|---|---|
| 0.0 | 1.0000 | 2.2 | 0.0278 |
| 0.2 | 0.8415 | 2.4 | 0.0164 |
| 0.4 | 0.6892 | 2.6 | 0.0093 |
| 0.6 | 0.5485 | 2.8 | 0.0051 |
| 0.8 | 0.4237 | 3.0 | 0.0027 |
| 1.0 | 0.3173 | 3.2 | 0.0014 |
| 1.2 | 0.2301 | 3.4 | 0.0007 |
| 1.4 | 0.1615 | 3.6 | 0.0003 |
| 1.6 | 0.1096 | 3.8 | 0.0001 |
| 1.8 | 0.0719 | 4.0 | 0.00006 |
| 2.0 | 0.0455 | — | — |
| Z-Score | Left-Tailed P | Z-Score | Left-Tailed P |
|---|---|---|---|
| -0.5 | 0.3085 | -2.0 | 0.0228 |
| -1.0 | 0.1587 | -2.5 | 0.0062 |
| -1.28 | 0.1003 | -3.0 | 0.0013 |
| -1.645 | 0.0500 | -3.5 | 0.0002 |
| -1.96 | 0.0250 | -4.0 | 0.00003 |
P-values are the cornerstone of scientific hypothesis testing, helping researchers determine whether experimental results provide genuine evidence for new discoveries or are likely due to random chance.
Clinical trials rely on p-values to decide if treatments are effective. Regulatory agencies like the FDA require statistically significant results before approving new drugs, directly affecting patient care and public health.
Companies use p-values in A/B testing to determine if changes to websites, marketing campaigns, or products lead to real improvements in conversion rates, revenue, or user engagement.
Manufacturing and engineering use p-values for quality assurance, determining whether process changes actually improve output or if observed differences fall within normal variation.
Decide on your alpha level (0.05, 0.01, etc.) before running your analysis. Choosing the threshold after seeing the results is a form of p-hacking and invalidates the statistical test.
Instead of writing "p < 0.05," report the actual p-value (e.g., p = 0.032). This gives readers more information to assess the strength of evidence and allows for different interpretive frameworks.
A statistically significant result (small p-value) does not necessarily mean the effect is large or meaningful. With very large samples, even tiny, practically irrelevant effects can produce highly significant p-values. Always report effect sizes alongside p-values.
A p-value of 0.40 does not mean there is a 40% chance the null hypothesis is true. It means the data are consistent with the null hypothesis, but also consistent with many alternative hypotheses. Absence of evidence is not evidence of absence.
If you run 20 tests at alpha = 0.05, you expect one false positive by chance alone. Use Bonferroni correction (divide alpha by the number of tests) or false discovery rate (FDR) methods when performing multiple comparisons.
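A Bonferroni correction is a one-line adjustment; the p-values below are illustrative numbers, not real results:

```python
# Hypothetical p-values from 5 independent tests (illustrative only)
p_values = [0.003, 0.012, 0.021, 0.040, 0.310]
alpha = 0.05

# Bonferroni: compare each p-value to α / m instead of α
m = len(p_values)
bonferroni_alpha = alpha / m  # 0.05 / 5 = 0.01
significant = [p < bonferroni_alpha for p in p_values]
print(significant)  # [True, False, False, False, False]
```

Note that four of the five tests would have looked significant against the uncorrected α = 0.05, but only one survives the correction.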
Repeatedly checking p-values during data collection and stopping when significance is reached inflates the false positive rate dramatically. Determine your sample size in advance using a power analysis, and analyze the data only after collection is complete.
A p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. It ranges from 0 to 1. A small p-value (typically less than 0.05) indicates strong evidence against the null hypothesis, leading researchers to reject it. A p-value does not measure the probability that the null hypothesis is true or false.
A one-tailed test checks for an effect in only one direction (greater than or less than), while a two-tailed test checks for an effect in either direction. For a symmetric distribution such as the standard normal, the two-tailed p-value is double the one-tailed p-value when the observed effect lies in the hypothesized direction. Use a one-tailed test only when you have a strong directional hypothesis specified before collecting data.
The 0.05 threshold was popularized by Ronald Fisher in the 1920s as a convenient convention, representing a 1-in-20 chance of a false positive. It is not a magical boundary. Different fields use different thresholds: particle physics requires p < 0.0000003 (5-sigma), while exploratory social science may accept p < 0.10. The appropriate threshold depends on the costs of false positives versus false negatives.
To convert a z-score to a p-value, you look up the z-score in a standard normal distribution table or use a cumulative distribution function (CDF). For a two-tailed test, the p-value equals 2 times (1 minus the CDF of the absolute z-score). For a one-tailed right test, p equals 1 minus the CDF. For a one-tailed left test, p equals the CDF directly.
A p-value of 0.01 means there is a 1% probability of observing results as extreme as or more extreme than the current results if the null hypothesis were true. It does not mean there is a 1% chance the null hypothesis is true. It indicates strong evidence against the null hypothesis and would be considered statistically significant at both the 0.05 and 0.01 significance levels.
In theory, a p-value can never be exactly zero because there is always some nonzero probability of observing any result under the null hypothesis. However, when p-values are extremely small (such as 10⁻¹⁵), software may round them to zero or display them in scientific notation. In practice, researchers report these as p < 0.001 rather than p = 0.
P-hacking refers to the practice of manipulating data analysis until a statistically significant p-value is found. This includes running multiple tests without correction, selectively reporting results, removing outliers post hoc, or stopping data collection once significance is reached. P-hacking inflates the false positive rate far beyond the nominal 5% and has contributed to the replication crisis in science.
Larger sample sizes produce smaller standard errors, which lead to larger test statistics for the same effect size, resulting in smaller p-values. With a very large sample, even trivially small effects can produce statistically significant p-values. Conversely, small samples may fail to detect meaningful effects. This is why researchers should report effect sizes alongside p-values for a complete picture.
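The sample-size effect is easy to see in a sketch of a one-sample z-test: hold a small effect fixed and watch the p-value shrink as n grows. The effect size (0.1) and standard deviation (1.0) here are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF

# Fixed, small true effect: mean difference 0.1 with population SD 1.0
effect, sd = 0.1, 1.0

for n in (25, 100, 400, 1600):
    z = effect / (sd / sqrt(n))   # test statistic grows with sqrt(n)
    p = 2 * (1 - phi(abs(z)))     # two-tailed p-value shrinks accordingly
    print(f"n = {n:5d}: z = {z:.2f}, p = {p:.4f}")
```

The same 0.1 difference is "not significant" at n = 25 (p ≈ 0.62) yet "extremely significant" at n = 1600 (p ≈ 0.00006), which is exactly why effect sizes must accompany p-values.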