Discrete distributions
Uniform Distribution
A uniform distribution is a probability distribution that assigns equal probability to every outcome. In the discrete case, if there are n possible values, each value has probability 1/n; with five possible values, for instance, each has probability 1/5. Uniform distributions are used to model situations where all outcomes are equally likely. Rolling a fair six-sided die is a standard example: each outcome (1-6) has probability 1/6, so the roll follows a uniform distribution.
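As a quick sketch in Python (the function name is my own), the pmf of a discrete uniform distribution depends only on the number of outcomes; exact fractions make the 1/n structure explicit:

```python
from fractions import Fraction

def uniform_pmf(n):
    """Probability of any single outcome under a discrete uniform distribution over n values."""
    return Fraction(1, n)

# A fair six-sided die: each face has probability 1/6.
print(uniform_pmf(6))  # 1/6

# The probabilities over all n outcomes sum to 1.
assert sum(uniform_pmf(6) for _ in range(6)) == 1
```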
Bernoulli Distribution
A Bernoulli distribution is a probability distribution that comes up when you have a single, dichotomous variable. For example, if you flip a coin once and want to know the probability of getting heads, you can use a Bernoulli distribution. The Bernoulli distribution is the probability distribution of a single trial with exactly two possible outcomes, such as “success” and “failure”, where success occurs with probability p and failure with probability 1-p. (The sum of the successes across a sequence of independent Bernoulli trials follows the binomial distribution, described next.)
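A Bernoulli trial is a one-liner to simulate. This sketch (function name is my own) draws a single success/failure outcome and checks that, over many fair coin flips, the observed fraction of heads is close to p:

```python
import random

def bernoulli_trial(p):
    """Return 1 ("success") with probability p, else 0 ("failure")."""
    return 1 if random.random() < p else 0

# Simulate 10,000 fair coin flips; the fraction of heads should be near p = 0.5.
random.seed(0)  # fixed seed so the run is reproducible
flips = [bernoulli_trial(0.5) for _ in range(10_000)]
print(sum(flips) / len(flips))
```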
Binomial Distribution
Binomial distribution is a statistical probability distribution that describes the number of successes in a fixed number of independent trials. The distribution is used to model the probability of obtaining a certain number of successes (or failures) in a given number of independent experiments, where each experiment has only two possible outcomes (success or failure) and the probability of success is constant for all experiments.
The binomial distribution is characterized by two parameters: n and p, where n is the total number of trials and p is the probability of success in each trial. The probability of obtaining exactly k successes in n trials is given by the following formula:
P(X=k) = (n choose k) * p^k * (1-p)^(n-k)
where “n choose k” is the binomial coefficient, which represents the number of ways to choose k items from a set of n items. The formula can also be written as:
P(X=k) = C(n,k) * p^k * (1-p)^(n-k)
where C(n,k) denotes the binomial coefficient.
The mean and variance of the binomial distribution are given by:
Mean = np
Variance = np(1-p)
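The pmf formula and the mean above can be checked directly with Python's `math.comb` as the binomial coefficient; this is a sketch (function name is my own), not a library API:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k) for a Binomial(n, p) variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
# Probability of exactly 3 heads in 10 fair coin flips.
print(binomial_pmf(3, n, p))  # 0.1171875

# The pmf sums to 1 over k = 0..n, and the mean matches n*p.
assert abs(sum(binomial_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
assert abs(mean - n * p) < 1e-9
```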
The binomial distribution is widely used in many fields, including biology, psychology, economics, and engineering, to model phenomena such as the success or failure of drug trials, the probability of defects in manufactured products, and the outcome of marketing campaigns.
Poisson Distribution
The Poisson distribution is a statistical probability distribution that describes the number of rare events that occur in a fixed interval of time or space. It is used to model the probability of a certain number of events occurring in a fixed period of time, given the average rate at which the events occur.
The Poisson distribution is characterized by a single parameter, lambda (λ), which represents the average rate at which the events occur. The probability of observing k events in a fixed interval of time is given by the following formula:
P(X=k) = (e^(-λ) * λ^k) / k!
where e is the mathematical constant approximately equal to 2.71828, and k! is the factorial of k (i.e., the product of all positive integers up to k).
The mean and variance of the Poisson distribution are both equal to λ.
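The pmf formula and the mean-equals-variance property can be verified numerically; the sketch below (function name and example rate are my own) uses only the standard library:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = (e^(-lam) * lam^k) / k! for a Poisson(lam) variable."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 3.0  # e.g. an average of 3 customer arrivals per hour
print(poisson_pmf(0, lam))  # probability of seeing no arrivals in an hour

# Mean and variance both equal lambda (summing far enough into the tail):
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(100))
assert abs(mean - lam) < 1e-9 and abs(variance - lam) < 1e-9
```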
The Poisson distribution is widely used in many fields, including physics, biology, economics, and engineering, to model phenomena such as the number of radioactive decays, the number of errors in a manufacturing process, the number of customer arrivals at a service center, and the number of accidents on a highway. It is particularly useful for modeling rare events, where the average rate of occurrence is low and individual occurrences are independent of one another.
Continuous distributions
Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that describes a wide range of natural phenomena, such as height, weight, IQ, and scores on standardized tests. It is characterized by its bell-shaped curve, with the mean, median, and mode all located at the center of the distribution.
The normal distribution is parameterized by two parameters: the mean (μ) and the standard deviation (σ). The probability density function (PDF) of the normal distribution is given by:
f(x) = (1 / (σ * sqrt(2π))) * e^(-((x-μ)^2) / (2 * σ^2))
where x is the random variable, σ is the standard deviation, μ is the mean, π is the mathematical constant approximately equal to 3.14159, and e is the mathematical constant approximately equal to 2.71828.
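The PDF above translates line for line into Python; this sketch (function name is my own) evaluates the density and checks two facts stated below: the peak at the mean and the symmetry around it:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The density peaks at the mean; for mu = 0, sigma = 1 the peak is 1/sqrt(2*pi).
print(normal_pdf(0, 0, 1))

# Symmetry: points equidistant from the mean have equal density.
assert normal_pdf(1, 0, 1) == normal_pdf(-1, 0, 1)
```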
The normal distribution has several important properties that make it a useful tool in statistical analysis, including the following:
- The total area under the curve is equal to 1, which means that the probability of any possible outcome is always between 0 and 1.
- The distribution is symmetric around the mean, with 50% of the data lying above the mean and 50% lying below.
- The area under the curve between any two values of x represents the probability that the random variable falls within that range.
- The shape of the distribution is determined entirely by the mean and standard deviation, which makes it easy to compare and analyze data from different sources.
The normal distribution is widely used in many fields, including finance, engineering, physics, and social sciences, to model phenomena such as stock prices, test scores, and physical measurements. Many statistical methods, such as hypothesis testing and confidence intervals, assume that the data follows a normal distribution, which makes it an important concept in statistical analysis.
Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also called the standard Gaussian distribution, and it is a special case of the normal distribution.
To convert any normal distribution to a standard normal distribution, we use a process called standardization, which involves subtracting the mean from each data point and then dividing by the standard deviation. This results in a new set of data with a mean of 0 and a standard deviation of 1.
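The standardization step is short enough to write out in full; this sketch (function name and example data are my own) subtracts the mean and divides by the standard deviation, then confirms the result has mean 0 and standard deviation 1:

```python
def standardize(data):
    """Convert data to z-scores: subtract the mean, divide by the standard deviation."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5  # population SD
    return [(x - mean) / sd for x in data]

scores = [52.0, 60.0, 68.0, 76.0, 84.0]  # hypothetical test scores
z = standardize(scores)
print(z)

# The standardized data has mean 0 and standard deviation 1.
z_mean = sum(z) / len(z)
z_sd = (sum(v ** 2 for v in z) / len(z)) ** 0.5
assert abs(z_mean) < 1e-12 and abs(z_sd - 1) < 1e-12
```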
The standard normal distribution is often denoted by the letter Z and the probability density function of the standard normal distribution is given by:
f(z) = (1 / sqrt(2π)) * e^(-z^2/2)
where z is the random variable, π is the mathematical constant approximately equal to 3.14159, and e is the mathematical constant approximately equal to 2.71828.
The standard normal distribution has several important properties that make it a useful tool in statistical analysis, including the following:
- The total area under the curve is equal to 1, which means that the probability of any possible outcome is always between 0 and 1.
- The distribution is symmetric around the mean, with 50% of the data lying above the mean and 50% lying below.
- The area under the curve between any two values of z represents the probability that the random variable falls within that range.
- Because the standard normal distribution has a mean of 0 and a standard deviation of 1, it is easy to compare and analyze data from different sources.
The standard normal distribution is widely used in many fields, including finance, engineering, physics, and social sciences, to model phenomena such as stock prices, test scores, and physical measurements. Many statistical methods, such as hypothesis testing and confidence intervals, use the standard normal distribution as a reference distribution.
Student’s T Distribution
Student’s t-distribution is a probability distribution that is used to model the distribution of the standardized sample mean (the t-statistic) when the population standard deviation is unknown and the sample size is small. It was developed by William Sealy Gosset, who published under the pseudonym “Student.”
The t-distribution is similar to the normal distribution, but it has heavier tails, which means that it has more probability in the tails than the normal distribution. The shape of the t-distribution depends on the degrees of freedom (df), which is equal to the sample size minus 1.
The probability density function (PDF) of the t-distribution is given by:
f(t) = (1 / (sqrt(df) * Beta(1/2, df/2))) * (1 + t^2/df)^(-(df+1)/2)
where t is the random variable, df is the degrees of freedom, and Beta is the beta function.
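The PDF can be evaluated with `math.gamma`, using the identity Beta(a, b) = Γ(a)Γ(b)/Γ(a+b). This sketch (function name is my own) also illustrates the convergence to the normal: the peak density approaches the standard normal peak 1/sqrt(2π) ≈ 0.3989 as df grows:

```python
from math import gamma, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    beta = gamma(0.5) * gamma(df / 2) / gamma((df + 1) / 2)  # Beta(1/2, df/2)
    return (1 / (sqrt(df) * beta)) * (1 + t ** 2 / df) ** (-(df + 1) / 2)

# Heavier tails (lower peak) for small df; close to the normal for large df.
print(t_pdf(0, 5))    # peak density with 5 degrees of freedom
print(t_pdf(0, 100))  # much closer to the standard normal peak ~0.3989
```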
The t-distribution has several important properties that make it a useful tool in statistical analysis, including the following:
- As the sample size increases, the t-distribution approaches the normal distribution.
- The t-distribution has a mean of 0 and a variance of df/(df-2) for df > 2.
- The t-distribution is symmetric around the mean.
- The area under the curve between any two values of t represents the probability that the random variable falls within that range.
The t-distribution is widely used in many fields, including economics, engineering, and the social sciences, to test hypotheses and to construct confidence intervals for population parameters. It is particularly useful when the sample size is small and the population standard deviation is unknown.
Chi-squared Distribution
The chi-squared distribution is a probability distribution that arises in statistics when analyzing the distribution of a sum of the squared standard normal random variables. It is commonly used in hypothesis testing, goodness of fit tests, and constructing confidence intervals.
The chi-squared distribution is characterized by a single parameter, known as the degrees of freedom (df). The degrees of freedom determine the shape of the distribution, and the distribution becomes more symmetrical as the degrees of freedom increase.
The probability density function (PDF) of the chi-squared distribution is given by:
f(x) = (1 / (2^(df/2) * Γ(df/2))) * x^(df/2 - 1) * e^(-x/2)
where x is the random variable, Γ is the gamma function, and df is the degrees of freedom.
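The density is again a direct transcription using `math.gamma`; the sketch below (function name is my own) checks it against the df = 2 special case, where the formula collapses to 0.5 * e^(-x/2):

```python
from math import exp, gamma

def chi2_pdf(x, df):
    """Density of the chi-squared distribution with df degrees of freedom."""
    return (1 / (2 ** (df / 2) * gamma(df / 2))) * x ** (df / 2 - 1) * exp(-x / 2)

# With df = 2 the density simplifies to 0.5 * e^(-x/2).
print(chi2_pdf(2.0, 2))
assert abs(chi2_pdf(2.0, 2) - 0.5 * exp(-1)) < 1e-12

# Right skew: for df = 4 the density at x = 10 is well below the density at x = 2.
assert chi2_pdf(10.0, 4) < chi2_pdf(2.0, 4)
```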
The chi-squared distribution has several important properties that make it a useful tool in statistical analysis, including the following:
- The mean of the chi-squared distribution is equal to the degrees of freedom, and the variance is equal to twice the degrees of freedom.
- The chi-squared distribution is skewed to the right, with a longer right tail.
- The area under the curve between any two values of x represents the probability that the random variable falls within that range.
- The chi-squared distribution is used in hypothesis testing to test the goodness of fit of a sample distribution to a theoretical distribution, or to test the independence of two categorical variables.
The chi-squared distribution is widely used in many fields, including economics, engineering, and the social sciences, to test hypotheses and to construct confidence intervals for population parameters. It is particularly useful when analyzing categorical data, and when testing hypotheses about the variance of normally distributed data.
Exponential Distribution
The exponential distribution is a probability distribution that is used to model the time between two consecutive events that occur independently of each other at a constant rate. It is widely used in reliability analysis, queuing theory, and other fields where waiting times are important.
The probability density function (PDF) of the exponential distribution is given by:
f(x) = λ * e^(-λx)
where x is the random variable, and λ is the rate parameter, which represents the average number of events per unit time.
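The density and its cumulative form are both one-liners; this sketch (function names and the example rate are my own) also demonstrates the memoryless property stated below, P(X > s + t | X > s) = P(X > t):

```python
from math import exp

def exponential_pdf(x, lam):
    """Density of an Exponential(lam) distribution at x >= 0."""
    return lam * exp(-lam * x)

def exponential_cdf(x, lam):
    """P(X <= x): the probability the waiting time is at most x."""
    return 1 - exp(-lam * x)

lam = 2.0  # e.g. an average of 2 arrivals per minute; mean wait is 1/lam = 0.5
s, t = 1.0, 0.5
# Memorylessness: having already waited s, the chance of waiting t more
# is the same as the chance of waiting t from scratch.
lhs = (1 - exponential_cdf(s + t, lam)) / (1 - exponential_cdf(s, lam))
rhs = 1 - exponential_cdf(t, lam)
print(lhs, rhs)  # the two values agree
```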
The exponential distribution has several important properties that make it a useful tool in statistical analysis, including the following:
- The mean of the exponential distribution is equal to 1/λ, and the variance is equal to 1/λ^2.
- The exponential distribution is memoryless, which means that the probability of an event occurring within the next time interval is the same, regardless of how much time has passed since the last event.
- The area under the curve between any two values of x represents the probability that the random variable falls within that range.
- The exponential distribution is used in reliability analysis to model the time until a system fails, and in queuing theory to model the time between arrivals of customers.
The exponential distribution is widely used in many fields, including engineering, finance, and the social sciences, to model waiting times and failure times. It is particularly useful when dealing with events that occur independently of each other at a constant rate, and when the time between events is of interest.
More here: Exploring The Different Types Of Probability Distribution Function