miswiki

 

Glossary

Last edited by Murali Shanker 3 yrs ago

Acceptance Region

An acceptance region is the set of values for which you would accept the null hypothesis H0.

 

Alternative Hypothesis

The alternative hypothesis, denoted by H1, is an alternative to the null hypothesis; it is the change in the population that the researcher hopes is true.

 

Balanced Design

If there is the same number of units assigned to each treatment combination, the experimental design is balanced.

 

Biased Sampling

A sampling method is biased if it produces results that systematically differ from the truth about the population.

 

Boxplot

A graphical way of showing the five-number summary.

 

Census

A census is a sample consisting of the entire population.

 

Cluster Sampling

In cluster sampling, the units of the population are grouped into clusters. One or more clusters are selected at random. If a cluster is selected, all of the units that form that cluster are included in the sample.
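As a rough illustration of the procedure described above, the following Python sketch selects whole clusters at random and keeps every unit in each selected cluster. The function name cluster_sample, the classroom data, and the fixed seed are assumptions made for this example only.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Select whole clusters at random and return every unit in the chosen clusters."""
    # Pick cluster labels at random (sorted only to make the input order predictable).
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for label in chosen for unit in clusters[label]]

# Hypothetical population: students grouped by classroom.
classrooms = {
    "room_a": ["Ann", "Bob", "Cy"],
    "room_b": ["Dee", "Eli"],
    "room_c": ["Fay", "Gus", "Hal", "Ida"],
}
sample = cluster_sample(classrooms, num_clusters=2, rng=random.Random(0))
```

Note that the sample size is not fixed in advance: it depends on how large the selected clusters happen to be.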

 

Confounding Variable

A confounding variable is a variable whose effect on the response variable cannot be separated from the effect of the explanatory variable on the response variable.

 

Continuous Random Variables

A continuous random variable can assume any value in an interval or collection of intervals.

 

Control Group

A control group is a group of subjects or experimental units that are treated identically in every way, except that they do not receive an actual treatment.

 

Convenience Sample

A convenience sample is a sample consisting of units of the population that are easily accessible. Convenience samples are generally biased.

 

Correlation Coefficient r

The sample correlation coefficient r measures the strength of the linear relationship between two quantitative variables. It describes the direction of the linear association and indicates how closely the points in a scatterplot are to the least squares regression line.
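A minimal sketch of how r might be computed from paired data is given below. The function name correlation and the study-hours data are hypothetical; the formula used is the usual sum of cross-deviations divided by the square root of the product of the two sums of squared deviations.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = correlation(hours, scores)  # close to +1: strong positive linear association
```

The value of r always lies between -1 and +1, and its sign gives the direction of the linear association.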

 

Critical Value

A cutoff value or critical value is a value that marks the starting point of a set of values that comprise the rejection region.

 

Decision Rule

A decision rule is a formal rule that states, based on the data obtained, when to reject the null hypothesis H0. Generally it specifies a set of values, based on the data to be collected, which are contradictory to the null hypothesis H0 and which favor the alternative hypothesis H1.

 

Density Function

A density function is a (nonnegative) function or curve that describes the overall shape of a distribution. The total area under the entire curve is equal to one, and proportions are measured as areas under the density function.

 

Dependent Variable

See Response Variable

 

Design Layout Table

A design layout table displays the various combinations of the levels of each of the explanatory variables in an experiment. The number of experimental units assigned to each treatment combination may also be presented. If there is just one explanatory variable, the design layout table would list the levels of that one variable along with the number of units assigned to each level. If there are two explanatory variables, the levels of one variable would form the rows of the design layout table and the levels of the second variable would form the columns, resulting in a two-way table. The number of units assigned to each combination can be written inside the cell of the table representing that combination. If there are more than two explanatory variables, several two-way tables are needed. For three explanatory variables, the design layout table is a three-way table, presented as several two-way tables side-by-side, one for each level of the third variable.

 

Direction of Extreme

The direction of extreme corresponds to the position of the values that are more likely under the alternative hypothesis H1 than under the null hypothesis H0. If the larger values are more likely under H1 than under H0, then the direction of extreme is said to be to the right.

 

Discrete Random Variables

A discrete random variable can assume only a finite or countably infinite number of values.

 

Double-Blind Experiments

A double-blind experiment is one in which neither the subjects nor those working with the subjects knows who is receiving which treatment.

 

Expected Values

The expected value of a random variable X is the long-run average value of that random variable. For a discrete random variable, E(X) = Σ xi P(X = xi), where the summation is taken over all possible values xi of X.
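For a discrete random variable, the expected value is the sum of each possible value multiplied by its probability. The sketch below computes this for a fair six-sided die; the function name expected_value and the die example are assumptions chosen for illustration.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X), the sum of x * P(X = x), for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_roll = expected_value(die)  # Fraction(7, 2), i.e. 3.5
```

Exact fractions are used so that the result is not affected by floating-point rounding.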

 

Experiment

In an experiment, the researcher actively imposes some treatment on the units or subjects in order to observe the responses.

 

Experimenter Bias

Experimenter bias is the distortion that can arise on the part of the experimenter due to how the subjects are assigned to the groups, which variables are measured and how they are measured, and how the results are interpreted. The bias is generally in the direction of the researcher's theory.

 

Explanatory Variable

An explanatory or independent variable or factor is a variable that is thought to explain or cause the observed outcomes. It is a variable that is thought to explain the changes in the response variable.

 

Extrapolation

Extrapolation is using the regression line to predict the value of a response corresponding to a value of x that is outside the range of the data used to determine the regression line. Extrapolation can lead to unreliable predictions.

 

Factor

See explanatory variable.

 

Independent Variable

See Explanatory Variable

 

Influential Points

An influential point in regression is an observation that has a great deal of influence in determining the regression equation. Removing such a point would markedly change the position of the regression line. Observations that are somewhat extreme for the value of x are often influential.

 

Interquartile Range (IQR)

The IQR is the difference between the third quartile (Q3) and the first quartile (Q1).
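One possible way to compute the IQR in Python is sketched below. It relies on the default quartile convention of statistics.quantiles from the standard library; textbooks use several slightly different conventions for Q1 and Q3, so hand calculations may differ a little. The function name interquartile_range is an assumption.

```python
import statistics

def interquartile_range(data):
    """IQR = Q3 - Q1, using the default quartile convention of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical data set of eight observations.
iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7, 8])
```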

 

Least Squares Regression Line

The least squares regression line is the line that makes the sum of the squared vertical deviations of the data points from the line as small as possible.
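The following sketch fits a least squares line with the standard closed-form formulas for the slope and intercept, and includes a helper that evaluates the quantity being minimized. The function names and the small data set are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

def sum_squared_deviations(xs, ys, intercept, slope):
    """Sum of squared vertical deviations of the points from a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, intercept, slope)
```

Any other line, for example one with a slightly shifted intercept, gives a larger value of sum_squared_deviations than the fitted line does.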

 

Levels

The possible values of the explanatory variable are called the levels of that explanatory variable.

 

Linear Transformation

A linear transformation changes a variable x into a new variable y of the form y = a + bx, where a and b are constants.

 

Mean

The mean of a set of n observations is simply the sum of the observations divided by the number of observations, n. The mean is affected by extreme observations (outliers and values that are far in the tail of a skewed distribution), so it tends to be a good choice for locating the center of a distribution that is unimodal and roughly symmetric, with no outliers.

 

Median

The median of a set of n observations, ordered from smallest to largest, is a value such that at least half of the observations are less than or equal to that value and at least half the observations are greater than or equal to that value. If you have an odd number of values, the median is the one in the middle. If you have an even number of values, the median is the mean of the two middle values and falls exactly half way between them. For skewed distributions or distributions with outliers, the median tends to be the better choice for locating the center.
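The odd and even cases described above can be sketched in a few lines of Python. The function name median is chosen for illustration; the standard library's statistics.median follows the same rule.

```python
def median(values):
    """Median of a list: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the one in the middle.
        return ordered[mid]
    # Even number of values: halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([7, 1, 3])       # 3
even_case = median([4, 1, 3, 2])   # 2.5
```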

 

Mode

The mode of a set of observations is the most frequently occurring value. For a distribution, the mode is the value associated with the highest peak. The most frequent value can be far from the center of the distribution, so the mode is not really a measure of center. However, of the mean, median, and mode, the mode is the only measure that can be used for qualitative data.

 

Multistage Sampling

Sampling is performed in various stages. The sample at each stage could be any one of the many sampling methods. The items selected at any given stage are selected from within each item that was selected at the previous stage.

 

Mutually Exclusive Subgroups

Mutually exclusive subgroups imply that each unit of the population belongs to only one stratum.

 

Nonresponse Bias

Nonresponse bias is the distortion that can arise because a large number of units selected for the sample do not respond or refuse to respond, and these nonresponders have a tendency to be different from the responders.

 

Normal Distribution

X ~ N(μ, σ²) means that the variable or characteristic X is normally distributed with mean μ and variance σ².

 

Null Hypothesis

The null hypothesis, denoted by H0, is a status quo or prevailing viewpoint about a population.

 

Observational Study

In an observational study, the researcher simply observes the subjects or units and records variables of interest. The researcher does not attempt to manipulate or influence the responses.

 

One-Sided Rejection Region

A rejection region is called one-sided if its extreme values all lie in one direction, either all to the right or all to the left.

 

Outliers

An outlier in regression is an observation with a residual that is unusually large (positive or negative) as compared to the other residuals.

 

Parameter

A parameter is a numerical value that would be calculated from all of the units in the population.

 

Percentile

The pth percentile is the value such that p% of the observations fall at or below that value and (100-p)% of the observations fall at or above that value.

 

Placebo

The placebo effect is a phenomenon in which receiving medical attention, even administration of an inert drug, improves the condition of the subjects.

 

Population

The population is the entire group of objects or individuals under study, about which information is wanted. The size of the population is commonly denoted by N.

 

p-value

The probability of observing the sample result or something more extreme under the null hypothesis.

 

P-Values

The p-value is the chance, computed under the assumption that H0 is true, of getting the observed value plus the chance of getting all of the more extreme values. The smaller the p-value, the stronger the evidence provided by the data against the null hypothesis H0.
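As a worked illustration, suppose H0 states that a coin is fair and the direction of extreme is to the right (many heads). The sketch below adds the chance of the observed number of heads to the chances of all larger counts, using the binomial distribution. The coin-tossing scenario and the function name right_tail_p_value are assumptions made for this example.

```python
from math import comb

def right_tail_p_value(heads, tosses, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(tosses, p_null), computed under H0."""
    return sum(
        comb(tosses, k) * p_null**k * (1 - p_null) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )

# Hypothetical data: 9 heads observed in 10 tosses.
p_value = right_tail_p_value(heads=9, tosses=10)  # 11/1024, about 0.0107
```

A p-value this small indicates that 9 or more heads would be quite unusual if the coin were fair.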

 

Probability

The probability that an outcome will occur is the proportion of time it occurs over the long run; that is, the relative frequency with which that outcome occurs.

 

Probability Mass Function

A probability mass function (PMF) is used as a model for a discrete variable. For each possible value the mass function gives the proportion of units in the population having that value. Thus the values of the mass function must be between 0 and 1 and add up to 1.

 

Probability Sampling Method

A sampling method that gives each unit in the population a known, non-zero chance of being selected is called a probability sampling method. This is also called Statistical Sampling.

 

Prospective Study

A prospective study is a study of ongoing or future events. Researchers identify subjects who have various explanatory variables or factors and follow them into the future and record the responses. A prospective study works from the potential explanatory variables to the responses.

 

Random Allocation

Random allocation is a planned use of chance for assigning the units to the treatments. An experiment is completely randomized if the experimental units are randomly assigned to the treatment combinations.

 

Random Process

A random process is a repeatable process whose set of possible outcomes is known, but the exact outcome cannot be predicted with certainty. However, there is a predictable long-term pattern of outcomes such that the relative frequency for a given outcome to occur settles down to a constant value.
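The settling down of the relative frequency can be seen in a small simulation. The sketch below tosses a simulated fair coin and records the proportion of heads for increasing numbers of tosses; the function name, the seed, and the toss counts are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_heads(num_tosses, rng):
    """Proportion of heads in num_tosses simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

rng = random.Random(42)  # fixed seed so the run is repeatable
frequencies = {n: relative_frequency_of_heads(n, rng) for n in (10, 1_000, 100_000)}
```

With only 10 tosses the proportion can be far from 0.5, while with 100,000 tosses it is typically very close to 0.5.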

 

Random Variables

A random variable is an uncertain numerical quantity whose value depends on the random outcome of an experiment. We can think of a random variable as a rule that assigns one (and only one) numerical value to each point in the sample space of an experiment.

 

Range

Range is the difference between the largest and the smallest value.

 

Rejection Region

A rejection region is the set of values for which you would reject the null hypothesis H0. Such values are contradictory to the null hypothesis and favor the alternative hypothesis H1.

 

Replication

If at least two units are assigned to each treatment combination, we have replication in an experiment.

 

Residual

A residual is the difference between the observed response (y) and the predicted response (yhat) using the regression line.

 

Response Bias

Response bias is the distortion that can arise because the wording of a question and the behavior of the interviewer can affect the responses received.

 

Response Variable

A response or dependent variable measures an outcome of the study. It is a variable that is thought to depend in some way on the explanatory variable.

 

Retrospective Study

A retrospective study is a study of past events. Researchers identify subjects who have experienced certain responses and look back to see if the subjects also had various factors or explanatory variables. Subjects may be asked to recall past events. A retrospective study works from the responses to the potential explanatory variables.

 

Sample

A sample is a part of the population that is actually used to get information. The sample size is commonly denoted by n.

 

Sampling Distribution

The sampling distribution of a statistic is the distribution of the values of the statistic in all possible samples of the same size n taken from the same population. The sampling distribution of the sample mean is the distribution of values of the sample mean in all possible samples of the same size n taken from the same population.
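For a very small population, the sampling distribution of the sample mean can be listed in full. The sketch below enumerates every possible sample of size 2 from a hypothetical population of four values, assuming sampling without replacement and ignoring the order of selection.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 4 values.
population = [1, 2, 3, 4]

# Every possible sample of size n = 2, and the sample mean of each one.
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

# The values in sample_means, taken together, form the sampling distribution.
center = mean(sample_means)  # 2.5, the same as the population mean
```

In this example the center of the sampling distribution equals the population mean, which illustrates what it means for the sample mean to be unbiased.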

 

Scatterplot

A scatterplot graphically shows the relationship between two quantitative variables x and y.

 

Selection Bias

Selection bias is the systematic tendency on the part of the sampling procedure to exclude or include a certain type of unit.

 

Significance Level

The significance level α is the chance of committing a Type I error, that is, the chance of rejecting the null hypothesis when it is in fact true.

 

Simple Random Sampling

A simple random sample of size n is a sample of n units selected in such a way that every possible sample of the given size n has the same chance of being selected as any other sample of size n.
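In Python, a simple random sample can be drawn with random.sample from the standard library, which selects units without replacement so that every subset of the requested size is equally likely. The wrapper name simple_random_sample and the numbered population are assumptions for this sketch.

```python
import random

def simple_random_sample(units, n, rng=random):
    """Draw n distinct units so that every possible sample of size n is equally likely."""
    return rng.sample(units, n)

# Hypothetical population of 100 numbered units.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, rng=random.Random(1))
```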

 

Simulation

A simulation is the imitation of random or chance behavior using random devices such as random number generators or a table of random numbers.

 

Single-Blind Experiments

A single-blind experiment is one in which the subjects are ignorant of which treatment they receive.

 

Standard Deviation

The standard deviation is a measure of the spread of the observations from the mean. It is the square root of an average of the squared deviations of the observations from the mean. The units of the standard deviation are the same as those of the original data.

 

Standard Error of the Sample Mean

When data are used to estimate the standard deviation of a statistic, the result is called the standard error of the statistic. For example, the standard error of the sample mean is (σ/√n), where σ is the population standard deviation. When this value is estimated from data as (s/√n), where s is the sample standard deviation, it is called the estimated standard error of the sample mean.
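The estimated standard error of the sample mean, that is, the sample standard deviation s divided by the square root of the sample size n, could be computed as in the sketch below. The function name and the data are hypothetical; statistics.stdev uses the n - 1 divisor.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Estimated standard error of the sample mean: s divided by the square root of n."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))

# Hypothetical sample of n = 8 observations.
se = estimated_standard_error([2, 4, 4, 4, 5, 5, 7, 9])
```

Because n appears under a square root, quadrupling the sample size roughly halves the standard error.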

 

Standard Normal

A standard normal is a normal distribution with a mean of 0 and a variance of 1.

 

Statistic

A statistic is a numerical value that is calculated from all of the units in a sample.

 

Statistical Inference

Statistical inference is the process of drawing conclusions about the population based on information from a sample from that population.

 

Statistical Sampling

See Probability Sampling Method.

 

Statistically Significant

The data collected are said to be statistically significant if the data are very unlikely to be observed under the assumption that H0 is true. If we reject H0, then we say the data are statistically significant.

 

Stratified Random Sampling

A stratified random sample is selected by dividing or stratifying the population into mutually exclusive subgroups (strata) and taking a simple random sample of units from each stratum. The units sampled from each stratum are combined to form the complete sample.
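A minimal sketch of this procedure is shown below: a simple random sample is drawn within each stratum and the results are combined. The function name stratified_sample, the strata, and the per-stratum sample sizes are all assumptions for illustration.

```python
import random

def stratified_sample(strata, sizes, rng=random):
    """Take a simple random sample from each stratum and combine the results."""
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, sizes[name]))
    return sample

# Hypothetical population divided into two mutually exclusive strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(1, 41)],
    "graduate": [f"G{i}" for i in range(1, 11)],
}
sizes = {"undergraduate": 8, "graduate": 2}
sample = stratified_sample(strata, sizes, rng=random.Random(3))
```

Here each stratum contributes in proportion to its size, but other allocations are possible.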

 

Subjects

A unit is an individual object or person in the population. The units are often called subjects if the population consists of people.

 

Systematic Sampling

For a 1-in-k systematic sample, you order the units of the population in some way and randomly select one of the first k units in the ordered list. This selected unit is the first unit to be included in the sample. You continue through the list selecting every kth unit from then on.
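The 1-in-k procedure can be sketched with list slicing: choose a random starting position among the first k units, then take every kth unit from there. The function name systematic_sample and the numbered list are hypothetical.

```python
import random

def systematic_sample(ordered_units, k, rng=random):
    """1-in-k systematic sample: random start among the first k units, then every kth unit."""
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k units
    return ordered_units[start::k]

# Hypothetical ordered list of 20 units, sampled 1-in-5.
units = list(range(1, 21))
sample = systematic_sample(units, k=5, rng=random.Random(7))
```

When the population size is a multiple of k, as it is here, every run yields the same number of units (4 in this example).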

 

Treatment

A treatment is a specific combination of the levels of the explanatory variables.

 

Treatment Group

A treatment group is a group of subjects or experimental units that receive an actual treatment.

 

Two-Sided Rejection Region

A rejection region is called two-sided if its extreme values lie in two directions, both to the right and to the left.

 

Type I error α

Rejecting the null hypothesis H0 when in fact it is true is called a Type I error.

 

Type II error β

Accepting the null hypothesis H0 when in fact it is not true is called a Type II error.

 

Unbiased

A statistic is unbiased if the center of its sampling distribution is equal to the corresponding population parameter value. In other words, a random variable X is an unbiased estimator of a population parameter t if E(X) = t.

 

Unit

A unit is an individual object or person in the population. The units are often called subjects if the population consists of people.

 

Variable

A variable is a characteristic of interest to be measured for each unit in the sample.

 

Variance

Variance is an average of the squared deviations of the observations from their mean. It is a measure of spread. Variance is measured in the square of the units of original data.
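The phrase "an average of the squared deviations" can mean dividing by n (for a whole population) or by n - 1 (the usual choice for a sample). The sketch below supports both; the function name variance, its sample flag, and the data are assumptions for illustration. The standard deviation is the square root of the result.

```python
import math

def variance(observations, sample=True):
    """Average squared deviation from the mean (divisor n - 1 for a sample, n otherwise)."""
    n = len(observations)
    m = sum(observations) / n
    squared_deviations = sum((x - m) ** 2 for x in observations)
    return squared_deviations / (n - 1 if sample else n)

# Hypothetical data, with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_variance = variance(data, sample=False)  # 4.0
sample_variance = variance(data)                    # 32/7, about 4.57
population_sd = math.sqrt(population_variance)      # 2.0, in the original units
```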

 

Volunteer Sample

A volunteer sample is a sample consisting of units of the population that chose to respond. Volunteer samples are generally biased.

 

WebCT

A tool for the creation of web-based learning environments.
