Glossary of Terms


abstraction
The act of considering something as a general quality or characteristic, apart from concrete realities, specific objects, or actual instances.
achievement test
A measure of knowledge and skills in a content area.
acquiescence set
The tendency to agree with statements on a test or affective measure.
affective
Having to do with attitudes, beliefs, and values.
affective domain
The area of human action which emphasizes the internalized processes such as emotion, feeling, interest, attitude, value, character development, and motivation.
affective taxonomy
A system for classifying different levels of internalization of an attitude or value.
algorithm
A method of computation, usually set up so that calculations can be made routinely from the computational scheme without mathematical understanding.
analyze
To separate into constituent parts or elements; to examine critically, so as to bring out the essential elements or give the essence of something.
anecdotal record
A written description of an observed event.
aptitude
A natural talent or ability.
assessment
Collecting data in the context of conducting measurement.
association form
A short-answer item format in which the student is given a set of words or phrases and must supply corresponding words or phrases according to a defined basis.
attitude test
A measure of one's feelings.
balance
The selection and provision of test items such that subject matter topics and behaviors are sampled in accordance with established relative weights.
biserial correlation
Shows the degree of relationship between a continuous variable and a normally distributed variable that has been dichotomized.
blind guessing
The selection of an alternative for a selected-response item without using any knowledge or rational approach to the choice. The probability of choosing the correct response is at chance level. If there are two choices, a blind guess should result in the correct selection about 50 percent of the time, for four choices, 25 percent of the time, and so on.
bluffing
A strategy for responding to essay questions; providing an answer that may not directly address the question.
Buckley Amendment
Legislation that gives students and their parents access to information about themselves, including test scores.
centile point
The point on a scoring scale below which a certain percentage of the cases fall.
central tendency
An average or middle value for a distribution of scores.
checklist
A measure of the presence or absence of listed attributes.
chi square
Shows the degree of divergence between observed and expected frequencies.
coefficient of determination
The square of the correlation coefficient; the percentage of the variance in one variable that is predictable from another variable.
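As an illustration of the definition above, the following minimal sketch squares a product-moment correlation computed from two lists of scores. The function names and the score values are hypothetical and are not taken from the cited texts.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coefficient_of_determination(x, y):
    """Square of the correlation: proportion of variance in y predictable from x."""
    return pearson_r(x, y) ** 2

# Hypothetical scores of five students on two tests.
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]
print(round(coefficient_of_determination(test_a, test_b), 2))  # 0.6
```

Here about 60 percent of the variance in one set of scores is predictable from the other.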
cognitive
Having to do with knowing or understanding.
cognitive domain
The area of human action which pertains to mental processes such as intellectual activity, learning, and problem solving.
cognitive taxonomy
A system for classifying different levels of understanding.
completion form
A short-answer item format in which the student is to supply the missing word or words in a given item.
comprehensive
Covering all material taught to date in a course.
computerized adaptive testing
Computer-assisted testing in which the items that are presented are determined by the responses to previous items.
concurrent validity
A form of criterion validity based on the correlation of test scores with those on a criterion measure obtained at about the same time.
construct
An idea or concept invented to explain an aspect of human behavior or some other nonphysical characteristic. Example: hostility.
construct validity
The extent to which a test measures certain psychological traits.
content bias
Disproportionate representation of topics and terms within a test.
content sampling
The extent to which the items on a test represent the entire domain of possible items in a content area.
content validity
The extent to which a test or measure is representative of a defined body of knowledge.
correction for guessing
A mathematical adjustment that brings the score to zero for someone who guessed on each item.
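One widely used adjustment of this kind subtracts the number of wrong answers divided by one less than the number of choices per item. The sketch below assumes that formula; the definition above does not name a specific one, and the function and parameter names are hypothetical.

```python
def corrected_score(num_right, num_wrong, num_choices):
    """Rights minus wrongs / (choices - 1); omitted items are not counted."""
    return num_right - num_wrong / (num_choices - 1)

# A blind guesser on 100 four-choice items expects about 25 right and 75 wrong.
print(corrected_score(25, 75, 4))  # 0.0
```

The expected score of someone who guesses blindly on every item comes out to zero, which matches the definition.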
correlation coefficient
A measure of the strength and direction of the association between two sets of scores.
covariation
Variance that two or more tests have in common.
criterion referenced
A way of interpreting a test score which compares an individual's performance to an established standard of performance.
criterion validity
Validity based on the correlation between test scores and scores on some measure representing an identified criterion.
Cronbach alpha procedure
A procedure for estimating internal consistency reliability, based on parts of a test.
cross-validation
Related to predictive validity; using results for one sample of individuals to determine if validity coefficients will remain stable for another sample.
decile
Any one of nine centile points (scores) which divide a distribution into ten parts.
demographics
Vital and social statistics.
descriptive statistics
Summary characteristics of distributions, such as shape, average, and dispersion.
diagnostic test
A test used to measure a student's strengths and weaknesses in a given area.
difficulty index
A measure of the percentage of incorrect responses determined by dividing the number getting the item wrong by the number who tried the item. Used to establish how difficult an item was for the group who took the test.
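The calculation described above can be sketched as follows. This follows the glossary's definition (proportion answering incorrectly); note that many texts instead report the proportion answering correctly. The function name and counts are hypothetical.

```python
def difficulty_index(num_wrong, num_attempted):
    """Proportion of examinees who attempted the item and got it wrong."""
    return num_wrong / num_attempted

# Hypothetical item: 40 students attempted it and 12 answered incorrectly.
print(difficulty_index(12, 40))  # 0.3
```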
direct observation
Noticing of phenomena without any intervening factor between the observer and that which is being observed. A record of the situation is made.
discrimination
The ability of a test item to separate high and low scores on a total test.
discrimination index
A value which indicates the ability of an item to separate high-achieving students from low-achieving students.
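One common way to compute such a value is to compare the proportion of correct responses in a high-scoring group with that in a low-scoring group. The sketch below assumes that upper-minus-lower approach, which the definition above does not specify; the names and counts are hypothetical.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 8 of 10 high scorers and 3 of 10 low scorers answered correctly.
print(round(discrimination_index(8, 10, 3, 10), 2))  # 0.5
```

Under this approach, a positive value means the item favors high-achieving students, and a value near zero means it does not separate the groups.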
dispersion
The spread among scores in a distribution.
distractor
A response for a multiple-choice item that is classed as an incorrect alternative. It is a plausible wrong answer designed to be attractive to students who do not know the correct response.
distractor analysis
Item analysis technique concerned with the options on a multiple-choice item.
domain
A sphere of human activity. The three major categories are cognitive, affective, and psychomotor.
domain specification
A precise delineation of a body of content or a set of behaviors.
empirical
Verifiable by experience or experiment; objective collection of data to test a subjective concept.
equivalence reliability
The extent to which measurement on two or more forms of a test is consistent.
equivalent (parallel) forms
Two or more forms of a test covering the same content whose item difficulty levels are similar.
error
Variation produced by the inaccuracies of measurement. The source of the variation may be within the test instrument, within the subjects of measurement, or in the way the test was administered.
essay item
An item format that requires the student to structure a rather long written response, up to several paragraphs.
evaluation
The process of making a value judgment based on information from one or more sources.
experiment
The modification of the conditions of a group or groups that have been chosen for study, and the analysis of the resulting outcomes.
extended response
An answer to an essay item which asks or implies a question which has no definite limits to restrict the student response. The response set is open ended. (See limited response.)
F test
A test to determine the significance of the difference between the variances (σ²) of two groups.
factor analysis
An analytical procedure that can be used for identifying the number and nature of constructs underlying a set of measures.
factor loading
From factor analysis; a correlation between a factor and a test score.
formative testing
Done to monitor progress over a period of time.
frequency distribution
A listing of scores and the number of persons receiving each score.
general factor
From factor analysis; a factor that has substantial loading with all measures or tests.
global-quality scaling
A method of scoring an essay item; also called holistic scoring, scoring based on the general impression of overall adequacy and quality of the response.
grade equivalent scores
Norm-referenced scores that report performance in terms of grade and month (such as 4.6—fourth grade, sixth month).
grading
The process of evaluating performance and assigning a mark of performance level; commonly associated with assigning letters, A, B, C, D, and F--A being of better or higher performance than B, and so on.
grammatical clue
A flaw in objective items in which the wording or punctuation directs the examinee to the correct answer.
group factor
From factor analysis; a factor that has high loadings with two or more but not all measures or tests.
grouped frequency distribution
A frequency distribution that categorizes scores by intervals.
halo effect
The tendency to give high scores to students known to be good students and vice versa, independent of the quality of the response.
high-stakes test
A test for which the consequences of doing well or poorly are costly.
histogram
A bar graph that describes a distribution of scores.
informed consent
Giving approval for certain procedures after indicating an understanding of those procedures.
intelligence
The capacity for reasoning and understanding.
intelligence quotient (IQ)
The ratio of mental age to chronological age multiplied by 100 (100 x (MA/CA)); one whose mental age is average for his or her chronological age group has an IQ of 100.
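The ratio in the definition above can be sketched as a one-line calculation. Both ages are assumed to be in the same unit (months here); the function name and example ages are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """100 x (MA / CA), with both ages expressed in the same unit."""
    return 100 * mental_age / chronological_age

# Hypothetical child: mental age 120 months, chronological age 96 months.
print(ratio_iq(120, 96))  # 125.0
# Mental age equal to chronological age gives an IQ of 100.
print(ratio_iq(96, 96))   # 100.0
```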
internal consistency reliability
The extent to which parts of a test are consistent in measurement.
interval
A defined distance on a scale of measurement.
interval measurement
Measurements that classify, order, and have equal distances between points on the scales.
isomorph
Something similar or identical in structure or appearance to something else.
item analysis
An examination of student performance for each item on a test. It consists of reexamination of the responses to items of a test by applying mathematical techniques to assess two characteristics--difficulty and discrimination--of each objective item on the test.
item sampling
A technique used in schoolwide, state, or national testing that administers only a part of a test to each student. This allows a longer test to be administered but does not require a long test session for each student involved. If each student is administered only one-fourth of the test, a four-hour test could be administered with no student giving more than one hour of time.
item specifications
Item writing procedures for criterion-referenced tests that include sample items and descriptions of the stimulus and the response.
item statistics
Summary descriptions of a group's performance on a particular test item.
item-total correlation
The coefficient that describes the association between the scores on a particular item and the scores on the entire test.
Kelly's range
The distance between the 10th and 90th centile ranks.
Kuder-Richardson Formula 21 procedure (KR-21)
A split-half approach to estimating reliability that may be substituted for the KR-20 procedure if item difficulty levels are similar.
Kuder-Richardson Formula 20 procedure (KR-20)
A split-half approach to estimating reliability that provides the mean of all possible split-half reliability coefficients for a test.
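The two Kuder-Richardson procedures can be sketched with their usual textbook formulas, which the glossary itself does not give. This sketch assumes items scored 0 or 1 and uses the population variance of total scores; some texts use the sample variance instead. The function names and response data are hypothetical.

```python
from statistics import mean, pvariance

def kr20(item_scores):
    """KR-20 from a list of examinees, each a list of 0/1 item scores."""
    k = len(item_scores[0])
    totals = [sum(person) for person in item_scores]
    # p = proportion answering an item correctly; q = 1 - p.
    sum_pq = 0.0
    for j in range(k):
        p = mean(person[j] for person in item_scores)
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

def kr21(item_scores):
    """KR-21: needs only the test mean and variance; treats items as equally difficult."""
    k = len(item_scores[0])
    totals = [sum(person) for person in item_scores]
    m = mean(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical responses of five examinees to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
print(round(kr21(responses), 2))  # 0.67
```

Because the items in this example differ in difficulty, KR-21 gives a lower estimate than KR-20.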
kurtosis
The peakedness or flatness of a frequency distribution as compared with a normal distribution.
leptokurtic
A frequency distribution more peaked than normal.
limited response
Essay item which asks a question or gives instructions for restricting the area to be covered in responding to the stated tasks. The coverage expected is well fenced in for the student. (See extended response.)
local norm
The average test performance in some city or region.
mastery-nonmastery discrimination
Item analysis technique concerned with decisions regarding a cut-off score.
matching item
An item consisting of a two-column format--premises and responses--that requires the student to make a correspondence between the two.
mean
The arithmetic average of a set of scores.
mean deviation
A measure of variability or dispersion of a distribution of scores.
measurement
A process that assigns by rule a numerical description to observation of some attribute of an object, person, or event.
measurement scales
Classifications of measures based on the amount of information contained in each score.
median
The middle score of a distribution.
mental age
The average intellectual functioning of normal persons at a given age, usually expressed in months.
minimum competency testing
Testing designed to measure the acquisition of competence or skills to or beyond a defined standard.
mode
The most frequent score of a distribution.
multifactored assessment
Assessment that usually includes the physical, cognitive, psychological, and social factors that are believed to affect learning.
multiple-choice item
A test format in which the examinee selects the correct answer from a list of possible options.
national norm
The average performance of a sample selected to be representative of the entire country.
needs assessment
A process whereby the educational requirements of students collectively or individually are determined. Usually thought of as a formal structured approach, but may be done informally by the teacher.
negative skewness
Asymmetry in which most of the scores in a distribution are at the high end.
nominal measurement
Measurement that classifies elements into mutually exclusive and exhaustive categories.
norm group
The set of subjects used to establish the averages to be used to interpret student scores on a standardized test.
norm referenced measurement
Measurement in which an individual's score is interpreted by comparing it to the scores of a defined group.
normal distribution (curve)
A theoretical distribution of scores which forms a curve that is bell shaped and symmetrical.
norms
The test scores (also possibly statistics generated from scores) of one or more defined groups considered to be representative.
null hypothesis
A statement that there is no difference in measures of the criterion variable except what would be expected from sampling; requires that a significance level be stated (.05, .01, . . .).
objective
Dealing with things external to the mind rather than with thoughts or feelings; pertaining to that which can be known, or that which is an object or a part of an object.
objective items
Items that can be objectively scored; items on which persons select a response from a list of options.
objectivity
The degree to which the task to be performed is clear and the correct response is definite.
objectivity (in scoring)
The extent to which equally competent scorers obtain the same result.
observation
Any fact which is used as a basis for evaluation procedures. The output of the process of observing.
oral tests
Examinations in which both the questioning and answering are done aloud.
ordinal measurement
Measurement that classifies and orders along a continuum.
parallel forms
Two or more forms of a test covering the same content whose item difficulty levels are similar.
partial correlation
Shows the relationship between two variables with the effects of one or more other variables held constant.
penalty for guessing
A mathematical procedure for lowering scores as a function of the number of incorrect answers.
percentile ranks
Norm-referenced scores that indicate the percentage of a norm group that a particular score exceeded.
performance bias
Bias introduced when individuals are not able to perform on a test because they have not had the opportunity to learn the test content.
performance test
Nonpaper-and-pencil tests that require the student to engage in some type of process, produce a product, or both.
pilot study
A miniature study conducted with a group of students that is not used as part of the major study. It is used to try out procedures or instruments (adapted from Hopkins & Antes, 1990, p. 461).
phi coefficient
Shows the degree of relationship between two dichotomous variables.
platykurtic
A frequency distribution that is flatter than normal.
point biserial correlation
Shows the degree of relationship between a continuous and a truly dichotomous variable.
population
Any defined aggregate of persons, objects, or events.
positional preference
The regular placement of the correct response in a particular position; for instance, always in choice C.
positive skewness
Asymmetry in which most of the scores in a distribution are low.
power test
A test in which time does not affect quality of performance, that is, students would not perform better if given additional time.
practice effect
The consequences of taking similar tests or testlike exercises.
pre-post discrimination
Item analysis technique concerned with assessing performance before and after instruction.
predictive validity
A form of criterion validity based on the correlation of test scores with scores on a criterion measure obtained at some later time.
premises
In a matching item, the column of words consisting of item stems.
prescriptive test
A test designed to identify student deficiencies, weaknesses or problems, and to suggest corrective learning activities.
problem solving
Settlement of a perplexing question or situation.
product moment correlation
Shows the degree of relationship between two continuous variables.
project
Any thrust area activity which is funded by the National Science Foundation or uses resources designated as matching funds.
psychomotor
Having to do with movement or motor skills.
psychomotor domain
The area of human action which emphasizes all types of body movements which are involuntary or voluntary.
psychomotor taxonomy
A system for classifying psychomotor behaviors in terms of the amount of concentration required.
qualitative
Information in the form of statements or narrative.
quantitative
Information that has been expressed in terms of mathematically manipulable numbers.
quartile deviation
A measure of variability or dispersion of a distribution of scores.
quartile one
The point (score) in a distribution that sets off the lower fourth of the group.
quartile three
The point (score) in a distribution that sets off the higher fourth of the group.
quartile two
The point (score) in a distribution which divides the distribution into two equal parts.
random sample
A sample in which every member of the parent population has an equal chance of being chosen.
range
The difference between the highest and lowest scores in a distribution.
rank correlation
Shows the degree of relationship between two continuous variables by comparing ranks.
rating scale
A measure that contains one's estimate of the value of a person or thing.
ratio measurement
Measurement that classifies, orders, has equal units, and a true zero point.
raw score
The original score, as of a test, before it is statistically adjusted. It may include weighting and a correction for guessing but no other transformation.
reading difficulty
The level of reading ability required to understand test questions.
reliability coefficient
A numerical index of reliability based on a correlation coefficient; theoretically, the index can range from 0 to +1.0.
reliability
The consistency with which a data collection device measures whatever it is that the device measures.
representative sample
Any subset of persons or items selected to represent a larger group or population which has the same inclinations as the total group or population with reference to some characteristic or characteristics. In testing, the test instrument is composed of tasks which are intended to reflect the characteristics of the larger population of possible test tasks which could be asked.
The act of assuming a pose or role when responding to affective questions.
sample
Any subaggregate of a larger population.
scattergram
A two-dimensional graph of the relationship between two sets of scores.
scorer reliability
The consistency with which two or more individuals would score the same response to a test item.
secure test
A test (often commercially published) that is not circulated so it can be used repeatedly.
separate answer sheets
Forms provided for item response that are not attached to nor contained in the test copy; many can be electronically scored.
short-answer item
A test item for which the student supplies a brief response, usually consisting of a word or phrase.
skewness
The tendency of a distribution to depart from symmetry or balance.
socially acceptable response
An answer to a question that may be inaccurate but conforms to desired social norms.
Spearman-Brown formula
A formula for estimating reliability if test length is changed.
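The formula named above is usually written as r_new = n r / (1 + (n - 1) r), where r is the current reliability and n is the factor by which test length changes. The sketch below assumes that standard form; the function name and values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test whose reliability is .60 (also the step-up used with split halves).
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

A length factor below 1 estimates the reliability of a shortened test in the same way.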
specific determiners
Terms such as always, never, every, and all that provide clues to correct answers.
specific factor
From factor analysis; a factor that has a high loading with only one measure or test.
speeded test
A test administered so that students are required to complete the exam within a specified amount of time.
split-half method
A procedure for estimating test reliability by which a test is divided into two comparable halves and the scores on the halves are then correlated.
stability reliability
The extent to which measurement on the same test is consistent over time.
standard deviation
A measure of dispersion in a distribution that is the positive square root of the variance.
standard error of estimate
Gives the amount of error involved in predicting a score from the regression equation.
standard error of measurement
The standard deviation of the distribution of error scores.
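In practice this quantity is commonly estimated from the standard deviation of observed scores and the reliability coefficient, as SD times the square root of (1 - reliability). The sketch below assumes that standard estimate, which is not stated in the definition above; the function name and values are hypothetical.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Estimated SD of error scores: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)

# Hypothetical test with a standard deviation of 10 and reliability of .91.
print(round(standard_error_of_measurement(10, 0.91), 2))  # 3.0
```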
standard error of the mean
The standard deviation of a distribution of sample means.
standard score
A norm-referenced measurement that indicates how many standard deviations a score is above or below the mean.
standardization
A process of preparing a test instrument for use in widely separated locations. The test is standardized so that administration and scoring procedures are the same for all test takers. Score interpretation is made to averages of performances of groups of test takers whose scores are then used for making comparison to interpret scores obtained from other students.
stanines
Norm-referenced scores that can range from 1 to 9; they have a mean of 5 and a standard deviation of 2.
statistics
Descriptive characteristics of a distribution of scores; also, that area of mathematics dealing with the collection, organization, and interpretation of numerical data.
stem
The introductory part of an objective test item.
subjective
Existing in the mind; belonging to the thinking subject rather than to the object of thought; relating to the nature of an object as it is known in the mind as distinct from a thing in itself.
summative testing
Done at the conclusion of a course or some larger instructional period.
t test for a correlation
A test to discover if a correlation shows a real (significant) relationship, or a relationship due merely to chance.
t test between means or proportions
A test to discover if the difference between two means or two proportions is significant, or merely due to chance.
table of specifications
A two-dimensional grid, content by cognitive process, used in planning a test.
take-home test
A test that a student completes outside of class, usually in an uncontrolled setting.
taxonomy
A system of classification and the concepts of identification, naming, and categorization underlying the coordination.
teacher competency test
A test for (prospective) teachers on knowledge and skills essential for effective teaching.
technical adequacy
The level of test reliability and validity necessary before the test can be recommended for use.
technical problem
A complex situation from a specialized field of study which is presented to a student for solution within the structure of that field. Usually used for assessment of general understandings of a wide set of principles and ideas rather than for special skills and talents.
test anxiety
A psychological state of stress caused by a testing situation.
test bias
A systematic error in the measurement process.
test item file
A collection of individual items on cards which are arranged by content areas for future use in test assembly.
test-retest method
A procedure of estimating test reliability by which the same test is administered twice to the same individuals and the scores from the two administrations are then correlated.
test
The set of items or questions presented to one or more individuals under specified conditions for purposes of measurement.
testing arrangement
The setting in which a test is administered.
testing
The process of administering or taking a test.
tetrachoric correlation
Shows the degree of relationship between two normally distributed variables which are categorized into dichotomies.
transformed standard scores
Z-scores that have been converted to a distribution with a prespecified mean and standard deviation.
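The conversion described above can be sketched in two steps: compute the z-score, then rescale it to the chosen mean and standard deviation. The defaults of 50 and 10 below are one familiar choice and are an assumption here; the function names and score values are hypothetical.

```python
def z_score(raw, group_mean, group_sd):
    """Number of standard deviations a raw score lies above or below the mean."""
    return (raw - group_mean) / group_sd

def transformed_score(z, new_mean=50, new_sd=10):
    """Rescale a z-score to a distribution with a prespecified mean and SD."""
    return new_mean + new_sd * z

# Hypothetical raw score of 65 in a group with mean 50 and standard deviation 10.
z = z_score(65, 50, 10)
print(z)                               # 1.5
print(transformed_score(z))            # 65.0
print(transformed_score(z, 500, 100))  # 650.0
```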
true component
The part of an individual's score that is nonerror; the score if the test were perfectly reliable.
true-false item
A test format in which examinees indicate whether given statements are correct (true) or incorrect (false).
unobtrusive observation
Instances of noticing made in such a way that persons being observed do not know that they are being observed.
usability
The practical factors that must be considered in test selection: cost, testing time, examiner training, and so on.
validity
The extent to which a test measures what it is intended to measure.
validity coefficient
The correlation between a test of known validity and a test of unknown validity.
variance
A measure of dispersion.
weighted scores
The composite scores that are weighted combinations of two or more separate scores.
work sample
A nontest measurement of student learning.

Koenker, R. H. (1971). Simplified statistics. Totowa, NJ: Littlefield, Adams & Co.

Wiersma, W., & Jurs, S. G. (1990). Educational measurement and testing (2nd ed.). Boston: Allyn & Bacon.

Hopkins, C. D., & Antes, R. L. (1990). Classroom testing: Construction. Itasca, IL: F. E. Peacock Publishers.

Hopkins, C. D., & Antes, R. L. (1990). Educational research: A structure for inquiry (3rd ed.). Itasca, IL: F. E. Peacock Publishers.

Webster's Encyclopedic Unabridged Dictionary of the English Language. (1989). New York: Gramercy Books.