Educational Tools for Teachers. Tests and quizzes.

§ General Strategies

Spend adequate amounts of time developing your tests. As you prepare a test, think carefully about the learning outcomes you wish to measure, the type of items best suited to those outcomes, the range of difficulty of items, the length and time limits for the test, the format and layout of the exam, and your scoring procedures.

Match your tests to the content you are teaching. Ideally, the tests you give will measure students' achievement of your educational goals for the course. Test items should be based on the content and skills that are most important for your students to learn. To keep track of how well your tests reflect your objectives, you can construct a grid, listing your course objectives along the side of the page and content areas along the top. For each test item, check off the objective and content it covers. (Sources: Ericksen, 1969; Jacobs and Chase, 1992; Svinicki and Woodward, 1982)
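
The grid described above can also be kept as a small data structure. The following is a minimal sketch in Python, assuming hypothetical objectives, content areas, and test items invented purely for illustration. It tallies how many items fall in each cell of the grid and lists any objective that no item covers.

```python
from collections import Counter

# Hypothetical course objectives (rows) and content areas (columns).
objectives = ["recall key terms", "apply formulas", "interpret results"]
content_areas = ["unit 1", "unit 2"]

# Each test item is tagged with the objective and content area it covers.
# These items are made up for illustration.
items = [
    {"id": 1, "objective": "recall key terms", "content": "unit 1"},
    {"id": 2, "objective": "recall key terms", "content": "unit 2"},
    {"id": 3, "objective": "apply formulas", "content": "unit 1"},
    {"id": 4, "objective": "recall key terms", "content": "unit 1"},
]


def build_grid(items):
    """Count the test items in each (objective, content area) cell."""
    return Counter((item["objective"], item["content"]) for item in items)


def uncovered_objectives(grid, objectives, content_areas):
    """Return the objectives that no item covers in any content area."""
    return [
        obj for obj in objectives
        if all(grid[(obj, area)] == 0 for area in content_areas)
    ]


grid = build_grid(items)
missing = uncovered_objectives(grid, objectives, content_areas)

# Print the grid: objectives along the side, content areas along the top.
print(" " * 20 + "".join(f"{area:>8}" for area in content_areas))
for obj in objectives:
    print(f"{obj:<20}" + "".join(f"{grid[(obj, area)]:>8}" for area in content_areas))
print("Objectives with no items:", missing)
```

An empty row in the printed grid signals an objective the draft test does not yet measure, and a crowded cell suggests a topic that may be overrepresented.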

Try to make your tests valid, reliable, and balanced. A test is valid if its results are appropriate and useful for making decisions about an aspect of students' achievement (Gronlund and Linn, 1990). Technically, validity refers to the appropriateness of the interpretation of the results and not to the test itself, though colloquially we speak about a test being valid. Validity is a matter of degree and is considered in relation to a specific use or interpretation (Gronlund and Linn, 1990). For example, the results of a writing test may have a high degree of validity for indicating the level of a student's composition skills, a moderate degree of validity for predicting success in later composition courses, and essentially no validity for predicting success in mathematics or physics. Validity can be difficult to determine. A practical approach is to focus on content validity, the extent to which the content of the test represents an adequate sampling of the knowledge and skills taught in the course. If you design the test to cover information in lectures and readings in proportion to their importance in the course, then the interpretations of test scores are likely to have greater validity. An exam that consists of only a few difficult items, however, will not yield valid interpretations of what students know.

A test is reliable if it accurately and consistently evaluates a student's performance. The purest measure of reliability would entail having a group of students take the same test twice and get the same scores (assuming that we could erase their memories of test items from the first administration). This is impractical, of course, but there are technical procedures for determining reliability. In general, ambiguous questions, unclear directions, and vague scoring criteria threaten reliability. Very short tests are also unlikely to be highly reliable. It is also important for a test to be balanced: to cover most of the main ideas and important concepts in proportion to the emphasis they received in class.
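
The test-retest idea described above can be pictured as a correlation between students' scores on two administrations of the same test. The sketch below assumes the Pearson correlation as the reliability estimate and uses made-up scores. It is an illustration under those assumptions, not the only procedure and not necessarily the one the cited authors have in mind.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five students on two administrations of one test.
first_administration = [72, 85, 90, 64, 78]
second_administration = [70, 88, 91, 60, 80]

reliability = pearson(first_administration, second_administration)
print(f"Test-retest correlation: {reliability:.2f}")
```

A value near 1 means students kept roughly the same rank order across the two administrations; values well below 1 suggest that something other than what students know, such as ambiguous items or inconsistent scoring, is influencing the scores.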

Use a variety of testing methods. Research shows that students vary in their preferences for different formats, so using a variety of methods will help students do their best (Jacobs and Chase, 1992). Multiple-choice or short-answer questions are appropriate for assessing students' mastery of details and specific knowledge, while essay questions assess comprehension, the ability to integrate and synthesize, and the ability to apply information to new situations. A single test can have several formats. Try to avoid introducing a new format on the final exam: if you have given all multiple-choice quizzes or midterms, don't ask students to write an all-essay final. (Sources: Jacobs and Chase, 1992; Lowman, 1984; McKeachie, 1986; Svinicki, 1987)

Write questions that test skills other than recall. Research shows that most tests administered by faculty rely too heavily on students' recall of information (Milton, Pollio, and Eison, 1986). Bloom (1956) argues that it is important for tests to measure higher-order learning as well. Fuhrmann and Grasha (1983, p. 170) have adapted Bloom's taxonomy for test development. Here is a condensation of their list:

To measure knowledge (common terms, facts, principles, procedures), ask these kinds of questions: Define, Describe, Identify, Label, List, Match, Name, Outline, Reproduce, Select, State. Example: "List the steps involved in titration."

To measure comprehension (understanding of facts and principles, interpretation of material), ask these kinds of questions: Convert, Defend, Distinguish, Estimate, Explain, Extend, Generalize, Give examples, Infer, Predict, Summarize. Example: "Summarize the basic tenets of deconstructionism."

To measure application (solving problems, applying concepts and principles to new situations), ask these kinds of questions: Demonstrate, Modify, Operate, Prepare, Produce, Relate, Show, Solve, Use. Example: "Calculate the deflection of a beam under uniform loading."

To measure analysis (recognition of unstated assumptions or logical fallacies, ability to distinguish between facts and inferences), ask these kinds of questions: Diagram, Differentiate, Distinguish, Illustrate, Infer, Point out, Relate, Select, Separate, Subdivide. Example: "In the president's State of the Union Address, which statements are based on facts and which are based on assumptions?"

To measure synthesis (integrate learning from different areas or solve problems by creative thinking), ask these kinds of questions: Categorize, Combine, Compile, Devise, Design, Explain, Generate, Organize, Plan, Rearrange, Reconstruct, Revise, Tell. Example: "How would you restructure the school day to reflect children's developmental needs?"

To measure evaluation (judging and assessing), ask these kinds of questions: Appraise, Compare, Conclude, Contrast, Criticize, Describe, Discriminate, Explain, Justify, Interpret, Support. Example: "Why is Bach's Mass in B Minor acknowledged as a classic?"
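
When reviewing a draft exam, the verb lists above can be used as a rough lookup. The following is a minimal sketch, assuming a hypothetical helper named `candidate_levels` that checks only the opening verb of a question. Because several verbs appear under more than one level, it returns every level that lists the verb rather than a single answer.

```python
# Verbs copied from the condensed list above, keyed by taxonomy level.
VERBS_BY_LEVEL = {
    "knowledge": ["define", "describe", "identify", "label", "list", "match",
                  "name", "outline", "reproduce", "select", "state"],
    "comprehension": ["convert", "defend", "distinguish", "estimate", "explain",
                      "extend", "generalize", "give examples", "infer",
                      "predict", "summarize"],
    "application": ["demonstrate", "modify", "operate", "prepare", "produce",
                    "relate", "show", "solve", "use"],
    "analysis": ["diagram", "differentiate", "distinguish", "illustrate",
                 "infer", "point out", "relate", "select", "separate",
                 "subdivide"],
    "synthesis": ["categorize", "combine", "compile", "devise", "design",
                  "explain", "generate", "organize", "plan", "rearrange",
                  "reconstruct", "revise", "tell"],
    "evaluation": ["appraise", "compare", "conclude", "contrast", "criticize",
                   "describe", "discriminate", "explain", "justify",
                   "interpret", "support"],
}


def candidate_levels(question):
    """Return every level whose verb list contains the question's opening verb."""
    stem = question.strip().lower()
    return [
        level
        for level, verbs in VERBS_BY_LEVEL.items()
        if any(stem == verb or stem.startswith(verb + " ") for verb in verbs)
    ]


for q in [
    "List the steps involved in titration.",
    "Summarize the basic tenets of deconstructionism.",
    "Explain why the reaction slows at low temperature.",  # hypothetical item
    "Calculate the deflection of a beam under uniform loading.",
]:
    print(q, "->", candidate_levels(q))
```

The lookup is only a first pass. A verb such as "Explain" matches three levels, and the application example above opens with "Calculate", which is not in the condensed list at all, so the level a question really measures still has to be judged from what it asks students to do.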

Many faculty members have found it difficult to apply this six-level taxonomy, and some educators have simplified and collapsed the taxonomy into three general levels (Crooks, 1988). The first category is knowledge (recall or recognition of specific information). The second category combines comprehension and application. The third category is described as "problem solving": transferring existing knowledge and skills to new situations.

If your course has graduate student instructors (GSIs), involve them in designing exams. At the least, ask your GSIs to read your draft of the exam and comment on it. Better still, involve them in creating the exam. Not only will they have useful suggestions, but their participation in designing an exam will help them grade the exam.

Take precautions to avoid cheating.


 
© 2009 Nesterov A.
support@tests-builder.com