unprofessional but also unethical.

7 Evaluation of Training and Development


Learning Objectives

After reading this chapter, you should be able to:

• Differentiate between formative and summative evaluations.

• Use Kirkpatrick’s four-level evaluation framework.

• Compute return on investment.

• Explain why evaluation is often neglected.

One of the great mistakes is to judge policies and programs by their intentions rather than their results.

—Milton Friedman, Economist


Pretest

1. It is possible for organizations to try out trainings before they are launched.

a. true b. false

2. Assessing whether trainees enjoyed training is important only as an evaluation of the trainer’s competence.

a. true b. false

3. Return on investment should be calculated after every training session to determine whether it was cost-effective and benefited the company as a whole.

a. true b. false

4. Fewer than 25% of organizations perform formal evaluations of training effectiveness.

a. true b. false

5. Failure to evaluate trainings may be not only unprofessional but also unethical.

a. true b. false

Answers can be found at the end of the chapter.

Introduction

We seek to answer one overarching question in the final, evaluation phase of ADDIE: Was the training effective? (See Figure 7.1.) In particular, we assess whether we realized expected training goals—as uncovered by our analysis phase—specifically, whether the trainees’ posttraining KSAs improve not only their performance, but also the organization’s performance. As we will see, the process of training evaluation includes all of these issues, as well as deciding which data to use when evaluating training effectiveness, determining whether further training is needed, and assessing whether the current training design needs improvement. Ultimately, evaluation creates accountability, which is vital given the significant amount organizations spend on training and developing employees—approximately $160 billion annually (ASTD, 2013). This significant investment makes it imperative that organizations know whether their training efforts yield a positive financial return on training investment (ROI).
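The ROI idea introduced above can be sketched in a few lines. This is a minimal sketch using the common net-benefits-over-costs formulation (consistent with Phillips-style ROI); the function name and the dollar figures are hypothetical illustrations, not values from the text:

```python
def training_roi(benefits, costs):
    """Return on investment for a training program, as a percentage.

    Common formulation: ROI (%) = (monetary benefits - program costs)
                                  / program costs * 100
    """
    if costs <= 0:
        raise ValueError("program costs must be positive")
    return (benefits - costs) / costs * 100

# Hypothetical numbers for illustration only: a program costing
# $50,000 whose measured monetary benefits total $80,000.
print(training_roi(80_000, 50_000))  # → 60.0
```

An ROI of 100% would mean the training returned double its cost; a negative ROI would mean the measured benefits did not cover the program costs.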


Figure 7.1: ADDIE model: Evaluate

In this final phase of ADDIE, we evaluate how effective the training has been. From assessing any improvement in the KSAs of the trainees to the financial return on the training investment, the evaluation phase appraises the effectiveness of not only our prior analysis, design, development, and implementation, but also of the training in totality.



7.1 Formative Evaluation

Although evaluation is the last phase of ADDIE, it is not the first time aspects of the training program are evaluated. When it comes to training evaluation, we assess the training throughout all phases of ADDIE, using first what is known as a formative evaluation. Formative evaluation is done while the training is forming; that is, prior to the real-time implementation and full-scale deployment of the training (Morrison, Ross, & Kalman, 2012). Think of formative evaluation as a “try it and fix it” stage, an assessment of the internal processes of the training to further refine the external training program before it is launched.

Formative evaluations are valuable because they can reveal deficiencies in the design, development, and implementation phases of the training that may need revision before real-time execution (Neirotti & Paolucci, 2013; U.S. Department of Health and Human Services, 2013; Wan, 2013).

Recall from Chapter 6 that formative evaluations can range from editorial reviews of the training and materials—which may include a routine proofread of the training materials to check for misspelled words, incomplete sentences, or inappropriate images—to content reviews, design reviews, and organizational reviews of the training (Larson & Lockee, 2013; Noe, 2012; Piskurich, 2010; Wan, 2013). So, for example, we may find in a content review that our training is not properly linked to the original learning objectives. Or we may conclude during a design review that because e-learning is not a good fit with the organizational culture, instructor-led training is a more appropriate choice.

Formative evaluations also encompass pilot testing and beta testing. With pilot tests and beta tests, we are out to confirm the usability of the training, which includes assessing the effectiveness of the training materials and the quality of the activities (ASTD, 2013; Stolovitch & Keeps, 2011; Wan, 2013). Both beta tests and pilot tests are considered types of formative evaluation because they are performed as part of the prerelease of the training. For the pilot and beta testing, selected employees and SMEs are chosen to test the training under normal, everyday conditions; this approach is valuable because it allows us to pinpoint any remaining flaws and get feedback on particular training modules (Duggan, 2013; Piskurich, 2010; Wan, 2013).



7.2 Summative Evaluation

Whereas formative evaluation focuses on the training processes, summative evaluation focuses on the training outcomes—for both the learning and the performance results following the training (ASTD, 2013; Piskurich, 2010; Wan, 2013). Summative evaluation is the focus of the E phase of ADDIE. According to Stake (2004), one way to look at the difference between formative and summative evaluation is “when the cook tastes the soup, that’s formative evaluation; when the guests taste the soup, that’s summative” (p. 17).

In summative evaluation, we assess whether the expected training goals were realized and, specifically, whether the trainees’ posttraining KSAs improved their individual performance (and, ultimately, improved the organization’s overall performance). As Figure 7.2 depicts, in summative evaluation, we assess both the short-term learning-based outcomes—such as the trainees’ reactions to the training and opinions about whether they actually learned anything—and the long-term performance-based outcomes. These long-term performance-based outcomes include assessing whether a transfer of training occurred—that is, application to the workplace via behavior on the job—as well as whether any positive organizational changes resulted, including return on investment (Noe, 2012; Phillips, 2003; Piskurich, 2010).

Figure 7.2: Summative evaluation’s short-term and long-term outcomes

Training evaluation can be broken down into short-term and long-term assessments. Short-term evaluations are usually trainee focused, whereas long-term assessments are focused on the training itself.


[Figure: Summative outcomes divide into short-term outcomes (reactions of learners; learning by participants) and long-term outcomes (behavior on the job; organizational impact and return on investment).]

As Figure 7.3 depicts, however, the most common assessments organizations perform with summative evaluation are ultimately the least valuable to them (ASTD, 2013; Nadler & Nadler, 1990). The next section will discuss each level of evaluation.


Figure 7.3: Use versus value in evaluation

Although levels 1 and 2 are most used and usually easiest to compile, levels 3, 4, and 5 (ROI) are deemed to be the most valuable information in assessing training effectiveness, but they require complex calculations.


[Figure: For each evaluation level (reactions of participants, evaluation of learning, evaluation of behavior, evaluation of results, and return on investment), the chart compares the percentage of organizations that use the level to any extent with the percentage that say the level has high or very high value.]
Source: Adapted from American Society for Training & Development. (2013). State of the industry report. Alexandria, VA: ASTD.

7.3 Kirkpatrick’s Four-Level Evaluation Framework

Perhaps the best known and most drawn-upon framework for summative evaluation was introduced by Donald Kirkpatrick (Neirotti & Paolucci, 2013; Phillips, 2003; Piskurich, 2010; Vijayasamundeeswari, 2013; Wan, 2013), a Professor Emeritus at the University of Wisconsin and past president of the ASTD. Kirkpatrick’s four-level training evaluation taxonomy—first published in 1959 in the US Training and Development Journal (Kirkpatrick, 1959; Kirkpatrick, 2009)—depicts both the short-term learning outcomes and the long-term performance outcomes (see Figure 7.4). Let us detail each level now.


Figure 7.4: Kirkpatrick’s four-level evaluation

Donald Kirkpatrick’s four-level evaluation is the widely used standard to illustrate each level of training’s impact on the trainee and the organization as a whole. Kirkpatrick’s typology is a good starting point to frame discussions regarding the trainee’s reaction to the training (level 1), whether anything was learned from the training (level 2), whether the trainee applied the training through new behavior (level 3), and ultimately, whether the training resulted in positive organizational results (level 4).


[Figure: A four-level pyramid, from bottom to top: 1 Reactions, 2 Learning, 3 Transfer, 4 Results.]

Level 1—Reaction: Did They Like It?

A level 1 assessment attempts to measure the trainees’ reactions to the training they have just completed (Kirkpatrick, 2009; Wan, 2013; Werner & DeSimone, 2011). Specifically, level 1 assessments ask participants questions such as:

• Did you enjoy the training?

• How was the instructor?

• Did you consider the training relevant?

• Was it a good use of your time?

• Did you feel you could contribute to your learning experience?

• Did you like the venue, amenities, and so forth?

A level 1 assessment is important not only to assess whether the trainees were satisfied with the training session per se, but also—and perhaps more significantly—to predict the effectiveness of the next level of evaluation: level 2, learning (ASTD, 2013; Kirkpatrick, 2009; Morrison et al., 2012; Noe, 2012; Wan, 2013). That is, as level 1 reaction goes, so goes level 2 learning. According to a recent study (Kirkpatrick & Basarab, 2011), there was a meaningful correlation between levels 1 and 2, in that positive learner engagement led to a higher degree of learning. This outcome specifically follows the idea of attitudinal direction (Harvey, Reich, & Wyer, 1968; Kruglanski & Higgins, 2007), whereby a positive reaction (emotional intensity) can lead to constructive conclusions, as depicted in the following formula:

Attitudinal Direction

Perception + Judgment → Emotion (Level 1)

(Positive) Emotion → Learning (Level 2)

With attitudinal direction in mind, a level 1 evaluation is attentive to the measurement of attitudes, usually using a questionnaire. A level 1 survey includes both rating scales and open-ended narrative opportunities (Clark, 2013; Neirotti & Paolucci, 2013; Wan, 2013).

Typically, participants are not asked to put their names on the survey, based on the assumption that anonymity breeds honesty. Level 1 evaluation instruments are part of the training materials that would have been created in the development phase of ADDIE.
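Summarizing a level 1 rating-scale survey amounts to averaging each item across the anonymous responses. The following is a minimal sketch under that assumption; the survey items, Likert labels, and numeric scoring are illustrative, not a standard instrument:

```python
# Numeric scoring for a 5-point Likert scale (an illustrative convention):
# 5 = strongly agree ... 1 = strongly disagree
LIKERT = {"strongly agree": 5, "agree": 4, "neutral": 3,
          "disagree": 2, "strongly disagree": 1}

def summarize_reactions(responses):
    """Average rating per survey item across anonymous responses.

    `responses` is a list of dicts mapping item text to a Likert label.
    """
    totals, counts = {}, {}
    for response in responses:
        for item, label in response.items():
            totals[item] = totals.get(item, 0) + LIKERT[label]
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

# Two hypothetical anonymous surveys:
surveys = [
    {"The training was relevant": "agree",
     "The instructor was effective": "strongly agree"},
    {"The training was relevant": "neutral",
     "The instructor was effective": "agree"},
]
print(summarize_reactions(surveys))
# → {'The training was relevant': 3.5, 'The instructor was effective': 4.5}
```

Items averaging well below the scale midpoint would flag aspects of the training (content, instructor, venue) worth revisiting; open-ended comments would then be read alongside these averages.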

Level 2—Learning: Did They Learn It?

In a level 2 assessment, we attempt to measure the trainees’ learning following the training that they just completed (Kirkpatrick, 2009; Wan, 2013; Werner & DeSimone, 2011) and, specifically, in relation to the learning outcomes we established during the analysis and design phases of ADDIE. Remember, learning outcomes can include cognitive outcomes (knowledge), psychomotor outcomes (skills), and affective outcomes (attitudes) (Noe, 2012; Piskurich, 2010; Rothwell & Kazanas, 2011).

• With cognitive outcomes, we determine the degree to which trainees acquired new knowledge, such as principles, facts, techniques, procedures, or processes (Noe, 2012; Piskurich, 2010; Rothwell & Kazanas, 2011). For example, in a new employee orientation, cognitive outcomes could include knowing the company safety rules or product line or learning the company mission.

• With skills-based or psychomotor learning outcomes, we assess the level of new skills as a function of the new learning, as seen, for example, in newly learned listening skills, conflict-handling skills, or motor or manual skills such as computer repair and replacing a power supply (Morrison et al., 2012; Noe, 2012; Piskurich, 2010).

• Affective learning outcomes focus on changes in attitudes as a function of the new learning (Noe, 2012; Piskurich, 2010). For example, trainees who learned a different attitude regarding other cultures following diversity training or those who gained a new attitude regarding the importance of safety prevention after a back injury–prevention training class have achieved learning outcomes.

As with level 1, evaluations for level 2 are done immediately after the training event to determine if participants gained the knowledge, skills, or attitudes expected (Morrison et al., 2012; Noe, 2012; Piskurich, 2010). Measuring the learned KSA outcomes of level 2 requires testing to demonstrate improvement in any or all level 2 outcomes:


• Cognitive outcomes and new knowledge are typically measured using trainer-constructed achievement tests (such as tests designed to measure the degree of learning that has taken place) (Duggan, 2013; Noe, 2012; Piskurich, 2010; Wan, 2013).

• For newly learned motor or manual skills, we can use performance tests, which require the trainee to create a product or demonstrate a process (Duggan, 2013; Noe, 2012; Piskurich, 2010; Wan, 2013).

• Attitudes are measured with questionnaires similar to those described for level 1 evaluation, with the participants giving their ratings for various items (for example, strongly agree, agree, neutral, disagree, or strongly disagree). They also include open-ended items to let trainees describe any changed attitudes in their own words (for example, “How do you feel about diversity in the workplace?”) (Duggan, 2013; Kirkpatrick, 2009; Noe, 2012; Piskurich, 2010; Wan, 2013).

With a level 2 posttraining learning evaluation, Kirkpatrick recommends first giving participants a pretest before the training and then giving them a posttest after the training (Cohen, 2005; Kirkpatrick, 1959; Kirkpatrick, 2009; Phillips, 2003; Piskurich, 2010) to determine if the training had any effect, positive or negative. Creating valid and reliable tests is not a casual exercise; in fact, there is a credential one can attain to become an expert in testing and evaluation (http://www.itea.org/professional-certification.html). Does the test measure what it is intended to measure? If the same test is given 2 months apart, will it yield the same result?
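The pretest/posttest comparison Kirkpatrick recommends can be sketched as a simple paired mean difference. This is only an illustration of the basic arithmetic; the function names and the scores are hypothetical, and a real evaluation would also consider test reliability and statistical significance:

```python
def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

def learning_gain(pretest, posttest):
    """Mean change in score from pretest to posttest for the same trainees.

    Scores are paired by trainee: pretest[i] and posttest[i] belong to the
    same person. A positive value suggests learning occurred; a near-zero
    or negative value flags a training (or a test) that needs a closer look.
    """
    if len(pretest) != len(posttest):
        raise ValueError("need paired scores for the same trainees")
    return mean(posttest) - mean(pretest)

# Hypothetical scores (percent correct) for five trainees:
pre = [55, 60, 48, 70, 62]
post = [78, 74, 66, 85, 80]
print(round(learning_gain(pre, post), 2))  # → 17.6
```

Here the average score rose from 59 to 76.6, a gain of 17.6 percentage points, which would suggest the training had a positive effect on the measured knowledge.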