I III
' " .I
I

' I

I??? W w '
I”.

  
   
      
   
    
 
   

 

l _ I‘ V"- II. .
“315"? v " l " fu‘:
TI...I’ ‘(~{'$'I|'I$
I) [I I J." I I I

I {WWI} ., 8%: *§{UJ'|J".0? l I‘l(':»
h" "' -".'.-.‘ '1'! ' 1%": it-u.» .1".
I.‘ e .' _ I. I I '. I,

  

 
     
  

  

  

 

 
  
 
 

  
   
   
  
 
  
 
        
     

I
I I I”
.A ml!
I _ arm} :wI .
bl . I
"3:; .\.I‘r5%o "z :53
A" | .. _ < ' .1
[ '.. CIBI‘ ' ' ' "N .. IE:
«1‘ .. r- 3% .9

i .

 
  

sh: 3:" h . ' 3‘
‘2‘ It. . : a)- . . 1 9:1; ’1'
l ‘3‘" “ ‘ I' -~§JK'~ '
I ‘ as

   

 

    

y ' “e; ‘ 7‘: ' ":4. ”9‘1

' ‘ . - ~ v —-~L-.u ‘FL‘ ..'..
y ’1’ "' \- - T333 5'0 '37:; n- ~ ~03”;

- ‘ .[‘L° tﬁ: a l . I1; .I . as , _- ., *F'ﬁyﬂﬁx '3. ,CJ’J'MB; . 3",".1
If»; . II I: \L'XI ,. I.» . 7:55,.-1' v.“ .41.; .

      

. a;
I}
e
‘g‘
133;
l it

4}

    

  

2’!
If
4"?"
I? I
.E
'u

A

r!
A:
32::

   
 
 
  

is»?
31-1 .

I at?!
g;
3;?

1333513

.-
5.
I.-

  
   
  

~

‘4‘ W" .. t ' “'71” ‘
.9 VLK‘F'ah' - Jh ‘v - ‘ ._ .vvv‘v‘ti‘
. , ._

5"“ n . .
t , V» . . , .N‘
,' *3... 232:. (meaty. 7.1;.
..,~ .

 
  
 

o
|

      
 
 
  

w ;ré?”u";¥*’~‘;ti~..°=a It

I (1.. -
3" " E.
n .
- VI

‘ mama

   

  
 

w

   

\_

v
A I ka- I
'2 F ' "'9 RE)" '2”
. ﬁr?“ , 5 ‘I “L.
NV!" 'I '- "M"? 1
'. V' .1... 3‘

'V , .:’\‘.-... F
3,1“ 16?: .q‘ ‘3
“1W“ 3“: i k)
'- .i' “aﬂjkaﬁ .‘ ' r;
_ 'Ji'; I t'I' . .V' d _'_ ~ Iu‘r‘jI. ﬁ-wa ,..‘.
I; W3 ‘ 51%? " Ira—ﬂaw?“ x’

     
     
         
      
 
  

' 1’32.— u
I i O I
I I
133'!“ n
' mm- «4.,
I '
.l . '¥ -
".I’M‘MV' ' ' :sz 13-." .
urﬁl- -‘ n 34.2-- a
' '4. ' v. 1“
- Isa afﬁx ..
l

  

~ I

5
‘J
9
I?

   

              
      
         
  
 
    
  
    
 
 
 

      
  

 
    

   

     

   

h u, - 15; I q. 3,. . .. _
’ '{wi - p“.’,€_[.'-' funk": d}; .‘n’éwkgfkgﬁ mt “ _w.-'v; “w
‘ a f ""§;V~3T(5:;f;.‘ ’31.“. 73*.‘.-.L~'5Cii'—Q¢ﬂ&a 1’35???
"‘HM‘F‘W'?’"Wt“?“9):- ;.:“-~;Lv-r:¢;u«1;»,91- . “35+."
I-‘v'-‘ f “a: . . v , ~ I H1. - ' . .. - I-u.
'13.: $33” .:.y:—§- ',"‘}g. u .. I @212 If}
M r 1: \.'J‘! .v - ”‘1‘:
L" “,.‘,‘_ — - ‘73--
L {East-”‘1‘! "" '5‘: mi:
. ' ' ‘ :nw.
“ 35",,

 
     

‘3-
,r‘—’I..
VI

_ .1 -' ‘ “
— 14.331“ . f? -
' I n ‘1
. . I: £13, )4.- n .31”ng . I
.’ r 1

 
  

     
   
 
 
     
  
   
 
 
  
 
   

I ,_.p4 '. , v ‘..I' I}
‘l- l' .‘ {'JyI-ﬂ}. I “~L nu. I" '
_ I 'I‘ n',- I II III.->II INLWI‘V 1’ I
'. mm. . ~ I » r

. . ‘ ”Y
‘I'u '.‘ I “4.1%" “II

  

 

 

  
  

      

‘ , , , l , OI I ' u‘
. v1, I. IL.» Mr . m» .' .
'. 'I'I'. -:""« I‘VE}; \- WWI," .I . ‘-'_ 1‘ :1“ l ’\ " ""7 n-‘I'qz‘ '-‘
L". 'r-s.‘ 1 (‘I' “ urn I ~ " '{3' W": “Hit; ﬁr "’ "'3." " r
I'I‘II‘M h g I‘. . ugh-3'3 . ,. ~~'... n“
.. ‘+ r‘ .JT-m "...":' m; “ ' ':.\.. M." with; St " '
c . ~-“.-I."I13.'w.1--‘1I . -‘,.' I' I g .'.'- I . “I Null:- ”41%; ,o .._‘.
>~ I 'I l ' . . I . . I ‘ ' ' I '
{NW-a: 1 I'm a :wy w
(. 4 I“ "I‘ ’. v‘rw' ' a I T." ‘-l' u .. I. ‘- M."
I . I . I . , ' .1.-

  

‘ - .vi "3‘3"" 1
Flat-A” n: “‘5‘; “"4" “g
I'.
~ '

. ‘.' t I. .'." l {!IIIII.I.II:I+IIIIII,I IIIIIrIII *

   
 
  

 

       

   

II.. mu.“ m .f'. . .. .
1 film ”I n -:‘."n, I - ‘I‘l'l’I't'IH M" I“
. ”H. II 'II . "' ' I

, III ‘[ [Ir-ll: I““l‘_‘\- II'HIW‘ . I .
3:1‘ﬂ‘N '22. ll "IIII.I.:v"I,I .

    

"WI

-~’|’é‘»‘;

    

J.

  

«M
II»

IIII "
I‘ I
'II
I

I

aka-5.33;; '

7 'I_"I I‘ I ,
ﬂ “LIE-"O“ )1 I. .1‘."l" ‘ II.

'l’ I55“. «-1. I 1‘, -

f;
1
.5
2‘1 0 '- 5'
; £2 5.
; .I‘Kln .~ .-_a nu .
A F: I _; I 7‘
5“.» «is . _ _ ' _ —.-.L"..‘
O
{I} t" ’ »' 3'7
64:6 ‘ ‘..:,of ~- -. . _-‘
M

This is to certify that the

dissertation entitled

INNOVATION IN PUBLIC SECTOR ORGANIZATIONS:
A TEST OF THE MODIFIED RD&D APPROACH

presented by

David B. Roitman

has been accepted towards fulﬁllment
of the requirements for

Ph . D. degree in Psychology

 

 

 

Wjor prtﬁ‘essor

Date 2/ l O/ 84

MS U i: an Afﬁrmative Action/Equal Opportunity Institution 0-12771

 

 

MSU

LIBRARIES
.—:—.

 

 

RETURNING MATERIALS:
Place in book drop to
remove this checkout from
your record. FINES will
be charged if book is
returned after the date
stamped below.

 

 

l

INNOVATION IN PUBLIC SECTOR ORGANIZATIONS:
A TEST OF THE MODIFIED RD&D APPROACH

By

David B. Roitman

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

l984

ABSTRACT

INNOVATION IN PUBLIC SECTOR ORGANIZATIONS:
A TEST OF THE MODIFIED RD&D APPROACH

By

David B. Roitman

The present research was designed to test the viability of the
modified Research, DevelOpment, and Diffusion (RD&D) approach to the
dissemination of innovative programs. This approach emphasizes
systematic development and evaluation involving practitioner input,
and encourages interpersonal contact during dissemination from
developer-based operations. A current effort to abandon this approach
lacks empirical support.

The following two major research questions were therefore addressed:
(a) To what extent are modified RD&D programs implemented with fidelity
(correspondence to original program models) at adopting sites?, and (b)
To what extent is the fidelity of implementation related to program
effectiveness? Other research questions involved the development of
an empirically-based definition, and typology, for the concept of
reinvention, and examining the relationships among reinvention,
fidelity, and effectiveness.

Seven social programs developed and disseminated nationwide with
federal funding were studied. In general, results supported the

modified RD&D approach. The programs were implemented with acceptable

i3delity,and a significant correlation between fidelity and
effectiveness was obtained. Agreement between telephone and site
visit results attested to the fidelity instrument's construct
validity. However, these results were qualified by evidence of
variance between programs.

Based on this research, reinvention was defined as the use of
materials, activities, procedures, or organizational structures that
cannot be explained using the framework provided by developers to
measure fidelity. Instances of reinvention were categorized as
additions or modifications, proactive or reactive, and if reactive
either externally or internally induced.

The extent of reinvention was found to be positively related to
both fidelity and effectiveness. Partial correlation analyses did not
disconfirm a model which posited high levels of fidelity leading to
high levels of reinvention and effectiveness. The positive
relationship between reinvention and effectiveness was shown to be

due to additions, rather than modifications.

ACKNOWLEDGEMENTS

This research was conducted as a team project. The intellectual
environment created by the members of this team was one of the most
stimulating and enjoyable I have experienced. After being in the
arena with these individuals, I found that defending my ideas in
other settings felt like childs' play. The core of this group
included Craig Blakely, Bill Davidson, Jim Emshoff, Rand Gottschalk,
Jeff Mayer, and Neal Schmitt. Because of their good sense, good
humor, and dedication, the project has been great fun as well as
great learning. I hope that I will have the good fortune to work on
equally balanced and spirited teams in the future--but I doubt it. I
will certainly miss this crew of friends.

Another major influence on this research was Lou Tornatzky. In
fact, he was a team member in absentia--most of us formed our ideas
about innovation research largely or in part through conversations with
Lou, and his vision inspired this project from beginning to end. In
addition to his role as invisible guru for this project, Lou has been
a great social scientist/activist role model for me. No one says more
with fewer words of wisdom and integrity. Thanks, Lou.

In addition to their roles as team members, I'd like to address
each of the core team members as individuals. I'll pick one thing
to thank each for, although of course there are many. Bill's
friendship and words of support have really been something to lean

on during seven long years. Craig is "what it is"--solid, solid,

1':

solid. Rand managed to put up with me for four months on the road--
and has been a great partner. Neal's calm and considerate brilliance
is always a treat--one of the best teachers I've ever known. Jeff,
I'm sorry you signed on so late, and Jim, I'm sorry you left early--
two of my deepest pals up here. Thanks again, team.

Others who served in the trenches on this project included Phil
Nickel, Jeanna Chodakowski, Dave Thompson, Theresa Narzniak, and Devi
Smith (later for you, Joe)--all of whom I thank for their contributions.
Becky Mulholland was the classic secretary doing it all, and Karen
Garlock and Kelly Campbell also helped a lot. Also thanks to Suzy
Pavick for typing this monster.

And the cast of thousands--teachers, aides, jury clerks, youth
workers, cops, counselors, administrators and service providers of
all stripes, kids, jurors, neighbors, pre-release residents-—the
stuff of which these data are made--I hope our research does some
justice.

Now to the individuals not directly involved with this project
who have helped me make it through. I'll single out three from many
more: Charlie Johnson, who has been there with just what I needed,
whenever asked; Bill Fairweather, who invented this strange and
marvelous enterprise of Ecological Psychology, which has been the
dominant shaping force throughout seven years of my life--and I'm
proud of it; and Don Davis--a great buddy, real smart guy, raconteur,
connoisseur, elegant hobo--Don's into a lot of good stuff.

For those reading this who've never struggled through the

Ecological Program, let me tell you, it's tough. I couldn't have

made it through without the constant support of my wife, Susan, who's
made some incredible sacrifices putting up with all of this; and my
parents, who have been parents in the best sense of the word. Thanks
also to Joe Bornstein for his intellectual and cullinary companionship,

and to our good friends Joe and Linda. Thank you, all.

iv

TABLE OF CONTENTS

Page
LIST OF TABLES ......................... ix
LIST OF FIGURES ........................ x
INTRODUCTION ......................... l
Modifying the Classical RD&D Approach ........... l
Fidelity and Adaptation .................. 5
Middle Ground Positions .................. 7
The Need for Measurement DevelOpment ........... lO
Precision and Accuracy in Measuring Fidelity ...... ll

Are Modified RDBD Programs Implemented with Fidelity?. . . l3
Are Different Programs Implemented with Equivalent

Levels of Fidelity? .................. l4
Are Programs Implemented with Fidelity Across Social
Policy Areas? ..................... l5
Is Fidelity Related to Program Effectiveness? ....... l5
Reinvention ........................ 16
Relevance of Reinvention to the RD&D Approach ....... 18
Summary of the Research Questions ............. l8
METHOD ............................. 20
Overview ......................... 20
Innovative Social Programs and Vehicles for
Their Dissemination ................. 20
Selection Criteria ................. 23
Description of Programs .................. 24
Sampling ......................... 24
Sampling Strategy ................... 24
Unit of Analysis .................... 26
Respondents ...................... 27
Measurement of Implementation Fidelity .......... 27
Measurement Development for the Telephone
Interview Instrument ................ 27
Preliminary Identification of Innovation
Components .................... 28
Preliminary Identification and Scaling
of Variations .................. ~ 30
Feedback Interviews with Developers ......... 3l
Pilot Tests ..................... 32
Data Collection for the Telephone Interview
Instrument ..................... 33

Reliability and Validity of the Telephone
Interview Instrument .................
Reliability .....................
Validity .......................
Measurement Development for the Site-Visit
Instrument ......................
Data Collection for the Site-Visit Instrument ......
Reliability and Validity for the Site-Visit
Instrument ......................
Reliability .....................
Validity .......................
Measurement of Reinvention .................
Measurement Development .................
Site-Visit Data Collection ...............
Content Analysis ....................
Development of Criteria and Procedures ........
Content Analysis: Coding Procedures .........
Empirically-Based Definition of Reinvention .......
Typology of Reinvention .................
Addition-Modification ................
Proactive-Reactive ..................
Extent of Reinvention ..................
Summary of Typology and Extent of Reinvention Methods . . .
Measurement of Effectiveness ................
Effectiveness Criteria .................
Data Collection .....................
Data Transformation ...................
Summary of Methods .....................

RESULTS .............................

Fidelity Per Se ......................
Overview ........................
Are Modified RD&D Social Programs Implemented

with Fidelity? ....................

Telephone Interview Results .............

Site-Visit Results ..................

Comparing Telephone and Site-Visit Results:

Visual Comparison .................

Comparisons Between Telephone Interview and

Site-Visit Results: Correlational and

Percentage Agreement Analyses ............

Correlations Between Data Sets ............

Percentage Agreement Comparisons ...........
Are There Differences in Fidelity of Implementation

Between Programs? ..................

Telephone Interview Results .............

Site-Visit Results ..................

Comparing Telephone and Site-Visit Results ......

vi

Page

34
34
34

35
35

62

63
63

64
64
67

69
69

69
7O

72
72

Are Programs Implemented with Fidelity
Across Social Policy Areas? ............
Telephone Interview Results ............
Site-Visit Results ................
Comparing Telephone and Site-Visit Results . . . .
Reinvention Per Se ...................
Overview .......................
Descriptive Analyses .................
Differences Between Programs .............
Sum of Instances .................
Weighted Instances ................
Unweighted Category Sums .............
Weighted Category Sums ..............
Differences Between Policy Areas ...........
Overall Reinvention Scores ............
Reinvention Categories ..............
Effectiveness Per Se ..................
Overview .......................
Types of Data ....................
Comparisons of Sample Sites with Demonstration
Sites .......................
Relationships Among Fidelity, Reinvention,
and Effectiveness ..................
Use of Site Visit Data ................
Tests for Non-Linearity ...............
Data Transformations .................
Standardized Fidelity Scores ...........
Standardized Reinvention Scores ..........
Normalized Effectiveness Scores ..........
Simple Correlations: Relationships Among the
Major Variables ..................
Partial Correlations .................

DISCUSSION ..........................

Question l: Are Modified RD&D Programs

Implemented with Fidelity at Adopting Sites? .
Question la: Are There Differences Between Sample

Programs of Fidelity? Question lb: Are There

Differences Between Two Policy Areas

(Education vs. Criminal Justice) on Fidelity?. . .
Questions 1, la, lb: Implications ..........
Question 2: To What Extent is the Fidelity

of Program Implementation Related to Program

Effectiveness? ..................
Question 2: Implications ..............
Question 3: What is a Useful Definition, and

What is a Useful Typology, for the Concept

of Reinvention? ..................

vii

94

95
95

~ 98

99

100

Page

Question 3a: Are There Differences in

Reinvention Among the Sample Programs? ...... 102
Question 3b: Are There Differences Between the

Educational and Criminal Justice Policy

Areas on Reinvention? .............. l02

Questions 3, 3a, and 3b: Implications ....... l02
Question 4: What are the Relationships Among

Fidelity, Reinvention, and Effectiveness? . . . . lO3

Question 4: Implications .............. 104

Future Research .................... lO7

APPENDICES
A. Examples of Components and Variations ....... 110
B. Descriptive Analyses of Reinvention Data ...... ll2

C. Descriptive Analyses of Effectiveness Data:
Types of Data, Summary Statistics, and
Comparisons to Demonstration Sites ....... 116

REFERENCES ......................... l3O

viii

LIST OF TABLES

Table Page
1. Summary of Methods .................... 6O
2. Descriptive Statistics: Fidelity ............ 65
3. Descriptive Statistics: Reinvention ........... 77
4. Modification and Addition by Program:

Descriptive Statistics .................. 79

5. Rank-Ordering of Partial Correlations with
Effectiveness ...................... 93

ix

LIST OF FIGURES

Figure Page
1. Innovative Social Programs Selected for Study ....... 25
2. Typology and Extent of Reinvention ............ 47
3. Program Effectiveness Criteria .............. 56
4. Mean Average-Item Fidelity Scores ............. 68

INTRODUCTION

Modifying the Classical RD&D Approach

The present research explores the viability of the "modified"
Research, Development, and Diffusion (RDBD) approach as a vehicle for
social innovation. The classical RD&D model (Havelock, 1976; House,
Kerins, & Steele, l972) was used as both a research paradigm and a
model for policy making. Those who used the model assumed that social
innovations should be developed through systematic research performed
at laboratories which specialized in RD&D, and should be evaluated
using both formative and summative methods prior to dissemination to
user sites. This classical model was popular among federal policy
makers in the l960's, and was partly inspired by the success of
federally-sponsored RDBD related to space exploration (House, 1981).

In transfering the classical model to social programming, potential
innovation ad0ption sites such as school districts, municipal
departments and local-level social agencies were assumed by researchers
and policy makers to value evaluation results highly, to make decisions
according to specified goals, and to act as relatively passive
consumers in the dissemination process. It was reasoned that if an
innovation was demonstrated to be effective through research,
disseminating information through printed media would be sufficient

to encourage adoption. Implementation of the innovation was assumed

2
to proceed automatically from adoption (Tornatzky, Fergus, Avellar,
Fairweather, & Fleischer, 1980).

These latter assumptions were called into question by several
studies of educational innovation conducted in the later 1960's and
throughout the 1970's (Berman & McLaughlin, 1978; Farrar, DeSanctis,
& Cohen, 1979; Fullan & Pomfret, 1977; House, 1975; House, 1981).
This body of research presented evidence that sites were not at all
passive receivers of innovations; instead, a myriad of organizational
factors were uncovered as potent influences on the extent of program
implementation. These included the extent to which local decision
makers mobilized broad-based support, used a "problem-solving" rather
than "opportunistic" mode of decision making, and planned ahead for
implementation (Fullan & Pomfret, 1977).

While these educational implementation studies were being conducted,
Rogers and his colleagues (Eveland, Klepper, & Rogers, 1977; Rogers,
1978) were studying the implementation of several other federally
sponsored social innovations, including the GBF-DIME computer-based
information system and the Dial-A-Ride transportation program. These
researchers were struck by the degree to which the innovations were
"reinvented" by sites to fit their specific needs and to provide a
sense of innovation "ownership".

In addition to the studies of educational innovation and the
work of Rogers and his associates, a third source of influence leading
to modification of the classical RD&D model was the theoretical work
of March, Simon, and their colleagues (March & Simon, 1958; Cyert &
March, 1963). These researchers argued that organizational decision

criteria were usually "satisficing", rather than "maximizing".

3
Instead of engaging in thorough searches for information and
elaborate comparisons of pro-and-con factors, decision makers were
recognized as bounded by a more limited, expeditious rationality.
In addition, organizations were conceived by these researchers as
composed of various coalitions of members, each coalition with
different and possibly conflicting goals (Cyert & March, 1963).

A fourth body of research contributing to the modification of the
classical RD&D model was the work of Havelock and his colleagues
(Havelock, 1976). As a result of a massive review of the innovation
literature, Havelock identified three dominant models for innovation
dissemination research and practice; the RDBD, Social Interaction and
Problem Solver perspectives. Both the Social Interaction and Problem-
Solving approaches are more "process" and "user" oriented when
compared to the "product" and "developer" orientation of the RD&D model.
The Social Interaction approach, developed primarily by rural
sociologists (Ryan & Gross, 1943; Rogers & Shoemaker, 1971), emphasized
the social relationships between disseminator and adopter, and the
importance of reference group identification (i.e., that the
disseminator be respected as an "opinion leader"). The Problem-Solving
approach, utilized primarily by organizational development theories and
practitioners (e.g., Lippit, Watson, & Westley, 1958) stressed diagnosis
of the adopter's needs and maximizing the use of the adopter's
resources (both personal and organizational) in the innovation process.
A hallmark of this approach was that change initiated by the adopters
themselves has the best prospects for long-term maintenance. Havelock

and his associates synthesized these three perspectives to produce the

4
"linkage" model of innovation dissemination. This approach attempted
to retain the best features of each perspective, keeping the systematic
development and evaluation aspects of the R080 model, while adding the
interpersonal and interorganizational emphasis of the Social
Interaction approach, and the attention to adopter needs and
organizational processes stressed by the Problem-Solving approach.
Thus the linkage model can be viewed as a modification of the
classical RDBD perspective.

Although these four bodies of research (the empirical educational
innovation and federal-program reinvention studies, and the
empirically-based theoretical works of March and Simon and Havelock)
can be viewed independently, their historical interrelationship is
clear. Havelock's work was widely read among social scientists, and
contributed to the conceptual frameworks utilized in the educational
innovation and reinvention studies, while Havelock's thinking was
shaped somewhat by the ideas of March and Simon.

These various bodies of research were influential at the
federal policy level in the modification of the classical RD&D model
(Datta, 1981). For example, the Office of Education's National
Diffusion Network, established in 1974, was designed to utilize "state
facilitators" as active change agents to assess the need of local
districts, to tailor change strategies to the district's political
climate, and to foster local support for the disseminated innovations
(Emrick, Peterson, & Agarwala-Rogers, 1977). In addition, local-
districts were encouraged to work cooperatively with research groups

in developing and evaluating their own site-generated innovations.

5

These would be eligible for dissemination under NDN auspices if
demonstrated to be effective. The NDN modifications of the classical
RD&D approach were directly inspired by Havelock's work (Raizen, 1979).
Also during the early 1970's, a similar program for encouraging
site-generated social innovations (the Exemplary Projects Program) was
established by the Department of Justice's Law Enforcement Assistance
Administration (LEAA). Although also stressing the importance of
interpersonal contact through site-visits, the LEAA did not establish
as extensive a network of linkage agents when compared to NDN.

‘ In sum, by recognizing that practitioners and local decision
makers were likely to have greater understanding of the political
realities and organizational nuances involved in successful
implementation when compared to researchers or centrally-located
bureaucrats, federal policy makers thus modified the classical RDBD
model. The attempt by the Office of Education to establish a
"Diffusion Network" with strong, relatively long-lived
interorganizational and interpersonal ties, and the emphasis of the
Exemplary Projects Program on site-visits to innovation developers
showed further recognition of the limitations of the classical model.
However, other elements of the RD&D model were retained, such as the
development of programs using scientific research and evaluation,

with funding initially channeled to specific development sites.

Fidelity and Adaptation y
Although these modifications were grounded in research, the
modified RDBD model has not gained the same level of acceptance as

the classical approach of the 1960's. Indeed, a number of writers

6
in the innovation area have argued for abandoning the RD&D model
altogether, in favor of a more decentralized, local problem-solving
approach (Berman, 1981; House, 1974).

The field of social innovation policy research can thus be seen
as divided into two opposing camps: "pro-fidelity" vs. "pro-adaptation"
researchers (Fullan & Pomfret, 1977). The former conceptualize
innovations as consisting of a number of relatively well specified
components. Those championing fidelity argue that rigorously developed
and evaluated programs should be implemented with close correspondence
to the validated models or else suffer the consequences of "dilution"
at adoption sites (Borden & Gomez, 1977; Calsyn, Tornatzky, & Dittmar,
1977). Dilution is expected to lead in most cases to reductions in
outcome effectiveness.

On the other hand, "pro-adaptation" researchers and practitioners
argue that differing organizational contexts and practitioners needs
demand on-site modification, virtually without exception (Berman &
McLaughlin, 1978; House et a1., 1972). For example, according to
Gephart,

A specific product or procedure is developed for a

particular purpose or function...(but)...typically,

purposes or functions differ from setting to setting...

(and)...although the ideal system would be one which had

the needed number and types of components universally

required...we seldom know enough in a design effort to

create all the component parts. (1976, pp. 5-6)

The roots of the "pro-fidelity" and "pro-adaptation" positions
in the previous literature and practice are clear. The fidelity

orientation corresponds to the RDBD approach, while the adaptation

position is rooted primarily in the Problem-Solving tradition. It

7
would seem that the Social Interaction viewpoint is neither inherently
pro-fidelity nor pro-adaptation, but does question two pro-fidelity
assumptions; specifically, that the adopter is a "passive consumer,"
and that the innovation product, rather than the innovation process,
is the most useful focus of attention.

Although several additional frameworks for organizing the innovation
literature have been devised (e.g., House, 1981; Yin, 1978), they
follow a remarkably similar pattern. Each identified an RDBD-type
approach as one perspective, and outlines additional perspectives
which are either directly antagonistic to this model or offer
alternatives which may co-exist with the RD&D approach, while
questioning several of its basic assumptions. Although these approaches
differ in the extent to which the RDBD assumptions are questioned,
all hold similar implications for public policy. For example, one
implication is that the freer users are to adapt programs to their
local needs, the more likely they are to adopt programs which last.

A second implication is that the more the program is modified to suit
the site, the more likely it is to achieve the outcomes desired by
users. An even more radical implication of these perspectives is

that instead of channeling initial program development funds to
specific developer sites, funding should instead be devoted to building

the capacities of the local sites to develop innovations independently.

Middle Ground Positions
In a recent article Berman (1980) has considerably advanced the
fidelity-adaptation debate by proposing a normative contingency model

for implementation strategy. Although the model was developed for

8
policy implementation, it applies equally well to program implementation.
This contingency model implies that different strategies for
implementation are most appropriate for different situations.

According to Berman,

There is no universally best wa to implement policy.

Either programmed (pro-fidelity) or adaptive implementation

can be effective if applied to the appropriate policy

situation...Policy situations are often so complex that a

mix of programmed and adaptive strategies might be more

effective than a simple choice between the two. (p. 206)

Berman suggested five situational parameters to be considered
when designing an implementation strategy: (a) scope of change
(incremental or major); (b) certainty of technology or theory;

(c) amount of conflict over policy goals and means; (d) structure

of the institutional setting (tightly vs. loosely coupled); and

(e) the environment's stability. He argued that relatively structured
conditions support the use of programmed (pro-fidelity) approaches,
while unstructured situations imply the use of adaptive strategies.
Thus, rather than taking a dogmatic pro-adaptation position, Berman
has outlined a sensible middle ground.

Another recent shift from a pro-adaption to a middle ground
position has been taken by House (1981), who argued that "a truly
comprehensive (innovation) strategy would view the (innovation)
situation from all three perspectives" (p. 39). The three perspectives
identified by House are the technological (RD&D), the cultural (similar
to Havelock's Problem-Solving perspective), and the political. (The
latter approach has some similarity to the Social Interaction approach.

However, by focusing on the conflict and negotiation aspects of the

innovation situation in his description of the political perspective,

9
House distinguished his categorization from the previous authors.)
House noted that,

These different frameworks...set limits as to what is

considered useful inquiry...They limit the very language

and concepts employed in the discussions and thereby give

a certain value slant. (pp. l9-20)

However, House goes on to argue that the three positions will
continue to coexist in research and practice, since each has a real
constituency:

...the technological perspective represents the interests

of those who sponsor innovation; the cultural perspective,

the interests of those who are "being innovated"; and the

political perspective, the negotiation of these interests...

It is significant that the three perspectives reflect the

viewpoints of dominant societal institutions. These

viewpoints have already been institutionalized within the

academic disciplines such as economics, engineering

(technological), political science and sociology (political)

and anthropology (cultural)...Can one of these perspectives

be "proved correct"?...It would not seem so. Each

perspective focuses on different aspects of reality, and

in fact values the same aspects differently. (p. 40)

Continuing with House's line of reasoning, both the political and
cultural perspectives imply the inevitability of adaptation. If
organizations are truly composed of various factions and sub-cultures,
each with different political interests and values, some adaptation
is necessary to resolve conflicts. Yet recognition of the complexity
of innovation processes need not obviate the modified RD&D model.

The question remains: Must adaptation prevent the implementation of
a social program with reasonably high fidelity?

The path to an empirical solution to this question was outlined
by Hall and his colleagues (Hall & Loucks, 1978). Taking yet a third
middle-ground position, Hall and Loucks argued that adaptation was

acceptable up to a measurable "zone of drastic mutation", beyond

10

which the innovation lost its integrity. Therefore, the issue became
empirically focused on measuring how much of what kinds of adaptation
have taken place. Following this line of thinking, adapting a program
to better fit its organizational context need not be anathematic to
the modified RD&D approach.

However, despite the clear good sense of the middle-ground positions
taken by Berman, House, and Hall, few decision-makers have heeded their
advice. Instead, as Berman notes, "advocates on both sides seem to be

throwing down the gauntlet" (p. 206).

The Need for Measurement Development

Although the pro-adaptation position has attracted an increasing
number of adherents in recent years (Datta, 1981), a close examination
of the principal studies used in its support shows that its
foundations are somewhat tenuous. For example, the widely-cited RAND
report on federal programs supporting educational change (Berman &
McLaughlin, 1978) found three dominant patterns for implementation:
mutual adaptation (when both project and setting were changed);
cooperation (when "the staff adapted the project...without any
corresponding changes in traditional institutional behavior or
practices"); and non-implementation. The RAND researchers reported
that "mutual adaptation was the only process leading to teachers
change," and "had a better chance of being effectively implemented"
than coopted projects. In addition, they reported a striking absence
of high fidelity adoption. In their words, ‘

A fourth process, which we call technological learning,

represents a situation in which the staff would acquire

skills in using a new educational method without adapting
the method to the reality of the user's setting. At the

11

extreme, "teacher-proof" packaged materials assume such

implementation. However, we did not observe any real

instance of technological learning. Instead, we found

that even highly technological or prescriptive projects

were either modified to suit local needs and interests or

were implemented in a superficial manner that destined

project materials for the schoolroom storage closets.

However, a closer look at the RAND methodology reveals the
absence of any bona fide measure of program fidelity. The RAND
researchers used as their implementation outcome measure "the extent
to which projects met their own goals, different as they might be
for each project" (Berman & McLaughlin, 1977, Vol. VII, p. 50).
Therefore, their implementation measure was quite imprecise, and was
biased to reflect adaptation, rather than fidelity. There is no way
to determine from these results to what actual extent programs were
modified or what components were changed. Additional doubts
concerning the RAND conclusions were raised by Datta (1981), who noted
that the "programs" examined were for the most part loosely defined

policy statements, rather than highly specified social programs.

Precision and Accuracy in Measuring Fidelity

 

In short, reviews of the implementation literature (e.g.,
Scheirer & Rezmovic, 1982) have suggested that considerable attention
to measurement development is presently required to advance the state
of implementation research. Refinements in both the precision and
accuracy of implementation measures are needed. The present research
thus focused a good deal of effort on measurement development. With
regard to obtaining precise specifications of innovation parameters
in order to adequately operationalize fidelity of implementation, the
present study followed the pioneering efforts of Hall and his

associates (Hall & Loucks, 1978; Hall & Loucks, 1981; Heck, Stiegelbauer,

12
Hall, & Loucks, 1981). Their basic methodology involves identifying
program components through extensive interviews with developers and
users, and reviewing written materials concerning the innovation.
Variations (different ways of implementing components) are then
identified and scaled as "ideal," "acceptable," or "unacceptable."
Interviews, observations, and examinations of documents may then be
used to determine which variations are implemented at sites. Specific
patterns of variations (configurations) may also emerge from this
process.

This methodological approach offered much promise and had begun
to be utilized by contemporary researchers in the field of educational
innovation (Crandall, 1979; Owens & Haenn, 1977). The basic approach
was thus employed in the present study, with some modification to
accommodate the scope and purpose of the research.

The development of measurement accuracy, as well as precision
is of critical importance to progress in this research area. Although
both interviews and observations have been utilized by Hall and his
colleagues to measure implementation, they recently reported that
"to date no formal study of the reliability between checklist data
(concerning which variations are implemented) obtained through
interviewing and checklist data obtained through observations has
been conducted" (Heck et al., 1981). In Scheirer & Rezmovic's more
recent review covering 74 studies of program implementation, 55 studies
were identified which used multiple measures of implementation, with

interviews and questionnaires being the predominant types of measures.

13
However, Scheirer and Rezmovic found that,

Of the 55 studies in which multiple implementation measures

were taken, 34 studies (62%) did not present any

comparative information on the extent to which data

obtained by different methods were in agreement...Twenty-one

of 55 studies (38%)...did compare findings from the different

measures. However, the comparisons were as often qualitative

and judgemental as they were quantitative. Further, biases

were not necessarily reduced by the use of multiple

measurement techniques...Based on the available data, we

cannot make conclusions about the relative usefulness]

meaningfulness/validity of data obtained by different

measurement techniques. (pp. 38-40)

The present study therefore devoted attention to this issue by
constructing two forms of each measure: a telephone interview form
which was administered using telephone interviews, and a site-visit
form which involved site-visit interviews, observations, and examinations
of archival data.

Accuracy in the measurement of fidelity was also assessed by
(a) checking inter-rater reliability periodically during both telephone
interview and site-visit data collections; and (b) checking agreement
among multiple respondents (telephone interview) and comparing data
from multiple information sources and the consensus ratings of the

research team at the site.

Are Modified RD&D Programs Implemented with Fidelity?

In sum, although a policy shift towards a pro-adaptation or local
problem-solving model has already begun, the modified RD&D model has
never been examined in a sufficiently rigorous and comprehensive
manner to support or refute such a shift. The present research was
designed to provide evidence concerning the viability of the
modified RDBD model. The major research question to be addressed in

order to test the model was the following:

14
1. Given a relatively precise and accurate operationalization
of program fidelity, to what extent are social program
innovations which are developed and disseminated according
to the modified RD&D approach actually implemented with
fidelity at adopting sites?

In other words, this research question addressed the basic
assumption of the model that programs can be adapted with reasonably
high fidelity at user sites, an assumption which is questioned by
much of the current literature. The next few sections discuss other

issues related to this major research question.

Are Different Programs Implemented with

 

Equivalent Levels of Fidelity?

 

An adjunct question concerns possible differences in fidelity
between the specific programs chosen for study, in addition to
possible differences due to the social policy area. A large number
of variables could be hypothesized to account for such differences
between programs independent of differences between social policy
areas. For example, differences in the degree of technical assistance
provided by developers, or differences between programs concerning
the amount of change in organizational practices they require could
lead to differences in fidelity between programs within a social
policy area. Therefore, an adjunct to Research Question 1 was:

1a. Are there differences between the fidelity of the

specific programs chosen for study in the present

research?

15
Are Programs Implemented with Fidelity
Across Social Policy Areas?

As a second adjunct question to Research Question 1, the present
study tested for differences in fidelity between programs in two
different social policy areas: education and criminal justice. This
was considered an adjunct to Question 1 since it addressed the issue
of external validity (Campbell & Stanley, 1966) of fidelity results
across the dimension of social policy area. This adjunct was stated
as:

lb. Are there differences in fidelity between a sample of

educational innovations vs. a sample of criminal justice
innovations?

The fields of education and criminal justice were chosen for
two major reasons: The federal Departments of Education and Justice
are presently the most visible users of the modified RDBD approach in
the social policy area; and, they each have been using the approach for
over five years, enabling full implementation at a number of adopting

sites.

Is Fidelity Related to Program Effectiveness?

A second important assumption of the modified RD&D model is that
program fidelity is related to program effectiveness. It is assumed
that the more an implemented program resembles the original "validated"
implementation, the greater the likelihood that effectiveness outcomes
achieved at the original site will also be achieved at the user site.
Although Scheirer and Rezmovic's review (1982) reported results

supporting this assumption, the sample of studies was too small to

16

permit generalization (only 11 studies measured both extent of
implementation and program effectiveness). Also, Scheirer and Rezmovic
did not attempt to isolate programs disseminated using the modified
RDBD approach. Therefore, the verification of this assumption has
yet to be demonstrated empirically. The examination of this
assumption was addressed by the second research question:

2. To what extent is the fidelity of program implementation

related to program effectiveness?

Reinvention

 

As noted above, "reinvention" was introduced by Rogers and
colleagues (Eveland et al., 1977; Rice & Rogers, 1980) to capture
the flavor of an active process of change at user sites. The term
"reinvention" brings to mind the phrase "Not Invented Here," a
common phrase used in both public and private sector organizations
to describe the rejection of outsiders' ideas simply because they
originated outside the organization. Such ideas must be "reinvented"
to counter the "Not Invented Here" syndrome.
However, despite the potential usefulness of the term "reinvention,"
the research by Rogers and his associates may not be generalizable to
modified RD&D innovations, since the programs examined by Rogers and
his colleagues were disseminated with low component specificity and
explicitness (that is, the specifications of components were relatively
sketchy and incomplete, and the components were not disseminated in
explicit, "concrete" terms). Such programs may behave quite

differently from programs which are more "well-in-hand" (Gephart,

1976). It is therefore fruitful to consider what the concept of

17
reinvention may add to the conceptualization of RD&D innovations.
Perhaps the concept of fidelity alone more parsimoniously accounts
for the salient phenomena (Taylor, 1980), and reinvention is simply
an unnecessary synonym for low-fidelity implementation. Alternatively,
there may be a need for a concept in addition to "fidelity" to
accurately describe implementation.

Rather than attempting to define the concept a priori, the
strategy used in the present study was to collect case study notes on
every variation that differed in any way from the variations listed
in the fidelity instrument. These qualitative data were later
content-analyzed to determine the most comprehensive and meaningful
definition of reinvention. Content analysis was also used to categorize
instances of reinvention and determine the frequency of occurrence of
different types of reinvention.

This empirically based examination of the "reinvention" concept
can be summarized as an attempt to answer the following research
question:

3. Given the present innovation literature and the data base
from the present study, what is the most useful definition
of reinvention, and what is the most useful and accurate
typology of reinvention as it is practiced by adopters of
modified RD&D programs?

In parallel to the research strategy outlined for fidelity,

secondary questions are:

3a. Are there differences in extent of reinvention among the

sample programs?

3b. Are there differences between the educational and criminal

justice policy areas on extent of reinvention?

18
Relevance of Reinvention to the RD&D Approach

Once reinvention has been defined in useful terms as a distinct
concept, the relationship between reinvention and other concepts
becomes a meaningful issue. As stated above, one assumption of the
RD&D model is that program fidelity is positively related to program
effectiveness. Understanding the relationship among reinvention,
fidelity, and effectiveness also has bearing on the modified RD&D
model. The specific relevance will depend on the definition of
reinvention selected as a result of content analysis. However, there
should be implications for the model no matter what definition is
selected. For example, the empirical relationship between
"non-component-based" changes and program effectiveness would have
implications for dissemination policies, concerning whether or not
such changes should be encouraged. The possibility that reinvention
functions as a mediating variable in a causal model is also worthy of
consideration. In other words, the relationship between fidelity
and effectiveness may depend on the extent of reinvention (however
defined). In that case, administrators and practitioners might well
consider policies towards reinvention based on this relationship.

In light of these considerations, the fourth research question was:

4. What are the relationships among fidelity, reinvention, and

effectiveness?

Summary of the Research Questions

 

In summary, this research is an empirical examination of the
modified RD&D approach to the dissemination of innovative social
programs. The research questions which guided the study are the

following:

19

Given a relatively precise and accurate operationalization

of program fidelity, to what extent are social program

innovations which are developed and disseminated according

to the modified RD&D approach actually implemented with

fidelity at adopting sites?

1a. Are there differences between the fidelity of the
specific programs chosen for study in the present
research?

lb. Are there differences in fidelity between a sample of
educational innovations vs. a sample of criminal
justice innovations?

To what extent is the fidelity of program implementation

related to program effectiveness?

Given the present innovation literature and the data base

from the present study, what is the most useful definition

of reinvention, and what is the most useful and accurate

typology of reinvention as it is practiced by adopters of

modified RD&D programs?

3a. Are there differences in reinvention among the sample
programs?

3b. Are there differences between the educational and
criminal justice policy areas on reinvention?

What are the relationships among fidelity, reinvention,

and effectiveness?

METHOD

W

In order to provide an examination of the modified RDBD approach,
the present research involved the development of fidelity and
reinvention measures, and the utilization of these measures to collect
data on innovation implementation. Data were collected in two
phases: a telephone interview phase, which employed telephone
interviews; and a site-visit phase, which utilized observations,
interviews, and reviews of archival data at the implementing sites.
During the site-visit phase, data was also collected on program
effectiveness. Existing instruments and records were the sources for
these data. These included instruments with established reliability
and validity (such as standardized achievement tests) and various
archival measures (such as recidivism and indices of organizational
efficiency, e.g., juror usage index).

The report of research methods begins with a description of the
social programs which were studied and the sampling strategy which was
utilized. Following these descriptions, the fidelity and reinvention
measures will be described, in terms of both measurement development
and data collection procedures, respectively.

Innovative Social Programs and Vehicles For

Their Dissemination

 

In general, the term "innovative social program" refers to new

ways of doing things which are primarily intended to change people

20

21
and/or change the way they interact. Such programs are also usually
intended to have as their goals various benefits to society. Social
innovations can be considered as covering one end of a continuum. At
the other end would be placed material innovations, which are primarily
designed and perceived to change physical aspects of systems such as
organizations or communities. These may or may not have as their
goal a social benefit.

In order to provide an adequate examination of the modified RD&D
model, it was required that the social innovations selected for study
fit the assumptions of RD&D thinking. That is, (a) innovations were
required to be developed at a single site or small number of sites
using scientific methods of development and evaluation; (b) it was
required that some active process (e.g., site visits, conferences, or
technical assistance) had been used to disseminate the innovations;
and (c) the form in which each innovation was disseminated was required
to be relatively well-specified and explicit. For example, site
visits, training sessions and manuals were required to be relatively
complete, covering the essential elements required to implement the
innovation; and, the elements were required to be described in
relatively concrete terms.

The National Diffusion Network (NDN) of the federal Department of
Education and the Exemplary Projects Program of the federal Department
of Justice provided two rich sources of suitable programs typical of
the recent use of the modified RD&D approach. The use of these two
dissemination vehicles also enabled comparisons between implementation

in two diverse social service fields.

22

The NDN was created in 1974 as a vehicle for the development and
dissemination of innovation educational programs. Programs which are
developed by local school districts(frequently in cooperation with
university departments of education or other research facilities) are
submitted to the Department of Education's Joint Dissemination Review
Panel (JDRP) for review, following formative and summative evaluation
at the developer site. Approved programs are considered "validated"
and enter the dissemination network. Full-time change agents known
as State Facilitators perform various tasks to encourage dissemination,
such as organizing awareness sessions (at which developers explain
their innovations) and site visits to the schools at which innovations
were developed. Facilitators also answer inquiries concerning
validated programs from schools and local districts. Each state has
at least one facilitator, while some of the more populous states have
more than 10 facilitators.

The Exemplary Projects Program also validated programs using a
review panel (The Exemplary Projects Review Board). Programs may be
proposed for consideration by the operating agency, local government or
criminal justice planning unit, State Planning Agency or Law Enforcement
Assistance Administration (LEAA) office. For the period of May, 1976
to June, 1980, the major active dissemination mode for the Exemplary
Projects Program was the site visit to the developer's program. Such
site visits were managed by an auxiliary Justice Department program (the
"Host Program"). This program arranged travel logistics and proVided
per diem and travel expenses to site visitors. In addition to these

active dissemination methods, both NDN and the Exemplary Projects

23
Program use printed material extensively. NDN publishes a catalog
(listing over 150 programs. The catalog is cross-referenced and fully
indexed, and contains a complete list of State Facilitators. The
Exemplary Projects Program publishes a similar catalog listing 35
programs, as well as a detailed manual for each innovation.

Selection Criteria. In order to select a subset of the many
NDN and LEAA programs for study, the following criteria were used:

1. Potential availability of effectiveness data at sites (to
enable data analyses involving program effectiveness).

2. Potential for at least 20 site adoptions per program (to
provide sufficient statistical power to detect significant relationships).
The programs were required to have been disseminated long enough to
allow for implementations of at least two years, with a sufficiently
extensive operation which could result in 20 adoptions.

3. "Organization-wide" quality of the program. This criterion
was required since the research issues concern organizational rather
than individual innovation implementation. Subcriteria included: (a)
the program could not be implemented by only one staff member (teacher,
caseworker, judge, etc.) or one organizational subunit (classroom,
single courtroom in a multicourt system, single neighborhood group in
an organization of neighborhood associations, etc.); and (b) the program
should require some relationships between the implementing organization
and its surrounding community.

These criteria were applied by a team of seven researchers.
Materials for each innovation disseminated by the NDN and the Exemplary

Projects Program were read by two individuals and independently rated

24
on the selection criteria. Ratings were then discussed by the
entire group. This procedure resulted in the selection of seven
innovations, three from the NDN list and four from the Exemplary

Projects Programs.

Description of Programs

 

Figure 1 contains descriptions of the seven innovative programs

selected for study.

M
Sampling Strategy

 

In order to maximize the external validity of the research results,
an attempt was made to randomly sample from the population of
organizations appropriate to each of the innovative programs. Lists
were obtained from national research offices which contained the
populations of the appropriate organizations (schools, courts, and
police departments). Three percent samples were randomly generated from
these population lists, and these samples were in turn randomly sampled
to produce a test sample of 100 organizations. Telephoning these 100
organizations resulted in the identification of only two organizations
which claimed to have adopted any of the programs; only eight of the
organizations in the sample had even heard of the innovations. It
thus became apparent that this sampling strategy would not efficiently
yield a sufficient number of adopters. Random sampling was abandoned,
and a purposive strategy was employed.

Lists were obtained from the program developers and related agencies
(e.g., state planning agencies, NIJ-Hosts Programs, Center for Jury

Studies). These lists were randomly sampled until approximately 15-30

25

Education

1. HOSTS (Help One Student to Succeed)--A diagnostic, prescriptive, tutorial
reading program for children in grades 2-6. Tutors are community volunteers
and cross-age students. The program includes “pulling out" students from
their regular classes at least B h0ur per day.

 

2. EBCE (Experience Based Career Education)--This program provides career experi-
ence outside of school at volunteer field sites for the student. Each career
site is systematically analyzed for its educational potential. Students'
career and academic abilities and interests are systematically assessed.
Individualized learning plans which integrate career experiences and academic
learning are utilized. Programs typically take students from grades 11-12,
although some also accept students from 9-10.

 

3. FOCUS (Focus Dissemination Project)--A “school within a school" for disaffected
junior and senior high school students. All students are required to partici-
pate in a support/problem solving group of 8-10 students and one teacher.
Behavioral contracting and a governing board with student representatives are
important features. Classes in the Focus program involve individualized, self-
paced instruction.

 

Criminal Justice

 

4. ODOT (One Day/One Trial)--A jury management system that calls in a certain number
of potential jurors per day. Potential jurors come in for that day and if not
selected to serve in a trial have completed their obligation. Jurors who are
selected serve the length of the trial.

 

5. CAP (Community Arbitration Prgject)~Juvenile offenders are sent to a formal
arbitration hearing run by the court intake division, rather than to courts.
Juveniles have the specific consequences of their actions explained to them
with parents and victims frequently present at hearings. Youths are then
typically given a number of hours of informal supervision usually involving
work in the community. Restitution is also frequently required.

 

6. SCCPP (Seattle Community Crime Prevention Program)—-This program is a three
phase attack at residential burglary. It involves the setting up of a neighbor-
hood block watch through proactive targeting of neighborhoods, property marking
and inventory, and home security inspections.

 

7. MCPRC (Montgomery County Pre-Release Center)--Involves the setting up of a
residential facility separate from the prison. This facility should be in the
community from which most of the inmates are drawn. Inmates are encouraged to
work so that they will have a job when they are released. Counseling, social
awareness instruction, and behavioral contracting are also part of this pro-
gram.

 

Figure 1. Innovative Social Programs Selected for Study

26

adopters were identified for each program. (The original goal of 20
programs for each innovation could not be achieved since three programs
had fewer than 20 total adopters. Consequently, the number of sites
was increased for other innovations to maintain a total N of 140
adopting sites.) This was the sample used for telephone interview
data collection.

Following the telephone interviews, a subsample of the organizations
interviewed were selected for the site visits. Ten organizations from
each of the seven innovations were chosen to be site-visited, resulting
in a site-visit sample size of 70 organizations. Two criteria
influenced this subset selection process. The most important criterion
required selecting organizations that exhibited a range of fidelity
scores which were calculated from the telephone data. Thus, for each
innovation, three organizations were selected from one standard
deviation above and below the mean within-innovation fidelity score,
and four were selected from the mid-range. This resulted in ten sites
that varied from high to low on fidelity. The second criterion was the
location of the site. A broad geographic distribution was sought for
the sample to maximize generalizability.

Unit of Analysis

 

The unit of analysis was the organization in which the program
was housed. In some cases, this differed from the organization which
made the adoption decision, since implementation in these cases
entailed creating a new organization or subcontracting to another
agency. For example, in one case a crime prevention program was

adopted by a police department and later moved to the town's Bureau

27

of Neighborhood Associations; in several cases, alternative schools
were created to administer and house Experienced-Based Career Education
and FOCUS programs.

Note that schools, rather than districts, were considered to be
the units of analysis in education. Preliminary discussions with
program disseminators revealed that programs were truly "implemented"
at the school level; within-district differences between implementations
were likely to exist.

Respondents

 

Respondents for the telephone interviews were persons who were
identified as "most familiar with the day-to-day operations of the
program" by organizational gatekeepers (e.g., secretaries and clerks),
and who proved to be familiar with operations after preliminary
interview responses. During the site-visit data collection phase,
additional respondents were interviewed and observed. An attempt was
made to include at least one respondent from each relevant class of
actors at each site. For example, site visit data collection for the
Experience-Based Career Education program involved interviews and
observations with students, aides, secretaries, teachers, counselors,

resource people ("employers") and school administrators.

Measurement of Implementation Fidelity
Measurement Development for the Telephone Interview Instrument
The five step approach for developing a fidelity instrument
proposed by Hall and Loucks (1978) was utilized, with several

modifications to suit the scope and purpose of the study.

28

Preliminary Identification of Innovation Components. Hall and
Loucks (1978) found that innovation developers and users had differing
opinions concerning an innovation's components. Further, Leithwood
and Montgomery (1980) noted that the vested interests of different
organizational roles influenced judgments concerning the innovation's
components, and they suggested interviewing developers, administrators,
change agents, and practitioners to get the most complete and accurate
list of program components.

However, the purpose of the present study involved testing the
viability of the modified RD&D model and obtaining a comprehensive
description of the innovation as disseminated, rather than attaining
a comprehensive description of the innovation in practice. It was
therefore decided to limit the sources for component identification
to those individuals who were involved with the program before it had
an opportunity to be modified or reinvented at adopting sites.
Although this did not result in a complete description of the
innovation as it is actually used at implementing sites, it did
obtain the most accurate picture of the innovation as it was originally
researched, developed, and disseminated, prior to modification and/or
reinvention. Thus, the sample of respondents for component
identification was limited to staff members of the developing
organization and users and administrators at the original site or an
"initial adopter" site.

Each developer organization and original or initial adopter site
was visited by two members of the research team. Several staff

members, users, and administrators were interviewed for each innovation.

29

Interviews were tape-recorded and content-analyzed to identify
components. This protocol represents an extension of the strategy
proposed by Hall and Loucks (1978). This protocol had been pilot-tested
prior to visiting the innovation developers, by interviewing a program
developer at a local social service agency.

All written materials and tapes were independently content-analyzed
by two researchers for each innovation to identify components. The
components were selected to conform to the following criteria: (a)
preferably, the component was an observable activity, material, or
facility. If not observable, the implementation of the component was
verifiable through interviews with staff members and clients of the
implementing organization; (b) the component was logically discrete
from other components, and wherever possible, did not depend on the
implementation of other components; (c) the component was "innovation-
specific;" practices which were common to other programs in the
organization were not considered components; and (d) the list of
components exhaustively described the innovation. Following
identification of components, each researcher also attempted to group
the components in the most heuristic categorization scheme possible.

Following the independent content analyses for each program, a
third researcher joined each original pair of researchers to arbitrate
disagreements. Thus for each innovation, three researchers reviewed
components to maximize conformity to the criteria. This procedure
resulted in a list of components for each innovation, with components
grouped in heuristic categories. (Examples of such categories include
"Assessment and Planning," "Training," "Staff-Organization Relationships,"

"Community Involvement," "Staff Functions," "Materials," etc.)

30

Preliminary Identification and Scaling of Variations. The
methodology pioneered by Hall and his associates for measuring
implementation requires the identification of "variations" for each
of the innovation's components. These variations are scaled as
"ideal," "acceptable," or "unacceptable." Thus, fidelity is not
measured simply by the number of components implemented at the user
site, but instead can be represented by a "fidelity score" which
reflects the dimension of component variation at the site.

Hall and Loucks (1978) recommended interviewing approximately
10-20 individuals with different role positions at different user sites
in order to identify variations of components. However, the scope of
the present study (seven innovations and 15-25 sites per program) and
consequent resource constraints prevented the use of this strategy.
Instead, it was decided to have those researchers who had visited the
original innovation sites generate variations, with subsequent additions
and modifications to be made based on pilot interviews and interviews
with the innovation developers. In generating variations, the researchers
attempted to list discrete, observable, and quantifiable alternatives.
Variations which could not be observed were required to be verifiable
through interviews with staff members and clients of the implementing
organizations.

Although generation of at least one midpoint ("acceptable")
variation for each component was attempted, a number of components were
dichotomous in nature, and creating a midpoint value would have been
unrealistic. Thus, some components had only two variations; an

ideal/acceptable variation and an unacceptable variation. For example,

31

the HOSTS reading program disseminated use of a specific
cross-referencing index. Use of any other index, or use of no

index, was clearly unacceptable. As another example, the Seattle
Community Crime Prevention Program required a highly proactive staff
approach. Consequently, block watch meetings were scheduled by staff.
Scheduling of these meetings by any other person (e.g., block residents
or community leaders) was unacceptable.

Following identification of variations by researchers, program
developers were interviewed to verify that these were indeed realistic
ways of doing the programs. The procedure for these interviews are
discussed in the next section.

Feedback Interviews with Developers. In order to check the accuracy

 

of the researchers' preliminary identification of components and
variations, the staff members of developer organizations who were
interviewed previously were recontacted. This second contact involved
sending two lists to each staff member; a list of components and a
list of variations. The respondents were instructed to review the

two lists independently. When reviewing components, respondents were
instructed to consider whether each component was or was not "relevant
for saying that the program has been implemented."

The innovation variations generated by the research team were
reviewed by developers with the f01lowing questions in mind (regarding
each component-specific set of ideal--unacceptable variations): "Are
these variations realistic? Do they describe the possible -
implementations of my program completely, or are there other important

variations which should be included? Are the researchers correct in

32

their labeling of variations (as ideal-~unacceptable)?" These
instructions and lists were sent to the individual primarily responsible
for developing and/or evaluating the innovation at each developer
organization. Four of these organizations (two each for education and
criminal justice innovations) were sent duplicate lists to be reviewed
by additional staff members.

After the lists were reviewed by developers, these individuals were
interviewed by telephone. During these interviews, each component and
each variation was reviewed by the interviewer, and responses were
solicited. The feedback of developers concerning the preliminary
identification of components and variations was thus obtained, and
appropriate modifications and additions were made to the lists.

In sum, the researchers' identification of components and variations
followed by the feedback interviews with developers resulted in a list
of components and scaled variations for each innovation. These lists
comprised the telephone interview fidelity instrument. Examples of
components and variations appear in Appendix A.

Pilot Tests. The instrument was pilot-tested on thirteen adopter

 

sites. One innovation had a total of only eleven adopters and
therefore only one pilot interview was attempted. The remaining six
innovations had two pilot interviews per innovation. Sites selected for
piloting were adopters which had implemented the innovations for less
than two years, and thus did not have an opportunity to achieve full
implementation. 1

The general procedure for pilot testing involved a pair of

researchers. One researcher administered the telephone interview and

33

coded responses, while the other listened, coded responses, and made
notes concerning improvements which could be made in the interview
process. Following the interview, the two researchers compared their
coding results and discussed disagreements. Protocols for coding
difficult items were developed during this period, and the interviewing
team (consisting of five researchers) met periodically to review these
protocols.
Data Collection for the Telephone Interview Instrument

The telephone interview fidelity instrument was administered by
means of a semi-structured telephone interview. The rationale for
using this method was the following: It was anticipated that respondents
would be aware of the program developers' attitudes concerning the way
the programs "ought" to be implemented, and that respondents would
wish to appear to be high-fidelity implementers. Although interviewers
intended to inform respondents that they were "not being evaluated," it
was expected that some skepticism and mistrust would remain. Also,
it was anticipated that a long series of closed-ended questions would
lead to considerable fatigue both on the part of respondent and
interviewer. Consequently, to minimize the effects of evaluation
anxiety, social desirability, and fatigue, respondents were asked
open-ended questions, first about a category (heuristic grouping) of
components, then about the specific components. Responses to these
questions were coded on the closed-ended instrument (Appendix A).
For example, to obtain specific information concerning the selection
and entry procedures used for the FOCUS program, the respondent would

be asked to describe selection and entry procedures in their own terms.

34
If sufficient information was not elicited concerning a particular
component within the category entitled Student Selection and Entry,
an open-ended question would be asked for that component (e.g., "who
refers students to your FOCUS program," rather than "which of the
following refer students to your FOCUS program: Teachers? Administrators?
Counselors? Parents?..."). Interviews were administered such that each
interviewer collected data on approximately the same number of sites
per innovation. Responses were machine-scored for computer analyses.
The length of the interview ranged from 45 minutes to four hours. Final
N for data analyses was 129 sites.
Reliability and Validity of the Telephone Interview Instrument

Reliability. The reliability of interviewers was measured by

 

conducting ten percent of the interviews with a second researcher
listening to the interview, and both researchers coding the data.
Reliability was computed as the percentage of exact agreement between
interviews. The overall reliability figure was .86. Care was taken
to counterbalance coder pairs such that 14 of the 20 possible coder
pairs were utilized in reliability testing.

Validity. A major validity issue was the potential for disagreement
between respondents at the same site concerning the fidelity of
implementation. This was considered to be a validity issue since it
concerned whether or not the respondent was conveying a true picture
of the organizational phenomena rather than his or her own perceptions.
Although agreement between sources of information may be considered to
be a reliability issue, others (e.g., Withey, Daft, Cooper, 1983) have
taken agreement between organizational respondents as a reflection

of the external validity of the measure, i.e., to what extent one may

35

generalize from the results of the particular study (Cook & Campbell,
1979).

The extent of agreement was measured by interviewing both a
"primary" and "secondary" respondent at ten percent of the user sites.
An attempt was made to select secondary respondents who were of the
same job level as the primary respondent, and equally familiar with
the day-to-day operations of the program. These secondary respondents
were usually nominated by the primary respondents. Unfortunately,
the interviews revealed that in some cases secondary respondents were
not as familiar as the primary respondents with the operations of the
programs, and this tended to underestimate the validity figures for
the instrument. Given this problem, the validity figures were
considered to be acceptably high with a mean percent agreement (between
respondents) of .74.

Measurement Development for the Site-Visit Instrument

The site-visit instrument was intended to be a parallel form to
the telephone interview instrument, with guides for obtaining additional
data whenever possible. The instrument listed for each component the
relevant actor(s), key words identifying the component, interviewing
probes, observables (activities, actions, materials, and facilities),
and item anchors.
Data Collection for the Site-Visit Instrument

This procedure involved two pairs of researchers traveling to the
sites selected for the site-visit sample. Each pair visited 35 sites,
and spent two days at each site. Data collection consisted of interviews

with respondents from several role positions at each site, observations

36

of pertinent activities and facilities (e.g., block-watch meetings,
arbitration hearings, juror orientations, interactions among teachers,
aides, and students, etc.), and examinations of archival records.
Reliability and Validity for the Site-Visit Instrument

Reliability. A similar procedure to that used for the telephone

 

interview instrument was employed. At 13 of the 70 visited sites
(19%), both researchers at the site interviewed the same respondents.
observed the same activities, and examined the same documents. Forms
were coded independently, and results were compared. Using the
percentage of exact agreement method, an overall reliability of .81
was achieved.

At the sites which were not included in the reliability sample,
each researcher interviewed, observed, and examined different data
sources. At these sites, just as at the reliability sites, the
researchers coded the data independently. However, at the
non-reliability sites, researchers discussed their reasons for coding
before making final decisions. Thus the best data available to both
researchers were used in coding. At the reliability sites, these
discussions took place after the two codings were compared, so that a
"best" scoring for each component could be determined and recorded.

Validity. Since site-visit data collection involved interviewing
as many respondents as were available at the site (who were familiar
with the program's implementation), the site-visit phase provided many
more opportunities to check agreements between respondents when compared
to the telephone interview phase. Site-visit data collection also

provided numerous opportunities to check for agreement between

37

informants' responses and researchers' observations. Consequently, a
different strategy was employed for checking such agreements, compared
to the telephone interview strategy. Instead of recording responses
from two respondents at 10% of the sites, data-source comparison for
the site-visit phase was achieved by computing the percentage agreement
between the researchers' consensus ratings and all the various data
sources for each component, for all components on which multiple
sources of data were available. (Seven thousand and sixty-six out

of 9214 total data sources, or 77%, were multiple sources, i.e., at
least two different sources [respondents and/or observations and/or
materials] were consulted for coding these components.) The overall
percentage agreement between these data sources and the consensus

ratings was .96.

Measurement of Reinvention

 

The intent of this research with regard to reinvention was twofold:
(a) to develop a useful, meaningful, and empirically based definition
of reinvention which distinguished the term to the greatest extent
possible from such related terms as modification and lack-of—fidelity;
and (b) to develop a typology of reinvention which could meaningfully
and usefully categorize the data set. To this end, an inductive,
exploratory methdology was utilized. This consisted of recording
each instance of supposed reinvention in case study notes, and then
subjecting the notes to content analysis.
Measurement Development

Information gathered during the telephone interviews was used to

refine the conceptualization of reinvention to be employed during the

38

site-visits. The following three questions were asked following the
coding of each content category of program components: "(a) With

regard to the issues we've just been discussing, have we missed anything
or are you doing anything in addition to these activities? (b) Is there
anything you are doing in this area that you consider unique or
different?, and (c) Have you changed anything?"

These questions were intended to orient research team members to
the types of changes which could be made for each program, and to
determine to what extent respondents would be willing to discuss
changes in the programs. It was felt that the level of detail needed
to support a worthwhile content analysis was beyond the scope of the
telephone interviews, especially given the considerable amount of
time already devoted to fidelity in the interview. Also, the
thinking of the research team concerning reinvention was quite
primitive at this point, and changed with each discussion of
the issues. Consequently, recording data on reinvention to be
content-analyzed was reserved for the site-visit phase. During the
telephone interview phase, responses to the three reinvention questions
were discussed by the researchers, but were not recorded in detail
nor content-analyzed.

Site-Visit Data Collection

 

On the basis of the discussions of responses to the telephone
interview reinvention probes, it was decided to tentatively define
reinvention, for the purpose of data collection, as "all instances of
change in programs which cannot be coded using the fidelity A

instrument." Implicit in this definition was the intent to employ a

39

variance, rather than a process, methodology (Mohr, 1978). In other
words, "instances" of reinvention were identified and analyzed, rather
than "events." This was necessary in order to provide consistency
with the variance approach used to measure fidelity, and to enable
correlational analyses relating reinvention to program fidelity and
effectiveness.

The two site visit teams were instructed to probe, in an
unstructured manner, all changes which were mentioned by respondents or
observed at the sites which could not be coded as ideal, acceptable,
or unacceptable variations on the fidelity instrument. Immediately
following each site visit, the research teams tape-recorded descriptions
of these changes. Following the site visits, these tapes were
transcribed into several hundred pages of notes. These notes were
then content analyzed.

Content Analysis

 

Development of Criteria and Procedures. An iterative process was

 

utilized by the two site visit teams to develop content analysis
criteria and procedures. Concepts were tentatively defined; tested
formeaninnglness, usefulness, and discreteness; redefined, i.e., by
combining some concepts, refining others, and abandoning still others;
and then tested again. The process was repeated until the
classification system satisfied the criteria of exclusivity, inclusivity,
and meaningfulness (Warwick & Lininger, 1975).

The first stage of this procedure involved examining the
transcriptions for 12 cases (sites). These twelve cases were
selected using two criteria: (a) the case contained many potential
instances of reinvention; and (b) the case required interesting and/or

difficult content analysis decisions that, through their resolution,

40

would contribute to the development of the analysis scheme (definition
and typology of reinvention). For example, a Community Crime Prevention
Program site located in a city with very different demographic and
geographic conditions than the original program model was selected;

a Community Arbitration Program site with a complex intake system
dependent on interorganizational relationships not found in the

original model was also selected.

The first step in the content analysis involved independent
reviews of the case notes by each member of the four-person research
team. Each individual reviewed three cases (each of which he had
visited, and each from a different innovation) and attempted to
develop potential definitions and typologies. In attempting to define
reinvention, researchers were instructed to first identify instances
which they did and did not want to call "reinvention." They were then
instructed to articulate the reasons for their decisions. At the
same time, they were instructed to generate typologies which could
accommodate various concepts already prevalent in the literature (for
example, see Rice & Rogers, 1980; Larsen & Agarwala-Rogers, 1977).

Following a team discussion of these initial analyses, the cases
were exchanged. During this second stage (and all subsequent stages)
the researchers worked in pairs. Cases were assigned to pairs such
that one and only one member of each pair had site-visited each case
reviewed by that pair. This enabled each pair to have first-hand
knowledge of each case it reviewed. The tasks during this stage were

identical to the first stage.

41

Following a discussion of the potential definitions, typologies,
and criteria for decision-making which emerged from this stage, a
tentative "best" scheme was identified for reliability testing. The
third stage of analysis involved each pair using this scheme to
analyze four cases. (The eight most difficult cases were used in this
stage). Following this analysis, the team again discussed their
decisions and criteria, resulting in a final scheme to be used for
data analysis.

Parallel to this four stage process was a decision process aimed
at identifying criteria for determining the boundaries of reinvention
"instances." The following example illustrates the complexity of
this issue: Several FOCUS (in-school support-group program
for disaffected youth) sites were established as Special Education
programs, and were required to develop an Individualized Educational
Plan (IEP) for each student. This procedure involved a number of
different steps (e.g., a team comprising teachers, district coordinator,
school administrators, and parents decides to accept student into
program; the IEP is developed by this team; the IEP is approved by
parents and student; progress towards achieving the goals set in the
IEP is reviewed by the team). Each of these steps may relate to one
or more components on the fidelity instrument, and given various
rationales, the steps can be separated or combined into various
configurations of "instances of reinvention.“

Thus it can be seen that identifying discrete units of reinvention
was a difficult task. Several factors made agreement between independent
judges concerning the boundaries of reinvention instances difficult to
achieve. These included different degrees of recollection concerning

the specific details of the case, and different biases concerning

42

what was judged to be a meaningful, useful, and discrete unit. Thus,
by the fourth stage of the content analysis, it was decided to achieve
group consensus concerning the boundaries of all instances of
reinvention in a particular case before content analyzing that case.

Content Analysis: Codigg Procedures. The procedures which were
used to code the reinvention transcripts were the following:

1. The four researchers formed two pairs for the first 35 cases;
they then re-paired, to control for possible dyad biases. The two
sets of pairs were both orthogonal to the site visit pairings, so that
each pair had only one individual who had actually visited the site
being reviewed. This enabled content analyzing all 70 cases with each
being analyzed by two pairs each of which had one member with first-hand
knowledge of the site.

2. One individual from each pair initially reviewed the case
and constructed boundaries for reinvention "instances." His decisions
were then checked by the second researcher. Prior to analyzing the case,
the other pair also checked the "instancing" decisions, and if
necessary, boundaries were redrawn according to the consensus decision
of the entire team.

3. Each case, and each instance of reinvention, was coded by
both pairs of researchers independently. Coding was performed according
to the definition and typology of reinvention described in the
following sections.

4. Following independent coding of a case, the two pairs' '
ratings were checked for reliability. After agreement/disagreement

for each instance was determined, a consensus decision was made as to

43

the best coding. (Criteria for these decisions are described in the
following sections.)
Empirically-Based Definition of Reinvention

For the purposes of this study, reinvention was defined as the
use of materials, activities, procedures, or organizational structures
by organizations implementing modified RD&D-mode1 programs, that cannot
be adequately explained using the framework provided by the
developer-defined program components and variations. Reinvention
was treated in this research as a fidelity-based construct. "Instances,"
or units, of reinvention were identified by the use of materials,
activities, procedures, or structures rather than by the components
they related to in the fidelity instrument. This provision was
required since one instance of reinvention could be related to one,
several, or many components, depending on interpretation. Thus
reinvention was defined as I'instance-based" rather than "component-based."

Instances of reinvention were required to remain within the
confines of the program and the implementing site. In other words,
materials,activities, procedures, and structures which were implemented
outside the organization that housed the program and/or which were not
part of the program's implementation at the site were not considered to
be instances of reinvention.

In addition, a single instance of reinvention was differentiated
from a broad organizational practice that could be further divided into
two or more instances of reinvention. For example, an educational
program developed and disseminated as a "mainstream" program might

be implemented at a particular site as a Special Education program,

44

thus involving a number of activities, procedures, and structures

which differ from the original model (e.g., the various steps involved
in the Individualized Educational Plan described above). Decisions
concerning the “boundaries of instances" were therefore somewhat
arbitrary. However, it is important to maintain consistency throughout
a specific content analysis, and throughout content analyses which
might be compared to that analysis. Consequently, decisions concerning
the boundaries of instances were made using a group consensus procedure
to insure consistency.

A given use of materials, procedures, activities, or structures
was not considered to be reinvention if it could be reasonably fit
within the boundaries of the developer-defined components and
associated variations. Such practices were adequately discussed in
terms of variations in developer-defined levels of fidelity, and using
the additional concept of reinvention would only have confused the issue.
Careful consideration was therefore given to the entire set of
components and variations when deciding whether or not a specific practice
should be called reinvention. It was not uncommon to find an instance
which first appeared to be reinvention was actually a "specific
implementation of a vague component or variation."

These "specific implementations" occurred in three major types.
First, the developer may have purposefully defined the component in
vague terms. That is, even though RDBD-model programs must be
relatively well-specified and explicit to fit the R080 approach, a
range of specification detail necessarily exists. For example, a very

explicit program objective may be well-specified, but the means of

45

achieving the objective might be left up to the implementors. A
specific instance from the present data set which illustrates this
general example was the procedure implemented by a Montgomery County
Pre-Release Center site for informing prospective employers of the
site's residents that applicants they were interviewing for jobs were
clients of the site's pre-release program. This site sent letters
written by Center staff to the employers explaining the job applicants'
status. In this case, the original program developer had specified
that employers should be informed; however, the means (e.g., mailed
letter, letter hand-delivered by applicant, phone call from staff,
visit by staff, etc.) was not specified, and was not identified as a
set of variations during the construction of the fidelity instrument.

Second, an organizational practice could be described as a specific
implementation of a vague component if the component was not
well-specified or explicit in the fidelity instrument, but had been
originally defined and disseminated by the developer in clear and
precise terms. In this case, an error was made by the researchers in
the original content analysis of developer responses that was used to
construct fidelity components. Again, this would not be considered
reinvention.

The third type of "specific implementation" resulted from decisions
by the researchers during the development of the fidelity instrument
to delete potential implementation components due to their apparent
insignificance. In retrospect, these potential components could be

seen to be reasonable specifications of program aspects.

46

In short, the first type of “specific implementation" resulted
from the actual program definition and dissemination, while the second
and third types resulted from measurement error. Besides "specific
implementations," other examples of instances from the transcripts
which were not classified as reinvention were for the most part a
result of inaccuracies in the memories of the site visitors concerning
the specific details of component variations.

With regard to inter-coder reliability, each instance was
classified as reinvention or not-reinvention by each pair, independently.
All codings were compared to check reliability. For judgments which
classified an instance as reinvention or not-reinvention, the
percentage agreement between pairs was .80 across all instances.

Typology of Reinvention

 

The category scheme used to code instances or reinvention was
two dimensional. One dimension was used to code the instance as either
an addition to the original program or a modification of the program.
The second dimension was used to code the instance as either
proactive or reactive. The Reactive category was further
divided into Internal or External Reinvention, depending on
the source of the constraint(s) which influenced the reinvention.
Finally, all instances of reinvention were rated on a three point
importance scale.

This typology is summarized in Figure 2. The following sections
describe the typology in greater detail.

Addition-Modification. Additions were defined as materials,

 

procedures, activities. organizational structures which were supplemental

47

 

zo~h<u~m_ooz

 

covpcm>cwmm mo vcmuxm ccw amopoaxp

zoahmoa<

zo~e2w>z~ma Lo Auuz<emoaz~v ezmexu

.N weaned

 

4<~hz<hmmzm
m

ub<mmcoz
N

mosz
p

 

 

 

 

 

 

 

 

 

 

 

m>~hu<omm

Pmcgmuc~

u>~hu<mm

Pmcgmuxu

48

to materials, procedures, activities, or structures specified by the
original model. Materials, procedures, activities, or structures which
involved alterations or rearrangements of components specified in the
original model were called modifications. In other words, modifications
fall within the bounds of the components, but outside the bounds of

the variations; additions cannot be fit within the bounds of components
or variations. In general, additions are novel changes, while
modifications are more mundane alterations of existing program elements.

Although many instances of reinvention were clearly identified as
either additions or modifications, other instances had aspects of both
types of reinvention. Examples of relatively simple coding decisions
include:

1. At one HOSTS (diagnostic, prescriptive, tutorial reading
program) site, the reading teacher gives each pupil a one-item quiz
as the pupil leaves the room. This is intended to provide the student
with positive reinforcement. A component in the HOSTS fidelity
instrument states that "reinforcement is structured into the
process" (#107); another component describes the various roles that
the reading teacher should play (planning, organizing, and decision-
making). Thus, it can be seen that this practice is both an addition
of a procedure and a modification of an existing role activity.

2. At an MCPRC (Montgomery County Pre-Release Center) site,
progress by a resident from the first to the second phase of the
program is contingent on a standard behavior (working five days),
and progress through the remaining phases is contingent on
individualized progress ratings. One component of the fidelity

instrument refers to "monitoring of residents' progress" (#87),

49
while a second component states that "phase movement and increased
privileges are dependent upon behavioral ratings and progress" (#89).
The practice at this site both adds to, and modifies, these components.

More difficult decisions can be illustrated by the following two
examples:

3. At one of the FOCUS (in-school support group and individualized
education program for disaffected youth) sites, pupil self-evaluations
contributed to their grades in the support-group class. This clearly
related to program components which specified grading criteria (i.e.,
"grading is done in terms of effort and growth," "grading takes into
account attendance, participation, and accomplishments") and to
components which specified aspects of the support group (e.g., "student
participation in the support group earns academic credit"). It was
difficult to decide whether this instance represented addition or
modification of grading procedures.

4. A SCPP (Seattle Crime Prevention Program) site implemented
two types of "home security meetings," sequentially. The first was
devoted to socializing among the participants, and staff members
attended to foster linkages among block residents. At the second
meeting, specific "hardware" aspects of crime prevention (e.g., door
locks and window latches) were described by the staff. The original
model had included both linkage and hardware activities in a single
meeting. Thus, this site's implementation can be viewed as an
addition of a second meeting, or a modification of the structure and
procedures of the home security meeting.

The procedure employed for deciding whether to code an instance

of reinvention as modification or addition when it contained aspects

50

of both was as follows: First, the aspects of modification and addition
were stated in explicit terms (i.e., the following questions were
answered: "What is the modification? What program aspects are being
modified? What is the addition? What program aspects are being added
to?"). Second, the following question was answered: "How strong is
the relationship between the instance phrased as a modification and
the components to which it relates?" Third, the relative importance
to the program of the instance treated as modification, versus the
instance treated as an addition, was evaluated. In evaluating
importance, factors such as the likely effects on the program's
effectiveness, and the way the reinvention affected the overall
character of the program were considered. A two-stage decision
criteria was then used. If the instance's links to program components
were seen to be relatively weak, then the instances was coded as an
addition, since it was better conceptualized as a novel use, rather
than an alteration. If the links were strong, the novelty-alteration
distinction was difficult to make. In that case, the coding decision
was based on the evaluation of the relative importance of the
reinvention treated as a modification, compared to its importance
when treated as an addition. Applying these criteria to the examples
given above, the classifications were, respectively: (a) addition;
(b) modification; (c) addition; and (d) modification.

Again, all codings were checked between pairs to determine
inter-coder reliability. This procedure produced a percentage agreement
figure of .86.

Proactive-Reactive. The second dimension used for coding

 

contained three categories: Proactive, Internal Reactive, and External

51

Reactive. These categories refer to the relative extent and type

of constraint(s) which influence the reinvention. Instances of
reinvention which reflected relatively low levels of constraint

were coded as Proactive, while those which reflected relatively

high levels were labeled Reactive. The Internal-External sub-dimension
refers to the source of constraints, whether internal or external to
the implementing organization.

For the purpose of this study, a constraint was defined as any
condition or factor existing within the implementing organization or
its environment prior to the innovation's implementation that
influenced the manner in which the innovation was implemented. In
other words, the existence of a constraint reduces the potential
degree of choice which the implementors have concerning the manner
of implementation.

It should be noted that coding according to this dimension
required a consideration of each implementation's case history.

In other words, "process“ data was used to make decisions concerning
the classification of a variable. Unfortunately, detailed case
history notes were not collected. However, the method for collecting
information on reinvention produced sufficient detail for reliably
distinguishing among the categories. The percentage agreement
between pairs for proactive vs. reactive judgments was .87, while
the agreement figure for judgments differentiating internal from
external reactive reinvention was .86.

Since judgments concerning relative levels of potential

choice are difficult to make, several arbitrary decisions were made

52

prior to coding regarding the Proactive-Reactive distinction.
Specifically, the following three conditions were not treated as
constraints: (a) deliberate organizational strategies; (b) the
existence of some already existing strength or capacity within the
organization (e.g., the abilities and interests of individuals
involved in implementation); and (c) theoretical or philosophical
orientations on the part of implementors (e.g., value positions).
It was reasoned that although these conditions could be referred to
as constraints, they enhanced, rather than diminished, the level of
choice in the organization regarding implementation. Examples of
conditions which were considered to be constraints are: (a) Government
policies, statutes, or regulations; (b) the need to accommodate
existing technology; (c) the size of the organization; and (d)
various aspects of the local community, such as geography,
demographics, availability of transportation, etc.

When reinvention was coded as reactive, evidence was available to
demonstrate the existence of constraints. If no clear evidence
existed, and no logical reasons could be identified for considering
the instance of reinvention to be reactive, it was coded as proactive.
Extent of Reinvention

In order to enable the use of reinvention data in correlational
analyses, it was necessary to quantify the extent of reinvention at
each site. Since instances of reinvention differed qualitatively
(some reinventions were fairly trivial, while others had a major
influence on the character of the site's implementation) it was

necessary to take these qualitative differences into account.

53

A three point rating scale was devised to measure the extent of
reinvention reflected by each instance. This scale was intended to
measure the degree of the reinvention's departure from the original
model, and can be considered a scale of importance (1 = minor, 2 =
moderate, 3 = substantial). There are a number of dimensions upon
which departures from the model may vary, such as: (a) the importance
of the departure to the character of the implementation (e.g., the use
by one FOCUS site of a highly confrontative support-group mode, and
the use of a set of "Project Adventure" trust exercises at another
FOCUS site replacing regular support group meetings were judged to
have considerable effects on the character of the implementations,
while the award of academic credit to FOCUS students for "Student
Leadership Board" membership, but not for support group membership,
was not considered to greatly influence the program's character);
(b) the likelihood that the reinvention would influence the program's
effectiveness at the site (e.g., a Community Arbitration Program
site required arbitrators to complete a rigorous training program at
a local university, involving video-taping, role-playing, and
observations of arbitration sessions); (c) the hypothesized importance
of the change to the program's developer (e.g., the processing by
Community Arbitration sites of cases in which youths had not confessed
guilt was hypothesized to be an important departure from the
developer's program philosophy); (d) the difficulty of the reinvention
to implement (e.g., EBCE [Experience-Based Career Education] teachers
at one site were responsible for driving students to all of the

students' new career experience sites, and introducing the students

54
to the individuals responsible for supervising students at the site);
and (e) the amount of materials, facilities, and activities involved
in implementing the reinvention (e.g., a One-Day One«Tria1 jury
management program site employed 24 field investigators to check
juror questionnaires for missing information and to verify requests
for exemption).

The procedure for rating sites was the following: After each
instances of reinvention was coded (by the independent pairs) using
the two-dimensional category scheme, each pair independently rated the
instance with regard to its extent of reinvention. All ratings were
tested for reliability, and a figure of.80 percent agreement was
obtained. Following each rating comparison, consensus agreements were

reached concerning the "best" rating for the instance.

Summary of Typology and Extent of

 

Reinvention Methods

 

The typology and importance scale for reinvention are visually

summarized in Figure 2, above.

Measurement of Effectiveness

 

Effectiveness Criteria

 

In order to examine the relationships among fidelity, reinvention,
and program effectiveness, it was necessary to provide a means of
measuring the effectiveness of programs at each site. Given the
assumptions of the modified RD&D approach regarding the systematic
deve10pment and evaluation of programs prior to dissemination, in

principle the measurement of effectiveness was straightforward. In

55
order to qualify for the sample, each program had been evaluated at
a developer site according to a set of explicit criteria. The
appropriate measures of the effectiveness of the innovations at
implementing sites were therefore the same criterion measures which
were used to evaluate the original program models. Figure 3 contains
a list of these criteria.

Data Collection

 

Effectiveness data were collected for the site visit subsample
only, for several reasons. It was reasoned that the procedural
difficulties involved in collecting these data (e.g., locating
reports, aggregating data from different files, administering
questionnaires, etc.) would threaten rapport if the research team
were to request telephone interview respondents to compile
effectiveness data and mail them to the research office. It was
expected that collecting effectiveness data in this manner would
result in an exceedingly low return rate, and also would result in
inconsistencies between the form in which data would be aggregated
at different sites. Finally, it was reasoned that complex judgments
would have to be made concerning which data at the site would be
appropriate for aggregation, since without visiting the sites,
researchers had only vague knowledge concerning what types of data
were available. For these reasons, it was decided to measure the
effectiveness of only those programs implemented at visited sites.

The procedure for collecting these data was substantially the
same across the different innovations, with the exception of the

Experience-Based Career Education (EBCE) program. This was the only

56

HOSTS

1. Normal Curve Equivalency (NCE) gain scores: pre-post gains on
nationally norm-referenced reading achievement tests (e.g., California
Achievement Test, Gates-McGintie, Comprehensive Test of Basic Skills)

EBCE

1. New Mexico Career Planning Test1
2. Career Exploration Survey

FOCUS

1. Percentage change in school attendance (pre to post)

2. Percentage change in grades (pre to post)

3. Changes on grade-level equivalent scores on nationally norm-
referenced achievement tests (reading, math, and language)

ODOT

1. Juror Days per Trial (JDPT) (juror days served/number of trials)
2. Juror Usage Index (JUI) (juror days served/number of trial days)
3. People Brought In (PBI) (number of jurors reporting/trials begun)

952.

l. Recidivism

2. Percent completing community assignments

3. Percent referred to States Attorney's office as a result of the
hearing

SCCPP

l. Pre-post changes in crime rate (burglaries only) for target
neighborhoods (comparisons with control groups used when available)

2. Percent change in crime rate for entire city

3. Frequency of self-reported crime prevention steps undertaken

 

 

MCPRC

1. Percent successfully completing program

2. Recidivism

3. Percent employed at time of release

4. Average restitution paid at time of release

5. Average resident savings at time of release

6. Average family support at time of release

7. Average reimbursement to program for room and board

 

1Used with permission from the New Mexico State Department of Education
2Used with permission from the Northwest Regional Educational Laboratory

Figure 3. Program Effectiveness Criteria

57

program for which effectiveness measures were actually administered

by the site-visit teams. A variety of different instruments have been
used to measure the effectiveness of career education programs; a
recent collection lists over 200 such instruments (McCaslin, Gross,

& Walker, 1979). Thus it was expected that there would be little
consistency between sites. Consequently, existing career education
instruments were reviewed, and two questionnaires were selected (the
New Mexico Career Planning Test, McCaslin et al., 1979; and the

Career Exploration Survey, Owens, 1981). These measures focused
directly on the relevant effectiveness criteria, and were short enough
to allow for administration without jeopardizing the willingness of

the implementing organization to cooperate. ‘They were also appropriate
for use at all types of EBCE sites, and their reliability and validity
had been established in previous administrations. Copies of the two
questionnaires were distributed by the two site-visit teams to teachers
at the implementing sites. Students completed these measures during
the last third of the semester during which they were enrolled.
Completed questionnaires were mailed to the research office in raw
form and scored by the research team.

For the remaining six innovations, data were collected in the
form of existing archival records. These included records such as
achievement test scores, attendance records, juror usage indices, and
recidivism studies. Whenever possible, data were collected for the
most recently completed time period contiguous to the site visit

(post-data). Pre-data were also collected whenever possible.

58

Data Transformation

 

Although all attempts were made to collect effectiveness data
in as consistent a form as possible, numerous difficulties prevented
sufficient consistency to permit analysis of data in their raw form.
For example, there was a great deal of variation in the time periods
for which data was available; many sites had data for a time period
contiguous to or immediately preceding the site visit, but some sites
only had year-old data available. Pre-post data was available for
some sites, while others had only post-data. There were also
variations in aggregation formats across sites; some data were
aggregated in yearly reports, while others were only available in
weekly tabulations.

Consequently, it was decided to transform the data to a set
of ranks for each innovation. The procedure was as follows: The
research team divided into two pairs, each pair consisting of one
member from each of the site visit teams. Thus each pair had at
least one member with first-hand knowledge of each site. Each pair
then reviewed all of the effectiveness data which had been collected
for each site, and recorded all available data for the effectiveness
criteria (Figure 3). The two pairs then compared their records to
make sure that the raw data had been compiled accurately. The next
step involved each pair constructing a set of ranks for each program.
All sites within each program were rank-ordered from best to worst,
with no ties permitted. Sufficient data to rank 65 of the 70 visited
sites were available.

The sets of ranks for each innovation were then compared across

pairs to test reliability. An overall rank-order correlation of .90

59

was obtained (average Spearman Rho across programs, with
Spearman-Brown formula used to correct for the two sets of raters

and average weighted by number of ranks per program) was obtained,

indicating that the procedure was highly reliable.

Summary of Methods

The methods described in this chapter are summarized in Table l.

6O

Ammzcwucou mpnmpv

emuouugmch

wocmugoaep mcwumg .u
mmwcomwpmu .n
cowuwcwwmu .m

"mwmapmcm ucmpcou

mmmcmsu Ememogg uzoam

 

pwmw> mpwm mcompmmsc caucmucmno cowucm>cwmm
cowuumprou
sumo to muozpmz
ozh Eogm .mupzmmm museum;
:mmzumm pamEmmLm< one mpmwcmums Emgmogq m>wum>occw
to cowpmcwsmxw :umm Low
.mcowum>emmao mco .mamwp cm>mm
mmugzom memo .mzmw>gmpcw .Ampnmpamuuwca
mcoe< acmEmmcm< Locouncwch "uwmw> mpwm .mpnmpamuow .meuwv
ucmcoasou gumm
mpcmwcoamwm mzmw>gmch com mcowumwgm> use cowumucmem_nsH
cmmzpwm pcmsmmgm< Lmuou-me=~ mcosampm» mpcmcoasou mo umm4 to xuwpwnpm
ucmsmmmmm< “cosmmmmm< coppuwppou cowpmpcm53LumcH mpam_cm>
xpwuwpm> xuwpvnmmrmm mpmo we cosumz

 

mcoggmz to mem553m

F wFQMH

61

 

Lmuooucmch

Emcmoga cwcupz
mmpvm to mcwxcom

muwmw> mupm mcwgau
mmmum gugmmmmc

Xe umpzawggmwc mew:
magmascumcw mcwpmwxm
oz» .Amummv msmgmoga
cm>mm mgp mo use can

mowuw>wuum
compmzpm>m
Peace: mecu

cw mmpvm ma cum:

msmgmoga we

 

uwmw> muwm mucmszcumcm mcwumwxm mmmcm>muomwwm
acmsmmmmm< ucmsmmmmm< :omuumppou cowuwpcmE:LumcH mpnmwcm>
abvcwpe> apwpwnawpam mama to venom:

 

A.u=ouv F m_nee

RESULTS

This chapter presents the results of data analyses designed to
answer the eight research questions listed at the end of the
Introduction (p. 19). Although all eight questions are of interest,
attention should be focused on questions (1), (2), and (4):

1. Given a relatively precise and accurate operationalization
of program fidelity, to what extent are social program
innovations which are developed and disseminated according
to the modified RD&D approach actually implemented with
fidelity at program sites? ’

2. To what extent is the fidelity of program implementation
related to program effectiveness?

4. What are the relationships among fidelity, reinvention,
and effectiveness?

These questions are of primary importance since they are the
most critical to testing the viability of the modified RD&D approach,
which is the primary purpose of this research.

This chapter is structured as follows: Analyses pertaining to
the three major variables (fidelity, reinvention, and effectiveness)
in and of themselves are presented insequence. For each variable,
descriptive analyses are presented, followed by comparative analyses.
The chapter concludes with a presentation of analyses examining the

relationships among fidelity reinvention, and effectiveness.

62

63

Fidelity Per se

 

Overview

The paucity of previous empirical research in the implementation
area argue in favor of treating the data analyses of fidelity pgr_§g_
as exploratory. In this spirit, considerable attention is given to
the descriptive analyses. Also in this vein, comparative analyses
are presented using two different scoring systems: the original three

point scaling of variations (ideal variation = 2, acceptable = l,

unacceptable O) and also a two point scale (ideal or_acceptable = l,

unacceptable 0). The rationale for using the two point scale was
the following: When using the two point scale, it is necessary to
assume that distinctions between "ideal/acceptable" and "unacceptable"
variations are measurable and consistent across components and across
programs. Analyses using the three point scale require the additional
assumption that "ideal" variations are measurably and consistently
different from "acceptable" variations. Since distinctions between
"ideal" and "acceptable" variations are more difficult to make than
distinctions between "ideal/acceptable" and "unacceptable" variations,
it was felt that analyses using the two point scale would provide more
conservative tests of differences between programs on fidelity. The
use of a two point scale also reduced scale variance and made
between-program differences more difficult to detect.

However, even with this conservative approach, the results of
the analyses of variance presented below should be viewed with'
caution, since the sets of components used to measure the fidelity

of different programs were to some extent program-specific. Component

and variation sets differed across programs with respect to (a) the

64

number of components per set, (b) the degree of explicitness with which
individual variations were written, and (c) the number of three-
variation vs. two-variation components per set. Thus analyses of
variance should not be considered to be accurate tests of
quantitative between-program differences. Instead, they should be
viewed as exploratory; they provide limited evidence for generating
hypotheses to be tested in future research.
Are Modified RD&D Social Programs Implemented with Fidelity

Telephone Interview Results. The initial research question
addressed the extent to which modified RDBD social program innovations
are implemented with fidelity at user sites. For this analysis and
most of the following analyses, "site fidelity scores" were computed
by obtaining average-item fidelity scores for each site (the sum of
fidelity item scores divided by the number of completed items per
site). Thus the unit of analysis, unless otherwise indicated, was
the site (the implementing organization). In many of the analyses,
these site scores were aggregated within program (i.e., within each
of the seven innovations). Table 2 shows that the mean fidelity
scores for each program were all greater than 1.0 and therefore fall
within the acceptable range. Both the mean and median across
programs equaled 1.33 when measured with the three point scale

(2 = ideal, l = acceptable, 0 = unacceptable). The standard deviation

equalled .28. The distribution of two point scale scores (1 - ideal or
or acceptable, 0 = unacceptable) also indicated a moderately high
fidelity pattern. The mean of the distribution was .73, and the

median was also .73. The standard deviation was .14. Both three

and two point distributions were clearly skewed in the direction of

65

Ammzcwucou m—nmuv

 

 

 

 

mo._
4?. ma. m~.\-. mm. mm. mm._ \mN.P\m~._ mm.P eowpanweomwo
Pragm>o .u
mF. we. NA. mm. mm. N~.P mN.P mm._ we __atauo
NP. ,8. mace o: co. Pm. mo.F F~._ m_.P ammo:
my. Po. NA. mm. mm. PP. MN.P 8F.P aauom
~_. P“. we. me. am. Fm.P m~._ mN.F a<u
ee.F\~e._
op. ma. Pm. me. ow. em.P mm..\mN.P mm.P pogo
mammmoLa
muwpmzc $5520 .m
PP. KN. mu. ma. em. me.F me._ me.F om Fpata>o
No. on. om. F“. m_. FN.P mN.P m~._ maoou
4F. _m. ma.\mm.\ow. mm. mm. Nm.F so._\~m.F mm.P mumm
mo. an. we. Pm. mp. ae._ mo.P em.P memo:
mamgmoca
Pacowpauzem .<
am new: «avaeoz cavemz am cam: Amwaeoz caweaz .
Ac": ._u<uHV Ac": ._u< .NuHV pom mono
upmum acme; ozh mpmum ucmoa muggp 3mm>gmch mcosgwpmh .H
»»m_mewa ”mawomwuaom a>wpawtumao

N mpnmk

 

66

mm.P\mm.P

 

mp. em. mm. mm. om. mF.F No.~\mm.\Pm. vP.F cowpanwcammo
ppmgm>o .0

mp. Po. mm. mm. mm. mo._ No.F mo.~ au Ppmgm>o
mo. Fm. mm. Pm. mp. mm. mo. em. ummuz
mp. Fm. macs on we. mm. mm. No.P\¢n.\—m. om. mauum
mp. um. um. mm. mm. Fm. mo.P m~.~ m<u
mo. an. Pm. mg. mp. m~.— m~.P mm.~ Homo

mamcmoem

muwpmaﬁ chwEwLu .m

 

 

cp. we. Fm. Om. om. MN.F mm.P\mm. wN.~ ow PFmLm>o
NF. mm. m©.\om. mm. NN. em. mm. em. wagon
cw. mm. ow. mm. mm. mm.P mUOE o: N¢.p mumm
mo. mm. mu. NR. 0F. om.~ mm.~ vm.~ mhmo:
msmgmaea

Pacotumuzum .<

om cam: Amvauoz septa: am new: amvaeoz suave:

AC": nFﬂ<HHV AC": oFﬂ< «NHHV “mm GHMD

apmum Devon 03» apaam Seton match utmw> apwm .HH

 

A.b=ou N apnaev

67

high fidelity. Descriptive statistics for these results are
summarized in Table 2 (I), and are visually portrayed in
Figure 4.

Site-Visit Results. The site-visit results also provided
support for the modified RD&D approach. As shown in Figure 4, four
of the seven programs clearly scored in the acceptable range, with a
mean fidelity average-item score across sites greater than one
(X = 1.13). Of the three programs which did not clearly exceed the
acceptable level, means of .94, .86, and .86 indicated scores close to
the acceptable level.; Of the four means which exceeded the acceptable
value, two were from the Educational policy area and two were from the
Criminal Justice area. The median of scores across programs was 1.15.
The standard deviation equalled .30, almost identical to that of the
telephone interview distribution. The skew towards high fidelity was
less pronounced when compared to the telephone interview scores, with
22 of the 70 sites (31%) scoring below 1.0 (vs. 11% scoring below 1.0
for the telephone sample).

The distribution of two point scale scores also revealed a pattern
of moderately high fidelity. The mean of this distribution was .64,
and the median was .65. Again there was a skew towards high fidelity,
with only 21% of the sites scoring below the "acceptable" point. Also
following the previous pattern, the standard deviation was nearly
identical to that of the telephone interview distribution (telephone
interview SD = .14, site visit SD = .15). Table 2 (II) summariZes the

descriptive statistics for the site visit data.

68

mucoum xuwpmuwu EmpHsmmem>< new: .e mesmwm

Pmcuucmcco “Pave Lacy Loucau mace—az-oce xucaou xcmeomucozuu

secooL; :a_u:o>mea os_cu zuwcaszao upuuoomno a—oozUm-elc_zu_r-_oocumv uuomocm mauouum
AEeLaoLa co_mcm>mu m.,:c>=m. sacooca co_uccu_ng< xu_c===aonm co_ucu=cw cmocou cuwam ou:o_eoqxuum
Emumxm “cosmomcaz mesa .eveh «:0 mac ozone Azacoocn mcvuowcv ammuuam cu acmczum mco upwxuﬂ
mz<muoma
xu_um:w .cc_ewtu cowuouauw
m#.uom mm.uow m~.uom m~.uom -.uom o~.-om o~.uom mupamoa uwmw> ouwv

com. my ﬁom. mx -.~mk o~.~JM coo. Wu mwm. duh. om.~uk

 

 

 

#m.nom mm.nom mm.nom HN.uom m~.uom w~.nom m~.nom
oo.~nk. oo.~uM. ~m.~n&. em.~JM. -.~JM mm.~JM me.~mm wu_:ww¢ occzaw—m»
Kongo: cauum ma<u choao mmDUOm Nuumw awhmo:
_ _ A a _ _ a o m—pauawuuuc:
H”
carom D 1 m
33..“ 2826 11111 M.
Ir \. x
l/ 1 l a 29396”; W
3
S

mN.uom (\

2.5” :826 . /11|11

  

 

mu—ammc u.m'> mgpm mouou_uc_ mc_F umzmcu m poon_
”mu—:mmt xmw>cmac_ «conga—mu mmuou.tc. mcvp uwpom "upoz

69

Comparing Telephone and Site-Visit Results: Visual Comparison.
Figure 4 shows the similarity between the two sets of fidelity results
scored on a three point scale. Note that the standard deviations of
the two data sets are nearly identical as well as the mean fidelity
scores.

Comparisons Between Telephone Interview and Site-Visit Results:
Correlational and Percentage Agreement Analyses

As described above in the literature review, few quantitative
comparisons between different methods for measuring degree of
implementation have been reported. Given the greater costs of site-
visits, a careful comparison of telephone interview and site-visit
methods is of considerable practical importance. Two general types
of telephone versus site-visit comparisons are presented. First,
throughout the results section, both sets of results (telephone and
site-visit) are described, and the patterns of the data sets are
compared for each analysis. Secondly, in the present section,
correlational and percentage agreement comparisons are discussed.

Correlations Between Data Sets. Two correlational analyses were

 

performed: the first used the iEEE.35 the unit of analysis; the
second used the sjtg_as the analysis unit;

1. For each fidelity jtgm_of each program, scores obtained by
telephone interviews were correlated with those obtained by site-visits.
For all items which could be scored on both two and three point scales,
correlations were obtained for both types of scoring. Once the.
correlations were obtained, they were summed across items within
program (using Fischer's Z Technique). This resulted in two sets of

correlations: one set using the three point scale, and a second set

70
using the two point scale. The mean correlation across programs was
.38 for the three point scale, and .44 for the two point scale. The
sets of within-program correlations ranged from .26 (ODOT program) to
.44 (CAP program) for the three point scale and from .33 (SCCPP
program) to .59 (EBCE program) for the two point scale.

2. The correlations between methods using the gitg_as the unit
of analysis ranged from .46 (FOCUS program) to .84 (EBCE program) for
the three point scale, with an average correlation across programs
of .68.

In order to establish reference points for these correlations,
t-tests were used to test the hypotheses that the two types of
correlations (item-level and site-level) differed from zero. The
results showed that 12 of the 14 jtgmylevel correlations
differed significantly from zero at the .01 level (with a 13th
differing from zero at the .05 level). 0f the seven site-level
correlations, three differed significantly from zero at the .01
level and a fourth differed from zero at the .05 level. (The
three program-level correlations which failed to differ signicantly
from zero were reasonably high, ranging from .46 to .54.) In sum,
the correlational analyses indicated a moderate level of agreement
between the two data collection methods.

Percentage Agreement Comparisons. A final analysis used to
compare the results of the telephone and site-visit methods employed

the percent-agreement technique using Cohen's "Kappa" statistic to

71

correct for chance agreements. The formula contributed by Cohen
is the following:

K = f(obs) - f(chance)
N - f(chance)

 

(f [obs] = the frequency of observed agreements; f [chance] = the
frequency of agreements to be expected by chance [computed using the
marginals of a category x category matrix], and N = the number of
opportunities to agree/disagree.)

Percent-agreement figures indicated a moderately high level of
agreement between methods. The raw figures ranged between .59 (FOCUS)
to .73 (HOSTS), and the corrected figures ranged between .290 (SCCP)
and .385 (HOSTS). (The two sets of agreement figures are not perfectly
correlated, due to the different proportions of two versus three point
items between programs.) When averaged across programs, mean percentages
of.658 (raw) and .338 (corrected for chance) were obtained.

In sum, three types of between-method comparisons were employed.
Two correlational analyses were conducted, first at the item level,
second at the site level. Third, percentage agreement figures were
obtained, both in raw form and corrected for chance. All analyses
indicated a moderate extent of agreement between measures.

Are There Differences in Fidelity of Implementation

Between Programs?

 

Telephone Interview Results. A one-way between-program analysis
of variance using the three point scale revealed significant differences,
F = 9.97 (5, 122), p < .00001, w2 = .295. The between-program analysis

of variance for the telephone interview results scored on a two rather

72
than three point scale also revealed highly significant between
program differences, F = 7.71 (6, 122), p < .OOOOl, m2 = .238.

Site-Visit Results. The between-program analysis of variance

 

(with components scored on a three point scale) revealed significant
differences in the site-visit data set as well, F = 11.45 (6, 63),

p < .001, w2 = .472. The analysis of variance which treated
components as dichotomous rather than trichotomous variables again
revealed significant differences between programs, F = 8.58 (6, 63),
p < .00001, w? = .395.

Comparing Telephone and Site-Visit Results. The dominant

 

impression produced by comparing analyses of variance between the two
data sets is the similarity between the two sets of results.
Significant between-program differences were found using both three
point and two point scaling.

Are Programs Implemented with Fidelity

 

Across Social Policy Areas?

 

Telephone Interview Results. A one-way, two-group analysis of

 

variance was performed with the policy areas of Education and
Criminal Justice serving as the two groups. This analysis

resulted in a significant difference between the two social policy
areas in fidelity (F = 20.89 (1, 127), p < .OOOOl, wz = .133,
reflecting a higher mean for the Education group. An examination of
program means suggested that the significant difference between the
Education and Criminal Justice policy areas directly related to the
difference between the high fidelity HOSTS and EBCE educational
programs on the one hand, and the lower fidelity SCCPP and MCPRC

programs on the other hand.

73

Using dichtomous scoring, the results of the analysis of
variance duplicated the findings produced by the trichotomous scoring,
with slightly diminished significance levels (F = 16.91 (1, 127),

p < .0001, m2 = .110).

Site Visit Results. This one-way, two-group analysis of variance
to test the differences between Education and Criminal Justice programs
also revealed significant differences between policy areas, (although
differences between means were relatively smaller than previous
analyses), F = 6.27, (1, 68), p < .01, w2 = .069.

However, when components were scored dichotomously rather than
trichotomously, the mean of the Education group was no longer
significantly greater than the mean of the Criminal Justice group at the
.05 level, F = 3.54 (l, 68), p < .064, m2 = .035.

Comparing Telephone and Site-Visit Results. Although differences

 

between policy areas were observed, these differences were less robust
than the between-program differences reported above. Using trichotomous
scoring, the differences in the site-visit data were smaller than the
telephone data, and when a two point scoring system was used, the
Education group mean was no longer significantly greater than the
Criminal Justice mean at the .05 level, thought still significant at
the .10 level. Despite this variation in significance, results

followed a consistent pattern with regard to their direction.

Reinvention Per Se

 

Overview
Given the definition and typology developed for reinvention

(see Figure 2), an exploratory analysis of the concept requires

74

examining several variables. The following 21 reinvention variables
can theoretically be computed for each site: (a) the sum of
reinvention "instances;" (b) the sum of reinvention instances
weighted by importance ratings (weighted instances); (c) weighted
instance scores divided by the total number of instances (average
weighted instances); (d) the sum of modifications; (e) the sum of
additions; (f) the sum of instances rated as "proactive" (proactive
instances); (9) the sum of instances rated "reactive" (reactive
instances); (h) the sum of instances rated "internal reactive;"

(i) the sum of instances rated "external reactive;" and (j) to

(u) the weighted and average weighted scores which correspond to
variables (d) and (i). However, in the interests of clarity and
brevity, only the interesting and significant results will be discussed.

The same general data analysis strategy which was followed for
examining fidelity will be used to discuss reinvention. (However,
given the multiplicity of variables, detailed descriptions of the
distributions are described in Appendix B, with the main features
highlighted in this chapter.) Differences between programs and policy
areas are reported following descriptive information.

Since the reinvention data collected during the telephone interview
phase were used primarily to develop the conceptualization of
reinvention to be used in the site-visits, only analyses of site-
visit data will be reported. Thus the number of sites for most
analyses will be N = 70. It should also be noted that the answer
to research question #3 (concerning the definition and typology of

reinvention) is discussed in the Method section, above.

75

A final issue concerns the different ways of measuring the
extent of reinvention at each site. The first three reinvention
variables listed above (the sum of instances, the sum of weighted
instances, and average weighted instances) represent different
overall summations of instances across reinvention categories.
Each of these variables has different implications for interpretation.
Recall that the decision rules for establishing "boundaries" between
instances were fairly arbitrary and were set using a consensus
process. Also recall that reinvention is a fidelity-based concept,
and that the number of instances per program is partially a function
of the fidelity instrument. Finally, recall that each instance was
rated on a three point importance scale which was used to weight the
instances. Given these three factors, the three summative variables
can be compared as follows.

The first variable (unweighted sum of instances) is useful for

examining "reinvention: as distinct from "importance." This is a

 

fairly crude index of extent of reinvention, since each instance's
importance can be said to affect the extent of reinvention per site.
The second variable (weighted sum of instances) uses importance
weights, while the third variable (average weighted instances)
controls for the number of instances per site. Although controlling
for the number of instances has intuitive appeal, this third
variable is actually a highly ambiguous index of the extent of
reinvention. For example, consider the fbllowing equation:

5 instances x 3(rating per instances) + 5 instances x 1 = 2
10 total instances

76

In this case, the average weighted score equals 2. However,

also consider this equation:

 

2 instances x 2 = 2
2Ttotal instances

Note that the second equation represents a site which intuitively
has a different "extent of reinvention" compared to the site represented
by the first equation. Thus, average weighted scores obscure
important differences between sites and are therefore omitted from
the following discussion. In brief, the sum of importance-weighted
instances of reinvention appears to be the most meaningful index of
the extent of reinvention.

Descriptive Analyses

 

The descriptive statistics discussed below are summarized in
Table 3. The following results are especially noteworthy:

l. The median number of unweighted instances of reinvention
per site was 6.0, and the distribution was fairly homogeneous across
sites. This indicates a moderate amount of reinvention occurring
throughout the sample.

2. Many of the instances were rated as relatively unimportant,
as indicated by the positive skew of the importance-weighted
reinvention scores, and the low median for these scores, which was
close to the unweighted median (unweighted median = 7.0).

3. There was only a slight positive relationship between the
number of instances of reinvention per site and importance ratings
(r = .188, NS). This indicated that many sites with relatively
many occurrences of reinvention also had mostly unimportant

reinventions.

77

 

 

Table 3
Descriptive Statistics: Reinvention

Median Mode Range Mean SD
Sum of Instances 6 3 0-21 6.53 4.79
(absolute frequencies)
Weighted Instances 7 9/11* 0-36 8.63 7.01
(by importance ratings)
Average Weighted Instances 1.20 1.0 0-3 1.26 .487
(weighted instances 9 #
of instances)
Sum of Modifications 2 2 0-11 3.04 2.07
Weighted Sum of 3 O/2* O-l7 3.80 3.37
Modifications
Sum of Additions 3 1/4* 0-14 3.49 3.24
Weighted Sum of Additions 4 O/l/5* 0-25 4.84 4.98
Sum of Proactive Instances 4 0/4* 0-19 5.03 4.39
Weighted Sum of Proactive 5 0 0-31 6.37 6.07
Instances
Sum of Reactive Instances l 0 0-6 1.50 1.47
Weighted Sum of Reactive 1 0 0-10 2.27 2.44
Instances
Sum of Internal Reactive O 0 0-4 .086 .705
Instances
Sum of External Reactive 0 0 0-6 1.13 1.41

Instances

 

*Multimodal distribution

78

4. Distributions of sums and weighted sums for both

addition and modification were positively skewed, reflecting the

 

greater frequencyyof low-scoring than high-scoring sites. A comparison
of the central tendencies of importance-weighted modifications

(median = 3.0, mean = 3.8) vs. importance-weighted additions (median =
4.0, mean = 4.8) combined with the fact that the unweighted sums were
not very different (N of modifications = 213, N of additions = 244)

suggests that additions were rated as more important than modifications.

 

5. An examination of the distributions of additions and
modifications by program (Table 4A) indicates two interesting patterns.
The distribution of additions shows a relatively uniform progression
from low to high frequency. The distribution of modifications also
shows a uniform progression, with the clear exception of the ODOT
program. This innovation had 29 more instances of modification than
the next highest program.

Also of interest was the comparison between the ODOT mean of

unweighted modification sums (6.6) and the mean of weighted

 

modification sums (7.2) (Table 4B). The difference between these
means (7.2 - 6.6 = .60) was the second lowest difference of its type
among the seven programs. This indicated that ODOT modifications

were in general rated as trivial compared to those of other programs.

 

Thus ODOT can be viewed as an "outlier," with relatively many, highly
trivial modifications.

6. The most noticeable feature of the proactive-reactive ’
dimension was the extremely low frequency of reactive instances. The

mode of the reactive distribution was 0.0; 42 out of 70 sites had no

79

Table 4

Modification and Addition by Program:
Descriptive Statistics

 

A. Ranked distribution of unweighted sums of instances

 

 

 

 

 

Modification Addition
Unweighted Unweighted
Rank Program Sum Rank Program Sum
l ODOT 66 1 FOCUS 50
2 SCCP 37 2 CAP 50
3 CAP 34 3 MCPRC 49
4 MCPRC 24 4 EBCE 33
5 FOCUS 18 5 SCCP 26
6 EBCE 17 6 ODOT 21
7 HOSTS l7 7 HOSTS 15

B. Unweighted and weighted modification sums

 

 

 

Mean of Mean of Difference
Program Unweighted Sums Weighted Sums Scores
HOSTS 1.7 2.8 2.8 - 1.7 = 1.1
EBCE 1.7 1.9 1.9 — 1.7 = .2
FOCUS 1.8 2.6 2.6 - 1.8 = .8
0001 6.6 7.2 7.2 - 6.6 = .6
CAP 3.4 4.3 4.3 - 3.4 = .9
SCCP 3.7 4.8 4.8 - 3.7 = 1.1
MCPRC 2.4 3.0 3.0 - 2.4 = .6

 

8O

reactive reinvention whatsoever. However, it should be realized
that the low frequency of reactive reinvention is partly artifactual.
The proactive-reactive dimension was more sensitive to the post hoc
content analysis procedure than the addition-modification dimension
(since coding decisions regarding proactive vs. reactive coding
required case history information), while distinctions between
additions and modifications could be made largely on the basis of
the relationship of the reinvention instance to the program's
components and variations.

7. The low frequency of coded reactive reinvention resulted in
extremely low frequencies of internal vs. external reactive reinvention.
The modes and medians of both these distributions equaled zero. The
internal-external dimension was therefore excluded from further analyses.

Differences Between Programs

 

Sum of Instances. A one-way analysis of variance revealed no

 

significant differences between programs on the sum of instances per
program. The program means ranged from 3.2 (HOSTS) to 8.7 (ODOT)
with an overall mean of 6.53. The standard deviations ranged from
2.10 (HOSTS) to 6.06 (MCPRC). The lack of differences reflects the
previously described homogeneous distribution.

Weighted Instances. Using a one-way analysis of variance to

 

examine reinvention instances weighted by importance ratings revealed
no significant differences between programs. Means ranged from 5.0
(HOSTS) to 12.40 (CAP), with standard deviations ranging from 3.78
(0001) to 10.01 (sccpp). The overall mean was 8.63. '

Unweighted Category Sums. This section reviews the unweighted

 

sums of instances analyzed within reinvention category (e.g., addition,

81

modification, etc.) and across programs. Significant differences
were obtained between programs on both modifications, F (6, 63) =
6.46, p < .00001, m2 = .319, and additions, F (6, 63) = 2.35, p < .041,
wz = .104. There were no significant differences between programs on
unweighted sums of proactive or reactive reinvention instances.

The relatively extreme differences between programs on number
of modifications per site suggested further analyses. A post hoc
Scheffe test indicated that the ODOT program had significantly more
instances of modification compared to all of the Education programs,
and one of the Criminal Justice programs (SCCPP). There were no
significant differences among the remaining programs. Thus, as noted
above, the ODOT program appeared to be an "outlier." It had more
modifications than other programs, and these were rated as more trivial
than those of other programs. The importance of this finding will
become evident when the addition-modification dimension is related
to other variables, as discussed below.

Weighted Category Sums. Significant differences were again

 

found between programs on modifications, F (6, 63) = 3.46, p < .005,

w2 = .175, and additions, F (6, 63) = 2.78, p < .02, w2 = .132.
Significant differences were also obtained on weighted sums of reactive
reinvention instances, F (6, 63) = 2.75, p < .020, w2 = .131, but

not proactive instances.

Differences Between Policy Areas

 

Overall Reinvention Scores. A one-way analysis of variance

 

revealed differences between policy areas on unweighted sums of

82

instances, F (1, 67) = 5.44, p < .022, w2 = .063. Adding importance
weights produced results which approached significance at the .05
level (p < .068).

Reinvention Categories. For the unweighted sums, significant
between-policy area differences were obtained for number of
modifications, F (l, 68) = 15.20, p < .0002, wz = .169 and number of
reactive reinventions, F (1, 68) = 7.55, p < .008, m2 = .086.
Regarding the weighted sums, differences were obtained for the sum
of modifications, F (1, 68) = 9.70, p < .003, w2= .111, and for the

sum of reactive reinventions, F (l, 68) = 4.60, p < .036, m2 = .049.

 

Effectiveness Per Se
Overview I

This section describes the program-by-program effectiveness
results in three ways: (a) The types of data which were collected
are described; (b) descriptive statistics summarizing results across
sites are presented; and (c) the summary statistics for the sample
sites are compared to the results produced by evaluations of original
demonstration sites. Appendix C details these descriptions, while
this chapter presents the most important issues and findings. Before
presenting these results, the following issues are of interest:

1. The importance of comparing research sample sites with
original sites on effectiveness become evident if one considers the
possibility that adopting sites produced effectiveness scores which
differed by several orders of magnitude from demonstration sites.
Examining the relationship between fidelity and effectiveness would

be of less value as a test of the modified R030 approach if this

83

proved to be the case. Consequently, a comparison between research
sample sites and demonstration sites is of considerable interest.

2. Summary statistics on adopting sites were computed using the
site as the analysis unit since individual-level data were
frequently unavailable. This precluded statistical comparisons with
demonstration site evaluations, which used individual clients as the
unit of analysis. However, the purpose of obtaining a rough comparison
of the adopting-site and original-site samples is served by examining
site-level summary statistics, e.g., means and standard deviations,
for both samples.

3. In all cases, the most recent effectiveness results were
sought in order to maximize the validity of comparisons between
effectiveness and fidelity. (Recall that fidelity data was collected
during fall, 1981 [telephone] and winter/spring, 1982 [site-visits].)

4. For all innovations with the exception of the HOSTS reading
program, multiple measures were obtained. The ranking procedure
utilized these measures as follows: sites were ranked within program
on each separate measure available. (Some sites had data available
for all measures, while others only had partial data.) Ranks were
then averaged across measures to produce an overall "objective" rank
for each site. These averages were unweighted. As described in
Appendic C, there was considerable variation among sites regarding
the number and type of measures available and the appropriateness of
the time period covered. It was therefore decided in cases where
data were of lower quality to check the objective ranks against the
site visitors' subjective impressions. Subjective impressions were

also used to make decisions on tied ranks.

84

5. For HOSTS, FOCUS, and SCCPP, change scores were used as
data for the ranking procedure. The use of change scores is generally
inadvisable due to the regression-towards-the-mean effect (Campbell
& Stanley, 1966). However, their use was considered permissible
for the purpose of this research, since they were to be employed as
the basis for the ranking procedure for which relatively gross
estimates of effectiveness were sufficient.

Types of Data

 

The measures of effectiveness which were used appear above in
Figure 3. Data for these measures was obtained with varying success
across programs. Four programs had all ten sites each providing
adequate data. ("Adequate data" refers to a sufficient amount of
data to make an objective judgment on ranking.) Two programs
(HOSTS and FOCUS) had nine sites providing adequate data, and one
program (SCCPP) has seven sites which provided adequate data. Thus
the total number of sites providing adequate data was 65. Details
concerning the types of data available for each program are
elaborated in Appendix C.

Comparisons of Sample Sites with Demonstration Sites

Given the cost of obtaining the necessary data to perform
statistical tests between research and demonstration site samples
(e.g., obtaining standard deviations for research-sample client
scores) and given the lack of consistency for availability of
measures, statistical tests of the similarity of the two samples
(Research sample and Original Demonstration sample) was not

attempted. However, one can obtain a rough picture of the

85

comparative effectiveness from the data presented in Appendix C.
In general, the two samples were similar. Briefly reviewing each
program, the HOSTS research sample was slightly inferior to the
demonstration site. (They differed by four NCE's; the NCE scale
is an equal-interval percentile scale.) The FOCUS research sample
was slightly superior to the demonstration site on achievement test
scores, while extremely high variance and missing data made
comparisons on GPA's much less meaningful. The ODOT research sample
was somewhat inferior to the demonstration site on two indices of
juror management efficiency, while the CAP research sites were
slightly inferior on one measure (recidivism) and somewhat inferior on
a second measure (percent of youths returned to State's Attorney).
The MCPRC research sites were somewhat superior on recidivism rates
and somewhat inferior on percent of residents employed, when compared
to the demonstration site. Meaningful comparisons could not be made
for the SCCPP sites (since only three sites reported neighborhood-
specific data) or for the EBCE program (data on the measures used
for the research sites were not available for the demonstration
sites).

In sum, given the available data, the sample of implementing
sites can be considered to be roughly comparable to the sample of

demonstration sites on program effectiveness.

Relationships Among Fidelity, Reinvention,
and Effectiveness
Use of Site Visit Data
As explained previously, effectiveness and reinvention scores

were obtained during the site visits, but not during the telephone

86

interviews. Due to the temporal contiguity of the site visit fidelity
data with the reinvention and effectiveness data, these fidelity
scores were used in the analyses reported below, rather than the
fidelity data obtained through the phone interviews. This avoids the
confounding effect of history (Campbell & Stanley, 1966) on the
interpretation of correlations between fidelity and the other two
variables.

Tests for Non-Linearity

 

Before computing correlations among these three variables,
scatterplots of the three relationships (fidelity-reinvention,
fidelity-effectiveness, and reinvention-effectiveness) were
examined for non-linearity. All relationships were observed to
form acceptably linear patterns.

Data Transformations

 

Transformations were performed in order to change raw-site-level
scores to scores which were most amenable to comparisons across
programs. This was accomplished by standardizing fidelity and
reinvention scores and normalizing effectiveness scores. (Raw scores
for fidelity were average-item scores; for reinvention, raw scores
were importance-weighted sums of reinvention instances; raw scores
for effectiveness were the within-program ranks.) Each
transformation followed a somewhat different rationale, and these
are reviewed in the following sections.

Standardized Fidelity Scores. As has been previously noted,
fidelity component-and-variation lists for the seven different

programs differed on (a) the actual number of components

87

(specificity); (b) the degree of concreteness with which components
were operationalized into variations (explicitness); and (c) the
number of three-variation vs. two-variation components. Therefore,
the assumption that differences between program means reflect

"real'I differences is open to question. This assumption can be
avoided for the purpose of across program correlational analyses

by standardizing scores within programs. That is, each site's
average-item score was subtracted from the program mean, and the
result was divided by the program deviation, to produce a standardized
score for the site.

Standardized Reinvention Scores. Although the reinvention
coding system was developed to be uniform across programs, some
dependency on the fidelity instrument existed. This is evident
with regard to component explicitness. In other words, the degree
of explicitness with which a component was operationalized into a
set of variations influenced the extent to which reinvention (i.e.,
change which could not be coded using the component-variation
framework) could be detected and recorded. Therefore, reinvention
scores were also standardized.

As discussed in the section on Reinvention Per Se, simple
reinvention sums were an inadequate representation of the extent of
reinvention, and average reinvention scores were somewhat ambiguous.
It was therefore decided to use only the importance-weighted sums of
reinvention instances for correlational analyses. Again,
standardization was accomplished using within-program means and
standard deviations.

Normalized Effectiveness Scores. Recall that effectiveness

measures differed across programs, and that a ranking procedure was

88

used to measure the relative effectiveness of sites within programs.
It was necessary to transform these ranks, due to differences across
programs regarding the number of sites which did not provide adequate
effectiveness data. Four programs had all 10 sites providing adequate
data; two programs had nine sites; and one program had seven sites.
(The normalization procedure transforms rankings so that the
difference between ranks becomes equivalent across programs, e.g.,
first-ranked sites in program distributions with 10 rankings [10
sites] and seven rankings [seven sites], respectively, are placed
on the same scale, so that the seventh-ranked program in the seven-site
set is at an equivalent scale point to the tenth-ranked program in the
ten-site set.)
Simple Correlations: Relationships Among the Major Variables

The simple and partial correlational analyses presented in these
sections were performed in order to address the following research
questions:

2. To what extent is the fidelity of program implementation

related to program effectiveness?
4. What are the relationships among fidelity, reinvention,
and effectiveness?

The zero-order Pearson correlations among fidelity, reinvention,

and effectiveness were the following:

.52, N = 70, p < .001

fidelity-reinvention: r
fidelity-effectiveness: r = .38, N = 65, p < .001
reinvention-effectiveness: r = .33, N = 65, p < .004.

Note that all correlations are strongly positive. The correlation

between fidelity and reinvention supports a conceptualization of

89

reinvention different from merely "low fidelity;" i.e., these data
show that sites with higher levels of fideltiy had higher, not lower
scores on extent of reinvention. These results also provide strong
support to the viability of the RD&D approach, as shown by the high
correlation between fidelity and effectiveness.

Given the importance of the fidelity-effectiveness correlation
to the test of the R080 model's viability, the "true" correlation
between these variables was estimated with the effects of the measures'
unreliability controlled. This "correction" for attentuation procedure
is usually performed using internal consistency reliabilities
(Nunnally, 1978). In the present case, the true correlation was
estimated using the inter-coder reliabilities. Recall that these
had been obtained using the percent agreement method for the fidelity
data, and an average Spearman rank-order correlation for the
effectiveness data. The estimation procedure employing these two
reliability estimates produced a correlation between fidelity and
effectiveness of r = .44.

Partial Correlations

 

Given the set of correlations among the three variables, the
question of spurious relationships was tested. The possibility
that the reinvention-effectiveness correlation was actually spurious,
resulting from the shared variance of fidelity and reinvention,
was tested using partial correlation analysis. The parallel
hypothesis (that the fidelity-effectiveness correlation was

spurious) was also tested. First, the relationship between

90
fidelity and effectiveness was examined, with the variance contributed
by reinvention controlled. This analysis produced a first-order
partial correlation of rfe . r = .26, N = 65, p = .019. Second,
the relationship between reinvention and effectiveness was examined
with the effects of fidelity controlled. The first-order partial

correlation resulting from this analysis was r f = .17, N = 65,

re -
NS. This provided equivocal support for the statement that the fidelity-
effectiveness relationship was independent from the effects of
reinvention.

A second set of correlations was examined to further analyze
potentially spurious relationships. These analyses involved the
addition-modification dimension of the reinvention typology. It was
hypothesized that the positive relationship among reinvention and
effectiveness was largely due to contributions to programs made by
additions, rather than modifications, due to a greater likelihood of
additions being "in the same direction" as high fidelity variations. Some
support to this hypothesis was provided by the zero order correlations
between addition and effectiveness (r = .40, N = 65, p < .001) and
between modification and effectiveness (r = .17, N = 65, NS). However
since addition and modification were positively correlated (r = .58,

N = 70, p = .001), the effects of these two variables were controlled
using partial correlations. The first-order partial correlation
between addition and effectiveness with modification controlled was
rae . m = .38, N = 65, p < .001. The partial correlation between
modification and effectiveness while controlling the variance
contributed by addition was rme . a = -.09, N = 65, NS. Note the
significant, positive correlation produced by the first partial

91
correlation, and the marginally negative, non-significant correlation
produced by the second partial correlation. This clearly supports
the hypothesis that additions, rather than modifications, contribute
to program effectiveness.

It had been observed in the previous section of this chapter
that the 000T program had a disproportionate number of modifications.
Therefore, the relations of addition and modification to effectiveness
were further examined to determine whether or not the ODOT program
made a disproportionate contribution to the weak overall modification-
effectiveness relationship. Table 5 contains the ordering of partial
correlations by program, and shows that the ODOT program did in fact
have the largest negative correlation between modification and
effectiveness, with the variance due to addition controlled. However,
the table also shows that four of the six other programs also had
negative correlations between modification and effectiveness.

The unusual characteristics of the HOSTS program are also a clear
influence on the direction of the overall partial correlation analyses.
Table 5 shows that this program had the highest correlation between
modification and effectiveness, and the lowest correlation between
addition and effectiveness, by wide margins. However, it should be
noted that the program-by-program analyses employed small sample
sizes, and are thus likely to be influenced by sampling error effects.
Also, the ODOT program had the largest ggmggr_of modifications, while
the HOSTS programs had the fewest ggmggr_of additions and among the
fewest ggmggr_of modifications. Therefore, these within-program
correlational analyses should be viewed as suggesting relationships,

rather than as clear indications of effects.

.eumu mmmco>wuomwwm mcwmmwe op man mew op can» mmmp mmapm> ze

 

92

 

 

 

Fmo. m «5.- memo: n «No. op 50.. homo N
mmm. op om. game: m omp. op mm.- momm m
oFN. OP om. pogo m me. m mm.- auum m
omp. 5 mm. auom v Nwm. m mp.- wagon v
moo. op mm. uumm m owm. op NF.- ammo: m
emo. m om. mauoe N . _w_. o_ mm. e<u N
mmo. op No. a<u _ Foo. m Fm. memo: _
mapm>1a *z e Ememoem xcmm mzpm>la ez e Ememoem xcmm

umFFoepcou cowumuwwwcoz .mmm:w>wuummem umPFoepcou cowuwcu< .mmmcm>vpuwwem new
new cowuwuu< cmwzpmn cowumpweeou emneo umewm :owpmuwevuoz cmmzumn cowpmpmeeou emueo umewe

 

mmmcw>wuommem sue: mcowpmpmeeou —mwpema eo mcwemueo-x:mm

m wpamh

DISCUSSION

This chapter reviews the major findings of the present research
in the context of the research questions introduced in Chapter I, and
discusses the implications of these findings.

Question 1: Are Modified RD&D Programs Implemented

 

with Fidelity at Adopting Sites?

 

The results of this study showed that seven innovative social
programs developed and disseminated using the modified RDBD approach
have been implemented with acceptable fidelity at many adopting sites.
Distributions of fidelity scores were skewed in a positive direction,
and measures of central tendency fell within the acceptable range.
These results were generally obtained by both the telephone interview
and site visit methods of data collection.

This claim can be made with some confidence, since the precision
and accuracy of the fidelity measure were foci of measurement
development and data collection. A precise measure was developed
by content analyzing interviews with developers and extracting lists
of components and variations which operationalized programs in
relatively specific and explicit terms. Accuracy of measurement was
tested by carefully monitoring inter-coder reliability, and checking
for agreement between data sources. Substantial agreement between the
telephone and site visit results also attested to the accuracy of

measurement.

93

94

Question la: Are There Differences Between Sample Programs of Fidelity?
Question lb: Are There Differences Between Two Policy Areas (Education
vs. Criminal Justice) on Fidelity?

Despite the overall pattern of acceptable implementation, there
were significant differences found between programs, both within and
between the two policy areas. For the most part, differences appeared
regardless of whether components were scored on three or two point
scales, although support for differences between policy areas was
weaker than that shown for differences between programs.

Questions 1, la, lb: Implications

Given the fact that only seven programs were examined, and
given the somewhat unorthodox use of analysis of variance to examine
differences between programs and policy areas, the statistical
significance of results attesting to between-program and between-policy
area differences should be viewed with caution. However, these results
clearly suggest that although it is quite possible to achieve high
fidelity social program implementation using the modified RDBD approach,
such implementation is far from automatic. The results also suggest
that differences between policy areas as well as programs within policy
areas exist, although the evidence supporting the policy area
differences is admittedly weaker.

Differences between policy areas may be due to different
dissemination tactics used. For example, the two programs which
were implemented with the highest fidelity (HOSTS and EBCE) were
disseminated using highly sophisticated methods. These included

elaborate networking (national, regional, and local conferences),

95

considerable on-going technical support, explicit preadoption
agreements with administrators, and requirements for monitoring
and evaluating program outcomes. The dissemination methods used
for the Criminal Justice programs were in general less elaborate.
Interestingly, the Criminal Justice program which exhibited the
highest fidelity (One-Day One-Trial) also appears to have benefited
from the most extensive dissemination efforts. However, since the
extent of differences between programs on such variables were not
measured in this study, these ideas should be viewed as grist for a
hypothesis worthy of further study, rather than as a conclusion
supported by this research.

0n the other hand, the findings of the present research clearly

support the position that abandoning the modified RDBD approach at

 

thisgpoint would be premature. In other words, those who criticize
the R030 approach as unrealistic or foolhardy (e.g., Farrar, deSanctis,
& Cohen, 1982) might re-examine the vociferousness of their position in
light of these results.

Yet in contrasting the results of the present study with those
of studies used to attack the R080 model, several points should be
clarified. This may best be accomplished by contrasting the present
research with the most widely-cited of the "anti-RDBD studies," the
RAND research reported by Berman and McLaughlin (1978).

First, the samples of innovations differed considerably between
the two studies. While the RAND study examined the implementation
of loosely defined policies and the translation of these policies into

various projects, the present research investigated the implementation

96

of explicitly defined and highly specified programs which fit the
conceptual parameters of the modified R080 approach.

Second, Berman and McLaughlin (1977) used as their
implementation outcome measure "the extent to which projects met
their own goals," thus building adaptation into both methodology
and findings. The present study has attempted to separate the
concepts of fidelity and adaptation in two ways. Rather than
defining implementation outcomes in terms of project goals, the
present research involved interviews with program developers in
order to identify specific components and variations. In addition,
changes from the original program models which did not fit this
component/variation framework were characterized herein as
"reinvention," a concept independent from "fidelity."

Third, in fairness to Berman and McLaughlin, it should be
recognized that these researchers did not attempt to confuse their
"project-based" implementation measure with the concept of fidelity.
However, as noted by Datta (1981), others have seized upon the RAND
findings concerning the prevalence of "mutual adaptation, cooptation,
and non-implementation" to support the dismantling of RDBD efforts. It

is hoped that the present discussion will help rectify this confusion.

97

Given the difference between the RAND study and the present
research, limits for generalizing from the results of these two
studies should be made clear. The results of the present research
should not be generalized beyond relatively explicit programs fitting
the modified RDBD model to more loosely defined projects. Conversely,
the results of the RAND study should not be generalized beyond
loosely defined projects to more highly specified programs. This
distinction between program types, coupled with the acceptably high
fidelity results obtained by the present research, would argue that
generalized support for dismantling the RD&D model is unwarranted.

Finally, it should be noted that although generally high levels
of agreement were obtained between the telephone interview and site-
visit methods, differences were observed as well. The general
implication for the measurement of fidelity of these findings is that
types of items which can obtain high levels of agreement between
methods should be measured using the more inexpensive telephone
interview method, while types of items which did not produce agreement
should be measured during site-visits. This approach would save
resources, since both telephone interviews and site-visits could be
considerably shortened. Therefore, it would be advisable to perform
a content analysis of high-agreement vs. low-agreement items to
develop a typology of items for further use.
Question 2: To What Extent is the Fidelity of Program Implementation
Related to Program Effectiveness?

The relationship between fidelity and effectiveness was demonstrated
in the present study to be moderately strong. A Pearson correlation of

.38 was obtained between program fidelity (measured during site-visits)

98

and effectiveness rankings (based on archival records). In general,
this finding supports an important assumption of the modified R080
approach: replication lead to effectiveness.

Question 2: Implications

Given the problems in the effectiveness data set which
necessitated the ranking procedure to achieve comparability across
sites, this major finding should be viewed with some caution. The
two major flaws of the effectiveness data set were: (a) differences
between sites and programs in time periods covered by available
archival data; and (b) differences between sites in the types of
measures used. Examples of these flaws are the following:

For most sites, fidelity was measured during the winter and spring
of 1982. However, for three HOSTS sites, 1980-1981 school year data
was provided; for 000T, 1981 calendar year data was used. Thus
program changes between 1981 and 1982 have an unknown effect on
the fidelity-effectiveness correlation. An another example, archival
data for the MCPRC program covered five different lengths of reporting
periods, ranging from eight months to three years. The comparability
of these data is therefore open to question.

With regard to differences between types of measures, the following
examples are illustrative: only four FOCUS sites reported achievement
test scores; criteria for defining recidivism differed among CAP and
MCPRC sites; and only three SCCPP programs had neighborhood level data,
with four sites reporting citywide data and three sites providing no
data.

Despite these flaws, two points should be considered. First, recall

that a high level of inter-coder reliability was achieved (corrected

99

Spearman Rho = .90). Second, it is noteworthy that no other study was
identified in the literature review which utilized archival effectiveness
data in the study of social program innovation. Those studies which
have measured effectiveness in the context of social innovation research
in public sector organizations have generally used global rating
procedures based on the impressions of research staff (e.g., Pelz,
1983).

In summary, although flaws in the effectiveness data set suggest
treating the fidelity-effectiveness correlation with some caution,

this finding should be given serious consideration, especially in light

 

of the current policy trend to "throw the R080 baby out with the bath
water" (Datta, 1981).
,Question 3: What is a Useful Definition, and What is a Useful

Typology, for the Concept of Reinvention?

 

The research efforts devoted to the topic or reinvention were
intended to be exploratory. Previous discussions in the literature
(e.g., Rice & Rogers, 1979) did not define the concept of
reinvention in relation to other constructs, let alone attempt to
operationalize it. Therefore, the present study of reinvention was
primarily concerned with developing a data-based conceptualization
of reinvention.

Reinvention was defined and categorized by content analyzing case
transcripts. This analysis produced a definition and set of categories
for reinvention which enabled highly reliable coding. Reinvention was
defined as "the use of materials, activities, procedures, or

organizational structures by organizations implementing modified

100

RD&D model programs, that cannot be adequately explained using the
framework provided by the developer-defined program components and
variations." The typology of reinvention consisted of two major
dimensions: Addition-Modification, and Proactive-Reactive. Reactive
instances of reinvention were further categorized as either internal
or external to the site, depending on the source of constraint(s)
which led to the categorization of the instance as Reactive. Finally,
each instance of reinvention, after being categorized, was assigned

a weight using a three-point importance scale.

Descriptive data analyses of reinvention revealed that a moderate
amount of reinvention occurred fairly equally across sites (median =
six instances). However, many instances were rated as relatively
unimportant. Additions were rated as more important than modifications,
and the number of instances per site was only marginally correlated
with the importance-weighted scores, indicating a weak positive
relationship between the absolute "amount" of reinvention and the
"extent" of reinvention as reflected by importance ratings. Descriptive
analysis showed very few instances of reactive reinvention. Finally,
the ODOT program was found to be an "outlier," with more modifications,
and more trivial modifications, than other programs. No clear
explanation for this result is apparent. One possibility is that the
extensive technical assistance provided by the Center for Jury Studies

contributed to encouraging modification, due to pro-adaptation

 

messages communicated by consultants. The importance of the finding
is clarified in the context of the relationship between addition-

modification and effectiveness, discussed below (Research Question 4).

101

Question 3a: Are There Differences in Reinvention Among the Sample
Programs?

Question 3b: Are There Differences Between the Educational and
Criminal Justice Poligy Areas on Reinvention?

Significant differences between programs occurred only when the
number of instances per site was controlled (i.e., for average
weighted scores). Similarly, only one of the three reinvention
indices revealed differences across policy areas; in this case,
differences were obtained on the absolute number of reinventions.
However, differences (between programs or between policy areas) were
not obtained on the index which is the most meaningful and unambiguous
representation of extent of reinvention, i.e., the importance-
weighted scores.

Questions 3, 3a, and 3b:- Implications

 

Similarly to the effectiveness data, the reinvention data set is
flawed due to possible inconsistencies across sites. This point
deserves some elaboration.

The final conceptualization of reinvention was based on the
content analysis of researchers' impressions which were tape-recorded
immediately following each site visit. Unfortunately, there was lack
of consistency in the questions used to probe for details on.
reinvention. Secondly, it was not possible to check agreement
among data sources, as had been done with the fidelity data.

The inconsistency in probing and inability to check data-sburce
agreement created questions concerning the validity of two types of
judgments: (a) judgments distinguishing proactive from reactive

reinvention; and (b) judgments concerning the "boundaries" of

102

reinvention "instances." In short, limitations related to the
exploratory nature of this part of the study flawed the data set.
These concerns created less of a problem for the addition-
modification dimension. Addition vs. modification coding decisions
were based largely on the theoretical relationship of reinvention
instances to the fidelity instrument, rather than relationships among
reinvention instances or case history information. Despite its
limitations, this study of reinvention has produced several
contributions. In general, the definition and typology contribute
a framework which organizes the previously ambiguous reinvention
concepts. Secondly, the analyses which examined the relationship of
reinvention to fidelity and effectiveness provides interesting
hypotheses for future research.

Question 4: What are the Relationships Among Fidelity, Reinvention,

 

and Effectiveness?

 

The results of this research showed the relationships among these
three variables to be clearly positive. A strong correlation was
obtained between fidelity and reinvention. Moderate correlations
were obtained between fidelity and effectiveness, and between reinvention
and effectiveness. The use of first-order partial correlations to
control for third-variable variance showed the correlation between
fidelity and effectiveness to diminish slightly with reinvention
controlled, and the correlation between reinvention and effectiveness
to diminish to a greater extent when the variance due to fidelity was
controlled. Also, partial correlations showed that the moderate
correlation between reinvention was likely due to the contribution
of additions to program effectiveness, rather than the contribution

of modifications.

103
Question 4: Implications

The clearly positive relationships among the variables represent
an interesting set of findings. As discussed above, the relationship
between fidelity and effectiveness is critical to the R030 approach.
In addition, the moderate relationship between fidelity and
reinvention indicates the potential viability of a conceptualization
which defines reinvention in terms that differ from simple "reductions
in fidelity."

Given the pattern of zero-order correlations, the partial
correlations involving fidelity and reinvention tested the hypotheses
that (l) the fidelity-effectiveness relationship was actually
independent from the effects of reinvention; (2) the reinvention-
effectiveness relationship was independent from the effects of fidelity;
(3) the addition-effectiveness relationship was independent from the
effects of modification; and (4) the modification-effectiveness
relationship was independent from the effects of addition.

The partial correlation analysis provided some support for
hypotheses (1) and (3). Support can be claimed on two counts.

First, the reinvention-effectiveness correlation (with fidelity
controlled) was diminished by partialling to a greater extent than
the fidelity-effectiveness correlation with reinvention controlled.
An examination of the statistical significance of correlations
before and after partialling showed that the fidelity-
effectiveness correlation remained significant after controlling
the variance due to reinvention (change from p = .001 to p = .019)
while the reinvention-effectiveness correlation became

nonsignificant at the .05 level after controlling the variance

104
due to fidelity (change from p = .004 to p = .092). This
finding supports the hypothesis that fidelity can contribute
to effectiveness independently from reinvention. At the
same time, the strong fidelity-reinvention zero-order correlation
suggests that fidelity may also lead to reinvention. The nature
of the two variables makes this causal direction more likely
than the reverse direction, although bidirectional causality
is entirely possible. (That is, programs are likely to be
implemented with some correspondence to the original model before
needs and opportunities for reinvention become apparent.
Reinvention could then potentially influence the level of
fidelity.)

Second, addition was positively related to effectiveness
with modification controlled, while modification was not related
to effectiveness when the variance contributed by addition was
controlled. This supports the hypothesis that positive
reinvention leads to effectiveness.

Despite these points, the evidence on these issues
remains somewhat ambiguous for the following reasons:

1. The absolute decrease in the reinvention-effectiveness
correlation (with fidelity controlled) equals .16 (.33 - .17), while
the absolute decrease in the fidelity-effectiveness correlation
(with reinvention controlled) equals .12 (.38 - .26). The
difference between the two absolute decreases equals only .04.’
Also, note that after controlling variance due to fidelity, the

first-order reinvention-effectiveness correlation of .17 was still

105
significant at the .10 level. In sum, these relationships are not
sufficiently strong to claim unambiguous confirmation of a model
which has fidelity making a clear contribution to effectiveness

independent from the contribution of reinvention. The best that

 

can be said is that this evidence does not disconfirm such a model.
2. The partial correlation analyses of the addition-modification
dimension and effectiveness did produce evidence that the positive
relationship between reinvention and effectiveness was due to the
effects of addition rather than modification. Further, it is
reasonable to assume that there is a positive relationship between
addition and positive reinvention, and the data of Table 5 for the
most part bear this out. However, the data also indicate that the
relationship between addition-modification and effectiveness is not
simple, and bears further study. First, the strongly negative
relationship between addition and effectiveness for the HOSTS program
suggests that some types of additions are negative reinventions, and
that these types are program specific. Similarly, the data in
Table 5 suggest a contrast between the positive modification
introduced by some programs (e.g., the HOSTS and CAP programs) and
negative modification introduced by others (e.g., 000T and EBCE).
Third, the outlier nature of ODOT weighted these results to some
extent, contributing heavily to the negative modification-
effectiveness relationship. Finally, the program-by-program partial
correlations are based on small sample sizes. These data are qUite
subject to sampling error effects, and can only suggest possible
relationships. However, the addition-modification partial

correlation with effectiveness remains interesting, and the

106
relationships between modification-addition and positive-negative
reinvention deserves clarification. This could be accomplished
by coding instances of reinvention as positive or negative in
future research, and introducing this dimension into analyses;

and, by obtaining larger samples for each program.

Future Research

 

The provisional support which the present study provides for
the viability of the modified RD&D approach argues in favor of
continuing this line of research. Given the methodological flaws
identified in this chapter, a first step in this direction would be
replicating the present study with improvements made to correct
these flaws. Such improvements would include (a) checking multiple
data sources for agreement on reinvention data, (to provide a test
of validity) and collecting information on reinvention case histories
more systematically, to better determine the existence and source
of constraints; and (b) spending a greater proportion of time during
the sampling phase in determining whether potential sites could
provide effectiveness data which were equivalent within programs.

Other steps which could be taken to improve future research
based on lessons learned in this study include content analyzing
fidelity items which resulted in agreement between telephone
interviews and site-visits, in order to develop a categorization
of items for which information may be accurately collected over
the telephone, thus saving resources; and, spending a greater
proportion of time during the measurement development stage on
component/variation development for fidelity measurement, so

that a category scheme spanning across programs could be employed.

107

For example, a scheme might include such categories as Client
Entry, Staff Selection and Training, Client Processing Procedures,
Critical Staff Behavior towards Clients, Critical Administrative
Behaviors, Materials and Facilities, etc. Rather than developing
these categories a priori (cf. Leithwood & Montgomery, 1980), they
should be based on the specific set of programs in the sample. The
use of these categories would facilitate rational-empirical
scaling of items (components) so that cross-program comparisons
would be more meaningful.

Finally, two sets of research questions could be added to those
addressed in the present study in replications of this research.
These are: (a) Can sets of "core" fidelity components be identified
which are more essential to achieving program effectiveness than other
components? If so, are these "core" components similarly categorized
across programs? Can developers identify these core components a
priori? (b) Can instances of reinvention be reliably coded as
"positive" (i.e., in the same direction as ideal variations) or
"negative" (in the direction of unacceptable variations)? If so, do
positive reinventions contribute to program effectiveness, and do
negative reinventions detract from effectiveness?

These two sets of questions have important implications for
program dissemination and implementation. Answering the first set
of questions might enable disseminators and implementors to focus
attention on core components, leading to more effective site ad0ptions.
Similarly, attaining a greater understanding of positive vs. negative
reinvention could potentially contribute to greater effectiveness at

sites.

APPENDICES

APPENDIX A
EXAMPLES OF COMPONENTS AND VARIATIONS

APPENDIX A

Examples of Components and Variations
Note. I = ideal, A = acceptable, U = unacceptable

Example #1: HOSTS Readigg Program

N of components1 = 54, N of 3-point components2 = 22, N of 2-point
components3 = 32

Tutors attend faithfully.

I. Tutors are faithful in attendance, achieving on the average at
least 95% attendance rates, (e.g., students are left without a
tutor no more than 5% of the time).

Tutors on the average achieve 80 to 95% attendance rate.
Tutors are not faithful in attendance, achieving an attendance
rate of less than 80%.

C)

Example #2: EBCE (Experience-Based Career Education)

N of components = 60, N of 3-point components = 30, N of 2-point
components = 30

Career Site: Resource Person Comitment

1. Resource people are asked to make a specific commitment regarding
the specific learning experiences offered at the career site.

A. Resource people are asked to make a more general commitment
regarding the general kinds of learning experiences offered at
the career site.

U. Resource people are not asked to make any commitment regarding
the learning experiences offered at the career site.

Example #3: FOCUS (Alternative School-Within-a-School Program)

N of components = 103, N of 3-point components = 84, N of 2-point
components = 19

Hourly attendance is taken.

I. Hourly attendance is taken for all students.

A. Hourly attendance is taken only for those students who the
teacher feels are an attendance problem

U. Hourly attendance is not taken for any students.

 

1Total number of components

2Number of components scaled with 3-points: Ideal, Acceptable, Unacceptable
3Number of components scaled with 2-points: Ideal/Acceptable, Unacceptable

108

109

Example #4: 000T (One-Day One-Trial)

 

N of components = 36, N of 3-point components = 23, N of 2-point
components = 13

Panel size is kept small.

1. Panel size is between 14 and 18 jurors.
A. Panel size is between 18 and 30 jurors.
U. Panel size is over 30 jurors.

Example #5: CAP (Community Arbitration Project)

N of components = 59, N of 3-point components = 36, N of 2-point
components = 23

Youth given choice of continuing arbitration.

I. Youth is given a choice of whether or not to continue arbitration.
It is explained that this is a real choice.

A. Youth is given a choice of whether or not to continue arbitration.
However, it is implied that if the youth does not continue
arbitration the case will go to court.

U. Youth is not given a choice of whether or not to continue arbitration.

Example #6: SCCPP (Seattle Community Crime Prevention Program)

N of components = 59, N of 3-point components = 36, N of 2-point
components = 21

Focus is on residential crime.

I. Focus of crime prevention activity is entirely on residential
burglary.

A. Focus is primarily on residential burglary and neighborhood
crime/community issues.

U. Residential burglary is but one of many components in a
comprehensive crime prevention package.

Example #7: MCPRC (Montgomery County Pre-Release Center)

N of components = 85, N of 3-point components = 56, N of 2-point
components = 29

Pre-referral briefing of potential residents.

1. All potential residents are briefed about the PRC while still in
prison, jail or prior to entry (e.g., as condition of probation).

A. All potential residents are provided with material about the PRC
while still in prison.

U. Potential residents are not briefed or don't receive materials
while still in prison.

APPENDIX B
DESCRIPTIVE ANALYSES OF REINVENTION DATA

APPENDIX 8
Descriptive Analyses of Reinvention Data

Sum of Instances. The median number of instances across sites

 

was 6.00 and the distribution had a mean of 6.53, with a range of
0.0 to 21. Four sites (6%) had no instances of reinvention. The
distribution can be characterized as fairly homogenous, with only
a slight positive skew. This is indicated by the closeness of the
mean to the median.

Weighted Instances. Recall that "importance of reinvention" for

 

each instance was rated on a three-point scale. Since the greatest
number of instances recorded was 21, the potential maximum weighted
instances score would be 3 x 21, or 63. This serves as a reference
point for the observed distribution. This distribution ranged from
zero to 36, far short of the potential range. The median of the
distribution was 7.00, and the mean was 8.63. The disparity of these
two statistics indicated a positive skew (i.e., a larger proportion
of cases fell below the mean than above the mean). Forty-five sites
(64%, or almost two-thirds of the sites) scored nine (approximation to
the mean of 8.63) or less.

Average Weighted Instances. These scores had a potential range
from 0.00 to 3.00. The observed range of the distribution was 0.0
(four occurrences) to 3.00 (only one occurrence). The median of the
distirbution was 1.20, and the mean was 1.26. Thus, the distribution
was slightly positively skewed, with 39 sites (56%) scoring less than
the mean. The mode of the distribution was 1.0 (the lowest scale point

with the exception of zero) with 20 occurrences. One might hypothesize

110

111
that absolute frequency of instances and importance are related; i.e.,
that sites having many reinventions tend to also have important
reinventions. However, the distribution of the modal scores was not
a function of the number of instances per site alone; sites with as
many as 9, 10, and 12 instances had average weighted scores as low as
1.0. More precisely, the correlation between the sum of instances
and the average weighted instances indicated a small, positive relationship
(r = .188, NS). In sum, there was a slight relationship between the
occurrence of reinvention and its importance; a site with many
reinventions was only slightly likely to have important reinventions.

Sum of Modifications. The distribution of unweighted reinvention

 

instances which were coded as modifications had a median and mode of
2.00, a mean of 3.04, and a range of 0.0 to 11.0. The distribution
was positively skewed with 46 of 70 sites (66%) having three or fewer
instances coded as modifications.

The distribution of importance-weighted sums of modification had
a median of 3.00, a mean of 3.80, and a range of 0.0 to 17.0. The
distribution was similar in its positive skew to the unweighted scores,
with 46 of 70 sites (66%) having a weighted score of 4.0 (approximation
to the mean) or less. This similarity indicated that most of the
modifications were rated as relatively unimportant.

Sum of Additions. The distribution of unweighted instances coded

 

as "additions" had a median of 3.00 and a mean of 3.49. The range of
the distribution was 0.0 to 14. Fifty-four of the 70 sites (77%) had
four or fewer additions. The distribution of additions thus somewhat

resembled the distribution of modifications.

112

For the importance-weighted additions, the median was four and
the mean was 4.84. The distribution ranged from 0.0 to 25, and 49
of the 70 sites (70%) scored lower than the mean. Note that additions
were rated as somewhat more important than modifications.

Sum of Proactive Instances. The median of this distribution was
4.0, the mean was 5.03, and the range was 0.0 to 19. The scores were
positively skewed, with 43 of the sites (61%) having five or fewer
instances which were coded as proactive. The distribution of
importance-weighted proactive scores had 37 of the 70 sites (53%)
scoring at the mean of 6.37 or lower.

Sum of Reactive Instances. This distribution exhibited an extreme
positive skew, with a mode of 0.0 and a median of 1.0. The mean was 1.5.
There were 23 sites (33%) which had no instances coded as reactive, and
19 (27%) sites had only a single reactive instances. The range of the
distribution was 0.0 to 6.0.

The high frequency of sites with no reactive instances meant
that some similarity between the distrubutions of weighted and
unweighted scores must occur, and a discrepancy between distributions
is in fact not evident when examining scores below the mean. Yet when
importance-weights were introduced, 20 sites (30%) scored 40 or higher,
while only five sites (7%) had at least 4.0 (unweighted) reactive
instances. Comparing these results to the previously described
proactive scores, note that importance ratings had somewhat of a
greater effect on reactive when compared to proactive reinventiOn.

It should be noted that the distributions of proactive and reactive
instances partly reflect the quality of the data and the decision rules

for coding. This dimension was more sensitive to the post hoc coding

113

method than the addition-modification dimension, since coding

decision regarding proactive vs. reactive coding required case history
information, while addition vs. modification decisions could be made
largely on the basis of the relationship of the instance to the
program's components and variations. This issue is discussed in
greater detail in the Discussion section.

Sum of Internal Reactive Instances. Given the high frequency of
sites which had no occurrences of reactive reinvention, it was not
surprising to find 50 of the 70 sites (71%) with no instances coded
as internally reactive. The median and mode of the distribution
were thus zero. Only four sites (6%) had more than one instance of
internal reactive reinvention.

Sum of External Reactive Instances. This distribution was also

 

characterized by a large number of sites which did not have any coded
instances. There were 35 sites with no instances coded as externally
reactive (50%), and thus both the mode and median were again zero.

Twenty-six sites (37%) had more external than internal instances.

APPENDIX C

DESCRIPTIVE ANALYSES OF EFFECTIVENESS DATA:
TYPES OF DATA, SUMMARY STATISTICS, AND
COMPARISONS TO DEMONSTRATION SITES

APPENDIX C
Descriptive Analyses of Effectiveness Data:

Types of Data, Summary Statistics, and
Comparisons to Demonstration Sites

Help One Student to Succeed (HOSTS)

Types of Data. Obtaining effectiveness data for this program was

 

relatively straightforward. All sites received Title I funding which
required submission of annual evaluation reports. These reports all
contained Normal Curve Equivalent (NCE) gain scores, although a variety
of reading achievement tests were used. These tests included the
California Achievement Test, the Gates-McGuinte Reading Test, the
Metr0politan Achievement Test, the Iowa Test of Basic Skills, the
Stanford Achievement Test, the Woodcock Reading Mastery Tests, and
the California Test of Basic Skills. Recent research has demonstrated
"fairly similar estimates of gains and pretest scores" across various
achievement tests used to implement the Title I Evaluation and
Reporting System (Thompson & Novak, 1981, p. 126), thus justifying
the treatment of different tests as comparable data.

The use of change scores is generally inadvisable due to the
regression-towards-the-mean effect (Campbell & Stanley, 1966).
However, their use was considered permissible for the purpose of
this research, since they were to be employed as the basis for the ranking
procedure for which relatively gross estimates of effectiveness were
sufficient. In addition, NCE gain scores were readily available, while
the analysis of regressed post-scores would have entailed considerable

additional cost.

114

115

All HOSTS evaluations were annual reports. Seven of the ten
evaluations were contiguous to the site-visit period (pre-measured
in Spring, 1981; post-measured in Spring, 1982). The remaining three
evaluations covered the previous year (pre: 1980; post: 1981).

Summary Statistics. Gain scores are reported in NCE (Normal
Curve Equivalent) units. The NCE scale is like a percentile scale in
that it has values of "1'l and "99" at the extremes, and "50" is the
midpoint of the distribution. However, the NCE scale is different
in that it is an equal interval scale. The overall mean NCE gain
score across sites was 10.24, with a standard deviation of 5.60, and
a range of 2.31 to 19.74. This sample mean gain was similar to the
demonstration evaluation gain of 14 NCE's. Three of the sample sites
exceeded the 14 NCE gain score.

Experience Based Career Education (EBCE)

 

Types of Data. Comparing sites on effectiveness presented a

 

special problem for this program. The original federally-funded EBCE
dissemination effort had supported four program models: the

Northwest Regional Lab (NWRL) model; the Far West Lab (FWL) model;

the Appalachian Educational Lab (AEL) model; and the Research for
Better Schools (RBS) model. The first three models were fairly similar,
and were used to develop the fidelity instrument. (The RBS model was
excluded from the study, since it involved considerably less out-of-
school activity than the other three models.) This strategy was
employed rather than basing the instrument on a single model and
sampling only from that model's adopters, since preliminary

investigation had revealed many sites claiming to use aspects from two

116

or three of these models. However, personal communication with
evaluators from the Regional Labs revealed that a wide variety of
instruments had been utilized to evaluate adopting sites. Consequently,
it was decided to select appropriate existing instruments and administer
them during the site-visits, rather than use archival data.

Personal communications with EBCE evaluators and a review of the
current literature (McCaslin, Gross, & Walker, 1979) revealed two
instruments to be best-suited for the present research. These were
the Career Exploration Scale (CES; used with permission from Dr. Thomas
R. Owens, NWRL) and the New Mexico Career Planning Test (NM; used with
permission from the New Mexico State Department of Education). These
instruments were chosen as most suitable to evaluating achievement of
the two primary EBCE goals: performance of actual career exploration
behavior; and training students in the skills required for career
exploration.

The CES is a 25-item instrument. Sample items from the CES
scale follow.

Sample Items

 

In relation to one or more jobs or career fields you might
like to enter, how frequently during the past semester have you:

1. Talked about the job or career field with relatives or
friends?

2. Talked about the job or career field with persons employed
in that career field?

3. Talked about the job or career field with teachers,
counselors or program staff?

4. Read materials about the job or career field?

117

(Note. These items were scored on a five-point scale with Never = 1,
Once = 2, A few Times = 3, Several Times = 4, and Frequently = 5.)
The NM Test contains 20 items. Sample items appear as follows.
1. Rachel, a 9th grader, is considering entering the high school
program in printing because she heard that it is a good trade,
but she would like to know more about it. Which is the least
appropriate way to find out about it?
A. Arrange to watch a printer all day
8. Read about occupations in printing
C. Watch a film about printing
0. Learn about printing after she is in the program
2. Ramon wants to find out how good he is at carpentry relative
to other fellows his age. What would be his most accurate
source of information?
A. A job sample on carpentry
8. His shop teacher
C. His scores on a national carpentry test
0. The Opinion of his friends
The CES scale was reported by NWRL to have been used to evaluate
85 sites by the time of this research, and an alpha coefficient of
.93 was reported (personal communication, Dr. Thomas R, Owen, 1981).
The NM Career Planning Test was reported to be highly reliable and an
excellent instrument for evaluation by evaluators from all the
Regional Labs. No measurement documentation was available from the
evaluators. A review of the testing literature (McCaslin et al.,
1979) and an attempt to contact the test developers also failed to
produce documentation. However, the high degree of consensus among
the lab evaluators prompted use of the NM instrument in this
research.
The two instruments were completed by students enrolled in
EBCE programs at the time of the site-visits. They were administered

during the latter third of the school semester.

118

Summary Statistics. The Career Exploration Scale was scored on
a five-point scale ranging from "Never" to "Frequently" (see sample
items above). For the present sample, the mean was 3.46, with a
standard deviation of .340.

The New Mexico Career Planning Test was scored according to a
scoring key provided by Educational Testing Service, Princeton, New
Jersey. Scores per site represented mean number correct. The mean
across sites was 3.68, with a standard deviation of 1.01.

FOCUS Altenative Education Program

 

Types of Data. The evaluation of the original FOCUS model had

 

examined the following outcome variables: achievement test scores;
pupil attitudes toward school; school suspension; disciplinary
referrals; grade point average; self-concept; court referrals; and
police and sheriff contacts. Inquiries conducted during the telephone
interview phase revealed that the following indices could be provided
by implementing sites: attendance; GPA; grades in reading, math, and
language arts (these were especially relevant since the program
stresses basic skills); and achievement test scores. Although attendance
data per se were not reported in the original program evaluation, it
was felt that attendance would serve as an adequate proxy for measures
which were not readily available at most sites (i.e., pupil attitudes
towards school and court/police contacts) or which were relatively
rare events (i.e., suspension and referrals).

Change scores were again utilized. Since length of stay in the
program differed across pupils within sites, pre-periods (semesters)
were determined for each pupil, depending on his/her date of entry

into the program. In all cases, it was attempted to obtain for

119

post-data end-of—semester records which corresponded to the semester
during which the site-visit occurred. In addition to semester data,
it was also attempted to get pre- and post-annual data, since the
program developer has stressed the importance of allowing sufficient
time for program's effects to influence pupil behavior. The pre-post
,periods covered were the following: six sites provided 9/80-1/81
semester pre-data and 9/81-1/82 semester post-data. Of these sites,
two sites also provided annual data for appropriate pre-periods and
6/82 year-end scores. Three additional sites provided annual data
only, also for appropriate pre—periods and 6/82 year-end scores. One
site failed to provide any data. In assigning ranks, semester data
was used for the six sites which provided pre-post scores, and annual
data was used for the remaining three sites.

Summarnytatistics.- Attendance information was provided by seven

 

sites. Since data were computed for different time periods across
sites, they were converted to percentage change scores. The mean
change was -95% (absences). However, there was a great deal of
variance in these scores, with scores ranging from -85% to +64%.
Three sites reported positive changes, four sites reported negative
changes, and three sites did not make attendance data available.
Information on pupil grades was provided by five sites. Since
different indices were used, these scores were also converted to
percentage change scores. The mean change was +86%. The variance was
again quite high, with scores ranging from -10% to +334%. When
compared with the demonstration site evaluation, the mean change of

the sample compared quite favorably. However, this was largely due

120

to two extreme scores (+334% and +102%) in the sample. The
demonstration evaluation was performed at the two original FOCUS
sites. The mean gain at one site was +11% and +17% at the other
site. Only the two extreme sample scores previously cited exceeded
+17%.

Although grades in reading, mathematics, and language arts were
not reported in the original evaluation, they were utilized in the
present study where possible to compensate for missing data (i.e.,
missing overall GPA's) when assigning ranks. Four sites provided
both reading and math grades, while three provided language arts
grades. Means and ranges for percentage change scores were the

following: Reading: Mean = +4.3%, range = ~12% to +10%; Math:

Mean = +14.2%, range = -1.0% to +22.9%; Language Arts: Mean +14.2%,

range = -1.0% to +22.9%; Language Arts: Mean = 32.1%, range +6.5%
to +80%.

Achievement test scores were provided by four sites. Three
different tests were used (Iowa Test of Basic Skills, California Test of
Basic Skills, and Metropolitan Achievement Test). Percentage change
scores were again employed to assign ranks. The mean gain score
across the four sites were +ll.4%, with a range of 4.5% to 16%. The
mean gain for the present research sample was similar to the
demonstration evaluation, which reported a +9% gain for one site and
a 1% gain for the second site. (The Iowa Test was used for both
sites.) Three of the four sample sites reported gains exceeding 9%.

In sum, there was a great deal of variance across sites on all

measures. Comparisons to two demonstration sites were made on two

121
measures: GPA and achievement test percentage gain scores. The first
measure showed greater change for the present research sample, although
this was due to extreme scores by two of the five sites. The second
measure revealed rough similarity between the two samples. When
considering the quality of these comparisons, it should be recalled
that only half the research sample (five sites) provided GPA's and
only four sites provided achievement test scores.
One-Day One-Trial (000T)

Types of Data. This program has three major goals: (a) to

 

increase the efficiency of juror processing; (b) to make juries more
broadly representative across socio-economic and demographic variables;
and (c) to improve juror attitudes towards the courts.

Efficiency of juror processing is a multivariate concept. An
understanding of the basic processing system is necessary to appreciate
the various sub~concepts. The basic sub-processes around which the
system's efficiency revolves are the postponement/excusal process and
the voir dire hearing flow. The first process refers to the rules and
mechanisms for postponing or excusing juror service for an
individual, while the second refers to the rules and mechanisms
governing the flow of individuals to and from the voir dire hearing.
(During this hearing, jurors are selected for trial service. They may be
rejected by prosecuting or defense attorneys due to their possible
predispositions for a guilty or innocent verdict, or excused for other
reasons.)

A wide variety of input-output ratios have been used by courts to
quantify juror flows. Preliminary infbrmation obtained during the

telephone interviews indicated that three of these indices were most

122

commonly used among the present sample. These were the following:
(a) Juror Days Per Trial (JDPT; number of juror days served divided
by the number of trials); (b) Juror Usage Index (JUI; juror days
served divided by number of trial days); (c) People Brought in (PBI;
number of jurors reporting divided by the number of trials begun).
These ratios measure different aspects of input-output efficiency.
The ideal system accurately estimates the number of jurors needed to
be called on any given day, based on the number of trials to be begun
that day. This involves not only efficient juror handling (i.e.,
notification, entry, orientation, and voir dire assignment), but also
efficient communication with judges regarding trial needs, and an
appreciation on the part of judges of the jury system's needs.

Summary Statistics. For all sites, data for the 1981 calendar

 

year was utilized. Data used to compute the three efficiency ratios
(JDPT, JUI, and FBI) were provided by all ten sites. The means across

the ten sites were the following: Mean (JDPT) = 62.13; Mean (JUI) =

23.4; Mean (PBI) 36.34. Standard deviations for the three indices

were: SD (JDPT) 22.8; SD (JUI) = 7.3; SD (PBI) = 13.74. Ranges for
the three indices were as follows: JDPT, 12.17 to 92.05; JUI, 9.13 to
36.10; PBI, 14.35 to 62.58. JDPT and JUI figures covering six month
pre- and six month post-periods were reported in the evaluation of the
original demonstration site. These figures were: pre (JDPT) = 49.3;
post (JDPT) = 36.9; pre (JUI) = 16.3; post (JUI) = 11.1. Given these
figures, it appears that the research sample in general was not as

effective as the demonstration site. For both measures, only one

research sample site scored better than the demonstration site.

123
Community Arbitration Project (CAP)

 

Types of Data. The telephone interviews revealed that three

 

measures were most likely to be available at the sample sites:

(a) percent of youths recidivating; (b) percent of community
assignments completed; and (c) the percent of youths referred to the
States Attorney's Office as a result of the arbitrary hearing.

The relevance of the first two measures to the effectiveness of
the program should be fairly self-evident. As a proposed alternative
to the existing juvenile justice system, the arbitration program should
result in low percentages of youths committing further offenses, and
high percentages of community service assignments completed. The
importance of the third measure relates to the program's goal of
successfully targeting those youths whose offenses are sufficiently
minor to warrant diversion to the arbitration program. Thus, low rates
of State's Attorney referrals are sought.

There was considerable variation among sites with regard to the
types of data and quality of data available. Six sites had collected
recidivism data. However, the time periods for these data varied from
six months to three years, and criteria for defining recidivism also
varied. Time periods for the other two measures also varied, in this
case from three months (the most recent quarter) to one year. There
was also considerable variation on the availability of these two
measures, with eight sites providing information on community
assignments, and seven sites providing State's Attorney referral

data.

124

Given these variations, it was decided to use "the best data
available for each site" as the basis for ranking. Thus, annual data
was used when available.

Summarnytatistics. Means and standard deviations for effectiveness
data across sites were the following: percent recidivating; Mean =
11.6%, SD = 4.2% (data provided by six sites); percent completing
community assignments: Mean = 84.6%, SD = 8.0% (data provided by
eight sites); percent returned to State's Attorney's Office as a
result of the arbitration hearing: Mean = 12.5%, SD = 14.0% (eight
sites provided data).

The mean recidivism rate for the research sample (11.6%) was
quite similar to the rate reported in the demonstration site
evaluation (9.8%). However, the research sample was somewhat inferior
to the demonstration site with regard to the percent of youths returned
to the State's Attorney's Office (12.5% vs. 7.2%). Data on successful
completion of community assignments was not reported in the
demonstration site evaluation.

Seattle Community Crime Prevention Program (SCCPP)

 

Types of Data. Organizations implementing crime prevention

 

programs face an evaluation problem above and beyond the normally
difficult obstacles associated with criminal justice evaluation. This
problem involves the difficulty in collecting on-going data on program
"clients." Implementers of all of the other innovative programs
included in the present study in general collected effectiveness

data as part of normal program operations. However, police departments

normally collect data on a citywide basis, or on levels of analysis

125

not necessarily contiguous with geographic units in which crime
prevention programs are implemented. These units are usually referred
to as "neighborhoods," and their geographic definition is for the
most part ambiguous. Also, participation in crime prevention programs
is constantly shifting, thus exacerbating the evaluation problem.

The original site dealt with this problem by (a) targeting
census tracts; (b) seeking to "saturate" census tracts with services;
(c) administering victimization surveys to block club participants;
and (d) keeping accurate and up-to-date records on block club
membership.

Unfortunately, few adopters used such a data-based and focused
approach. Three of the 10 site-visited programs failed to provide
any data whatsoever. 0f the seven programs providing data, only
three had information specific to the served areas as well as citywide
data; the remaining four were able to provide citywide statistics
only. (One of these sites provided unaggregated data on three large
neighborhoods claimed to have been "covered" by the program. However,
investigation revealed that many residents in the neighborhoods had
not participated in the program, and thus the data were treated as
an estimate of "citywide" statistics.) The three sites which had
collected data specific to the served areas had also collected data
on control neighborhoods, although in contrast to the original site
evaluation, no random assignment was utilized in these three sites.
Pre-post data was available for all sites, including those whiCh had
citywide data only. All data reflected burglary rates, but there was

variation across sites regarding the type of burglary statistics

126

available, with four sites providing "residential burglary rates,"
one site providing "breaking and entry" (B&E) only, one site providing
"burglaries and entries," and one site providing "burglaries, B&E."
Summary Statistics. The three burglary rates for the sites
reporting data on the neighborhoods covered by the program were .4%,
.2%, and 5.9%. Two of these sites were superior to the demonstration
site, which reported a rate in one survey of 2.43% for neighborhoods
covered by the program, and a rate of 5% for a second survey. (These
results are difficult to compare since the two demonstration site
surveys covered different time periods [yearly vs. six months] for
different years [1975 vs. 1976] and used different criteria for
inclusion.) For the citywide burglary rate change-score data, the
mean across the seven sites providing data was -69.l%. The range was
-92% to +26.5%, with five sites reporting decreases and two sites
reporting increases. The site reporting -92% was the site which had
provided unaggregated data for the three neighborhoods claimed to
be covered by the program. These data were not treated at face
value during the ranking procedure, but were weighted by the
credibility loss of the site.
Montgomery County Pre-Release Center (MCPRC)

Types of Data. Information gathered during the telephone

 

interviews indicated that the types of effectiveness-related data
most likely to be provided by sites were the following: percent of
residents successfully completing the program; recidivism rates for
those successfully completing the program; percent of residents

presently employed; amount of restitution paid per resident;

127

savings per resident; amount of family support provided per resident;
and reimbursement to the program per resident. As was the case with
the Crime Prevention Program, there was variation across sites
regarding time periods covered. There were five different reporting
periods provided, ranging from eight months to three years.

Summary Statistics. The mean percentage of successful program

 

completion (i.e., not revoked from the program), across sites (seven
sites reporting) was 66.6%, with a range from 41% to 94%. This
compared to the demonstration site percentage of 74.3%. Three of the
sample sites had percentages of 74% or better. The mean percentage
recidivism rate for the sample (six sites reporting) was 16%, with
percentages ranging from 3% to 22%. This compared favorably with

the demonstration site evaluation, which reported a rate of 22.2%.

Thus all sample sites reported lower recidivism rates than the
demonstration site. With regard to employment, a mean percentage

of 76% employed was achieved by the sample sites (six sites reporting).
Percentages ranged from 41% to 100% employed. This compared to the
demonstration site's figure of 93%. Two sample sites had percentages
of 93% or better. The average amount of restitution across five sites
reporting (and with site used as the analysis unit) was $40.61. The
average amount of savings on release was $369.95, as reported by five
sites. It is difficult to make comparisons with the sample, since the
demonstration evaluation reported ranges such as "$50 or less" and
"$250 or more." Average family support across the four sites reporting
was $118.79, and the average amount reimbursed to the program by

residents across four reporting sites was $160.76.

REFERENCES

REFERENCES

Berman, P. (1980). Thinking about programmed and adaptive
implementation: Matchin strategies to situations. In H.M.
Ingram & P.E. Mann (Eds.), Why policies succeed or fail.
Beverly Hills, CA.: Sage Publications.

 

Berman, P. (1981). Educational change: An implementation paradigm.
In R. Lehming & M. Kane (Eds.), Improvingyschoolz Using what
we know. Beverly Hills, CA.: Sage Publications.

 

Berman, P., & McLaughlin, M.W. (1977). Federalyprograms supporting
educational chan e: Factors affecting implementation and’
continuation R-1589/7-HEWT} Santa Monica, CA.: Rand
Corporation.

 

 

Berman, P., & McLaughlin, M.W. (1978). Federal programs supporting
educational change: Implementing and sustaining innovations
(Final Report, R-1589/8-HEW). Santa Monica, CA.: Rand
Corporation.

Boruch, R.F., & Gomez, H. (1977). Sensitivity, bias, and theory in
impact evaluation. -Professional Psychology, 8(4), 411-433.

Campbell, D.T., & Stanley, J.C. (1966). Experimental and quasi-
experimental designs for research. Chicago, IL.: Rand-McNally.

 

Calsyn, R., Tornatzky, L.G., & Dittmar, S. (1977). Incomplete
adoption of an innovation: The case of goal attainment scaling.
Evaluation, 128-130.

 

Crandall, D. (1979). A study of dissemination efforts supporting
school improvement: Final study deSign. ‘Andover, MA.: The
Network.

Cyert, R.M., & March, J.G. (1963). A behavioral theory of the firm.
Englewood Cliffs, NJ.: Prentice-Hall.

Datta, L.E. (1981). Damn the experts and full speed ahead: An
examination of the study of federal programs supporting educational
change, as evidence against direct development and for local
problem-solving. Evaluation Review, 5(1), 5-32.

Emrick, J.A., Peterson, S.M., & Agarwala-Rogers, R. (1977, May).
Evaluation of the National Diffusion Network, Vol. 1: Findings
and recommendations (SR1 Project 4385). Washington, 0.0.:
U.S. Office of Education, Department of Health, Education, and
Welfare.

 

128

129

Eveland, J.D., Rogers, E., & Klepper, C. (1977, March). Igg_
innovation process in public or anizations: Some elements of
a preliminary model. spFingfie d, VA.: NTIS.

Farrar, E., deSanctis, J.E., & Cohen, D.K. (1979). Views from below:
Implementation research in education. Cambridge, MA.: Huron
Institute.

 

Fullan, M., & Pomfret, A. (1977). Research on curriculum and
instruction implementation. Review of Educational Research,
41(2), 335-397.

Gephart, W.J. (1976, April). Problems in measuring the degree of
implementation of an innovation. Paper presented’at the annual
convention of the American Educational Research Association.

 

Hall, G.E., & Loucks, S.F. (1981, April). The concept of innovation
configurations: An approach to addressingprogram adaptation.
Paper presented at the annual meeting of the American Psychological
Research Association, Los Angeles.

Hall, G.E., & Loucks, S.F. (1978, March). Innovation configurations:
Analyzing the adaptation of innovations. Paper presented at the
annualeeeting of the American Educational Research Association,
Toronto.

 

Havelock, R.G. (1976). Planning for innovation through dissemination
and utilization of know1edge. Ann Arbor: University of Michigan.

 

Heck, S., Steigelbauer, S., Hall, G.E., & Loucks, S.F. (1981).
Measuringinnovation configurations: Procedures and applications.
Austin, TX.: Research andFDevelopment Center f0r‘Teacher
Education, The University of Texas.

House, E.R. (1975). The politics of educational innovation.
Berkeley, CA.: McCutchan.

House, E.R., Kerins, T., & Steele, J.M. (1972). A test of the research
and development model of change. Educational Administration

Quarterly, 8(1), 1-14.

House, E.R. (1981). Three perspectives on innovation: Technological,
political, and cultural. In R. Leming and M. Kane (Eds.),
Improving schools: Using_what we know. Beverly Hills, CA.:
SageTPublications.

Larsen, J.K., & Agarwala-Rogers, R. (1977). Reinvention of innovative
ideas: Modified? Adapted? or None of the above? Evaluation,
136-140.

130

Leithwood, K.A., & Montgomery, D.J. (1980). Evaluating program
implementation. Evaluation Review, 4(2), 193-214.

 

Lippit, R., Watson, J., & Westley, B. (1958). The dynamics of
planned change. New York: Harcourt Brace & Jovanovich.

March, J.G., & Simon, H.A. (1958). Organizations. New York: John
Wiley and Sons.

 

McCaslin, N.L., Gross, C.J., & Walker, J.F. (1979). Career education
measures: A compendium of evaluation instruments. *Columbus,
OH.: The National Center forTResearch in Vocational Education.

 

Mohr, L.B. (1978). Process theory and variance theory in innovation
research. In R. Radnor, I. Feller, & E. Rogers (Eds.), Igg_
diffusion of innovation: An assessment. Evanston, IL.:
Northwestern University Press.

 

Nunnally, J.C. (1978).- Psychometric theory. New York: McGraw-Hill.

 

Owens, T. (1981). Northwest Regional Educational Laboratory.
Personal communication.

Owens, T.R., & Haenn, J.F. (1977, April). Assessin the level of
implementation of new programs. Paper presente at the American
Educational Research Association.

 

Pelz, D.C. (1983). Quantitative case histories of urban innovations:
Are there innovatin stages? IEEE Transactions on Engjneering
Management, EM-30(2 , 60-67.

 

Raizen, A. (1979). R&D management practices: Dissemination programs
at the National Institute of Education: 1974 to 1979.
Knowledge: Creation, Diffusion, Utilization, 1(2), 259-292.

Rice, R.E., & Rogers, E.M. (1980). Reinvention in the innovation
process. Knowledge: Creation, Diffusion, Utilization, 1(4),
499-514.

Rogers, E.M., & Shoemaker, F.F. (1971). Communication of innovations:
A cross-cultural approach. New York: Free Press.

Rohrbaugh, J., 8 Quinn, R. (1980). Innovation and organizational
performance: A study of the implementation andIroutinization of
a new information technology_. Grant proposal, National Science
Foundation, Policy Research and Analysis Division. Available
from authors, State University of New York at Albany, Graduate
School of Public Affairs.