
Validating Data for Clinical and Forensic Use of CBTIs

by Alex B. Caldwell, Ph.D.

COMMENTARY ON: Relative user ratings of MMPI-2 computer-based test interpretations, John E. Williams and Nathan C. Weed, (2004). Assessment, 11, #4, 316-329. Caldwell Report will provide you with a copy if requested.

This study set out to make a meaningful competitive comparison of the eight Computer Based Test Interpretation services (CBTIs) for the MMPI-2 that were then publicly available. In a thorough study, the authors answered many criticisms of preceding studies, none of which had provided a comprehensive appraisal of all of the available services.

The reports that were rated included protocols from inpatient, outpatient, college counseling, and prison samples. The participants submitted an answer sheet from one of their own clients to the authors, and they received either a CBTI analysis of that profile or else an analysis of a profile that was modal and gender matched for whichever of those four groups the client belonged to (257 valid protocols were used). The participants then rated the report they received on 10 variables (they knew it could be their case or a modal profile, but they only found out which after having sent their ratings back to the authors). The rated variables were as follows:

1. Conciseness
2. Confirmation of therapist’s impressions of the client
3. Usefulness for diagnosis and/or treatment
4. Accuracy
5. Provision of new and important information
6. Presence of contradictory information
7. Organization and clarity
8. Presence of useless information
9. Omission of important information
10. Appropriateness of diagnostic considerations

The ability to compare the ratings of the actual reports versus the modal reports enabled them to demonstrate that the CBTI reports were adding a large amount of information above and beyond what could be attributed to stereotype accuracy or Meehl’s “Barnum” effect. This latter is the potential for descriptive statements to be considered highly accurate despite their lack of discrimination among individuals (e.g., “Some days are better than other days.”). For clinical purposes this increment is happily reassuring as well as work-facilitating, and for forensic purposes it stands as a strong support of the expectation that the CBTI reports we use really are saying specific things that can potentially make substantial differences in the determinations to be made by the trier of fact.

The averaged ratings were then extensively analyzed by rank ordering the levels of favorability across the eight CBTI programs. The reports by Automated Assessment Associates (Strassberg & Cooper) received the highest ratings on accuracy, clinical usefulness, confirmation of opinion, and diagnostic suggestions. Strassberg and Cooper described their systems as conservative and to be used only in conjunction with other information; conservatism in this predictive context may enhance accuracy. The reports that were offered by Western Psychological Services obtained the highest ratings on being concise and free of useless information. The NCS-Pearson reports rated highest on organization and absence of contradictory statements. Williams and Weed distinguished between their report accuracy versus report style and organization variables; the WPS and NCS-Pearson reports thus topped out on the style variables although not topping out on content (except the authors grouped conciseness with content).

The reports from Caldwell Report had top ratings on the inclusion of new and important information and on not omitting important information (the latter by a wide margin over all of the other reports). I find this gratifyingly consistent with my obvious long-term intent to provide thorough reports that take as full an advantage as possible of the wealth of information that is embedded in the profiles and the other scores. Williams and Weed mentioned a prior study (Adams and Shore, 1976) in which there was a modest relationship between length of a report and its accuracy rating, with longer reports rated as less accurate overall. This makes obvious sense to me in that the more different things one says and the more specific one’s statements, the more opportunities one has to “go wrong.” I would confess some gratification at being third ranked – and close to second – in overall accuracy despite going out on so many “limbs” where my specificity could easily be rated as not “on target.”

Considering the use of a CBTI as a consultation, I believe that providing new and important information and not leaving out important issues is a crucial contribution of the “consultant.” That one’s clinical impression is confirmed is reassuring as well as strengthening of one’s clinical interventions and forensic presentation. But calling attention to what the client may have consciously avoided confronting or unconsciously led attention away from can be a significant gain for the clinician. A psychiatrist with whom I enjoyed working many years ago (Ulrich Jacobsen, M.D.) spoke of using the MMPI either “to confirm or to alert.” Not omitting important information corresponds, of course, to the alert function. I believe that having confidence that the MMPI has been thoroughly searched for overlooked or avoided issues should be strongly reassuring to the therapist or examiner.

MASTER LECTURE
What Do the MMPI Scales Fundamentally Measure? Some Hypotheses

by Alex B. Caldwell, Ph.D.

I consider the question of whether all psychopathological behaviors can, on an evolutionary foundation, be considered as positive adaptations. I propose that higher functions can be differentiated from their associated emotional modulations at simultaneous subjective, behavioral, and neural levels and that organizing analyses in this way will enable us to fill in our understanding of both the effects and relief of traumatic experiences. I then present each of the 8 clinical scales of the MMPI (Hathaway & McKinley, 1943) as a dimension of positive adaptation with simultaneous cognitive-emotional, operant-classical, and neocortical-limbic elements.


Police Officer Involved Shootings Spotlight Need For Stricter Background Checks

by Alex B. Caldwell, Ph.D.

The revelation of numerous police officer involved shootings has focused a spotlight on the need for stricter background investigations of police candidates, as well as a renewed scrutiny of their pre-hire psychological screening. Psychological evaluation not only reveals diagnosable mental disorders, but also can flag those recruits whose personality types and behaviors are unsuited to police work where judgment and emotional stability are crucial, especially in situations involving high stress and/or high risk.

Because there is currently no national standard for how police recruits are psychologically evaluated, police departments are often dependent on the expertise and experience of a police psychologist. However, state statutes do not explicitly require that a licensed psychologist administer an evaluation. Even when mandated, evaluations may be inconsistently administered. Tests and methods used by examiners, as well as the qualifications of the examiners themselves, vary widely. Some departments forgo formal testing altogether. Yet we know there is a direct correlation, as cited by Flint Taylor, of the People’s Law Office in Chicago, between a lack of screening and those cases litigated involving police brutality.

When you use the Caldwell Report Personnel Evaluation Form, which was normed on police applicants and is imbued with the refined and complex interpretation of Alex B. Caldwell, Ph.D., the world’s leading expert on the MMPI-2, your police evaluation process will benefit from Dr. Caldwell’s vast experience and expertise. This Personnel Evaluation Form is used for the screening of high risk employment and related selections of special sensitivity and/or consequences for the public. It is also used for return-to-work evaluations.

Eliminate the risk of a “bad hire”, which, as we have recently been made so painfully aware, can result in trauma to individuals, families and the public, as well as devastating legal and criminal repercussions for your department.
<div class=”readMoreText mb”>
<h2>Adaptational Etiologies</h2>
The reports from Caldwell Report now contain new, unique, and we believe revolutionary hypotheses as to the developmental origins of the patterns of psychopathology that are identified by the MMPI-2. These are emotionally shaping traumas and developmental experiences including rearing attitudes, histories and types of abuse, personal tragedies, more recent adult traumatic experiences and onsets, and for some patterns potential biologic dispositions that can make a person more vulnerable to the particular MMPI-2 codetype outcomes. The qualities of the attachments they form are considered in detail. As is explained in the website material, this is based in part on Dr. Caldwell’s belief that <em>all behavior is survival adaptive</em> when we sufficiently understand the individual in his or her experiential and constitutional contexts.

These developmental supplements are now included in about three-fourths of the narrative reports that we prepare; some code types occur so infrequently that sufficient etiologic data is not presently available. Some profiles, of course, have a single pair of scales that are distinctly the most elevated. Other profiles may have four, five, or six scales nearly equally elevated. These latter typically are clinically mixed cases with behavioral suggestions of several varied diagnoses. In the simple profiles, the etiologic and developmental information is expected to be a relatively good fit, sometimes even a bit uncanny. But the profiles with several scales nearly at the same elevations often have complex and mixed histories with diverse traumas if not many different painful and aversive experiences. In these cases the etiologic data is expected to fit only a part of the emotional history, a salient part but incomplete in the areas reflected in the secondary scales.

In <em><u>forensic</u></em> applications the etiologic Supplement has at times proven problematic. The developmental material – that can be so productive and save so much time in psychotherapy – is tangential to the determinations to be made by the trial court. The Supplement has in-depth information that in most cases could only be confirmed or disconfirmed in extended and intensive treatment sessions; we would almost never expect it to be covered within the time frame and focus of a forensic examination. Clients have alerted us to instances in forensic cases where cross-examining attorneys have taken to asking questions about the Adaptation Supplement. Their effort is to obscure the prior direct testimony with unanswerable questions, as if to suggest that the examiner’s work was incomplete or deficient. One colleague said his response is, “If you want to know about that material, you should call Dr. Caldwell as a Witness.”

Therefore, this Adaptation Supplement is <em><u>explicitly optional</u></em>. It will normally be included in each report we send out where such information has been identified for the obtained pattern type. However, you may un-select this option, and we will then not include it in the report.

If there are any additional ways we can help you, please let us know.

PROLOGUE: From Etiology to Empathy to Compassion

by Alex B. Caldwell and Micheline Becker-Caldwell

One of the touchstones of psychotherapy is the quality of empathy, the capacity to feel the emotions that envelop and motivate the client. The more we understand our clients, the more effective our interventions will be. Comprehending what experiences in our clients’ lives have strongly influenced their present suffering can strengthen our connection with them.

To conceptualize all behavior as adaptive is a pivotal step in this direction: the Adaptation and Attachment Supplements are my present effort to facilitate this understanding. To have guidelines toward the recent as well as more remote past origins of the person’s current frustrations and suffering is to have a new set of hypotheses to consider in exploring what has led the person into his or her present state and circumstances. To be able to help the person understand what are his/her points of greatest sensitivity, how they have come about, how he/she is protecting him/herself, and then what the prices are that the person is paying to do this is to clarify the person’s choices and thus to enable change. The clinician is a guide or facilitator in assisting the changes that can gain the person a more gratifying life, i.e., a “facilitator of more effective adaptation.”

It is easy to become habituated to how judgmental our clinical terms are. “You are passive-aggressive,” “dependency manipulative,” or “borderline” are hardly less pejorative than (respectively) obnoxious, conniving, and crazy. The moment we start forming negative judgments of our clients we start to lose them – they will start having to defend themselves against us, we who are supposed to be their allies and protectors. The more fully we can understand what conditioning experiences and biologic vulnerabilities have operated to shape the client as he or she presently is, the easier it is to keep our own good-bad perceptions out of the way.

Note that the words “maladjustment” and “maladaptation” do not appear anywhere else in this website other than this one paragraph. Maladaptation is an observer’s judgment as to how people’s ways of reacting are not gaining them the gratifications and goals that they reasonably might pursue or perhaps explicitly desire. Given the strength of private self-justifications, “I am maladjusted” or “I am a dysfunctional individual” is rarely or at the most limitedly a part of most people’s private self-perceptions, even if defensively proclaimed in order to blunt someone else’s criticisms and judgments (“You may be right, I guess I am.”). But the private self-statement is far more likely to be, “But I had to do that,” than “I just did that because I am so badly maladjusted.”

As people with elevated scores on the Pa-3 subscale so consistently demonstrate, the judging person knows that he/she is right. Thinking in terms of all behavior as being adaptive can dilute and help relax what seems a natural if not nearly universal vulnerability to become judgmental when frustrated or threatened. We see the challenge is not to never be judgmental but rather to become aware that we are protecting ourselves against our own discomforts when we become judgmental.

Empathy is a major bridge to compassion; the path we are proposing is from etiology to empathy to compassion. Note how strongly the originators of the major religions have advocated compassion. For example, Christ spoke of caring for the most vulnerable: “Inasmuch as ye have done it unto one of the least of these my brethren, ye have done it unto me.” Matthew 25:40. Buddha made compassion a core focus of his thought, not to cause suffering to another sentient being. The Dalai Lama emphasizes this. If you truly appreciate another’s pain and hurt, you become aware of whatever pains your own actions are causing them.

Our clients so often have fears and other reactions they do not want; seeking to change themselves is typically why they seek help. For the clients to understand that their distress is the natural outcome of what they have endured and survived is to see more clearly what it is that they want to change. A natural benefit of this conceptualization is less judgmental negativity and increasing compassion for themselves as well as others. Our world is one of ever-increasing population demands on finite resources, with all the frustration, aggression, and violence that can ensue from that. The quality of life – if not the chances for the survival of the species itself – may be improved by whatever increases in compassion that we can contribute to society.

Whatever fresh insights may arise from the collaboration of Buddhists and neuroscientists, it is my hope that these may lead us to become more and more “warm-hearted persons.” I would like to conclude this essay with the Dalai Lama’s own concluding words:

Whether compassion has an independent existence within the self or not, compassion certainly is, in daily life, I think, the foundation of human health, the source and assurance of our human future.

Quoted from: Houshmand, Z., Livingston, R. B., & Wallace, B. A. (1999). Consciousness at the crossroads: Conversations with the Dalai Lama on brain science and Buddhism. Ithaca, NY: Snow Lion Publications.

What does scale 5-Mf measure?

by Alex B. Caldwell, Ph.D.

    • When I use scale 5-Mf in a report, I am uncomfortable with the descriptors that come to mind.
    • What does scale 5-Mf measure?
    • Hasn’t it changed over time?
    • What do more extreme scores tell us?

Hathaway (see Dahlstrom & Dahlstrom, 1980, pp. 73-75) primarily used gay vs. straight males to develop the scale. Secondarily he used the old Terman-Miles (1936) test, and thirdly (least weight) he looked at adult male vs. adult female item response differences. He sought to measure what he saw as important psychological differences between gay and straight males (in contrast to object choice prediction), and recent neuroscience research looks potentially supportive of his perception.

He did a separate Fm scale for lesbians vs. straight women, but the scale did not work very well, in my hindsight possibly because of a greater heterogeneity of gender identities in the sample of women. For example, Kinsey et al. (1953) showed that gays strongly tended to remain gay, and Hooker studied how difficult it is for gays to shift to (at least outwardly) straight gender roles. Women can shift from sexual relationships with men to women and back – or to being sexually inactive – much more easily than gay males. Hathaway et al. felt it would be less confusing to have a single scale, although four items that report being concerned about sex are scored in the opposite direction. Note that the norms go in the opposite direction (high is resemblance to the opposite gender, so a high score is feminine for men but a low score is feminine for women).

In understanding scale 5-Mf, I find it helpful to distinguish between gender identity, gender role, and object choice. Object choice is highly idiosyncratic and I think often susceptible to chance circumstances in terms of the person’s earliest and often acutely intense adolescent encounters. Self-consciousness and socially problematic behaviors often lead to object choices being concealed. This can also drive discrepancies between one’s role and identity; gender role can be a mask, and it often is. Regarding Mf and possible changes over the years, a follow-up study showed a high correspondence of the same descriptors showing up for high and low scoring male and female college students after a 30-year period (Todd & Gynther, 1988): there was a remarkable absence of change. Lists of descriptors reflect the expression of both role and identity, but I believe that if there were changes in the underlying gender identities, the behavioral expressions would show at least some changes. I like to think that there has been some increment in the general acceptance of cross-role behavior over time (e.g., unisex clothing, gay marriages, etc.). But to me, the Mf scale is primarily about gender identity, and I see little if any change in this basic underlying dimension.

Regarding interpretation, I propose that the underlying dimension is best summed up as being defined by an orientation toward actions on the masculine end and an orientation toward feelings on the feminine end, construing both “actions” and “feelings” broadly. This helps get us off the hook of such pejorative terms as fag, queer, bull dyke, etc. They hardly follow from the scale anyhow, especially since the use of such words is largely in response to gender role and object choice more than to identity. The lowest scoring males I have seen or known often have strong exploratory urges and individual needs for mastery over nature; they may not understand how women can apparently sit and talk all day. This does not, however, presume the abusive attitudes often attributed to machismo; other scales assess aggressive potentials. Some are well aware of the vulnerability of women, and they can be protective and gentle in a nevertheless very masculine way. High T-scoring males are much more interested in what is happening in the personal (emotional feeling) lives of the people around them as well as their own feelings; they may find hunting, boxing, etc., distasteful if not repugnant.

Adjectives more often descriptive of lower scoring women have included approachable, charitable, emotional, lazy, and tolerant; they have also been seen as worldly, sensitive, and self-dissatisfied. Many seem protective of their rights to the expression of their feminine identities (especially if they have felt exploited or oppressed). Higher scoring women often value physical strength and endurance and may be seen as adventurous, calculating, self-assured, exploitative, and self-confident. They often have tomboy elements in their histories and subtly masculine traits (largely independent of object choice). In one study, by far the strongest behavioral association with scale 5 in women was their answer to the question, “If you had to choose between your mate versus your work which would you choose?” The “action” end chose work and the “feeling” end chose mate.

I see the feminine end of the scale (both genders) as being more strongly associated with verbal expressions of anger and aggression (almost “verbally only” toward the extremes), and physical expressions of aggression as somehow more a part of the nature of the world for those scoring at the masculine end (lower 5 men and higher 5 women). That would not be a specific prediction of behavior in an immediate sense but rather a shift of potential thresholds. I would add that I think that when low-5 women do strike out physically, it is apt to be more pain- and distress-inflicting (e.g., an old French movie with low SES women working in a laundry – a fight broke out and they were tearing out each other’s earrings and scratching each other’s faces), and that when high-5 women fight, it is more like male fighting over territoriality and prerogatives. Masculine aggression may often be seen as just enough for an efficient control of a child’s or another person’s behavior, and not necessarily as deliberately mean or cruel or tormenting; the latter, like machismo above, depends on other scale elevations such as 4-Pd, 6-Pa, 8-Sc, and 9-Ma.

Let me close with observations on the highest scale 5 score (female) I have ever seen. A woman came to the psychiatry unit at the University of Minnesota (1950’s) wanting help in order to get married to another woman who also wanted the marriage. At a T just over 90, she reportedly had never worn a dress in her life, she was expert in using tractors, she went hunting alone, and she loved to go fishing alone. Psychologically she was “a man with a practical action problem.” Given a satisfying complementarity of roles, their only problem was they could not have children, but they had accepted that. Unfortunately, the issue was a legal one and not psychiatric, so we were not really able to help her.

References

Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.

Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & Gebhard, P. H. (1953). Sexual behavior in the human female. Philadelphia & London: W. B. Saunders.

Terman, L. M., & Miles, C. C. (1936). Attitude-interest analysis test. McGraw-Hill Book Company, Inc.

Todd, A. L., & Gynther, M. D. (1988). Have MMPI Mf correlates changed in the past 30 years? Journal of Clinical Psychology, 44, 505-510.

What is the K scale really about?

by Alex B. Caldwell, Ph.D.

Rather immediately in the development of the basic MMPI scales – in the early 1940’s – it became apparent that answering such an array of personal items is inevitably subject to biases from the varied attitudes and approaches that subjects take. One alternative was for the clinician to make approximate judgmental efforts to avoid unfortunate over-interpretations of the scores of self-critical respondents as well as serious under-interpretations of the scores of guarded and defensive respondents, but the unreliability of such judgments introduced an unacceptable amount of error. The test developers felt, therefore, that the scales they had developed had to have measured adjustments to “correct” for these biases. In a highly regarded and remarkably thought-provoking talk, Meehl and Hathaway (1946; republished in Dahlstrom & Dahlstrom, 1980) detailed their efforts to quantify the potentially distorting effects of such biases and attitudes. In their article, “The K factor as a suppressor variable,” they published the development of all three basic “validity” scales, L, F, and K. Here I will discuss the origins of L and F briefly and then K in more detail.

Where did the L scale come from?

The L scale was an explicitly a priori scale influenced in part by prior studies of honesty in grade school students. For this scale the challenge was to create a set of statements that were: (1) usually too good to be true and (2) rarely responded to by normal subjects. In the aggregate, then, it would be extremely rare for anyone to sincerely answer a large proportion of them in the scored direction. (As a marker of their success, there are only 15 items. I can recall having seen a raw score of all 15 items three or four times in my life from tens of thousands of profiles.) This scale immediately became useful in detecting efforts to “look good” in less sophisticated subjects, but it was quite uneven to ineffective with college-educated subjects. In my experience, it is also confounded by having two contrasting sources: (1) deliberate faking good and (2) a naive properness in less educated persons who may have high, rigid, and literal-minded religious beliefs or other strict personal values. (Because the second of those two contingencies is sincere responding, I never use “Lie scale” as the automatic designation of the L scale.)

What is the origin of the F scale?

The F scale was also a non-criterion group scale. It was the selection of 64 items (60 in the MMPI-2) answered in the scored direction (true or false) by fewer than 10% of their normal sample, and usually by fewer than 5%. It is an “infrequent response” scale; the letter F could be understood as standing for “frequency/infrequency,” but that seems clumsy. Better, I believe, just to think “the F scale” as well as just “the L scale.” The premise was basically that a large number of rarely made responses alerts us to the fact that something may be going wrong with either the person’s approach to the items or with our tabulation of the person’s responses. This scale can also be elevated by a variety of biases and sources of distortion. For present purposes the following list is what I believe to be the three main operating elements: (1) deliberate attempts to “look bad,” (2) marginal or overtly psychotic ideation, and (3) socioeconomic status/education (discussed below). Secondarily there are such factors as limited literacy, intoxication when taking the test, non-psychotic idiosyncrasies of thinking, perhaps mistaken understandings of the instructions, etc.
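The infrequency rule described above can be sketched in a few lines of Python. This is only an illustration of the selection principle, not a reconstruction of how the original items were actually chosen: the function names, the data layout (one row of true/false answers per normal subject), and the toy data are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def select_infrequent_items(
    responses: Sequence[Sequence[bool]], threshold: float = 0.10
) -> Dict[int, bool]:
    """Return {item index: scored direction} for rarely given answers.

    responses: one row per normal subject, one True/False answer per item.
    An item is keyed when its rarer answer was given by fewer than
    `threshold` of the sample; the scored direction is that rare answer.
    """
    if not responses:
        raise ValueError("need at least one subject")
    n_subjects = len(responses)
    n_items = len(responses[0])
    keyed: Dict[int, bool] = {}
    for item in range(n_items):
        true_rate = sum(1 for row in responses if row[item]) / n_subjects
        rare_rate = min(true_rate, 1.0 - true_rate)
        if rare_rate < threshold:
            # The rare answer is "true" when fewer than half said true.
            keyed[item] = true_rate < 0.5
    return keyed


def raw_score(answers: Sequence[bool], keyed: Dict[int, bool]) -> int:
    """Count the answers given in the scored (rare) direction."""
    return sum(1 for item, direction in keyed.items() if answers[item] == direction)


# Hypothetical toy sample: 20 subjects, 3 items.
sample: List[List[bool]] = [[i == 0, i < 10, i != 5] for i in range(20)]
f_key = select_infrequent_items(sample)
```

In the toy sample, item 0 is answered "true" by 1 of 20 subjects and item 2 is answered "false" by 1 of 20, so both are keyed in their rare direction, while item 1 (an even split) is not keyed. A respondent who gives many such rare answers gets a high raw score, which is the signal that something may be going wrong.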

Besides Potassium, what does K stand for?

The development of the K scale was far more complex. The idea of a non-obvious scale to measure the overall tendency to “look good” or “look bad” seemed reasonably straightforward, but the requirements they put on it made it anything but. These included: (1) They wanted a scale that “worked at both ends,” i.e., effectively discriminated both “looking good” and “looking bad” (many of their preliminary scales worked reasonably well in one direction but less well in the other). (2) They wanted a rigorous empirical selection of the scale items from a large item pool, partly because, “Those items whose significance would not have been guessed by the test-maker will then be equally mysterious to the testee” (see the “K factor article” in Dahlstrom & Dahlstrom, 1980, p. 86). (3) Since their goal was a scale to be used to correct their clinical scales for ever-present “look good” or “look bad” tilts or dispositions, they wanted the correction scale not to be weighted with or biased by psychopathology – ideally not at all. This combination of requirements – especially the last – set up a major set of hurdles.

In over two years of intensive work, they developed an untold number of experimental scales (too many, they wrote, to report in detail). There were conscious, i.e., by instructions, fake good and fake bad scales, and they also generated presumptively self-negative scales (functioning normals with disturbed profiles) and presumptively self-favorable scales (psychiatric inpatients with normal range profiles), the latter two sets making assumptions as to the direction of distortion but not as to the extent of conscious intentionality. They finally settled on a 22-item scale derived from a group of 50 inpatients, mostly with diagnoses of “psychopathic personality, alcoholism, and allied descriptive terms indicating behavior disorders rather than neuroses” (Dahlstrom & Dahlstrom, 1980, p. 99) as the best of the lot, although they adamantly stressed that it was the performance of the set of items that mattered and not the group of origin. This 22-item scale was designated L6 as a variation that happened to have a requirement of L at or over T-60. (This scale barely beat out scale N, which was Meehl’s own doctoral dissertation, but which contained a bit too much loading of psychopathology.)

This preliminary scale L6 still had a serious defect: a subset of (more or less) psychotically disturbed and severely depressed patients consistently got low raw scores reflecting their very low self-esteem. Therefore, L6 still remained undesirably influenced by psychopathology. From their scales for conscious distortion, they set out to identify a set of items that were not influenced by instructions to fake in either direction. From this set they found eight items that nevertheless discriminated the severely disturbed patients from the normals. These eight were scored in the patient response direction. The effect of adding these eight items to the 22 L6 items was to bring the average patient raw score on the 30 items back up to that of the average score of normal subjects. Thus, the final K scale and the K-correction appear to be no more than minimally affected by the presence of psychopathology.

I would see the goal of the K-correction in effect as being to identify an optimal estimation of what the T-score would have been had the person been straightforward. K then is operating as a threshold for the reporting of self-negative feelings and socially problematic attitudes. The basic clinical scale items to which the high K person responded in the scored direction despite a strongly self-favorable bias (whether conscious or not) would then, in effect, carry much more weight per item, that is, reporting distresses and shortcomings despite a strong reluctance to do so. This is compensated for by adding an above average amount of K. Similarly, the basic scale items responded to by someone with a low K score have a much lower threshold for their admission. These should be given less weight, and this occurs as the consequence of adding a smaller than average amount of K. This has a balancing and I believe beneficially homogenizing effect on who is included in which codetype: it becomes the optimal estimation as to which is the person’s appropriate codetype. Note that the non-K-corrected codetypes would quite often be different from the K-corrected (Wooten, 1984). There is little or no research on those non-K codetypes, and their test results would be much more confounded by test-taking attitudes than are the K-corrected codetypes.
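As a rough illustration of the arithmetic, the correction amounts to adding a fixed fraction of the raw K score to the raw score of certain clinical scales. The sketch below assumes the conventionally published fractions (0.5 K for Hs, 0.4 K for Pd, 1.0 K for Pt and Sc, 0.2 K for Ma), which are not listed in this article; the function name and the sample scores are hypothetical, and both the rounding used on printed profile forms and the later conversion to T-scores are left out.

```python
from fractions import Fraction
from typing import Dict

# Conventional K-correction fractions (an assumption of this sketch;
# they are not given in the text above). Other scales receive no K.
K_WEIGHTS: Dict[str, Fraction] = {
    "Hs": Fraction(1, 2),
    "Pd": Fraction(2, 5),
    "Pt": Fraction(1, 1),
    "Sc": Fraction(1, 1),
    "Ma": Fraction(1, 5),
}


def k_corrected(raw_scores: Dict[str, int], k_raw: int) -> Dict[str, Fraction]:
    """Add the scale's fraction of raw K to each raw clinical score."""
    return {
        scale: raw + K_WEIGHTS.get(scale, Fraction(0)) * k_raw
        for scale, raw in raw_scores.items()
    }


# Two hypothetical respondents with identical raw clinical scores
# but different K: the high-K person's admissions are weighted up.
raw = {"Hs": 10, "D": 22, "Pd": 20, "Pt": 15, "Sc": 12, "Ma": 18}
high_k = k_corrected(raw, k_raw=20)
low_k = k_corrected(raw, k_raw=10)
```

With the same 12 raw Sc items endorsed, the respondent with K = 20 ends up at 32 and the respondent with K = 10 at 22, which is the "more weight per item" effect for the guarded respondent described above. Scale D, which carries no K fraction, is unchanged in both cases.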

It was not factor derived; why was it called the K factor?

In the “K factor” article Meehl and Hathaway (1946; see Dahlstrom & Dahlstrom, 1980) went on to a factor analysis of a curious group of clinical and arbitrary “variance analyzing” scales. In this analysis the K scale emerged as central to a single factor with negligible residuals. They then went on to argue that there is too much imprecision in our measurement of personality to sacrifice any accuracy for the sake of internal consistency, i.e., factorial purity. Indeed, they argued that, “From both the logical and statistical points of view, the best set of behavior data from which to predict a criterion is the set of data which are among themselves not correlated.” (op cit., p.117; see also McGrath, 2005). They fundamentally rejected the construction of personality scales on a factor analytic basis, and they concluded, “Since scales are so very ‘impure’ at best, there does not seem to be any very cogent reason for sacrificing anything in pursuit of the rather illusory purity involved.” (op cit., p. 116). To my awareness, this carefully developed argument has never been refuted; instead it has been ignored for decades with endless factor-analytic (high alpha) test construction efforts, up to and including the recent Restructured Clinical or “RC” scales. Note also how few tests based on factorial scales have gained extensive clinical usage in personality assessment. I would urge everyone seriously involved with the MMPI-2 – above all if teaching or supervising its use – to study the K factor article and make their own decisions regarding these arguments. I believe this is the most important article ever written to understand what has made the MMPI so unique (see Dahlstrom & Dahlstrom, 1980, or contact us for a copy).

What correlational properties potentially affect interpretation of the K scale?

The Caldwell clinical data set (1997) is a mixture of clinical cases plus a good scattering of mildly disturbed and relatively normal subjects, a total of 52,543 individual protocols. The sample is significantly overeducated by the census but significantly less so than the MMPI-2 normative sample, of which latter 45% had graduated from college and 18% of the total normative sample had done postgraduate work.

In this data set, the K scale correlated .65 with the socioeconomic status scale (Ss, Nelson, 1952). This K to Ss correlation suggests that approximately 42% of the variance of K is due to SES and similarly, of course, education (note the analyses of this data set in Greene, 2000). The correlation of K with Mp (Malingering Positive, Cofer, Chance & Judson, 1949) was .50, suggesting that about 25% of the K variance can be explained by conscious defensiveness as measured by the Mp scale. Wiggins’ Sd (social desirability, 1959) correlates .28 with K, which might be another 8% except that Mp and Sd correlate .75; their combined contribution to the K variance would be slightly over 25%. The correlations of Mp and Sd with Ss are quite low and thus SES and conscious defensiveness are essentially independent of each other. Curiously, scale R (Welsh, 1965) correlates .30 with K and is negligibly or even negatively correlated with these other three scales, so it is close to a 10% contribution to the variance of K, and it is almost entirely independent of both SES and conscious defensiveness.
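
The percentages in this paragraph follow from squaring each correlation. A minimal sketch of that arithmetic is given below; the function names are illustrative. The article does not say how the combined Mp and Sd figure was computed, so the two-predictor formula used here (the standard squared multiple correlation for two correlated predictors) is an assumption.

```python
def variance_explained(r: float) -> float:
    """Percent of variance shared by two variables that correlate r."""
    return 100.0 * r * r


def combined_r_squared(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of y with two correlated predictors."""
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)


# Correlations reported in the text (Ss-F is the -.77 cited for the F scale).
reported = [("K-Ss", 0.65), ("K-Mp", 0.50), ("K-Sd", 0.28), ("K-R", 0.30), ("Ss-F", -0.77)]
for label, r in reported:
    print(f"{label}: r = {r:+.2f}, shared variance = {variance_explained(r):.1f}%")

# Mp and Sd taken together, given their intercorrelation of .75
print(f"Mp + Sd combined: {100 * combined_r_squared(0.50, 0.28, 0.75):.1f}%")
```

Under that assumption the combined Mp and Sd contribution comes to about 27%, which is in line with "slightly over 25%": because the two scales overlap so heavily, Sd adds little beyond Mp.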

An almost totally unappreciated point is that – without our realization or appreciation – K has been correcting for the impact of SES from the time of its invention. The widespread use of reasonably educated and bias-free samples together with their usually middle class or higher SES in most of the negative studies on the K-correction has operated to conceal this function of K. In addition, this is consistent with the correlation of Ss with the F scale: an almost startling -.77 (approaching 60% of the variance of F!). This shows how lower SES subjects do not learn what not to say whereas well-educated subjects are in effect trained in what not to say as well as what you do say and just how you say it. I believe that understanding these relationships can considerably expand our understanding as to in what ways and how broadly the K factor is crucial to our interpretation of MMPI profiles.

In normal subjects (with no motives to bias their responses) the scores on K are quite stable over time. The longest term of follow-up of which I am aware is by Leon, Gillum, Gillum, and Gouze (1979) as part of a longitudinal cardiovascular research study. The 30 year reliability of K was .434; of the thirteen basic scales this was exceeded only by 0-Si, 5-Mf, 9-Ma, and 2-D in decreasing order. For the five partial interval retestings, ranging from 6 to 24 year intervals, the reliabilities of K ranged from .502 to .673 with three of the five over .60. Given that these subjects had no incentive to distort, this would be attributable to the effects of socioeconomic status and the emotional reserve of the Welsh R scale; both of these are attributes that one would expect to be reasonably stable over longer periods of time. Circumstantial demands to look good or bad would, of course, fluctuate by the occasion, so their impact – had it been present – would have led to much lower long term reliabilities, but the absence of bias here led to correlations that are high considering the lengthy time intervals.

In his discussion of the K-correction, Greene (2000) comments on studies in which K was interpreted as a measure of personality integration and healthy adjustment (p. 95). This is directly consistent with the r of .65 for Ss with K, especially given samples of subjects with no incentive to distort or bias their responses. By many lines of evidence higher levels of SES and education would be expected to be associated with better personality integration. The component of consciously biased responding on K in effect was essentially inactive in these studies. Thus the assertions of an association of K with personality integration and healthy adjustment are validated mainly as a function of SES.

Could they validate the K-correction?

In order to test whether this new 30 item K scale was working as a correction scale, they did several experimental sequences with profiles falling between T-65 and T-80. They regarded this as the problematic or critical range since scores over T-80 would almost always be pathological, and scores below T-65 would be too low to be elevated to an assured level of belonging in the psychopathologic range. They then created four mixed sets of patient plus normal profiles and cut each batch at K over versus K at or below the T-50 average. The hypothesis was that those scoring above T-50 K would disproportionately be defensive patients and those below would more often be self-critical normals. Each of the four sets of data supported this hypothesis; recognizing the major difficulties in fully cross validating such a complex scale, they felt this at least showed that the K scale was working in the direction in which it should.

Is the K-correction still working?

There have been objections to the continued use of the K correction, with statements such as, “The bad news is that the K-correction doesn’t do anything; the good news is that it doesn’t do anything.” The argument then is that it should be abandoned. The main anchoring data of such assertions have mostly been in studies of subjects who responded straightforwardly with no identifiable or consistent-across-subjects incentives to bias or distort their responses. These studies define circumstances in which the K-correction indeed has little to do – especially among reasonably educated subjects responding straightforwardly. These are precisely not the groups for whom the K-correction was designed.

A problem in at least a few of these studies was that the “criterion” has been a rating sheet filled out based on a single session with the subject. The ratings then are based very largely on what the person just said. But the non-K-corrected scale is also what the person just said. This operates as an experimental bias in favor of the non-K-corrected scale. For example, Barthlow et al. extended this to three hours (three sessions) which is a significant improvement in discovering the person’s self-deceptions, role playing, and other sources of a potential misfit of self-reporting. Three sessions is still much less “tuning in” to the subject’s potential biases and distortions than would be expected from a month’s admission to a University Hospital inpatient service, which was modal in the origin of the MMPI. Considering this experimental bias, it is perhaps a bit surprising that in Barthlow et al.’s data some of the correlations with the K-corrected scores exceeded at all the correlations with the non-K-corrected.

Putzke, Williams, Daniel, and Boll (1999) tested 61 patients with end-stage lung disease waiting for lung transplantations with considerable uncertainty whether a lung might become available in time to save their lives. The context defined a strong “pull” to appear psychologically healthy and deserving of priority. The 30 patients with higher scores on K (using a median split) obtained significantly lower raw scores on all of the K-corrected scales as well as Si (K-to-scale overlapping items having been deleted). After K-correction there were no significant differences except that Hs appeared possibly to have been a bit over-corrected. In this setting where there was a clear and consistent incentive to “look good,” their data strongly supported the use of the K-correction. This was precisely the sort of group for whom the K-correction was designed. As the Putzke et al. study illustrates, the K-correction is working very well when and where it should: when the person has a strong incentive to bias the test responses in order to look too healthy or too disturbed.

In civil forensic actions such as child custody and denied employment, there is an almost universal incentive to appear healthy. In personal injury and workers compensation as well as criminal trials, there can be strong incentives to look damaged or impaired. Thus in contexts where such biasing is so consistently present, the utility of the MMPI would almost invariably be reduced without the K-correction: defensive profiles would be underinterpreted and exaggerated profiles would be overinterpreted. If K went uncorrected, then in the resulting confusion as to who was distorting in what direction and how much, the court’s trust of the MMPI would soon be severely damaged.

References

Barthlow, D. L., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., & McNulty, J. L. (2002). The appropriateness of the MMPI-2 K correction. Assessment, 9(3), 219-229.

Caldwell, A. B. (1997). [MMPI-2 data research file for clinical patients.] Unpublished raw data.

Cofer, C. N., Chance, J. E., & Judson, A. J. (1949). A study of malingering on the MMPI. Journal of Psychology, 27, 491-499. See also Greene, R. L. (2000), The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 97-98.

Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.

Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon.

Leon, G. R., Gillum, B., Gillum, R., & Gouze, M. (1979). Personality stability and change over a 30-year period – Middle age to old age. Journal of Consulting and Clinical Psychology, 47, 517-524.

McGrath, R. E. (2005). Conceptual complexity and construct validity. Journal of Personality Assessment, 85, 112-124.

Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the MMPI. Journal of Applied Psychology, 30, 525-564. Reprinted in Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.

Nelson, S. E. (1952). The development of an indirect, objective measure of social status and its relationship to certain psychiatric syndromes (Doctoral dissertation, University of Minnesota). Dissertation Abstracts International, 12, 782. See discussion in Caldwell, 1997b.

Putzke, J. D., Williams, M. A., Daniel, F. J., & Boll, T. J. (1999). The utility of K-correction to adjust for a defensive response set on the MMPI. Assessment, 6, 61-70.

Welsh, G. S. (1965). MMPI profiles and factor scales A and R. Journal of Clinical Psychology, 21, 43-47. See also Greene, R. L. (2000), The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 243 & 219-225.

Wiggins, J. S. (1959). Interrelationships among MMPI measures of dissimulation under standard and social desirability instructions. Journal of Consulting Psychology, 23, 419-427. See also Greene, R. L. (2000), The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 98-100.

Wooten, A. J. (1984). Effectiveness of the K correction in the detection of psychopathology and its impact on profile height and configuration among young adult men. Journal of Consulting and Clinical Psychology, 52, 468-473.

What are the arguments regarding the use of the MMPI-2 vs. MMPI-A with adolescents?

by Alex B. Caldwell, Ph.D.

What was the extent of the development of adolescent assessment with the original MMPI?

For the first 24 years of developing my MMPI interpretation system, there was only the original MMPI. There was extensive research showing good, validly interpretable profiles down to ninth grade/age 15. This included an extensive research program (books by Hathaway & Monachesi, 1953, 1961, 1963) based on the testing of over 15,000 ninth grade students in Minnesota around 1950. This included a major exploration of the true longitudinal prediction (rather than concurrent or post-diction) of subsequent juvenile delinquency. There was the work of Marks, Seeman, and Haller (1974), with special adolescent norms, and there was a large number of published individual studies. The score curves by age levels on the F scale show a small but gradual increase from 18 down to 15 but an increasingly steep slope below 15. My system contains a large number of internal adjustments for adolescence (especially regarding psychoticism lest it be too readily presumed to be chronic given the still developing age level). I actually ended up having to start the age adjustment process from less than 22 on down (except 49/94 code where age 21 is already adult).

With the revisions, what are the consequences of using one form or the other?

When the MMPI-2 revision committee (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) developed the MMPI-2 revision, they decided to separate the adolescents, and they ended up setting cutoffs at up through age 17 or both age 18 and living at home as adolescent (use the MMPI-A) and age 19 on up or both 18 and not living at home as adult (use the MMPI-2). The main issue that the proponents of the MMPI-A (e.g., Archer, 1997, and others) have argued is that the adult form seriously over-pathologizes the teenager – so many get substantially elevated profiles. I have argued that the MMPI-A seriously under-pathologizes the adolescents – a quite sizeable proportion of adolescent psychiatric inpatients get normal range profiles. In truth I think each of us is partly right. My feeling is that the psychological and hormonal turbulence of adolescence really shows in the MMPI-2 scores, but comparing turbulent inpatients to an also relatively turbulent normative teen sample on the MMPI-A makes everyone look comparatively normal. In the end I feel that, with appropriate age allowances, the adult MMPI-2 codetype gives us a better basic understanding of the adolescents’ issues and behaviors as well as considerations as to their origins.

With adults we see long term qualities of behavior in the MMPI profiles, including childhood etiologic factors. What about adolescents?

I do think that an adolescent profile should be thought of as much closer to a high speed photograph of a rapidly moving target, whereas an age 22+ adult profile can be much more of a slow speed, in-depth portrait. Such an upper age level is consistent with the early to mid-20’s completion of the myelination of affective cerebral neurons; the inhibitory affective control systems are the last part of the brain to fully mature. This means we must be careful not to project undue stability or fixity over an extended future time to the behavioral implications of a teenage profile. Such parental assertions as, “Sometimes she’s so remarkably responsible, she’s like 11 going on 21,” or “Today I swear he’s 18 going on 12,” reflect how different an adolescent can be from one occasion to another. I think that as events in the adolescent’s life correspond emotionally to prior events, both ugly and beautiful, those re-stimulations then determine the emotional state of that day (or whatever relatively brief time interval). In any case, my emphasis is that the goal is the most accurate and useful prediction of behavior we can generate, possibly with cues as to what prior emotional episodes are being re-aroused.

Which source of information, then, is more clinically workable?

The original MMPI items feel very dated to today’s adolescents, but the MMPI-2 edits make it much more comfortable for them. The MMPI-A benefits from being a bit shorter (which they usually like) as well as having yet a little more comfortable wording for adolescents. Nevertheless, I believe that the very large amount of pattern or code type research on both adults and adolescents using the original version versus so comparatively little configural data on the MMPI-A favors the predictive power of the MMPI-2. This gives us the most useful information from the broadest research base and most extensive accumulation of clinical experience. This is not to say that this is an ideal solution because one still has to remind oneself to allow for the turbulence of adolescence and the greater potential for future change in the results obtained. Considering all this, in my interpretation service I continue to interpret MMPI-2 protocols from adolescents with all the internal age adjustments included. I do not encourage testing below age 15, although bright and mature 14 year-olds can often get profiles that are valid by all available criteria as well as being good “clinical fits.”

References

Archer, R. P. (1997). The MMPI-A: Assessing adolescent psychopathology (2nd ed.). Mahwah, NJ: Erlbaum.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.

Hathaway, S. R., & Monachesi, E. D. (1953). Analyzing and predicting juvenile delinquency with the MMPI. Minneapolis: University of Minnesota Press.

Hathaway, S. R., & Monachesi, E. D. (1961). An atlas of juvenile MMPI profiles. Minneapolis: University of Minnesota Press.

Hathaway, S. R., & Monachesi, E. D. (1963). Adolescent personality and behavior: MMPI patterns of normal, delinquent, dropout, and other outcomes. Minneapolis: University of Minnesota Press.

Marks, P. A., Seeman, W., & Haller, D. L. (1974). The actuarial use of the MMPI with adolescents and adults. Baltimore: Williams and Wilkins.

Validating data for clinical and forensic use of CBTIs

by Alex B. Caldwell, Ph.D.

COMMENTARY ON: Relative user ratings of MMPI-2 computer-based test interpretations
John E. Williams and Nathan C. Weed, (2004). Assessment, 11, #4, 316-329. Caldwell Report will provide you with a copy if requested.

This study set out to do a meaningful competitive comparison of the then eight publicly available Computer Based Test Interpretation services (CBTIs) for the MMPI-2. In a thorough study they answered many prior criticisms of preceding studies, none of which had provided a comprehensive appraisal of all of the available services.

The reports that were rated included protocols from inpatient, outpatient, college counseling, and prison samples. The participants submitted an answer sheet from one of their own clients to the authors, and they received either a CBTI analysis of that profile or else an analysis of a profile that was modal and gender matched for whichever of those four groups to which it corresponded (257 valid protocols were used). The participants then rated the report which they received on 10 variables (they knew it could be their case or a modal profile, but they only found out which after having sent their ratings back to the authors). These ratings were as follows:

1. Conciseness
2. Confirmation of therapist’s impressions of the client
3. Usefulness for diagnosis and/or treatment
4. Accuracy
5. Provision of new and important information
6. Presence of contradictory information
7. Organization and clarity
8. Presence of useless information
9. Omission of important information
10. Appropriateness of diagnostic considerations

The ability to compare the ratings of the actual reports versus the modal reports enabled them to demonstrate that the CBTI reports were adding a large amount of information above and beyond what could be attributed to stereotype accuracy or Meehl’s “Barnum” effect. This latter is the potential for descriptive statements to be considered highly accurate despite their lack of discrimination among individuals (e.g., “Some days are better than other days.”). For clinical purposes this increment is happily reassuring as well as work-facilitating, and for forensic purposes it stands as a strong support of the expectation that the CBTI reports we use really are saying specific things that can potentially make substantial differences in the determinations to be made by the trier of fact.

The averaged ratings were then extensively analyzed by rank ordering the levels of favorability across the eight CBTI programs. The reports by Automated Assessment Associates (Strassberg & Cooper) received the highest ratings on accuracy, clinical usefulness, confirmation of opinion, and diagnostic suggestions. Strassberg and Cooper described their systems as conservative and to be used only in conjunction with other information; conservatism in this predictive context may enhance accuracy. The reports that were offered by Western Psychological Services obtained the highest ratings on being concise and free of useless information. The NCS-Pearson reports rated highest on organization and absence of contradictory statements. Williams and Weed distinguished between their report accuracy versus report style and organization variables; the WPS and NCS-Pearson reports thus topped out on the style variables although not topping out on content (except the authors grouped conciseness with content).

The reports from Caldwell Report had top ratings on the inclusion of new and important information and on not omitting important information (the latter by a wide margin over all of the other reports). I find this gratifyingly consistent with my obvious long-term intent to provide thorough reports that take as full an advantage as possible of the wealth of information that is embedded in the profiles and the other scores. Williams and Weed mentioned a prior study (Adams and Shore, 1976) in which there was a modest relationship between length of a report and its accuracy rating, with longer reports rated as less accurate overall. This makes obvious sense to me in that the more different things one says and the more specific one’s statements, the more opportunities one has to “go wrong.” I would confess some gratification at being third ranked – and close to second – in overall accuracy despite going out on so many “limbs” where my specificity could easily be rated as not “on target.”

Considering the use of a CBTI as a consultation, I believe that providing new and important information and not leaving out important issues is a crucial contribution of the “consultant.” That one’s clinical impression is confirmed is reassuring as well as strengthening of one’s clinical interventions and forensic presentation. But calling attention to what the client may have consciously avoided confronting or unconsciously led attention away from can be a significant gain for the clinician. A psychiatrist with whom I enjoyed working many years ago (Ulrich Jacobsen, M.D.) spoke of using the MMPI either “to confirm or to alert.” Not omitting important information corresponds, of course, to the alert function. I believe that having confidence that the MMPI has been thoroughly searched for overlooked or avoided issues should be strongly reassuring to the therapist or examiner.

How do computerized reports fit into custody examinations, reports, and testimony?

by Alex B. Caldwell, Ph.D.

The computer-generated MMPI-2 report basically describes the patterns of behavior that are characteristic of those who obtain similar profiles. What reactions, what sensitivities, what internal issues, what external interpersonal conflicts, etc., are likely? This is, of course, actuarial hypothesis generation: it alerts the clinician to what to look for, perhaps what to give weight to even if the examinee minimizes the problem. For example, it may alert the clinician as to what may be presented as a relatively superficial problem that could cover over other more uncomfortable issues.

All such statements are probabilistic even though it is not possible to set universal numerical probability values on each statement. Such values would fluctuate far too much when the MMPI-2 is used with quite different populations. My own solution to this is simply to graduate the overall level with such phrases as, “in most cases,” “in many cases,” “in some cases,” “in a few cases,” etc. An example of “a few cases” would be covered over paranoid trends in a profile that is not usually marked by paranoid thinking, but there are signs that this may be an exception. Thus the clinician is alerted to take note if clinically there are such signs.

Distinguishing the Actuarial Function from the Clinical Function

The actuarial task is to offer relative baselines for various behaviors. The item responses are entered into a complex computer program that is unaffected and unbiased by any information about the issues being considered or by any information gained by the examiner. The output then becomes an array of hypotheses to which the examiner may want to attend. If a particular report statement happens not to fit and one needs to explain the probabilistic nature of “actuarial” to the court, then one might use the following as an example. The actuarial function is like a professional actuary tabulating the driving records of adolescents with versus without driver training. Everyone can recognize that there are exceptions (i.e., wrong predictions) – some with driver training are still poor drivers, and some without are nevertheless good drivers. But the point is that, on the average, one group has a different record from the other, and the size of this predictable difference becomes an element in setting their insurance rates. That an individual prediction in an actuarial MMPI-2 report does not fit does not take anything away from the fit of the other predictions, given that a good preponderance do fit. So far, Meehl’s prediction has proven right: the whole body of research on statistical vs. clinical prediction remains an amazing 100% in favor of statistical/actuarial predictions as equal to or exceeding clinical judgment.

The clinical function is to accumulate all the available information that one can obtain that is relevant to the determination to be made by the trier of fact. An important part of this can be the testing of the hypotheses based on what has been observed with similar MMPI-2 results. The probabilities are hardly 1.0, so to become practically meaningful they must be verified via interviews, observations, other records, etc.
This process is, of course, vulnerable to accusations of bias and selectivity. But as noted above, the actuarial predictions are generated solely from the individual’s item responses and such demographics as age, gender, marital status, or years of education: the computer-generated actuarial characteristics cannot be biased by any clinical information about the person. Thus, whenever the objective predictions are clinically documented to be accurate, they clearly were not originated by observer bias; this strongly supports the objectivity of the examiner. One of my refrains is that to focus on the convergence of the clinical and the actuarial data can enable the most clearly objective and least challengeably biased presentation of one’s opinions and recommendations. A friend recently had an opposing psychologist witness assert that he had no need for nor use of computer-generated reports because “they do not take the person’s circumstances into account.” This is as sensible as saying that a car was poorly designed because it cannot fly, i.e., that it cannot do something for which it was in no way designed. This deceiving dodge was straightforwardly explained to the court.

How are computer-generated reports to be used in the preparation of a custody examination report?

How the information from the computer-generated or actuarial report (CBTI or Computer Based Test Interpretation) is to be integrated into the final report is ambiguous. Specific rules or even recommendations have never been formally specified, and consequently individual practices vary considerably. I am not presently aware of any courts having taken any precedent-setting actions on this issue. I only occasionally see the final clinical reports that have made significant use of my (or other) CBTI reports, so my comments are in part based on feedback from practicing examiners. In my awareness, this use has ranged from paraphrasing, to copying a few words, to entering whole intact paragraphs, or to appending the entire CBTI. I do have a concern that copying extended passages, especially whole paragraphs or more with no recognition of the source, might be found misleading by members of the court. I am aware that some professionals may not want to indicate the CBTI source out of concern that they will be forced to produce the entire CBTI report. There may be statements in it which they do not want to be forced to explain or defend, e.g., a serious diagnosis listed in my “Diagnostic Impressions” section or a diagnostically serious discussion elsewhere in my report (see discussion below). Ideally, this should largely be a false fear as I will discuss, but aggressive cross examining attorneys can find ways to make what should be straightforward become tortuous if not torturous. I believe my responsibility is to provide the most accurate, complete, and useful test analyses that I can. The refinement of the material in my interpretive system has proven a lifelong task.
For many obvious reasons it is not realistic or even possible for me to “police” how my reports get used beyond doing my utmost to be sure that the clients to whom we send reports are appropriately licensed professionals. The policing of abusive uses must be done by state professional ethics agencies, the A.P.A., or the courts in which they appear.

How can one deal with strong clinical and diagnostic statements?

The issue of strong diagnostic statements and possibly formal diagnostic entries with serious implications in CBTIs merits specific comment. In part, their presence in our narrative reports reflects the predominance of clinical cases (often psychiatric inpatients) in the evolution of MMPI and MMPI-2 interpretation as well as being the MMPI’s strongest historical area of application. Much of the original interpretive data came from such settings, although the test has been used with hundreds of thousands of individuals in non-clinical settings. In the custody examination context, for example, such diagnostic CBTI content can best be understood as a possible vulnerability. That is, it is reasonably interpreted as reflecting trends or outside potentials in the person’s makeup. With relatively unelevated profiles, this is essentially the assertion that if more adversities were to befall the person and the person’s life were to go seriously downhill, the categories mentioned would be the most likely “summary labels” as to the direction(s) in which the person’s deteriorating emotional state would evolve and be seen. Assigning a formal diagnosis is not an actuarial function; a formal diagnosis is a clinical opinion based on a hopefully wide range of information. In treatment settings, the actuarial function is to contribute to differential diagnosis by alerting the clinician as to what labels are most commonly associated with psychotherapeutic clients and psychiatric patients who obtain similar patterns on the test (see Caldwell, 1996). It can sincerely be the custody examiner’s straightforward opinion that the multiple requirements for making such a diagnosis are not met in this immediate instance – it could be possible if everything got a lot worse and fell apart for him/her, but such a more extreme state is not now the case.
On this basis it would be quite legitimate to dismiss the possible identified diagnoses as largely or even entirely irrelevant for this person at this point in time. Note that whenever the pattern is within the normal range, my reports explicitly state that fact. The diagnostic statement then almost always starts with such phrasing as, “Among psychotherapy patients . . .” and is followed by a statement that the normal range profile may reflect no more than an essentially normal personality or else a situational adjustment reaction (overall, a large majority of subjects who obtain normal range profiles are indeed functioning individuals). In some cases with atypical or highly defensive profiles, an additional normality-qualifying statement may comment that the profile is within the normal range but more ambiguous than most because of the degree of defensiveness. This latter in part recognizes the fact that a significant minority of psychotherapeutic client profiles (including psychiatric inpatient profiles) are nevertheless within the normal range (owing to denial and defensiveness, milder problems that benefit from working through, lack of self-awareness, etc.).

How does the CBTI connect to the concluding opinion? Damaging one’s credibility through attributions of bias is, of course, not an infrequent effort in adversarial custody examination proceedings. My belief here is that the use of CBTIs as non-case-biased sources of information can be very helpful in anchoring one’s objectivity and credibility. By emphasizing the hypothesis-generating or “alerting” function of the CBTI as to what are likely to be problematic issues for each of the litigants as parents and in relation to each other, the examiner can start from an uninfluenced and objective basis from which to develop recommendations. The available MMPI-2 interpretive data are not readily organized for searching in depth on a codetype-by-codetype basis (beyond textbook summaries). It would take many hours for the clinician to make a thorough search of the data sources for each profile considered, and the clinician’s own search itself might be made to look selective or biased. Once the examiner has pointed out that MMPI-2 interpretation is a very complex undertaking, it becomes quite reasonable to the trier of fact for the examiner to consult an expert who has spent his career working on the task. Using the Caldwell Report Custody Report (the interpersonal implications of the MMPI-2 test results) and the Caldwell Report Narrative Report (the intrapsychic processes of each individual) clearly conforms to the nature of an expert consultation. In summary, I believe the direct discussion of the convergence of the clinical data with the actuarially generated hypotheses can add a strong element of objectivity and logical flow to the process of exploring the particular person’s characteristics as a parent and as a spouse, if not more generally. My impression is that the courts typically find that this objective anchor leads to substantial increments in the credibility of the opinions and recommendations provided.