ISSN 1466-4615


Laszlo Tarnay

The Rear Window of Essentialism



Noel Carroll

_Theorizing the Moving Image_

Cambridge: Cambridge University Press, 1996

ISBN 0-521-46049-2 Hardback

ISBN 0-521-46607-5 Paperback

426 pages


This thick book by one of today's leading film theorists brings together 28 essays on various topics, the common thread being that they all betray, propose, or vehemently argue for a kind of cognitivist approach to producing, experiencing and theorizing films, or, as Carroll likes to stress, the moving image. The essays are grouped into seven chapters, avowedly on a rather subjective, almost aleatoric, basis, each gravitating around an intriguing, sometimes haunting, problematic such as defining cinema, editing, the use of visual metaphors, non-fiction, ideology and rhetoric, or the history of film theory. There are added polemical writings scattered through the chapters and a final chapter reproducing Carroll's earliest attempts at answering some or all of the problems mentioned. The loose structure may make it difficult for the reader to follow the cross-references, but the method applied, 'piecemeal theorizing', is meant to be a just compensation for the difficulties. If the chapter-editing knife may have produced embarrassing flashback and flashforward cuts, the hand which weaves each single essay, even the last ones, succeeds in creating a transparent, yet enduring and tightly-knit tissue.


Piecemeal theorizing does not result in missing links or in an incomplete mosaic, but (i) in a clear stance and (ii) in a fully-fledged argument against all medium-based, essentialist and ideology-centred functionalist approaches. Carroll uses so many disclaimers that he seems to have displaced any potential charge of being one-sided, discriminating or exclusive. On the other hand, he touches on so many aspects of the moving image that there remains hardly any empty slot for some new idea, unless it were polemical, and hence inevitably more exclusive and essentialist than Carroll's. But, as is common in any theorizing, what is a merit from one perspective may appear as a drawback from another. The very effort at inclusiveness may result in a loss of explanatory value. This is rarely the case with Carroll, however: he provides us with brilliant case analyses, for example, of sight gags, editing or film metaphors. The case is rather the opposite: inclusiveness does not merely imply extension, it goes with the aim of generality, hence the need for intension.


But intensional inclusiveness is bought at the expense of simplifying the issue, in the sense that Carroll uses or takes over concepts from other disciplines which may explain his data, but which are in turn far from clear, or are too simply postulated. The use of the concepts of relevance and salience is a neat example. Although they may appear very commonsensical, they were once introduced as unexplained primitives in cognitive rationality or cognitive linguistics in order to cover up the failures of those theories (cf. the adage in the relevance theory of Sperber/Wilson that every utterance comes with the guarantee of relevance, or the idea proposed by T. C. Schelling or M. Gilbert that the choice of a given strategy, for example, of crossing one bridge among a series, is salient for no theory-specific reason but because, say, there is a bird sitting on it). And they function the same way in Carroll's theorizing. So much so that together with other terms such as 'appropriate' (cf. his asking what makes certain devices appropriate -- instead of essential or defining -- to the movies) or 'larger context', they may easily reiterate the problem of essentialism: no proper treatment of movies can dispense with them, let alone the fact that they call for *essential* clarification. Methodological essentialism, indeed, but essentialism anyway. I am not saying this must necessarily be wrong, provided that the basic ideas are clarified. While Carroll is against unified theory, he is out on the field with a unified method. Yet 'unified' here amounts to the use of loose -- cognitivist -- language, rather than building on firm ground.


When arguing against defining a medium by means of specificity, Carroll says: 'it is the use we find for the medium that determines what aspect of the medium deserves our attention. The medium is open to our purposes; the medium does not use us to its own agenda.' But in order for this argument to go through, Carroll will have to identify a medium across different uses. And how else can it be done if not by means of specificity? It is the much-debated equivalence or equivocation between meaning and use in post-Wittgensteinian philosophy of language. Are spoken and written poetry different uses of the same medium? If yes, then Carroll reduces the musicality of poetry to an exploitable trait, and he has to state what can be so common to a minstrel's song and a Walt Whitman poem that they are ranked together. If not, then he will have to answer why a film dominated by long shots should not be considered a different medium from a film based on editing. Moreover, how could he then treat video image and film as the same medium? Either he can only speak of uses of indefinables, call them media, or he is in need of definable criteria. Since Carroll is against hunting for distinctive features, he must vote for the first alternative: but how are we to understand his use of 'medium', something waiting to be exploited? How can it be identified? A possible escape route could be prototype theory, which he does not mention, and rightly so, for it still makes a demand for basic features.


He says instead: 'Traits are only significant vis-a-vis uses.' But what is the difference between trait (meaning) and use? Is the *trompe-l'oeil* technique a trait or a use? Carroll would have it that pictorial depth is a trait (aspect) of the medium. But to see depth on a flat surface is rather an aspect of our sight: a painter using that technique does not exploit conflicting aspects of the medium, rather he draws upon the imperfection of our sight; pictorial depth does not reside in the medium; if it does reside in anything, then it is the representation that the medium is a vehicle of. There is no problem saying that new uses are discovered or re-invented, but Carroll wants us to believe that there may be a gap, or even a contradiction, between what a medium does best and the effects it excels in. Yet he does not tell us how to define excellence. Does it mean achieving a given effect or subserving a given purpose? If the first, his tool analogy does not work, for even a tool designed for a special purpose may be re-adapted evolutionarily to a new use; I may use a hammer for bending the branch of a pear tree to reach the fruit. Will it mean that the hammer as a medium excels in collecting fruits? If it is the purpose that determines excellence, then we are on the right track toward piecemeal theorizing. In a footnote Carroll admits that he takes excellence of the arts 'in the terms of one thing that they do compared to other things that they do'. But how should comparison be made here? So, following again the tool analogy: does a hammer excel in hitting nails more than in reaching for fruits? If yes, will our answer not rely on some inherent traits of the hammer? Else, how can we compare two purposes, two lifestyles, which are like artforms, as Carroll wishes, without recourse to some system of values? But what are these values, if we renounce essentialism?


Carroll argues against re-presentationalists (another kind of essentialism) by distinguishing three things: physical portrayal, depicting (kind), and nominal portrayal (proxy). The distinction cuts deep, but he goes too far. For to say that 'the relevant types of representation we observe in photography and cinema are not a *function* of the ontology of the photographic image but the purposes we have found respectively for still and moving photography' (emphasis mine) is to suggest two things: (i) that _Citizen Kane_ could represent nominally *without* the physical portrayal of Orson Welles (or any other actor) and (ii) that the fictional world we construct on the basis of the proxy offered by the film is *not* dependent on physical representation. Both may be turned against Carroll's own belief in exploiting possible traits of the medium. The new uses of the media adapted to cultural purposes in this case are a function of the old -- representational -- ones. It is not like the hammer case, where the use for reaching fruits is indeed not dependent on the use of hammering nails, but is dependent on the trait of, say, having a handle necessary for hammering. Are not medium and genre confounded here?


Thus Carroll comes to the definition of moving images. If an artform is not equivalent to medium-specific traits and it may involve more than one medium, we need an independent criterion to define artform in advance. That sometimes style or use determines medium does not imply that the structure of the medium never determines style. Before giving his own definition Carroll tries to refute the photographic realist by revealing a slippery slope in her argument of transparent presentation. But his refutation suffers from a couple of mistakes. Even if films may involve other media (digitally synthesised images) than the photographic image, it is no argument against *photographic* realism as such that it can directly present its object. Similarly, the fact that Braque was unaware of a squirrel in his painting, or the existence of aspect change (the duck/rabbit case), does not tell against direct representation in *another* medium, unless it is presupposed that both media share the property of indirect representation. But that would amount to a *petitio principii*. That two things share a property does not mean that they are of the same type.


Here we find Carroll playing out one medium's characteristics against another's, the reason being that an artform may involve several different media. This may well be true, but it cannot be used for arguing that there is a common basis, namely modelling, for comparing and transferring their properties. 'Seeing-As' is a property of pictures or shapes which can only be transferred onto photographic images if we take the latter as models. To use Carroll's example: the fact that we can mistake an image of a garage roof for the roof of the house does not prove that photos are not direct representations any more than the fact that we can perceptually mistake one object for another. I do not claim that the photographic realist is necessarily right, but only that in order to refute her we need to draw upon the idea of depicting or nominal representation. And the fact that photographic images are *detached displays* does not tell against either of the two conditions Carroll imposes on transparent presentation. It is also highly dubious that, looking at a star through a telescope, one can really orientate oneself in *the* space of the observed object, while on the other hand looking at a monitor screen (which digitally synthesises images) set up in a building or shop may well contribute to spatial orientation. Thus it is not the distinction between prosthetic devices and synthesised images on which spatial discontinuity or homospatiality can be based. It is the function or the use of the device that determines if it is the transparent presentation or the indirect re-presentation of an object. Consider, for example, the case of stereoscopes: how can we know by simply looking into one that the picture we see is a detached display or an aid to spatial orientation?


I also doubt that the use of still images in a film is a stylistic effect *because* it is technically possible for the image to move; for that is to confuse the movement of images with the representation of motion. In fact, seeing a film consisting of still images or freeze frames is *eo ipso* seeing moving images: it is made possible by means of the projection of images. But it does not *represent* motion. The stylistic effect lies in its *relation* to other projected images in a sequence. It is how we cognize that relation, not that we expect it to move in some future instant. Carroll is confusing a property of the medium (i.e., moving in the sense of being projected) with an element of the representation (i.e., motion). The reason may be his anti-semiotic stance, in which he does not distinguish between the sign vehicle and its possible *representata*.


Carroll cites five necessary conditions on what he terms the moving image, which I do not reproduce here, for he admits that 'we don't derive any deep insights into the effects of movies or into film style by contemplating these five conditions'. It is because of the inter-animation of the arts with respect to their media. And it is what compels one to piecemeal theorizing. It also shows where the attempt at inclusiveness leads: the emptying out of meaning. By focusing on media, we have learned very little about what an artform consists in. But what is the rationale for establishing them? I guess that Carroll takes on all that burden because he wants to fight classical theorists. I do not deny that his five conditions are sensible and I do not claim that classical theorists are right in sticking to their sort of essentialism. But I think that the argumentation by which Carroll arrives at his conditions suffers from major faults. Hence I conclude he did not succeed in defeating realism in the sense of doing away with it completely. He did not prove that the cinematic image *cannot* be realistic.


But let us leave it at that, since Carroll certainly takes the identity question as a structuralist fallacy when he blames the medium specificity theorist for equating medium purity with aesthetic quality, and in contrast he votes for the inter-animation of the arts. And piecemeal theorizing is well adapted to such a situation: when all theories are displaced, it is still open to conduct critical analysis of cases which show only family resemblance, hence necessitating a constantly changing viewpoint: what one film excels in, for example, deep focus or long takes, may be irrelevant in another with an intricate plot. With the question of genre put aside for the moment (it reverberates throughout the essays as the question of norm) the common core of Carroll's approach is spotlighted. And it is a notoriously cognitivist core.


The cognitive bent is particularly noticeable in 'The Power of the Movies' where Carroll convincingly argues that films 'are immediately accessible to *untutored* audiences in every corner of the world', the reason being that object recognition and picture recognition develop in tandem. Pictorial representations refer by means of displaying certain resemblances to their referents, and to recognize such resemblances is a biologically wired ability. Thus movies, although cultural products (inventions), need not be conventional and do not require decoding as Metzian semiotically inclined scholars would have it. Although recent psychological evidence seems essentially to support Carroll's claim, it is more proper to say that children learn to recognize similarities first, rather than objects, and it is the similarities between external relations of objects and their internal representations that help to recognize pictures (cf. Edelman), not the fact that people have actually encountered those objects. Else it would be either circular, or contradictory, for them to recognize pictures of things they have never met in their lives, as Carroll in fact wants them to. Merely assuming a similarity between the picture and the pictorially represented object is not enough. It is the modelling of such external similarities, rather than of objects, that helps us to recognize the pictorial representation of a Martian today or of devils in the Middle Ages. Whether such recognition does not require training I do not want to seriously dispute here, but to say, as Carroll does, that it does not involve inferencing is highly questionable. Here, I think, Carroll lumps together the semiotical and computational theories. Is he so sure that a pygmy who has never seen planes would recognize one if shown a picture of it? Yes, he might recognize it *as* a bird or bird-like thing, but not as the object he would recognize it to be, were he familiar with the function and the construction of planes. And would he not have to *learn* that, as he learnt that flying things are birds? The capacity of recognising pictures in general may well be inbuilt, but to recognize particular pictures may be a developmental result.


But the basic problem lies in Carroll's so-called erotetic approach invoking question-answer chains. To conceive of filmic narrative as a process of answering micro- and macro-questions recalls the Rumelhartian model from the seventies concerning the narrative structure of folk tales. It is so easy to say that the implicitly or explicitly posed questions in cognising filmic narratives determine what is *relevant* to the spectator, while his eyes are controlled and confined by framing and camera movements to what should be visually *salient* to him. The two conditions, relevance and salience, however, must be co-ordinated! But how does such a co-ordination happen? Carroll owes us an explanation of how processing narratives and processing moving images are internally or externally related. Of course, if we focus on how our perceptual apparatus works in normal conditions, we could simply state an equivalence: for example, what is foregrounded is always more salient than the background, since we are evolutionarily wired to react to an attacking enemy rather than to the faraway hills. Analogously we are more eager to know the identity of the murderer than, say, how he spent a day five years before the supposed murder. But it may well be the opposite in many situations, and not only in the arts, as when we are enjoying or scrutinising the view from the hilltop, or if that day five years ago is *taken to be relevant* to the identification of the murderer. Salience and relevance do not explain anything but are presupposed! When contemplating Bruegel's painting _The Fall of Icarus_ we have to know the myth in advance, *before* focusing our attention: it is not simply visually salient.


Of course Carroll would say that these are deviations from, or subversions of, the norm, rather than *sui generis* cases. But this would only show that he takes the classical Hollywood movie as a prototype because it comes closest to what he takes to be the norm of real perception or cognition. That may well account for the power of the movies over the untutored audience, or the suspense in certain films. Yet why should the question: 'Will the parents really pay for the kidnapped child?' be an unexpected alternative to Carroll's 'Will the child be saved or not?' The classical conflict of action films of this kind is generally between desperate parents willing to pay almost any sum and the 'sober' police willing to withhold the money from the kidnappers. I am not arguing particularly against the application of question-answer chains to the processing of narratives, but raising the question whether underground movies differ only in our background knowledge and narrative expectations rather than in the *mechanism* of understanding. If it is so, then Carroll's emphasis on his delineating a *genre* but not the essential characteristics of films is misplaced -- since by delineating a genre he in fact outlines the basic general mechanism of understanding and producing films, which we may dub 'cognitive essentialism'. Or else: underground movies are not simple deviations from the norm, but they rely on a different mechanism in which our normal perceptual apparatus meets its Waterloo: salience is no longer determined by, or co-ordinated with, pre-existing narrative -- linear -- structure. To understand such a lack of co-ordination requires a cognitive process that need not coincide with the evolutionarily stable rational cognitive strategies. It could be a problem of survival for underground art, but it also points at a crack in the hidden -- normative-generic -- essentialism in Carroll's approach.


Carroll's cognitivism is particularly strong in explaining POV shots. He continues to rely on the analogy with ordinary perception: POV shots are ways to track a glance to its target. He rightly points out, with reference to Davidson's principle of charity, that to understand a shot as one from somebody's point of view does not imply identification with her, just as understanding an argument is not accepting it. One may simply entertain another, or even a contradictory, one. But then seeing through somebody's eyes must entail seeing from one's own, even if it is through one's mind's eye rather than through perception. I may see how a murderer sees, and still see differently. But what if there is no 'other' way to see the situation? I mean, the image is so strongly one-sided that it obliterates somehow all other viewing possibilities. Since Carroll equivocates between the informative and the communicative intentions (cf. Sperber/Wilson's approach) he cannot distinguish between ordinary information-gathering -- the fact that edited shots are supposed to communicate by relying on the spectator's information-gathering -- and the blending of the two. The sequence of a glance shot and an object shot may well represent tracking information, but it may also communicate that there is no other way in principle to see the situation. In visual arts sometimes sight may be exploited as the only source for knowledge, although the two -- as Metz pointed out -- never coincide. Once again Carroll may retort that it is a deviation rather than the norm. When I am made to see through a murderer's eye I am informed of what he knows; but when I am made to see through, for example, the child's eye in Rossellini's _Germania anno zero_, it is communicated to me that, in war, one sooner or later cannot but steal. I am made to identify with his point of view. POV shots utilise the ambivalence between informative intention and communicative intention (cf. Sperber/Wilson): if I read somebody's hidden diary I am informed of her feelings, but if I find it on the kitchen table exposed to my sight the same feelings may be taken to be communicated to me by her. Certainly I may disagree with her, but it is not open to me to disagree with (the point of view of) the child in Rossellini's film because his (eye)sight is the only way we can get a glimpse into the fictional world. Any other opinion would be a gross violation, hence a misunderstanding of the filmmaker's vision.


Carroll's approach to sight gags and to metaphoric images plays out the idea of incongruity in interpretation. The humorous or metaphoric effect is caused by switching from one interpretation of an object, event or gesture to another. I think he again oversimplifies the issue when he cites two reference points -- visual and verbal incongruities: the duck/rabbit drawing, and the punchline of jokes (although many jokes are based on *double entendre*, where both interpretations develop in tandem without the incongruity). On the one hand aspect changes are different from either humour or metaphor in the sense that you cannot see both figures at the same time, while one can see that Chaplin is eating shoelaces as if they were spaghetti. On the other hand metaphors -- at least under some interpretations -- need not require that the other meaning be present. The cognitivist bent leads Carroll to say that verbal images are a pervasive means of cinematic communication when the images are such that they evoke certain strings of words -- that is, they 'translate' verbal metaphors. But why should the 'extended meanings', as Carroll puts it, exist already in language? They may well be describable linguistically, but I doubt that the cine-similes of vagina and window by Brakhage, or the juxtaposition of lowering cannons and the rack of soldiers by Eisenstein, derive their force from some pre-existing 'string of words'. The 'selective affinities' of the compared terms, their common conceptual basis (that 'arms industry figuratively crushes the life out of the common man'), have no verbal metaphoric equivalent. Why call them *verbal* images? Rather they derive their force from the blending of conceptual spaces (the idea is Turner and Fauconnier's): if there is any similarity with language here, then it is that they convey a metaphorical shift to resolve the incongruity. That the incongruity is describable linguistically, as, for example, 'gossips are like hens', does not make them 'linguistic' or verbal. (Note that in contrast you cannot say 'ducks are like rabbits': there is no conceptual incongruity here; if there is one, it is perceptual.) Thus, by highlighting the inferential process in interpreting images, Carroll is unjustly trafficking between the visual and the narrative area, as he does in the cases of salience and relevance.


As for visual images where non-compossible parts are presented as constituting the same entity, Carroll fares much better in delineating the category of *core filmic metaphor*. He points out that such metaphors need to have a heuristic value which consists in exploiting relations between the source and the target domains. Yet with a sudden twist he adds that filmic metaphors are more like linguistic metaphors in that they both carry the suggestion of an identity relation. But the case is rather the opposite: if there is any analogy between filmic and linguistic metaphors it must be the inapplicability of the identity relation. For Carroll homospatiality is like the syntactic use of the copula in language. But when I say 'John is a pig' I do not draw upon any identity other than the purely syntactic unity. It is as if Carroll were playing into the hands of Metzian theorists by implicitly pointing at the possibility of visual syntax. And even paradoxically so, since it is one of his criteria that the incongruity in filmic metaphors cannot be interpreted narratively.


Carroll's methodological formalism is strongest in his approach to cinematic 'ampliation' of impossible causation and polyvalent montage ('Ampliation is the creating or establishing of a movement onto the second object of the already existing movement of the first object'). It reveals more than anything else the force of editing. That we can see causation is another effect of filmic editing. That we see impossible causation 'results as a particular quality of *causal* agency in the cutting'. I would add that the first is *representational*, while the second is *ideological-rhetorical*. Since causation implies fused movement, to speak of impossible causation as a means of ampliation may be confusing. For it may represent a causation internal to the fictional world, or it may reiterate an external 'impossible viewpoint' kind of ideological commentary. I call Carroll's approach formalist because he does not distinguish between the two uses of editing, but rather points to their -- technical -- common core: fusing. He even lists Eisenstein's formal categories of conflict under this rubric. Once again essentialism looms large. Although he adds that ampliation finally depends on the larger context of the film, we do not learn much about what that context is.


To sum up: Carroll may be seen as a sceptical cognitivist in film theory, since he is not very enthusiastic about the analogy between mental processes (memory, focusing attention, etc.) and filmic means (flashbacks, close-ups, etc.) because we know very little about how our mind works. Yet I don't see why certain filmic means cannot *exploit* mental processes, despite their structural differences, so I find his argument against functionalism wanting. He points out that the world of fiction is not discontinuous with reality, while mental processes connect us to outer space, time, causality -- that is, practical ends or action. Yes, but I again don't see how these two arguments refute the film/mind analogy, even if they do displace Munsterberg's theory of film as an artform. In fact his one-sided cognitive idea of inferencing by means of question-answer chains itself exemplifies mental processing as the backward-forward projecting of film. This then reverses the analogy: just because we know more of films and inferences, the working of the mind may be understood in terms of close-ups, forms of editing, etc., rather than the opposite. Carroll's piecemeal theorizing ends up in methodological essentialism or formalism not so much because he applies a cognitivist core to the moving image but because he does it by obscuring the differences between the following: medium and artform, code and inference, informative and communicative intention, the visual and the narrative, relevance and salience, signifier and signified, representation and ideology or rhetoric. (For example, he unduly deprives the code model of its force, although there is no inferencing without a code, yet the function of the code is played by automatic visual perception, which is by no means so automatic, for it is itself based on learning, hence both decoding and inferring.) To keep up some or all of these differences, however, would inevitably lead to messing with theory, something that Carroll clearly wants to avoid in his theorizing.


Janus Pannonius University

Pecs, Hungary





Edelman, S. [1997] 'Representation is Representation of Similarities', a draft version circulated as a BBS target article.


Gilbert, M. [1989] 'Rationality and Salience', _Philosophical Studies_, 57: 61-77.


Gilbert, M. [1988] 'Rationality, Coordination, and Convention', _Synthese_, 84: 1-21.


Sperber, D. and D. Wilson [1986] _Relevance_. Oxford: Basil Blackwell.


Schelling, T. C. [1960] _The Strategy of Conflict_. Oxford: Oxford UP.


Turner, M. and G. Fauconnier [1995] 'Conceptual Integration and Formal Expression', _Metaphor and Symbolic Activity_, 10 (3): 183-204.







The Rear Window of Essentialism

_Film-Philosophy_, vol. 1 no. 6, September 1997



Copyright © _Film-Philosophy_ 1997








