1.4.3. a. Scott McCloud and Neil Cohn

Multimodality is a crucial component in comics theory, extending towards comics functioning in the digital sphere. Depending on the modes and media used, certain collaborative meanings are made, meanings which are further dependent on how the reader interprets them. As McCloud (1993) wisely points out in Understanding Comics, “an equal partner in crime is the reader” (p. 68).

As such, it is important to delve deeper into multimodal comics theory; often, comics are seen as a kind of language, with its own vocabulary and grammar, and one that is wholly dependent on the reader’s background knowledge and presuppositions. It is here that Gerard Genette’s, Neil Cohn’s, Thierry Groensteen’s and Scott McCloud’s detailed multimodal theories pertaining to comics become important, not only for understanding the comic medium, but also for applying that understanding to the webcomic realm.²⁸

28 The following approaches were chosen as they were seen as most relevant to the question of comics and multimodality in this subchapter.

1.4.3. Multimodality and the Comic Theories of Scott McCloud, Neil Cohn, and Thierry

(2) picture specific, where the image dominates and the text adds little to the imagery, e.g. the only text in a panel is an onomatopoeia.

(3) duo specific, where the image and text convey the same information, thus supporting one another in an obvious way, e.g. the image shows a boy jumping into a puddle, and the text reads: “I jumped into a puddle.”

(4) additive, where the text strengthens the image and/or the image strengthens the text, e.g. there is an image of a woman smiling, and the text in a speech bubble reads: “I am over the moon!”

(5) parallel, where the image and text follow different courses and do not intersect, e.g. there is an image of a field, but the text is a dialogue between businessmen.

(6) montage, where words are incorporated into the picture and not separated by speech bubbles or invisible outlines, e.g. in a panel, a stream of thoughts coming out of a character’s head is presented as a jumbled spiral.

(7) interdependent, where image and text depend on each other, conveying together an idea that neither could convey on its own, e.g. a character says “Let’s go over there!” and the image shows a path into the forest. McCloud calls this combination the most common one, and also notes that interdependent combinations are not always equally balanced.

As can be seen, McCloud singles out the basic semantic relationships between text and image; the taxonomy illuminates how the two modes can interact. Furthermore, it provides a solid basis for further study of the interaction of image and text, aiding, for example, attempts to build upon the theory of comic multimodality, and allowing for a simple understanding of how image and text intertwine to produce meaning. This taxonomy can also be an important basis for analyzing how panels work in webcomics; most webcomics belong to the non-enhanced category, which allows McCloud’s word-image combinations to be applied to many of their forms.

Such a detailed observation can aid in analyzing webcomics both as a literary form and within the context of translation. For instance, one can observe that the panels in Ursula Vernon’s famous webcomic, Digger (2003-2011), are largely interdependent, pointing to a mature use of the comic form. At the same time, plenty of examples can be found of either word specific combinations, where the narration of the main character dominates, or picture specific combinations, where the image dominates. Such descriptions of interplay can help in choosing a proper translation strategy, should the need arise. McCloud (1993) also considers the comic medium at the level of the page; for instance, he discusses the concept of closure, that is “the phenomenon of observing the parts but perceiving the whole” (p. 63), along with the concept of the gutter in comics. However, McCloud does not expand further upon the text-image relationship, and neither does he consider the concept of the sequence on a wider scale. It is at this point that Neil Cohn fills in the metaphorical gaps, analyzing the textual and visual relationships in modern comics across longer panel sequences on the basis of multimodal and language theory.

Cohn (2016) has pointed out that a number of approaches and theories that discuss the relations between text and images are lacking; he specifically notes that many do not approach the visual/textual relationships from a cognitive point of view, or from the point of view of language (p. 304). Cohn (2016) regards McCloud’s taxonomy as a useful foundation for a discussion of text/image relationships, and interestingly observes that McCloud indirectly implies the existence of the panel as a so-called “text-image unit” that interacts meaningfully with other panels, which in turn are also text-image units (p. 305). However, Cohn (2016) does criticize the taxonomy for the fact that it “cannot account for certain contrasts between multimodal interactions” (p. 305). Cohn says that panel sequences vary in multiple ways which McCloud’s taxonomy simply cannot address or characterize, mostly because the taxonomy does not distinguish between narrative visual sequences (p. 305). Using detailed examples, Cohn (2016) proposes the consideration of Visual Narrative Grammar (VNG) when analyzing the relationships on a lower level, especially across panel sequences.

According to Cohn (2016), VNG “argues that sequential image understanding is guided by a narrative structure that assigns categorical roles to panels and orders them using hierarchic constituents, beyond the linear semantic relations between images” (p. 306). As such, VNG allows for the characterization of different multimodal interactions within any work that uses the comic medium, making room for discussion of structure and of the modes used, something Cohn acknowledges at length (2016, pp. 306-307): he breaks down the grammar of both the visual and the verbal, clearly seeing them both as a kind of language. After a detailed analysis, Cohn (2016) comes to the conclusion that there are three main interactions between modalities: autonomous, dominant, and assertive (p. 310). While Cohn makes his theory applicable to different kinds of works and productions, it has clearly been created with the comic medium primarily in mind.

Cohn (2016) defines Autonomous²⁹ as “the most basic expression” when it comes to modality, namely “monomodality” (p. 311). For Cohn, a work, production or piece of literature is autonomous when it features only one modality; so for instance, a novel containing only text is “Verb(al)-Autonomous”, while a children’s book that features only images and no text is “Vis(ual)-Autonomous” (Cohn, 2016, p. 311). In both Cohn (2016) and a paper on multimodality and narrative structure in comics by Cohn, Taylor and Pederson (2017), it is noted that autonomy can certainly be found in comics as well: “Writing in the absence of images would (. . .) be Verb-Autonomous, while a sequence of images without any text would be Vis-Autonomous” (pp. 21-22).

29 Cohn (2016) uses capitalization for his categories; this thesis follows this spelling.

While this is a natural deduction, one could wonder how such an observation could be helpful for the study of multimodal interactions within the comics medium, which is usually made up of both text and image, if not other additional modalities. For instance, webcomics rarely appear without a plethora of other modalities. Cohn (2016) rightly points out that the category is needed because “testing multimodal examples against Autonomous expressions should allow us to assess the semantic contribution of each modality and/or the presence of structure in that modality alone” (p. 311). Here Cohn (2016) presents the advantage of removing or replacing certain modalities when studying multimodal works, so as to better define their roles and impact on the work, be it within the space of a panel or across whole sequences, e.g. a whole comic book (p. 311). Next to corpus study, this is a technique often used by Cohn in his studies pertaining to multimodality and the comic medium.

Figures 7 and 8

Vis-Dominance and Verb-Dominance

Note. Figure 7 on the left is an example of a Vis-Dominant page, and figure 8 on the right is an example of a Verb-Dominant page. Adapted from Unsounded (2010-current), by A. Cope, retrieved June 3, 2021, from http://www.casualvillain.com/Unsounded/comic/ch15/ch15_23.html. Adapted from xkcd (2006-current) by R. Munroe, retrieved June 3, 2021, from https://xkcd.com/2286/.

Cohn’s second category, and the first which actually depicts multimodal interaction, is Dominance, which he describes in the following way: “where a single Modality <visual-graphic> uses a Grammar <narrative> and controls the meaning <Semantic Dominance>, while the other modality <verbal-graphic> plays a supportive role semantically, with no grammatical <syntactic> structures” (p. 311). For this category, Cohn (2016) gives the example of an action sequence taking place over a series of panels, where the only text that appears is supplementary onomatopoeia (p. 311). In Cohn’s specific example, the visual dominates as it carries the narrative, while the verbal takes a back seat. This does not mean that e.g. the onomatopoeia is not important, but rather that it does not play a defining role, and that if one were to delete the onomatopoeia, the reader would still be able to make out what is happening in the panel sequence.

As Cohn (2016) emphasizes, the presented categorizations of multimodal interactions are established in order to help define the basic structure which features “varied surface relations” (p. 311). In Cohn et al. (2017), the difference between Verb-Dominant and Vis-Dominant is explained in the context of comics: “For example, a Verb-Dominant relation would have syntactic text with an individual image, or with a sequence of images with no coherent connecting structure (. . .) In contrast, a Vis-Dominant relation would have a sequence of images with a well-formed narrative structure, but with text lacking syntax – such as onomatopoeia – which do not exist within a sentence context” (p. 22). The authors use their own examples based on almost a century of American comics, but for the purposes of this paper, I would like to refer to examples taken from webcomics (figures 7 and 8).

Figure 9 Co-Dominance

Note. A webcomic strip depicting a Co-Dominant relationship. Adapted from Sinfest, by T. Ishida, 2008, qtd. in N. Cohn, 2016, p. 314.

Cohn (2016) also proposes an interaction based on Co-Dominance, that is, a relationship that “distributes semantic dominance across modalities, yet one (or both) of those modalities still lacks grammatical structure, as in other Dominant interactions” (p. 313). Cohn notes that such a relationship often occurs in single-panel comics or advertisements, where e.g. one modality has a narrative structure and the other does not, but both are needed to properly decode meaning. Interestingly, he uses a webcomic example to demonstrate this relationship (figure 9).

Cohn’s (2016) final relationship is dubbed Assertive, which “is characterized by grammatical structure appearing in both modalities (syntax, narrative) while the meaning remains controlled by only one” (p. 315). Cohn et al. (2017) adeptly summarize the subcategories of Assertive interaction: “Verb-Assertive relations weight meaning more towards the text, though the visuals would still contribute meaning in a narrative sequence (. . .) The same structural relations where the visuals carry more semantic weight would thus be Vis-Assertive (. . .) If meaning is balanced across both grammatical structures, the sequence would be Co-Assertive” (pp. 22-23).

This is one of the more complex categories, and one that could be compared to McCloud’s interdependent imagetext relationship. It is also in this category that Cohn recommends using subtraction or replacement in order to establish what kind of relationship one is truly dealing with.

Figures 10 and 11 Co-Assertion

Note. Figures 10 and 11. A Co-Assertive example of panel sequences. Given that these are the first pages of the webcomic, the narration (in square brackets), dialogue (in speech bubbles) and images work together to make a coherent whole. Adapted from Digger (2003-2011), by U. Vernon, retrieved August 14, 2021, from http://diggercomic.com/comics/2007-02-01-wombat1-gnorf.gif and http://diggercomic.com/comics/2007-02-02-wombat2-purrrple.gif

The Assertive category differs from the Dominant category in that, from a quantitative point of view, there could be an equal amount of text and images in a panel sequence, both featuring a narrative structure, yet one of these modalities could carry more semantic weight; crudely speaking, that modality is more important. Arguably, the Co-Assertive category presents the biggest challenge, as one would have to establish by way of deletion whether a given sequence is truly Co-Assertive, that is, whether all of the featured modalities are needed to create a whole meaning. An example of Co-Assertion would be the first two pages of Digger (2003-2011), by Ursula Vernon (figures 10 and 11).
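The distinctions drawn above between Autonomous, Dominant and Assertive interactions can be read as a small decision procedure. The following Python sketch is only an illustration of that reading and is not taken from Cohn (2016): the class name, field names and the `classify` function are hypothetical, and the rule that the label prefix always follows the modality carrying the semantic weight is a simplifying assumption made here.

```python
from dataclasses import dataclass
from typing import Literal

Weight = Literal["visual", "verbal", "balanced"]


@dataclass(frozen=True)
class PanelSequence:
    """Hypothetical coding of one panel sequence (field names are illustrative)."""
    has_visual: bool       # images are present
    has_verbal: bool       # text is present
    visual_grammar: bool   # the images form a narrative structure
    verbal_grammar: bool   # the text has syntactic structure
    weight: Weight         # which modality carries the meaning


def classify(seq: PanelSequence) -> str:
    """Return a label in the style of Cohn's categories (simplified sketch)."""
    if not (seq.has_visual or seq.has_verbal):
        raise ValueError("a sequence needs at least one modality")

    # Autonomous: only one modality is present at all.
    if seq.has_visual != seq.has_verbal:
        return "Vis-Autonomous" if seq.has_visual else "Verb-Autonomous"

    # Assumption: the prefix simply follows the semantic weight.
    prefix = {"visual": "Vis", "verbal": "Verb", "balanced": "Co"}[seq.weight]

    # Assertive: grammatical structure appears in both modalities.
    if seq.visual_grammar and seq.verbal_grammar:
        return f"{prefix}-Assertive"

    # Dominant: at least one modality lacks grammatical structure.
    return f"{prefix}-Dominant"


# An action sequence whose only text is onomatopoeia.
action = PanelSequence(True, True, visual_grammar=True,
                       verbal_grammar=False, weight="visual")
print(classify(action))
```

In practice the hard part is deciding the value of `weight`, which, as noted above, requires deleting or replacing one modality and checking what meaning survives; the sketch takes that judgment as an input rather than attempting to automate it.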

Furthermore, Cohn (2013a) also provides categories for content that can appear in the panels themselves; this is so-called active information, which is information that actively contributes to the narrative. Here, Cohn (2013a) distinguishes between:

• Macro panels, which depict a number of characters that interact with each other,

• Mono panels, which depict only singular characters,

• Micro panels, which depict “parts” of a character e.g. an eye, a leg etc.,

• Amorphic panels, which depict environments, places with no active characters. (p. 10)

Cohn draws attention to this aspect because of the attentional framing (Cohn et al., 2017, p. 23) that often accompanies these panel types. Attentional framing often has a strong impact on the narrative structure of a comic, and requires analysis on a broader level if the comic is to be, for example, translated. For instance, if a comic depicts a series of micro panels which later reveal an entity or a character, it is up to the translator to look at the comic from a broader perspective, and to produce a translation that adheres to the narrative structure and meaning of the panel sequence.

In recent years, Cohn (2020; 2021) has continued to discuss comics and their place in current academic discourse, as well as how one can interpret the various multimodal relationships that occur within the medium. Cohn (2020) continues to place importance on VNG, as well as on specific processing theories that aid in the multimodal reading of sequential imagery. One of these processing theories is the Parallel-Interfacing Narrative-Semantics Model, abbreviated to the PINS Model (p. 353). As Cohn (2020) explains, the PINS model involves the processing of two levels of representation, semantic structure and narrative structure, and features three key mechanisms: access, prediction, and updating (p. 353). The model illustrates how reader comprehension takes place when it comes to “sequential narrative images”, which, while not explicitly identified as comics, certainly take comics as their core examples (Cohn, 2020, p. 353). Simply put, the reader can construct a sequential vision of what is taking place by connecting the narrative and semantic cues that are featured, e.g. a punch should be followed by a hit, or a trip should be followed by a fall.

Cohn (2020) describes these as situation models; of course, semantic and narrative expectations can be violated, which in turn contributes to, for example, creating humor or forcing the reader to revise their interpretation (p. 372).³⁰ It is important to note that Cohn (2021) takes into account that every reader is different, and that not every reader will interpret visual narration in the same way (pp. 1-4). In fact, he calls out the widespread assumption that image sequences are universally understood and gives this prevailing belief a name and an acronym: the Sequential Image Transparency Assumption (SITA) (Cohn, 2021, p. 3).³¹ While Cohn does indeed scrutinize and sub-categorize the intricate structures of sequential images, he does not isolate each level of meaning-making; rather, he ultimately highlights how it all comes together to create coherent visual narratives, thus speaking for the complex multimodality of comics.

As such, it is undeniable that Cohn has made valuable contributions to comics theory. His detailed focus on multimodal relationships has shed light on how modalities can work with each other to create meaning; he uses corpus studies to conduct extensive analyses (Cohn et al., 2017) and to study the shifts that multimodal works, such as comics, can undergo. For instance, Cohn et al. (2017) confirm through their corpus studies that American comics have undergone decompression, which is a shift in narrative style; namely, comics have become less compressed and spread out across a higher number of panels, and have also begun to rely more on images than on text to carry the narrative (p. 31). Yet perhaps most importantly, Cohn (2013a; 2013b; 2013c; 2016; 2018; 2020; 2021) sees comics as a kind of language, one which is subject to a modified linguistic theory. As Cohn (2016) points out, “verbal language uses three primary components: A modality (phonology), meaning (conceptual structure), and grammar (syntactic structure)” (p. 307). He then cites Jackendoff’s theory of parallel architecture, which “argues for an equal contribution of each of those structures” (Cohn, 2016, p. 308).

Cohn applies the notion of parallel architecture to comics (2013a; 2013b; 2016; 2020), noting that visual narratives have a triad of similar elements, namely: a modality, which is the graphic structure; meaning, which is the conceptual structure; and a grammar, which is the narrative structure (Cohn, 2016, p. 308). Such a point of view allows for the consideration of comics as a visual language, made up of components that work together to create a coherent whole; indirectly, Cohn’s approach shuts down any potential misunderstanding that comics are simply a number of modalities stitched together, modalities that can be easily changed or taken away. The application of Jackendoff’s theory is also beneficial in its equality; naturally, a structure can be predominant in part of a multimodal work (e.g. Picture-specific / Vis-Dominant panels), but under the notion of parallel architecture each structure (and any sub-structures) can be considered equally. Thus, according to parallel architecture, “the structure of visual languages used in comics does not cascade hierarchically down from a page to a panel to component parts in a singularly divisible structure. Rather, independent components mutually interact to form the perception of a holistic experience” (Cohn, 2018, par. 1.1). Thus, Cohn recognizes that as multimodal works, comics have a number of components working in unison. This kind of perspective changes the way comics are looked at, and impacts how they can be seen within certain fields of study, such as pedagogy or translation.

30 Further details can be found in the cited article; Cohn (2020) goes into minute detail about the PINS model.

31 Cohn goes into great detail about readers of visual narratives/sequential images and Visual Language Theory (VLT) in his book, Who Understands Comics? (2021). Among other things, he takes into account social-cultural factors, as well as neurodivergent readers.

Many of Cohn’s academic contributions (2013a; 2013b; 2013c; 2016; 2018; 2020; 2021) are dedicated to the meticulous dissection of the structure of comics, multimodality and visual language; Cohn often branches out into the realm of phonology, phonetics, syntax, semantics, culture, and compositional theory. However, one could write an entire dissertation pertaining to Cohn’s academic work alone; therefore, for the purposes of this paper, the prime focus will be on Cohn’s aforementioned contributions, with occasional reference to his other papers on comics and multimodality. Cohn’s most valuable contribution within the context of this thesis is the fact that he sees comics as a language, and brings to light the intricate patterns that govern the cooperation of modes, even choosing to focus on the content contained within sequences if it leads to better understanding. Such a framework, much like McCloud’s, is an important aid when studying comics for a particular purpose, in this case translation, which will be further described and applied in the following chapters. It is also important to mention that while McCloud’s basic framework seems to pale in comparison to Cohn’s, McCloud’s taxonomy was one of the first, and it allowed the question of multimodality in comics to develop; furthermore, as can be seen, it served as a foundation not only for Cohn’s framework, but also for Groensteen’s comics theory.

1.4.3.b. Thierry Groensteen. Thierry Groensteen is a prolific French comics scholar who

From the document: The Webcomic Dimension for Our Millennial Space: Translation Queries in the Context of Contemporary Theoretical Investigation (pp. 48-56)