Anticipatory Phonetic Strategies for Simultaneous and Consecutive Interpreting (APVV-15-0307)

Model of anticipatory prosodic strategies

Štefan Franko, Roman Gajdoš, Slávka Janigová, Katarína Kukučková,
Mária Paľová, Rudolph Sock, Mariana Zeleňáková

Faculty of Arts, Pavol Jozef Šafárik University in Košice

Understanding is the basic competence through which progress is achieved at every level of foreign language proficiency. Under the influence of cognitive psychology, language teaching has begun to draw more on information processing theory, schema theory, and the strategic dimension. Several models have thus emerged that treat receptive competences as a prerequisite for developing productive competences. The individual's ability to apply their own effective strategies to resolve difficulties in fulfilling a specific task comes to the forefront.

Further research into receptive competences focused on the role of context, the influence of automatic speech mechanisms, and the relationship between the cognitive and the affective aspects of language use.


Speech understanding models are process-focused models that seek to explain how global speech understanding is formed. Their defining feature is that they allow generalization: a model must be applicable to a new set of data with the same properties, which tests its capacity and validity. At the turn of the century, cognitive research into understanding focused on attention in auditory decoding, on memory processes, and on the characteristics of listeners themselves and the role that their previous knowledge plays in the process of understanding (Witkin, 1990). Thanks to this research, we now know that to decode speech the listener does not have to focus on each of its sounds, and that through training they can influence transfers between short-term and long-term memory and guide the retention and reconstruction of what has been understood.

Out of the many existing models of speech understanding, we list only those that inspired us when developing the model of anticipatory prosodic strategies. The simple Master Model (Mills, 1974) draws attention to the mental decision to listen, to respond actively to information, to maintain attention, to overcome obstacles, and to retain the information in memory. This model was subsequently developed by Lundsten (1979), who tried to describe how speech acquires meaning. The listener must first recognize the sounds, group them together, and determine the boundaries of the units. These units are retained in memory in the form of mental images, which the listener then evaluates, compares, and verifies against previous experience. If the mental image of the information is consistent with previous experience, the group of sounds is assigned meaning. Otherwise, the listener returns to the perceptual memory trace and re-examines the original auditory data, reinterpreting them and assigning meaning to them. We realize that this one-way model no longer corresponds to current descriptions of the processes of understanding, but it describes well the alternation between "sliding on the surface" and "true understanding" reported by professional interpreters.

From among the models of the late 1970s and early 1980s, we would like to mention two (Clark and Clark, 1977; Goss, 1982), which limited the role of contextual influences (environment, participants, theme, etc.) in the study. Research into anticipatory prosodic strategies likewise focuses on understanding an isolated sentence. However, all of the sentences examined share a common communicative and situational element (they were delivered by French Members of the European Parliament at a certain time), and thus it cannot be completely ruled out that the professional interpreters drew on elements of the wider context in their perception tests. The first model describes the processing of information in four stages:

1. Short-term memory captures information in the form of a phonological trace.

2. Phonological elements are divided into components that carry meaning and/or function.

3. The components are used to build a list of propositions corresponding to the phonological trace.

4. The propositions replace the phonological trace in short-term memory, and the listener remembers the global meaning of the sentence.

Goss adds the parameter of critical evaluation of the assigned meaning to this model and points out that this process is not merely recurrent but is also interactive, with the authentic response of the listener to oral production being part of their perception competence.

Nagle and Sanders (1986) describe the understanding of speech as the capture of a sound image and its immediate storage in short-term memory, where it is divided into units of meaning depending on the knowledge stored in long-term memory. Some sound images are instantly erased from short-term memory (in most cases these are unknown units), while others trigger multiple assignments of meaning, leading to rapid saturation and fall-out. Automatic processes are then introduced, which stabilize information processing with the help of knowledge in long-term memory. Affective aspects are reflected in increased focus and effort to understand. Control and monitoring processes also take place, through which understanding is evaluated objectively. According to this model, a synthesis is prepared in short-term memory from the extracted information and is verified there before the information is transferred to long-term memory. Syntheses evaluated as unsatisfactory in short-term memory are reprocessed. Understanding is thus a sequence of successful and accepted syntheses. This is a two-way model in which the listener tries to understand in both bottom-up and top-down ways. The model inspired us because the subject of our research is how extracted prosodic information contributes to the body of information forming a satisfactory synthesis, which in turn becomes a prediction of subsequent information (and its synthesis). From this point of view, it is clear that the verification and evaluation of prosodic information is also a two-way process. The model emphasizes controlled processes, which are currently referred to as strategies and thus represent a key concept in our work.

Lhote (1995), a phonetician, developed the so-called landscape model (modèle paysagiste) of understanding oral production. Lhote claims that any oral production is situated in a certain sound environment (voice, rhythm, intonation, tone, silence, etc.) and that each language has its own landscape of sounds, which the listener reconstructs on the basis of their mental images (and their general and specific knowledge). Landscape-like listening is thus similar to observing a landscape, in which the eye of the observer takes in at once a snowy hillside coloured by the setting sun, a village in the valley, and a lone skier. When listening to speech, the capturing function prompts the listener to select some components of the soundscape and focus attention on them. If the search in phonological memory fails, the recognition function works with hypotheses, which it verifies with the help of phonological sequences (sounds, intonation contours and patterns), and it verifies a possible meaning of the discourse with the help of existing knowledge. The balance between the two functions is the decisive factor here. The assignment of meaning, i.e. its rapid availability, results from the simultaneous employment of both functions and leads to a correct or incorrect understanding. This relatively simple model is interesting to us because it describes well the difficulty of understanding in a foreign language, which arises partly from the differences between the two soundscapes, but also from the fact that live speech is accompanied by various peculiarities of the landscape (inappropriate speech rate, hyperprosody, careless articulation, disregard for standard intonation) that delay or limit the recognition function. In interpreting, rapid availability of meaning is crucial for performance. Simultaneous application of the capturing and the recognition functions is thus one of the means of optimizing the interpreter's output.

From among the recognized models of understanding, we would also like to mention the model of controlled activation and deactivation in the recognition of forms and meanings, described for the understanding of written production (Fayol, Mouchon, 1984). The authors state that the speed of understanding (activation and deactivation) is influenced by markers (anaphora, punctuation, connectors), which the reader uses to launch information processing in the correct direction only, thereby avoiding short-term memory overload. Since we know that punctuation is just a graphical form of some intonation patterns, we sought to answer the question of whether some prosodic phenomena perform the function of anaphora or connectors.

For the sake of consistency, we add that descriptions of current models use the terms working memory and existing resources (Stevick, 1993) instead of short-term memory and long-term memory. We now know that the accuracy and rate of understanding depend on several variables, including the psychological and physiological condition of the listener. We also know that existing knowledge varies in nature (explicit, implicit, encyclopaedic, pragmatic, procedural) and that the processing of information is not linear but is the result of a constant interaction between working memory and existing resources.


This brief overview of the models suggests that many factors enter the process of assigning and creating meaning. It is also clear that theoretical constructs cannot be applied to the whole complex of aspects of listening comprehension competence. Obtaining a relevant model thus means focusing research on an area defined as precisely as possible and exploring strategies that can be used to fulfill a specific task successfully. By implication, it also means working out how to improve a particular language skill that contributes to a given competence. The term strategy here refers to a conscious and planned procedure that facilitates the capture, retention, retrieval, and reconstruction of information. It is therefore necessary to work with direct experience and observe listeners' strategies "live". In his model, Goss (1982) emphasizes that automated processes predominate when the listener tries to retain a certain flow of information in memory, while consciously controlled processes, i.e. strategies, are used more frequently in an effort to remember certain parts of that flow. Goss tried to identify the strategies of the "good listener" (bon auditeur) and came to the following conclusions: good listeners can use the pauses of the speaker, the redundancy of oral production, and prediction of the possible future development of that production. Such prediction (anticipation) is a sequence of creating and verifying hypotheses about possible future meaning, and the closer the area of knowledge is to the listener, the more easily it proceeds.

The cross section of the results of previous research shows that a good listener can:

1. use previous knowledge,

2. deduce meaning,

3. consider the context,

4. use anticipation,

5. use their ability to analyze and assess critically,

6. control their own activity leading to understanding.

While the first four skills relate to cognitive strategies, the latter two represent metacognitive strategies that allow listeners to plan and guide their listening. The above six skills are complemented by mnemonic strategies (supporting techniques for storing and retrieving information, especially keywords and the grouping of information elements), compensatory strategies (logical inference that overcomes knowledge gaps), and affective strategies (overcoming unrest or lack of self-confidence). Lund (1991) emphasizes the ability of a good listener to create an acceptably probable context from what they have grasped, and the ability to hold attention and redirect it to possible difficulties of understanding without losing the thread. An experienced listener can assess the extent to which they understand, admit a problem or error, and correct the trajectory of meaning.

Studies pay less attention to how the phonetic component contributes to the understanding of oral production. In addition to the above-mentioned elements, quite a few authors examine the link between speech rate and quality of understanding (number of words per minute, number of syllables per second), the link between short-term memory capacity and quality of understanding (the number of information elements maintained for between 20 and 30 seconds), the listener's ability to use speaker pauses and hesitation pauses, the critical length of the pause, and the listener's segmentation of the speech flow. In particular, research in foreign language didactics has underlined that segmentation is a natural automatic process created by a perceptual habit tied to the listener's first language, and that children are extremely sensitive to changes in intonation, gradually developing their own listening strategies depending on the language, culture, or environment. At the same time, studies are emerging that emphasize the inability of adults to adopt the articulatory and prosodic habits of a foreign language.

In connection with the study of the demarcation function of phonetic phenomena, interest is shifting from segments to suprasegmental features. Vanderplank's work on stress and rhythmic patterning (1993, cited in Cornaire, 1998) sheds more light on research into articulation rate, and Lhote (1995) attempts to determine the stages, from sensory to cognitive, of processing the information being listened to. The ear captures the flow of sound, extracts the relevant acoustic features, and transmits them to memory, where the mental image of their meaning is created by further processing of the original information. The transition from sound to an element of meaning is aided by the cooperation of the cerebral hemispheres: the right one handles melody, intonation, and emotion, while the left one handles language and cognitive components. Cooperation between the hemispheres has a compensatory effect.

These findings stood at the beginning of a series of research papers that looked at the mechanisms for repeatedly recognizing language forms and their meanings, highlighting initial or final syllables, and indicating the boundaries of sentences and syntagmas and the interrelationships among syntagmas. Suprasegmental phenomena come to the fore as elements that streamline the performance of working memory by grouping elements that have already been partially and hierarchically moved to long-term memory (where they become existing resources). Research into intonation and melodic and rhythmic variation as cues for understanding meaning begins at this point. Claims appear that prosodic analysis and phonetic exercises develop listening comprehension competence.

Speech understanding models have made us realize how complex and multifactorial the phenomenon under study is, and in particular how difficult it is to verify the validity of theoretical constructs. It is therefore necessary to continue basic research on large data sets relating to reception, perception, and understanding.


Anticipation is a concept whose significance, especially for simultaneous interpreting, is recognized by most interpreters, who characterize it as a series of anticipatory moments that allow them to interpret correctly and completely while maintaining control over their own production. The form captured by the ear of the listener or the interpreter not only has a morphology but also contains the dynamics of its origin: it materializes the first layer of the possibility of attaching meaning and is thus a manifestation of praxological anticipation. The gradual emergence of forms is a continuous process of microgenesis, an ongoing modulation according to the thematic field, which does not determine where the process of stabilization, itself relative and temporary, is to end. Such a continuous and repeatedly unfinished process of the emergence of forms makes it possible to understand that any form that arises represents a meaning (Paľová, Zeleňáková, 2019). Existing research into the relationship between anticipation and understanding has focused mainly on higher levels of language (morphology, syntagmatic and textual syntax).

The study of anticipatory phonetic strategies focuses on anticipatory phonetic phenomena as basic elements of microgenesis. Anticipatory speech strategies range from the micro level, i.e. purely phonetic segments, up to the macro level of words, sentences, and the entire speech production. At the micro level, anticipatory articulatory gestures are understood as the expansion or extension of these gestures to adjacent phonetic segments. At the macro level, the extent of anticipatory phenomena exceeds segments and syllables. Segments and syllables are characterized by prosodic features, especially intonation patterns, which apply from words to whole sections of sentences. Anticipatory phenomena are inherent in speech production, and listeners have learned to use them in speech perception for adequate and optimal reconstruction of the current linguistic meaning.

The objectives of the presented research are the following:

1. confirming the existence of anticipatory phenomena at the phonetic level and locating these,

2. describing how anticipatory phenomena affect the strategic decisions of the interpreter (the ability to perceive and use them largely affects the quality of their performance - accuracy, completeness, adequacy, and fluency),

3. extracting from the corpus model sentences that contain significant anticipatory phonetic phenomena predicting a certain type of anticipation,

4. training interpreters' ability to perceive and use them, and thus verifying the validity of the research conclusions.


The notion of probability anticipation was introduced by information theory. It applies to a whole range of human activities and is described mathematically by the Markov chain, in which the probability of transition to the next event depends only on the current event. The notion is relatively difficult to apply to language production because the Markov chain is a linear sequence of random elements. This difficulty can be overcome by assuming that the production and reception of speech take place at several hierarchically arranged levels. In simultaneous interpreting, the interpreter's brain creates hypotheses and anticipates certain verbal and semantic sequences, which are verified at several levels and subsequently either confirmed or refuted. The probability that anticipation of the future development of speech production will be correct increases with the degree of redundancy of that production (Chernov, 2004).
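The Markov property referred to above can be written compactly. The notation is ours and serves only as an illustration: X_n stands for the n-th element of the sequence and x_n for its observed value.

```latex
P\left(X_{n+1} = x_{n+1} \mid X_n = x_n,\, X_{n-1} = x_{n-1},\, \dots,\, X_1 = x_1\right)
  = P\left(X_{n+1} = x_{n+1} \mid X_n = x_n\right)
```

Under this assumption, a prediction can draw only on the immediately preceding element, which is why a single flat chain fits speech poorly and a hierarchy of levels is assumed instead.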

This research is further based on a sensory-motor approach to speech and on the hypothesis of continuous Anticipatory Perception based on Events (APE), according to which the listener is able to use observable anticipatory cues in the speech signal (Sock et al., 2011). We also use the inventory of prosodic functions in oral communication (Di Cristo, 2013), especially the description of the influence of prosody on the segmentation of oral production and the subsequent creation of its hierarchically arranged units. Perception tests (see below) performed by interpreters in this research confirm that the information about the structure of speech carried by the anticipatory cue, together with the time for which it is retained in short-term memory, is the most important information helping them to optimize their interpreting output.

Research into anticipatory phonetic strategies also draws on psycholinguistic models, according to which oral production is formed sequentially in such a way that the next sequence is prepared while the previous one is being uttered. Segmentation/connection manifests the cognitive planning of the speaker, and the listener or the interpreter must be able to understand the speaker's cognitive intent. Cognitive competence thus controls both production and perception (Levelt, 1993). The competitive model may be applicable in a situation where several heterogeneous competing cues lead the listener to opt for one of several probable interpretations. The constraints resulting from the individual planes of language are reflected in the cognitive plane at the moment of segmentation/connection.

In a summary of translation theory studies on the interpreter's individual ability to process information, M. Guidère states that "the sense of saturation with the interpreter is triggered by three important factors: 1) a sudden change in the rate of processing (change in the speaker's speech rate, enumeration); 2) change in the quality of the sound signal (bad acoustics, flat or overexposed speech); 3) incorrect segmentation of speech production or extremely demanding sequences (grammatical irregularity, logical mistake or ambiguity)" (Guidère, 2011: 108-109). Perception tests confirm all three saturation factors, and sentence realizations exhibiting them were excluded from the corpus of model sentences.


A corpus consisting of 50 hours of recordings of speeches by French MEPs, containing 10 male and 10 female voices, was prepared for the research into anticipatory phonetic strategies. From the corpus, 7,366 sentences were extracted, and their digitized acoustic data were analyzed using the Multi-Speech and Speech Analyzer programmes in the following sequence:

1. standardization of sentence amplitude and perceptual determination of prosodic groups, anticipatory nuclei and types of anticipated information,

2. automatic extraction (+ manual correction) of prosodic parameters - fundamental frequency (F0 in Hz), intensity (in dB), duration of prosodic groups and of silent and filled pauses (in s/ms), and overall speech rate and articulation rate,

3. defining generalizable phonetic conditions for creating a model of "mental software" anticipation for the training of interpreters.
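The difference between the overall speech rate and the articulation rate listed in step 2 can be illustrated with a short sketch. It assumes the common syllable-based definitions (syllables per second with and without pause time); the function name, its parameters, and the sample figures are hypothetical and are not taken from the project's tooling.

```python
def speech_rates(n_syllables, total_duration_s, pause_durations_s):
    """Return (overall speech rate, articulation rate) in syllables per second.

    Assumption: the overall rate divides by the full duration of the
    utterance, while the articulation rate excludes silent and filled pauses.
    """
    if total_duration_s <= 0:
        raise ValueError("total_duration_s must be positive")
    speaking_time_s = total_duration_s - sum(pause_durations_s)
    if speaking_time_s <= 0:
        raise ValueError("pauses cannot cover the whole utterance")
    overall_rate = n_syllables / total_duration_s
    articulation_rate = n_syllables / speaking_time_s
    return overall_rate, articulation_rate


# Hypothetical sentence: 18 syllables in 6 s, with pauses of 0.5 s and 1.0 s.
overall, articulation = speech_rates(18, 6.0, [0.5, 1.0])
print(overall, articulation)
```

Because pause time is removed from the denominator, the articulation rate is never lower than the overall speech rate.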

The sequence flow diagram illustrates the stages of extraction of digitized acoustic data. 

All the extracted parameters were saved in the documentation system in such a way that they could be returned to and statistically evaluated if relevant relationships between the primary unmonitored parameters and the main monitored parameters were demonstrated. The aim of this stage of the experiment was to examine the trajectories of intonation diagrams and to generalize the phonetic conditions necessary to create a conceptual model combining the phonetic, syntactic, and semantic levels.

From the extracted sentences, a set of 1,000 sentences containing at least two rhythmic groups was formed. These were listened to by four Slovak conference interpreters with between 4 and 30 years of professional experience. Four thousand perceptions were thus recorded, and the resulting data were statistically processed and analyzed. The interpreters' task was to listen to the sentence and locate the place on the oscillogram (expressed as a time stamp) at which they had obtained enough information to anticipate the probable further development of the sentence. At each such localized moment, they were also to name what information they anticipated. In naming the anticipated information, they initially worked without any prior instruction. After four hundred listenings, the data obtained were evaluated for the first time. Interindividual differences in the localization of the moments, which were marked as anticipatory cores, were compared, and the following typology of anticipated information was created by analyzing the descriptions of anticipated information:

  • key part of the discourse = core of the discourse = new information → RHEME
  • change of the topic, postponement of the topic, deviation from the topic → THEME
  • determinative syntagma = all kinds of determination within the syntagma
  • predicative syntagma = verb, its semantics or its tense and mood
  • explication = making it more precise, expansion, narrowing of the information
  • escalation = amplification = intensification
  • opposition and contrast
  • enumeration
  • emphasis, stress, or a combination of both
  • logical and/or sentence and/or verbal parallelism and combinations thereof
  • conclusion
  • coordination.

The following table illustrates the interindividual differences in the perception of interpreters in locating the anticipatory core and determining the type of anticipated information.

The experiment showed that, despite interindividual differences in listening and understanding, there are anticipatory moments that all four interpreters always identified unambiguously, and intonation patterns that they always linked unambiguously to a certain type of anticipated information (the sentences marked in yellow in the table on the right). In the experiment, localizations of the anticipatory moment by the four interpreters are considered concordant only if they fall within a range of ±100 ms. The values marked in red in the table indicate the critical factors for the localization of the anticipatory moment: the time that elapses between realizing the possibility to anticipate and marking this moment on the oscillogram expresses the response time of the individual interpreters. It is highly probable that in such cases the interpreters were targeting the same possibility of anticipating. However, the specified time interval had to be respected to ensure that the localization applied to the same sonic nucleus. The following table shows the number of identical perceptions in listening to the set of one thousand sentences. For the first anticipatory core captured, the interpreters' perceptions matched in time in 700 cases and in the type of anticipated information in 702 cases; complete agreement in both time and type was reached in 558 sentences. For the second anticipatory core captured, agreement in time was achieved 382 times and agreement in type 357 times; complete agreement in both time and type was achieved in 302 cases.
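The concordance criterion and the agreement counts reported above can be sketched as follows. The sketch assumes that "within ±100 ms" means that every interpreter's time stamp lies within 100 ms of the median of the four; the source does not specify the reference point, and the function names and sample time stamps are hypothetical.

```python
from statistics import median


def is_concordant(timestamps_ms, tolerance_ms=100.0):
    """Return True if all time stamps lie within +/- tolerance_ms of their median.

    Assumption: the median of the localizations is the reference point.
    """
    center = median(timestamps_ms)
    return all(abs(t - center) <= tolerance_ms for t in timestamps_ms)


def full_agreement(timestamps_ms, types, tolerance_ms=100.0):
    """Return True if the interpreters agree in both time and type."""
    same_type = len(set(types)) == 1
    return same_type and is_concordant(timestamps_ms, tolerance_ms)


def agreement_rate(n_agreeing, n_sentences):
    """Share of sentences on which all interpreters agreed."""
    return n_agreeing / n_sentences


# Hypothetical time stamps (ms) marked by four interpreters for one sentence.
stamps = [1200, 1250, 1180, 1290]
print(is_concordant(stamps))
print(full_agreement(stamps, ["RHEME", "RHEME", "RHEME", "RHEME"]))

# Counts reported in the text for the first anticipatory core (1,000 sentences).
print(f"{agreement_rate(558, 1000):.1%}")
```

A stricter or looser reading of the ±100 ms window (for example, measuring the distance between the earliest and the latest time stamp) would change only `is_concordant`.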

In the perceptions of the third and subsequent rhythmic groups, the deviation in localization and type was too high. To illustrate this, we present a chart of the variance of the localization of anticipatory moments by the four interpreters in the first and the second hundred sentences:

Using perceptual tests, anticipatory nuclei were located at the end of a rhythmic group with rising F0 followed by a pause. To determine the F0 variation at the peak of the rhythmic group, the F0 values of the sonantic nuclei of the penultimate and the ultimate syllables were compared; the average difference in F0 values was 35.98 Hz.
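The comparison described above amounts to a simple difference averaged over rhythmic groups. The following minimal sketch shows the calculation; the function names and the sample F0 values are invented for illustration and do not come from the corpus.

```python
from statistics import mean


def final_rise_hz(f0_penultimate_hz, f0_ultimate_hz):
    """F0 difference between the ultimate and the penultimate syllable nucleus."""
    return f0_ultimate_hz - f0_penultimate_hz


def mean_final_rise_hz(nucleus_pairs):
    """Average final rise over (penultimate, ultimate) F0 pairs given in Hz."""
    return mean([final_rise_hz(pen, ult) for pen, ult in nucleus_pairs])


# Hypothetical F0 values (Hz) for three rhythmic groups.
pairs = [(180.0, 215.0), (200.0, 240.0), (150.0, 183.0)]
print(mean_final_rise_hz(pairs))
```

A positive result indicates a rise on the final syllable; the value of 35.98 Hz reported in the text is an average of this kind.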

The interpreters explained the high interindividual differences in the perceptions of the endings of the third and the subsequent rhythmic groups of sentences by saying that, although they could hear a rising F0 with a subsequent pause at the end, they were unable to obtain any information relevant to anticipation.

It follows from this observation that the determining factor for the optimization of interpreting performance is the prosodic anticipation coming from the first moments of the melodic whole, especially from the intonation pattern of its first rhythmic group (bottom-up). As the sentence progressed, the interpreters' predictions about the future organization and meaning of the forms began to rely more on higher levels of language (top-down), which continuously verified the predictions obtained from the first and second anticipations. Short-term memory retains the imprint of the intonation pattern that was the source of anticipation until the anticipated form and its meaning are confirmed or refuted. For some types of anticipated information, such as the rheme, this can be a relatively long time, because the forms and meanings that verify the prediction often come only in the last rhythmic group of the sentence. Cooperation between short-term and long-term memory is essential for short-term memory to remain functional. Because working memory maintains one to two intonation patterns as decisive for the prediction of meaning, it cannot and need not register and use all the anticipatory moments that the prosodic plane offers. This confirms the presence of redundancy at the prosodic level and the joint effect of redundancy at all levels of language. By synthesizing this knowledge, it is possible to identify four groups of anticipated information that interpreters can use to optimize interpreting output:

1. syntactic information - prosodic (intonation and melodic) formulas make it possible to predict the organization of a sentence,

2. semantic information - prosodic formulas make it possible to evaluate the interactions of the meanings carried by the words with the help of the prosodic cue of segmentation/connection,

3. reference information - combinations of syntactic information with the results of semantic interaction make it possible to check the logical and thematic coherence of the discourse,

4. pragmatic information - prosodic formulas make it possible to manage the conformity of understanding with the author's discourse intention.

Interpreters use them as a reference interface to start assigning and attributing the meaning along the selected trajectory.


From the set of 558 sentences for which the four conference interpreters reached complete concordance in the perception tests, both in the time of the captured anticipatory core and in the type of anticipated information, sentences with serious irregularities (syntactic and grammatical errors, reformulations, hyperprosody and hypoprosody) were excluded. The current model includes:

  • 500 recordings of sample sentences with the possibility of displaying an oscillogram,
  • 500 recordings of sample sentences spoken by a representative speaker - a male voice (with the possibility of displaying an oscillogram),
  • 500 recordings of sample sentences spoken by a representative speaker - a female voice (with the possibility of displaying an oscillogram),
  • a transcript of the 500 sample sentences,
  • a set of selected sample sentences with an inserted warning of an impending anticipatory cue on the oscillogram.

(36 spare sentences matching the same selection criteria were added to the base model in case of file corruption in individual folders.)

The 500 sentences of the model contain typical stylized prosodic contours, with which it was possible to train an acoustic model (machine learning) that searches independently for anticipatory nuclei in the corpus sentences (7,366 sentences), making the current model relatively easy to expand or modify. The fundamental frequency F0, or the F0 contour, is the key parameter in this detection. The trained acoustic model marks as a moment of anticipation a section of the sound signal that shows a rising F0 followed by a pause and the parameters of typical stylized prosodic contours. The trained model takes into account only the prosodic level of the language (without semantic, syntactic, or other information), so changing the model will also require a certain amount of manual correction.
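The "rising F0 + pause" criterion can be illustrated with a simple rule-based sketch. This is not the trained acoustic model described above, only a hand-written stand-in for its core cue; the data class, the function name, and both thresholds are hypothetical, and restricting the search to the first two rhythmic groups reflects the finding that later groups yielded no reliable anticipation.

```python
from dataclasses import dataclass


@dataclass
class RhythmicGroup:
    end_s: float              # time at which the group ends (s)
    f0_penultimate_hz: float  # F0 of the penultimate syllable nucleus
    f0_ultimate_hz: float     # F0 of the ultimate syllable nucleus
    pause_after_s: float      # duration of the pause following the group (s)


def candidate_moments(groups, min_rise_hz=20.0, min_pause_s=0.15, max_groups=2):
    """Return end times of groups showing a final F0 rise followed by a pause.

    Assumptions: both thresholds are illustrative, and only the first
    max_groups rhythmic groups of the sentence are inspected.
    """
    moments = []
    for group in groups[:max_groups]:
        rise_hz = group.f0_ultimate_hz - group.f0_penultimate_hz
        if rise_hz >= min_rise_hz and group.pause_after_s >= min_pause_s:
            moments.append(group.end_s)
    return moments


# Hypothetical sentence with three rhythmic groups.
sentence = [
    RhythmicGroup(1.2, 180.0, 220.0, 0.30),
    RhythmicGroup(2.5, 200.0, 205.0, 0.40),
    RhythmicGroup(3.9, 190.0, 240.0, 0.50),
]
print(candidate_moments(sentence))
```

A rule of this kind uses prosodic information only, so its output would need the same kind of manual correction as that of the trained model.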

All the existing models of oral comprehension confirm that hypotheses or predictions concerning possible future language forms, their organization and meaning must be verified against the flow of speech. Auditory decoding of speech requires training the ear for certain phonetic phenomena, as well as training auditory working memory. Mastering the rhythm (rhythmic sequences) and intonation of a foreign language has been neglected in teaching, and unmastered prosody results in defective understanding or in excessive compensatory effort. Interpreting is a strenuous performance, and energy is saved by excellent, ideally automatic, mastery of the elements of the lower language levels of speech, which allows cognitive effort to be transferred to resolving problems at higher levels (stylistic, pragmatic). As we have shown in the analysis of models of understanding, the existing models mainly lack a precisely defined area of research and verification of the theoretical construct. The advantage of the model of anticipatory prosodic strategies lies in its strictly defined area of research (intonation patterns and intonation contours of French used in France) and in the possibility of repeating the research in another language or verifying its validity by measuring and analyzing the error rate of interpreting performance. The model is both a cognitive model that increases the sensitivity of interpreters to the possibility of using anticipation from the first seconds of the speaker's speech and a training model for those who need to improve the prosodic level of their proficiency in French.

Understanding is a complex competence that requires many partial skills "on the way from sound to meaning", but it starts with recognizing the sound. As is well known, we only hear and identify those sounds of a language that the phonological memory knows. Therein lies the importance of training it to distinguish the sounds of a foreign language, because the interpreter must capture the meaning continuously, not only at the end of the sentence. Interpreting is a struggle with time: the interpreter has a minimum of time for reformulating, so they must anticipate and work with a latent meaning that the end of the sentence will verify. This is one of the reasons why the model consists of isolated sentences. The narrow definition of the field of research suppressed the influence of the context, although the global context, and thus the possibility of assigning the meaning from top to bottom, is partly given by the source of the corpus. Interpreting, especially simultaneous interpreting, is used by the European institutions, which are usually the goal of conference interpreters. The model offers realistic, authentic prosody in the speeches of Members of the European Parliament, but also their exemplary, cultivated and controlled rendition by both male and female native speakers. The transcription of the model sentences will serve as a support in decoding sequences which present a subjective difficulty to the listener. We know that even at the C levels of foreign language proficiency, not all the language skills and competences are fully balanced. Therefore, we added to the model selected sample sentences with an inserted warning of the approaching anticipatory moment on the oscillogram. Their task is to direct the auditory attention of the listener and strengthen the cognitive processing of auditory perception.

The model is intended mainly for students of interpreting and novice interpreters who want to improve their phonetic strategies through teaching, but especially through independent learning and training. This respects the five basic principles of adult language education - individual experience, the moment when they want to learn, orientation to acquire skills and motivation to improve the key language competence by practicing a specific language skill.


The model is intended for foreign language students from level B upwards, but also for language professionals. At lower levels, learners can work with the model to acquire correct intonation patterns in parallel with the remaining language skills, while learners who have reached the C level can work on improving their understanding and production at the prosodic level of language. The prosodic level has an irreplaceable place in the complex of language perception. In early childhood language learning, prosody is the first level that determines understanding: before understanding the meaning of the words, the child responds to the melody of the speech. Over time, the dominant reliance on prosody (what listeners emphasize most and focus their attention on in decoding the meaning) gives way to other language levels, especially the semantic level. In teaching and learning each new language, the prosodic level should be given the same space as the other language levels, because the learner creates a special phonological memory for the given language. Mutual, mostly negative, interference between phonological memories significantly affects correct understanding, so it is a non-transferable language competence.

Current didactic procedures for learning correct prosody use sound recordings, their transcription and a graphically marked melody of a sentence. Unlike our model, they do not offer a simultaneous interconnection between visual and sound perception. Another advantage of the presented model is the possibility to practice prosody on units smaller than one sentence. Determining the number of rhythmic groups and identifying the beginning and the end of a rhythmic group is a more difficult task in natural speech than it seems at first glance. But it is the well-defined rhythmic group that helps to correctly determine the relationship between individual syntactic and semantic groups of language forms. While at the level of production, incorrect or insufficient use of the above-mentioned language competence is relatively easily identifiable (in the form of incorrect intonation, melody, and accent), at the level of understanding it is more difficult to notice that a loss of meaning transmitted at the prosodic level is being compensated for. However, much higher cognitive effort is required to compensate for this, limiting the cognitive capacity available to capture other meanings. Using the compensation strategy in simultaneous interpreting poses a risk of misunderstandings accumulating at several levels, or significantly increases the fatigue of the interpreter's attention, which is a significant factor affecting the quality of the interpreting output.

It is possible to work with the model in several ways, depending on the language level of its user, the learner. At lower levels of language proficiency, exercises focus on mastering a pre-identified intonation pattern, while at C levels, we use didactic techniques that require a more independent and cognitively intensive approach on the part of the learner, based on already acquired competencies and on conceptualization and systematization skills.

The model now consists of 536 sentences taken from authentic speeches by French-speaking Members of the European Parliament and the same 536 sentences spoken by native speakers and their transcription. The method of selecting sentences is described in detail in the previous part of the article. The sentences are arranged in twelve categories, each of which corresponds to a single anticipated phenomenon:

  • Type 1 - rheme (193 items)
  • Type 2 - theme (1 item)
  • Type 3 - determination (108 items)
  • Type 4 - predictive syntagma (90 items)
  • Type 5 - explication (66 items)
  • Type 6 - contrast (22 items)
  • Type 7 - scaling (3 items)
  • Type 8 - enumeration (17 items)
  • Type 9 - emphasis (18 items)
  • Type 10 - logical parallelism (4 items)
  • Type 11 - conclusion (11 items)
  • Type 12 - coordination (3 items)
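As a quick consistency check, the category sizes listed above can be tallied to confirm that they add up to the 536 sentences of the model and to see how unevenly the types are represented. The snippet below is only an illustrative sketch; the dictionary name and the printing format are our own choices.

```python
# Items per anticipated phenomenon, copied from the list of twelve types.
ITEMS_PER_TYPE = {
    "rheme": 193,
    "theme": 1,
    "determination": 108,
    "predictive syntagma": 90,
    "explication": 66,
    "contrast": 22,
    "scaling": 3,
    "enumeration": 17,
    "emphasis": 18,
    "logical parallelism": 4,
    "conclusion": 11,
    "coordination": 3,
}

total = sum(ITEMS_PER_TYPE.values())
print(f"total sentences: {total}")  # 536

# Print the types from the most to the least represented, with their share.
for name, count in sorted(ITEMS_PER_TYPE.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {count:4d} {count / total:6.1%}")
```

The tally shows that the rheme accounts for roughly 36 % of the model sentences, while several types (theme, scaling, coordination, logical parallelism) have too few items to form a 10 + 10 exercise series on their own.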

Within the examined set, there were more sentences containing the above-mentioned types, but only those sentences that completely and simultaneously met the criteria were included in the model:

- capture of the anticipatory moment by all 4 interpreters within a range of ±100 ms;

- a standard difference of 35.98 Hz between the rising F0 of the penultimate and the ultimate syllable at the end of the rhythmic group;

- a standard silent pause duration of 250 ms;

- identical determination of the relevant type by all 4 interpreters.
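The four inclusion criteria can be sketched as a simple filter over candidate sentences. The sketch below rests on several assumptions that the text does not state explicitly: the F0 difference and the pause duration are treated as minimum thresholds, and the ±100 ms agreement is interpreted as every interpreter's mark lying within 100 ms of the midpoint between the earliest and the latest mark. The `Candidate` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate sentence (hypothetical record layout)."""
    marks_ms: List[float]  # anticipatory moment marked by each interpreter
    types: List[int]       # type (1-12) assigned by each interpreter
    f0_rise_hz: float      # F0 rise from penultimate to ultimate syllable
    pause_ms: float        # duration of the following silent pause


def meets_criteria(
    c: Candidate,
    n_interpreters: int = 4,
    tolerance_ms: float = 100.0,
    min_rise_hz: float = 35.98,   # treated as a minimum (assumption)
    min_pause_ms: float = 250.0,  # treated as a minimum (assumption)
) -> bool:
    """Return True only if all four criteria hold simultaneously."""
    if len(c.marks_ms) != n_interpreters or len(c.types) != n_interpreters:
        return False
    # Agreement in time: all marks within +/- tolerance of the midpoint.
    midpoint = (min(c.marks_ms) + max(c.marks_ms)) / 2
    marks_agree = all(abs(m - midpoint) <= tolerance_ms for m in c.marks_ms)
    # Agreement in type: every interpreter chose the same type.
    types_agree = len(set(c.types)) == 1
    return (
        marks_agree
        and types_agree
        and c.f0_rise_hz >= min_rise_hz
        and c.pause_ms >= min_pause_ms
    )


accepted = Candidate([1000.0, 1050.0, 980.0, 1090.0], [1, 1, 1, 1], 40.0, 260.0)
rejected = Candidate([1000.0, 1050.0, 980.0, 1090.0], [1, 1, 4, 1], 40.0, 260.0)
print(meets_criteria(accepted), meets_criteria(rejected))
```

Keeping the thresholds as parameters makes it easy to repeat the selection with other values, for example when the research is replicated in another language.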

So far, we have relied solely on hearing to practice distinguishing the type of anticipated information, because the calculations did not confirm any relevant correlation between the ascending F0 and the type (except for type 8), and a relevant correlation between the percentage change in intensity and the fundamental frequency F0 was shown only for types 6, 8, 9, 11, and 12. It is therefore necessary to look for other combinations of suprasegmental phenomena that enable the ear to capture the difference in the intonation contour predicting a certain type of anticipated information.

The general context of the model sentences is known from the non-linguistic situation. However, the actual content of the speeches can differ considerably. What matters most for the success of communication is the correct understanding of the core of the discourse - the rheme, which in the given context is the bearer of new, as yet unknown information. This part of the discourse is crucial for the speaker as well as for the listener; the speaker therefore emphasizes it, consciously or unconsciously, and its pronunciation is preceded by the same scheme. This first type is also the most frequently anticipated phenomenon, as evidenced by the highest number of items (sentences) in the first category. Because the division of a sentence into the core of the discourse and the starting point may not correspond to the standard division according to sentence parts (subject, predicate ...), the exercises for anticipating the core of the discourse are intended only for experienced language users. It is precisely because of this difference from the usual formal division of a sentence, i.e. the fact that the syntactic and semantic parts of a sentence do not always have the same extent, that the ability to anticipate the incoming rheme is a significant skill of the listener and especially of the interpreter. Directing attention to the moment at which the necessary information can no longer be anticipated on the basis of sentence syntax, but can be anticipated on the basis of prosody, of the intonation pattern, is the key to successful interpreting.

The second significantly represented type of captured anticipation is the anticipation of the predictive syntagma - a verb whose role in a sentence is well known. Exercises aimed at practicing its anticipation are suitable for learners who have reached level B.1. or higher.

Exercises designed for levels B.1.+ are based on repeated listening to the intonation pattern, with the place of the identified anticipation indicated in the model recordings, and purposefully lead the learner to focus attention on that place and to anticipate a predefined phenomenon. Didactically, the exercise is based on teaching through the active, simultaneous involvement of several senses (hearing + sight), which allows a stronger stabilization of the intonation pattern heard and its "storage in internal software", so that the next time the same sound stimulates the neurons, the same decoding scheme is applied reflexively. An outline for this type of exercise is given in the following methodological sheet.

Methodological Sheet 1

Title: Introduction to Anticipation

Entry language level: B.1.+

Target language level: B.1.+

Teaching aids: Technical equipment (computer/laptop/tablet with Audacity/Speech Analyzer installed, headphones), recordings of sets of sentences Type 4 - predictive syntagma (90 sentences).

Task: Listening to the intonation of a prosodic group followed by a verb.

Task description: Each learner shall work independently. They have at their disposal and use their own technical equipment and a series of 10 + 10 model sentences spoken by a native speaker, to which they listen while watching the oscillogram of the recording. Each learner can listen to the series of sentences repeatedly, according to their own needs.

Objectives: Pragmatic:

- Improving perception and sensitivity to the prosodic level of language

- Training anticipatory ability in a selected phenomenon - a predicative syntagma

- Fixing the correct intonation patterns

- Training attention

Sociolinguistic and sociocultural:

  • Improving communication

  • Practicing the intonation

Task progress - stages:

Initialization: Definition of prosody, intonation. Presentation of work with technical aids, "reading" the oscillogram (individually or in a group).

Arousal of interest: Discussion (individual: study of the sources).

Attention focusing: In the recordings, the moment at which the attention is to be intensified is highlighted.

Understanding: Achieved by repeated active listening to recordings.

Conceptualization: Localization of the anticipatory moment of the predicative syntagma at the end of the prosodic group with increasing F0 followed by a pause.

Systemization: Verification on a new set of sentences.

Exercises with a higher level of difficulty will use a comprehensive didactic process divided into the following stages: initialization, arousal of interest, directing attention, understanding, conceptualization, systemization. In the initialization stage, the learner listens to a series of model recordings with the same type of anticipated phenomenon. Subsequently, they should formulate one or more hypotheses and determine the common element of the moment of anticipation in the sentences heard. Active involvement of the listener/interpreter in formulating the hypothesis, which is the basis of the anticipatory strategy, arouses their interest. In the next step, attention is directed to the anticipated phenomenon by listening again to the series of model sentences, now with the anticipatory moment marked on the oscillogram. This is followed by the stage of understanding, in which the original hypothesis is verified and the type of anticipated phenomenon is recognized. In the conceptualization stage, the listener/interpreter establishes the connection between the prosodic level and the semantic level (the transcription of the recording is available at this stage). In the last stage, the acquired skill is systemized through several activities, such as listening to a new series of sentences in which one sentence does not belong to the same category and anticipates another phenomenon; the task is to find the sentence that does not belong to the series. Another activity consists of listening to a series of sentences from the same category, marking the place of the anticipatory moment, and subsequently verifying it (self-check) by displaying the sentences with the anticipatory moment already correctly marked. Finally, for comparison, it is possible to work with authentic recordings of the same sentences and gradually increase the sensitivity of the listener/interpreter to recognizing the moments of anticipation.

The following methodological sheet can be a stimulus for working with the model and creating the exercises:

Methodological Sheet 2

Title: Anticipation of the Rheme

Entry language level: B.2.+

Target language level: B.2.+

Teaching aids: technical equipment (computer/laptop/tablet with Audacity/Speech Analyzer programme installed, headphones), recordings of sets of sentences Type 1 - rheme (193 sentences).

Task: Determine the moment and type of anticipation

Task description: Each learner shall work independently. They have at their disposal and use their own technical equipment, a series of 10 + 10 model sentences spoken by a native speaker and a series of 10 + 10 authentic recordings of the same sentences, to which they listen while following the oscillogram of the recording. Each learner can listen to the series of sentences repeatedly, according to their own needs.

Objectives: Pragmatic:

- improving perception and sensitivity to the prosodic level of language

- training anticipatory ability in a selected phenomenon

- fixing the correct intonation patterns

- training attention

Sociolinguistic and sociocultural:

  • improving communication


  • intonation training

Task progress - stages:

Initialization: Listening to a series of 10 recordings of sentences with confirmed anticipation of the rheme, without marking the moment of anticipation or the type of the anticipated phenomenon.

Arousal of interest: Hypothesis formulation - determination of common anticipatory moment and anticipated phenomenon.

Focusing the attention: Listening to the recordings and watching them on the oscillogram, where the moment of anticipation at which the attention is to be intensified is marked.

Understanding: Identification of the type of the phenomenon anticipated.

Conceptualization: Listening to the recordings while watching their transcription and their audio record on the oscillogram; connecting the prosodic and semantic levels of language; localization of the anticipatory moment of the rheme at the end of the rhythmic group with increasing F0 followed by a pause.


Activity 1: Verification in a new series of 10 sentences (Exercise 1/1), of which 1 will not contain the same anticipated phenomenon. The listener/interpreter should correctly identify and exclude a sentence that does not anticipate the same phenomenon - the rheme.

Activity 2: Listening to a new series of 10 sentences (Exercise 1/2) and marking the moment of anticipation. Verification by listening to the same series of sentences with the moment of anticipation already marked - self-check.

Activity 3: Listening to a new series of 10 sentences from Type 1 - rheme. After acquiring sufficient auditory sensitivity, the learner tries to imitate the exemplary intonation. They record their own production using technical equipment and compare the resulting record with the original oscillogram record.
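For teachers who want to generate series for Activity 1 automatically, the "find the sentence that does not belong" exercise can be assembled as sketched below. This is only one possible way of doing it; the function name, the mapping from type numbers to sentence identifiers and the identifiers themselves are hypothetical and do not reflect the actual file layout of the model.

```python
import random
from typing import Dict, List, Optional, Tuple


def build_odd_one_out(
    sentences_by_type: Dict[int, List[str]],
    target_type: int,
    size: int = 10,
    seed: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Build a series with size - 1 sentences of the target type and one intruder.

    Returns the series and the position of the intruder, which serves
    as the key for the learner's self-check.
    """
    rng = random.Random(seed)
    # Draw the sentences that share the anticipated phenomenon.
    same = rng.sample(sentences_by_type[target_type], size - 1)
    # Draw one sentence from any other non-empty category.
    other_types = [
        t for t, items in sentences_by_type.items() if t != target_type and items
    ]
    intruder = rng.choice(sentences_by_type[rng.choice(other_types)])
    # Insert the intruder at a random position in the series.
    position = rng.randrange(size)
    series = same[:position] + [intruder] + same[position:]
    return series, position


# Hypothetical identifiers standing in for recordings of Type 1 and Type 4.
corpus = {
    1: [f"rheme_{i:03d}" for i in range(20)],
    4: [f"verb_{i:03d}" for i in range(5)],
}
series, key = build_odd_one_out(corpus, target_type=1, seed=0)
print(series, "intruder at position", key)
```

Fixing the seed reproduces the same series for a whole group, while leaving it unset gives each learner a different series for independent practice.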

The exercises in the model sheets are designed as individual ones, which of course does not prevent the teacher from creatively adapting them and involving a competitive or comparative element in the activities. At the beginning, it is advisable to give priority to activities that have a clearly defined goal, which reduces the likelihood that a less experienced teacher or learner will achieve an incorrect result. In proactive didactics, the emphasis is on the learner's ability to succeed, and learners need clear and relatively simple instructions. The model sentences serve as support and training for storing and recalling intonation patterns of French from memory automatically, without exerting any special cognitive effort. Training requires some time, which is individual, so group exercises should only focus on acquiring a basic level of skill. Both students of interpreting and professional interpreters are well aware that automating the processes at lower levels of language frees up cognitive capacities to resolve the issues that may occur at higher levels of understanding (imagery, targeted ambiguity, diverted illocutionary value of the discourse, speaker idiolect, etc.) and in their own production and its self-monitoring. They will surely be motivated to improve their phonological competence independently, which will make it possible for them to anticipate possible semantic trajectories more readily and to gain time to confront these with their own knowledge in long-term memory, with the propositional intention of the speaker, and with general and situational logic, in order to choose the one that will allow them to conclude, meaningfully and with proper intonation, a sentence that they have already started uttering.

This work was supported by the Slovak Research and Development Agency under the contract No APVV-15-0307.


Assoc. Prof. PhDr. Taida Nováková, CSc.

Assoc. Prof. PhDr. Anna Butašová, CSc.


BARTHELEMY F., GROUX D., PORCHER L. (2011). Le français langue étrangère, Paris, L'Harmattan, Coll. Cent mots pour.

BEACCO J.-C. (2007). L'approche par compétences dans l'enseignement des langues, Paris, Didier.

BERTOCCHINI P., COSTANZO E. (2008). Manuel de formation pratique pour le professeur de FLE, Paris, CLE International.

CHERNOV, G. V. (2004). Inference and Anticipation in Simultaneous Interpreting. Amsterdam/Philadelphia: Benjamins Translation Library.

CLARK, H. H., CLARK, E. V. (1977). Psychology and Language. New York, HBJ.

CORNAIRE, C. (1998). La compréhension orale. Paris, CLE International.

COURTILLON J. (2003). Élaborer un cours de FLE, Paris, Hachette, Coll. F.

CUQ J.-P., GRUCA I. (2002). Cours de didactique du français langue étrangère et seconde, Grenoble, PUG.

DI CRISTO, A. (2013). La prosodie de la parole. De Boeck, Bruxelles.

FERRAND, L. (2004). Nature des codes phonologiques activés au cours de la lecture silencieuse. In: Psycholinguistique cognitive. Bruxelles, De Boeck Université.

GUIDERE, M. (2011). Introduction à la traductologie. Penser la traduction : hier, aujourd'hui, demain. De Boeck, Bruxelles.

GOSS, B. (1982). Listening as information processing. Communication Quarterly, 30.4.

LEVELT, W. J. M. (1993). Speaking, from Intention to Articulation. MIT Press, Cambridge.

LHOTE, E. (1995). Enseigner l´oral en interaction. Paris, Hachette.

LUND, R. J. (1991). A comparison of second language listening and reading comprehension. The Modern Language Journal, 75.

LUNDSTEN, S. W. (1979). Listening : its impact on reading and the other language arts. Urbana, IL, Eric Clearing House on Reading and the Other Language Arts.

MILLS, E. (1974). Listening : Key to Communication. New York, Petrocelli Books.

NAGLE, S. J., SANDERS, S. L. (1986). Comprehension theory and second language pedagogy. TESOL Quarterly, 20.1.

PAĽOVÁ, M. (2018). Transit mnésique en traduction à vue. In Meyer, J-P., Paľová, M., Marsac, F. (dir.) : Consécutivité et simultanéité en Linguistique, Langues et Parole. L´Harmattan, Paris, s. 115 - 125.

PAĽOVÁ, M., ZELEŇÁKOVÁ, M. (2019). O prozodickom anticipačnom náznaku. XLinguae. European Scientific Language Journal, 1/2019.

PAĽOVÁ, M., KIKTOVÁ, E. (2019). Les indices anticipatoires prosodiques et l´activation de la référence en interprétation simultanée. European Scientific Language Journal, 1XL/2019.

ROBERT J-P., ROSEN E. (2011). Faire classe en FLE : Une approche actionnelle et pragmatique, Paris, Hachette.

SOCK, R. et al. (2011). Anticipatory Perception based of Events (APE) hypothesis. International Seminar on Speech Production 2011, Jun 2011, Montreal, Canada.

SOCK, R. - VAXELAIRE, B. (2012). Produire et percevoir la parole. In: Les Sciences de l´Homme et de la Société et la Maison Interuniversitaire des Sciences de l'Homme - Alsace. Strasbourg, MISHA Editions (sous la direction de Ch. Maillard), s. 152-161.

STEVICK, E. (1993). Memory : Old news, bad news, new news, good news. JALT Journal, 15.1.

VISETTI, Y. M. (2004). Anticipations linguistiques et phases du sens. In: Sock, R., Vaxelaire B. (éds): L´anticipation à l´horizon du présent. Hayen: Pierre Mardaga éditeur, s. 33-52.

WITKIN, B. R. (1990). Listening theory and research : The state of the Art. Journal of the International Listening Association, 4.

ZIMMERMANN, J., KIKTOVÁ, E., PAĽOVÁ, M. (2019). Towards tho the Anticipation in Simultaneous Interpreting. CEUR Workshop Proceedings, Vol - 2473, urn:nbn:de:0074-2473-1, ITAT 2019 Information Technologies - Applications and Theory.