UX Literature Review / 2024 / DePaul University HCI 450
Linguistics (especially sociolinguistics) and UX are two fields I'm incredibly passionate about, so when I had to write a literature review for HCI 450 at DePaul, I knew I wanted to work at the intersection of the two. After three short months, I ended up with this! Enjoy reading, and please reach out if you'd like to discuss any thoughts :) !!
This literature review examines existing issues with Conversational User Experiences (CUX), as well as how incorporating knowledge of sociolinguistics can work towards rectifying them. CUXs aim to create human-like interactions, but currently fail to adequately process many types of user input, such as speech from non-native speakers, from speakers who code-switch, or from multiple participants in voice user interface (VUI) conversations. Interfaces also cannot vary output speech styles to suit different contexts, nor do they code-switch or consider the implications of their voice outputs’ accents. These can all result in decreased user satisfaction, confidence, efficiency, trust, and adoption, all of which are necessary for CUX to continue expanding. This review explores how sociolinguistics can be factored into the development of CUXs to best navigate each of these challenges; sociolinguistics explores societal impact on language, through social interactions, dialects, accents, code-switching, and listener preferences of speech patterns. This paper also briefly discusses the implications of technolingualism and how CUXs can play a role in either perpetuating or minimising cultural stereotypes.
Conversational interfaces, including chatbots and voice user interfaces (VUIs), have been gaining popularity, but still lack many of the features needed for users to fully trust them and feel understood. Specifically, one field of study that has yet to be sufficiently integrated into Conversational User Experience (CUX) design is sociolinguistics. The way we speak, as well as our perception of others’ speech, is shaped by the communities we are raised or live in and the social contexts of our conversations, among many other influences. It is necessary to emphasise that these differences also manifest within speakers of the same language, and not only across language groups. Variations in speech styles are most observable through social interactions, individual speaker styles and registers, accents, dialects, and code-switching. CUXs currently lack adequate support for the processing of both native and non-native varieties of input, and offer a limited range of output styles. As such, it is important to consider sociolinguistic concepts, theories, and methodologies, especially the aforementioned forms of sociolinguistic variation (SLV), when designing CUXs. This would also help account for globalisation and international markets, and increase user trust and confidence, enabling more widespread adoption.
Even with Standard English speakers, CUX is imperfect in how it handles conversations, especially those with multiple participants. Alexa does not specify a speaker in its responses, and as such, any participant is able to chime in. While this may point to a more natural flow, a few things hinder it from reaching its full potential. Koh (2021) notes that speakers tend to share a knowledge schema in terms of what their goal is with Alexa, but Alexa’s repair sequence does not specify the issue in its response (p. 3). This both fails to move the conversation forward, and makes it harder for users to continue sharing a unified schema as they may have conflicting ideas of how to resolve the situation. This is to say, voice interfaces currently lack the ability to understand or account for context that would be necessary for efficiency and human-likeness. If VUIs do improve on this, it is also worth considering, then, that each participant in a given context may have a different way of speaking, and so VUIs need to be able to process all types of inputs, and provide outputs that are effective in various social contexts.
Porcheron et al. (2018) voice similar concerns, but strongly believe voice interfaces are not at all interactional, and are merely embedded into users’ conversations (p. 9). Accordingly, they simply suggest improving prompt phrasing and feedback. Koh, however, is less dismissive of the potential of VUIs as conversationalists and highlights that many CUX researchers do aim to create naturalistic conversations (Koh, 2021, p. 4). While both share the suggestion of improving prompt phrasing, Koh also notes that utilising interactional sociolinguistic (IS) methodologies in the development of CUXs would be beneficial to CUX researchers’ goal; doing so could help enable Alexa to detect the presence of multiple participants and interact accordingly, and guide users to share knowledge schemas.
As the CUX market continues to grow, designers should incorporate IS methodologies in order to improve systems’ understanding of how different user groups might speak in social contexts. This, in turn, would 1) help single-user conversations flow more naturally (when considering the CUX as the second participant in the interaction), and 2) allow for regional and community preferences of discourse norms to be taken into account, improving global user experiences. However, for IS to make the strongest impact on CUX, these interfaces must first ensure adequate input processing for users with varying levels of proficiency in the system’s language, which they currently lack.
Cihan et al. (2022) discuss users’ need to use their second languages when engaging with conversational interfaces, due to the lack of a comprehensive range of language support on most CUXs. Second-language vocabulary is used less frequently and is therefore harder for users to retrieve, so these interactions require more mental effort. As such, they suggest that CUXs should not only understand code-switching, but also be able to code-switch themselves. While they do not conduct a study to validate either the need for or the effectiveness of this feature, the following studies examine each respectively.
Pyae and Scifleet (2019) researched Google Home experiences of users with varying English proficiency levels. They found that native English speakers had more positive experiences overall, and found the tool easy to learn and use (Pyae & Scifleet, 2019, p. 4). However, non-native English speakers found it challenging to make Google Home understand their phrases as intended due to the lexicon they were able to access, their sentence structures, and their pronunciations. Even so, “both native English and non-native English speakers are interested in using it and they both suggested the potential of it to be used in different contexts” (Pyae & Scifleet, 2019, p. 5).
Similarly, Parekh et al. (2020) explored code-switching bilingual users’ experiences with chatbots, specifically with Hinglish. They noted that users code-switch between English and Hindi without prompting, even if the chatbots only spoke in a single language. However, users’ task success was higher and their utterances were longer when chatbot responses incorporated code-switching and were more informal.
It is clear that non-native English speakers have an interest in using CUXs, but may face challenges in doing so efficiently, and accounting for bilingualism and code-switching would minimise these. Currently, however, many CUXs do not support users’ native languages, nor bilingualism and code-switching, and simultaneously struggle to accurately understand non-native speakers’ utterances, leaving many unable to use CUXs as intended by designers.
Beyond CUXs’ processing of users’ speech, the way the interfaces themselves speak, and what they say, also impacts users’ satisfaction. Chaves et al. (2022) found that how information is presented to users (that is, the CUX’s linguistic style, register, and features) influences their acceptance of it, and their perception of its credibility and competence. Users’ preference for the way information is presented is also dependent on whether they are prioritising efficiency, human-likeness, or personalisation in that interaction. Additionally, Ferland and Koutstaal’s (2020) study revealed that users’ confidence in CUXs, and their perception of the interfaces’ authenticity, is higher when the interface does not disclose what it knows about users.
These studies both demonstrate that for users to have a positive perspective of and accept a CUX, it is important to consider the content and style of the CUX’s speech in different contexts. The significance of this would likely hold true for international audiences as well, even if the specific preferences vary, but for a global context, it is also necessary to introduce the linguistic concept of prestige. Linguistic prestige is related to the variety of speech that communities hold in high regard, or even view as the most (or only) ‘correct’ way to speak. The next section explores the application of this concept in CUXs.
There is no single influence that defines the language, accent, or dialect with prestige in a community; it is often influenced by the region’s colonial history, the present and historical social class structure, and an individual’s experiences. Sutton et al. (2019) phrase it as follows: “how voices are perceived by people can be highly dependent on their own personal histories, and the social and cultural contexts where a voice is heard. Voices can be grounded in stereotypes, prejudices, and speech ideologies, and can give emphasis to certain social groups and cultural identities over others” (p. 2).
Nass and Brave’s (2005, as cited in Niculescu et al., 2008) study found that Caucasian and Korean Americans prefer VUIs to have accents and speech styles similar to those of their own backgrounds. They note that for Korean Americans, Korean accents were “a strong mark of similar socio-cultural background evoking familiarity in geographical context where their parents were foreigners” (Nass & Brave, 2005, as cited in Niculescu et al., 2008, p. 524). However, Niculescu et al. (2008) replicated this study with Singaporeans and discovered the opposite: Singaporeans prefer VUIs to have British accents, not Singaporean ones, and rank British English VUIs higher in terms of politeness, voice quality, dialogue easiness, and trustworthiness. They propose that this may be due to Singlish being perceived as a casual, indigenised variety discouraged from use by the government, in addition to the British colonial history in Singapore.
These two studies demonstrate that regions, communities, and individuals can have different preferences for speech styles based on their backgrounds and environments. Accordingly, Sutton et al. (2019) suggest that CUX outputs, specifically those of VUIs, have the potential to benefit from individualisation and context-awareness; this could mean giving users the ability to manipulate specific phonetic components, having users complete surveys that suggest voice output styles, or offering location-based or activity-aware accents. They also emphasise diversification as a solution, which is elaborated on in the following section.
This review discusses both the processing ability and outputs of CUXs, what they currently lack, and how they can be more effective and globally inclusive. Sutton et al. (2019) believe that “VUIs have achieved such a level of technical capability that attention can move towards considering aesthetics [i.e. outputs rather than processing] in VUI design. And so, now would be an appropriate time to explore relevant knowledge from other disciplines” (p. 11). While CUXs might still be imperfect in processing user inputs, even with native speakers, I think it is reasonable to improve processing and outputs concurrently; ensuring early on that users both trust and are understood by CUXs is necessary for their continued adoption, and incorporating sociolinguistic concepts into design decisions would aid in doing so.
However, with regard to outputs, it is worth noting Pfrehm’s concept of technolingualism, which is that “technology both shapes and is shaped by language” (Pfrehm, 2018, as cited in Sutton et al., 2019, p. 8). Accordingly, Sutton et al. (2019) caution against inadvertently perpetuating stereotypical or exaggerated voices, or limiting users’ exposure to their local environments. They suggest diversification in one of four ways: suggesting voices that do not activate a conscious response in users; providing voice options users have no associations with; creating new accents; or reducing VUI human-likeness.
Ultimately, sociolinguistic concepts and methodologies offer the most well-defined route to sufficiently accommodating international audiences when developing for CUX input processing. Incorporating them for CUX outputs will be especially beneficial early on to ensure user trust and adoption, but in the long term, it will be equally necessary to develop with the reciprocal relationship between CUXs and cultural prejudices in mind.
As CUXs have been gaining more widespread adoption and will only continue to grow, it is essential to consider sociolinguistic needs, especially of international audiences, to increase the number of users that are able to efficiently utilise these tools. Additionally, this would increase the trust users have in CUX outputs, and allow users to feel more confident and comfortable in using them. Individuals are a product of their background and their environment, and sociolinguistics and SLV would help account for the wide array of user speech styles and preferences. Designers should improve how CUXs process multi-user conversations, understand various accents, and support additional languages and code-switching, as well as account for regional prestige, linguistic style, and register in outputs.
Chaves, A. P., Egbert, J., Hocking, T., Doerry, E., & Gerosa, M. A. (2022). Chatbots language design: The influence of language variation on user experience with tourist assistant chatbots. ACM Transactions on Computer-Human Interaction, 29(2), 1–38. https://doi.org/10.1145/3487193
Cihan, H., Wu, Y., Peña, P., Edwards, J., & Cowan, B. (2022). Bilingual by default: Voice assistants and the role of code-switching in creating a bilingual user experience. Proceedings of the 4th Conference on Conversational User Interfaces. https://doi.org/10.1145/3543829.3544511
Ferland, L., & Koutstaal, W. (2020). How’s your day look? The (un)expected sociolinguistic effects of user modeling in a conversational agent. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3334480.3375227
Koh, J. (2021). Discourse analysis in voice user interface research. Proceedings of the 3rd Conference on Conversational User Interfaces. https://doi.org/10.1145/3469595.3469622
Niculescu, A., White, G. M., Lan, S. S., Waloejo, R. U., & Kawaguchi, Y. (2008). Impact of English regional accents on user acceptance of voice user interfaces. Proceedings of the 5th Nordic Conference on Human-Computer Interaction: Building Bridges. https://doi.org/10.1145/1463160.1463235
Parekh, T., Ahn, E., Tsvetkov, Y., & Black, A. W. (2020). Understanding linguistic accommodation in code-switched human-machine dialogues. Proceedings of the 24th Conference on Computational Natural Language Learning, 567–577. https://doi.org/10.18653/v1/2020.conll-1.46
Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice interfaces in everyday life. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3173574.3174214
Pyae, A., & Scifleet, P. (2019). Investigating the role of user’s English language proficiency in using a voice user interface. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290607.3313038
Sutton, S. J., Foulkes, P., Kirk, D., & Lawson, S. (2019). Voice as a design material: Sociophonetic inspired design strategies in human-computer interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290605.3300833