The Advent of Chat User Interfaces

[Image: a woman's face made out of blue-violet cubes, representing an artificial mind.]

Picture a morning where your first conversation is with your computer, discussing your schedule, organising your tasks, and even composing your emails — all without you typing a single word. Can you imagine cutting your daily screen time in half, thanks to a digital assistant that doesn't just understand your words, but comprehends your intentions, too? This isn't a scene from a sci-fi movie; it's a reality that's right around the corner.

The Past

In the early days of computing, talking to machines was tricky. The first-ever user interface was binary code: humans used only 0s and 1s to meticulously craft the instructions passed to the computer. Operating systems made things a bit easier by enabling interaction through a terminal interface on a monochrome screen. In those days, computers were tools that only a few skilled individuals could operate effectively.

Graphical user interfaces (GUIs) brought computers to the masses. Even so, interacting with software still meant navigating complex menus, deciphering technical jargon, and clicking endlessly towards a goal. GUIs, although increasingly sophisticated, often created barriers between technology and its users, who had to climb a learning curve before they could operate the software efficiently. To bridge this gap, software designers had to invest significant resources in user experience research and employ methodologies like A/B testing to optimise the usability of the software. This practice continues to this day.

In early 2023, a completely new way of interacting with software invaded the collective consciousness of humankind. ChatGPT changed the game by introducing a way for humans to have smooth, sensible conversations with a computer. Instead of using buttons, sliders, and drop-down menus, users were greeted by a friendly, textual interface, inviting them to simply type their thoughts and queries using natural language.

The chat interface itself was not new. We've been using it for years in our messaging apps. We've also seen chatbots before. But ChatGPT was the first widely used AI whose conversations very closely resembled talking with a human being. It could understand natural language, pick up facts from user messages, and generate coherent, human-like responses. It opened up a world of new possibilities for both casual users and software developers, who could now integrate large language models (LLMs) into their programs.

The Present

Traditional user interfaces, especially those developed in the last few years, offer a rich, dynamic, and visually engaging user experience. These interfaces are able to convey complex information through visuals and incorporate high levels of interactivity with the user. Animation and other audio-visual feedback techniques create a sense of immediate responsiveness as the user interacts with the software.

Developing modern UIs is hard, even with whole teams of designers and software developers on the job. Since these interfaces rely on a visual representation of both the underlying state and the modifications of that state, the usability of the UI is always at odds with the complexity of the system. The more complex the systems we create, the harder it is to visualise the information in a clear and intuitive way.

"The real problem with the interface is that it is an interface."

~ Don Norman

Chat interfaces, at their most basic level, use text messages. This simplicity makes them incredibly efficient and accessible. They offer a more intuitive mode of communication that requires little to no learning. It's almost like chatting with a human being, except the counterpart is actually an AI.

Compared to traditional UIs, conversational interfaces bring a fundamental change in how humans interact with software. And we need to implement this change almost everywhere.

The Future

AI systems that can operate on digital resources, interact with other software systems, the physical environment, or human beings, and perform tasks with some degree of autonomy are often referred to as AI agents. A chat interface is a perfect medium through which AI agents can receive commands from human users. Interacting with AI agents through chat will be like having a clever robot that understands our language and handles complex software tasks for us, without us having to lift a finger. The chat will become the interface to everything.

A natural evolution from chat UIs is a voice UI. By integrating voice recognition and voice synthesis, AI agents will interpret vocal commands and queries that come from users. This will result in a hands-free user experience, increasing the accessibility of conversational UIs and making the conversations even more similar to human interactions.

AI agents of the future will be fully autonomous. Instead of waiting for our instructions, they will make choices of their own and act on them. They will watch over our digital assets, our communication channels, and even our vital signs. They will let us know what's going on, and if we don't like what they're planning, we'll just tell them to stop or to adjust their course of action. The autonomy of AI agents will provide human beings with tremendous leverage and freedom.

To express complex ideas, AI agents will be able to draw images, create 3D visuals, and show them to humans as part of the chat experience. Such a capability, at high enough quality, will be superior to any traditional UI. The AI will be able to tailor the visual representation to specific user needs and create many different, on-demand visualisations of the same idea.

A picture is worth a thousand… text messages?

Communication with rich visuals will work both ways. Augmented reality (AR) and holographic imaging will enable manipulation of images and 3D models by using hand gestures. Those hand-modified visuals will be immediately "seen" and reacted to by the AI. It will be like having a magical 3D canvas where both you and the computer can draw together and talk about what you see at the same time.

Traditional UIs will not die completely. They will always be needed in some software domains. What will radically change is how they are created. Interfaces will be designed mostly by AI, very often on demand. AI agents will analyse patterns in the commands they receive and design traditional UIs that increase the efficiency of the processes we commonly engage with. In many areas, traditional UIs will become optional.

A future where conversational interfaces are both powerful and ubiquitous is very exciting, but for the change to actually take place, many challenges have yet to be overcome.

The Challenge

If words are the only communication medium, the user must be fluent in using words. They must be able to state their desires precisely and, before that, they must actually know what they want. If either of those conditions is not met, AI agents will either be unable to execute commands or will do things that differ from what the user actually wants.

Traditional UIs offer the possibility of learning by trying things out or by following tutorials and in-context instructions. For a system with a chat interface to be equally easy to explore, the underlying AI needs to be aware of the capabilities of the system. Conversational agents need to be able to guide users through the available commands and help them decide on or formulate precise instructions.
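As a rough illustration of an agent that is aware of what the system can do, here is a minimal sketch. Everything in it is an assumption made for the example: the `Capability` record, the three sample commands of an imaginary calendar assistant, and the use of fuzzy string matching from Python's standard `difflib` module in place of a real language model.

```python
from dataclasses import dataclass
from difflib import get_close_matches


@dataclass(frozen=True)
class Capability:
    name: str          # command name the agent can execute
    description: str   # one-line explanation shown to the user
    example: str       # sample phrasing the user could type


# Hypothetical capabilities of an imaginary calendar assistant.
CAPABILITIES = [
    Capability("schedule meeting", "Book a meeting in your calendar",
               "schedule meeting with Ana on Friday"),
    Capability("list tasks", "Show your open tasks",
               "list tasks due this week"),
    Capability("draft email", "Write an email draft for review",
               "draft email to the team about the release"),
]


def describe_capabilities() -> str:
    """Answer 'what can you do?' with the full list of commands."""
    return "\n".join(
        f"- {c.name}: {c.description} (e.g. '{c.example}')" for c in CAPABILITIES
    )


def suggest(user_text: str) -> str:
    """Guide the user to the closest known command, or list everything."""
    names = [c.name for c in CAPABILITIES]
    # Compare only the first two words, since every command name has two words.
    head = " ".join(user_text.lower().split()[:2])
    matches = get_close_matches(head, names, n=1, cutoff=0.6)
    if matches:
        cap = next(c for c in CAPABILITIES if c.name == matches[0])
        return f"Did you mean '{cap.name}'? Try: '{cap.example}'"
    return "I can help with:\n" + describe_capabilities()
```

With this sketch, a misspelled request such as `suggest("shedule meting tomorrow")` is steered towards the scheduling command, while an unrelated request falls back to the full list of capabilities. A production agent would more likely pass the same capability list to the language model as context.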

For an AI system to be effective in audio-visual communication, the underlying technology must be multi-modal. A single AI model must be able to understand, process, and generate content in multiple forms, such as text, audio, images, and video. Multi-modal AI will interpret vocal tone, facial expressions, and even the surrounding context from the user's environment. This depth of understanding, along with the use of augmented reality or holographic technology, will enable rich communication between humans and AI agents. With this, conversational interfaces will become much more efficient than natural human interactions.

Streaming of messages and assets generated by the AI poses a technical challenge. To maintain a natural conversation flow, messages have to be processed, understood, and responded to in a manner that seems instantaneous. The problem multiplies when an AI system needs to provide rich multimedia content such as images, videos, 3D models, or interactive elements. AI systems must ensure seamless delivery of these assets without hampering the user experience.
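One common way to keep a conversation feeling instantaneous is to stream text in small chunks while slower assets are generated in the background. The sketch below shows the idea with Python's `asyncio`. The functions `stream_reply`, `render_image`, and `chat_turn`, as well as the `asset://` handle, are hypothetical stand-ins for a real model and a real asset pipeline.

```python
import asyncio
from typing import AsyncIterator


async def stream_reply(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Yield a reply word by word, standing in for tokens from a model."""
    for word in text.split():
        await asyncio.sleep(delay)  # simulated generation latency
        yield word + " "


async def render_image(prompt: str) -> str:
    """Hypothetical slow asset generation; returns a made-up handle."""
    await asyncio.sleep(0.01)
    return f"asset://{prompt.replace(' ', '-')}"


async def chat_turn(text: str, image_prompt: str) -> list[str]:
    """Start the slow asset first, stream text meanwhile, attach the asset last."""
    events: list[str] = []
    image_task = asyncio.create_task(render_image(image_prompt))
    async for chunk in stream_reply(text):
        events.append(f"text:{chunk}")  # a real UI would paint this immediately
    events.append(f"image:{await image_task}")
    return events


if __name__ == "__main__":
    for event in asyncio.run(chat_turn("Here is your chart", "sales chart")):
        print(event)
```

The design choice here is that the image starts rendering before the first word is streamed, so the user reads the text while the asset is still being prepared instead of waiting for both.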

Chat interfaces must be engineered in a way that allows users to peek into the 'mind' of the AI and understand how and why particular decisions were made. This kind of observability is needed to build trust and to help users comprehend the AI's strengths and weaknesses. Providing meaningful insights into the AI's operations without overwhelming the users demands innovative UI design.
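One possible shape for this kind of observability is a decision trace: the agent records each action together with its reason, and the interface shows a short summary by default and the full trail only on request. The sketch below is an assumption about how that could look, and the names `Step` and `DecisionTrace` are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str  # what the agent did
    reason: str  # why it chose to do it


@dataclass
class DecisionTrace:
    steps: list[Step] = field(default_factory=list)

    def record(self, action: str, reason: str) -> None:
        """Store one decision as it is made."""
        self.steps.append(Step(action, reason))

    def summary(self) -> str:
        """Short view shown by default, so the user is not overwhelmed."""
        if not self.steps:
            return "no steps yet"
        return f"{len(self.steps)} steps, last: {self.steps[-1].action}"

    def details(self) -> str:
        """Full view, shown only when the user asks why something happened."""
        return "\n".join(
            f"{i}. {s.action} (because {s.reason})"
            for i, s in enumerate(self.steps, start=1)
        )
```

A chat UI could render `summary()` as a one-line footer under each answer and expand to `details()` when the user asks "why did you do that?".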

The Ethics

In a future where powerful AI agents are used through conversational interfaces, the rights and wrongs of technology become a major software design factor and a serious topic for debate. We need to make sure that AI is ultimately fair and benevolent towards human beings, carefully considering all aspects of our complex existence, both on the level of the individual and of humanity as a whole.

Responsible use of AI is a topic that deserves an article of its own. Many of the ethical concerns, like information accuracy, bias reinforcement, technology misuse, or AI monopoly, are essentially connected with the underlying LLMs and AI agents. Only a few of the considerations directly affect conversational interfaces.

  • Privacy: It's essential to keep conversations private and secure. The misuse of personal information might result in identity theft or financial loss, and could harm the mental health of the users.
  • Inclusivity: Ethical chat interfaces should cater to all demographics. Those with accessibility needs must be able to interact with AI efficiently and enjoyably. Neglecting to create universally accessible UIs will perpetuate digital divides and amplify existing inequalities.
  • Transparency: Users must never be led to believe that they are interacting with a human being while in reality they are having a conversation with AI. Any kind of manipulation or deception risks eroding user trust and can enable misuse.

Privacy, inclusivity, and transparency might not be the only ethical aspects of conversational interfaces that AI developers will need to consider, but they provide a good basis. With the rapid advancement of AI technology, we should expect many mistakes to be made. By including ethical considerations in the design of AI systems, we can hope to decrease the probability of outcomes that are individually fatal or socially disastrous.

The End

We've journeyed from using formal computer languages, through clicking buttons on screens, to having friendly chats with computers. Now, we're entering a world where talking to technology might be as normal as talking to our friends. The future will likely see a blend of traditional and conversational interfaces, tailored to the task at hand, the complexity of the system, and the needs of the user.

The challenge is to make communication in this new paradigm not just helpful but also easy and enjoyable. A future where technology isn't just smart but also kind and inclusive is a good one. We must focus on making sure that as our digital helpers get smarter, they also become our allies.


This article was originally published on Maciej's Medium [medium.com].