
Operator article suggestion

A bot to help answer simple user questions
OVERVIEW

In August 2017, Intercom launched Operator - a bot that helps with simple tasks in business conversations. It asks for contact details when businesses aren’t around, recommends articles to answer simple questions, and asks for feedback after the support team resolves a conversation.

I designed the interaction for the article suggestions feature, and brought the different Operator features together into one holistic system for launch. Within a few weeks of launch, over 1,600 customers had adopted the features, and they helped resolve over 7,000 questions automatically.

TEAM

Mark Ryan - Product Manager
Elizabeth McGuane - Content Strategy Lead
Emma Meehan - Product Researcher
Kostya Gorskiy - Design Manager

TIMELINE

Feb 2017 - Aug 2017

LINK

Intercom Operator

Challenge

When we launched our Help Center product (Educate), we built a feature that suggests articles to support teams to help them answer users’ questions. Sometimes these article suggestions could be helpful to end-users directly, but we didn’t have a way to surface them when a user asked a question through the Messenger.

As a result, end-users who aren’t aware of the Help Center have to wait for a reply from the support team to get even their simple questions answered.

The idea of suggesting articles directly to end-users was descoped from the initial product launch because we weren’t certain about the quality of the suggestions, or how end-users would perceive a bot in business communication.

After the product launch, we decided to refocus on this feature and explore design solutions more exhaustively to ensure it wouldn’t harm the overall experience.

Process


  1. Concept design and research - Design and test an ideal user flow to learn about end-users’ perception of conversational bots
  2. Interaction design - Uncover technical challenges and explore interaction design solutions
  3. Test & iterations - Two rounds of qualitative testing to gather more insights and inform further design decisions

Concept design and research

In the ideal scenario, when someone asks a question, the system would be smart enough to suggest the right answer directly to the users.

Initially, we assumed the technology would magically find the right answer and designed for the best-case scenario: the bot would only suggest one article directly to end-users if it had a high confidence score.

Hypothesis

  • We didn’t know how end-users would react to an automated bot response when they initiated a conversation with a business.
  • We assumed that any “wrong” answer would make the system look dumb and frustrate end-users, so we decided not to show any articles if the confidence score was below a high threshold (a rough sketch of this gating logic follows below).
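
To make that second hypothesis concrete, here is a minimal sketch of the gating logic, assuming a hypothetical ScoredArticle shape and threshold value rather than Intercom’s actual implementation:

    // Illustrative sketch: a suggestion is only shown to the end-user
    // when the single best match clears a high confidence bar.
    interface ScoredArticle {
      title: string;
      url: string;
      confidence: number; // 0..1 score from the suggestion engine (assumed shape)
    }

    const HIGH_CONFIDENCE_THRESHOLD = 0.9; // hypothetical value

    function pickSingleSuggestion(candidates: ScoredArticle[]): ScoredArticle | null {
      const best = candidates.reduce<ScoredArticle | null>(
        (top, article) => (top === null || article.confidence > top.confidence ? article : top),
        null
      );
      // Below the bar, the bot shows nothing rather than risk looking dumb.
      return best && best.confidence >= HIGH_CONFIDENCE_THRESHOLD ? best : null;
    }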

LEARNINGS

We ran concept tests on the design prototype to find out how people perceived this interaction. We found that when the suggested article was the answer they needed, the experience was very positive:

  • Five of the six participants understood that it was an automatic suggestion that came from a system, not from a human.
  • They understood that the article suggestion was an interim solution before a human could respond.
  • They thought receiving an article suggestion was helpful to their experience.

However, shortly after implementing this design direction, we learnt that the strategy suffered from two main problems:

  1. Cold start problem - The suggestion system learns from past conversations. For beta customers who didn’t have many conversations yet, it would take a long time for the system to learn enough to provide highly relevant articles.
  2. Low chance of providing the right answer with a single suggestion - With the confidence threshold set high and only one result displayed, the occasions where the system could actually provide a suggestion were very rare, almost zero.

Second round solution

Informed by these learnings, we decided to adjust our approach:

  • The system would suggest multiple articles instead of a single-shot answer, casting a broader net for success.
  • We’d lower the suggestion confidence threshold so it would show suggestions more often. We wanted to challenge our initial assumption and see if people were actually OK with receiving articles that were less relevant (see the sketch after this list).
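
Continuing the earlier sketch (reusing its hypothetical ScoredArticle type, with numbers of my own choosing), the change amounts to returning the top few results above a lower bar instead of a single high-confidence one:

    // Illustrative: return up to MAX_RESULTS articles above a lower threshold.
    const LOWER_CONFIDENCE_THRESHOLD = 0.5; // hypothetical value
    const MAX_RESULTS = 3;                  // hypothetical value

    function pickMultipleSuggestions(candidates: ScoredArticle[]): ScoredArticle[] {
      return candidates
        .filter(article => article.confidence >= LOWER_CONFIDENCE_THRESHOLD)
        .sort((a, b) => b.confidence - a.confidence)
        .slice(0, MAX_RESULTS);
    }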

Design challenges

  • What would be the best interaction for the system to present multiple article suggestions?
  • Content-wise, how might the bot introduce articles knowing that its accuracy is lower?

Prototype A

Taking inspiration from how Facebook Messenger handles multiple content cards, we could try introducing a carousel design pattern for the job.

Prototype B

Alternatively, we could introduce a stacked cards pattern that expands to reveal multiple suggestions.

Learnings

As much as we thought people would be familiar with the carousel design pattern, results from the user test suggested that people actually preferred the second prototype, because it allowed them to scan all the results quickly.

With regard to the experience of receiving less relevant articles - surprisingly, none of the participants were put off by receiving automatic suggestions that didn’t answer their questions. In fact, they were pretty positive about getting some attempted answers while the team wasn’t available for an immediate response. As long as they understood that their question would still reach the support team, they didn’t mind trying to self-serve while they waited.

Third round solution

These learnings gave us a lot of confidence in our approach, and we doubled down on this route:

  • The system would adopt a “hybrid” engine: when it hasn’t learnt from many past conversations yet, it attempts to extract keywords from the user’s question and suggests results based on a search algorithm. Once it has enough data to use the machine learning algorithm, it favours the results of that engine.
  • On top of that, we introduced an “answerability” filter to screen out conversations that can’t be answered by machines - for example, when someone just sends a message to see if the team is around, reports a bug, or asks a very specific question about their account. With this component, the system only tries to suggest articles in scenarios where they’re appropriate (a rough sketch of this logic follows after this list).
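
Here is a minimal sketch of how those two pieces could fit together, with all names, interfaces, and the fallback behaviour being my own illustration rather than the production system:

    // Illustrative sketch of the "hybrid" engine with an answerability filter.
    type Engine = "machine-learning" | "keyword-search";

    interface Suggestion {
      title: string;
      url: string;
      confidence: number;
    }

    // Assumed interfaces for the answerability filter and the two engines.
    interface SuggestionSources {
      isAnswerable(question: string): boolean;        // screens out "anyone there?", bug reports, account-specific questions
      hasEnoughTrainingData(): boolean;                // enough past conversations to trust the ML model
      mlSuggest(question: string): Suggestion[];       // machine learning engine
      keywordSuggest(question: string): Suggestion[];  // keyword extraction + search fallback
    }

    function suggestArticles(
      question: string,
      sources: SuggestionSources
    ): { engine: Engine; results: Suggestion[] } | null {
      // Only attempt suggestions for questions a machine can plausibly answer.
      if (!sources.isAnswerable(question)) return null;

      // Favour the learned model once it has seen enough conversations;
      // otherwise fall back to keyword search over the Help Center.
      if (sources.hasEnoughTrainingData()) {
        return { engine: "machine-learning", results: sources.mlSuggest(question) };
      }
      return { engine: "keyword-search", results: sources.keywordSuggest(question) };
    }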

In terms of design, knowing that being able to scan all the article suggestions quickly provides a better user experience, I iterated the design to bundle multiple content cards into a single card (similar to how WeChat delivers multiple content cards).

This solution felt much closer to what we wanted to achieve, both in terms of design and technical approach.

Suggestion feedback loop

Providing suggestions isn’t the end state of the interaction. We needed a way for users to tell the system whether their question had been answered by the suggestions, so that:

  • The system could automatically close the conversation for the CS team and let them focus on the ones that need more attention.
  • It would give the system an opportunity to learn and improve its suggestions over time.

Design challenges

  • How might we design the most appropriate interaction for asking if people got their answer?
  • How might we let the system understand which of the multiple suggested articles was the correct answer?

I partnered with our PM to come up with multiple design ideas:

  • Allowing users to rate each individual suggested article card
  • Dismissing irrelevant articles (similar to how you can dismiss a sponsored tweet)
  • Posting a “questionnaire” after users read articles

Having evaluated the pros and cons of each approach, we reached a clear consensus on a design principle - we shouldn’t make users do the work for us.

  • Avoid asking users to select which article helped. It’s not their job to improve our system.
  • Ask for permission to close the conversation without directly exposing the conversation’s “Open/Closed” state. Users don’t need to know the mechanics behind the curtain.

This is the interaction we went with in the end:

After users open, read, and close an article, the bot waits a few seconds (to make sure they’re not about to read another suggestion) and asks a simple question to check whether their question has been answered. The UI then presents Yes/No structured reply options above the reply composer.

If they reply Yes, we count that as a successful suggestion and close the conversation. In the follow-up response, the bot informs them that the conversation has now ended unless they send another message.

If they reply No, or with any free-form message, we assume they’re still seeking support from a human, and show a follow-up response letting them know again when they can expect a reply from the team.
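
As a sketch of that branching, with hypothetical function and type names standing in for the real messaging internals:

    // Illustrative handling of the structured Yes/No reply after reading an article.
    type FollowUpReply = { kind: "yes" } | { kind: "no" } | { kind: "free-form"; text: string };

    interface ConversationActions {
      recordSuccessfulSuggestion(): void; // count towards resolution metrics
      closeConversation(): void;          // close it for the support team
      say(message: string): void;         // bot posts a follow-up message
    }

    function handleFollowUp(reply: FollowUpReply, convo: ConversationActions): void {
      if (reply.kind === "yes") {
        convo.recordSuccessfulSuggestion();
        convo.closeConversation();
        convo.say("Great! I'll close this conversation - just send another message if you need anything else.");
      } else {
        // "No" or any free-form reply means they still want a human,
        // so the conversation stays open and we reset expectations on reply time.
        convo.say("No problem - the team will get back to you as soon as they can.");
      }
    }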

This interaction tested well in qualitative research, and we consistently saw a ~56% response rate.

Behavioural design

Throughout the design process, we (Kostya, Elizabeth, and I) put a lot of thought into how the bot should behave. These details aren’t immediately visible, but they made the product feel more considered and gave it better “manners” as a result.

  • The bot doesn’t introduce itself with a persona at all. We learnt from previous studies that people hate “chatting” with a bot, so it remains a passive actor during the conversation.
  • It won’t interrupt users who are still typing and constructing their question.
  • As soon as a support team member enters the conversation, the bot flows halt.
  • There’s no dead end with the bot. Every interaction with it is optional, and it defaults to a fallback that lets users know when they should expect a reply from the team.
  • If users tried to self-serve before asking a question in the Messenger (e.g. they read a Help Center article), it suppresses article suggestions (these rules are sketched below).
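
These behaviours boil down to a small set of guard conditions. A rough sketch, with the context fields being my own assumptions:

    // Illustrative guard: should the bot offer article suggestions at all?
    interface ConversationContext {
      userIsTyping: boolean;               // user is still composing their question
      teammateHasReplied: boolean;         // a human has already entered the conversation
      userReadHelpCenterRecently: boolean; // they tried to self-serve before messaging
    }

    function shouldSuggestArticles(ctx: ConversationContext): boolean {
      if (ctx.userIsTyping) return false;               // don't interrupt mid-question
      if (ctx.teammateHasReplied) return false;         // a human has taken over; halt bot flows
      if (ctx.userReadHelpCenterRecently) return false; // they already tried self-serve
      return true;                                      // otherwise, it's appropriate to suggest
    }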

Launch 🤖

After bundling the article suggestion feature with the other messaging automation features, we were ready for launch. While the industry was hyped about bots that try to do too much, ours focuses on doing a few things well. It was really exciting to get it out there and see how the market would react to our approach.

Post-launch, we closely monitored the suggestion engagement rate and resolution rate. Overall, we saw a pretty good engagement rate, indicating that the user interactions were working (e.g. a ~26% click-through rate on suggested articles). The resolution rate wasn’t as high as we expected, but it gives us a baseline for improving our suggestion engine.