**Addressing Markdown Rendering Jank and Response Delay in Large Language Model (LLM) Chatbots**
**Selective Markdown Buffering**
Markdown rendering jank and response delay are two common user-experience disruptions in LLM chatbots. Syntax fragments render as raw text until they form a complete Markdown element, producing a jarring visual experience, while multiple LLM roundtrips and consultations of external data sources stretch out response times and frustrate users. Sidekick addresses both problems by buffering Markdown parsing and using an event emitter, preventing rendering jank while still streaming the LLM response in real time.
**Buffering Markdown for Rendering**
Markdown is challenging to render while streaming because certain character sequences remain ambiguous until a closing character arrives. For example, a “*” at the beginning of a line could begin either emphasis or an unordered list item. Sidekick therefore buffers characters whenever it encounters a sequence that could be a Markdown expression, and flushes the buffer either when an unexpected character rules the expression out or when the Markdown element completes. This requires a stateful stream processor that consumes characters one by one, implemented as a Node.js Transform stream.
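Here is a minimal sketch of such a Transform stream, assuming a toy grammar in which only two constructs are ambiguous: `*emphasis*` spans and list items opening with `* `. The real parser must cover the full Markdown syntax, and this version also glosses over multi-byte characters split across chunks.

```typescript
import { Transform, TransformCallback } from "node:stream";

class MarkdownBufferStream extends Transform {
  private buffer = "";

  _transform(chunk: Buffer, _enc: BufferEncoding, done: TransformCallback): void {
    for (const char of chunk.toString("utf8")) {
      if (!this.buffer) {
        // "*" is ambiguous (emphasis or list item), so hold it back;
        // everything else is plain text and streams through immediately.
        if (char === "*") this.buffer = char;
        else this.push(char);
      } else if (this.buffer === "*" && char === " ") {
        this.push("* "); // "* " can only open a list item: flush it
        this.buffer = "";
      } else if (char === "*") {
        this.push(this.buffer + char); // closing "*": emphasis is complete
        this.buffer = "";
      } else if (char === "\n") {
        this.push(this.buffer + char); // unexpected character: it was plain text
        this.buffer = "";
      } else {
        this.buffer += char; // still ambiguous, keep buffering
      }
    }
    done();
  }

  _flush(done: TransformCallback): void {
    if (this.buffer) this.push(this.buffer); // emit any trailing fragment
    done();
  }
}

// Usage (hypothetical): llmTokenStream.pipe(new MarkdownBufferStream()).pipe(res);
```

Plain text still streams character by character, so the user only ever waits on the short ambiguous spans, never on the whole response.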
**Async Content Resolution and Multiplexing**
LLMs provide general human-language understanding but lack up-to-date, accurate information. Sidekick therefore instructs the LLM to consult external tools. A typical integration receives the user’s input, asks the LLM which tools to consult, gathers the tool responses, and has the LLM assemble them into a final answer, and each of those roundtrips adds to the response delay. Sidekick instead breaks tool invocation and output generation out of the main LLM response, so the initial response is sent directly to the user with placeholders that are populated asynchronously, as sketched below.
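This is a hedged sketch of that decoupling; the helper names (`streamMainAnswer`, `invokeTool`) and event names are illustrative stand-ins for the real pipeline, not Sidekick’s actual API. The main answer is emitted token by token while the tool call resolves concurrently.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-ins for the LLM token stream and a tool call.
async function streamMainAnswer(
  input: string,
  onToken: (token: string) => void
): Promise<void> {
  for (const token of ["Here are your ", "top products: ", "[card]"]) {
    onToken(token);
  }
}

async function invokeTool(input: string): Promise<string> {
  return "1. Widget A\n2. Widget B"; // e.g. a product-lookup result
}

export async function respond(
  userInput: string,
  events: EventEmitter
): Promise<void> {
  // Stream the main LLM answer to the user immediately...
  const mainDone = streamMainAnswer(userInput, (token) =>
    events.emit("main", token)
  );

  // ...while the tool roundtrip runs concurrently and fills the placeholder.
  const toolDone = invokeTool(userInput).then((content) =>
    events.emit("card", content)
  );

  await Promise.all([mainDone, toolDone]);
  events.emit("end");
}
```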
**Multiplexing Tool Content**
Rather than making additional requests to populate tool content, Sidekick multiplexes asynchronously resolved tool content into the main response stream. The main LLM response renders directly to the user while tool content renders separately into its placeholder areas. Server-Sent Events make this possible by treating each stream as a series of named events, multiplexing multiple response streams over one connection; the UI then splits the multiplexed response back into its components for rendering.
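A minimal server-side sketch of SSE multiplexing follows; the event names `main` and `card` and the payload shapes are assumptions, not Sidekick’s actual protocol.

```typescript
import { createServer, ServerResponse } from "node:http";

// Write one named Server-Sent Event to the response stream.
function sseWrite(res: ServerResponse, event: string, data: unknown): void {
  res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}

createServer((_req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // The main answer and its placeholder stream first...
  sseWrite(res, "main", { token: "Top products: [loading](card:1)" });

  // ...and the asynchronously resolved card content follows later on the
  // same connection, tagged with its own event name.
  setTimeout(() => {
    sseWrite(res, "card", { id: "1", markdown: "1. Widget A" });
    res.end();
  }, 500);
}).listen(3000);
```

On the client, one `EventSource` listener per event name is enough to split the stream back into its components.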
**Tying It All Together**
Asynchronous multiplexing ties back into the Markdown buffering process. Sidekick uses special Markdown links, called “cards,” to mark content that will be resolved asynchronously. Their URLs use the “card:” protocol, and the link text paraphrases the user’s intent. When the Markdown parser encounters a card link, it buffers the link and kicks off an asynchronous resolution task. The main LLM response and any card content are then multiplexed into a single streamed response, which the UI splits apart and renders.
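This is a hedged sketch of what a buffer flush might do when it encounters a card link. The “card:” protocol comes from the post; the regex, placeholder markup, and helper signatures are illustrative assumptions.

```typescript
// Matches a buffered Markdown link whose URL uses the "card:" protocol,
// e.g. "[top products this week](card:1)". Group 1 is the paraphrased
// intent (the link text); group 2 is the card identifier.
const CARD_LINK = /^\[([^\]]+)\]\(card:([^)]+)\)$/;

function flushBuffer(
  buffered: string,
  emit: (event: string, data: unknown) => void,
  resolveCard: (intent: string) => Promise<string>
): void {
  const match = buffered.match(CARD_LINK);
  if (!match) {
    emit("main", buffered); // ordinary Markdown: pass straight through
    return;
  }

  const [, intent, id] = match;
  // Emit a placeholder into the main stream immediately, then resolve the
  // card in the background and multiplex its content in as a "card" event.
  emit("main", `<card id="${id}">${intent}</card>`);
  resolveCard(intent).then((content) => emit("card", { id, content }));
}
```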
**Conclusion**
Sidekick’s solution addresses both Markdown rendering jank and response delay in LLM chatbots: selective Markdown buffering smooths out rendering, and streaming asynchronously resolved tool content cuts down waiting time. Server-Sent Events and a stateful stream processor keep the whole interaction on a single efficient stream. This approach can serve as inspiration for other developers looking to improve their own AI chatbot experiences.
This post was written by Ateş Göral, a Staff Developer at Shopify working on Sidekick. You can connect with him on Twitter, GitHub, or visit his website at magnetiq.ca.