What you need to know
- TensorRT-LLM is adding OpenAI’s Chat API support for desktops and laptops with RTX GPUs starting at 8GB of VRAM.
- Users can process LLM queries faster and locally without uploading datasets to the cloud.
- NVIDIA pairs this with “Retrieval-Augmented Generation” (RAG), allowing more bespoke LLM use cases.
During Microsoft’s Ignite conference today, NVIDIA announced an update to their TensorRT-LLM, which launched in October. The main announcements today are that the TensorRT-LLM feature is now gaining support for LLM APIs, specifically OpenAI Chat API, which is the most well-known at this point, and also that they have worked to improve performance with TensorRT-LLM to get better performance per token on their GPUs.
There is a tertiary announcement that is quite interesting also. NVIDIA is going to include Retrieval-Augmented Generation with the TensorRT-LLM. This allows an LLM to use an external data source for its knowledge base rather than relying on anything online—a highly demanded feature for AI.
What is TensorRT-LLM?
NVIDIA recently rolled out NVIDIA TensorRT-LLM, an open-source library that allows for local computing of LLMs on NVIDIA hardware. NVIDIA touts this to gain privacy and efficiency when dealing with large datasets or private information. Whether that information is sent through an API like OpenAI’s Chat API is secure. You can learn more about NVIDIA TensorRT-LLM at NVIDIA’s developer site.
The changes announced today to NVIDIA TensorRT-LLM are the addition of OpenAI’s Chat API and performance improvements for previously supported LLMs and AI models like Llama 2 and Stable Diffusion through DirectML enhancements.
This technology and computing can be done locally through NVIDIA’s AI Workbench. This “unified, easy-to-use toolkit allows developers to quickly create, test, and customize pre-trained…
“The adventure of life is to learn. The purpose of life is to grow. The nature of life is to change. The challenge of life is to overcome. The essence of life is to care. The opportunity of like is to serve. The secret of life is to dare. The spice of life is to befriend. The beauty of life is to give.” —William Arthur Ward