AI Platforms / Deployment – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins 2025-02-28T18:11:54Z https://developer.nvidia.com/blog/feed/

Douglas Moore <![CDATA[Accelerate Medical Imaging AI Operations with Databricks Pixels 2.0 and MONAI]]> https://developer.nvidia.com/blog/?p=96530 2025-02-28T18:11:54Z 2025-02-28T18:11:50Z

According to the World Health Organization (WHO), 3.6 billion medical imaging tests are performed every year globally to diagnose, monitor, and treat various conditions. Most of these images are stored in a globally recognized standard called DICOM (Digital Imaging and Communications in Medicine). Imaging studies in DICOM format are a combination of unstructured images and structured metadata.

Anu Srivastava <![CDATA[Latest Multimodal Addition to Microsoft Phi SLMs Trained on NVIDIA GPUs]]> https://developer.nvidia.com/blog/?p=96519 2025-02-28T17:13:38Z 2025-02-26T22:05:00Z

Large language models (LLMs) have permeated every industry and changed the potential of technology. However, their massive size makes them impractical under the resource constraints that many companies face. Small language models (SLMs) bridge quality and cost by offering a smaller resource footprint. SLMs are a subset of language models that tend to…

Charu Chaubal <![CDATA[NVIDIA AI Enterprise Adds Support for NVIDIA H200 NVL]]> https://developer.nvidia.com/blog/?p=96424 2025-02-24T22:37:49Z 2025-02-24T22:37:47Z

NVIDIA AI Enterprise is the cloud-native software platform for the development and deployment of production-grade AI solutions. The latest release of the NVIDIA AI Enterprise infrastructure software collection adds support for the latest NVIDIA data center GPU, NVIDIA H200 NVL, giving your enterprise new options for powering cutting-edge use cases such as agentic and generative AI with some of the…

Sama Bali <![CDATA[Transforming Product Design Workflows in Manufacturing with Generative AI]]> https://developer.nvidia.com/blog/?p=96242 2025-02-21T17:42:04Z 2025-02-20T19:32:11Z

Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often time-consuming and resource-intensive. These conventional methods typically involve stages such as requirement gathering, conceptual design, detailed design, analysis, prototyping, and testing, with each phase dependent on the results of previous…

Ram Cherukuri <![CDATA[Spotlight: BRLi and Toulouse INP Develop AI-Based Flood Models Using NVIDIA Modulus]]> https://developer.nvidia.com/blog/?p=95990 2025-02-20T15:53:17Z 2025-02-13T21:00:00Z

Flooding poses a significant threat to 1.5 billion people, making it the most common cause of major natural disasters. Floods cause up to $25 billion in global economic damage every year. Flood forecasting is a critical tool in disaster preparedness and risk mitigation. Numerical methods that provide accurate simulations of river basins have long been available. With these, engineers such as those…

Emily Potyraj <![CDATA[NVIDIA DGX Cloud Introduces Ready-To-Use Templates to Benchmark AI Platform Performance]]> https://developer.nvidia.com/blog/?p=95558 2025-02-20T15:54:23Z 2025-02-11T17:00:00Z

In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make…

Pranav Marathe <![CDATA[Just Released: Tripy, a Python Programming Model For TensorRT]]> https://developer.nvidia.com/blog/?p=95947 2025-02-10T17:08:43Z 2025-02-10T17:08:40Z

Experience high-performance inference, usability, intuitive APIs, easy debugging with eager mode, clear error messages, and more.

Isabel Hulseman <![CDATA[New NVIDIA AI Blueprint: Build a Customizable RAG Pipeline]]> https://developer.nvidia.com/blog/?p=95614 2025-02-13T20:44:16Z 2025-01-30T22:26:12Z

Connect AI applications to enterprise data using embedding and reranking models for information retrieval.

Martin Cimmino <![CDATA[Continued Pretraining of State-of-the-Art LLMs for Sovereign AI and Regulated Industries with iGenius and NVIDIA DGX Cloud]]> https://developer.nvidia.com/blog/?p=95012 2025-01-23T19:54:22Z 2025-01-16T12:00:00Z

In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and summarization. However, despite their advanced capabilities, foundation models have limitations when it comes to domain-specific expertise, such as finance or healthcare, and to capturing cultural and language nuances beyond English.

Sama Bali <![CDATA[GPU Memory Essentials for AI Performance]]> https://developer.nvidia.com/blog/?p=94979 2025-01-23T19:54:24Z 2025-01-15T16:00:00Z

Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging sophisticated, autonomous reasoning and iterative planning, AI agents can tackle complex, multistep problems with remarkable efficiency. As AI continues to revolutionize industries, the demand for running AI models locally has surged.

Dror Goldenberg <![CDATA[Powering the Next Wave of DPU-Accelerated Cloud Infrastructures with NVIDIA DOCA Platform Framework]]> https://developer.nvidia.com/blog/?p=94889 2025-01-23T19:54:26Z 2025-01-13T17:30:25Z

Organizations are increasingly turning to accelerated computing to meet the demands of generative AI, 5G telecommunications, and sovereign clouds. NVIDIA has unveiled the DOCA Platform Framework (DPF), providing foundational building blocks to unlock the power of NVIDIA BlueField DPUs and optimize GPU-accelerated computing platforms. Serving as both an orchestration framework and an implementation…

Zeeshan Patel <![CDATA[Accelerate Custom Video Foundation Model Pipelines with New NVIDIA NeMo Framework Capabilities]]> https://developer.nvidia.com/blog/?p=94541 2025-02-04T19:34:45Z 2025-01-07T16:00:00Z

Generative AI has evolved from text-based models to multimodal models, with a recent expansion into video, opening up new potential uses across various industries. Video models can create new experiences for users or simulate scenarios for training autonomous agents at scale. They are helping revolutionize various industries including robotics, autonomous vehicles, and entertainment.

Charu Chaubal <![CDATA[New Whitepaper: NVIDIA AI Enterprise Security]]> https://developer.nvidia.com/blog/?p=94475 2024-12-20T20:56:54Z 2024-12-20T00:41:33Z

This white paper details our commitment to securing the NVIDIA AI Enterprise software stack. It outlines the processes and measures NVIDIA takes to ensure container security.

Michelle Horton <![CDATA[Top Posts of 2024 Highlight NVIDIA NIM, LLM Breakthroughs, and Data Science Optimization]]> https://developer.nvidia.com/blog/?p=93566 2024-12-16T18:34:16Z 2024-12-16T18:34:14Z

2024 was another landmark year for developers, researchers, and innovators working with NVIDIA technologies. From groundbreaking developments in AI inference to empowering open-source contributions, these blog posts highlight the breakthroughs that resonated most with our readers.

NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale

Introduced in…

Michelle Horton <![CDATA[Time-Lapse AI Model Enhances IVF Embryo Selection]]> https://developer.nvidia.com/blog/?p=93767 2024-12-18T16:38:55Z 2024-12-12T17:29:22Z

Researchers from Weill Cornell Medicine have developed an AI-powered model that could help couples undergoing in vitro fertilization (IVF) and guide embryologists in selecting healthy embryos for implantation. Recently published in Nature Communications, the study presents the Blastocyst Evaluation Learning Algorithm (BELA). This state-of-the-art deep learning model evaluates embryo quality and…

Amr Elmeleegy <![CDATA[Spotlight: Perplexity AI Serves 400 Million Search Queries a Month Using NVIDIA Inference Stack]]> https://developer.nvidia.com/blog/?p=93396 2025-02-10T17:25:21Z 2024-12-05T17:58:43Z

The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with provisioning the necessary hardware and software to meet that demand while simultaneously balancing cost efficiency with optimal user experience. This challenge was faced by the inference team at Perplexity AI, an AI-powered search engine that…

Michelle Horton <![CDATA[How AI is Making Climate Modeling Faster, Greener, and More Accurate]]> https://developer.nvidia.com/blog/?p=93000 2024-12-12T19:35:22Z 2024-12-04T18:00:00Z

Christopher Bretherton, Senior Director of Climate Modeling at the Allen Institute for AI (AI2), highlights how AI is revolutionizing climate science. In this NVIDIA GTC 2024 session, Bretherton presents advancements in machine learning-based emulators for predicting regional climate changes and precipitation extremes. These tools accelerate climate modeling, making it faster, more efficient…

Vega Shah <![CDATA[In-Silico Antibody Development with AlphaBind Using NVIDIA BioNeMo and AWS HealthOmics]]> https://developer.nvidia.com/blog/?p=92757 2024-12-12T19:38:30Z 2024-12-03T18:00:00Z

Antibodies have become the most prevalent class of therapeutics, primarily due to their ability to target specific antigens, enabling them to treat a wide range of diseases, from cancer to autoimmune disorders. Their specificity reduces the likelihood of off-target effects, making them safer and often more effective than small-molecule drugs for complex conditions. As a result…

Carl (Izzy) Putterman <![CDATA[TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x]]> https://developer.nvidia.com/blog/?p=92847 2025-01-11T17:32:51Z 2024-12-02T23:09:43Z

NVIDIA TensorRT-LLM support for speculative decoding now provides a speedup of over 3x in total token throughput. TensorRT-LLM is an open-source library that provides blazing-fast inference support for numerous popular large language models (LLMs) on NVIDIA GPUs. By adding support for speculative decoding on single GPU and single-node multi-GPU, the library further expands its supported…

Amr Elmeleegy <![CDATA[NVIDIA TensorRT-LLM Multiblock Attention Boosts Throughput by More Than 3x for Long Sequence Lengths on NVIDIA HGX H200]]> https://developer.nvidia.com/blog/?p=92591 2024-12-12T19:47:20Z 2024-11-22T00:53:18Z

Generative AI models are advancing rapidly. Every generation of models comes with a larger number of parameters and longer context windows. The Llama 2 series of models, introduced in July 2023, had a context length of 4K tokens, and the Llama 3.1 models, introduced only a year later, dramatically expanded that to 128K tokens. While long context lengths allow models to perform cognitive tasks…

Bethann Noble <![CDATA[Deploying Fine-Tuned AI Models with NVIDIA NIM]]> https://developer.nvidia.com/blog/?p=91696 2024-12-17T00:07:21Z 2024-11-21T22:04:57Z

For organizations adapting AI foundation models with domain-specific data, the ability to rapidly create and deploy fine-tuned models is key to efficiently delivering value with enterprise generative AI applications. NVIDIA NIM offers prebuilt, performance-optimized inference microservices for the latest AI foundation models, including seamless deployment of models customized using parameter…

Amr Elmeleegy <![CDATA[5x Faster Time to First Token with NVIDIA TensorRT-LLM KV Cache Early Reuse]]> https://developer.nvidia.com/blog/?p=91625 2024-11-14T17:10:41Z 2024-11-08T23:55:43Z

In our previous blog post, we demonstrated how reusing the key-value (KV) cache by offloading it to CPU memory can accelerate time to first token (TTFT) by up to 14x on x86-based NVIDIA H100 Tensor Core GPUs and 28x on the NVIDIA GH200 Superchip. In this post, we shed light on KV cache reuse techniques and best practices that can drive even further TTFT speedups. LLMs are rapidly…

Anton Korzh <![CDATA[3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot]]> https://developer.nvidia.com/blog/?p=91412 2024-11-14T17:10:52Z 2024-11-01T22:00:36Z

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input sequence lengths differ with each request – poses unique challenges. To achieve low-latency inference in these environments, multi-GPU setups are a must – irrespective of the GPU generation or its memory capacity. To enhance inference performance in…

Charu Chaubal <![CDATA[Enhanced Security and Streamlined Deployment of AI Agents with NVIDIA AI Enterprise]]> https://developer.nvidia.com/blog/?p=90647 2024-11-27T18:39:53Z 2024-10-29T16:00:00Z

AI agents are emerging as the newest way for organizations to increase efficiency, improve productivity, and accelerate innovation. These agents are more advanced than prior AI applications, with the ability to autonomously reason through tasks, call out to other tools, and incorporate both enterprise data and employee knowledge to produce valuable business outcomes. They’re being embedded into…
