Mehran Maghoumi – NVIDIA Technical Blog

Build an AI Agent with Expert Reasoning Capabilities Using the DeepSeek-R1 NIM

2025-02-28T20:23:54Z

AI agents are transforming business operations by automating processes, optimizing decision-making, and streamlining actions. Their effectiveness hinges on expert reasoning, enabling smarter planning and efficient execution. Agentic AI applications could benefit from the capabilities of models such as DeepSeek-R1. Built for solving problems that require advanced AI reasoning…

Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator

2024-10-18T20:11:21Z

Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate greater capabilities in domain-specific tasks compared to their off-the-shelf open or commercial counterparts. Recently, NVIDIA published a paper about ChipNeMo, a family of foundation models that are geared toward industrial chip design…

Curating Custom Datasets for LLM Parameter-Efficient Fine-Tuning with NVIDIA NeMo Curator

2024-10-18T20:13:02Z

Curating Custom Datasets for LLM Training with NVIDIA NeMo Curator

2024-10-18T20:14:38Z

Data curation is the first, and arguably the most important, step in the pretraining and continuous training of large language models (LLMs) and small language models (SLMs). NVIDIA recently announced the open-source release of NVIDIA NeMo Curator, a data curation framework that prepares large-scale, high-quality datasets for pretraining generative AI models. NeMo Curator, which is part of…

Scale and Curate High-Quality Datasets for LLM Training with NVIDIA NeMo Curator

2025-02-17T05:28:15Z

Enterprises are using large language models (LLMs) as powerful tools to improve operational efficiency and drive innovation. NVIDIA NeMo microservices aim to make building and deploying models more accessible to enterprises. An important step for building any LLM system is to curate the dataset of tokens to be used for training or customizing the model. However, curating a suitable dataset…
