AI agents are transforming business operations by automating processes, optimizing decision-making, and streamlining actions. Their effectiveness hinges on expert-level reasoning, which enables smarter planning and more efficient execution. Agentic AI applications could benefit from the capabilities of models such as DeepSeek-R1, built for solving problems that require advanced AI reasoning…
Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate greater capabilities in domain-specific tasks compared to their off-the-shelf open or commercial counterparts. Recently, NVIDIA published a paper about ChipNeMo, a family of foundation models that are geared toward industrial chip design…
Data curation is the first, and arguably the most important, step in the pretraining and continuous training of large language models (LLMs) and small language models (SLMs). NVIDIA recently announced the open-source release of NVIDIA NeMo Curator, a data curation framework that prepares large-scale, high-quality datasets for pretraining generative AI models. NeMo Curator, which is part of…
Enterprises are using large language models (LLMs) as powerful tools to improve operational efficiency and drive innovation. NVIDIA NeMo microservices aim to make building and deploying models more accessible to enterprises. An important step for building any LLM system is to curate the dataset of tokens to be used for training or customizing the model. However, curating a suitable dataset…