What are the hardware requirements and best practices for running local LLMs (AI) for advanced coding and statistical analysis tasks (e.g., R monographs, complex Python pipelines) on Bluefin, and how do factors like model size, VRAM, RAM, and context window affect performance?
Type: Answer
Status: Published
Created: Feb 17, 2026 by Dosu Bot
Updated: Feb 17, 2026 by Dosu Bot

For advanced coding and statistical analysis tasks such as writing R monographs or building complex Python pipelines (e.g., with Splink), you should target large language models (LLMs) of at least 30-32B parameters. Larger models such as Qwen2.5-Coder:32b, Nemotron 30B, and Qwen3-Coder-30B perform significantly better on complex, agentic coding tasks than smaller models (7B-14B), which have weaker reasoning ability and need far more explicit instructions.
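As a rough sanity check on why 30B-class models demand the hardware below, a model's weight footprint can be estimated from its parameter count and quantization bit-width. This is a back-of-envelope sketch; the ~4.5 bits/weight for Q4-style quantization and the 1.2x overhead multiplier are illustrative assumptions, not exact figures:

```python
def model_size_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM/disk footprint of a quantized model.

    params_b: parameter count in billions.
    bits_per_weight: effective bits per weight (e.g. ~4.5 for Q4_K_M).
    overhead: fudge factor for embeddings, buffers, etc. (assumed).
    """
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 32B model at ~4.5 bits/weight lands near 21-22 GB before the
# KV cache -- which is why 24GB of VRAM is the comfortable floor.
print(round(model_size_gb(32, 4.5), 1))  # -> 21.6
```

The same arithmetic shows why a 7B model (~4-5 GB at Q4) fits comfortably on a 16GB card while a 32B model does not leave room for a large context.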

Hardware Requirements:

Component | Minimum              | Recommended
RAM       | 64GB                 | 96-128GB
VRAM      | 16GB (split CPU/GPU) | 24GB+ (full model in VRAM)
Storage   | Fast NVMe            | Fast NVMe
  • GPU: NVIDIA RTX 4090 (24GB VRAM) is ideal, as it can fit a 30B Q4 model entirely in VRAM, maximizing performance. RTX 4080 Super (16GB) is workable with some CPU offloading. AMD RX 7900 XTX (24GB) is a good value if you are comfortable with Vulkan setup.
  • Context Window: For large codebases and long documents, set the context window to at least 16K tokens, preferably 32K or more. Some users configure --ctx-size 131072 for serious coding work.
  • Performance: Models that fit entirely in GPU VRAM perform much better than those that require CPU/GPU memory swapping. Expect 50-60 tokens/second for 30B models on high-end hardware.
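The context window costs memory on top of the weights, because the KV cache grows linearly with context length. A rough sketch of an fp16 KV cache, using assumed architecture numbers for a 30B-class model with grouped-query attention (64 layers, 8 KV heads, head dimension 128; check the actual model card for your model):

```python
def kv_cache_gb(ctx_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Rough KV-cache size: two tensors (K and V) per layer, each
    n_kv_heads * head_dim elements per token, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# Assumed architecture: 64 layers, 8 KV heads, head_dim 128.
for ctx in (16_384, 32_768, 131_072):
    print(ctx, round(kv_cache_gb(ctx, 64, 8, 128), 1))
# 16K  -> ~4.3 GB
# 32K  -> ~8.6 GB
# 128K -> ~34.4 GB
```

Under these assumptions, a 32K context adds several gigabytes on top of the weights, and a 128K context (`--ctx-size 131072`) can exceed the VRAM of a single 24GB card, forcing KV-cache quantization or CPU offload.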

Laptop vs Desktop:

  • Laptops like the Framework 13 (AMD Ryzen AI 9 370, 64GB RAM, iGPU) can handle 7-14B models for lightweight tasks, but for 30B models with large context windows, a desktop with a discrete GPU is strongly recommended.
  • NPU acceleration is not yet supported for LLM inference on Linux, even if the hardware and drivers are present.
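The laptop/desktop gap follows largely from memory bandwidth: generating each token reads every weight once, so bandwidth divided by model size gives a back-of-envelope upper bound on decode speed. The bandwidth and model-size figures below are illustrative assumptions, not measurements:

```python
def tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed: each generated token
    streams the full set of weights through memory once."""
    return bandwidth_gbs / model_gb

# RTX 4090 (~1008 GB/s GDDR6X) vs. a dual-channel DDR5 laptop
# (~90 GB/s), against a 30B Q4 model of roughly 18 GB (assumed):
print(round(tokens_per_sec(1008, 18)))  # -> 56 tok/s
print(round(tokens_per_sec(90, 18)))    # -> 5 tok/s
```

This simple bound lines up with the 50-60 tokens/second observed for 30B models on high-end GPUs, and explains why the same model on a bandwidth-limited laptop iGPU is an order of magnitude slower.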

Summary:
For best results in advanced coding and statistical analysis, use a desktop workstation with 64-128GB RAM and a discrete GPU with at least 24GB VRAM (e.g., RTX 4090), and configure your LLM with a large context window. This setup will allow you to run 30-32B parameter models efficiently, enabling robust assistance for complex tasks.