What are the hardware requirements and best practices for running local LLMs (AI) for advanced coding and statistical analysis tasks (e.g., R monographs, complex Python pipelines) on Bluefin, and how do factors like model size, VRAM, RAM, and context window affect performance?
Type: Answer
Status: Published
Created: Feb 17, 2026 by Dosu Bot
Updated: Feb 17, 2026 by Dosu Bot

For advanced coding and statistical analysis tasks such as writing R monographs or building complex Python pipelines (e.g., with Splink), you should target large language models (LLMs) of at least 30-32B parameters. Larger models such as Qwen2.5-Coder:32b, Nemotron 30B, and Qwen3-Coder-30B perform significantly better on complex, agentic coding tasks than smaller models (7B-14B), which have weaker reasoning ability and need far more explicit instructions.
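As a rough sanity check on why 30B-class models demand the hardware below, a model's weight footprint can be estimated from its parameter count and quantization bit-width. This is a back-of-envelope sketch; the ~4.5 bits/weight for Q4-style quantization and the 1.2x overhead multiplier are illustrative assumptions, not exact figures:

```python
def model_size_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM/disk footprint of a quantized model.

    params_b: parameter count in billions.
    bits_per_weight: effective bits per weight (e.g. ~4.5 for Q4_K_M).
    overhead: fudge factor for embeddings, buffers, etc. (assumed).
    """
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 32B model at ~4.5 bits/weight lands near 21-22 GB before the
# KV cache -- which is why 24GB of VRAM is the comfortable floor.
print(round(model_size_gb(32, 4.5), 1))  # -> 21.6
```

The same arithmetic shows why a 7B model (~4-5 GB at Q4) fits comfortably on a 16GB card while a 32B model does not leave room for a large context.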

Hardware Requirements:

Component | Minimum              | Recommended
RAM       | 64GB                 | 96-128GB
VRAM      | 16GB (split CPU/GPU) | 24GB+ (full model in VRAM)
Storage   | Fast NVMe            | Fast NVMe
  • GPU: NVIDIA RTX 4090 (24GB VRAM) is ideal, as it can fit a 30B Q4 model entirely in VRAM, maximizing performance. RTX 4080 Super (16GB) is workable with some CPU offloading. AMD RX 7900 XTX (24GB) is a good value if you are comfortable with Vulkan setup.
  • Context Window: For large codebases and long documents, set the context window to at least 16K tokens, preferably 32K or more. Some users configure --ctx-size 131072 for serious coding work.
  • Performance: Models that fit entirely in GPU VRAM perform much better than those that require CPU/GPU memory swapping. Expect 50-60 tokens/second for 30B models on high-end hardware.
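The context window costs memory on top of the weights, because the KV cache grows linearly with context length. A rough sketch of an fp16 KV cache, using assumed architecture numbers for a 30B-class model with grouped-query attention (64 layers, 8 KV heads, head dimension 128; check the actual model card for your model):

```python
def kv_cache_gb(ctx_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Rough KV-cache size: two tensors (K and V) per layer, each
    n_kv_heads * head_dim elements per token, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# Assumed architecture: 64 layers, 8 KV heads, head_dim 128.
for ctx in (16_384, 32_768, 131_072):
    print(ctx, round(kv_cache_gb(ctx, 64, 8, 128), 1))
# 16K  -> ~4.3 GB
# 32K  -> ~8.6 GB
# 128K -> ~34.4 GB
```

Under these assumptions, a 32K context adds several gigabytes on top of the weights, and a 128K context (`--ctx-size 131072`) can exceed the VRAM of a single 24GB card, forcing KV-cache quantization or CPU offload.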

Laptop vs Desktop:

  • Laptops like the Framework 13 (AMD Ryzen AI 9 370, 64GB RAM, iGPU) can handle 7-14B models for lightweight tasks, but for 30B models with large context windows, a desktop with a discrete GPU is strongly recommended.
  • NPU acceleration is not yet supported for LLM inference on Linux, even if the hardware and drivers are present.
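The laptop/desktop gap follows largely from memory bandwidth: generating each token reads every weight once, so bandwidth divided by model size gives a back-of-envelope upper bound on decode speed. The bandwidth and model-size figures below are illustrative assumptions, not measurements:

```python
def tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed: each generated token
    streams the full set of weights through memory once."""
    return bandwidth_gbs / model_gb

# RTX 4090 (~1008 GB/s GDDR6X) vs. a dual-channel DDR5 laptop
# (~90 GB/s), against a 30B Q4 model of roughly 18 GB (assumed):
print(round(tokens_per_sec(1008, 18)))  # -> 56 tok/s
print(round(tokens_per_sec(90, 18)))    # -> 5 tok/s
```

This simple bound lines up with the 50-60 tokens/second observed for 30B models on high-end GPUs, and explains why the same model on a bandwidth-limited laptop iGPU is an order of magnitude slower.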

Summary:
For best results in advanced coding and statistical analysis, use a desktop workstation with 64-128GB RAM and a discrete GPU with at least 24GB VRAM (e.g., RTX 4090), and configure your LLM with a large context window. This setup will allow you to run 30-32B parameter models efficiently, enabling robust assistance for complex tasks.