A guide to running models such as Llama 3 and Mistral directly on your VPS with Ollama or vLLM, and exposing them to your applications through an internal API instead of paying for expensive hosted cloud services.
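
To make the "internal API" idea concrete, here is a minimal sketch of an application calling a model served by Ollama on the same VPS. It assumes Ollama is running on its default local address (`127.0.0.1:11434`) and that a model tagged `llama3` has already been pulled; the helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port on this VPS.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_request(prompt, model="llama3", url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,     # assumed to be pulled already (e.g. `ollama pull llama3`)
        "prompt": prompt,
        "stream": False,    # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    req = build_request(prompt, model, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]


if __name__ == "__main__":
    # Requires a running Ollama server; fails with a connection error otherwise.
    print(generate("Say hello in one sentence."))
```

Because the server is bound to the loopback address in this sketch, only processes on the VPS itself can reach it. vLLM takes a different approach and serves an OpenAI-compatible HTTP API, so a client for it would target a different URL and payload shape.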