< AI Infrastructure / >

End-to-End AI Infrastructure: Deploy Local LLMs Remotely


A complete walkthrough for running large language models locally with Ollama and serving them to remote clients

I took a detour down the rabbit hole of local LLMs (worth it) and learned how to run models locally, serve responses to a GUI instead of the default terminal, tunnel via ngrok, and expose the backend to a public client.

It definitely felt like drinking from a firehose, and I leaned on AI heavily during the process, since there were so many moving pieces I wasn't familiar with.

Some of the more interesting things I learned along the way include:

  • Running LLMs locally with Ollama
  • Setting model hyperparameters
  • Chunking model responses
  • Model quantization
  • ngrok tunnelling

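The hyperparameter and quantization points above both surface in the request you send to Ollama's local API: sampling options ride along in an `options` object, and quantization is selected by the model tag you pull. A minimal sketch, where the model tag and option values are illustrative, not prescriptive:

```python
# Sketch of an Ollama /api/generate request payload. The "q4_0" suffix in
# the model tag selects a 4-bit quantized build; the tag and option values
# here are illustrative examples, not recommendations.
def build_generate_payload(prompt: str, stream: bool = True) -> dict:
    return {
        "model": "llama3:8b-instruct-q4_0",  # hypothetical quantized tag
        "prompt": prompt,
        "stream": stream,  # True -> response arrives as JSON chunks
        "options": {
            "temperature": 0.7,  # sampling randomness
            "top_p": 0.9,        # nucleus-sampling cutoff
            "num_ctx": 4096,     # context window size in tokens
        },
    }

payload = build_generate_payload("Why is the sky blue?")
print(payload["model"])
```

POSTing this JSON to `http://localhost:11434/api/generate` on a running Ollama server returns the completion.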
If you're interested in doing this too, I've linked a 3-part walkthrough below. You can also check out the repo for some helpful documentation on how to choose the right model, install dependencies, configure your server, deploy your app, and manage model responses.


Part 1: Run LLMs Locally with Ollama CLI
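The CLI side boils down to a few commands; this assumes Ollama is already installed and the model tag is illustrative:

```shell
# Download model weights (tag is an example; pick one sized for your hardware)
ollama pull llama3

# Start the local API server on port 11434, if it isn't already running
ollama serve

# One-off prompt straight from the terminal
ollama run llama3 "Why is the sky blue?"
```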


Part 2: Handling the Raw Byte Stream from the Ollama API Endpoint
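When streaming is enabled, Ollama's `/api/generate` endpoint emits one JSON object per line, each carrying a `response` text fragment and a final object flagged `"done": true`. A minimal sketch of reassembling those chunks, using simulated byte lines in place of a live connection:

```python
# Reassemble the text fragments from Ollama-style newline-delimited JSON.
import json

def collect_stream(lines: list[bytes]) -> str:
    """Concatenate "response" fragments until a chunk reports done."""
    text = []
    for raw in lines:
        chunk = json.loads(raw.decode("utf-8"))
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Simulated raw chunks, shaped like Ollama's streaming output:
sample = [
    b'{"response": "Hello", "done": false}',
    b'{"response": ", world", "done": false}',
    b'{"response": "!", "done": true}',
]
print(collect_stream(sample))  # -> Hello, world!
```

Against a live server you would iterate over the HTTP response body line by line instead of a list, but the per-chunk parsing is the same.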


Part 3: Exposing Your Local API for Remote Access w/ ngrok
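Once `ngrok http 11434` is running, it prints a public URL that forwards to the local Ollama server; a remote client then talks to that URL exactly as it would to localhost. A sketch of the client side, where the ngrok hostname is a placeholder you'd replace with the one ngrok prints:

```python
# Sketch of a remote client for the tunnelled Ollama API.
import json
import urllib.request

NGROK_URL = "https://example.ngrok-free.app"  # placeholder; use your tunnel's URL

def build_request(prompt: str) -> urllib.request.Request:
    body = json.dumps(
        {"model": "llama3", "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{NGROK_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Ping from a remote client")
print(req.get_full_url())
```

With a live tunnel, `urllib.request.urlopen(req)` returns the JSON body, whose `response` field holds the completion.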