Modal’s cloud GPU infrastructure makes running your own LLMs easy!

  1. Install the dependencies your LLM inference code needs
  2. Attach a GPU to your Function with gpu="h100"
  3. Run your LLM inference code inside the Function (a minimal sketch follows below)
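
Here is what those three steps can look like in a single file. This is a minimal sketch assuming a small Hugging Face model served with the transformers library; the app name, model choice, and function names are illustrative, not the demo's actual source.

```python
# inference.py -- an illustrative sketch, not the demo's actual source
import modal

app = modal.App("llm-inference-demo")

# Step 1: install the dependencies your inference code needs into the container image.
# (transformers + torch + accelerate is one assumed minimal stack.)
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")


# Step 2: attach a GPU to the Function with gpu="h100".
@app.function(image=image, gpu="h100")
def generate(prompt: str) -> str:
    # Step 3: run LLM inference inside the Function.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model choice
        device="cuda",
    )
    return pipe(prompt, max_new_tokens=200)[0]["generated_text"]


@app.local_entrypoint()
def main():
    # `modal run inference.py` calls this locally; generate() runs on an H100 in Modal's cloud.
    print(generate.remote("Explain what this Modal app does."))
```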

This demo shows an LLM explaining the demo's own code.

Click Run to see it in action!

If you want to see what a more sophisticated LLM inference server on Modal looks like, check out this example.

You can also explore our gallery of other examples for inference, training, batch jobs, sandboxed code execution, and more!


To run the demo from your own terminal:

  modal run inference.py
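
The modal run command runs the app ephemerally: it builds the container image if needed, starts a GPU-backed container for the Function in Modal's cloud, and streams the output back to your terminal.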
