January 21, 2025 · 5 minute read
How to run Nomic Embed V1.5 on Modal
Yiren Lu (@YirenLu), Solutions Engineer

What is Nomic Embed V1.5?

Nomic Embed V1.5 is a powerful text embedding model that consistently ranks near the top of the MTEB embedding model leaderboard. The model excels at converting text into dense vector representations, making it particularly effective for semantic search, document clustering, and retrieval-augmented generation (RAG) applications.
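
To get a feel for what the model does, here is a minimal local sketch (assuming the sentence-transformers and einops packages are installed; the Modal deployment version appears later in this post). One detail worth knowing up front: Nomic Embed expects a task prefix such as search_query: or search_document: on every input.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Nomic Embed requires a task prefix on every input: "search_query: "
# for queries, "search_document: " for the corpus being searched.
query = model.encode("search_query: What is dimensionality reduction?")
docs = model.encode([
    "search_document: TSNE is a dimensionality reduction algorithm",
    "search_document: The Eiffel Tower is located in Paris",
])

print(util.cos_sim(query, docs))  # the TSNE document scores higher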

What is Modal?

Modal is a cloud platform that provides the fastest and easiest way to access GPUs for running inference on embedding models like Nomic Embed V1.5. Running inference on a GPU is essential for embedding models because it significantly accelerates the processing of large volumes of text, enabling real-time applications. For more information on how to get started, visit the Modal documentation.

Performance considerations

The model delivers fast inference times on Modal’s H100 GPUs, typically processing text in milliseconds. For production deployments, consider implementing a caching layer for frequently embedded text to optimize costs and reduce latency. The model’s output vectors are suitable for direct use in vector databases like Pinecone or Weaviate.
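
The caching layer can be as simple as keying on a hash of the input text. The sketch below is illustrative only: embed_fn stands in for whatever embedding call you use (such as the Modal method defined later in this post), and in production you would likely back the dictionary with Redis or a similar store.

import hashlib

_cache = {}  # hash of text -> embedding vector

def embed_with_cache(texts, embed_fn):
    # Only call the (expensive) embedding function for unseen texts.
    keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
    misses = [t for t, k in zip(texts, keys) if k not in _cache]
    if misses:
        for text, vector in zip(misses, embed_fn(misses)):
            _cache[hashlib.sha256(text.encode()).hexdigest()] = vector
    return [_cache[k] for k in keys]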

Example code for running the Nomic Embed V1.5 embedding model on Modal

To run the following code, you will need to:

  1. Create an account at modal.com
  2. Run pip install modal to install the modal Python package
  3. Run modal setup to authenticate (if this doesn’t work, try python -m modal setup)
  4. Copy the code below into a file called app.py
  5. Run modal run app.py

import modal

MODEL_ID = "nomic-ai/nomic-embed-text-v1.5"
MODEL_REVISION = "d802ae16c9caed4d197895d27c6d529434cd8c6d"

image = modal.Image.debian_slim().pip_install(
    "torch", "sentence-transformers", "einops"
)
app = modal.App("example-base-nomic-embed", image=image)

GPU_CONFIG = "H100"

CACHE_DIR = "/cache"
cache_vol = modal.Volume.from_name("hf-hub-cache", create_if_missing=True)

@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},  # persist model weights across containers
    allow_concurrent_inputs=15,  # serve up to 15 inputs concurrently per container
    container_idle_timeout=60 * 10,  # keep idle containers warm for 10 minutes
    timeout=60 * 60,  # allow up to an hour per input
)
class Model:
    @modal.enter()
    def setup(self):
        from sentence_transformers import SentenceTransformer

        self.model = SentenceTransformer(
            MODEL_ID, revision=MODEL_REVISION, cache_folder=CACHE_DIR, trust_remote_code=True
        )

    @modal.method()
    def embed(self, sentences: list[str]):
        # Returns a (len(sentences), embedding_dim) numpy array.
        return self.model.encode(sentences)


# Run the model
@app.local_entrypoint()
def main():
    sentences = [
        "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten"
    ]

    print(Model().embed.remote(sentences))
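
Note that modal run app.py executes the local entrypoint once and then exits. If you instead deploy the app with modal deploy app.py, the embedding class stays callable from any Python process. A sketch, assuming the app and class names used above (modal.Cls.from_name looks up a deployed class by app name and class name):

import modal

# Assumes `modal deploy app.py` has been run with the app above.
Model = modal.Cls.from_name("example-base-nomic-embed", "Model")
embeddings = Model().embed.remote(
    ["search_document: Modal runs Python code in the cloud"]
)
print(embeddings.shape)  # e.g. (1, 768)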
