January 21, 2025 · 5 minute read
How to run Nomic Embed V1.5 on Modal
Yiren Lu (@YirenLu), Solutions Engineer

What is Nomic Embed V1.5?

Nomic Embed V1.5 is a powerful text embedding model that consistently ranks near the top of the MTEB embedding model leaderboard. The model excels at converting text into dense vector representations, making it particularly effective for semantic search, document clustering, and retrieval-augmented generation (RAG) applications.
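
To get a feel for what the model does, here is a minimal local sketch (assuming the sentence-transformers and einops packages are installed; the Modal deployment version appears later in this post). One detail worth knowing up front: Nomic Embed expects a task prefix such as search_query: or search_document: on every input.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Nomic Embed requires a task prefix on every input: "search_query: "
# for queries, "search_document: " for the corpus being searched.
query = model.encode("search_query: What is dimensionality reduction?")
docs = model.encode([
    "search_document: TSNE is a dimensionality reduction algorithm",
    "search_document: The Eiffel Tower is located in Paris",
])

print(util.cos_sim(query, docs))  # the TSNE document scores higher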

What is Modal?

Modal is a cloud platform that provides the fastest and easiest way to access GPUs for running inference on embedding models like Nomic Embed V1.5. Running inference on a GPU is essential for embedding models because it significantly accelerates the processing of large volumes of text, enabling real-time applications. For more information on how to get started, visit the Modal documentation.

Performance considerations

The model delivers fast inference times on Modal’s H100 GPUs, typically processing text in milliseconds. For production deployments, consider implementing a caching layer for frequently embedded text to optimize costs and reduce latency. The model’s output vectors are suitable for direct use in vector databases like Pinecone or Weaviate.
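
The caching layer can be as simple as keying on a hash of the input text. The sketch below is illustrative only: embed_fn stands in for whatever embedding call you use (such as the Modal method defined later in this post), and in production you would likely back the dictionary with Redis or a similar store.

import hashlib

_cache = {}  # hash of text -> embedding vector

def embed_with_cache(texts, embed_fn):
    # Only call the (expensive) embedding function for unseen texts.
    keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
    misses = [t for t, k in zip(texts, keys) if k not in _cache]
    if misses:
        for text, vector in zip(misses, embed_fn(misses)):
            _cache[hashlib.sha256(text.encode()).hexdigest()] = vector
    return [_cache[k] for k in keys]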

Example code for running the Nomic Embed V1.5 embedding model on Modal

To run the following code, you will need to:

  1. Create an account at modal.com
  2. Run pip install modal to install the modal Python package
  3. Run modal setup to authenticate (if this doesn’t work, try python -m modal setup)
  4. Copy the code below into a file called app.py
  5. Run modal run app.py

import modal

MODEL_ID = "nomic-ai/nomic-embed-text-v1.5"
MODEL_REVISION = "d802ae16c9caed4d197895d27c6d529434cd8c6d"

image = modal.Image.debian_slim().pip_install(
    "torch", "sentence-transformers", "einops"
)
app = modal.App("example-base-nomic-embed", image=image)

GPU_CONFIG = "H100"

CACHE_DIR = "/cache"
cache_vol = modal.Volume.from_name("hf-hub-cache", create_if_missing=True)

@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},  # persist model weights across containers
    allow_concurrent_inputs=15,  # serve up to 15 inputs concurrently per container
    container_idle_timeout=60 * 10,  # keep idle containers warm for 10 minutes
    timeout=60 * 60,  # allow up to an hour per input
)
class Model:
    @modal.enter()
    def setup(self):
        from sentence_transformers import SentenceTransformer

        self.model = SentenceTransformer(
            MODEL_ID, revision=MODEL_REVISION, cache_folder=CACHE_DIR, trust_remote_code=True
        )

    @modal.method()
    def embed(self, sentences: list[str]):
        # Returns a (len(sentences), embedding_dim) numpy array.
        return self.model.encode(sentences)


# Run the model
@app.local_entrypoint()
def main():
    sentences = [
        "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten"
    ]

    print(Model().embed.remote(sentences))
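
Note that modal run app.py executes the local entrypoint once and then exits. If you instead deploy the app with modal deploy app.py, the embedding class stays callable from any Python process. A sketch, assuming the app and class names used above (modal.Cls.from_name looks up a deployed class by app name and class name):

import modal

# Assumes `modal deploy app.py` has been run with the app above.
Model = modal.Cls.from_name("example-base-nomic-embed", "Model")
embeddings = Model().embed.remote(
    ["search_document: Modal runs Python code in the cloud"]
)
print(embeddings.shape)  # e.g. (1, 768)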
