
## What is Nomic Embed V1.5?
Nomic Embed V1.5 is a powerful text embedding model that consistently ranks near the top of the MTEB embedding model leaderboard. The model excels at converting text into dense vector representations, making it particularly effective for semantic search, document clustering, and retrieval-augmented generation (RAG) applications.
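Concretely, the model maps each input string to a fixed-length dense vector. A minimal local sketch, assuming `sentence-transformers` and `einops` are installed (the sample sentence is illustrative):

```python
from sentence_transformers import SentenceTransformer

# Load the model locally (downloads weights from the Hugging Face Hub).
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# "search_document:" is the task prefix Nomic Embed expects for corpus text.
vec = model.encode("search_document: Embeddings map text to vectors.")
print(vec.shape)  # (768,) -- one dense 768-dimensional vector
```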
## What is Modal?
Modal is a cloud platform that provides the fastest and easiest way to access GPUs for running inference on embedding models like Nomic Embed V1.5. Running inference on a GPU is essential for embedding models because it significantly accelerates the processing of large volumes of text, enabling real-time applications. For more information on how to get started, visit the Modal documentation.
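For a sense of how little code that takes, here is a minimal, self-contained Modal function that requests a GPU. This sketch is our own illustration, separate from the embedding example below; the app name and the device check are hypothetical:

```python
import modal

app = modal.App("gpu-hello")

@app.function(gpu="H100", image=modal.Image.debian_slim().pip_install("torch"))
def check_gpu():
    import torch
    # Runs in the cloud, on an H100, when called via .remote().
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    print(check_gpu.remote())  # e.g. "NVIDIA H100 80GB HBM3"
```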
## Performance considerations
The model delivers fast inference times on Modal’s H100 GPUs, typically processing text in milliseconds. For production deployments, consider implementing a caching layer for frequently embedded text to optimize costs and reduce latency. The model’s output vectors are suitable for direct use in vector databases like Pinecone or Weaviate.
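One way to implement such a cache is to memoize embeddings keyed by a hash of the input text, so repeated inputs never hit the GPU twice. A minimal in-process sketch (the `CachedEmbedder` class and `embed_fn` parameter are our own illustration; a production deployment would more likely use a shared store such as Redis):

```python
import hashlib

import numpy as np


class CachedEmbedder:
    """Wraps an embedding function with an in-memory cache keyed by text hash."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # e.g. Model().embed.remote from the example below
        self.cache: dict[str, np.ndarray] = {}

    def embed(self, sentences: list[str]) -> list[np.ndarray]:
        # Only send cache misses to the model.
        misses = [s for s in sentences if self._key(s) not in self.cache]
        if misses:
            for text, vec in zip(misses, self.embed_fn(misses)):
                self.cache[self._key(text)] = vec
        return [self.cache[self._key(s)] for s in sentences]

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()
```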
## Example code for running the Nomic Embed V1.5 embedding model on Modal
To run the following code, you will need to:

- Create an account at modal.com
- Run `pip install modal` to install the modal Python package
- Run `modal setup` to authenticate (if this doesn't work, try `python -m modal setup`)
- Copy the code below into a file called `app.py`
- Run `modal run app.py`
```python
import modal

MODEL_ID = "nomic-ai/nomic-embed-text-v1.5"
MODEL_REVISION = "d802ae16c9caed4d197895d27c6d529434cd8c6d"

# Container image with the model's runtime dependencies.
image = modal.Image.debian_slim().pip_install(
    "torch", "sentence-transformers", "einops"
)

app = modal.App("example-base-nomic-embed", image=image)

GPU_CONFIG = "H100"
CACHE_DIR = "/cache"

# Persistent volume so model weights are downloaded once, not on every cold start.
cache_vol = modal.Volume.from_name("hf-hub-cache", create_if_missing=True)


@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},
    allow_concurrent_inputs=15,
    container_idle_timeout=60 * 10,
    timeout=60 * 60,
)
class Model:
    @modal.enter()
    def setup(self):
        from sentence_transformers import SentenceTransformer

        # Load the model once per container, at startup, rather than per request.
        self.model = SentenceTransformer(
            MODEL_ID,
            revision=MODEL_REVISION,
            cache_folder=CACHE_DIR,  # SentenceTransformer takes cache_folder, not cache_dir
            trust_remote_code=True,
        )

    @modal.method()
    def embed(self, sentences: list):
        # Returns one dense vector per input sentence as a NumPy array.
        return self.model.encode(sentences)


# Run the model
@app.local_entrypoint()
def main():
    # Nomic Embed expects a task prefix; "search_document:" marks corpus text.
    sentences = [
        "search_document: TSNE is a dimensionality reduction algorithm created by Laurens van der Maaten"
    ]
    print(Model().embed.remote(sentences))
```
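The returned vectors are ordinary NumPy rows, so a query can be scored against documents with plain cosine similarity. A hypothetical second entrypoint you could append to `app.py` and invoke with `modal run app.py::search` (the documents and query are illustrative; note the `search_query:` prefix Nomic Embed expects on the query side):

```python
import numpy as np


@app.local_entrypoint()
def search():
    docs = [
        "search_document: TSNE is a dimensionality reduction algorithm.",
        "search_document: Modal runs Python code on cloud GPUs.",
    ]
    query = ["search_query: what reduces the dimensionality of data?"]

    model = Model()
    doc_vecs = np.array(model.embed.remote(docs))
    query_vec = np.array(model.embed.remote(query))[0]

    # Cosine similarity: dot product of L2-normalized vectors.
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = doc_vecs @ query_vec
    print(sorted(zip(scores, docs), reverse=True)[0])  # best-matching document
```

Normalizing both sides first makes the dot product equal to cosine similarity, the metric most vector databases default to.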
## Additional resources
- Nomic AI Documentation - Official documentation and best practices
- MTEB Leaderboard - Benchmark comparisons
- Vector Database Guide - Understanding vector storage
- RAG Architecture Patterns - Implementation strategies