Troubleshooting
“Command not found” errors
If you installed Modal but see an error like `modal: command not found` when trying to run the CLI, the directory where Python installs package executables ("binaries") is not on your system `PATH`. This is a common problem, and you need to reconfigure your system's environment variables to fix it.
One workaround is to run `python -m modal` instead of `modal`. However, this is only a patch. There is no single solution to the problem, because Python installs packages in different locations depending on your environment. See this popular StackOverflow question for pointers on how to resolve your system path issue.
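To find out which directory is missing, you can ask the Python interpreter you installed Modal with where it puts console scripts and compare that against `PATH`. The sketch below uses only the standard library (`sysconfig`, `os`); the helper names are hypothetical, and it assumes Modal was installed with `pip` into that interpreter. Other installers such as pipx or conda may place executables elsewhere.

```python
import os
import sysconfig
from typing import Optional


def candidate_script_dirs() -> list[str]:
    """Directories where pip commonly installs console scripts such as `modal`."""
    # Scripts directory of this interpreter's default install scheme
    # (for example, a virtualenv's bin/ folder).
    dirs = [sysconfig.get_path("scripts")]
    # `pip install --user` puts scripts in a separate user-level directory.
    if hasattr(sysconfig, "get_preferred_scheme"):  # Python 3.10+
        user_dir = sysconfig.get_path("scripts", sysconfig.get_preferred_scheme("user"))
        if user_dir not in dirs:
            dirs.append(user_dir)
    return dirs


def dirs_missing_from_path(dirs: list[str], path_value: Optional[str] = None) -> list[str]:
    """Return the entries of `dirs` that do not appear on PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    on_path = {
        os.path.normcase(os.path.abspath(p))
        for p in path_value.split(os.pathsep)
        if p
    }
    return [d for d in dirs if os.path.normcase(os.path.abspath(d)) not in on_path]


if __name__ == "__main__":
    for d in dirs_missing_from_path(candidate_script_dirs()):
        print(f"Not on PATH: {d}")
```

Run it with the same interpreter you used for `pip install modal` (for example `python check_path.py`). Any directory it prints is a candidate to add to `PATH` in your shell profile.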
Function side effects
The same container can be reused for multiple invocations of the same Function within an App. This means that if your Function has side effects, such as modifying files on disk, those side effects may or may not be present on subsequent calls to that Function. Don't rely on them being present, but make sure they can't cause problems if they are.
For example, if you create a disk-backed database using sqlite3:
```python
import sqlite3

import modal

app = modal.App()


@app.function()
def db_op():
    db = sqlite3.connect("db_file.sqlite3")
    db.execute("CREATE TABLE example (col_1 TEXT)")
    ...
```

This function can (but will not necessarily) fail on the second invocation with an `OperationalError: table example already exists` error.
To get around this, either clean up your side effects (e.g. delete the db file at the end of your function call above) or make your functions take them into account (e.g. add an `if os.path.exists("db_file.sqlite3")` condition or randomize the filename above). Alternatively, you can set `single_use_containers=True` so that every Function call spins up a new container; note, however, that this results in higher cost and worse latency, since every invocation requires a cold start.
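As a sketch of the clean-up approach, the body of `db_op` can remove its database file in a `finally` block, so a reused container starts from the same state as a fresh one. The `@app.function()` decorator is left out here so the example runs locally with the standard library only, and the `INSERT`/`SELECT` statements are illustrative additions.

```python
import os
import sqlite3

DB_PATH = "db_file.sqlite3"


def db_op() -> int:
    # In a Modal App this would be decorated with @app.function();
    # the decorator is omitted so the sketch runs locally.
    db = sqlite3.connect(DB_PATH)
    try:
        db.execute("CREATE TABLE example (col_1 TEXT)")
        db.execute("INSERT INTO example VALUES (?)", ("hello",))
        db.commit()
        (count,) = db.execute("SELECT COUNT(*) FROM example").fetchone()
        return count
    finally:
        db.close()
        # Clean up the side effect so the next call on a reused container
        # does not find an existing table.
        if os.path.exists(DB_PATH):
            os.remove(DB_PATH)
```

Calling `db_op()` twice in the same process returns `1` both times. Without the `finally` block, the second call would raise `sqlite3.OperationalError`. Clean-up code does not run if the container is killed mid-call, so an existence check at the start of the function is a useful complement.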
Heartbeat timeout
The Modal client in modal.Function containers runs a heartbeat loop that the host uses to healthcheck the container’s main process.
If the container stops heartbeating for a long period (minutes), the container will be terminated due to a heartbeat timeout, which is displayed in logs.
Container heartbeat timeouts are rare, and they are typically caused by one of two application-level sources:
- The Global Interpreter Lock (GIL) is held for a long time, preventing the heartbeat thread from making progress. py-spy can detect GIL holding; we include py-spy automatically in `modal shell` for convenience. A quick fix for GIL holding is to run the code that holds the GIL in a subprocess.
- The container process initiates shutdown, intentionally stopping the heartbeats, but does not complete the shutdown.
In both cases turning on debug logging will help diagnose the issue.
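A minimal sketch of the subprocess workaround, using only the standard library: a child process has its own interpreter and its own GIL, and the parent releases its GIL while waiting on the child, so the parent's background threads keep running. The `heartbeat` thread below is a hypothetical stand-in for a background loop, not Modal's actual heartbeat implementation, and `GIL_HEAVY_SNIPPET` is a placeholder for your own GIL-holding code (for example, a C-extension call that never releases the GIL).

```python
import subprocess
import sys
import threading

# Placeholder for GIL-heavy work; replace with your own code.
GIL_HEAVY_SNIPPET = "print(sum(i * i for i in range(1_000_000)))"


def run_in_subprocess(snippet: str) -> str:
    # Run the snippet in a separate Python interpreter and return its stdout.
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def run_gil_heavy_work_with_heartbeat() -> tuple[str, int]:
    stop = threading.Event()
    beats = []

    def heartbeat() -> None:
        # Hypothetical stand-in for a background heartbeat loop: tick every 50 ms.
        while not stop.wait(0.05):
            beats.append(1)

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        result = run_in_subprocess(GIL_HEAVY_SNIPPET)
    finally:
        stop.set()
        t.join()
    return result, len(beats)
```

If the GIL-heavy call ran in the parent process instead, a thread like `heartbeat` would stall for as long as the lock is held.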
413 Content Too Large errors
If you receive a 413 Content Too Large error, you might be exceeding our gRPC payload size limit, which is currently 100 MB.
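One rough way to catch this before making a call is to measure the pickled size of the arguments you are about to send. This sketch assumes that the pickled size approximates the payload size; Modal's actual serialization and limit accounting may differ. `PAYLOAD_LIMIT_BYTES` and the helper names are hypothetical, and "100 MB" is read conservatively as 100,000,000 bytes.

```python
import pickle

# Assumption: "100 MB" interpreted as 100,000,000 bytes (the smaller reading).
PAYLOAD_LIMIT_BYTES = 100_000_000


def payload_size(obj) -> int:
    """Approximate payload size as the length of the pickled object."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def check_payload(obj, limit: int = PAYLOAD_LIMIT_BYTES) -> int:
    """Return the approximate size, or raise if it is over `limit`."""
    size = payload_size(obj)
    if size > limit:
        raise ValueError(
            f"payload is about {size / 1e6:.1f} MB, over the {limit / 1e6:.0f} MB limit"
        )
    return size
```

For example, `check_payload(my_args)` before `my_function.remote(my_args)` fails fast locally instead of surfacing a 413 from the server.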
Outdated kernel version (4.4.0)
Our secure runtime reports a misleadingly old kernel version, 4.4.0. Certain software libraries detect this and report a warning. These warnings can be ignored, because the runtime actually implements Linux kernel features from version 5.15 and later.
If the outdated kernel version report causes errors in your application, please contact us on our Slack.
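To confirm that a warning comes from this reported version, you can inspect what libraries see: Python's `platform.release()` returns the kernel release string from `uname`. The parser below is a hypothetical helper for illustration, not part of Modal.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Parse a release string like '5.15.0-91-generic' into (major, minor, patch)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__":
    release = platform.release()
    print(f"Reported kernel release: {release} -> {parse_kernel_release(release)}")
```

Inside a Modal container this is expected to report `4.4.0`, even though kernel features from 5.15 and later are available.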
CUDA driver initialization failed on L4 GPU type
Certain L4 instance types within Modal’s fleet have a flaky issue in the NVIDIA driver which causes the following CUDA context initialization error:
```text
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
```

A workaround to ensure reliable container startup is given below:
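```python
import modal

app = modal.App()


# Example placement (hypothetical class name): add the hook to your own GPU class.
@app.cls(gpu="L4")
class Model:
    @modal.enter()
    def warmup_cuda(self):
        import ctypes
        import time

        import modal.experimental

        # Load the CUDA driver API directly and retry cuInit until it succeeds.
        cu = ctypes.CDLL("libcuda.so.1")
        max_retries = 10
        retry_delay_secs = 0.5
        for attempt in range(max_retries):
            rc = cu.cuInit(0)
            if rc == 0:  # CUDA_SUCCESS
                break
            if attempt < max_retries - 1:
                print(f"cuInit failed on attempt {attempt + 1}/{max_retries} with code {rc}, retrying...")
                time.sleep(retry_delay_secs)
            else:
                # Give up and stop this container from fetching further inputs.
                print(f"CUDA initialization failed after {max_retries} attempts; stopping container")
                modal.experimental.stop_fetching_inputs()
```

We are investigating a fix for the root cause of this problem.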