Troubleshooting
Some problems are harder to diagnose because they deal with Elixir releases or Docker build problems. Typically, you don’t run the application that way locally, so you only encounter those problems when it’s time to deploy.
Here are a few tips to help diagnose and identify problems.
- Run
mix release
locally on your project. - Build the Dockerfile locally to verify it builds correctly.
docker build .
- Check the
:prod
config inconfig/runtime.exs
, which is not used locally. Carefully review it. - Run
fly logs
to check server logs.
For diagnosing database app issues, refer to the Postgres Monitoring information.
Here’s a quick hit list of commands to help:
- Run
fly logs -a <pg-db-name>
to check database app’s server logs. - Run
fly checks list -a <pg-db-name>
to check the database app’s health. - Run
fly status -a <pg-db-name> --all
to see if any VMs failed. - Run
fly vm status <id> -a <pg-db-name>
to debug a specific VM.
Diagnosis Tip
Most difficulties center around application config. Applications generated with an older version of Phoenix are configured differently than a newly generated app. If you have problems like connecting to your database, usually an IPv6 configuration update is needed.
The internal networks at Fly.io use a IPv6 addresses. Elixir/OTP needs some config to work smoothly.
One way to identify an issue is to generate a new Elixir application using a current version of Phoenix. Deploy that to Fly.io with a database. With that, you have a local working example to compare against. Don’t worry, you can easily destroy
the test app when you’re ready to.
Suggested files to pay attention to when looking for config differences.
config/config.exs
config/prod.exs
config/runtime.exs
Dockerfile
mix.exs
Not Enough Connections
A common failure mode is the application exhausting the number of free connections, your default fly.toml
has the following settings:
[services.concurrency]
hard_limit = 50
soft_limit = 25
type = "connections"
Setting the hard_limit
and soft_limit
closer to your needs will free up the number of live connections per node. A safe starting point could be 1000 for the hardlimit and 975 for the softlimit. The “right” amount depends on how much data is actively stored in the LiveView processes. That value will vary for each application.
Clustering
Here are some troubleshooting tips when working with Clustering.
When using IEx to remote into a running application to diagnose connection issues, note if you see this warning when connecting:
warning: the --remsh option will be ignored because IEx is running on limited shell
When that warning is present, the Elixir node we are connecting to is not the remote running node. A new node was launched when the remote shell request was ignored. Refer to the docs here about the --pty
option.
Another indication of this situation is when typing Node.self()
, the name of the node is returned with a prefix similar to “rem-ea46-”.
Example:
:"rem-ea46-hello_elixir@fdaa:0:1da8:a7b:115:5641:7e85:2"
A working remote shell will return a node name more like this:
:"hello_elixir@fdaa:0:1da8:a7b:115:5641:7e85:2"
Testing Node Connectivity
Using libcluster
is an easy way to auto-cluster an Elixir application. However, going through the process manually can help diagnose issues.
We can open two terminals locally on our machine. In terminal A, we get an IEx terminal to one node. Then in terminal B, we get an IEx terminal to a different node.
In each terminal, we can ask the node for it’s name:
Node.self
Then, taking the response node in terminal A, we can explicitly try to connect through terminal B (on the other node).
That command might look like this:
Node.connect(:"result-from-terminal-A@ipvs-address")
If the result is true
then it either connected to the other node or we entered into the wrong terminal and it says it’s connected to itself.
If we received a true
response, we can check the list of connected nodes using this command:
Node.list()
An empty list means is has no connections.
During the process, make note of any logged messages that might help explain why the two nodes can’t connect.