We’re Fly.io. We run apps for our users on hardware we host around the world. Fly.io happens to be a great place to run Phoenix applications. Check out how to get started!
Time to face reality. Sending files over the internet via a web browser is not a solved problem. There have been so many bytes spilled on the varied ways of accepting, uploading, streaming, verifying, storing, backing up, and sharing files from a user. It’s 2023, and I realize I’m not getting a flying car, but you’d hope it was as simple as:
<input type="file">
Unfortunately, everyone knows that’s only the very tip of the iceberg, and you can quickly ‘titanic’ the whole project by not taking this one seriously.
That’s not to say there aren’t many third-party solutions to make this easier; by all means, please tweet at me about them to increase my engagement. Often the very best solution is “submit the form to s3 and forget about it”, which is totally not annoying to configure while attempting to keep spend low. If you are reading this as a “how to do uploads” post, then stop here and consider doing just that.
The second it’s time to step outside the happy path and do anything more complex, we end up with abominations: state machines, job queues, SQS, Lambdas, ImageMagick, FFmpeg, Python, CSVs, Excel, and the list goes on, often including a handful of third-party services because we’ve frankly lost our nerve.
Again, once you’ve built the abomination, you can often be “done” with it and move on. So if that’s you, please stop reading and tell me about it on social media.
But what if there was a better way?
Enter LiveView Uploads
With LiveView, one of the very early and ambitious efforts for the team was to try and get this one right. Since the entire page lifecycle for a LiveView happens over a WebSocket, we had to think really hard and make some engineering trade-offs. Today, if you want to do a basic upload to the local file system, LiveView has your back. Call allow_upload/3 in your mount callback:
allow_upload(socket, :avatar, accept: ~w(.jpg .jpeg))
in this case only allowing a single file with the extension .jpg or .jpeg.
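For context, here’s a minimal sketch of where that call usually sits; the :uploaded_files assign is just an illustrative example, not something the upload machinery requires:

def mount(_params, _session, socket) do
  {:ok,
   socket
   |> assign(:uploaded_files, [])
   |> allow_upload(:avatar, accept: ~w(.jpg .jpeg))}
end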
Then add live_file_input/1 to your form:
<.live_file_input upload={@uploads.avatar} />
and finally, in your form submit handler, call consume_uploaded_entries/3:
consume_uploaded_entries(socket, :avatar, fn %{path: path}, _entry ->
  dest = Path.join([:code.priv_dir(:my_app), "static", "uploads", Path.basename(path)])
  File.cp!(path, dest)
  {:ok, ~p"/uploads/#{Path.basename(dest)}"}
end)
It’s all kinda hand-wavy and magic, because, well, it is! We’re hiding an incredible amount of complexity here, and in the end you get a complete file path that you can move to wherever is best for you! AND when the webpage closes, the “temporary” file gets cleaned up! We even get real-time upload progress, drag and drop, and the ability to handle multiple uploads at once, for free.
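To make that concrete, here’s a rough sketch of what the surrounding form and progress rendering can look like; the "validate" and "save" event names are placeholders, and note that live uploads expect the form to have a phx-change binding:

<form id="upload-form" phx-change="validate" phx-submit="save">
  <.live_file_input upload={@uploads.avatar} />
  <%= for entry <- @uploads.avatar.entries do %>
    <%!-- entry.progress is a percentage LiveView keeps updated as chunks arrive --%>
    <progress value={entry.progress} max="100"><%= entry.progress %>%</progress>
  <% end %>
  <button type="submit">Upload</button>
</form>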
Now the grizzled file-upload veteran is already shaking their head, ready to point out some serious downsides with this as a base method:
- What if the file is extremely large?
- What if we want to handle the file as a stream of bytes and process in real time?
- What if I want to send this file to many places? Now I need to copy it?
- What if that file is not cropped correctly?
- It’s still going to want to go to s3…
- and so on
Until just a month ago, the answer to all of these questions was to upload directly to s3 and figure the rest out later. Which, as alluded to above, is a totally rock-solid solution.
UploadWriter: Chunk-by-Chunk Processing & Multi-Destination File Uploads
Enter the LiveView.UploadWriter which, when configured in allow_upload/3 under the :writer option, can handle file uploads in chunks, effectively streaming the file to the backend and giving you full control over how the file is consumed. In fact, the default writer is the UploadTmpFileWriter, which generates a random tmp file, opens it for writing, writes the chunks as they arrive, and closes the file at the end.
Let’s make our own UploadWriter that sends the file to two places!
defmodule DoubleWriter do
  @behaviour Phoenix.LiveView.UploadWriter

  @impl true
  def init(_opts) do
    file_name = Utils.random_filename(".jpg")

    # Open a local tmp file and kick off an S3 multipart upload; both become our state.
    with {:ok, path} <- Plug.Upload.random_file("local_file"),
         {:ok, file} <- File.open(path, [:binary, :write]),
         s3_op <- ExAws.S3.initiate_multipart_upload("bucket", file_name) do
      {:ok,
       %{path: path, file: file, key: file_name, chunk: 1, parts: [],
         s3_op: s3_op, s3_config: ExAws.Config.new(s3_op.service)}}
    end
  end

  @impl true
  def meta(state) do
    %{local_path: state.path, key: state.key}
  end

  @impl true
  def write_chunk(data, state) do
    # Write the chunk to the local file, then push the same chunk to S3, keeping the
    # returned part references around so we can complete the multipart upload later.
    case IO.binwrite(state.file, data) do
      :ok ->
        part = ExAws.S3.Upload.upload_chunk!({data, state.chunk}, state.s3_op, state.s3_config)
        {:ok, %{state | chunk: state.chunk + 1, parts: [part | state.parts]}}

      {:error, reason} ->
        {:error, reason, state}
    end
  end

  @impl true
  def close(state, _reason) do
    # Close the local file and finish the S3 upload, surfacing whichever error pops up.
    case {File.close(state.file), ExAws.S3.Upload.complete(state.parts, state.s3_op, state.s3_config)} do
      {:ok, {:ok, _}} -> {:ok, state}
      {{:error, reason}, _} -> {:error, reason}
      {_, {:error, reason}} -> {:error, reason}
    end
  end
end
Thanks to s3 being a bear, there is maybe too much going on in here, but let’s walk through it:
def init(_opts) do
Initializes the writer. In our case we create an empty tmp file and start an S3 multipart upload, and that becomes our initial state.

def meta(state) do
Returns the metadata from state that we might use on the front end.

def write_chunk(data, state) do
Gives you the chunk to write to the file and to s3; we also store some state for cleanup.

def close(state, _reason) do
Called when the file is done uploading, or errors for any reason. In our case we don’t handle the error; we close the file, complete the s3 upload, and return any errors that might pop up.
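If you’re wondering where that meta ends up: with the default writer you’ve already seen it, since the %{path: path} in the earlier consume_uploaded_entries example is the UploadTmpFileWriter’s meta. With DoubleWriter, the consume callback should receive our map instead; a sketch, assuming the state above:

consume_uploaded_entries(socket, :avatar, fn %{local_path: local_path, key: key}, _entry ->
  # local_path is the tmp copy on disk; key is the object name we streamed to s3.
  {:ok, %{local_path: local_path, s3_key: key}}
end)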
To use DoubleWriter, update our allow_upload from earlier:
allow_upload(socket, :avatar,
  accept: ~w(.jpg .jpeg),
  writer: fn _name, _entry, _socket -> {DoubleWriter, []} end
)
We could have passed the name/entry to our writer to give it a more friendly name or to do more detailed checks in our init.
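For example, here’s a hedged sketch of forwarding the entry’s client-supplied file name into the writer’s opts so init/1 could use it instead of generating a random one; the :name key is our own convention, not anything LiveView prescribes:

allow_upload(socket, :avatar,
  accept: ~w(.jpg .jpeg),
  writer: fn _name, entry, _socket ->
    # entry.client_name is the file name the browser reported for this upload
    {DoubleWriter, name: entry.client_name}
  end
)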
Taking a step back, what this gets us is fully streamed, chunk-by-chunk uploads to wherever we want them. In our case it would be better to upload the file directly to s3 from the client and save on ingress/egress costs, but there are use cases where you might prefer this approach for security and compliance reasons; for example, maybe you don’t want to pre-sign an upload and send it to the client.
You could also imagine processing a CSV as it comes in: building up lines from chunks as they arrive, creating UX around the header line, and then filling in a table in real time. Say we know each line of a CSV is 100 bytes; we could align our chunks to 100 bytes and process them line by line as they arrive (see the sketch after this list of ideas).
Or real-time thumbnail generation.
Or scanning a file for a specific value and then canceling the upload once you have the data you need. The list goes on.
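Here’s a rough sketch of that CSV idea. Rather than fixed-size chunks, it buffers on newlines and hands each complete line to an on_line callback passed in through the writer opts; the callback and option name are hypothetical, not part of LiveView:

defmodule CSVLineWriter do
  @behaviour Phoenix.LiveView.UploadWriter

  @impl true
  def init(opts) do
    # Buffer incoming bytes and emit complete lines as they become available.
    {:ok, %{buffer: "", on_line: Keyword.fetch!(opts, :on_line)}}
  end

  @impl true
  def meta(state), do: %{buffered_bytes: byte_size(state.buffer)}

  @impl true
  def write_chunk(data, state) do
    {lines, rest} = split_lines(state.buffer <> data)
    Enum.each(lines, state.on_line)
    {:ok, %{state | buffer: rest}}
  end

  @impl true
  def close(state, _reason) do
    # Flush whatever is left in the buffer as the final (possibly partial) line.
    if state.buffer != "", do: state.on_line.(state.buffer)
    {:ok, state}
  end

  defp split_lines(binary) do
    {complete, [rest]} = binary |> String.split("\n") |> Enum.split(-1)
    {complete, rest}
  end
end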
Wrap up
While this is no flying car, nor as clean as a single file input, it is cleaner than building all of this infrastructure from scratch. Using these primitives we can process files of nearly any size, receiving them, processing them, and sending them on as quickly as they arrive.
And please share with the Phoenix team what you build with this!