3.4 Upload Files In Servlets

Unlike previous sections, this section focuses on a single problem in web development: how do we upload files in servlets?

Uploading files is a common requirement. For example, you might upload an avatar when signing up for a social networking website; when adding a book to a bookselling website, the admin may be asked to upload the book's cover; and users can upload videos to streaming platforms such as YouTube and BiliBili.

Front-end

To upload files, the <form> needs a few noteworthy settings:

<form action="upload" method="post" enctype="multipart/form-data">
    <label for="file">Choose a file: </label>
    <input type="file" name="file" id="file">
    <input type="submit" value="Upload">
</form>
  • The method attribute must be POST.
  • The enctype attribute must be multipart/form-data.
  • There must be an input element with type="file" for choosing a file; its name attribute (here, file) is what the server uses to look up the upload.

The first and third requirements are intuitive, so only the second one needs explaining. First, consider a related question: what is the default value of enctype? As the name suggests, enctype stands for encoding type, and HTML forms support three encodings: 1) application/x-www-form-urlencoded (the default); 2) multipart/form-data; 3) text/plain[1].

[!NOTE] TL;DR The enctype attribute must be multipart/form-data in <form> if you would like to upload files.

To satisfy your curiosity, let's see what this encoding actually does with a small test:

  • Try entering "Chen=" in utf-8.html of ch3/request, and inspect the payload in the Network tab (for POST) or the URL (for GET). Interestingly, you will find that the data sent is Chen%3D rather than Chen=.
  • Try entering "Zhongpu Chen" in utf8.html of ch3/request. Similarly, you will find that the data sent is Zhongpu+Chen rather than Zhongpu Chen[2].
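The browser's behavior in both tests can be reproduced in Java with java.net.URLEncoder, which converts a string to the application/x-www-form-urlencoded format. This is only a sketch for experimenting (the class name UrlEncodingDemo is hypothetical, and the Charset overload of encode needs Java 10 or later); it is not how the browser itself is implemented.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodingDemo {
    // Encodes a form value the way application/x-www-form-urlencoded does:
    // reserved characters such as '=' become %XX escapes and spaces become '+'.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("Chen="));        // prints Chen%3D
        System.out.println(encode("Zhongpu Chen")); // prints Zhongpu+Chen
    }
}
```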

This is caused by URL encoding. For example, = separates a parameter's name from its value, so a literal = inside a value must be encoded (escaped) as %3D to avoid confusion. Uploading files likewise needs a special encoding. With URL encoding, parameters are separated by the ampersand &, whereas multipart/form-data inserts a special boundary string between parts, so that the server can tell where each part, whether text or binary data, begins and ends[3].
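To see what a boundary looks like, the sketch below assembles a simplified multipart/form-data body by hand for a form with one file field named file. The boundary string ExampleBoundary123, the file name hello.txt, and the class MultipartSketch are hypothetical; a real browser picks a random boundary and sends the file's raw bytes, which may be binary rather than text.

```java
public class MultipartSketch {
    // Builds a simplified multipart/form-data body containing a single file part.
    // The boundary must not occur inside any part's content; the browser announces
    // it in the request's Content-Type header so the server knows where to split.
    static String body(String boundary, String fieldName, String fileName,
                       String contentType, String content) {
        String crlf = "\r\n";
        return "--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"" + crlf
                + "Content-Type: " + contentType + crlf
                + crlf                             // blank line separates headers from content
                + content + crlf
                + "--" + boundary + "--" + crlf;   // closing delimiter ends the body
    }

    public static void main(String[] args) {
        String boundary = "ExampleBoundary123"; // hypothetical; real boundaries are random
        System.out.println("Content-Type: multipart/form-data; boundary=" + boundary);
        System.out.println();
        System.out.print(body(boundary, "file", "hello.txt", "text/plain", "Hello, servlet!"));
    }
}
```

If the form had more fields, each would get its own "--" + boundary section before the closing delimiter.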

Back-end

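Before Servlet 3.0, receiving uploaded files took a fair amount of code, typically with the help of a third-party library such as Apache Commons FileUpload. Since Servlet 3.0, we can annotate the servlet with @MultipartConfig, which tells the container to parse multipart/form-data requests for us[4].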

@MultipartConfig
public class UploadServlet extends HttpServlet {
    // doPost(...) reads the uploaded file via request.getPart("file")
}

An uploaded file is not a plain string, so getParameter() does not help here. Instead, HttpServletRequest provides a new method:

Part getPart(String name) Gets the named Part or null if the Part does not exist.

Here, javax.servlet.http.Part represents a part or form item received within a multipart/form-data POST request, so anything we want to do with the uploaded file relies on Part's API. A common case is simply storing the uploaded file in a folder on the server, and the rest of this section illustrates how to do that. Remember, the code itself is not the point; what matters is practicing how to expand your knowledge by searching, as mentioned in Section 2.2.

First, let's write pseudocode for this task:

s <- get the name of the uploaded file;
save the file on the server under the name s;

Pretty clear, right? Let's convert the pseudocode into real Java code. For the first subtask, we need the name of the submitted file. A quick look at the API documentation of Part reveals a suitable method:

getSubmittedFileName() If this part represents an uploaded file, gets the file name submitted in the upload.

Part part = request.getPart("file"); // "file" is the name attribute of <input type="file">
String fileName = part.getSubmittedFileName();

You can print this variable to the web page or to standard output, but we recommend inspecting it with the debugger in IntelliJ IDEA[5].

The second subtask can be solved with another method of Part:

InputStream getInputStream() Obtain an InputStream that can be used to retrieve the contents of the file.

Streams are a common abstraction for files, I/O, and networking. If you are familiar with Java I/O, the rest of the code is straightforward. Here is a sample[6]:

try (InputStream is = part.getInputStream()) { // closes the stream when done
    Path path = Path.of("/Users/zhongpu/Desktop/" + fileName);
    Files.copy(is, path); // throws FileAlreadyExistsException if the file exists
}

This copies the contents of the InputStream to a file on the server's Desktop; the path uses the Solaris (POSIX) syntax of macOS and Linux. On Windows, use Windows syntax instead, such as C:\\Users\\zhongpu\\Desktop\\ (backslashes are doubled inside a Java string literal).
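The same copy step can be tried without a servlet container. In the sketch below, a ByteArrayInputStream stands in for part.getInputStream(), a temporary directory stands in for the Desktop, and the class SaveUpload with its save method is a hypothetical name. Building the target with Path.resolve rather than string concatenation avoids hard-coding / or \\, so the same code runs on macOS, Linux, and Windows.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveUpload {
    // Copies the stream into dir/fileName and returns the resulting path.
    // Path.resolve inserts the platform's own separator between dir and fileName.
    static Path save(InputStream in, Path dir, String fileName) throws IOException {
        Files.createDirectories(dir);   // does nothing if the folder already exists
        Path target = dir.resolve(fileName);
        try (in) {                      // Java 9+: closes the stream afterwards
            Files.copy(in, target);     // fails if target already exists
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for part.getInputStream(): an in-memory "uploaded file".
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("uploads");
        Path saved = save(fake, dir, "hello.txt");
        System.out.println(saved + " -> " + Files.readString(saved));
    }
}
```

Inside a real servlet, the directory would be a fixed upload folder rather than a temporary one.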

Note that in practice, reusing the submitted file name is a bad idea, because different users may upload different files with the same name, and the second upload would then clash with the first. To avoid this, we must give each stored file a unique name while keeping its extension (e.g., .txt, .png). Extracting the extension is left as an exercise for readers; the unique part of the name can be generated with UUID:

String uuid = UUID.randomUUID().toString();
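A unique stored name could then be formed as in the sketch below. The class UniqueName is hypothetical, and the extension is passed in as a parameter because extracting it from the submitted name is the exercise above. A random (version 4) UUID has 122 random bits, so the chance of two names colliding is negligible in practice.

```java
import java.util.UUID;

public class UniqueName {
    // Combines a random UUID with an extension (such as ".png") that the caller
    // has already extracted from the submitted file name.
    static String uniqueName(String extension) {
        return UUID.randomUUID().toString() + extension;
    }

    public static void main(String[] args) {
        // A random 36-character UUID followed by .png; differs on every run.
        System.out.println(uniqueName(".png"));
    }
}
```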

Front-end (2)

In our previous sample code, any file can be uploaded to the server. In the real world, however, we may want to restrict the file type by its extension. For example, a system for submitting journal reports may only accept .doc and .docx files. We can achieve this with the following code:

<input type="file" accept=".doc,.docx">

The accept attribute value is a string that defines the file types the file input should accept. This string is a comma-separated list of unique file type specifiers. Because a given file type may be identified in more than one manner, it's useful to provide a thorough set of type specifiers when you need files of a given format.

For instance, there are a number of ways Microsoft Word files can be identified, so a site that accepts Word files might use an <input> like this:

<input type="file" accept=".doc,.docx,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document">

Again, like the required attribute, restrictions enforced by HTML can easily be bypassed by malicious users, so back-end programmers must check the file type again on the server. This is also left as an exercise for readers.


[1] Never use text/plain. More discussion can be found at "What does enctype='multipart/form-data' mean?".

[2] In some browsers, space can be encoded as %20, rather than +.

[3] The detailed multipart/form-data encoding algorithm can be found here.

[4] Servlet annotations can also be replaced by XML settings in the deployment descriptor (DD), and @MultipartConfig is no exception: its counterpart is the <multipart-config> element.

[5] If you don't know what debugging is, please refer to Tutorial: Debug your first Java application. Debugging a web application works similarly.

[6] Path.of requires Java 11 or later, so please set the project language level to at least 11.