A Must-Have for NAS Experts: A Comprehensive Guide to Easy Document Management with Paperless-ngx

After all the tinkering, I realized that 90% of NAS applications are useless.


Only two or three of them are actually useful.

Most of the time, after following various guides and tutorials to set them up, we just leave them there untouched.

In the 8 years since my first NAS, the Star Snail, I have researched countless self-hosted applications.

The protagonist I am introducing today belongs to the remaining 10% that are actually useful.

To be more precise, it sits in the top 1% of that useful 10%.

It helps me in both life and work by making my documents far easier to manage and search.

Thanks to it, I have saved at least 500 hours.


Before formally introducing it, I want to talk about another topic: why I have always been unable to leave the Apple ecosystem.

The Apple ecosystem gives me convenient software, security, and smooth integration across all my devices, but one of the biggest reasons is Apple’s powerful photo-OCR function.

For example, suppose I need to find a screenshot of a chat I had with a seller a while ago, to use as evidence. Instead of going through the images one by one as I used to, I can now simply search for the keyword “screwdriver”, and Apple Photos takes me straight to the image that contains it.

If you are not familiar with this feature yet, don’t rush to try it yourself.

The toy I am introducing to you today offers:

  • The same search experience as the Apple feature.
  • Hosting on your own NAS.
  • Complete control over your data.
  • Online preview of your documents.
  • Support for all digital documents: not only images, but also PDFs, Word documents, Excel spreadsheets, and even Markdown files. It truly achieves document digitization, unified management, and efficient search.

This is the new toy I am bringing to you today, Paperless-ngx. As the name suggests, it is all about going paperless.

It can help you organize your contracts, physical documents, bills, and more, while also managing digital documents (Word, Excel, PDF, etc.).



Paperless-ngx is not just a document management system. It is a complete solution that converts your physical files into searchable online archives, reducing the use of paper. Its core features include:

  • Document organization and indexing: Organize scanned documents using tags, correspondents, types, and more.
  • OCR text recognition: Perform optical character recognition on documents to enable text search and selection, even for documents that consist only of scanned images.
  • Multi-language support: Utilize the open-source Tesseract engine to support over 100 languages.
  • Long-term storage format: Save documents in PDF/A format, designed for long-term storage.
  • Intelligent tagging and classification: Automatically add tags, correspondents, and document types using machine learning.
  • Wide range of file support: Support for PDF documents, images, plain text files, Office documents, and more.
  • Customizable file management: Paperless-ngx manages file names and folders, supporting different configurations.
  • Modern web application: Customizable dashboard, filters, batch editing, drag and drop upload, custom views, shared links, and more.
  • Full-text search: Auto-complete, relevance ranking, and highlighting of matched query parts.
  • Email handling: Import documents from email accounts and configure multiple accounts and rules.
  • Multi-user support: A robust, built-in permission system for multiple users.
  • Multi-core system optimization: Parallel processing of multiple documents.

Setup Steps:


You need a Docker management tool on your NAS, such as Container Manager on Synology or Container Station on QNAP.

Tutorial reference:

Install Portainer in NAS in 30 seconds

  • Open File Station and create a paperless-ngx folder in the docker folder.


  • Create the following directories inside the paperless-ngx folder:
    • consume
    • data
    • export
    • media
    • pgdata
    • redisdata
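
If you prefer a terminal to File Station, the same folders can be created with a short script. This is only a sketch: it assumes the docker shared folder lives at /volume1/docker (the path used in the configuration later in this guide) and that Python 3 is available on the NAS. The function name is made up for illustration.

```python
from pathlib import Path

# Folder names taken from the list above.
SUBDIRS = ["consume", "data", "export", "media", "pgdata", "redisdata"]


def create_paperless_dirs(base):
    """Create <base>/paperless-ngx and its six subfolders; return their paths."""
    root = Path(base) / "paperless-ngx"
    created = []
    for name in SUBDIRS:
        path = root / name
        # parents=True also creates paperless-ngx itself;
        # exist_ok=True makes the script safe to run twice.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example call (assumed Synology path, adjust to your NAS):
# create_paperless_dirs("/volume1/docker")
```

Running it a second time changes nothing, so it is safe to use on a folder you have already partly set up by hand.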


I am using Synology’s Container Manager for this setup, but the steps in Portainer and on QNAP are similar.

Copy the following configuration:

version: "3.4"
services:
  broker:
    image: library/redis:7
    restart: unless-stopped
    volumes:
      - /volume1/docker/paperless-ngx/redisdata:/data

  db:
    image: library/postgres:15
    restart: unless-stopped
    volumes:
      - /volume1/docker/paperless-ngx/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: paperlessngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "28000:8000"  # change it if you like
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - /volume1/docker/paperless-ngx/data:/usr/src/paperless/data
      - /volume1/docker/paperless-ngx/media:/usr/src/paperless/media
      - /volume1/docker/paperless-ngx/export:/usr/src/paperless/export
      - /volume1/docker/paperless-ngx/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_OCR_LANGUAGES: chi-sim chi-tra  # change it if you like
      PAPERLESS_OCR_LANGUAGE: eng+chi_sim  # change it if you like
      USERMAP_UID: 0
      USERMAP_GID: 0
      PAPERLESS_TIME_ZONE: Asia/Shanghai  # change it if you like
    dns:
      - 8.8.8.8
      - 8.8.4.4

  gotenberg:
    image: gotenberg/gotenberg:7.10
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: apache/tika:latest
    restart: unless-stopped

Explanation of the configuration (customizable):

I have marked the parts of the file above that I think can be modified with “# change it if you like”. Beginners should leave the rest unchanged.

  • webserver’s ports section: you can change the first number to another port, such as “38000:8000”. Do not modify the 8000 at the end, which is the port inside the container.

  • PAPERLESS_OCR_LANGUAGES: the additional OCR languages to install for Paperless-ngx. Here it is chi-sim chi-tra (Simplified Chinese and Traditional Chinese); you can add any language you want, such as jpn.

    In addition, the image already includes English, German, Italian, etc.

  • PAPERLESS_OCR_LANGUAGE: the default language for OCR. I have set it to English and Simplified Chinese here.

  • PAPERLESS_TIME_ZONE: set your time zone
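
One detail in the configuration is easy to miss: PAPERLESS_OCR_LANGUAGES uses hyphens and spaces (chi-sim chi-tra), while PAPERLESS_OCR_LANGUAGE uses underscores joined by plus signs (eng+chi_sim). The helper below is a sketch of how both values could be derived from one list of Tesseract codes. The function name is made up, and the set of preinstalled languages is an assumption based on the Paperless-ngx documentation, so verify it for your version.

```python
# Languages assumed to ship with the image already (English, German,
# Italian, Spanish, French). This set is an assumption; check the docs.
PREINSTALLED = {"eng", "deu", "ita", "spa", "fra"}


def ocr_env(codes):
    """Build both OCR settings from Tesseract codes such as 'chi_sim'."""
    # Languages to install: hyphenated names, separated by spaces,
    # skipping the ones assumed to be preinstalled.
    extra = [c.replace("_", "-") for c in codes if c not in PREINSTALLED]
    return {
        "PAPERLESS_OCR_LANGUAGES": " ".join(extra),
        # Languages used for recognition: Tesseract codes joined with '+'.
        "PAPERLESS_OCR_LANGUAGE": "+".join(codes),
    }
```

For example, ocr_env(["eng", "chi_sim"]) yields chi-sim for PAPERLESS_OCR_LANGUAGES and eng+chi_sim for PAPERLESS_OCR_LANGUAGE, matching the values used in this guide apart from the extra chi-tra pack.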


Access the program in the browser: [ip]:[port]

ip is the IP address of your NAS (mine is 172.16.22.22), and the port is defined in the configuration file above. If you follow my tutorial, it is 28000.
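
If the page does not load, it helps to check from another computer whether the port answers at all. The following is a rough sketch using only the Python standard library; the function names are made up, and 28000 is simply the port from this tutorial.

```python
import urllib.error
import urllib.request


def paperless_url(ip, port=28000):
    """Build the address to open in the browser."""
    return f"http://{ip}:{port}"


def is_reachable(url, timeout=2.0):
    """Return True if a web server answers at url without a server error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or unknown host.
        return False


# Example with the author's address:
# print(is_reachable(paperless_url("172.16.22.22")))
```

The containers need a little time after the first start, so a False result right after deployment does not necessarily mean something is broken.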


You don’t have a username and password yet, so let’s create an account. Select the webserver container, open the terminal, and run:

python3 manage.py createsuperuser

Enter the following information:

  • username
  • email
  • password

Log in with the new account and upload a document. Once it has been processed, the text has been extracted.
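
The extracted text is what makes full-text search possible, and the same search can be reached from a script through the Paperless-ngx REST API. The sketch below assumes the documented /api/documents/ endpoint with its query parameter and token authentication. The function names, the token value, and the result fields id and title are assumptions for illustration; you would create a real token for your own user in the web interface.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query):
    """Prepare a full-text search request against /api/documents/."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    # Token authentication; the token itself is a placeholder you must supply.
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def search(base_url, token, query):
    """Run the search and return (id, title) pairs (assumed response fields)."""
    with urllib.request.urlopen(build_search_request(base_url, token, query)) as resp:
        data = json.load(resp)
    return [(doc["id"], doc["title"]) for doc in data["results"]]


# Example (hypothetical token):
# search("http://172.16.22.22:28000", "your-api-token", "screwdriver")
```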

In the edit view, you can see the recognized result and make modifications.

You can also download the third-party app paperless_app.

You can also scan with another app, such as the free Microsoft Lens, and then import the scans into Paperless-ngx (the recognition is better).

Screenshot from Microsoft Lens on an iPhone


You can also connect your physical printer so that scans are uploaded to Paperless-ngx automatically.
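
Whatever the source of a scan, one simple way to import it is to place the file in the consume folder created earlier, which the configuration mounts at /usr/src/paperless/consume inside the container; Paperless-ngx picks up files that appear there. A minimal sketch, assuming the host path from this guide and a made-up function name:

```python
import shutil
from pathlib import Path

# Assumed host path of the consume folder, as mounted in the configuration above.
CONSUME_DIR = Path("/volume1/docker/paperless-ngx/consume")


def send_to_paperless(scan, consume_dir=CONSUME_DIR):
    """Copy a scanned file into the consume folder so it gets imported."""
    scan = Path(scan)
    if not scan.is_file():
        raise FileNotFoundError(scan)
    target = Path(consume_dir) / scan.name
    shutil.copy2(scan, target)  # copy2 also keeps the file's timestamps
    return target


# Example (hypothetical file name):
# send_to_paperless("/volume1/scans/invoice.pdf")
```

A printer or scanner that can save to a network share can use the same idea by writing straight into that folder, with no script needed.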

If you have more ideas, please feel free to share.

If you like this article, please remember to like, bookmark, and follow Dad’s Digital Garden. We will continue to bring more practical self-built application guides. Together, let’s take control of our own data and create our own digital world!

If you encounter any problems or have any suggestions during the setup process, please feel free to leave a comment below for discussion and learning.