
How to Publish Things Online

Self-Hosting · Linux · May 13, 2021

This is an opinionated yet comprehensive guide on publishing your digital content online. The method I describe is optimized for simplicity, minimalism, vendor-independence and censorship resistance.

Photo by Bruno Martins

Isn’t Publishing a Solved Problem?

There are many ways to publish things online, so why bother reinventing the wheel? Most publishing services are easy to use, but they hide tremendous complexity behind the scenes. Hidden complexity can be nastier than the visible kind: by choosing a complex system managed by a third party, you make yourself vulnerable to censorship and exploitation. The approach I describe removes this dependency while dramatically improving your website’s page load time.

Go Static

Static Websites Are the Future of Self-Publishing

Websites and Webservers

Now that we have a website, it’s time to figure out how to publish it. Web browsers act as HTTP “clients”, and they know how to politely ask HTTP “servers” to serve webpages for them. You can find the full description of the HTTP protocol here. Don’t be put off by wordy technical documents: HTTP is actually pretty simple, and you can benefit from knowing how it works even as a casual browser user.

Client and server are popular abstractions used in many protocols, including HTTP. Both client and server should speak the same language, but their tasks and goals are different. By publishing a website, what we really want is to make it possible for various HTTP clients (browsers) to see it. The “social” goals may vary, of course. Maybe you’re a journalist who exposes corruption, or maybe you want to share your hobbies with the rest of the world. On a technical level, it’s all the same.
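
If you’re curious what this client-server conversation looks like on the wire, curl can show you. This is just a small illustration; example.com is a placeholder, and any website will do:

# -v makes curl print the request it sends (lines starting with >)
# and the response headers it receives (lines starting with <),
# -s hides the progress meter, -o /dev/null discards the page body

$ curl -s -v http://example.com/ -o /dev/null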

Websites and webservers are different things. Websites can’t serve themselves, which means we need to install a special piece of software called a webserver and tell it to serve our website to all interested clients. The two most popular and mature open source webservers are Apache2 and Nginx. It’s really hard to pick a winner here: they both have a similar market share of about 35%, and both are fast and well documented. In this guide I’m going to use Apache2, because I use Nginx most of the time, and playing with a new piece of software is fun, isn’t it?

Operating System

Every website needs a webserver, and every webserver needs an operating system. The most popular server operating system is Linux, and there are plenty of free and open source Linux distributions to choose from. It can get a bit confusing, because “Linux” may refer both to an operating system kernel and to a complete operating system. All Linux distributions share the same kernel, and they aren’t that different from each other, so it’s hard to make a wrong choice here, but I’d recommend sticking with a popular distribution such as Ubuntu.
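
Once you have a shell on any Linux machine, you can see both sides of that distinction for yourself. A quick sketch; the exact output depends on your kernel and distribution:

# The kernel version, shared idea across all distributions
$ uname -r

# The distribution name and release
$ cat /etc/os-release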

Some people point out that Ubuntu isn’t as free and non-binding as the rest of our dependencies. I partly agree, but we shouldn’t forget that Ubuntu is based on Debian, and you can always move to Debian without having to learn a completely new set of tools. Canonical offers a good product for free, and there are serious checks on its power, so I wouldn’t worry about that for now.

Hosting Provider

Websites need webservers, webservers need operating systems, but what does an operating system need? Well, an operating system is a piece of software that can run other pieces of software, but it can’t run itself, and software can’t run without hardware. The line here isn’t perfectly clear, in the sense that software isn’t an abstract idea disconnected from the physical world either: it has a physical form, and it “lives” on storage devices such as SSDs.

Philosophical questions aside, we need hardware in order to run an operating system, which naturally leads us to a hosting provider. This part is pretty important, because some hosting providers don’t give their customers direct access to their servers. Sometimes you can’t even choose an operating system. Those providers try to place themselves “in the middle”, between your website and its visitors. Their marketing departments work hard to convince you not to bother with setting up your own private server and to buy their pre-made setups instead, which are often locked in to certain software such as WordPress. This is called “managed” hosting, and it may sound like a good idea at first, until you experience lock-in, terrible user interfaces, and unresponsive support. Managing your own server isn’t that hard, and it will save you a lot of time and nerves in the long run.

The choice of a hosting provider isn’t that important, as long as it gives you full access to your server. I often use Digital Ocean, and their service is more or less tolerable. I also use Scaleway, which is a good and cheap choice if you want to host your website in Europe. As a rule of thumb, your website should be as close to its visitors as possible: light travels fast, but not fast enough to make the distance unnoticeable when you open a website hosted on the other side of the world.

Connecting to Your Server

Servers are a bit different from traditional desktop computers. They rarely have a graphical user interface, and the best way to manage them is through a textual interface. You can use a textual interface on your Linux desktop too, via a “Terminal” app, and Windows machines have their own flavor of terminal as well. Let’s say your hosting provider gave you an Ubuntu server with the IP 100.101.102.103. To connect to its “terminal”, you can issue the following command:

$ ssh root@100.101.102.103

This command assumes that you want to log in as the root user. If you don’t, just use the username supplied by your hosting provider. If you’re not comfortable with text interfaces, take your time: explore the file system and some basic commands. Mastering the command line is a long and interesting journey, but you don’t need to be a command line guru to set up a webserver.
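
If you’d like a starting point for that exploration, here are a few harmless commands to try once you’re logged in; none of them change anything on the server:

# Where am I in the file system?
$ pwd

# What's in this directory?
$ ls -la

# Move into another directory and back home
$ cd /var/log
$ cd ~

# Who am I logged in as, and which machine is this?
$ whoami
$ hostname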

You can always disconnect from your remote shell by typing exit.

Setting Up Webserver Software (Apache2)

Debian-based systems keep an internal database of available software. You can think of it as a large spreadsheet with columns like name, version, and so on. The folks who maintain this database update it rather often, so you might end up in a situation where your local copy of this “spreadsheet” is a bit outdated. Luckily for us, Debian-based systems have a special command called apt, which can be invoked to update our local package index:

# APT stands for Advanced Package Tool, and we tend 
# to use terms "program" and "package" interchangeably

$ apt update

# Here you can see how it hits a bunch of official
# package repositories. They are grouped by their
# purpose and importance but you may think of it as
# of a single spreadsheet

> Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
> Get:2 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
> Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
> Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
> Get:5 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
> Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [274 kB]
> Get:7 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.6 kB]
> Get:8 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]      
> Get:9 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [727 kB]  
> Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]     
> Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [834 kB]
> Get:12 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
> Get:13 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [29.8 kB]
> Get:14 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [974 kB]
> Get:15 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1247 kB]
> Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [299 kB]
> Get:17 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [4305 B]
> Fetched 17.8 MB in 7s (2639 kB/s)
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> 5 packages can be upgraded. Run 'apt list --upgradable' to see them.
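
The last line of that output hints at a natural follow-up step: once the package index is fresh, you can apply the pending updates. This isn’t strictly required for our setup, but it’s a good habit on a freshly created server:

# List the packages that have newer versions available
$ apt list --upgradable

# Download and install those newer versions
$ apt upgrade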

Installing new software on Debian-based systems, and on Linux in general, is very easy, especially if it’s included in the official repositories of your distribution. Apache2 is popular, so this is all you need to do to install and run it:

$ apt install apache2
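
On Ubuntu, the apache2 package starts its service automatically right after installation. If you want to double-check that from the shell before switching to a browser, systemd can tell you:

# "active (running)" in the output means Apache is up
$ systemctl status apache2

# If it isn't running for some reason, start it and enable it on boot
$ systemctl enable --now apache2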

Let’s check if it works. Try to open the following URL (don’t forget to use your own server’s IP):

http://100.101.102.103/

Apache2 listens for new HTTP connection attempts made to your server, and it should show you a page with a few interesting tips and tricks. I highly recommend reading them.
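
You can perform the same check from the server itself, without a browser (install curl with apt install curl if it’s missing):

# -I asks for response headers only; a "200 OK" status line
# means Apache is serving the default page
$ curl -I http://localhost/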

This HTML page is located at /var/www/html/index.html. Feel free to edit it or replace its contents with something like this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

After refreshing your browser, you should see the changes you’ve made.

Deploying a Static Website

So, it looks like everything located in /var/www/html/ is immediately visible from a browser window. What’s so special about this directory? First, don’t worry about other directories: they are NOT accessible via the browser. You should think twice before putting anything into this directory, but it’s exactly where you should put your static website in order to make it visible to the rest of the world.

So, once you set up your server, publishing your writings is as simple as pasting a bunch of files into a directory. Here is an example of how to copy a directory named “website” from your computer to a remote server:

$ rsync        \
  --checksum   \
  --recursive  \
  --verbose    \
  <path_to_your_website>/ root@100.101.102.103:/var/www/html

# The actual output will depend on your data. It usually shows
# which files are changed since the last sync. It copies only
# the changed files, which makes it super fast to deploy changes

> sending incremental file list
> index.json
> index.xml
> blog/how-to-publish-things-online/index.html

> sent 74,448 bytes  received 11,884 bytes  24,666.29 bytes/sec
> total size is 163,314,189  speedup is 1,891.70

Where:

  • <path_to_your_website> is the path to the directory with your static website on your PC, something like /home/john/website. Don’t forget the trailing slash after it: it tells rsync to copy the directory’s contents rather than the directory itself.
  • root@100.101.102.103 identifies your server.
  • /var/www/html is the path to the public web directory on your server.

This method of deployment is extremely fast and convenient. All changes will be available to your audience in no time. It will also show you which files have changed since the last sync, which can help you to detect unexpected changes and figure out what’s going on.
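
Two more rsync flags are worth knowing about, although whether you want them depends on your workflow. Here is a sketch using the same placeholder paths as above:

# --delete removes files on the server that no longer exist locally,
# which keeps the public directory in sync when you rename or delete
# posts. --dry-run shows what would happen without changing anything,
# so it's a good idea to run with it first

$ rsync         \
  --checksum    \
  --recursive   \
  --verbose     \
  --delete      \
  --dry-run     \
  <path_to_your_website>/ root@100.101.102.103:/var/www/html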

Setting Up DNS Records

URLs like http://100.101.102.103/ are hard to memorize, which is one of the reasons most websites use the Domain Name System (DNS). Memorizing names is easier than memorizing numbers, so you can think of DNS as a big spreadsheet that matches names to IP addresses. Let’s say you registered the domain name writings.com; now we need to connect it to our webserver. Our end goal is to make sure that when people type writings.com in their browsers, they will be shown your website. In the end, it’s just a simpler way of typing http://100.101.102.103/, and your readers will surely appreciate the convenience.

To bind a domain name to your webserver’s IP address, you have to own both of those things first. There are plenty of companies selling domain names; the only requirement you should insist on is the ability to manage DNS records, and most providers allow that. Owning a domain is not enough, though: you also have to tell the domain name system where to send all those browsers. This name-to-IP binding can be accomplished by simply adding the following DNS record:

Field        Value
Record type  A
Name         @
Value        100.101.102.103
TTL          Any sensible value; choose one hour if you can’t decide.

You might need to wait a bit for those changes to propagate. Try to ping your domain name:

# -c is short for packet count. Personally, I don't like 
# short arguments due to their steeper learning curve, 
# but I guess they come in handy if you use this command 
# hundreds of times

$ ping -c 4 writings.com

> PING writings.com (100.101.102.103) 56(84) bytes of data.
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=1 ttl=49 time=37.7 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=2 ttl=49 time=39.2 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=3 ttl=49 time=37.4 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=4 ttl=49 time=39.0 ms
> --- writings.com ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3303ms
> rtt min/avg/max/mdev = 37.374/38.309/39.201/0.802 ms

If it shows the IP address of your webserver, we’re good to go. If not, don’t worry and go make some coffee, it can take a while.
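
If you’d rather ask DNS directly instead of pinging your server, dig does exactly that (it ships in the dnsutils package if your system doesn’t have it):

# +short prints just the answer: the A record we created earlier
$ dig +short writings.com A

> 100.101.102.103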

Getting a TLS Certificate From Let’s Encrypt

At this point, your website should be reachable by the following URLs:

http://100.101.102.103/

http://writings.com/

You might have noticed that your browser is not comfortable with those URLs. The thing is, they use insecure HTTP, and there are many good reasons to only use secure HTTP, or HTTPS. Insecure connections aren’t private: they let ISPs and other actors tap into everything you send and receive over HTTP. That’s why it’s critically important to make your website available over HTTPS, and to do that, we need to obtain a thing called a TLS certificate.

In ancient times, TLS certificates were expensive and hard to set up. Nowadays, thanks to the Snowden revelations and the Let’s Encrypt project, you can get certificates for free, and they usually work out of the box.

First, I would recommend reading about Let’s Encrypt and Certbot, although it’s not strictly necessary. In short, Certbot is an open source program that can take care of obtaining and installing TLS certificates for you, free of charge. Now, let’s install it:

$ snap install --classic certbot

After installing Certbot, run it with the Apache plugin and follow the interactive instructions:

$ certbot --apache
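
Certificates from Let’s Encrypt are valid for 90 days, and the Certbot snap sets up automatic renewal for you. If you want to confirm that renewal will work when the time comes, there is a dry-run mode:

# Simulates a renewal against Let's Encrypt's staging environment
# without touching your real certificate
$ certbot renew --dry-run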

That’s it: now you have a website with a dedicated domain name, and its traffic is hidden from everyone except your readers and yourself. HTTPS doesn’t let anyone tap into the connection and see what exactly your readers are interested in. Here is the final version of the website URL:

https://writings.com/
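
As a final check, you can confirm from the command line that the secure version responds, and that plain HTTP now redirects to it if you chose the redirect option during the Certbot run:

# Response headers over a TLS connection
$ curl -I https://writings.com/

# -L follows redirects, so this should end up at the HTTPS version
$ curl -I -L http://writings.com/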

Conclusion

Publishing your writings in a self-sovereign way is really easy, but it might feel intimidating if you aren’t familiar with the command line and server administration. The thing is, if writing is your work, or even just a hobby, being your own publisher is an investment worth making. The scheme I described lets you update your website in no time, and it delivers the best possible performance to your readers, saving them time and nerves.

A sceptical person might argue that all this “self-sovereignty” is a lie, because you’re still dependent on your hosting and DNS providers. Well, DNS is just a convenience feature; you can live without it. The real problem is the hosting provider, and the fact that it can block your webserver and take away your IP address. This seems to be an unavoidable dependency, but having a single dependency is still better than having many, isn’t it?

Is it even possible to make your website fully uncensorable? Yes, and it’s actually pretty easy. The method I described needs only a few small adjustments to make your website available via the Tor network. Tor services can easily be hosted from home, and they don’t even need IP addresses. That’s what we’re going to do next, stay tuned.