Igor Bubelov

How to Publish Things Online

Self-Hosting · Linux · May 13, 2021

This is an opinionated yet comprehensive guide on publishing your writings and other stuff online. The method I describe is optimized for simplicity, minimalism, vendor-independence and censorship resistance.

Photo by Bruno Martins


Isn’t Publishing a Solved Problem?

There are many ways to publish things online, so why bother reinventing the wheel? Most publishing services are easy to use, but they usually hide tremendous complexity behind the scenes. Hidden complexity can be nastier than the visible kind. By choosing a complex system managed by a third party, you make yourself vulnerable to censorship and exploitation. The approach I describe removes this dependency while delivering exceptional performance.

Why Go Static?

Most websites are dynamic, which means they generate webpages “on the fly”. To show you this webpage, your browser sent a request to this website. In response, it got an HTML document, and it doesn’t really know nor care if this document was pre-generated or not. As a visitor, you aren’t supposed to tell the difference between static and dynamic websites. In practice, it’s easy to figure out if a website is static or not. If it’s fast, it’s static. If it’s slow, it’s most likely dynamic. Static websites already have all the answers to all possible questions, so they can quickly return pre-made HTML documents, usually stored as simple HTML files somewhere on a webserver. That’s the secret of static websites: since all pages are pre-generated, there is no need to “think” and generate pages on the fly.

Some websites are inherently dynamic. Let’s say we have a website which has an authentication system with a user profile page, something like https://example.com/profile/. This profile page should show different data to different users. People have different email addresses, profile pics and nicknames, but they all go to a single webpage which is expected to be filled with different data, depending on who requests it. That sort of scenario is where dynamic websites shine. Basically, “dynamic” means “unpredictable”. When it comes to publishing your writings, there is no need to use dynamic pages. You usually have all the content before you hit “publish”, so it makes sense to pre-generate all of your HTML pages.

There is nothing wrong with using dynamic websites when you really need them. Unfortunately, there are a lot of dynamic websites which neither use nor need dynamic features. By using a dynamic website when it’s unnecessary, you get zero benefits while paying huge extra costs. The thing is, dynamic features aren’t free. They are a disaster both for website authors and their audience.

Let’s start with a reader’s standpoint. One of the surest ways to find out if a website is dynamic or not is to check how fast it loads. Remember, only static websites know all the answers. Dynamic websites operate more like the lead character in Groundhog Day: they have to start their routine from scratch every time a new request comes in. They don’t have content on hand, just the path to a page that doesn’t really exist. How can they generate HTML pages to return to your browser? Well, they have to query their database, which means you need to have a database in order to publish your writings, and you also need to query that database every time and wait for it to respond with the relevant data. All of this makes your life harder and slows down your website.

What about authors and publishers? You need to have a webserver in order to publish HTML documents, but if your pages are dynamic and parts of their content are stored in a database somewhere, you also have to set up and manage that database. Complexity is the enemy of resilience and self-sovereignty. Of course, this unnecessary complication leads to unnecessary dependence on various third-party services which promise to make your life a bit simpler. Those services aren’t charities, and they have an incentive to keep you dependent on them. They also need to be flexible in order to meet the needs of their huge and diverse user base. Don’t expect the end result to be optimised for your particular situation.

I know a guy who has a website which exposes various financial fraud schemes. Obviously, the fraudsters are dreaming of taking his website down. Luckily for them, it’s a dynamic WordPress website, so they flood it with millions of requests from time to time, which is enough to take it down for days. This guy struggled with this problem for a long time and ended up hiring some expensive consultants to help him solve this problem. It didn’t solve the problem completely, but it made the attacks more expensive. That’s a real-world example of how “slowness” of your website might be used to censor you.

Exposing fraudsters isn’t risk-free, but what about exposing nation states? These entities have tremendous influence, and they don’t like when someone exposes their dirty tricks. The method I describe makes it easy to resist censorship attempts even when they come from the most powerful world governments.

How to Create Static Websites

A static website is just a collection of HTML pages and other related resources such as images, videos, styles and so on. HTML pages are text documents which can be created or modified in a text editor. There are plenty of free and open source text editors, and it’s easy to switch from one editor to another.

Personally, I don’t recommend working with HTML files directly. There are lots of tools that can help you generate beautiful static websites from a bunch of simple Markdown files. Markdown is a lightweight plain-text format, similar in spirit to the wiki markup used by Wikipedia contributors, and I bet most of them don’t know how to work with HTML. Markdown helps authors focus on their texts and media, and there are plenty of open source tools which accept Markdown files and “convert” them into HTML files. The publishing method I describe is agnostic to how you create your static website, but if you are interested in HTML generators, I’d encourage you to try Hugo.
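To make the idea of pre-generation a bit more concrete, here is a toy generator written as a shell function. It’s only a sketch and has nothing to do with how Hugo works: the name generate_site and the “first line is the title, every other non-empty line is a paragraph” convention are assumptions made for this example, and it neither parses Markdown nor escapes HTML.

```shell
# Toy static generator (hypothetical helper, not related to Hugo).
# Turns every *.txt file in the source directory into an HTML page
# in the output directory. No Markdown parsing, no HTML escaping.
generate_site() {
  src="$1"
  out="$2"
  mkdir -p "$out"
  for file in "$src"/*.txt; do
    [ -e "$file" ] || continue            # nothing matched the pattern
    name=$(basename "$file" .txt)
    title=$(head -n 1 "$file")            # first line becomes the title
    {
      printf '<!DOCTYPE html>\n<html>\n  <head>\n'
      printf '    <meta charset="utf-8">\n    <title>%s</title>\n' "$title"
      printf '  </head>\n  <body>\n'
      # Every remaining non-empty line becomes a paragraph
      tail -n +2 "$file" | while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
          printf '    <p>%s</p>\n' "$line"
        fi
      done
      printf '  </body>\n</html>\n'
    } > "$out/$name.html"
  done
}

# Usage: generate_site ./drafts ./website
```

All the “thinking” happens once, when you run the generator. After that, a webserver only has to hand out the resulting files.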

Websites and Webservers

Now that we have a website, it’s time to figure out how to publish it. Web browsers act as HTTP “clients”, and they know polite ways of asking HTTP “servers” to serve webpages for them. You can find a full description of the HTTP protocol here. Don’t be afraid of wordy technical documents: HTTP is actually pretty simple, and you can benefit from knowing how it works, even as a casual browser user.
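If you’re curious what such a “polite request” looks like, the sketch below prints the raw text of a minimal HTTP/1.1 GET request. The helper name build_get_request is made up for this example, and a real browser sends quite a few more headers than this.

```shell
# Print the raw text of a minimal HTTP/1.1 GET request.
# build_get_request is a hypothetical helper used for illustration only.
build_get_request() {
  host="$1"
  path="${2:-/}"                          # default to the site root
  # Every line ends with CRLF, and an empty line ends the request
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Usage (assumes the nc utility is installed):
#   build_get_request example.com / | nc example.com 80
```

Piping the request into nc should print the server’s response, headers first and the HTML document after them. That’s more or less the whole conversation between a browser and a webserver.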

Client and server are popular abstractions used in many protocols, including HTTP. Both client and server should speak the same language, but their tasks and goals are different. By publishing a website, what we really want is to make it possible for various HTTP clients (browsers) to see it. The “social” goals may vary, of course. Maybe you’re a journalist who exposes corruption, or maybe you want to share your hobbies with the rest of the world. On a technical level, it’s all the same.

Websites and webservers are different things. Websites can’t serve themselves, which means we need to install a special piece of software called a webserver and tell it to serve our websites to all interested clients. The two most popular and mature open source webservers are Apache2 and Nginx. It’s really hard to pick a winner here, because they both have a similar market share of about 35% and both of them are fast and well-documented. In this guide, I’m going to use Apache2, because I use Nginx most of the time and playing with a new piece of software is fun, isn’t it?

Operating System

The most popular server operating system is Linux. Every website needs a webserver and every webserver needs an operating system. There are plenty of free and open source Linux distributions. It can get a bit confusing because Linux may mean both an operating system kernel and an operating system itself. All Linux distributions share the same kernel, and they aren’t that different from each other. It’s hard to make a wrong choice here, but I’d recommend sticking with a popular distribution such as Ubuntu.

Some people point out that Ubuntu isn’t as free and non-binding as the rest of our dependencies. I partly agree, but we shouldn’t forget that Ubuntu is based on Debian and you can always move to Debian without having to learn a completely new set of tools. Canonical offers a good product for free and there are serious checks on its powers, so I wouldn’t worry about that, for now.

Hosting Provider

Websites need webservers, webservers need operating systems, but what does an operating system need? Well, an operating system is a piece of software which can run other pieces of software, but it can’t run itself. Software can’t run without hardware. The line here isn’t that clear, in the sense that software is also a kind of hardware. It’s not an abstract idea disconnected from the physical world. Software has a physical form: it “lives” on our storage devices such as SSDs.

Philosophical questions aside, we really need hardware in order to run an operating system. That naturally leads us to a hosting provider. This part is pretty important, because some hosting providers don’t give their customers direct access to their servers. Sometimes, you can’t even choose an operating system. Those providers try to place themselves “in the middle” between your website and its visitors. Their marketing departments work hard to convince you not to bother with setting up your own private server and to buy their pre-made setups, which are often locked in to certain software such as WordPress. This is called “managed” hosting, and it may sound like a good idea at first, until you experience customer lock-in, terrible user interfaces, and unresponsive support. Managing your own server isn’t that hard, and it will save you a lot of time and nerves in the long run.

The choice of a hosting provider isn’t that important, as long as it gives you full access to your server. I often use Digital Ocean, and their service is more or less tolerable. I also use Scaleway, and it’s a good and cheap choice if you want to host your website in Europe. As a rule of thumb, your website should be as close to its visitors as possible. Light travels fast, but not fast enough to make the distance unnoticeable when you open a website hosted on the other side of the world.
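Here is a back-of-the-envelope sketch of why distance matters. It assumes that signals travel through optical fibre at roughly 200,000 km/s (about two thirds of the speed of light in a vacuum) and it ignores routing and processing delays, so treat the result as a lower bound. The function name min_rtt_ms is made up for this example.

```shell
# Lower bound on the round-trip time, in milliseconds, for a given
# one-way distance in kilometres.
# Assumption: ~200,000 km/s signal speed in fibre, no other delays.
min_rtt_ms() {
  awk -v km="$1" 'BEGIN { printf "%.0f\n", 2 * km / 200000 * 1000 }'
}

# Usage: min_rtt_ms 10000    # a server 10,000 km away
```

For a server 10,000 km away, this gives at least 100 ms per round trip, and opening a page usually takes several round trips before the first byte of HTML arrives.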

Connecting to Your Server

Servers are a bit different from traditional desktop computers. They rarely have a graphical user interface, and the best way to manage them is by using a textual interface. You can also use a textual interface to manage your Linux desktop by using a “Terminal” app. Windows machines also have their own flavor of Terminal. Let’s say your hosting provider gave you an Ubuntu server with an IP 100.101.102.103. To connect to its “Terminal”, you can issue the following command:

$ ssh root@100.101.102.103

This command assumes that you want to log in as a root user. If you don’t, just use the username supplied by your hosting provider. If you’re not comfortable with text interfaces, take your time, explore the file system and some basic commands. Mastering the command line is a long and interesting journey, but you don’t need to be a command line guru to set up a webserver.

You can always disconnect from your remote shell by typing exit.

Setting Up Webserver Software (Apache2)

Debian-based systems keep an internal database of available software. You can think of it as a large spreadsheet which has a bunch of columns like name, version and so on. Folks who maintain this database tend to update it rather often, so you might end up in a situation where your copy of this “spreadsheet” is a bit outdated. Luckily for us, Debian-based systems have a special command called apt which can be invoked in order to update our local package registry:

# APT stands for Advanced Package Tool, and we tend 
# to use terms "program" and "package" interchangeably

$ apt update

# Here you can see how it hits a bunch of official
# package repositories. They are grouped by their
# purpose and importance but you may think of it as
# of a single spreadsheet

> Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
> Get:2 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
> Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
> Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
> Get:5 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
> Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [274 kB]
> Get:7 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.6 kB]
> Get:8 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]      
> Get:9 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [727 kB]  
> Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]     
> Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [834 kB]
> Get:12 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
> Get:13 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [29.8 kB]
> Get:14 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [974 kB]
> Get:15 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [1247 kB]
> Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [299 kB]
> Get:17 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [4305 B]
> Fetched 17.8 MB in 7s (2639 kB/s)
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> 5 packages can be upgraded. Run 'apt list --upgradable' to see them.

Installing new software on Debian-based systems and Linux in general is very easy, especially if it’s included in the official repositories of your Linux distribution. Apache2 is popular, and that’s all you need to do in order to install and run it:

$ apt install apache2

Let’s check if it works. Try to open the following URL (don’t forget to use your server’s IP):

http://100.101.102.103/

Apache2 listens for new HTTP connection attempts made to your server, and it should show you a page with a few interesting tips and tricks. I highly recommend reading them.
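You can run the same check without a browser. The sketch below assumes curl is available on the machine you run it from, and the helper name status_code is made up for this example. It reads response headers and prints the status code from the first line.

```shell
# Read HTTP response headers on stdin and print the numeric status
# code from the first line, e.g. "HTTP/1.1 200 OK" gives 200.
# status_code is a hypothetical helper used for illustration only.
status_code() {
  head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Usage (replace the IP with your server's IP):
#   curl --silent --head http://100.101.102.103/ | status_code
```

A status code of 200 means the webserver found the page and returned it. Anything else is a hint that something in the setup needs another look.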

This HTML page is located at /var/www/html/index.html. Feel free to edit this page or replace its contents with something like this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

After refreshing your browser, you should see the changes you’ve made.

Deploying Static Website

So, it looks like everything located in /var/www/html/ is immediately visible from a browser window. What’s so special about this directory? First, don’t worry about other directories, they are NOT accessible via browser. You should think twice before putting anything into this directory, but that’s exactly where you should put your static website in order to make it visible to the rest of the world.

So, once you set up your server, publishing your writings is as simple as pasting a bunch of files into a directory. Here is an example of how to copy a directory named “website” from your computer to a remote server:

$ rsync        \
  --checksum   \
  --recursive  \
  --verbose    \
  <path_to_your_website>/ root@100.101.102.103:/var/www/html

# The actual output will depend on your data. It usually shows
# which files are changed since the last sync. It copies only
# the changed files, which makes it super fast to deploy changes

> sending incremental file list
> index.json
> index.xml
> blog/how-to-publish-things-online/index.html

> sent 74,448 bytes  received 11,884 bytes  24,666.29 bytes/sec
> total size is 163,314,189  speedup is 1,891.70

Where:

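  • <path_to_your_website> is a path to the directory with your static website on your PC, something like /home/john/website. Don’t forget the trailing slash after it, it plays an important role: with the slash, rsync copies the contents of the directory, without it, rsync copies the directory itself.
  • root@100.101.102.103 identifies the user and the server you’re deploying to.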
  • /var/www/html is a path to a public web directory on your server.

This method of deployment is extremely fast and convenient. All changes will be available to your audience in no time. It will also show you which files have changed since the last sync, which can help you to detect unexpected changes and figure out what’s going on.
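If you deploy often, it might be handy to wrap the rsync call shown above in a small function. This is just a sketch: the names with_trailing_slash and deploy, as well as the DRY_RUN variable, are made up for this example. The first helper makes sure the source path ends with a slash, so that rsync copies the contents of the directory rather than the directory itself.

```shell
# Make sure a path ends with a trailing slash, so that rsync copies
# the directory's contents rather than the directory itself.
with_trailing_slash() {
  printf '%s/\n' "${1%/}"
}

# Hypothetical wrapper around the rsync call from this guide.
# Set DRY_RUN=1 to preview the changes without copying anything.
deploy() {
  src=$(with_trailing_slash "$1")
  dest="$2"
  rsync --checksum --recursive --verbose ${DRY_RUN:+--dry-run} "$src" "$dest"
}

# Usage:
#   DRY_RUN=1 deploy /home/john/website root@100.101.102.103:/var/www/html
#   deploy /home/john/website root@100.101.102.103:/var/www/html
```

The --dry-run flag makes rsync list what it would transfer without touching the server, which is a cheap way to double-check a deployment before it goes live.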

Setting Up DNS Records

URLs like http://100.101.102.103/ are hard to memorize. It’s one of the reasons why most websites use the Domain Name System (DNS). Memorizing names is easier than memorizing numbers, so you can think of DNS as a big spreadsheet which matches different names with different IP addresses. Let’s say you registered a domain name writings.com. Now we need to connect it to our webserver. Our end goal is to make sure that when people type writings.com in their browsers, they will be shown your website. In the end, it’s just a simpler way of typing http://100.101.102.103/, and your readers will surely appreciate the convenience.

To bind a domain name to your webserver’s IP address, you have to own both of those things first. There are plenty of companies selling domain names, and the only requirement you should have is the ability to manage DNS records. Most providers allow that. Owning a domain is not enough: you should also tell your domain where it should send all those browsers. This name-to-IP binding can be accomplished by simply adding the following DNS record:

Field        Value
Record type  A
Name         @
Value        100.101.102.103
TTL          Any sensible value. Choose one hour if you can’t decide.

You might need to wait for a bit for those changes to be applied. Try to ping your domain name:

# -c is short for packet count. Personally, I don't like 
# short arguments due to their steeper learning curve, 
# but I guess they come in handy if you use this command 
# hundreds of times

$ ping -c 4 writings.com

> PING writings.com (100.101.102.103) 56(84) bytes of data.
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=1 ttl=49 time=37.7 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=2 ttl=49 time=39.2 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=3 ttl=49 time=37.4 ms
> 64 bytes from 100.101.102.103 (100.101.102.103): icmp_seq=4 ttl=49 time=39.0 ms
> --- writings.com ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3303ms
> rtt min/avg/max/mdev = 37.374/38.309/39.201/0.802 ms

If it shows the IP address of your webserver, we’re good to go. If not, don’t worry and go make some coffee, it can take a while.
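Some servers don’t answer pings, so it can be useful to ask DNS directly. The sketch below assumes the dig utility is installed (on Ubuntu it usually comes with the dnsutils package), and the helper name resolves_to is made up for this example. It succeeds if the expected IP address is among the addresses it reads.

```shell
# Succeed if the expected IP address appears among the addresses read
# from stdin, one per line, which is how `dig +short` prints them.
# resolves_to is a hypothetical helper used for illustration only.
resolves_to() {
  grep -qxF "$1"
}

# Usage:
#   dig +short writings.com A | resolves_to 100.101.102.103 && echo "DNS is ready"
```

If nothing is printed, the record probably hasn’t propagated yet, or it points somewhere else.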

Getting TLS Certificate From LetsEncrypt

At this point, your website should be reachable by the following URLs:

http://100.101.102.103/

http://writings.com/

You might have noticed that your browser is not comfortable with those URLs. The thing is: they use insecure HTTP, and there are many good reasons to only use secure HTTP, or HTTPS. Insecure connections aren’t private, and they enable ISPs and other actors to basically tap all of your communications over HTTP. That’s why it’s critically important to make your website available over HTTPS, and to do that, we need to obtain a thing called a TLS certificate (often referred to as an HTTPS certificate).

In ancient times, HTTPS certificates were expensive and hard to set up. Nowadays, thanks to the Snowden revelations and the Let’s Encrypt project (run by ISRG and backed by the EFF), you can get certificates for free, and they usually work out of the box.

First, I’d recommend reading about Let’s Encrypt and Certbot, although it’s not strictly necessary. In short, Certbot is an open-source program which can take care of setting up HTTPS certificates for you, free of charge. Now, let’s install it:

$ snap install --classic certbot

After installing Certbot, just run it and follow the instructions:

$ certbot

That’s it, now you have a website with a dedicated domain name. It also hides the traffic from anyone except your readers and yourself. HTTPS doesn’t let anyone tap into your traffic and see what exactly your readers are interested in. Here is the final version of a website URL:

https://writings.com/
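Certificates expire, so it doesn’t hurt to know how to check when yours does. The sketch below assumes openssl is installed, and the helper name cert_expiry is made up for this example. It extracts the expiry date from the output of openssl x509 -noout -enddate.

```shell
# Read the output of `openssl x509 -noout -enddate` on stdin and print
# just the expiry date, e.g. "notAfter=Aug 11 12:00:00 2021 GMT"
# gives "Aug 11 12:00:00 2021 GMT".
# cert_expiry is a hypothetical helper used for illustration only.
cert_expiry() {
  sed -n 's/^notAfter=//p'
}

# Usage:
#   echo | openssl s_client -connect writings.com:443 -servername writings.com 2>/dev/null \
#     | openssl x509 -noout -enddate | cert_expiry
```

Certbot is supposed to renew certificates on its own. If you want to make sure that renewal would work, certbot renew --dry-run performs a test run without replacing your current certificate.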

Conclusion

Publishing your writings in a self-sovereign way is really easy, but it might feel frightening if you aren’t familiar with the command line and server administration. The thing is: if writing is your work or even a hobby, being your own publisher is an investment worth making. The scheme I described allows you to update your website in no time, and it also delivers the best possible performance to your readers, saving them time and nerves.

A sceptical person might argue that all this “self-sovereignty” is a lie, because you’re still dependent on your hosting and DNS providers. Well, DNS is just a convenience feature, you can live without it. The real problem is your hosting provider, and the fact that it can block your webserver and take away your IP address. This is a seemingly unavoidable dependency and having this single dependency is still better than having many dependencies, isn’t it?

Is it even possible to make your website fully uncensorable? Yes, and it’s actually pretty easy. The method I described needs only a few little adjustments in order to make your website available via the Tor network. Tor services can be easily hosted from home, and they don’t even need IP addresses. That’s what we’re going to do next, stay tuned.