Using GeoIP2 Databases With NGINX

I’ve been using Google Analytics for a while, but I never liked the side effects of collecting website usage data this way, such as having to install Google’s tracking scripts on all of my web pages. Those scripts are bad for both user privacy and website performance, so I’ve decided to get rid of them and find an alternative way to collect the data I’m interested in. In this post I’m going to show how to set up an NGINX web server that can log the location of its clients.

NGINX Access Log

The default NGINX access log entries have the following format:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for"';

This format produces entries like the following in NGINX’s access.log file (each entry is a single line, wrapped here for readability):

183.88.21.120 - - [16/Apr/2019:07:03:23 +0000] "GET / HTTP/1.1" 
200 612 "-" 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0" "-"

As you can see, NGINX provides us with a lot of useful information by default. We can see the IP addresses of our clients as well as their operating systems and their browser info. Unfortunately, there is no information about the geographical location of our visitors. I’m not talking about GPS precision, of course, but it would be nice to know the country of origin of every incoming request. The country info may be used for analytics as well as for blocking requests from certain countries.

Where Are Our Clients From?

It’s not that hard to find the location of your visitors if you know their IP addresses, so the only thing we have to do is check those IP addresses against a country database. Luckily for us, such a database is available for free on the MaxMind website.

In order to use this database, you need to install the MaxMind DB C library on your webserver. The installation instructions are provided on GitHub and here is how a typical setup might be performed on Alpine Linux:

apk add alpine-sdk perl
MAXMIND_VERSION=1.2.1
wget https://github.com/maxmind/libmaxminddb/releases/download/${MAXMIND_VERSION}/libmaxminddb-${MAXMIND_VERSION}.tar.gz
tar xf libmaxminddb-${MAXMIND_VERSION}.tar.gz
cd libmaxminddb-${MAXMIND_VERSION}
./configure
make
make check
make install
ldconfig

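That’s it. Now you can use the mmdblookup utility, which is installed along with libmaxminddb, to get the geolocation of any IP address. Here is an example, assuming the country database has been saved to /usr/share/geoip: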

mmdblookup \
  --file /usr/share/geoip/GeoLite2-Country.mmdb \
  --ip 46.35.64.0 country names en

 "Yemen" <utf8_string>

Connecting GeoIP2 Database to NGINX

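Now it’s time to automate our IP address lookups by connecting the GeoIP2 database to NGINX via ngx_http_geoip2_module, which reads the database through the libmaxminddb library we have just installed. First, clone the module’s source code: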

git clone https://github.com/leev/ngx_http_geoip2_module /ngx_http_geoip2_module

Next, you need to run NGINX’s ./configure script with the following parameter:

--add-dynamic-module=/ngx_http_geoip2_module
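For context, here is a rough sketch of what the whole build step might look like when the module is compiled against a separately downloaded NGINX source tree. The function name build_geoip2_module is made up for this post, and the sketch assumes that the build tools, libmaxminddb, and the PCRE and zlib development headers are already installed, and that the module was cloned to /ngx_http_geoip2_module as shown above. The --with-compat flag is only needed when the resulting .so file will be loaded into an NGINX binary that was built separately.

```shell
# Hypothetical helper: build ngx_http_geoip2_module as a dynamic module.
# $1 is the NGINX version to build against; it should match `nginx -v`.
# The body runs in a subshell so the caller's working directory is untouched.
build_geoip2_module() (
    set -e
    nginx_version="$1"
    wget "https://nginx.org/download/nginx-${nginx_version}.tar.gz"
    tar xf "nginx-${nginx_version}.tar.gz"
    cd "nginx-${nginx_version}"
    # Assumes the module source lives in /ngx_http_geoip2_module
    ./configure --with-compat --add-dynamic-module=/ngx_http_geoip2_module
    # Build only the dynamic modules; the output is objs/ngx_http_geoip2_module.so
    make modules
)
```

After a call such as build_geoip2_module followed by your NGINX version number, the resulting objs/ngx_http_geoip2_module.so would still have to be copied into NGINX’s modules directory.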

Adding Country Info to NGINX Access Log

Now let’s open our nginx.conf and add the following lines:

#...

load_module modules/ngx_http_geoip2_module.so;

#...

http {
    #...

    geoip2 /usr/share/geoip/GeoLite2-Country.mmdb {
        $geoip2_data_country_code source=$remote_addr country iso_code;
        $geoip2_data_country_name source=$remote_addr country names en;
    }  

    log_format  main_geo  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for" '
                          '$geoip2_data_country_code $geoip2_data_country_name';

    access_log /var/log/nginx/access.log main_geo;
    
    #...
}

You can see the whole nginx.conf file here: https://github.com/bubelov/nginx-alpine-geoip2/blob/master/nginx.conf

The log format defined above is identical to the default log format shipped with most NGINX distributions, except that we appended two more variables to the end of each log entry: $geoip2_data_country_code and $geoip2_data_country_name. Those variables hold the country info that the module looks up in the database for the $remote_addr of each request.

Running GeoIP2 Enabled NGINX in Docker

I’ve created this sample project that shows how to use such a setup in Docker. This project is based on the official nginx-alpine Dockerfile with minimal modifications so you may assume that it is configured in the same way as the official image, except for the added plugin and database lookup utility. It also has the countries database built in but you can always override it with your own database using the Docker volume mounting features.

You can build this project yourself or just use a prebuilt image to run an already configured NGINX instance based on the nginx-alpine image:

docker run --rm -p 80:80 bubelov/nginx-alpine-geoip2

Check the log output now. You should see the country info at the end of your log records:

183.88.21.120 - - [16/Apr/2019:09:08:55 +0000] "GET / HTTP/1.1" 
200 612 "-" 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0" "-" 
TH Thailand
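With the country code in the log, a per-country tally takes only a few lines of shell. The sketch below is just an illustration: count_by_country is a made-up helper name, and it assumes that every entry is a single line in the main_geo format, so the country code is the first word after the last double quote.

```shell
# Hypothetical helper: count requests per country code in a main_geo log.
# $1 is the path to the access log.
count_by_country() {
    awk -F'"' '
        {
            # $NF is everything after the last double quote, e.g. " TH Thailand"
            n = split($NF, parts, " ")
            # Entries without geo data are grouped under "unknown"
            code = (n > 0) ? parts[1] : "unknown"
            counts[code]++
        }
        END { for (c in counts) print counts[c], c }
    ' "$1" | sort -k1,1nr -k2,2   # busiest countries first, ties sorted by code
}
```

You could run it as count_by_country /var/log/nginx/access.log to get one line per country, with the request count first.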

Alternative

It is also possible to enrich NGINX logs with geo data at a later stage. We may let NGINX write logs as is, in its default format, and then process them line by line, mapping IP addresses to geolocations. I really like the Filebeat Nginx module, which takes this approach and works out of the box with the Elastic Stack.
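To make the idea of line-by-line post-processing more concrete, here is a minimal sketch built on mmdblookup. It is not how Filebeat does it; it only illustrates the principle. The function names lookup_country and enrich_log and the GEOIP_DB variable are made up for this post, and the sketch assumes that the client IP address is the first field of every log line, as in the default format.

```shell
# Hypothetical helper: print the ISO country code for an IP address,
# or nothing if the lookup fails. GEOIP_DB is an assumed override for
# the database path used earlier in this post.
lookup_country() {
    mmdblookup --file "${GEOIP_DB:-/usr/share/geoip/GeoLite2-Country.mmdb}" \
        --ip "$1" country iso_code 2>/dev/null |
        # mmdblookup prints lines like:  "TH" <utf8_string>
        sed -n 's/^[[:space:]]*"\(.*\)" <utf8_string>.*$/\1/p'
}

# Hypothetical helper: read log lines on stdin and append a country code
# to each of them, or "-" when the country is unknown.
enrich_log() {
    while IFS= read -r line; do
        ip=${line%% *}                  # first field is $remote_addr
        code=$(lookup_country "$ip")
        printf '%s %s\n' "$line" "${code:--}"
    done
}
```

It could be used as enrich_log < /var/log/nginx/access.log. Starting a new process for every line is slow, so this is only practical for small logs or one-off analysis.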

Conclusion

Now we have an NGINX server that can identify the geolocation of its clients. This information is stored in NGINX’s access.log and can be processed by the metrics or log analysis tools of your choice, such as Logstash or Telegraf. You can also get more precise locations by plugging in the cities database, following the instructions provided in the ngx_http_geoip2_module repository.