Categories
docker postgres

Recreating YikYak with Postgres.

YikYak was an anonymous social network that used your location to show you posts 5km around you. Users of the app could create new posts and the people around them could view the posts and vote up or down.

YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called an R-Tree.

An example on an R-Tree in action

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.

Luckily the Postgres database enables us to make use of this data-structure via geospatial extensions. In this post I am going to;

  1. Show how we can enable those extensions.
  2. Seed a few posts into our database.
  3. Find the posts in a small around a specific latitude and longitude using a SQL query.

Let’s get started!


Creating tables.

Firstly you will need an instance of Postgres. It is easy to set up in Docker (I’ve detailed a post here showing how).

I am going to be using DBeaver for this tutorial but you could use psql or any other Postgres connector. Let’s creating a new table for our posts.

Select the SQL Editor
Chose whatever Database you want. I am going with Postgres
Name your script

Ready to go – So below we have a simple example of table for storing new posts. I am using a split latitude and longitude to show how the extensions work, but you could also combine the two into a POINT datatype if you are planning to use a lot of columns.

CREATE TABLE post (
	id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
	post_content text NOT NULL,
	latitude float8 NOT NULL,
	longitude float8 NOT NULL
);

On executing that you should have a table you can start insert values into.

Inserting posts.

So let’s start out by inserting two posts, the first posted from 10 Downing Street, and the second from Buckingham Palace.

INSERT INTO post VALUES (
	default,
	'I absolutely love the Queen. I hope she thinks I am doing a good job.',
	51.5034,
	0.1276
);
INSERT INTO post VALUES (
	default,
	'The new Prime Minister is a prat! I do hope he doesnt come over often',
	51.5014,
	0.1419
);

Now let’s put another post in from an aspiring politics student who is located in Cambridge University (65 miles away). Now we have an outlier that won’t show up once we do location bound queries later in this tutorial.

INSERT INTO post VALUES (
	default,
	'Day one of my politics degree. Shall be most fun to stalk the halls of Westminister in 4 years.',
	52.2053,
	0.1218
);

Installing Postgres extensions

We would like to be able to stand in St. James park (a large park between 10 Downing Street and Buckingham Palace) and see the two posts close by, but not the one from Cambridge.

So how do we do that? Through extensions! Postgres enables users to incrementally add features that help us do new things with our data.

Once they are installed we can use the latitude and longitude of 51.5032, -0.1349 to create a new select query on our posts table.

You can install extensions in Postgres simply by running a query. The two extensions we need are cube and earthdistance.

CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

After executing those two queries, you should see them under the ‘Extensions’ tab in DBeaver.

Finding nearby posts.

We can now use these built in functions from those extensions to show us the two nearby posts.

SELECT * FROM post
WHERE 
	earth_box(ll_to_earth(51.5032,-0.1349), 50000) 
	@> ll_to_earth(latitude, longitude);

The earth_box function takes two parameters, a point (which is returned by the ll_to_earth function) and a value for the size of the bounding box we want which is in metres.

By using the contains? operator (@>) we are saying we only want values in the table in the bounding box generated by the earth_box function.

When executing that query we will see the two posts we were expecting! Try increasing the bounding box range out and you will be able to see the Cambridge post.

So now we have a working example of how to recreate the YikYak location-based functionality.

So…how?

Okay so why did we need those extensions? Can’t we just take the world, split it into squares and determine which box a latitude and longitude falls into?

Thats what we would like to do – but there are complications caused by the fact that the world is a sphere. To find posts “in your area” you are querying to find straight line distances between two points, your lat-long and for each row in the database. In a sphere there are no straight lines.

There is a way to determine the distance between two points known as the Great-Circle distance. Instead of using straight lines we use circles or curves known as geodesics. Through any two points on a sphere that are not directly opposite each other, there is a unique great circle.

The earthdistance extension allows us to generate queries using the contains? operator from the cube extension to generate efficient distance lookups between points.

Conclusion

One thing to note is that this query will do a sequential scan of the entire table, which can be slow once you get up to thousands of posts.

If you do decide to use this setup in your application you should create an index on the latitude, longitude to dramatically speed up queries. That would look like this.

CREATE INDEX loc_index ON post USING gist (ll_to_earth(latitude, longitude));

Postgres will then determine whether it needs to use this index to speed up queries. You can check if the index is being used by using a tool to view the execution plan when you run the query detailed above. If it says SEQ_SCAN it is not using the index.

And we’re done! If you’ve noticed any mistakes or improvements I can make please drop me an email at adam@adamfallon.com

Categories
docker

Postgres on Docker

Don’t care about why you would want to run Postgres in Docker? Skip to the commands.

Applications and Websites are a little bit useless without data.

So we intrepid developers like to hold that data somewhere to be used later. There are many forms of storage that we can use, such as cookies, local storage in the browser, a SQLite database for our iOS and Android apps or a database that we can pull data out of using an API.

When working on a project that uses a database, it can be painful to keep every ones environments in sync – if Alice is using Postgres version 4 and Bob is using version 6 there could be a time where Bob writes some procedure that runs fine on his database but when Alice tries to run it – it blows up!

This is where Docker comes in handy, you can ship a Dockerfile in your projects source code, put it up on Github and tell all the developers working on your project to run that Dockerfile and you can rest assured that your code will continue to work.

In this tutorial I am going to explain how to install & use Postgres, a popular relational database.

Lets get started!

Installing Docker

First we need to setup Docker if you haven’t already done so. Once that is done, make sure Docker is running by opening your terminal and typing.

docker --version 

If you get an error along the lines of;

Cannot connect to the Docker daemon. Is the docker daemon running on this host?

You need to run Docker CE. How to do that changes between operating systems click here for a thread on how to troubleshoot that.

If everything went well you should see the current version number and are ready to go on to the next steps.

Creating a persistent volume

We want to create a folder on our computer so when we restart the Docker container all of our data will still be there. If we didn’t take this step, each time we start and stop our container we would be starting from scratch!

Open your terminal and type the following to create that volume.

mkdir -p $HOME/docker/volumes/postgres

Pulling Postgres Image from DockerHub

Here we are going to see the beauty of Docker. So the Postgres team have already created a preset image that developers who want to use Postgres can download and use. It contains all the code packaged in a way that can be shared and used with confidence by all the developers in your team. Okay, lets hop back over to your terminal and run;

docker pull postgres

Running a container using the Postgres Image

Now we want to create an instance of a database (i.e we want to create a container using the Postgres image we just pulled as a base) and we need to pass it some parameters for it to work the way we want it to.

docker run \
    --rm --name pg-docker \
    -e POSTGRES_PASSWORD=docker \
    -d -p 5432:5432 \
    -v $HOME/docker/volumes/postgres:/var/lib/postgresql/data \
    postgres

So lets unpack this a little;

  • docker run -rm –name pg-docker
    • Run an container (removing it if it already exists) called pg-docker.
  • -e POSTGRES_PASSWORD=docker
    • Set the environment variable POSTGRES_PASSWORD to ‘docker’
  • -d -p 5432:5432
    • -d = Run in detached mode (i.e return control to the terminal after running.)
    • -p Map local port 5432 to Container port 5432 (the default Postgres Port)
  • -v $HOME/docker/volumes/postgres:/var/lib/postgresql/data
    • Map the local volume we created earlier to the internal folder on the container named /var/lib/postgresql/data
  • postgres
    • Use the image named ‘postgres’ which we pulled earlier

Congrats!

You should now be running Postgres on Docker!

You can verify by running

docker ps | grep pg-docker

Which should show you the following;

Postgres running in Docker!

Now you can use psql to connect to the running instance, or connect to it from your applications, or view the database using a GUI tool like DBeaver!

Fancy making something useful now you’ve got Postgres set up? Click here to see how to use Postgres to make a location-based social network.


TL;DR

docker pull postgres;
mkdir -p $HOME/docker/volumes/postgres;
docker run \
    --rm --name pg-docker \
    -e POSTGRES_PASSWORD=docker \
    -d -p 5432:5432 \
    -v $HOME/docker/volumes/postgres:/var/lib/postgresql/data \
    postgres
psql -h localhost -U postgres -d postgres