blog
PostgreSQL Provisioning and Deployment
MySQL and PostgreSQL – two open source relational database management systems, each with their own set of fans. For years, MySQL has been a source of fun and amusement in the PostgreSQL community – default, very relaxed SQL mode, lack of support for more complex joins, no bitmap indices and so on. On the other hand, MySQL fans pointed out the robustness of MySQL, and how easy it is to scale out with replication.
You may be asking – why would you use both of these databases in one environment? Is there any value in having a replicated PostgreSQL setup running alongside a Galera Cluster? In this blog post, we’ll try to answer this question from the MySQL DBA standpoint. We’ll also discuss different methods of deploying PostgreSQL.
Why PostgreSQL?
Performance of complex queries
One of the strong points of PostgreSQL is its ability to handle complex queries really well, and in an efficient way. PostgreSQL has the ability to use different join algorithms (like hash join), and it is very flexible when it comes to building optimal execution plans. MySQL, on the other hand, supports only nested loop join which may not be optimal for every query.
Another reason why PostgreSQL may be faster than MySQL is subquery handling. The MySQL optimizer is far from perfect, when it comes to executing subqueries. In the past, it executed most of them as a dependent subquery and manual query rewrite was required to speed up this type of query. In MySQL 5.6, some improvements have been made and now, the materialization of the subquery is possible. MySQL 5.7 improved handling of this type of queries even further, but it can still produce a less optimal query execution plan than PostgreSQL.
PL/pgSQL and more
Creating routines in MySQL is really hard work. The programming language available in it is limited and far less flexible than languages available in other RDBMS systems, PostgreSQL included. Building stored procedures in PostgreSQL are so much easier than in MySQL – it makes it possible to build a set of stored routines and move even complex operations to the database. With MySQL you will, most likely, have to move some of the logic to the application due to the limitations of MySQL’s stored procedure language. What’s even better with PostgreSQL is that PL/pgSQL is not the only language you can write procedures in – it can be C, but also PL/Tcl, PL/Perl and PL/Python, which are included in the core distribution.Many others can be added.
JSON and GIS support
PostgreSQL comes with support for JSON and GIS data – if you use one of those types of data, you may want to include PostgreSQL in your environment. When it comes to MySQL, GIS data is supported via MyISAM tables and it was only recently, introduced to InnoDB in MySQL 5.7. JSON data type is also available in MySQL 5.7. At the time of writing, Galera does not support 5.7 so Galera users have the option of either using a separate 5.7 instance or PostgreSQL for that type of data. Another argument for using PostgreSQL is the fact that GIS and JSON support in InnoDB is fairly new. Those features are not mature, and may need some time to iron out the early issues. PostgreSQL gives you a well-tested alternative.
Deploying PostgreSQL
Repositories
PostgreSQL is available in a majority of Linux distributions so it’s very likely you can install it through simple yum or apt-get command.
Once installed, PostgreSQL comes with a secured access – you can access it only from the localhost after you change your user to ‘postgres’. Once there, you can reach the PostgreSQL CLI:
root@ip-172-30-4-22 ~ # su - postgres
postgres@ip-172-30-4-22:~$ psql
psql (9.3.13)
Type "help" for help.
postgres=#
Of course, you can start managing your database right away, but it’d be much better to configure access correctly and create users.
Access methods for PostgreSQL are defined in a file: pg_hba.conf. It can be located in various places. On Ubuntu 14.04 it is located in /etc/postgresql/9.3/main/pg_hba.conf, on Centos 7 on the other hand it’s located by default in /var/lib/pgsql/data/pg_hba.conf. Let’s take a look at a sample from Ubuntu:
# Database administrative login by Unix domain socket
local all postgres peer
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all peer
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 md5
As you can see, we have four rules defined. The first one tells us that user ‘postgres’ can access all databases locally as long as it’s logged in the OS as the same user (‘peer’ method). A similar rule governs the rest of the local access – you can create, let’s say, ‘myuser’ user in PostgreSQL and you’ll be able to access it passwordless as long as you are logged as ‘myuser’ in the OS. On the other hand, all local TCP connections, both for IPv4 and IPv6 require md5 passwords.
As next step, we need to create users other than ‘postgres’. We can do it easily by running:
postgres@ip-172-30-4-22:~$ createuser -P --interactive
Enter name of role to add: s9suser
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
postgres@ip-172-30-4-22:~$ createuser -P --interactive
Enter name of role to add: s9sadmin
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
In our example above we’ve created two users – s9suser, which is intended to be a regular user, and s9sadmin, which will be a superuser. Both have been created with password. There are more ways you can create users with this tool, check the ‘–help’ or man output for more details.
Once users have been created, it’s time to create a database. It also can be done from shell (of course, all those steps can be executed using SQL, there’s no need to use shell tools for that).
postgres@ip-172-30-4-22:~$ createdb s9sdb
Finally, it’s time to access our database:
postgres@ip-172-30-4-22:~$ psql -U s9suser -W -h 127.0.0.1 s9sdb
Password for user s9suser:
psql (9.3.13)
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.
s9sdb=>
We are in. Unfortunately, at this point, we cannot connect to our PostgreSQL instance from outside of our localhost. To change this, a few steps are required. First, we need to make our PostgreSQL listen on interface other than loopback. We need to find and edit the postgresql.conf file. It’s located in the same directories as pg_hba.conf: /var/lib/pgsql/data/postgresql.conf for Centos 7 and /etc/postgresql/9.3/main/postgresql.conf for Ubuntu 14.04. When you open it with a text editor of your choice, you’ll see, among others, the following entry:
#listen_addresses = 'localhost' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost'; use '*' for all
# (change requires restart)
As can be seen here, by default, PostgreSQL listens only on the localhost and we need to change this value to something else in order to accept external connections. Such change requires a restart of the service. Let’s change it to our local IP and restart PostgreSQL. After it comes back up, we can confirm it’s indeed listening there:
root@ip-172-30-4-22 ~ # netstat -lnp | grep 5432
tcp 0 0 172.30.4.22:5432 0.0.0.0:* LISTEN 20026/postgres
unix 2 [ ACC ] STREAM LISTENING 2000624 20026/postgres /var/run/postgresql/.s.PGSQL.5432
Let’s try to connect to our PostgreSQL from a different host:
[root@ip-172-30-4-212 ~]# psql -h 172.30.4.22 -U s9sadmin s9sdb
psql: FATAL: no pg_hba.conf entry for host "172.30.4.212", user "s9sadmin", database "s9sdb", SSL on
FATAL: no pg_hba.conf entry for host "172.30.4.212", user "s9sadmin", database "s9sdb", SSL off
We got the complaint about a lacking pg_hba.conf entry. We need to fix this by adding the following line in pg_hba.conf:
host all all 172.30.4.212/32 md5
Obviously, if you have more clients, you may need to add more entries for other hosts or you can define a broader network range with masks. At the end you may even just use 0.0.0.0/0 entry to accept traffic from every source.
Changes in pg_hba.conf require reloading of the PostgreSQL service before they are applied to the connections. Please keep in mind that reloading PostgreSQL (i.e. service postgresql reload) does not break current connections so it’s fairly non-disturbing way of applying configuration changes like those. Finally, let’s check if we can connect from an external host:
[root@ip-172-30-4-212 ~]# psql -h 172.30.4.22 -U s9sadmin s9sdb
Password for user s9sadmin:
psql (9.2.15, server 9.3.13)
WARNING: psql version 9.2, server version 9.3.
Some psql features might not work.
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.
s9sdb=#
Yes, we are in!
Deployment using orchestration software
Obviously, repositories are not the only way to install PostgreSQL. These days, automation is key to efficiently manage your environment and tools so it’s crucial that software can be easily deployed and configured using those tools.
Ansible
PostgreSQL is supported to a large extent through different core modules like postgresql_db to manage databases, postgresql_user to manage users, postgresql_privs to manage privileges and a couple of others. You can find more information on them in the Ansible documentation.
Chef
A cookbook for PostgreSQL prepared by Opscode can be found in Chef supermarket. “Postgresql” cookbook will take care of installation of the database and configuration of PostgreSQL
Puppet
A “postgresql” module from puppetlabs supports installation and configuration of PostgreSQL. It does support user management or even initial configuration tuning.
ClusterControl
PostgreSQL deployment is supported directly from ClusterControl – one can easily deploy a master – slave environment in a couple of clicks. The deployed instances are automatically configured. Other functionality include monitoring and trending data, database advisors and backups.
Summary
As you can see, PostgreSQL can be easily deployed. No matter how you deploy it, either using one of the automation tools or doing it manually from repo, some initial configuration will be required. We will look into it in our next post in this series.