Category Archives: Databases

[Video 435] Simon Riggs: Databases — The Long View

Databases form the cornerstone of many applications, Web sites, and platforms. A huge amount of time, money, and research has been poured into databases over the last few decades. But our thirst for data, and the quantities that we’re trying to read and analyze, continue to grow. What can and should we do? How can we ensure reliability?  How can we communicate with a growing number of other systems? And where does PostgreSQL, an open-source relational database with a growing number of features, fit into this trend? In this talk, Siimon Riggs answers all of these questions, and describes how PostgreSQL’s developers are working to keep as many of these questions in mind as they continue to improve their contribution to the world of databases.

Time: 22 minutes

[Video 378] Jim Mlodgenski: PostgreSQL Federation

PostgreSQL is a well-known and powerful relational database. But it is increasingly a platform that can integrate with other databases and storage systems, using a number of “foreign data wrappers” and other integration mechanisms. How do these work, and how do they allow you to read from and write to other PostgreSQL servers, relational database systems, and even NoSQL systems? And how can PostgreSQL thus become the center of a heterogeneous database federation, rather than a standalone server? In this talk, Jim Mlodgenski introduces the theory and practice of combining various database systems into a single system, all coordinated by PostgreSQL.

[Video 356] Alvaro Hernandez: Logical Decoding

For several years now, PostgreSQL has supported streaming replication. The idea, simply put, is that a master database streams database changes to one or more slave machines. The slave machines then replay those changes, providing us with identical PostgreSQL machines that can be queried in read-only mode. But the setup of such slaves can be a bit complex, and the streaming describes the precise changes that occurred in the database, rather than a logical description of those changes. Logical decoding is a new feature in PostgreSQL that opens to door to better streaming replication, and also to new applications that can take advantage of these logical descriptions. In this talk, Alvaro Hernandez describes the motivation behind logical decoding, the ways in which you can implement it, and ways in which you might want to take advantage of it — for replication, but beyond it, as well.

[Video 353] Sean Chittenden: PostgreSQL Service At Scale

PostgreSQL is a popular open-source database. But can it handle large-scale applications? In particular, can it scale up to handle large Web applications, with millions of users and many billions of dollars at stake? The answer is “yes,” but that just raises the question of “how,” and what techniques can (and should) be used to scale it up in this way. In this talk, Sean Chittenden describes the aspects of scaling, and some of the different ways and layers that can fail. He then describes how Groupon has tried to reduce the risk of failure, and how they have used PostgreSQL in this context, and the ways that you can tune context appropriately (and inappropriately) to scale it up.

[Video 347] Josh Berkus: Explain Explained

If your PostgreSQL query runs slowly, what should you do? How can you find out what has gone wrong, and how to optimize it? The “explain” command (and related ‘”explain analyze” command) tell you what the PostgreSQL query planner intends to do, and then (if you use the “analyze” option) executes the query, as well.  The problem is that the output from “Explain” can often be cryptic to newcomers. In this talk, Josh Berkus introduces the “explain” command, and shows how database developers and administrators can use it to improve the speed of their queries.

[Video 345] Peter Geoghegan: UPSERT Use Cases

The upcoming (9.5) version of PostgreSQL will include the long-awaited “UPSERT” functionality, which makes it possible to insert-or-update with a single query.  How is this different from other functionality, such as MERGE? What are the use cases for this functionality? And what should we keep in mind when we use it? In this talk, Peter Geoghegan introduces UPSERT, compares it with other alternatives (including in MySQL), and demonstrates when and why we can look forward to seeing it in the upcoming version of PostgreSQL.

[Video 311] Rodrigo Schmidt: Scaling Instagram Data Systems

Instagram is a huge Web application, and has needed to scale very rapidly. In this talk, Rodrigo Schmidt asks the questions: What do you scale, how do you scale, and why do you even need to scale? You can’t scale everything at once, so how do you prioritize which things should be scaled first? How can you grow from a Web application to a mix of Web and mobile users?

[Video 309] Andrew Godwin: Dubious Database Design

Web applications generally have a database on the back end. But how should we structure that database? There are many ways to structure our database in a database, and some of them are far more efficient and manageable than others. In this talk, Andrew Godwin discusses and shows many of the different ways in which people use and abuse their databases for back-end storage, and indicates why these might be bad ideas — whether you’re using Django, or any other Web application framework.

[Video 270] Jamey Hanson: If you can’t beat ’em, join ’em (a pun)

SQL vs. NoSQL has become a huge war, in some ways. Which is better? The answer is, of course, “it depends.”  Each has its advantages and disadvantages. But PostgreSQL, which is becoming more of a platform than just a database, now offers us ways to combine SQL and NoSQL documents into the same database model. How does this work? When and why would we want to do this? In this talk, Jamey Hanson gives us a number of examples of how, when, and why combining SQL and NoSQL is both possible and preferable.

[Video 269] Nirbhay Choubey: Galera cluster — Best practices

As your applications scale, you’ll likely find that your database is one of the major bottlenecks. There are a variety of options available for scaling a database server; one is master-slave replication, and another (but often trickier) is master-master replication. MariaDB, an open-source fork of MySQL, supports an add-on package known as MariaDB Galera Cluster, which offers a variety of replication options, including multi-master replication. How does it work? How do you configure it? And does this option apply to your needs? This talk, by Nirbhay Choubey, should provide you with most of the answers that you need.