Saturday, September 21, 2013

Tips to Build a Fault-tolerant Database Application

Applications should be written taking into account that errors will eventually happen and, in particular, database application developers usually consider this while writing their applications.

Although the concepts required to write such applications are commonly taught in database courses and are fairly widely known, building a reliable and fault-tolerant database application is still not an easy task. It hides some pitfalls, which we highlight in this post through a set of suggestions or tips.

In what follows, we consider that the execution flow in a database application is characterized by two distinct phases: connection and business logic. In the connection phase, the application connects to a database, sets up the environment and passes control to the business logic phase. In the business logic phase, it gets inputs from a source, which may be an operator, another application or a component within the same application, and issues statements to the database. Statements may or may not be executed within the context of a transaction.

First Tip: Errors may happen so plan your database application taking this into account

So we should catch errors, i.e. exceptions, in both the connection and business logic phases. This idea can be translated into code using Python as follows:

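from mysql.connector import MySQLConnection
from mysql.connector.errors import DatabaseError, InterfaceError

class Application(object):
    def __init__(self, **db_params):
        self.__cnx = None
        self.__cursor = None
        self.__db_params = db_params

    def connect(self):
        # Connection phase: open the connection and create a cursor.
        try:
            self.__cnx = MySQLConnection(**self.__db_params)
            self.__cursor = self.__cnx.cursor()
        except InterfaceError:
            print("Error trying to get a new database connection")

    def business(self, operation):
        # Business logic phase: _do_operation() issues the statements.
        try:
            self._do_operation(operation)
        except DatabaseError:
            print("Error executing operation")

if __name__ == "__main__":
    # get_params() and get_operation() are application-specific helpers.
    app = Application(**get_params())
    app.connect()

    while True:
        app.business(get_operation())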

The InterfaceError class identifies errors in the interface, i.e. connection, between the application and the MySQL Server. The DatabaseError class identifies errors associated with database operations.

In this simple example though, the application may abort after any connection error. For instance, a MySQL Server will automatically close a connection after a period of inactivity thus causing an application error if one tries to use the invalid connection.

Second Tip: Set up the appropriate timeout properties

There are two properties which fall under this suggestion:

  • wait_timeout - A MySQL option that defines how long a connection may stay idle, i.e. without any communication between the application and the MySQL Server, before the MySQL Server closes it.
  • connection_timeout - Sets the socket_timeout property in the Connector Python, which defines the maximum amount of time that a socket created to connect to a database will wait for an operation to complete before raising an exception.

The wait_timeout must be set according to the application's characteristics. On the other hand, the connection_timeout property is usually set to zero, which means that there will be no socket timeout period. Only in rare cases, such as when applications must execute operations within a fixed interval, should we set it.
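As a rough illustration of where the two properties end up, the sketch below builds the connection parameters and the session statement without contacting any server. The helper names build_connection_params and wait_timeout_statement are made up for this example; connection_timeout is passed as a connection argument and wait_timeout is set with a SET SESSION statement once the connection is open.

```python
def build_connection_params(base_params, connection_timeout=None):
    """Return a copy of base_params, optionally with connection_timeout set.

    Hypothetical helper: the result is meant to be passed as keyword
    arguments to the connector when opening the connection.
    """
    params = dict(base_params)
    if connection_timeout is not None:
        params["connection_timeout"] = connection_timeout
    return params


def wait_timeout_statement(seconds):
    """Build the statement that sets wait_timeout for the current session."""
    seconds = int(seconds)
    if seconds < 1:
        raise ValueError("wait_timeout must be at least 1 second")
    return "SET SESSION wait_timeout = {0}".format(seconds)


if __name__ == "__main__":
    params = build_connection_params(
        {"host": "localhost", "user": "app"}, connection_timeout=5)
    print(params)
    print(wait_timeout_statement(600))
```

The statement would be executed through a cursor right after connecting, so that it is applied again whenever a new connection is obtained.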

Third Tip: Connection errors may happen at any time so handle them properly

The previous measures will not circumvent problems related to transient network issues or server failures, though. To handle this type of problem, one needs to consider that a connection may fail at any time. This requires catching connection errors while executing the business logic as well, and getting a fresh connection to proceed with the execution. In other words, it requires combining the two aforementioned phases. This idea can be translated into code as follows:

from mysql.connector import MySQLConnection
from mysql.connector.errors import DatabaseError, InterfaceError

class Application(object):
    def __init__(self, **db_params):
        self.__cnx = None
        self.__cursor = None
        self.__db_params = db_params.copy()

    def connect(self):
        try:
            self.__cnx = MySQLConnection(**self.__db_params)
            self.__cursor = self.__cnx.cursor()
        except InterfaceError:
            print("Error trying to get a new database connection")

    def business(self, operation):
        try:
            self._do_operation(operation)
        except (AttributeError, InterfaceError):
            # AttributeError: e.g. the cursor is still None after a failed
            # connect(). InterfaceError: the connection was lost.
            print("Database connection error")
            self.connect()
        except DatabaseError:
            print("Error executing operation")

if __name__ == "__main__":
    # get_params() and get_operation() are application-specific helpers.
    app = Application(**get_params())
    app.connect()

    while True:
        app.business(get_operation())
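The business() method above asks for a fresh connection once after a failure. If the server stays unreachable for a while, the application may prefer a bounded number of attempts with a pause between them. The following is only a sketch of that idea: ConnectionFailed stands in for the connector's connection error, and connect_with_retry and its parameters are names invented for this example.

```python
import time


class ConnectionFailed(Exception):
    """Stand-in for a connector's connection error (e.g. InterfaceError)."""


def connect_with_retry(connect, attempts=3, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are exhausted.

    connect is any zero-argument callable that returns a connection or
    raises ConnectionFailed. The last error is re-raised so that the
    application, not this helper, decides what to do next.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionFailed as error:
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)  # wait a little longer each time
    raise last_error
```

Note that the helper only re-establishes the connection. Whether the failed operation can be executed again is still a decision for the application.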

In general, connectors cannot hide connection failures from the application because this may lead to data inconsistency. Only the application has enough knowledge to decide what is safe to do and as such any failure, including connection failures, must be reported back to the application. In what follows, we depict a problem that may happen when a connector tries to hide some failures from the application:



When the connection fails, the server rolls back the ongoing transaction, thus undoing any change made by the first insert statement. However, the connector gets the error, automatically tries to reconnect and succeeds. With a valid connection to the server, it executes the failed statement again and succeeds. Unfortunately, the application does not find out about the connection issue and continues the execution as if nothing had happened. As a consequence, a partial transaction is committed, leaving the database in an inconsistent state.
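The scenario can be reproduced with a toy model that needs no database at all. Everything in the sketch below (FakeServer, HidingConnector, run_transaction and the row labels) is invented for illustration and does not correspond to any real connector: the fake server keeps uncommitted rows per session and drops them when the session is lost, while the connector silently reconnects and retries.

```python
class ConnectionLost(Exception):
    """Stand-in for a connector's connection error."""


class FakeServer(object):
    """Toy server: uncommitted rows live in the session and die with it."""

    def __init__(self):
        self.committed = []
        self._pending = {}   # session id -> uncommitted rows
        self._next_id = 0

    def open_session(self):
        self._next_id += 1
        self._pending[self._next_id] = []
        return self._next_id

    def kill_session(self, session):
        # Losing the connection rolls back the ongoing transaction.
        self._pending.pop(session, None)

    def insert(self, session, row):
        if session not in self._pending:
            raise ConnectionLost("session {0} is gone".format(session))
        self._pending[session].append(row)

    def commit(self, session):
        if session not in self._pending:
            raise ConnectionLost("session {0} is gone".format(session))
        self.committed.extend(self._pending[session])
        self._pending[session] = []


class HidingConnector(object):
    """Connector that reconnects and retries without telling the caller."""

    def __init__(self, server):
        self.server = server
        self.session = server.open_session()

    def insert(self, row):
        try:
            self.server.insert(self.session, row)
        except ConnectionLost:
            self.session = self.server.open_session()  # silent reconnect
            self.server.insert(self.session, row)      # silent retry

    def commit(self):
        self.server.commit(self.session)


def run_transaction(connector):
    connector.insert("first insert")
    connector.server.kill_session(connector.session)  # connection fails here
    connector.insert("second insert")  # retried silently on a new session
    connector.commit()
    return list(connector.server.committed)


if __name__ == "__main__":
    print(run_transaction(HidingConnector(FakeServer())))
```

Only the second row ends up committed, although the application believes it committed both.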

It is worth noting that if statements are executed in “autocommit” mode, it is still unsafe to hide failures from the application. In this case, an attempt to automatically reconnect and try to execute the statement may lead to the statement being executed twice. This may happen because the connection may have failed after the statement has been successfully executed but before the server has had a chance to reply back to the connector.
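The same kind of toy model shows the duplicate execution. AutocommitServer, ReplyLost and execute_with_hidden_retry are hypothetical names used only in this sketch: the server applies the statement and then loses the reply, and the hidden retry applies it a second time.

```python
class ReplyLost(Exception):
    """Stand-in for a connection error raised while waiting for the reply."""


class AutocommitServer(object):
    """Toy server in autocommit mode: every statement is committed at once."""

    def __init__(self):
        self.rows = []
        self.drop_next_reply = False

    def execute(self, row):
        self.rows.append(row)  # the statement is executed and committed
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ReplyLost("connection failed before the reply arrived")
        return "OK"


def execute_with_hidden_retry(server, row):
    """What a connector does if it hides the failure and simply retries."""
    try:
        return server.execute(row)
    except ReplyLost:
        return server.execute(row)


if __name__ == "__main__":
    server = AutocommitServer()
    server.drop_next_reply = True
    execute_with_hidden_retry(server, "new row")
    print(server.rows)
```

After the call, the row is stored twice even though the application issued the statement once.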



Fourth Tip: Guarantee that session information is properly set after getting a connection

From a fault-tolerant perspective the application looks better now. However, we are still missing one key point.

We should use the "my.cnf" configuration file to set up the necessary MySQL properties (e.g. autocommit, transaction isolation level). However, if several applications share the same database server and require different configuration values, these values should be set in the routine that gets a connection. If you set them elsewhere, you risk forgetting to set the options again when getting a new connection after a failure. Our code snippet already follows this rule, so you are safe in that sense.

This suggestion is especially important when the applications (i.e. components) share the same address space and use a connection pool.
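A minimal sketch of this rule is shown below. The get_connection helper and the SESSION_SETUP tuple are assumptions made for the example, and the two statements are only sample settings. The point is that the session statements run inside the routine that returns the connection, so they are applied again after every reconnection.

```python
# Sample session settings; a real application would list its own.
SESSION_SETUP = (
    "SET autocommit = 0",
    "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
)


def get_connection(connect, session_setup=SESSION_SETUP):
    """Open a connection and apply the session settings before returning it.

    connect is any zero-argument callable returning a DB-API style
    connection, i.e. an object with a cursor() method.
    """
    cnx = connect()
    cursor = cnx.cursor()
    try:
        for statement in session_setup:
            cursor.execute(statement)
    finally:
        cursor.close()
    return cnx
```

Because every caller, including the code that recovers from a failure, goes through get_connection(), there is no second place where the settings could be forgotten.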

We should also avoid using temporary tables and/or user-defined variables to transfer data between transactions. Although this is a common technique among developers, it will fail miserably after a reconnection, because the session information is lost and rebuilding the necessary context may require an expensive routine. So starting every transaction with a “clean slate” is probably the safest and most solid approach.

Fifth Tip: Design all application components taking failures into account

Finally, it is worth noting that if the database fails, the system as a whole will be unavailable. So to build a truly resilient solution, we still need to deploy some type of redundancy at the database level. We will discuss possible high availability solutions for MySQL in a different post.

See http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html

Writing a Fault-tolerant Database Application using MySQL Fabric

In this post, we are going to show how to develop fault-tolerant applications using MySQL Fabric, or simply Fabric, which is an approach to building high availability sharding solutions for MySQL and has recently become available for download as a labs release (http://labs.mysql.com/). We are going to focus on Fabric's high availability aspects, but to find out more about sharding readers may check out the following blog post:

Servers managed by Fabric are registered in a MySQL Server instance, called the backing store, and are organized into high availability groups, which deploy some sort of redundancy to increase resilience to failures. Currently, only the traditional MySQL Asynchronous Replication is supported, but we will consider other solutions in the future, such as MySQL Cluster, Windows Server Failover Clustering and Distributed Replicated Block Device. For a great analysis of alternatives for MySQL High Availability Solutions certified and supported by Oracle, readers should check out the following white paper:

Unlike a traditional MySQL Client Application, a Fabric-aware Client Application does not specify which server it wants to connect to. Instead, it specifies the Fabric instance from which it will fetch information on the available servers in a group. So a Fabric-aware connector is required for this task. Currently, only the PHP, Java and Python connectors have extensions to access Fabric. In the future, we aim to add Fabric support to other connectors, as we are fortunate to have a strong influence over many of the key connectors in the MySQL ecosystem.

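# Requires a Fabric-aware version of Connector Python.
from mysql.connector.fabric import MySQLFabricConnection

class Application(object):
    def __init__(self):
        fabric_params = {
            "fabric" : {"host" : "localhost", "port" : 8080},
            "user"   : "oracle", "passwd" : "oracle"
        }
        self.__cnx = MySQLFabricConnection(**fabric_params)

    def start(self):
        # Select the high availability group before asking for a cursor.
        self.__cnx.set_property(group="YYZ")
        cursor = self.__cnx.cursor()
        cursor.execute("...")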

In this sample code, written in Python, the connector sends a request to a Fabric instance located at address "localhost", port "8080" in order to retrieve information on all servers registered in the "YYZ" group and then creates a connection to the master in the group. The communication between the connector and Fabric is done through the XML-RPC protocol, which has been chosen for being a "quick and easy way to make procedure calls over the Internet". However, the XML-RPC protocol is not known for its performance, so to reduce the overhead of contacting Fabric every time a connector needs information on a group, data is stored in a local cache.
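To give an idea of what such a local cache buys, here is a generic sketch. It is not the connector's actual implementation: GroupCache, its time-to-live policy and the invalidate() method are assumptions made for this example, and fetch stands for whatever remote call returns the servers of a group.

```python
import time


class GroupCache(object):
    """Cache group lookups so that the remote call is made only on a miss."""

    def __init__(self, fetch, ttl=60.0, clock=time.time):
        self._fetch = fetch    # callable: group name -> list of servers
        self._ttl = ttl        # seconds an entry stays valid (assumed policy)
        self._clock = clock
        self._entries = {}     # group name -> (expiry time, servers)

    def lookup(self, group):
        now = self._clock()
        entry = self._entries.get(group)
        if entry is not None and entry[0] > now:
            return entry[1]               # hit: no remote call
        servers = self._fetch(group)      # miss or expired: ask remotely
        self._entries[group] = (now + self._ttl, servers)
        return servers

    def invalidate(self, group):
        """Forget a group, e.g. after a connection error to its master."""
        self._entries.pop(group, None)
```

With a cache like this, a connection error is a natural point to invalidate the group, so that the next lookup fetches the current set of servers again.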

If the current master fails, though, the "InterfaceError" exception is raised. The application is responsible for catching the exception and getting a new connection to the newly elected master if it wants to carry on its regular operations. Sample code follows:
 
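import time

from mysql.connector.errors import InterfaceError
# Requires a Fabric-aware version of Connector Python.
from mysql.connector.fabric import MySQLFabricConnection

class Application(object):
    def __init__(self):
        fabric_params = {
            "fabric" : {"host" : "localhost", "port" : 8080},
            "user"   : "oracle", "passwd" : "oracle"
        }
        self.__cnx = MySQLFabricConnection(**fabric_params)
        self.__cnx.set_property(group="YYZ")
        self.__run = True

    def start(self):
        cur = self._get_cursor()
        while self.__run:
            try:
                self.__cnx.start_transaction()
                cur.execute("...")
                self.__cnx.commit()
            except InterfaceError:
                # The master failed: get a cursor on the new master.
                cur = self._get_cursor()
            time.sleep(1)

    def _get_cursor(self):
        return self.__cnx.cursor()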

Readers will find a full-fledged application, which shows how all the concepts introduced in this post are used together here. The application creates a simple database, a high availability group, registers the MySQL Servers into Fabric and runs a thread that mimics a client and another one that periodically executes a switchover.

To run the application, we need:
  • Python 2.6 or 2.7
  • Three or more MySQL Servers:
    • One backing store (5.6 or later preferable)
    • Two managed servers (5.6 or later necessary)
  • Fabric running
  • Connector Python (Fabric-aware Version) installed

After configuring the environment appropriately, we can run the application as follows:

python switchover_application.py --user=root --passwd="" --group="YYZ" \
--fabric-addresses="localhost:8080" \
--servers-addresses="localhost:13002 localhost:13003"

Please note that Fabric and the MySQL instances may be running at different addresses and ports in your environment, so change this information accordingly.

Note that Fabric is in its early development stages and there is a long way ahead of us. However, we have decided to make it publicly available through a labs release so that you can contribute to the project with comments, feedback or patches. Any feedback or comment is highly appreciated. Leave messages on this blog or contact us through the following forum:

http://forums.mysql.com/list.php?144