clue - Gistlog.co

Introducing MySQL streaming with ReactPHP

by clue

Today, we're very happy to announce the immediate availability of the next major beta release of friends-of-reactphp/mysql, the async MySQL database client for ReactPHP. 🎉

Now that v0.4.0 has been tagged and released today, let's look into why I think this is not only a major milestone for this project, but could also possibly be a game changer in how people use databases with ReactPHP and eventually maybe with PHP in general.

This post aims more for the "why" instead of "what" has changed. If you're upgrading from a previous version of this project, you may want to take a look at the changelog describing all the changes in greater detail. Alright, so let's dive right in.

Promise all the things!

Originally, this project was maintained by Jin Hu who did an excellent job figuring out all the protocol details and building an API that can be consumed with ReactPHP. All this with just plain PHP, without requiring any custom extensions. Again, thank you!

Around 6 months ago, he handed over this project to @friends-of-reactphp to take over maintenance and to address a number of outstanding feature requests and issues. In the meantime, we've used this as a base to successfully address some of these minor issues and released a number of minor maintenance releases. In the last months, we have prepared some major changes that we are now releasing with the v0.4.0 version.

One of the major changes is that its APIs now use promises consistently as return values instead of accepting callback functions. While we understand that BC breaks may be frustrating when updating and usually try to avoid these as a consequence, we believe that it's very well worth it in this instance. To get a better understanding, let's take a look at what an average query looks like with the old API vs. the new promise-based API:

// old
$connection->query('SELECT * FROM user', function (QueryCommand $command) {
    if ($command->hasError()) {
        echo 'Error: ' . $command->getError()->getMessage() . PHP_EOL;
    } elseif (isset($command->resultRows)) {
        var_dump($command->resultRows);
    }
});

// new
$connection->query('SELECT * FROM user')->then(function (QueryResult $result) {
    var_dump($result->resultRows);
}, function (Exception $error) {
    echo 'Error: ' . $error->getMessage() . PHP_EOL;
});

Code-wise this may seem like a small change. However, it is a small change with a huge impact: Promises are one of the core building blocks of ReactPHP libraries. By allowing you to take advantage of promise chaining, we can significantly simplify integration with the vast ReactPHP ecosystem.

For example, we can use this along with ReactPHP's HTTP server component to build a very simple JSON-based HTTP API endpoint (I'll avoid the term "RESTful" here):

$server = new Server(function (ServerRequestInterface $request) use ($connection) {
    return $connection->query(
        'SELECT * FROM user WHERE id = ?',
        [$request->getQueryParams()['id'] ?? -1]
    )->then(function (QueryResult $result) {
        return new Response(
            $result->resultRows ? 200 : 404,
            array(
                'Content-Type' => 'application/json'
            ),
            json_encode($result->resultRows[0] ?? null)
        );
    });
});

The example output from an HTTP request could look something like this:

$ curl -v http://localhost:8080/?id=1
…
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: …

{"id":"1","name":"Alice","ip":"1.2.3.4"}

Streaming large result sets

The above example should be simple enough to understand how you can send any SQL statement to your MySQL database, fetch the results and then send a formatted HTTP response back for your incoming HTTP request. This works very well for smaller result sets, even when you have a larger number of concurrent incoming HTTP requests and thus may end up sending a lot of queries to your database.

Now let's assume instead of fetching only a single user from the database, we may want to fetch all users (or any large subset, such as new users or using any other criteria). Having millions of user records in a MySQL database is nothing particularly stressful for your database. However, if we have millions of records in our database, we probably don't want to fetch all these rows and keep them in memory in our script.

Enter streaming: The new version now provides a new queryStream() method which works very similar to the query() method in the previous example. However, instead of buffering the full result set in memory and then resolving the promise, it uses streams to emit data events for each individual row. This allows us to process result sets with thousands or millions of rows, without ever having to store all of this in memory at once.

Keeping the previous example, we can use this along with a clue/reactphp-ndjson to build a streaming, newline-delimited JSON (NDJSON) HTTP API endpoint:

$server = new Server(function (ServerRequestInterface $request) use ($connection) {
    $stream = $connection->queryStream(
        'SELECT * FROM user WHERE id > ?',
        [$request->getQueryParams()['id'] ?? -1]
    );
    
    return new Response(
        200,
        array(
            'Content-Type' => 'application/x-ndjson'
        ),
        new \Clue\React\NDJson\Encoder($stream)
    );
});

The example output from an HTTP request could look something like this:

$ curl -v http://localhost:8080/?id=0
…
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"id":"1","name":"Alice","ip":"1.2.3.4"}
{"id":"2","name":"Bob","ip":"20.2.3.4"}
{"id":"3","name":"Carol","ip":"30.2.3.4"}
…

If you want to learn more about NDJSON, you may want to check out one of the previous blog posts. Similarly, you can also use streaming CSV output or pretty much any other format you prefer.

Looking forward

By using standard promise-based and streaming interfaces, this project can now be integrated much easier with the existing ecosystem and some of the exciting tech I've introduced in the last blog posts. For example, imagine concurrent stream processing and concurrent CSV processing with a MySQL query as an input stream instead…

You see, this blog post is barely touching the surface of why I think this release is a major milestone. I want to keep this blog post short(er), so I'll leave this up for another post soon-ish. But first, make sure to head over to friends-of-reactphp/mysql and let's celebrate this release 🎉

Once again I'd like to thank @geertvanbommel, a fellow software architect specializing in database batch processing and API development, for sponsoring large parts of this development! 🎉 Thanks to sponsors like this, who understand the importance of open source development, I can justify spending time and focus on open source development instead of traditional paid work. If you follow my posts more regularly, I'm sure you'll recall his name. And if you plan to follow my future posts, I'm sure there are more exiting announcements to follow soon…

Did you know that I offer custom development services and issuing invoices for sponsorships of releases and for contributions? Contact me (@clue) for details.

If you have any feedback or just want to reach out and say hello, I'm happy to hear back and appreciate feedback! Use the comment section below or send a tweet to @another_clue.

Just released friends-of-reactphp/mysql v0.4.0! 🎉 It includes many long awaited features: Promise-based APIs and streaming large result sets! Efficiently process thousands or millions of records from your @MySQL database with @ReactpPHP. #async #php https://t.co/b1Hv408RtD

— Christian Lück (@another_clue) 21. September 2018
Created 2 months ago | Updated 2 weeks ago
View on GitHub