One of the key visible benefits of ql.io is that it eliminates the code noise that is common in writing HTTP client apps. As a DSL for writing HTTP client code, it focuses on automating the task of making multiple HTTP requests and processing responses in the best order possible, taking care of parallelization, orchestration, projections, and normalizations behind the scenes.
In this post, I would like to present baseline performance benchmarks for ql.io running on node.js 0.4.12. Though I have done some ad hoc tests over the last 2-3 months for hardware sizing purposes, this is my first systematic attempt.
Since this post is long, here is a quick summary for the “TL;DR”.
- For simple scripts, such as a query using a select statement to get data from an HTTP API, ql.io can handle 2400+ requests/sec at concurrency levels ranging from 100 to 500.
- For scripts involving dependencies between statements (such as statement B needing input from the results of statement A), throughput drops almost linearly and proportionately. For instance, for scenario C below, ql.io can handle nearly 1000 requests/sec.
- The conventional wisdom of using n worker processes, where n is the number of CPU threads, provides a reasonable default for all practical purposes, but tuning the number of worker processes is a worthwhile exercise. All the test scenarios below yielded better numbers with 5*n workers. Scenario D, which involves a non-trivial amount of CPU-bound work, benefited the most from the increased number of worker processes.
You can find the raw output files of test runs on github. The application used for these tests is on github. All the ql.io modules used by the app are on npmjs.org.
Test Environment
The test environment consists of the following machines, all sitting under my desk at work.
- An Intel Xeon E5645 workstation with 6 cores (12 CPU threads) and 24GB RAM running the ql.io-site app on node.js 0.4.12 with 12 worker processes.
- An Intel Xeon E5507 workstation with 4 cores (8 CPU threads) with 12GB RAM running apachebench.
- An Intel Xeon E5630 workstation with 4 cores (8 CPU threads) with 24GB RAM running Apache Traffic Server (ATS) 3.0.1 as a forward proxy for all outgoing HTTP requests. The cache is primed before running benchmarks to avoid making requests to any other machines.
All these are running Ubuntu 11.04.
Test Scenarios
These tests cover a range of aggregation and orchestration scenarios possible with ql.io and show how ql.io behaves under varying loads.
Scenario A
select * from twitter.search where q = "ql.io"
This scenario involves sending an HTTP GET request to http://search.twitter.com/search.json, parsing the JSON response, and writing it back to the client’s response.
Scenario B
select id as id, from_user_name as user_name, text as text from twitter.search where q = "ql.io";
This scenario is similar to scenario A except for the following:
- Extract the results array from the response, and extract id, from_user_name, and text from each result.
- Assemble the projected fields into an object.
- Write all the objects as an array into the client’s response.
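To make the projection step concrete, here is a minimal Python sketch of what scenario B does to the Twitter response once it has been fetched. It is not ql.io's implementation: the helper name project_tweets and the sample payload are hypothetical, and only the field names and the user_name alias come from the script above.

```python
import json
from typing import Any, Dict, List


def project_tweets(response_body: str) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring scenario B's projection.

    select id as id, from_user_name as user_name, text as text ...
    """
    response = json.loads(response_body)           # parse the JSON response
    projected = []
    for result in response.get("results", []):     # extract the results array
        projected.append({
            "id": result.get("id"),
            "user_name": result.get("from_user_name"),  # aliased as user_name
            "text": result.get("text"),
        })
    return projected                               # written back as a JSON array


# Made-up payload for illustration only (not real Twitter data).
sample = json.dumps({
    "results": [
        {"id": 1, "from_user_name": "alice", "text": "trying ql.io", "lang": "en"},
        {"id": 2, "from_user_name": "bob", "text": "ql.io is neat", "lang": "en"},
    ]
})
print(json.dumps(project_tweets(sample)))
```

Fields that are not selected (lang in the sample) are simply dropped, which is why scenario B writes less data back to the client than scenario A.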
Scenario C
select ItemID, ViewItemURLForNaturalSearch, Location from details where itemId in (select itemId from finditems where keywords='mini cooper');
This scenario involves finding IDs of items from one API and sending those IDs to another API to get details as follows:
- Send an HTTP request to http://svcs.ebay.com/services/search/FindingService/, parse the JSON response, and extract the array of items by selecting the findItemsByKeywordsResponse.searchResult.item field of the response.
- For each item in the array, project the item’s ID. Collect the IDs into an array.
- Then send an HTTP request to http://open.api.ebay.com/shopping with all the item IDs.
- Parse the JSON response, select the Item array from the response, and project each Item to extract the ItemID, ViewItemURLForNaturalSearch, and Location fields. Assemble the projected fields into an array.
- Write all the arrays to the client’s response as an array of arrays.
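The key property of scenario C is that the outer request cannot start until the inner one has finished. The Python sketch below shows only that dependency. find_items and get_details are hypothetical stand-ins for the two eBay APIs that return canned data instead of making HTTP requests, and the response shapes are simplified assumptions rather than the exact eBay JSON.

```python
import asyncio
from typing import Any, Dict, List


# Hypothetical stand-ins for the Finding and Shopping APIs (canned data).
async def find_items(keywords: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # pretend to wait on the network
    return {"findItemsByKeywordsResponse": {"searchResult": {"item": [
        {"itemId": "101"}, {"itemId": "102"}]}}}


async def get_details(item_ids: List[str]) -> Dict[str, Any]:
    await asyncio.sleep(0)
    return {"Item": [
        {"ItemID": i, "ViewItemURLForNaturalSearch": f"http://example.com/{i}",
         "Location": "San Jose", "Title": "not selected"}
        for i in item_ids]}


async def scenario_c(keywords: str) -> List[List[Any]]:
    # Inner select: must complete before the outer select can be issued.
    found = await find_items(keywords)
    items = found["findItemsByKeywordsResponse"]["searchResult"]["item"]
    item_ids = [item["itemId"] for item in items]

    # Outer select: a single request carrying all the item IDs.
    details = await get_details(item_ids)
    fields = ("ItemID", "ViewItemURLForNaturalSearch", "Location")
    return [[item[f] for f in fields] for item in details["Item"]]


print(asyncio.run(scenario_c("mini cooper")))
```

Because the two awaits are sequential, the latency of scenario C is at least the sum of both round trips, which is consistent with it being slower than scenarios A and B.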
Scenario D
prodid = select ProductID[0].Value from eBay.FindProducts
    where QueryKeywords = 'macbook pro';
details = select * from eBay.ProductDetails
    where ProductID in ('{prodid}') and ProductType = 'Reference';
reviews = select * from eBay.ProductReviews
    where ProductID in ('{prodid}') and ProductType = 'Reference';
return select d.ProductID[0].Value as id, d.Title as title,
    d.ReviewCount as reviewCount, r.ReviewDetails.AverageRating as rating
    from details as d, reviews as r
    where d.ProductID[0].Value = r.ProductID.Value
    via route '/myapi' using method get;
The implementation details for this script are a bit more involved, but at a high level, here is what happens under the hood:
- When the client submits a request to the route /myapi, find the script bound to that route.
- Send an HTTP request to http://open.api.ebay.com/shopping?callname=FindProducts with a keyword and extract product IDs from the response.
- Send 5 HTTP requests to http://open.api.ebay.com/shopping?callname=FindProducts with the product IDs found and extract the details.
- Send 5 HTTP requests to http://open.api.ebay.com/shopping?callname=FindReviewsAndGuides with the product IDs found and extract the reviews.
- Once the 10 requests complete, join details and reviews by matching product IDs, and extract the selected fields into an object.
- Return an array of objects, with each object containing the selected fields.
This script covers most of the code paths of ql.io. See Build an App for a step-by-step description of this scenario.
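The shape of scenario D, with one request followed by two independent batches of five, can be sketched with asyncio in Python. This is an illustration under stated assumptions, not ql.io's engine: call is a hypothetical stand-in that records each simulated request and returns canned data, and running the details and reviews statements concurrently is an assumption based on both depending only on prodid.

```python
import asyncio
from typing import Any, Dict, List, Optional

request_log: List[str] = []  # records every simulated HTTP call, in order


async def call(api: str, product_id: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for one HTTP request to the eBay Shopping API."""
    request_log.append(api)
    await asyncio.sleep(0.01)  # simulated network latency
    if api == "FindProducts":
        return {"ids": ["p1", "p2", "p3", "p4", "p5"]}
    if api == "ProductDetails":
        return {"id": product_id, "title": f"MacBook Pro {product_id}"}
    return {"id": product_id, "rating": 4.5}  # ProductReviews


async def scenario_d() -> List[Dict[str, Any]]:
    # Statement 1 (prodid): the other statements depend on its result.
    prod_ids = (await call("FindProducts"))["ids"]

    # Statements 2 and 3 (details, reviews) depend only on prodid, so all
    # 10 requests (5 details + 5 reviews) are in flight at the same time.
    details, reviews = await asyncio.gather(
        asyncio.gather(*(call("ProductDetails", p) for p in prod_ids)),
        asyncio.gather(*(call("ProductReviews", p) for p in prod_ids)),
    )

    # Final statement: join on product ID and keep the selected fields.
    return [
        {"id": d["id"], "title": d["title"], "rating": r["rating"]}
        for d in details
        for r in reviews
        if d["id"] == r["id"]
    ]


rows = asyncio.run(scenario_d())
print(len(request_log), "requests,", len(rows), "joined rows")
```

The request log ends up with 11 entries (1 + 5 + 5), matching the count of HTTP requests for this scenario.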
Differences Between Scenarios
- Scenarios A and B are both mostly IO bound.
- Scenario C is also mostly IO bound, but it makes two HTTP requests in sequence, as the outer select depends on the results of the inner select. The second request is made after the first one completes.
- Scenario D involves making 11 HTTP requests, parsing and projecting response fields, and joining members of the responses of the second and third statements. These responses are unsorted, and joining them by matching product IDs takes O(n²) steps, in this case 25 (5 × 5). Yes, this can be improved, but let’s measure twice before cutting once.
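The 25 comes from comparing each of the 5 detail records against each of the 5 review records. The Python sketch below contrasts that naive nested-loop join with a dictionary-based hash join, one common way to make the join linear. It only illustrates the step counts; it is not ql.io's join code, and the record shapes and IDs are made up.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Pair = Tuple[Record, Record]


def nested_loop_join(details: List[Record], reviews: List[Record]) -> Tuple[List[Pair], int]:
    """Naive join: compare every detail with every review, O(n * m) comparisons."""
    joined: List[Pair] = []
    comparisons = 0
    for d in details:
        for r in reviews:
            comparisons += 1
            if d["id"] == r["id"]:
                joined.append((d, r))
    return joined, comparisons


def hash_join(details: List[Record], reviews: List[Record]) -> Tuple[List[Pair], int]:
    """Hash join: index reviews by ID once, then one lookup per detail, O(n + m)."""
    by_id = {r["id"]: r for r in reviews}  # assumes product IDs are unique
    joined: List[Pair] = []
    lookups = 0
    for d in details:
        lookups += 1
        r = by_id.get(d["id"])
        if r is not None:
            joined.append((d, r))
    return joined, lookups


# Five unsorted records on each side, as in scenario D (IDs are made up).
details = [{"id": p} for p in ["p3", "p1", "p5", "p2", "p4"]]
reviews = [{"id": p} for p in ["p2", "p4", "p1", "p3", "p5"]]

nl_rows, nl_steps = nested_loop_join(details, reviews)
h_rows, h_steps = hash_join(details, reviews)
print(nl_steps, h_steps)  # 25 5
```

At n = 5 the difference is tiny compared with the cost of 11 HTTP requests, which is why measuring first makes sense before optimizing the join.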
Test Settings
All tests are run using ab -k to maintain persistent connections from the client to the server.
The ql.io app is run with 12 node.js worker processes managed by cluster.
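The 12-worker setting follows the one-worker-per-CPU-thread rule of thumb. In node.js this is usually derived from os.cpus().length when forking workers with cluster; the Python sketch below only shows the arithmetic, using os.cpu_count(). The worker_count helper and its multiplier parameter are hypothetical.

```python
import os
from typing import Optional


def worker_count(cpu_threads: Optional[int] = None, multiplier: int = 1) -> int:
    """One worker per CPU thread by default; `multiplier` scales it up for tuning."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1  # os.cpu_count() can return None
    return cpu_threads * multiplier


print(worker_count(12))     # 12: the default used for the first round
print(worker_count(12, 5))  # 60: the 5*n setting from the TL;DR
```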
First Round Results
Throughput
Here are the throughput results for concurrency ranging from 100 to 500.
Mean Response Times
The corresponding chart showing the mean response time for the same range of concurrency is below.
Effect of Number of Workers
In these tests, scenario D fared badly, as it includes a mixture of IO and CPU workloads. The CPU workload is not predominant, but it is not insignificant either. Here is a chart of the CPU data captured using dstat at a concurrency level of 200 for scenario D.
This confirms that there is a fair bit of CPU-bound work going on. How does the number of workers influence such a scenario? To find out, I repeated the tests, varying the number of worker processes.
The chart below shows the number of requests per second for Scenario D as I changed the number of workers from 12 to 96 in increments of 12. All the test runs were done at a concurrency level of 100.
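A sweep like this is easy to script. The Python sketch below only builds the plan: one ab run per worker count from 12 to 96 in steps of 12, all at concurrency 100. The host, port, and -n request count are assumptions because the post does not state them, and restarting the app with each worker count is left to whatever launcher you use.

```python
from typing import List, Tuple

CONCURRENCY = 100
TOTAL_REQUESTS = 10_000                 # assumption: -n value is not stated
URL = "http://ql-io-host:3000/myapi"    # hypothetical host and port


def sweep_plan(start: int = 12, stop: int = 96, step: int = 12) -> List[Tuple[int, str]]:
    """Pair each worker count with the ab command used to load-test it."""
    plan = []
    for workers in range(start, stop + 1, step):
        # -k: HTTP keep-alive, -c: concurrency, -n: total number of requests
        cmd = f"ab -k -c {CONCURRENCY} -n {TOTAL_REQUESTS} {URL}"
        plan.append((workers, cmd))
    return plan


for workers, cmd in sweep_plan():
    print(f"restart the app with {workers} workers, then run: {cmd}")
```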
The number of requests per second increased from 192 to 384 as I increased the number of workers from 12 to 96. The improvement is less significant beyond 60 workers.
Here is the chart for the mean response time, which shows a similar improvement.
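The response-time improvement mirrors the throughput improvement because, at a fixed concurrency, the two are tied together: ab computes its mean "Time per request" as concurrency × elapsed time ÷ completed requests, which equals concurrency ÷ requests per second (Little's law). The Python sketch below applies that to the two throughput figures quoted above; the results are derived estimates, not values read off the chart.

```python
def mean_time_per_request_ms(concurrency: int, requests_per_sec: float) -> float:
    """ab's mean 'Time per request': concurrency * elapsed / completed requests.

    Since requests_per_sec = completed / elapsed, this is
    concurrency / requests_per_sec, converted to milliseconds.
    """
    return concurrency / requests_per_sec * 1000.0


for workers, rps in [(12, 192), (96, 384)]:
    print(workers, "workers:", round(mean_time_per_request_ms(100, rps)), "ms")
```

Doubling throughput at concurrency 100 roughly halves the mean time, from about 521 ms to about 260 ms.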
The flattening of these charts at higher worker counts can be explained by looking at the CPU again. The chart below shows the CPU data at a concurrency level of 200 for scenario D with a worker count of 96.
The chart below shows the effect of increasing the worker count from 12 to 96 across all test scenarios.
What About Memory
Below is a chart of the memory usage with 96 workers for scenario D at a concurrency level of 200.
The lines remained nearly flat for the duration of the test.
Summary
The goal of this exercise is to set a baseline for future work. The scenarios I used show a range of scripts that cover most of the current capabilities of ql.io.
Here are a few key takeaways:
- ql.io is designed for IO-bound workloads. However, data aggregation and orchestration often involve some CPU-bound work, such as projections and joins. This is unavoidable. I suspect the same is true of many typical uses of node.js.
- On commodity hardware with a commodity network layer, my tests show that ql.io can do 400-2400 requests/sec, depending on the nature of the work involved. Your mileage may vary.
- Using as many workers as there are CPU threads is a good starting point, but tuning the number based on the characteristics of the app may yield better results.
We’re currently working on upgrading ql.io to node.js 0.6.x. See the 0.4 branch on github. Watch out for a repeat of these tests on node.js 0.6.x.