Welcome to the first of an ongoing series of posts where we take a look at what's been going on in Impala internals in the last week or so.
Analytic window functions:
PERCENT_RANK() and friends
Impala learnt about some new window functions -
CUME_DIST() - this week in the
patch for IMPALA-2081. One
of the cool things about these functions is that they're all implemented by rewriting
queries in terms of existing window functions. See
as an example of how
PERCENT_RANK() can be rewritten in terms of
count(). The benefit of this approach is that there's no new backend machinery to
implement and debug, and any future optimisations made to analytic function evaluation
will improve all these frontend functions!
Nested-loop join (briefly!)
One of the key moving parts for nested-types is the ability to perform
nested loop joins between two tuple
sources. The first implementation of NLJ landed in
trunk last week. This
also allows Impala to execute more varieties of JOIN, including
outer joins without an
equality predicate (e.g.
SELECT * FROM student s, lunch_items i where s.lunch_money <= i.cost).
We found a bug in the way NLJ handles transferring its memory to other nodes, so that commit is reverted from Impala for now - but it should show up again in the next few days!