What's new in Impala, August 28th 2015

Henry Robinson / Fri 28 August 2015

Another week, another good set of changes in Impala.

Stats:

  • 28 commits in the week ending August 28th.
  • 110 files changed, 5649 insertions(+), 847 deletions(-)

Passwords for private key files

A small one, this: secure connections between RPC servers and clients can now be locked down a little further. The private key file used to establish SSL / TLS connections may now be protected by an optional password, which helps deployments that need to keep their private key particularly secure. Impala obtains the password at start-up by running a user-configured shell command, so the password itself can come from a variety of locations.

The same feature for the webserver that's run by all Impala processes has just finished review.
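To make the password-command idea concrete, here is a minimal C++ sketch of how a server might run a configured shell command at start-up and use its output as the key password. It is not Impala's code: the function name `RunPasswordCommand` is hypothetical, and it assumes a POSIX system where `popen` is available. The one subtle step is trimming the trailing newline that commands like `echo` or `cat` emit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: run `cmd` through the shell (POSIX popen spawns
// /bin/sh -c cmd) and return its standard output, minus trailing newlines.
std::string RunPasswordCommand(const std::string& cmd) {
  assert(!cmd.empty());
  FILE* pipe = popen(cmd.c_str(), "r");
  if (pipe == nullptr) throw std::runtime_error("popen failed for: " + cmd);

  std::string output;
  std::array<char, 256> buf;
  std::size_t n;
  // Read the command's stdout in chunks until EOF.
  while ((n = std::fread(buf.data(), 1, buf.size(), pipe)) > 0) {
    output.append(buf.data(), n);
  }

  // A non-zero status means the command failed; refuse to use its output.
  if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + cmd);

  // `echo` and `cat` usually end with a newline, which must not become
  // part of the password.
  while (!output.empty() && (output.back() == '\n' || output.back() == '\r')) {
    output.pop_back();
  }
  return output;
}
```

For example, `RunPasswordCommand("cat /path/to/secret")` would return the file's contents without the final newline. Failing loudly on a non-zero exit status is a deliberate choice here: starting up with an empty or partial password would only fail later, and less clearly.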

Lots of nested types work

As another release looms later this year, a huge amount of development effort is going into finishing initial support for 'nested types' in Impala: non-primitive column types such as maps, arrays and structs. There's usually a flurry of nested-types activity every week, and this week was no exception. For example:

  • Deduplication of identical adjacent tuples in row-batches. The row-batch is the container Impala uses to ship rows (which are composed of tuples) between operators on the same machine, or to a remote machine. When tuples are duplicated, as they might be for certain kinds of joins, this patch avoids a lot of wasted space by storing duplicated variable-length data, such as strings, only once and having all pointers refer to the same canonical copy. This helps significantly when nested types are involved.

  • Pretty-printing of complex types in DESCRIBE. This usability improvement prints nested types in DESCRIBE's output across multiple lines; previously, however complex the type, it showed up as one continuous string with no line breaks.

  • Planner tests for 'nested' TPCH. As a test suite for the nested types execution engine, we've built a version of the TPCH schema that, in some cases, uses nested types instead of foreign-key relationships. This patch adds tests for the planner alone. It's a good place to look to understand what plans for nested types look like, and to see the new syntax for queries against nested columns!
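The deduplication idea from the row-batch bullet above can be sketched in a few lines of C++. This is a simplified model, not Impala's row-batch code: the `Tuple`, `Row` and `SerializedBatch` types and the `SerializeWithAdjacentDedup` function are hypothetical, "identical" is approximated by pointer equality with the tuple in the same slot of the previous row, and tuples are assumed to be non-null. Each repeated tuple then costs only an offset rather than another copy of its payload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified tuple: only its variable-length payload matters here.
struct Tuple {
  std::string data;
};

// A row holds one non-null tuple pointer per slot; all rows have the same
// number of slots.
using Row = std::vector<const Tuple*>;

struct SerializedBatch {
  std::string buffer;                // concatenated tuple payloads
  std::vector<std::size_t> offsets;  // row-major: one buffer offset per slot
};

// Copy a tuple's payload into the buffer only if it is not the very same
// tuple (same pointer) as in the previous row's slot; otherwise reuse that
// earlier offset so both rows refer to one canonical copy.
SerializedBatch SerializeWithAdjacentDedup(const std::vector<Row>& rows) {
  SerializedBatch out;
  const Row* prev = nullptr;
  std::vector<std::size_t> prev_offsets;
  for (const Row& row : rows) {
    assert(prev == nullptr || prev->size() == row.size());
    std::vector<std::size_t> cur_offsets(row.size());
    for (std::size_t slot = 0; slot < row.size(); ++slot) {
      if (prev != nullptr && (*prev)[slot] == row[slot]) {
        cur_offsets[slot] = prev_offsets[slot];  // point at the canonical copy
      } else {
        cur_offsets[slot] = out.buffer.size();
        out.buffer += row[slot]->data;
      }
      out.offsets.push_back(cur_offsets[slot]);
    }
    prev = &row;
    prev_offsets = std::move(cur_offsets);
  }
  return out;
}
```

With rows `{a, b}`, `{a, c}`, `{a, c}`, the payloads of `a`, `b` and `c` are each written once, and the later rows just repeat offsets. A real serialization format would also need to record payload lengths so that the receiver can rebuild the tuples.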
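The DESCRIBE pretty-printing bullet above boils down to re-indenting a one-line type string by nesting depth. The sketch below is a hypothetical `PrettyPrintType` helper, not Impala's implementation, and the exact layout Impala prints may differ: it starts a new, further-indented line after each `<` and `,`, and closes each level with `>` on its own line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: re-indent a one-line type string by nesting depth.
// e.g. "struct<a:int,b:string>" becomes:
//   struct<
//     a:int,
//     b:string
//   >
std::string PrettyPrintType(const std::string& type,
                            std::size_t indent_width = 2) {
  std::string out;
  std::size_t depth = 0;
  // Start a new line indented to nesting level `d`.
  auto newline = [&](std::size_t d) {
    out += '\n';
    out.append(d * indent_width, ' ');
  };
  for (char c : type) {
    switch (c) {
      case '<':  // open a nesting level: children go on deeper lines
        out += c;
        newline(++depth);
        break;
      case '>':  // close a level: '>' goes on its own, shallower line
        assert(depth > 0 && "unbalanced '>' in type string");
        newline(--depth);
        out += c;
        break;
      case ',':  // next field or element at the same depth
        out += c;
        newline(depth);
        break;
      default:
        out += c;
    }
  }
  return out;
}
```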

For an overview of Impala's upcoming nested types support, see this presentation from March, or head over to Strata in New York at the end of September for a talk by Alex, Marcel and Josh.

Column authorisation part one

Another security feature: column authorisation lets administrators specify who may access individual columns of a table, not just the table as a whole. This decouples schema design from access-control requirements: rather than restructuring tables around who may see what, an administrator can grant a particular user access to only the columns that user should see.
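As an illustration of what a column-level check involves, here is a minimal C++ sketch. The privilege model is a deliberate simplification and an assumption, not how Impala or its authorisation backend represents grants: a user holds SELECT either on a whole table or on a named set of its columns, and a query is allowed only if every column it reads is covered. `TablePrivileges`, `PrivilegeMap` and `CanSelect` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified privilege model: SELECT on the whole table, or
// SELECT on an explicit set of that table's columns.
struct TablePrivileges {
  bool whole_table = false;
  std::set<std::string> columns;
};

// One user's privileges, keyed by qualified table name, e.g. "db.customers".
using PrivilegeMap = std::map<std::string, TablePrivileges>;

// A query may read `requested` columns of `table` only if each of them is
// covered by a table-level or column-level grant.
bool CanSelect(const PrivilegeMap& privs, const std::string& table,
               const std::set<std::string>& requested) {
  assert(!table.empty() && "table name must not be empty");
  auto it = privs.find(table);
  if (it == privs.end()) return false;  // no grant on this table at all
  const TablePrivileges& p = it->second;
  if (p.whole_table) return true;       // a table grant covers every column
  for (const std::string& col : requested) {
    if (p.columns.count(col) == 0) return false;  // column not granted
  }
  return true;
}
```

A user granted only `id` and `name` on `db.customers` could then read those two columns but not, say, an `ssn` column in the same table.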

The first patch for this feature adds support for parsing and analysis of GRANT and REVOKE statements against individual columns. Enforcement is coming in a second patch, possibly this week!