Hive user impersonation
This article addresses a real-world authorization requirement that arises when Hive data is exposed through an application. A common pattern in such applications is to authenticate the user through a Single Sign-On (SSO) service such as PingID. The application context then holds only the user ID, and authorization has to be enforced on the basis of that ID.
User Impersonation:
A powerful technique that many enterprise applications use in a Kerberized cluster is impersonation (proxying): the application connects as a super user on behalf of the logged-in user, and the access privileges of the logged-in user are applied rather than those of the super user. To set this up, create a super user, say “hivesuperuser”. The application authenticates with this super user’s keytab and then opens a JDBC connection that proxies the logged-in user.
Hive Configuration:
In core-site.xml, configure the following two Hadoop properties. Set the property hadoop.proxyuser.<name>.hosts to specify the list of hosts from which proxy requests are permitted.
<property>
<name>hadoop.proxyuser.hivesuperuser.hosts</name>
<value>*</value>
</property>
The above definition allows proxy requests from all hosts for the user hivesuperuser.
Also set the property hadoop.proxyuser.<name>.groups to specify the groups whose members the super user is allowed to impersonate.
<property>
<name>hadoop.proxyuser.hivesuperuser.groups</name>
<value>*</value>
</property>
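The wildcard values above are the most permissive choice. To make the meaning of the two properties concrete, here is a minimal sketch in Java of how such a rule can be read: a proxy request passes only if it comes from a listed host and the impersonated user belongs to a listed group. This is a simplified model for illustration, not Hadoop's implementation (Hadoop's host matching is richer than the exact string match used here), and the class name ProxyRule, the host names, and the group names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of a proxy-user rule; NOT Hadoop's actual implementation. */
public class ProxyRule {
    private final Set<String> hosts;   // parsed value of hadoop.proxyuser.<name>.hosts
    private final Set<String> groups;  // parsed value of hadoop.proxyuser.<name>.groups

    public ProxyRule(String hostsValue, String groupsValue) {
        this.hosts = parse(hostsValue);
        this.groups = parse(groupsValue);
    }

    // Split a comma-separated property value into trimmed, non-empty entries.
    private static Set<String> parse(String value) {
        Set<String> entries = new HashSet<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /**
     * Returns true when the request host is permitted and the impersonated
     * user belongs to at least one permitted group. "*" matches everything.
     */
    public boolean allows(String requestHost, Set<String> impersonatedUserGroups) {
        boolean hostOk = hosts.contains("*") || hosts.contains(requestHost);
        boolean groupOk = groups.contains("*");
        for (String group : impersonatedUserGroups) {
            if (groups.contains(group)) {
                groupOk = true;
            }
        }
        return hostOk && groupOk;
    }
}
```

Under this model, new ProxyRule("*", "*") accepts every request, while a rule built from a hypothetical host list such as "app1.example.com, app2.example.com" and the group "analysts" accepts only requests that originate from those two hosts and impersonate members of that group.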
Proxy Connection:
The following code snippet acquires the proxy connection:
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.security.UserGroupInformation;

// Log in with the super user's keytab and principal
UserGroupInformation.loginUserFromKeytab("user@domain", "keytabpath");
// Load the Hive JDBC driver
Class.forName("org.apache.hive.jdbc.HiveDriver");
// Open the connection as the super user, proxying the logged-in user
Connection connection = DriverManager.getConnection(
    "jdbc:hive2://host1:port,host2:port,host3:2181/default"
    + ";hive.server2.proxy.user=" + user
    + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;"
    + "transportMode=http;httpPath=cliservice;principal=hive/_HOST@domain");
Source code can be found here
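Because the proxied user ID is concatenated directly into the JDBC URL, an ID containing a character such as ";" could add or alter session parameters. The sketch below shows one way to guard against that by validating the ID before building the connection string. The class name HiveProxyUrl, the method build, and the allowed-character policy are assumptions made for illustration; adjust the pattern to whatever your SSO provider actually issues as user IDs.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds the proxy connection URL shown above. */
public class HiveProxyUrl {
    // Assumed policy: user IDs consist of letters, digits, '.', '_' or '-'.
    private static final Pattern SAFE_USER = Pattern.compile("[A-Za-z0-9._-]+");

    public static String build(String zkQuorum, String database,
                               String proxyUser, String principal) {
        // Reject IDs that could inject extra URL parameters (for example ';').
        if (proxyUser == null || !SAFE_USER.matcher(proxyUser).matches()) {
            throw new IllegalArgumentException("Unsafe proxy user: " + proxyUser);
        }
        return "jdbc:hive2://" + zkQuorum + "/" + database
            + ";hive.server2.proxy.user=" + proxyUser
            + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
            + ";transportMode=http;httpPath=cliservice"
            + ";principal=" + principal;
    }
}
```

The resulting string can be passed to DriverManager.getConnection in place of the hand-built URL, for example HiveProxyUrl.build("host1:2181,host2:2181,host3:2181", "default", user, "hive/_HOST@domain"), where the host names and ports are placeholders.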