Azure HDInsight is a cloud-hosted Apache Hadoop distribution. It scales from terabytes to petabytes of data on demand, and you can spin up additional nodes at any time.
Since HDInsight is a PaaS offering, it is publicly accessible from any internet connection by default. Clusters often contain valuable customer data, and these customers usually have requirements for how to connect to that data securely, for example IP restrictions so that only their own block of IP addresses can connect to the cluster.
In this article we are going to secure the HDInsight cluster so that only the IP addresses we specify can connect to it.
- Log in to Azure using http://portal.azure.com
- You must have a Virtual Network (vNet) to continue. If you don’t have a vNet yet, create one first. This is mandatory.
- Click on +New
- Search for HDInsight
- Select the HDInsight Cluster
- Click on Create
- Give the HDInsight Cluster a name
- Select the correct Cluster Type and Version
- Enter the correct credentials
- Give the Storage Account and the Container a name
- Select the correct sizing of your cluster.
Be aware that there is a default quota of 60 cores for a Subscription. This can be increased by raising a Support Request.
See https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/ for more information about quotas.
- Click on Optional Configuration and select Virtual Network
- Select the correct vNet:
- Select the correct Subscription
- Click on Create and wait about 30 minutes for the cluster to be deployed.
Now that the HDInsight cluster is created, it is accessible from the public internet. Many customers want to prevent this, so we need to secure it.
Since the HDInsight cluster is connected to a Virtual Network, we can assign a Network Security Group (NSG) to it and then create Inbound Security Rules that allow traffic from the addresses we trust, rather than rules that deny individual addresses.
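To make the allow-list idea concrete, here is a minimal Python sketch of how inbound rules are evaluated: rules are checked in priority order (lowest number first), the first match decides, and anything unmatched is denied. It is a simplified model, not the Azure implementation. The `InboundRule` class, the rule names, and the documentation-range addresses are all hypothetical, and the default rules that a real NSG adds for traffic inside the virtual network are left out.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    name: str
    priority: int   # lower number = evaluated first
    source: str     # CIDR block, e.g. "203.0.113.0/24" (placeholder range)
    port: int
    action: str     # "Allow" or "Deny"


def evaluate(rules, source_ip, port):
    """Return the action of the first matching rule, in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip_address(source_ip) in ip_network(rule.source) and port == rule.port:
            return rule.action
    # Stand-in for the built-in rule that denies all remaining inbound traffic.
    return "Deny"


# Hypothetical allow-list: one office range and one datacenter address.
rules = [
    InboundRule("allow-office", 300, "203.0.113.0/24", 443, "Allow"),
    InboundRule("allow-datacenter", 310, "198.51.100.7/32", 443, "Allow"),
]

print(evaluate(rules, "203.0.113.25", 443))  # Allow: inside the office range
print(evaluate(rules, "192.0.2.10", 443))    # Deny: no rule matches
```

Because unmatched traffic falls through to a deny, only "Allow" rules are needed for the trusted addresses.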
Microsoft requires access from a number of IP addresses in order to manage the cluster. It provides a PowerShell script that creates the Network Security Group and grants these addresses access to the cluster. This script can be downloaded here.
The script, adjusted for the environment above, can be seen here.
- Modify the script for your environment and run it. It creates the Network Security Group and adds the Microsoft addresses as inbound rules.
Note: you cannot restrict outbound traffic for HDInsight, so only define Inbound Security Rules on this Network Security Group.
- Add your own public IP addresses, such as those of your datacenter, home, or office Wi-Fi, as Inbound Security Rules.
Now the HDInsight cluster is only available from the addresses and ports that you specified in the Inbound Security Rules.
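When you have more than a handful of addresses to add, it can help to prepare the rule definitions as data before entering them in the portal or a script. The Python sketch below turns a list of addresses into inbound "Allow" definitions with unique priorities. It makes several assumptions: the function name and dictionary keys are hypothetical and are not an Azure API format, the addresses are placeholder documentation ranges, and port 443 (HTTPS) is taken as the port to open. The priority limits of 100 to 4096 are the range Azure accepts for custom NSG rules.

```python
from ipaddress import ip_network


def build_allow_rules(sources, port=443, first_priority=300, step=10):
    """Turn a list of IPs or CIDR blocks into inbound 'Allow' rule definitions."""
    rules = []
    for index, source in enumerate(sources):
        # Normalises "198.51.100.7" to "198.51.100.7/32" and rejects malformed input.
        network = ip_network(source)
        priority = first_priority + index * step
        if not 100 <= priority <= 4096:
            raise ValueError(f"priority {priority} is outside the range 100-4096")
        rules.append({
            "name": f"allow-inbound-{index + 1}",  # hypothetical naming scheme
            "priority": priority,                  # must be unique per direction
            "direction": "Inbound",
            "access": "Allow",
            "protocol": "Tcp",
            "source": str(network),
            "destination_port": port,
        })
    return rules


# Placeholder addresses; replace with your own public IPs or ranges.
for rule in build_allow_rules(["203.0.113.0/24", "198.51.100.7"]):
    print(rule["priority"], rule["name"], rule["source"])
```

Leaving gaps between priorities (here steps of 10) makes it easier to slot in another rule later without renumbering the existing ones.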