Commit af60298

* AppRegistry Removed; * Editorial changes for transition to Guidance
1 parent 9e92a7c commit af60298

2 files changed

+110
-31
lines changed

README.md

Lines changed: 110 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -1,45 +1,42 @@
11
### Table of contents
22

33
- [Scalable Analytics using Apache Druid on AWS](#scalable-analytics-using-apache-druid-on-aws)
4-
- [Licence](#licence)
5-
- [About this solution](#about-this-solution)
6-
- [Solution overview](#solution-overview)
4+
- [About this Guidance](#about-this-guidance)
5+
- [Guidance overview](#guidance-overview)
76
- [Benefits](#benefits)
7+
- [Use cases](#use-cases)
88
- [Architecture overview](#architecture-overview)
99
- [Architecture reference diagram](#architecture-reference-diagram)
10-
- [Solution components](#solution-components)
10+
- [Guidance components](#guidance-components)
11+
- [Cost](#cost)
1112
- [Prerequisites](#prerequisites)
1213
- [Build environment specifications](#build-environment-specifications)
1314
- [AWS account](#aws-account)
1415
- [Tools](#tools)
15-
- [How to build and deploy the solution](#how-to-build-and-deploy-the-solution)
16+
- [How to build and deploy the Guidance](#how-to-build-and-deploy-the-guidance)
1617
- [Configuration](#configuration)
1718
- [Build and deploy](#build-and-deploy)
1819
- [Access the druid web console](#access-the-druid-web-console)
19-
- [Uninstall the solution](#uninstall-the-solution)
20+
- [Uninstall the Guidance](#uninstall-the-guidance)
2021
- [Collection of operational metrics](#collection-of-operational-metrics)
22+
- [Notices](#notices)
23+
- [Licence](#licence)
24+
- [Disclaimer](#disclaimer)
25+
- [Authors](#authors)
2126

2227
---
2328

2429
## Scalable Analytics using Apache Druid on AWS
2530

26-
Scalable analytics using Apache Druid on AWS is a solution offered by AWS that enables customers to quickly and efficiently deploy, operate and manage a cost-effective, highly available, resilient, and fault tolerant hosting environment for Apache Druid analytics databases on AWS.
31+
Guidance for Scalable analytics using Apache Druid on AWS is a solution offered by AWS that enables customers to quickly and efficiently deploy, operate and manage a cost-effective, highly available, resilient, and fault tolerant hosting environment for Apache Druid analytics databases on AWS.
2732

28-
## Licence
29-
30-
Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at
31-
32-
http://www.apache.org/licenses/
33-
34-
or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.
35-
36-
## About this solution
33+
## About this Guidance
3734

3835
The solution incorporates an [AWS CDK](https://aws.amazon.com/cdk/) construct that encapsulates Apache Druid and is purposefully architected to be easy to operate, secure by design, and well-architected. With this solution, you can establish an Apache Druid cluster for data ingestion and analysis within a matter of minutes. It also provides deployment flexibility by supporting Amazon EC2, Amazon Elastic Kubernetes Service (EKS), and EKS Fargate options, enabling you to run Apache Druid on EC2, EKS, or EKS Fargate as per your preference.
3936

4037
You can use this README file to find out how to build, deploy, use and test the code. You can also contribute to this project in various ways such as reporting bugs, submitting feature requests or additional documentation. For more information, refer to the [Contributing](CONTRIBUTING.md) topic.
4138

42-
## Solution overview
39+
## Guidance overview
4340

4441
Scalable analytics using Apache Druid on AWS helps organizations utilize the full suite of features and capabilities of this powerful open source analytics engine, while leveraging the flexibility, elasticity, scalability and price performance options of AWS’s compute and storage offerings, for their real-time, low-latency analytics processing use cases.
4542

@@ -81,33 +78,99 @@ The following diagram represents the solution's architecture design for EC2 stac
8178

8279
For the architecture of EKS stack, please refer to the diagram located at `source/images/solution_architecture_diagram-eks.png`.
8380

84-
### Solution components
81+
### Guidance components
8582

86-
The solution deploys the following components that work together to provide a production-ready Druid cluster:
83+
The Guidance deploys the following components that work together to provide a production-ready Druid cluster:
8784

8885
- **Web application firewall**: AWS WAF is utilized to safeguard Druid web console and Druid API endpoints from prevalent web vulnerabilities and automated bots that could potentially impact availability, compromise security, or overutilize resources. It is automatically activated when the `internetFacing` parameter is set to true.
8986

90-
- **Application load balancer**: A load balancer serves as the single point of contact for clients. The load balancer distributes incoming application traffic across multiple query servers in multiple Availability Zones.
87+
- **Bastion Host**: A security hardened Linux server (Bastion host) manages access to the Druid servers running in a private network separate from an external network. It can also be used to access the Druid web console through SSH tunneling, where a private Application Load Balancer (ALB) is deployed.
9188

92-
- **Druid master auto scaling group**: An Auto Scaling group contains a collection of Druid master servers. A Master server manages data ingestion and availability: it is responsible for starting new ingestion jobs and coordinating availability of data on the "Data servers". Within a Master server, functionality is split between two processes, the Coordinator and Overlord.
89+
- **Application load balancer**: ALB serves as the single point of contact for clients. The load balancer distributes incoming application traffic from identity providers, such as OpenID Connect (OIDC) and Lightweight Directory Access Protocol (LDAP), across multiple query servers in multiple Availability Zones.
9390

94-
- **Druid query auto scaling group**: An Auto Scaling group contains a collection of Druid query servers. A Query server provides the endpoints that users and client applications interact with, routing queries to Data servers or other Query servers. Within a Query server, functionality is split between two processes, the Broker and Router.
91+
- **Private Subnet**: The private subnet consists of the following:
92+
- **Druid master auto scaling group**: An Auto Scaling group contains a collection of Druid master servers. A Master server manages data ingestion and availability: it is responsible for starting new ingestion jobs and coordinating availability of data on the "Data servers". Within a Master server, functionality is split between two processes, the Coordinator and Overlord.
9593

96-
- **Druid data auto scaling group**: An Auto Scaling group contains a collection of Druid data servers. A Data server executes ingestion jobs and stores queryable data. Within a Data server, functionality is split between two processes, the Historical and MiddleManager.
94+
- **Druid data auto scaling group**: An Auto Scaling group contains a collection of Druid data servers. A Data server executes ingestion jobs and stores queryable data. Within a Data server, functionality is split between two processes, the Historical and MiddleManager.
9795

98-
- **ZooKeeper auto scaling group**: An Auto Scaling group contains a collection of ZooKeeper servers. Apache Druid uses Apache ZooKeeper (ZK) for management of current cluster state.
96+
- **Druid query auto scaling group**: An Auto Scaling group contains a collection of Druid query servers. A Query server provides the endpoints that users and client applications interact with, routing queries to Data servers or other Query servers. Within a Query server, functionality is split between two processes, the Broker and Router.
9997

100-
- **Aurora based metadata storage**: An Aurora PostgreSQL database cluster provides the metadata storage to Apache Druid cluster. Druid uses the metadata storage to house various metadata about the system, but not to store the actual data.
98+
- **ZooKeeper auto scaling group**: An Auto Scaling group contains a collection of ZooKeeper servers. Apache Druid uses Apache ZooKeeper (ZK) for management of current cluster state.
10199

102100
- **S3 based deep storage**: An Amazon S3 bucket that provides deep storage to Apache Druid cluster. Deep storage is where segments are stored.
103101

104-
- **Application secrets**: Secrets manager stores the secrets used by Apache Druid including RDS secret, admin user secret etc.
102+
- **Application secrets**: AWS Secrets Manager stores the secrets used by Apache Druid, including the Amazon Relational Database Service (Amazon RDS) secret and the administrator user secret.
105103

106104
- **Logs, metrics, and dashboards**: Logs, metrics, and dashboards are supported in CloudWatch.
107105

106+
- **Aurora based metadata storage**: An Aurora PostgreSQL database cluster provides the metadata storage to Apache Druid cluster. Druid uses the metadata storage to house various metadata about the system, but not to store the actual data.
107+
108108
- **Notifications**: The notification system, powered by Amazon Simple Notification Service (Amazon SNS), delivers alerts or alarms promptly when system events occur. This ensures immediate awareness and action when needed.
109109

110-
- **Bastion host**: A security hardened Linux server used to manage access to the Druid servers running in private network from an external network. It can also be used to access Druid web console through SSH tunneling in the case where private ALB is deployed.
110+
### Cost
111+
112+
You are responsible for the cost of the AWS services used while running this guidance. As of the latest revision, the cost of running this guidance in the US East (N. Virginia) Region is approximately **$714.46 per month** with the default settings (small usage profile), **$2,202.47 per month** for a medium usage profile, and **$13,645.27 per month** for a large usage profile.
113+
114+
These costs are for the resources shown in the [Cost table](#cost-table) section. See the pricing webpage for each AWS service used in this guidance.
115+
116+
We recommend creating a [budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-create.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, see the pricing webpage for each AWS service used in this guidance.
117+
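As a hedged illustration of the budget recommendation above, the sketch below builds a request for the AWS Budgets `create_budget` API as exposed by boto3. The budget name, the 80% alert threshold, the account ID, and the email address are placeholder assumptions and are not part of this guidance; the limit reuses the small-profile estimate.

```python
def build_budget_request(account_id: str, monthly_limit_usd: str,
                         email: str, threshold_pct: float = 80.0) -> dict:
    """Build keyword arguments for the AWS Budgets create_budget call.

    All names and values here are illustrative assumptions.
    """
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "druid-guidance-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": monthly_limit_usd, "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Alert when actual spend exceeds threshold_pct of the limit.
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email}
                ],
            }
        ],
    }


def create_budget(request: dict) -> None:
    """Submit the request; requires boto3 and valid AWS credentials."""
    import boto3  # third-party dependency, imported only when submitting

    boto3.client("budgets").create_budget(**request)


# Example request using the small usage profile estimate as the limit.
request = build_budget_request("123456789012", "714.46", "ops@example.com")
```

Calling `create_budget(request)` would submit the budget to the account. Adjust the limit to the usage profile you deploy.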
118+
#### Cost table
119+
120+
The following tables provide a sample cost breakdown for deploying this guidance with the default parameters in the US East (N. Virginia) Region for one month, encompassing the small, medium, and large usage profiles.
121+
122+
#### 1. Small usage profile
123+
124+
Profile assumptions: ingestion throughput at 30,000 records per second, query throughput at 25 queries per second.
125+
126+
| AWS service | Dimensions | Cost [USD] |
127+
| ----------- | ------------ | ------------ |
128+
| Amazon EC2 | * Druid master: 3 x t4g.medium | $287.53 |
129+
| | * Druid query: 3 x t4g.medium | |
130+
| | * Druid data: 3 x (t4g.medium + 100GB EBS GP2 volume) | |
131+
| | * ZooKeeper: 3 x t4g.small | |
132+
| Amazon ELB | 1 x ALB, 5 GB/h processed bytes (EC2 Instances and IP addresses as targets) | $45.63 |
133+
| Amazon Aurora | 3 x db.t4g.medium | $247.81 |
134+
| Amazon S3 | 1 TB standard storage + 1,000,000 requests per month | $29.67 |
135+
| AWS Key Management Service | 7 x customer managed keys | $7 |
136+
| AWS Secrets Manager | 4 x secrets | $1.6 |
137+
| Amazon CloudWatch | 50 GB standard logs ingested per month, 200 custom metrics + 1,000,000 metric requests per month | $95.22 |
138+
| | **Total:** | **$714.46 [USD] / month** |
139+
140+
#### 2. Medium usage profile
141+
142+
Profile assumptions: ingestion throughput at 120,000 records per second, query throughput at 100 queries per second.
143+
144+
| AWS service | Dimensions | Cost [USD] |
145+
| ----------- | ------------ | ------------ |
146+
| Amazon EC2 | * Druid master: 3 x m6g.xlarge | $1,572.62 |
147+
| | * Druid query: 3 x m6g.xlarge | |
148+
| | * Druid data: 3 x (m6g.2xlarge + 500GB EBS GP2 volume) | |
149+
| | * ZooKeeper: 3 x t4g.medium | |
150+
| Amazon ELB | 1 x ALB, 20 GB/h processed bytes (EC2 Instances and IP addresses as targets) | $133.23 |
151+
| Amazon Aurora | 3 x db.t4g.medium | $247.81 |
152+
| Amazon S3 | 5 TB standard storage + 5,000,000 requests per month | $119.76 |
153+
| AWS Key Management Service | 7 x customer managed keys | $7 |
154+
| AWS Secrets Manager | 4 x secrets | $1.6 |
155+
| Amazon CloudWatch | 100 GB standard logs ingested per month, 200 custom metrics + 1,000,000 metric requests per month | $120.45 |
156+
| | **Total:** | **$2,202.47 [USD] / month** |
157+
158+
159+
#### 3. Large usage profile
160+
161+
| AWS service | Dimensions | Cost [USD] |
162+
| ----------- | ------------ | ------------ |
163+
| Amazon EC2 | * Druid master: 3 x m6g.4xlarge | $10,268.76 |
164+
| | * Druid query: 3 x m6g.4xlarge | |
165+
| | * Druid data: 3 x (m6g.16xlarge + 5 TB EBS GP2 volume) | |
166+
| | * ZooKeeper: 3 x m6g.2xlarge | |
167+
| Amazon ELB | 1 x ALB, 200 GB/h processed bytes (EC2 Instances and IP addresses as targets) | $1,184.43 |
168+
| Amazon Aurora | 3 x db.t3.large | $427.39 |
169+
| Amazon S3 | 50 TB standard storage + 10,000,000 requests per month | $1,181.60 |
170+
| AWS Key Management Service | 7 x customer managed keys | $7 |
171+
| AWS Secrets Manager | 4 x secrets | $1.6 |
172+
| Amazon CloudWatch | 1,000 GB standard logs ingested per month, 200 custom metrics + 1,000,000 metric requests per month | $574.50 |
173+
| | **Total:** | **$13,645.27 [USD] / month** |
111174
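The totals in the cost tables above can be checked by summing the line items. The following is a minimal sketch with the figures copied from the tables. The dictionary layout and function name are illustrative assumptions. Note that the rounded line items of the large profile sum to one cent more than the published total, presumably because the published total was computed from unrounded values.

```python
from decimal import Decimal

# Monthly line items in USD, copied from the cost tables above.
COST_TABLES = {
    "small": {
        "Amazon EC2": "287.53",
        "Amazon ELB": "45.63",
        "Amazon Aurora": "247.81",
        "Amazon S3": "29.67",
        "AWS Key Management Service": "7",
        "AWS Secrets Manager": "1.6",
        "Amazon CloudWatch": "95.22",
    },
    "medium": {
        "Amazon EC2": "1572.62",
        "Amazon ELB": "133.23",
        "Amazon Aurora": "247.81",
        "Amazon S3": "119.76",
        "AWS Key Management Service": "7",
        "AWS Secrets Manager": "1.6",
        "Amazon CloudWatch": "120.45",
    },
    "large": {
        "Amazon EC2": "10268.76",
        "Amazon ELB": "1184.43",
        "Amazon Aurora": "427.39",
        "Amazon S3": "1181.60",
        "AWS Key Management Service": "7",
        "AWS Secrets Manager": "1.6",
        "Amazon CloudWatch": "574.50",
    },
}


def monthly_total(profile: str) -> Decimal:
    """Sum the line items of one usage profile with exact decimal arithmetic."""
    items = COST_TABLES[profile].values()
    return sum((Decimal(value) for value in items), Decimal("0"))


for name in COST_TABLES:
    print(f"{name}: ${monthly_total(name):,.2f} per month")
```

`Decimal` is used instead of `float` so that currency amounts add up without binary rounding error.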

112175
---
113176

@@ -138,7 +201,7 @@ The solution deploys the following components that work together to provide a pr
138201

139202
---
140203

141-
## How to build and deploy the solution
204+
## How to build and deploy the Guidance
142205

143206
Before you deploy the solution, review the architecture and prerequisites sections in this guide. Follow the step-by-step instructions in this section to configure and deploy the solution into your account.
144207

@@ -821,7 +884,7 @@ Upon successfully cloning the repository into your local development environment
821884
822885
---
823886
824-
## Uninstall the solution
887+
## Uninstall the Guidance
825888
826889
You can uninstall the solution by directly deleting the stacks from the AWS CloudFormation console.
827890
@@ -834,14 +897,30 @@ Alternatively, you could also uninstall the solution by running `npm run cdk des
834897
835898
## Collection of operational metrics
836899
837-
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, refer to the [implementation guide](#https://docs.aws.amazon.com/solutions/latest/scalable-analytics-using-apache-druid-on-aws/welcome.html).
900+
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, refer to the [implementation guide](https://docs.aws.amazon.com/solutions/latest/scalable-analytics-using-apache-druid-on-aws/solution-overview.html).
838901
839902
---
840903
841-
Copyright 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
904+
## Notices
905+
906+
### Licence
907+
908+
Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.
842909
843910
Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at
844911
845912
http://www.apache.org/licenses/
846913
847914
or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.
915+
916+
### Disclaimer
917+
918+
Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.
919+
920+
## Authors
921+
922+
Priyank Devenraj<br>
923+
Abhay Joshi<br>
924+
Morris Estepa<br>
925+
Aijun Peng<br>
926+