- [How to build and deploy the Guidance](#how-to-build-and-deploy-the-guidance)
- [Configuration](#configuration)
- [Build and deploy](#build-and-deploy)
- [Access the Druid web console](#access-the-druid-web-console)
- [Uninstall the Guidance](#uninstall-the-guidance)
- [Collection of operational metrics](#collection-of-operational-metrics)
- [Notices](#notices)
- [Licence](#licence)
- [Disclaimer](#disclaimer)
- [Authors](#authors)
---
## Scalable Analytics using Apache Druid on AWS
Guidance for Scalable Analytics using Apache Druid on AWS is a solution offered by AWS that enables customers to quickly and efficiently deploy, operate, and manage a cost-effective, highly available, resilient, and fault-tolerant hosting environment for Apache Druid analytics databases on AWS.
## About this Guidance
The solution incorporates an [AWS CDK](https://aws.amazon.com/cdk/) construct that encapsulates Apache Druid and is purposefully architected to be easy to operate, secure by design, and well-architected. With this solution, you can set up an Apache Druid cluster for data ingestion and analysis within minutes. It supports three deployment options: Amazon EC2, Amazon Elastic Kubernetes Service (Amazon EKS), and EKS Fargate, so you can run Apache Druid on whichever fits your requirements.

You can use this README file to find out how to build, deploy, use, and test the code. You can also contribute to this project in various ways, such as reporting bugs or submitting feature requests or additional documentation. For more information, refer to the [Contributing](CONTRIBUTING.md) topic.
## Guidance overview
Scalable analytics using Apache Druid on AWS helps organizations use the full set of features and capabilities of Apache Druid, a powerful open source analytics engine, while taking advantage of the flexibility, elasticity, scalability, and price-performance options of AWS compute and storage offerings for real-time, low-latency analytics use cases.

For the architecture of the EKS stack, refer to the diagram at `source/images/solution_architecture_diagram-eks.png`.
### Guidance components
The Guidance deploys the following components that work together to provide a production-ready Druid cluster:

- **Web application firewall**: AWS WAF protects the Druid web console and Druid API endpoints from common web exploits and automated bots that could affect availability, compromise security, or consume excessive resources. It is automatically enabled when the `internetFacing` parameter is set to `true`.

- **Bastion host**: A security-hardened Linux server that manages access from an external network to the Druid servers running in the private network. It can also be used to reach the Druid web console through SSH tunneling when a private Application Load Balancer (ALB) is deployed.

- **Application load balancer**: The ALB serves as the single point of contact for clients, including users who authenticate through identity providers such as OpenID Connect (OIDC) or Lightweight Directory Access Protocol (LDAP). It distributes incoming application traffic across multiple query servers in multiple Availability Zones.

- **Private subnet**: The private subnet contains the following:

  - **Druid master auto scaling group**: An Auto Scaling group contains a collection of Druid master servers. A Master server manages data ingestion and availability: it starts new ingestion jobs and coordinates the availability of data on the Data servers. Within a Master server, functionality is split between two processes, the Coordinator and the Overlord.

  - **Druid data auto scaling group**: An Auto Scaling group contains a collection of Druid data servers. A Data server executes ingestion jobs and stores queryable data. Within a Data server, functionality is split between two processes, the Historical and the MiddleManager.

  - **Druid query auto scaling group**: An Auto Scaling group contains a collection of Druid query servers. A Query server provides the endpoints that users and client applications interact with, routing queries to Data servers or other Query servers. Within a Query server, functionality is split between two processes, the Broker and the Router.

  - **ZooKeeper auto scaling group**: An Auto Scaling group contains a collection of ZooKeeper servers. Apache Druid uses Apache ZooKeeper (ZK) to manage the current cluster state.

- **S3 based deep storage**: An Amazon S3 bucket provides deep storage for the Apache Druid cluster. Deep storage is where Druid stores its segments.

- **Application secrets**: AWS Secrets Manager stores the secrets used by Apache Druid, including the Amazon Relational Database Service (Amazon RDS) secret and the administrator user secret.

- **Logs, metrics, and dashboards**: Amazon CloudWatch collects the cluster's logs and metrics and hosts its dashboards.

- **Aurora based metadata storage**: An Amazon Aurora PostgreSQL database cluster provides the metadata storage for the Apache Druid cluster. Druid uses the metadata storage to hold metadata about the system, but not the actual data.

- **Notifications**: Amazon Simple Notification Service (Amazon SNS) delivers alerts and alarms when system events occur, so that operators can respond promptly.
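The division of work across the Druid server groups, and the link between `internetFacing` and AWS WAF, can be summarized in a small TypeScript model. This is an illustrative sketch only: `ServerGroup`, `processesByGroup`, `groupForProcess`, `AccessConfig`, and `consoleAccess` are hypothetical names, not the Guidance's CDK API, and treating `internetFacing: false` as "private ALB reached through the bastion host" is an assumption drawn from the component descriptions above.

```typescript
// Illustrative model of the components described above. All names are
// hypothetical; they are not part of the Guidance's CDK construct API.

type ServerGroup = "master" | "query" | "data";

// Each Druid server type splits its work between two processes.
const processesByGroup: Record<ServerGroup, string[]> = {
  master: ["Coordinator", "Overlord"],
  query: ["Broker", "Router"],
  data: ["Historical", "MiddleManager"],
};

// Returns the server group that runs a given Druid process, if any.
function groupForProcess(processName: string): ServerGroup | undefined {
  const groups: ServerGroup[] = ["master", "query", "data"];
  for (const group of groups) {
    if (processesByGroup[group].indexOf(processName) !== -1) {
      return group;
    }
  }
  return undefined;
}

// Hypothetical slice of the deployment configuration; only `internetFacing`
// is named in this README.
interface AccessConfig {
  internetFacing: boolean;
}

// WAF is enabled when the deployment is internet facing. Assumption: otherwise a
// private ALB is deployed and the web console is reached through an SSH tunnel
// via the bastion host.
function consoleAccess(config: AccessConfig): { wafEnabled: boolean; route: string } {
  if (config.internetFacing) {
    return { wafEnabled: true, route: "internet-facing ALB protected by AWS WAF" };
  }
  return { wafEnabled: false, route: "private ALB reached through an SSH tunnel via the bastion host" };
}

console.log(groupForProcess("Broker")); // query
console.log(consoleAccess({ internetFacing: false }).wafEnabled); // false
```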
### Cost
You are responsible for the cost of the AWS services used while running this guidance. As of the latest revision, running this guidance in the US East (N. Virginia) Region costs approximately **$714.46 per month** with the default settings (small usage profile), **$2,202.47 per month** with a medium usage profile, and **$13,645.27 per month** with a large usage profile.

These costs are for the resources shown in the [Cost table](#cost-table) section.

We recommend creating a [budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-create.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, see the pricing webpage for each AWS service used in this guidance.

#### Cost table

The following tables provide a sample cost breakdown for deploying this guidance with the default parameters in the US East (N. Virginia) Region for one month, covering the small, medium, and large usage profiles.

#### 1. Small usage profile

Profile assumptions: ingestion throughput at 30,000 records per second, query throughput at 25 queries per second.

| Amazon ELB | 1 x ALB, 200 GB/h processed bytes (EC2 Instances and IP addresses as targets) | $1,184.43 |
| Amazon Aurora | 3 x db.t3.large | $427.39 |
| Amazon S3 | 50 TB standard storage + 10,000,000 requests per month | $1,181.60 |
| AWS Key Management Service | 7 x customer managed keys | $7.00 |
| AWS Secrets Manager | 4 x secrets | $1.60 |
| Amazon CloudWatch | 1,000 GB standard logs ingested per month, 200 custom metrics + 1,000,000 metric requests per month | $574.50 |
||**Total:**|**$13,645.27 [USD] / month**|
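The KMS and Secrets Manager rows in the cost table are simple unit-price multiplications. The TypeScript sketch below reproduces them and projects the monthly estimates quoted at the start of this section to a full year. It assumes public list prices of $1.00 per customer managed KMS key per month and $0.40 per Secrets Manager secret per month (assumptions for illustration, not values published by this Guidance) and assumes usage stays constant across the year; the function names are hypothetical.

```typescript
// Sketch: reproduce the unit-priced rows of the cost table and annualize the
// monthly totals. The unit prices are assumed public list prices and may change.
const KMS_USD_PER_KEY_MONTH = 1.0; // per customer managed key (assumption)
const SECRETS_USD_PER_SECRET_MONTH = 0.4; // per stored secret (assumption)

// Round to whole cents so floating-point noise does not leak into currency values.
function roundToCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

function kmsMonthlyCost(keys: number): number {
  return roundToCents(keys * KMS_USD_PER_KEY_MONTH);
}

function secretsMonthlyCost(secrets: number): number {
  return roundToCents(secrets * SECRETS_USD_PER_SECRET_MONTH);
}

// Monthly estimates quoted in the Cost section, US East (N. Virginia).
const monthlyEstimates: Record<"small" | "medium" | "large", number> = {
  small: 714.46,
  medium: 2202.47,
  large: 13645.27,
};

// Naive projection: assumes the same usage every month.
function annualEstimate(monthlyUsd: number): number {
  return roundToCents(monthlyUsd * 12);
}

console.log(kmsMonthlyCost(7)); // 7
console.log(secretsMonthlyCost(4)); // 1.6
console.log(annualEstimate(monthlyEstimates.large)); // 163743.24
```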

---
## How to build and deploy the Guidance
Before you deploy the solution, review the architecture and prerequisites sections in this guide. Follow the step-by-step instructions in this section to configure and deploy the solution into your account.

---
## Uninstall the Guidance
You can uninstall the solution by directly deleting the stacks from the AWS CloudFormation console.
## Collection of operational metrics
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, refer to the [implementation guide](https://docs.aws.amazon.com/solutions/latest/scalable-analytics-using-apache-druid-on-aws/solution-overview.html).

---
## Notices
### Licence

Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://www.apache.org/licenses/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.

### Disclaimer

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.