High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
Data Locality: Bring your data close to compute. Make your data local to compute workloads for Spark caching, Presto caching, Hive caching, and more.
Data Accessibility: Make your data accessible. Whether it sits on-prem or in the cloud, in HDFS or S3, make your files and objects accessible in many different ways.
Data On-Demand: Make your data as elastic as compute. Effortlessly orchestrate your data for compute in any cloud, even if it is spread across multiple clouds.
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
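To make the "starting, executing, and monitoring work" part concrete, here is a minimal Slurm batch script sketch. The partition name, resource sizes, and the ./my_app binary are placeholders I am assuming for illustration, not values from this post.

```bash
#!/bin/bash
# Minimal Slurm batch script (sketch). Partition, sizes, and ./my_app
# are assumed placeholders; adjust to your cluster.
#SBATCH --job-name=demo          # name shown in the queue
#SBATCH --partition=compute      # assumed partition name
#SBATCH --nodes=2                # number of allocated nodes
#SBATCH --ntasks-per-node=4      # parallel tasks per node
#SBATCH --time=00:10:00          # wall-clock limit
#SBATCH --output=demo_%j.out     # stdout file (%j expands to the job id)

# srun launches the tasks across the nodes Slurm allocated to this job
srun ./my_app
```

You would submit it with sbatch and watch its place in the pending-work queue with squeue, which is exactly the allocation/execution/arbitration cycle described above.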
Because we have business requirements around data aggregation and online processing, we ran a quick PoC on Apache Druid. Below I will show how to set up Druid quickly and start an ingestion task.
1. Select a release version that is compatible with your existing system and download the package (a command sketch follows this list).
2. Choose what kind of Druid deployment you want to start with:
For a single node, just execute one of the scripts under the bin directory whose names start with start-single-server-, or execute start-micro-quickstart.
For a multi-node cluster, update the cluster configuration files under the conf directory on one node and sync them to the other nodes. If you want to connect to your Hadoop cluster, copy the corresponding Hadoop XML files and Kerberos keytab into the Druid installation.
Then start the Druid services on every node by executing the start-cluster script that matches that node's role, as shown in the sketch below.
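For reference, here is a rough shell sketch of the steps above. The version number, download URL, and exact script names are assumptions based on the layout of recent Apache Druid binary distributions; check the release you download, as script names can differ between versions.

```bash
# Step 1: download and unpack a release -- version is an assumption; pick a
# compatible one from https://druid.apache.org/downloads
DRUID_VERSION=25.0.0
curl -O "https://archive.apache.org/dist/druid/${DRUID_VERSION}/apache-druid-${DRUID_VERSION}-bin.tar.gz"
tar -xzf "apache-druid-${DRUID_VERSION}-bin.tar.gz"
cd "apache-druid-${DRUID_VERSION}"

# Step 2a: single node -- run one of the bundled single-server scripts
./bin/start-micro-quickstart

# Step 2b: multi-node cluster -- edit the cluster configs (typically under
# conf/druid/cluster) on one node, sync them to the other nodes, copy your
# Hadoop XML files and Kerberos keytab if needed, then start each node with
# the script matching its role:
./bin/start-cluster-master-with-zk-server   # on the master node
./bin/start-cluster-data-server             # on each data node
./bin/start-cluster-query-server            # on each query node
```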