Tagged: Cluster

  • Wang 23:23 on 2022-01-16
    Tags: Alluxio, Cluster


    Data Locality: Bring your data close to compute.
    Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.
    Data Accessibility: Make your data accessible.
    No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.
    Data On-Demand: Make your data as elastic as compute.
    Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.

  • Wang 21:06 on 2021-07-14
    Tags: Cluster

    A very good explanation of K8s networking

  • Wang 19:59 on 2021-06-09
    Tags: Cluster


    Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
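
    The three functions above can be pictured with a toy model. This is only a sketch for intuition, not Slurm's actual scheduling algorithm: the ToyCluster class, its strict FIFO policy, and the job names are all hypothetical.

```python
from collections import deque


class ToyCluster:
    """Toy workload manager: allocate nodes, run jobs, queue the rest."""

    def __init__(self, node_count):
        self.free_nodes = node_count
        self.running = {}        # job name -> number of nodes held
        self.pending = deque()   # FIFO queue of (job name, nodes requested)

    def submit(self, name, nodes):
        """Queue a job, then start whatever fits right now."""
        self.pending.append((name, nodes))
        self._schedule()

    def finish(self, name):
        """Release a finished job's nodes and let pending work start."""
        self.free_nodes += self.running.pop(name)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: stop at the first queued job that does not fit.
        while self.pending and self.pending[0][1] <= self.free_nodes:
            name, nodes = self.pending.popleft()
            self.free_nodes -= nodes
            self.running[name] = nodes
```

    With 4 nodes, submitting job "a" for 3 nodes and then job "b" for 2 nodes leaves "b" pending until "a" finishes. A real Slurm setup goes well beyond this with priorities, backfill scheduling, and accounting.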


  • Wang 11:31 on 2020-08-13
    Tags: Cluster

    Jupyter Gateway + JupyterHub 


  • Wang 21:38 on 2020-06-13
    Tags: Cluster

    Data Pipelines with Apache Airflow

  • Wang 20:44 on 2019-12-24
    Tags: Cluster

    PoC of Apache Druid 

    Because we have business requirements around data aggregation and online processing, we ran a quick PoC of Apache Druid. Below I show how to set up Druid quickly and start an ingestion task.

    1. Select a release version that is compatible with your existing system and download the package.

    2. Choose which kind of Druid service you want to start with.

    • For a single node, run one of the scripts under the bin directory whose names start with start-single-server-, or run start-micro-quickstart.
    • For a multi-node cluster, update the configuration files under start-micro-quickstart on one node and sync them to the other nodes. If you want to connect to your Hadoop cluster, copy the corresponding Hadoop XML files and the Kerberos keytab into the Druid directory.

    Then start the Druid service on every node by executing the start-cluster script.

    3. Visit Druid in a browser at http://IP:8888
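
    Before opening the browser, it can help to confirm that the service on port 8888 actually answers. Below is a minimal sketch that calls Druid's /status/health endpoint, which returns the JSON literal true when the process is up. The function names and the 5-second timeout are my own choices for illustration.

```python
import json
import urllib.request


def health_url(host, port=8888):
    """Build the URL of the /status/health endpoint for a Druid process."""
    return f"http://{host}:{port}/status/health"


def parse_health(body):
    """Return True only if the response body is the JSON literal true."""
    return json.loads(body) is True


def is_healthy(host, port=8888, timeout=5):
    """Query the endpoint; connection errors and bad bodies count as unhealthy."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return parse_health(resp.read())
    except (OSError, ValueError):
        return False
```

    Calling is_healthy("IP") with your node's address should return True once the service has finished starting.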

    Next, I load a local data file, ingest it as a datasource, and finally query the data with SQL.

    Task configuration

    {
      "type": "index_parallel",
      "id": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
      "resource": {
        "availabilityGroup": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
        "requiredCapacity": 1
      },
      "spec": {
        "dataSchema": {
          "dataSource": "wikiticker-2015-09-12-sampled",
          "parser": {
            "type": "string",
            "parseSpec": {
              "format": "json",
              "timestampSpec": {
                "column": "time",
                "format": "iso"
              },
              "dimensionsSpec": {
                "dimensions": []
              }
            }
          },
          "metricsSpec": [
            {
              "type": "count",
              "name": "count"
            },
            {
              "type": "longSum",
              "name": "sum_added",
              "fieldName": "added",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_deleted",
              "fieldName": "deleted",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_delta",
              "fieldName": "delta",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_metroCode",
              "fieldName": "metroCode",
              "expression": null
            }
          ],
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "DAY",
            "queryGranularity": "HOUR",
            "rollup": true,
            "intervals": null
          },
          "transformSpec": {
            "filter": null,
            "transforms": []
          }
        },
        "ioConfig": {
          "type": "index_parallel",
          "firehose": {
            "type": "local",
            "baseDir": "/opt/druid-0.16.0/quickstart/tutorial",
            "filter": "wikiticker-2015-09-12-sampled.json.gz",
            "parser": null
          },
          "appendToExisting": false
        },
        "tuningConfig": {
          "type": "index_parallel",
          "maxRowsPerSegment": null,
          "maxRowsInMemory": 1000000,
          "maxBytesInMemory": 0,
          "maxTotalRows": null,
          "numShards": null,
          "partitionsSpec": null,
          "indexSpec": {
            "bitmap": {
              "type": "concise"
            },
            "dimensionCompression": "lz4",
            "metricCompression": "lz4",
            "longEncoding": "longs"
          },
          "indexSpecForIntermediatePersists": {
            "bitmap": {
              "type": "concise"
            },
            "dimensionCompression": "lz4",
            "metricCompression": "lz4",
            "longEncoding": "longs"
          },
          "maxPendingPersists": 0,
          "forceGuaranteedRollup": false,
          "reportParseExceptions": false,
          "pushTimeout": 0,
          "segmentWriteOutMediumFactory": null,
          "maxNumConcurrentSubTasks": 1,
          "maxRetry": 3,
          "taskStatusCheckPeriodMs": 1000,
          "chatHandlerTimeout": "PT10S",
          "chatHandlerNumRetries": 5,
          "maxNumSegmentsToMerge": 100,
          "totalNumMergeTasks": 10,
          "logParseExceptions": false,
          "maxParseExceptions": 2147483647,
          "maxSavedParseExceptions": 0,
          "partitionDimensions": [],
          "buildV9Directly": true
        }
      },
      "context": {
        "forceTimeChunkLock": true
      },
      "groupId": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
      "dataSource": "wikiticker-2015-09-12-sampled"
    }

    Task running status

    Once the task has finished, the new item appears in the datasource, segment, and query views.
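
    The same two steps, submitting a task and querying with SQL, can also be done over HTTP instead of the web console. Below is a rough sketch using Druid's task endpoint (/druid/indexer/v1/task) and SQL endpoint (/druid/v2/sql). The helper names are hypothetical, and the base URL assumes the service on port 8888 forwards both endpoints, as it did in my quickstart setup.

```python
import json
import urllib.request


def build_task_request(base_url, task_spec):
    """POST request that submits an ingestion task spec (a dict)."""
    return urllib.request.Request(
        f"{base_url}/druid/indexer/v1/task",
        data=json.dumps(task_spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_sql_request(base_url, sql):
    """POST request that runs a Druid SQL query."""
    return urllib.request.Request(
        f"{base_url}/druid/v2/sql",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

    For example, send(build_sql_request("http://IP:8888", 'SELECT COUNT(*) FROM "wikiticker-2015-09-12-sampled"')) should return the row count once the datasource is available. The datasource name has to be double-quoted in SQL because it contains hyphens.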

  • Wang 23:22 on 2019-10-25
    Tags: Cluster

    SpringOne Platform 2019 in Austin, https://springoneplatform.io/

  • Wang 21:30 on 2019-10-22
    Tags: Cluster, Istio

    Istio playbook 

    Cloud platforms provide a wealth of benefits for the organizations that use them. However, there’s no denying that adopting the cloud can put strains on DevOps teams. Developers must use microservices to architect for portability, while operators manage extremely large hybrid and multi-cloud deployments. Istio lets you connect, secure, control, and observe services.

    First, download an Istio release, unzip the package, and enter the directory.

    Second, verify the installation environment:

    bin/istioctl verify-install

    Next, deploy Istio with the demo profile, which enables many features such as tracing, Kiali, and Grafana:

    bin/istioctl manifest apply --set profile=demo

    Then check the status of the Istio pods and make sure all the related pods are running.
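
    The pod check can be scripted as well. Below is a minimal sketch that reads the output of kubectl get pods -o json and lists the pods that are not in a healthy phase. It assumes kubectl is on the PATH and that Istio was installed into the istio-system namespace. The function names are hypothetical, and treating "Succeeded" (completed jobs) as healthy is my own choice.

```python
import json
import subprocess


def unhealthy_pods(pod_list):
    """Return names of pods whose phase is neither Running nor Succeeded."""
    ok_phases = {"Running", "Succeeded"}
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") not in ok_phases
    ]


def istio_pod_list(namespace="istio-system"):
    """Fetch the pod list as parsed JSON via kubectl (must be on PATH)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

    An empty result from unhealthy_pods(istio_pod_list()) means every pod reports a healthy phase. Note that a pod in the Running phase can still have containers that are not ready, so this is a first check rather than a full readiness test.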

    Istio Commands

    • authn: Interact with Istio authentication policies
    • authz: (authz is experimental. Use istioctl experimental authz)
    • convert-ingress: Convert Ingress configuration into Istio VirtualService configuration
    • dashboard: Access to Istio web UIs like kiali, grafana, prometheus, jaeger
    • deregister: De-registers a service instance
    • experimental: Experimental commands that may be modified or deprecated
    • help: Help about any command
    • kube-inject: Inject Envoy sidecar into Kubernetes pod resources
    • manifest: Commands related to Istio manifests
    • profile: Commands related to Istio configuration profiles
    • proxy-config: Retrieve information about proxy configuration from Envoy [kube only]
    • proxy-status: Retrieves the synchronization status of each Envoy in the mesh [kube only]
    • register: Registers a service instance (e.g. VM) joining the mesh
    • validate: Validate Istio policy and rules
    • verify-install: Verifies Istio Installation Status or performs pre-check for the cluster before Istio installation
    • version: Prints out build version information

  • Wang 00:14 on 2019-09-04
    Tags: Cluster

    Elasticsearch Benchmark By Esrally 

    esrally is a benchmark tool for Elasticsearch: https://github.com/elastic/rally

    Testing Script:

    esrally --pipeline=benchmark-only --target-hosts=server1:9200,server2:9200,server3:9200,server4:9200,server5:9200
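
    With many servers the host list gets long, so it can be generated. Below is a small sketch that builds the same command line from a list of host names. The function name is hypothetical and only the two flags shown above are used.

```python
def esrally_command(hosts, port=9200):
    """Build the esrally argument list for benchmarking an existing cluster."""
    targets = ",".join(f"{host}:{port}" for host in hosts)
    return ["esrally", "--pipeline=benchmark-only", f"--target-hosts={targets}"]
```

    The returned list can be passed to subprocess.run. Check esrally --help for your installed version first, since the command-line syntax has changed across Rally releases.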

    Server status in Kibana dashboard:

    P.S. Please benchmark your ES cluster based on your real business scenario & cluster topology

  • Wang 22:34 on 2019-05-10
    Tags: Cluster

    Kubernetes node in “NotReady” status 

    Recently I found that some k8s nodes had become “NotReady”. I checked disk and memory, and both seemed fine.

    [xxx@xxx-xxx ~]# kubectl describe node xxx-xxx
      Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
      ----             ------    -----------------                 ------------------                ------                    -------
      PIDPressure      False     Fri, 10 May 2019 09:24:43 +0900   Fri, 10 May 2018 00:10:12 +0900   KubeletHasSufficientPID   kubelet has sufficient PID available
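
    To spot affected nodes without reading the describe output one node at a time, the node conditions can be checked in a script. Below is a minimal sketch that works on the parsed output of kubectl get nodes -o json. The function name is hypothetical, and a node with no Ready condition at all is treated as not ready.

```python
def not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated as not ready.
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names
```

    The Ready condition can be "True", "False", or "Unknown", and only "True" counts as healthy here.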

    Then I restarted kubelet on the server and checked the logs, where I found:

    [xxx@xxx-xxx ~]# systemctl status kubelet
    ● kubelet.service - Kubernetes Kubelet Server
       Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    May 10 12:30:30 xxx-xxx kubelet[16776]: F0322 12:30:30.810434   16776 server.go:233] failed to run Kubelet: Running with swap on is not supported, plea...

    So I checked the server’s status and turned off swap, then restarted kubelet, and the nodes recovered.

    [xxx@xxx-xxx ~]# swapoff -a
    [xxx@xxx-xxx ~]# systemctl restart kubelet
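
    To check other nodes for the same problem before kubelet fails, you can look at /proc/swaps, which lists the active swap areas on Linux. Below is a minimal sketch with hypothetical function names.

```python
def active_swap_devices(proc_swaps_text):
    """Parse /proc/swaps content and return the active swap device names."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header; each later line is one swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_is_off(path="/proc/swaps"):
    """True when the kernel reports no active swap areas (Linux only)."""
    with open(path) as f:
        return not active_swap_devices(f.read())
```

    Keep in mind that swapoff -a only lasts until the next reboot. If /etc/fstab still contains a swap entry, swap comes back on boot and kubelet will fail the same way again.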
