Updates from December, 2019

  • Wang 20:44 on 2019-12-24

    PoC of Apache Druid 

    We have some business requirements around data aggregation and online processing, so we did a quick PoC on Apache Druid. Below I show how to set up Druid quickly and start an ingestion task.

    1. Select a release version that is compatible with your existing system and download the package.

    2. Choose what kind of Druid service you want to start with.

    • For a single node, run one of the scripts under the bin directory whose name starts with start-single-server-, or run start-micro-quickstart.
    • For a multi-node cluster, update the configuration files under start-micro-quickstart on one node and sync them to the other nodes. If you want to connect to your Hadoop cluster, copy the corresponding Hadoop XML files and the Kerberos keytab into the Druid directory.

    Then start the Druid services on every node by executing the start-cluster script.

    3. Open the Druid console in a browser at http://IP:8888
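
    Before opening the console, it can help to confirm that the service is answering. Here is a minimal Python sketch, assuming the router on port 8888 exposes Druid's /status/health endpoint. The names health_url and is_healthy are hypothetical helpers, not part of Druid:

```python
import json
import urllib.request


def health_url(host: str, port: int = 8888) -> str:
    """Build the health-check URL (port 8888 is assumed to be the router)."""
    return f"http://{host}:{port}/status/health"


def is_healthy(host: str, port: int = 8888, timeout: float = 5.0) -> bool:
    """Return True if the process answers /status/health with JSON true."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")) is True
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False
```

    Calling is_healthy with the node's address should return True once the services have started, and False while they are still down or unreachable.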

    Next, I load data from a local file, ingest it as a datasource, and finally query the data with SQL.

    Task configuration

    {
      "type": "index_parallel",
      "id": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
      "resource": {
        "availabilityGroup": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
        "requiredCapacity": 1
      },
      "spec": {
        "dataSchema": {
          "dataSource": "wikiticker-2015-09-12-sampled",
          "parser": {
            "type": "string",
            "parseSpec": {
              "format": "json",
              "timestampSpec": {
                "column": "time",
                "format": "iso"
              },
              "dimensionsSpec": {
                "dimensions": []
              }
            }
          },
          "metricsSpec": [
            {
              "type": "count",
              "name": "count"
            },
            {
              "type": "longSum",
              "name": "sum_added",
              "fieldName": "added",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_deleted",
              "fieldName": "deleted",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_delta",
              "fieldName": "delta",
              "expression": null
            },
            {
              "type": "longSum",
              "name": "sum_metroCode",
              "fieldName": "metroCode",
              "expression": null
            }
          ],
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "DAY",
            "queryGranularity": "HOUR",
            "rollup": true,
            "intervals": null
          },
          "transformSpec": {
            "filter": null,
            "transforms": []
          }
        },
        "ioConfig": {
          "type": "index_parallel",
          "firehose": {
            "type": "local",
            "baseDir": "/opt/druid-0.16.0/quickstart/tutorial",
            "filter": "wikiticker-2015-09-12-sampled.json.gz",
            "parser": null
          },
          "appendToExisting": false
        },
        "tuningConfig": {
          "type": "index_parallel",
          "maxRowsPerSegment": null,
          "maxRowsInMemory": 1000000,
          "maxBytesInMemory": 0,
          "maxTotalRows": null,
          "numShards": null,
          "partitionsSpec": null,
          "indexSpec": {
            "bitmap": {
              "type": "concise"
            },
            "dimensionCompression": "lz4",
            "metricCompression": "lz4",
            "longEncoding": "longs"
          },
          "indexSpecForIntermediatePersists": {
            "bitmap": {
              "type": "concise"
            },
            "dimensionCompression": "lz4",
            "metricCompression": "lz4",
            "longEncoding": "longs"
          },
          "maxPendingPersists": 0,
          "forceGuaranteedRollup": false,
          "reportParseExceptions": false,
          "pushTimeout": 0,
          "segmentWriteOutMediumFactory": null,
          "maxNumConcurrentSubTasks": 1,
          "maxRetry": 3,
          "taskStatusCheckPeriodMs": 1000,
          "chatHandlerTimeout": "PT10S",
          "chatHandlerNumRetries": 5,
          "maxNumSegmentsToMerge": 100,
          "totalNumMergeTasks": 10,
          "logParseExceptions": false,
          "maxParseExceptions": 2147483647,
          "maxSavedParseExceptions": 0,
          "partitionDimensions": [],
          "buildV9Directly": true
        }
      },
      "context": {
        "forceTimeChunkLock": true
      },
      "groupId": "index_parallel_wikiticker-2015-09-12-sampled_2020-02-18T11:17:29.236Z",
      "dataSource": "wikiticker-2015-09-12-sampled"
    }

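    The task configuration above is the payload of a task that has already run, so it carries fields such as id, groupId and resource. As far as I know, a new submission only needs type and spec (plus an optional context), posted to /druid/indexer/v1/task. Here is a sketch under that assumption. It also assumes the router on port 8888 forwards the request to the Overlord. The names to_submission and build_submit_request are hypothetical helpers:

```python
import json
import urllib.request


def to_submission(task_payload: dict) -> dict:
    """Keep only the fields assumed to be needed when submitting a new task."""
    submission = {"type": task_payload["type"], "spec": task_payload["spec"]}
    if "context" in task_payload:
        submission["context"] = task_payload["context"]
    return submission


def build_submit_request(base_url: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits the task."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/druid/indexer/v1/task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

    Passing the returned request to urllib.request.urlopen sends it. Building the request separately makes it easy to inspect the URL and body first.
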
    Task running status

    Once the task has finished, you can see the new item in the datasource, segment, and query views.
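
    Queries can also be sent over HTTP instead of through the console. Here is a sketch, assuming the router on port 8888 serves Druid's SQL endpoint /druid/v2/sql/ and that the metric columns are named as in the metricsSpec of the task configuration. Because rollup is enabled, SUM("count") is used to get the number of ingested rows, since COUNT(*) would count rolled-up rows. The names build_sql_request and HOURLY_EDITS_SQL are hypothetical:

```python
import json
import urllib.request

# Hourly totals over the rolled-up datasource; "count" and "hour" are quoted
# because they are reserved words in Druid SQL.
HOURLY_EDITS_SQL = """
SELECT FLOOR(__time TO HOUR) AS "hour",
       SUM("count") AS edits,
       SUM(sum_added) AS added
FROM "wikiticker-2015-09-12-sampled"
GROUP BY 1
ORDER BY 1
"""


def build_sql_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Druid SQL endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/druid/v2/sql/",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

    Passing the request to urllib.request.urlopen sends it. With the default result format, the response body should be a JSON array with one object per result row.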

  • Wang 17:35 on 2019-12-17


    BCP/DR: Business Continuity Plan/Disaster Recovery

    Why is business continuity planning important?

    Every organisation is at risk from potential disasters that include:

    • Natural disasters such as tornadoes, floods, blizzards, earthquakes and fire
    • Accidents
    • Sabotage
    • Power and energy disruptions
    • Communications, transportation, safety and service sector failure
    • Environmental disasters such as pollution and hazardous materials spills
    • Cyber attacks and hacker activity.

    A Business Continuity Plan includes:

    • Plans, measures and arrangements to ensure the continuous delivery of critical services and products, which permits the organization to recover its facility, data and assets.
    • Identification of necessary resources to support business continuity, including personnel, information, equipment, financial allocations, legal counsel, infrastructure protection and accommodations.

    Creating a business continuity plan

    A BCP typically includes five sections:

    1. BCP Governance
    2. Business Impact Analysis (BIA)
    3. Plans, measures, and arrangements for business continuity
    4. Readiness procedures
    5. Quality assurance techniques (exercises, maintenance and auditing)

    Establish control

    Senior managers or a BCP Committee would normally:

    • approve the governance structure;
    • clarify their roles, and those of participants in the program;
    • oversee the creation of a list of appropriate committees, working groups and teams to develop and execute the plan;
    • provide strategic direction and communicate essential messages;
    • approve the results of the BIA;
    • review the critical services and products that have been identified;
    • approve the continuity plans and arrangements;
    • monitor quality assurance activities; and
    • resolve conflicting interests and priorities.

    The BCP committee normally comprises the following members:

    • Executive sponsor has overall responsibility for the BCP committee; elicits senior management’s support and direction; and ensures that adequate funding is available for the BCP program.
    • BCP Coordinator secures senior management’s support; estimates funding requirements; develops BCP policy; coordinates and oversees the BIA process; ensures effective participant input; coordinates and oversees the development of plans and arrangements for business continuity; establishes working groups and teams and defines their responsibilities; coordinates appropriate training; and provides for regular review, testing and audit of the BCP.
    • Security Officer works with the coordinator to ensure that all aspects of the BCP meet the security requirements of the organization.
    • Chief Information Officer (CIO) cooperates closely with the BCP coordinator and IT specialists to plan for effective and harmonized continuity.
    • Business unit representatives provide input, and assist in performing and analyzing the results of the business impact analysis.

    Business impact analysis

    • Identify the mandate and critical aspects of an organization
    • Prioritize critical services or products
    • Identify impacts of disruptions
    • Identify areas of potential revenue loss
    • Identify additional expenses
    • Identify intangible losses
    • Insurance requirements
    • Ranking
    • Identify dependencies

    Plans for business continuity

    • Mitigating threats and risks
    • Analyze current recovery capabilities
    • Create continuity plans
    • Response preparation
    • Alternate facilities

    Readiness procedures

    • Training
    • Exercises
      • Goal
      • Objectives
      • Scope
      • Artificial aspects and assumptions
      • Participant instructions
      • Exercise narrative
      • Communications for participants
      • Testing and post-exercise evaluation

    Quality assurance techniques

    • Internal review
      • on a scheduled basis (annually or bi-annually);
      • when changes to the threat environment occur;
      • when substantive changes to the organization take place; and
      • after an exercise to incorporate findings.
    • External audit
      • Procedures used to determine critical services and processes
      • Methodology, accuracy, and comprehensiveness of continuity plans

    What to do when a disruption occurs

    Disruptions are handled in three steps:

    1. Response
      • Incident management
        • notifying management, employees, and other stakeholders;
        • assuming control of the situation;
        • identifying the range and scope of damage;
        • implementing plans;
        • identifying infrastructure outages; and
        • coordinating support from internal and external sources.
      • Communications management
      • Operations management
    2. Continuation of critical services
    3. Recovery and restoration

    Recovery and restoration

    • Re-deploying personnel
    • Deciding whether to repair the facility, relocate to an alternate site or build a new facility
    • Acquiring the additional resources necessary for restoring business operations
    • Re-establishing normal operations
    • Resuming operations at pre-disruption levels


    • When critical services and products cannot be delivered, consequences can be severe. 
    • All organizations are at risk and face potential disaster if unprepared. 
    • A Business Continuity Plan is a tool that allows institutions not only to moderate risk, but also to continuously deliver products and services despite disruption.


    • plans must be updated and tested frequently;
    • all types of threats must be considered;
    • dependencies and interdependencies should be carefully analyzed;
    • key personnel may be unavailable;
    • telecommunications are essential;
    • alternate sites for IT backup should not be situated close to the primary site;
    • employee support (counselling) is important;
    • copies of plans should be stored at a secure off-site location;
    • sizable security perimeters may surround the scene of incidents involving national security or law enforcement, and can impede personnel from returning to buildings;
    • despite shortcomings, Business Continuity Plans in place before September 11 were indispensable to the continuity effort; and
    • increased uncertainty (following a high impact disruption such as terrorism) may lengthen time until operations are normalized.

    Reference: https://www.publicsafety.gc.ca/cnt/rsrcs/pblctns/bsnss-cntnt-plnnng/index-en.aspx
