Choose the Right Tools for Big Data Development
Leverage Hadoop as your pilot project to gain organizational buy-in and build institutional learning.
Onsite Workshop
The lack of a defined and comprehensive approach to big data leads to:
- Inability to address the skills gaps of database experts and data scientists
- Inappropriate handling of single points of failure
- Compliance and security risks due to the lack of security in many big data products
A well-defined big data tool stack provides the ability to:
- Quickly handle large volumes of data with multiple schemas
- Leverage the scalability of the database cluster to accommodate load spikes
- Implement redundancies to ensure high availability and fault tolerance
Module 1: Assess fit for big data
The Purpose
- Understand the current big data landscape.
- Identify the project team.
- Assess the current data analytics stack.
Key Benefits Achieved
- Understand the organization’s readiness for big data development.
Activities: | | Outputs:
---|---|---
1.1 | Document and assess the development process |
1.2 | Assess the data analytics stack |
1.3 | Address the gaps in the stack |
Module 2: Draw the big data flow
The Purpose
- Map the requirements to big data.
- Draw the big data flow.
Key Benefits Achieved
- Ensure business requirements are mapped to each component of the big data flow.
Activities: | | Outputs:
---|---|---
2.1 | Document the business requirements and use cases |
2.2 | Draw the top-down and bottom-up big data flows |
Module 3: Build the Hadoop stack
The Purpose
- Choose the appropriate installation approach.
- Import data into Hadoop.
- Develop the MapReduce program.
- Select big data analytics tools.
- Conduct end-to-end testing.
Key Benefits Achieved
- Create a baseline Hadoop stack that fits the organization’s needs.
- Understand the challenges of installing and managing the Hadoop stack.
Activities: | | Outputs:
---|---|---
3.1 | Select the installation approach |
3.2 | Classify the imported data |
3.3 | Select the data collection tools |
3.4 | Design the relational schema |
3.5 | Test and validate the dataflow |
3.6 | Choose analytics tools |
3.7 | Perform end-to-end testing |
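
For readers unfamiliar with the MapReduce program named in this module's purpose, the following is a minimal sketch of the map, shuffle, and reduce phases, run locally in plain Python on a word-count example. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are illustrative assumptions, not part of the workshop material.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for data imported into Hadoop.
    sample = ["big data big plans", "data flows"]
    print(run_job(sample))
```

On a real cluster, the framework would distribute the map and reduce steps across nodes and perform the shuffle itself; the sketch only shows how the three phases relate.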
Module 4: Roll out Hadoop in the organization
The Purpose
- Prepare the Hadoop stack for deployment.
- Gain institutional learning.
- Create an organizational rollout plan.
Key Benefits Achieved
Activities: | | Outputs:
---|---|---
4.1 | Establish instrumentation points |
4.2 | Optimize the Hadoop stack |
4.3 | Develop an organizational rollout plan |
4.4 | Establish a stakeholder communication plan |