The End of Hadoop and Cloudera?
Hadoop was designed as a non-proprietary NoSQL technology for persisting large volumes of structured and unstructured data on inexpensive hardware with minimal risk of data loss. I believe it has fulfilled this goal pretty well. So, where’s the problem?
I think the root cause of all “failed” Hadoop implementations lies in the “enterprise data warehouse” mentality, architecture, and methodology used to build enterprise data lakes. Trying to squeeze every phase of the data lifecycle (creation, accumulation, integration, augmentation, and packaging for BI and analytics) into one monolithic process, implemented in a technology that was not designed for such a use case, is doomed to fail.
Instead of acknowledging the methodological mistake, implementers started blaming the technology, which was never designed for their use case in the first place. This blame led to an economic downturn for the main vendor, Cloudera, which had apparently positioned the technology to match “enterprise data warehouse” expectations rather than what it is best suited for. This strategy resulted in a sharp drop in Cloudera’s stock price and the departure of its CEO in June 2019. However, the market is still betting on both the technology and the company; they may not be ready to go yet.
Source: SoftwareReviews Big Data Data Quadrant, accessed August 21, 2019.
Hadoop may still be a good choice for accumulating structured and unstructured data and storing it “as is.” However, its technology may still be too rudimentary for data augmentation, and it is absolutely a misfit for packaging data for BI and analytics. Hadoop is in the trough of disillusionment, and this is a good thing. I hope that Hadoop failure stories will be regarded as lessons learned against the “magic technology first” approach to building data management solutions. There is no magic technology; even AI is not magic. Every good solution requires a combination of business, information, and technology architectures. Data creation, accumulation and persistence, data integration and augmentation, and data packaging for BI and analytics are distinct phases of data progression, and each requires its own architecture, governance, and technologies.
Want to Know More?
Create a Customized Big Data Architecture and Implementation Plan
Architect Your Big Data Environment
Main Hadoop Developers – Hortonworks and Cloudera – Under One Roof
SAS Hadoop on SoftwareReviews