The End of Hadoop and Cloudera?
Hadoop was designed as a NoSQL and non-proprietary technology to persist large volumes of structured and unstructured data with minimal risk of loss using inexpensive hardware. I believe it has fulfilled this goal pretty well. So, where’s the problem?
I think the root cause of all “failed” Hadoop implementations lies in the “enterprise data warehouse” mentality, architecture, and methodology used to build enterprise data lakes. Trying to squeeze every phase of the data lifecycle (creation, accumulation, integration, augmentation, and packaging for BI and analytics) into one monolithic process, implemented in a technology that was not designed for such a use case, is doomed to failure.
Instead of acknowledging the methodological mistake, implementers started blaming the technology, which was never designed for their use case in the first place. This blame led to a financial downturn for the main vendor, Cloudera, which had apparently positioned the technology to meet “enterprise data warehouse” expectations rather than what it is best suited for. That strategy resulted in a sharp drop in Cloudera’s stock price and the departure of its CEO in June 2019. However, the market is still betting on the technology and the company; they may not be ready to exit yet.
Source: SoftwareReviews Big Data Data Quadrant, Accessed August 21, 2019.
Hadoop may still be a good choice for accumulating structured and unstructured data and storing it “as is.” However, its technology may be too rudimentary for data augmentation, and it is absolutely a misfit for packaging data for BI and analytics. Hadoop is in the trough of disillusionment, and this is good. I hope that Hadoop failure stories will be regarded as cautionary lessons about the magic-technology-first approach to building data management solutions. There’s no “magic technology”; even AI is not magic. Every good solution requires a combination of business, information, and technology architectures. Data creation, accumulation and persistence, data integration and augmentation, and data packaging for BI and analytics are distinct phases of data progression, and each requires its own architecture, governance, and technologies.