Databricks Sets New World Record for CloudSort Benchmark Using Apache Spark at $1.44 Per Terabyte

In Collaboration with Industry Partners, Databricks Earns Second World Record in Two Years, Reducing Data Processing Costs In the Cloud per Terabyte by 68 Percent

SAN FRANCISCO, CA -- (Marketwired) -- 11/15/16 -- Databricks®, the company founded by the team that created the popular Apache® Spark™ project, announced today that, in collaboration with industry partners, it has broken the world record in the CloudSort Benchmark, a third-party industry benchmarking competition for processing large datasets.

Utilizing Apache Spark and working in close collaboration with its partners to form the team NADSort, Databricks architected an efficient cloud platform for data processing. The platform sorted 100 terabytes (TB) of data using cloud computing resources that cost a total of USD $144, or $1.44 per TB, in both the Daytona and Indy CloudSort competitions. This beat the previous record of $4.51 per TB, held by the University of California, San Diego, for a savings of 68 percent.
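The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and variable names are illustrative and not part of the benchmark entry, and the only inputs are the dollar amounts and data size stated in this release.

```python
def cost_per_tb(total_cost_usd: float, terabytes: float) -> float:
    """Cost in USD to sort one terabyte."""
    return total_cost_usd / terabytes


def percent_savings(old_rate: float, new_rate: float) -> float:
    """Relative reduction from old_rate to new_rate, in percent."""
    return (old_rate - new_rate) / old_rate * 100


# Figures taken from the release: $144 total for 100 TB,
# versus the previous record of $4.51 per TB.
new_rate = cost_per_tb(144.0, 100.0)
savings = percent_savings(4.51, new_rate)
ratio = 4.51 / new_rate

print(f"${new_rate:.2f} per TB, {savings:.0f}% cheaper, {ratio:.2f}x the efficiency")
# prints "$1.44 per TB, 68% cheaper, 3.13x the efficiency"
```

A cost ratio of roughly 3.13 is consistent with the statement that efficiency has about tripled.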

The objective of the entry is to demonstrate the lowest cost per terabyte at public cloud pricing, reducing the total cost of ownership of the cloud architecture (a combination of software stack, hardware stack, and tuning) and encouraging organizations to deploy big data applications on the public cloud. In 2014, Databricks set a sorting record by sorting 100 TB of data, or 1 trillion records, in 23 minutes, which was 30 times more efficient per node than the previous record held by Apache Hadoop. The sorting program, based on the 2014 record-setting entry and updated for better efficiency in the cloud, ran on 394 ECS.n1.large nodes on the Alibaba Cloud, each equipped with an Intel Haswell E5-2680 v3 processor, 8 GB of memory, and 4x135 GB SSD Cloud Disk.
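To put the cluster description in perspective, the per-node figures can be scaled up to the whole cluster. This is a rough sketch that assumes decimal units (1 TB = 1,000 GB = 10^12 bytes) and uses only the numbers stated above; the variable names are illustrative.

```python
NODES = 394
MEMORY_GB_PER_NODE = 8
DISKS_PER_NODE = 4
DISK_GB_EACH = 135
DATA_TB = 100
RECORDS = 10**12  # 1 trillion records

# Aggregate resources across the cluster (decimal units assumed).
total_memory_gb = NODES * MEMORY_GB_PER_NODE
total_disk_gb = NODES * DISKS_PER_NODE * DISK_GB_EACH

# Size of one record and share of the input handled by each node.
bytes_per_record = DATA_TB * 10**12 // RECORDS
data_gb_per_node = DATA_TB * 1000 / NODES

print(f"memory: {total_memory_gb} GB, SSD: {total_disk_gb} GB")
print(f"{bytes_per_record} bytes per record, {data_gb_per_node:.1f} GB of input per node")
```

Under these assumptions the cluster holds about 3.2 TB of memory in total, far less than the 100 TB input, so most of the data has to pass through the SSDs rather than being sorted entirely in memory.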

"Databricks reduced the per terabyte cost from 4.51 dollars, the previous world record held by University of California, San Diego in 2014, to 1.44 dollars, meaning our optimizations and advances in cloud computing have tripled the efficiency of data processing in the cloud," said Databricks Chief Architect and leader of the CloudSort Benchmark project, Reynold Xin. "With these innovations, to process the same amount of data in 2016 in the cloud costs one third of the price in 2014!"





Three important factors made this CloudSort cost efficiency possible, according to Reynold Xin in his blog:

1. Increased competition among major cloud providers has lowered the cost of resources, making it economically feasible and scalable to deploy applications in the cloud;
2. Continued innovations in Apache Spark have greatly improved all aspects of the Spark stack;
3. In-house expertise in Spark, combined with deep experience operating and tuning cloud-native data architecture across tens of thousands of clusters for customers, has led to incremental efficiency gains and the most efficient cloud architecture for data processing.

"The achievements of two world records in two years leave us humbled, yet they validate the technology trends we've invested in heavily," said Databricks CEO, Ali Ghodsi. "First, we believe open source software is the future of software evolution, and Apache Spark is the most efficient engine for data processing. And second, cloud computing is becoming the most cost-efficient, effective, and scalable architecture to deploy big data applications."

About Databricks:

Databricks' vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team that created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project. The company has also trained over 20,000 users on Apache Spark and has the largest number of customers deploying Spark to date. Databricks provides a just-in-time data platform to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA.

© Databricks 2016. All rights reserved. Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.



Suzanne Block for Databricks
P: 617-824-0981

