Apache Spark is a distributed analytics engine for large-scale data processing, used by data scientists, business analysts, and application developers. Spark runs on Hadoop (YARN), Mesos, Kubernetes, in standalone mode, or in the cloud. It can access diverse data sources, including the Hadoop Distributed File System (HDFS), Cassandra File System (CFS), HBase (the Hadoop database), and Amazon Simple Storage Service (S3).
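
As a rough illustration of what choosing a cluster manager looks like in practice, the sketch below builds the master URL string that Spark expects for each deployment option. The helper `master_url` and the host names in the usage note are hypothetical and not part of Spark. The URL forms and the default ports (7077 for standalone, 5050 for Mesos) follow the Spark documentation as far as I know, so verify them against your own cluster configuration.

```python
from typing import Optional

# Default ports documented for the standalone and Mesos masters.
_DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: Optional[str] = None, port: Optional[int] = None) -> str:
    """Build a Spark master URL for a cluster manager (hypothetical helper)."""
    if manager == "local":
        return "local[*]"  # run in-process with one worker thread per core
    if manager == "yarn":
        return "yarn"  # cluster location is read from the Hadoop configuration
    if host is None:
        raise ValueError(f"{manager} requires a host")
    if manager in _DEFAULT_PORTS:
        scheme = "spark" if manager == "standalone" else "mesos"
        return f"{scheme}://{host}:{port or _DEFAULT_PORTS[manager]}"
    if manager == "kubernetes":
        # No default is assumed here; pass the API server port explicitly.
        if port is None:
            raise ValueError("kubernetes requires an explicit API server port")
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unknown cluster manager: {manager}")
```

The resulting string is what you would pass to `spark-submit --master` or to `SparkSession.builder.master(...)`. For example, `master_url("standalone", "spark-master.example.com")` yields a `spark://` URL on the default port, while `master_url("yarn")` needs no host at all.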
