Spark supports multiple programming languages, of which Scala, Python and Java are the most widely used. But which one is best suited to Spark? Several factors decide this, and we can never name a single best language, because each of them has its own pros and cons. Let's look at them for each language.
Java is one of the most used languages in the industry, so a lot of developers already know it well. It also has plenty of documentation and community support. Java has supported lambda expressions since version 1.8 (Java 8), which was a little late compared to other languages. But Java is verbose because of its boilerplate code, so if we develop Spark applications in Java we end up writing a lot of code. The same logic usually takes much less time to write in Scala or Python. New features also tend to reach Java a little later than other languages.
Scala is a programming language that runs on the JVM, and the Spark framework itself is written in Scala. So it offers the best APIs compared to the other languages, and we get a few performance benefits as well if we develop Spark applications in Scala. For data processing, Scala is generally much faster than Python, and roughly on par with Java since both run on the JVM. Apart from Spark, Scala is also the language behind other frameworks like Akka and Play. Scala works best when we write streaming applications. Coming to the cons, its code can look odd, and sometimes even experienced developers get confused. So working with Scala takes a bit more practice.
Python is a non-JVM programming language that is widely used in data science applications. It has a large developer community and a lot of libraries. The code is easy to write, understand and maintain, and it has good support for lambda expressions as well. But Python can be very slow compared to Java and Scala: because it is a non-JVM language, Spark has to move data between the Python process and the JVM, and those calls are very costly.
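To give a rough feel for the lambda style mentioned above, here is a small word count in plain Python. It is only a sketch and does not use Spark at all: the sample lines, the variable names and the `merge` helper are made up for illustration. PySpark's RDD API accepts the same kind of lambdas in methods such as `flatMap`, `map` and `reduceByKey`, so the shape of the code would be similar there.

```python
from functools import reduce
from itertools import chain

# Hypothetical sample input; in Spark this would come from a distributed dataset.
lines = ["spark runs on the jvm", "python talks to the jvm"]

# flatMap-style step: split every line into words and flatten the result.
words = list(chain.from_iterable(map(lambda line: line.split(), lines)))

# map-style step: pair each word with an initial count of 1.
pairs = list(map(lambda word: (word, 1), words))


def merge(counts, pair):
    """Hypothetical helper: fold one (word, count) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# reduce-style step: add up the counts per word.
word_counts = reduce(merge, pairs, {})

print(word_counts)
```

The whole pipeline fits in a handful of lines, which is the kind of brevity that makes Python (and Scala) attractive compared to Java for this sort of work.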