A new high-performance programming language for data manipulation and analysis promises to help developers meet the increasing demands for scalability in a “big data” environment. Julia, which was created at MIT and first released as open source in February 2012, represents an effort to overcome some of the shortcomings of popular numerical and statistical programming languages, such as R and Matlab. These languages, though widely used in data science applications, begin to exhibit poor performance as the size of a data set increases; in addition, Matlab is proprietary and costly, particularly for industry applications.
With a faster numerical computing language, the physicists using Matlab to model large-scale atmospheric phenomena or the economists using R to conduct global econometric analysis may one day be able to do their jobs more quickly, and at larger scales, than previously possible.
Julia core developer John Myles White spoke earlier this month at the Statistical Programming DC Meetup group to explain how the language outperforms these established players and what the future holds for the young project.
The central conundrum Julia’s developers seek to address is that easy-to-use languages are not necessarily computationally efficient, and vice versa. For a variety of technical reasons, the most popular languages for data science applications carry inefficiencies that make them fundamentally unsuitable for very large-scale data analysis. White quipped, “Computers today are fast. But R sometimes makes it easy to forget that.” Even as processor speeds increase, these limitations will remain. There have been major efforts to retrofit these languages with “big data” capabilities, such as R’s bigvis and pbdR packages and Matlab’s Parallel Computing Toolbox, but because these additions are not native to the languages, a language built with scalability in mind ought to be able to do better.
But faster languages, such as the venerable C and FORTRAN, are not particularly well-suited to these applications either, not least because they are difficult to learn and hard to read. While some prominent developers argue that C remains useful in general-purpose development, White made the case for an easier-to-understand high-performance language, saying, “C is one of the greatest languages ever written, but it’s a pain to have to write a lot of C code.”
Enter Julia. The language’s developers set out to create a language that offers “the speed of C…[but is] as usable for general programming as Python, as easy for statistics as R, [and] as powerful for linear algebra as Matlab.”
How does it work?
One of the reasons Julia can offer such efficiency is that it is built from the ground up; most of Julia is actually written in Julia, meaning that most of its more elaborate functions are crafted out of simpler ones in the same language, without reliance on external code. This stands in stark contrast to languages like R, about which White commented, “All good R code is actually written in C.”
But Julia also keeps things simple for users. It reduces the number of specialized functions a user has to remember by providing general functions that have been preconfigured to run differently for different inputs, an approach known as multiple dispatch. Even simple operations such as multiplication are optimized to execute differently depending on whether the input is, for example, an integer, a real number or a complex number. Julia’s developers have designed the language’s functions to automatically carry out computations that are optimized for different data types, even when users do not spell out the types involved.
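The idea of one general function that runs differently for different inputs can be sketched in Python, using the standard library's functools.singledispatch. This is only a rough analogy and not Julia code: singledispatch selects an implementation from the type of the first argument alone, whereas Julia considers the types of all arguments and compiles specialized machine code for each combination. The function name square_by_type and the labels it returns are hypothetical, chosen purely for illustration.

```python
from functools import singledispatch
from numbers import Complex, Integral, Real


@singledispatch
def square_by_type(x):
    """Fallback used when no implementation is registered for type(x)."""
    raise TypeError(f"unsupported type: {type(x).__name__}")


@square_by_type.register(Integral)
def _(x):
    # Chosen for int (and other Integral types): exact integer arithmetic.
    return ("integer", x * x)


@square_by_type.register(Real)
def _(x):
    # Chosen for float: floating-point arithmetic.
    return ("real", x * x)


@square_by_type.register(Complex)
def _(x):
    # Chosen for complex: complex multiplication.
    return ("complex", x * x)


if __name__ == "__main__":
    # The caller uses one function name; the most specific match is selected.
    for value in (3, 1.5, 1j):
        print(value, "->", square_by_type(value))
```

The caller only has to remember a single name, and the most specific registered implementation is picked for each input. In Julia the same pattern is built into the language itself, which is what allows ordinary operators such as multiplication to be specialized per type.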
What does the future hold?
The language seems promising for a wide range of data science applications, and it will likely become even more useful as its user community grows and helps build in additional functionality. Some skeptics remain, such as Python developer and numerical computing authority Wes McKinney, who argued in 2012 that Julia will have a hard time matching the value added by the vibrant communities supporting R and Python. On the other hand, those languages are both over two decades old, and Julia may simply need some time to catch up.
A much more extensive overview of Julia’s capabilities can be found in the core developers’ summary paper.