
Published on November 9th, 2013 | by Travis Korte


5 Q’s with Graph Database Expert Marko Rodriguez

The Center for Data Innovation spoke with Dr. Marko A. Rodriguez, a partner at graph computing startup Aurelius. Uniquely suited to storing and processing data from social networks, human brain research and other disparate fields, graph databases and graph analytics have enjoyed a swift increase in popularity over the last several years. Dr. Rodriguez spoke about his technologies’ capabilities and why he thinks they will one day lead to big breakthroughs in neuroscience and artificial intelligence.

Travis Korte: Can you first give a brief overview (for an educated non-engineer) of the Aurelius Graph Cluster?

Marko Rodriguez: The Aurelius Graph Cluster is a set of interoperable graph computing technologies that run across a multi-machine compute cluster. Titan is a distributed graph database that has been demonstrated to handle graphs on the order of 100 billion edges and transactions at the rate of 10,000 a second. Faunus is a graph analytics system that leverages Hadoop to do global graph traversals as well as bulk loading/mutating of the graph data contained within Titan. These two technologies currently form the online transaction processing (OLTP) and online analytical processing (OLAP) aspects of the Aurelius Graph Cluster.

TK: What are some use cases for which graph analysis is particularly well suited?

MR: Any time a data set can be represented as discrete “things” (vertices) that can be associated with one another by various types of relationships (edges), a graph database becomes a useful medium for storing that data. Once the data has been stored, the next requirement is querying that data. Typically, queries are represented as traversals whereby a traverser moves from vertex to vertex over the edges that connect them. Graph databases excel in expressing and executing traversals that are recursive (e.g. walking trees) and deep (e.g. exploring long paths across the graph). In layman’s terms, graph databases are well positioned to handle network and hierarchical data.
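The vertex, edge and traversal vocabulary above can be made concrete with a small sketch. This is an illustration only, written in plain Python rather than in any Aurelius technology; the `Graph` class, its method names, and the "manages" hierarchy are all hypothetical. It shows a traverser moving from vertex to vertex over labeled edges, walking a tree to arbitrary depth.

```python
from collections import defaultdict, deque


class Graph:
    """A tiny in-memory graph: vertices joined by labeled, directed edges."""

    def __init__(self):
        # vertex -> list of (edge label, neighboring vertex)
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def out(self, vertex, label):
        """One traversal step: follow outgoing edges with the given label."""
        return [t for (l, t) in self._out.get(vertex, []) if l == label]

    def traverse(self, start, label, max_depth=None):
        """Breadth-first walk from start; returns {vertex: hops from start}."""
        depths = {start: 0}
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            if max_depth is not None and depths[vertex] >= max_depth:
                continue  # do not expand beyond the requested depth
            for neighbor in self.out(vertex, label):
                if neighbor not in depths:  # visit each vertex once
                    depths[neighbor] = depths[vertex] + 1
                    queue.append(neighbor)
        del depths[start]  # report only the vertices that were reached
        return depths


# Hypothetical hierarchy: who manages whom.
org = Graph()
org.add_edge("alice", "manages", "bob")
org.add_edge("bob", "manages", "carol")
org.add_edge("bob", "manages", "dave")
org.add_edge("carol", "manages", "erin")

print(org.traverse("alice", "manages"))               # everyone below alice
print(org.traverse("alice", "manages", max_depth=2))  # at most two hops away
```

The same question asked of a table-oriented store would typically need one self-join per level of the hierarchy, which is why recursive and deep queries are where the traversal model tends to pay off.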

TK: I’ve heard the narrative in the past that graph databases are hard to scale. How do you manage to keep Titan moving quickly with large data volumes?

MR: The narrative that “graph databases are hard to scale” is spread by vendors that designed their graph databases entirely for single-machine, in-memory usage. When the architecture is pigeonholed for single-machine usage, it’s hard to move to multiple machines. If you design the architecture to be distributed from the start, then a graph database can be distributed effectively. Titan leverages [NoSQL database] Cassandra (or HBase) to store its serialized graph on the disks of a multi-machine compute cluster. The BigTable data model of Cassandra/HBase is actually an excellent medium for representing graphs as it forms an adjacency list, where each row is a vertex and the incident edges of the vertex are the columns. In a BigTable system, a row can have an arbitrary number of columns (as a vertex can have an arbitrary number of incident edges). With this disk layout, a vertex’s incident edges are colocated with the vertex (reducing disk seek times). Moreover, with column predicates, edges can be sorted, so faster disk access can be realized (latency drops because less data is fetched). Titan leverages off-heap caching to ensure consistent Java Virtual Machine (JVM) garbage collection behavior, and in turn this allows for high transactional throughput. In the end, Titan is designed from the ground up to support massive-scale graphs being heavily traversed by numerous concurrent threads of execution.
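The adjacency-list layout described above can be sketched in a few lines. This is a simplified in-memory model under stated assumptions, not Titan’s actual storage format: the `AdjacencyRows` class is hypothetical, and using an edge’s label and a timestamp as the column sort key is an assumption made for illustration. Each row holds one vertex, its columns are its incident edges kept in sorted order, and a query for one slice of those edges touches only that contiguous range.

```python
import bisect


class AdjacencyRows:
    """Wide-row model: one row per vertex, one sorted column per incident edge."""

    def __init__(self):
        # row key (vertex id) -> sorted list of column keys (label, time, target)
        self._rows = {}

    def add_edge(self, vertex, label, time, target):
        columns = self._rows.setdefault(vertex, [])
        # Keep columns sorted so edges of one label, ordered by time, sit together.
        bisect.insort(columns, (label, time, target))

    def edges(self, vertex, label, min_time=float("-inf"), max_time=float("inf")):
        """Return (time, target) for edges with min_time <= time < max_time."""
        columns = self._rows.get(vertex, [])
        # Two binary searches bound the slice; nothing outside it is read.
        lo = bisect.bisect_left(columns, (label, min_time))
        hi = bisect.bisect_left(columns, (label, max_time))
        return [(time, target) for (_, time, target) in columns[lo:hi]]


# Hypothetical data: the edges of vertex "v1".
store = AdjacencyRows()
store.add_edge("v1", "knows", 2009, "v2")
store.add_edge("v1", "knows", 2012, "v3")
store.add_edge("v1", "likes", 2011, "v4")
store.add_edge("v1", "knows", 2011, "v5")

print(store.edges("v1", "knows"))                 # all "knows" edges, by time
print(store.edges("v1", "knows", min_time=2011))  # only the recent ones
```

In a real BigTable-style system the sorted columns live on disk, so the same idea turns a filter over every edge of a heavily connected vertex into a read of one contiguous range.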

TK: The basic idea of graph databases has been around for a while. Why do you think they haven’t caught on to a greater degree? What’s different now?

MR: The concept of graphs/networks has been around for centuries in academia. Only now are people starting to realize that many data sets can be naturally stored as a graph and problems can be naturally solved using graph traversals. With the rise of social media sites such as Facebook, Twitter, LinkedIn, etc., the popular zeitgeist realizes that “the world” is best represented as a graph. This is contrary to the common thought of the early days of databases when the world was seen in terms of tables (spreadsheets, ledgers, etc.). While the graph perspective continues to grow, a graph database will be a tool of choice — but like all things tested against time, world views are fleeting.

TK: What do you think the future holds for graph database technology?

MR: Looking ahead, I believe that as neuroscience advances far enough to explain, computationally, the means by which data is stored and processed in the human brain, neural-inspired data structures and algorithms will be applied to problems using a graph database. Massive graphs stored in compute clusters executing neural algorithms will provide us with novel, artificially intelligent technologies: automated categorization/classification, associative memory, and input/output “behavioral” pathways. The result will be the realization of correlations across space and time that currently take centuries of human-based scientific investigation to grasp.



About the Author

Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.


