Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" featured by HPCwire.
Another list highlighting Open Source Software Releases.
Second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.
Open Source Software Releases
|IterStore is a parameter server library for iterative convergent machine learning applications. Our example applications include matrix factorization, LDA, multi-class logistic regression, and PageRank.|
|GeePS is a parameter server library that scales single-machine GPU machine learning applications (such as Caffe) to a cluster of machines.|
|Elijah-related open source software comes in two parts. The first part is a set of cloudlet-specific extensions to OpenStack. By applying these extensions, OpenStack becomes "OpenStack++". These extensions are released under the same open source license as OpenStack itself (Apache v2). The second part is a set of new mobile computing applications that build upon OpenStack++ and leverage its support for cloudlets. All Elijah-related source code is on GitHub. The links below help you access the various components.|
|OpenFace is a Python and Torch implementation of face recognition with deep neural networks and is based on the CVPR 2015 paper FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google. Torch allows the network to be executed on a CPU or with CUDA.
Crafted by Brandon Amos, Bartosz Ludwiczuk, and Mahadev Satyanarayanan.|
GraphLab is a graph-based, high-performance, distributed computation framework written in C++. While GraphLab was originally developed for machine learning tasks, it has found great success at a broad range of other data-mining tasks, outperforming other abstractions by orders of magnitude. GraphLab features:
|Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes.|
|SHAPE (Semantic HAsh Partitioning-Enabled distributed RDF data management system) is a distributed RDF data management system based on semantic hash partitioning. SHAPE, fully written in Java, provides distributed implementations of semantic hash partitioning and query processing, on top of Hadoop.|
Apache Spark is an open source cluster computing system that aims to make data analytics fast, both to run and to write.
Spark was initially developed for two applications where placing data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, it can be used for general data processing too.
|The Tashi project aims to build a software infrastructure for cloud computing on massive internet-scale datasets (what we call Big Data). The idea is to build a cluster management system that enables the Big Data that are stored in a cluster/data center to be accessed, shared, manipulated, and computed on by remote users in a convenient, efficient, and safe manner.|
Additional Software Releases
GeePS: GeePS is a parameter server library that scales single-machine GPU machine learning applications (such as Caffe) to a cluster of machines.
Spectroscope: Spectroscope is an implementation of request-flow comparison, a technique for diagnosing performance changes in distributed systems. Please see the NSDI 2011 paper "Diagnosing performance changes by comparing request flows" for more information.
Optimized rank & select code for building succinct data structures (more a research demo than a production-ready release):
Iulian Moraru's tools for building persistent storage on NVRAM:
Egalitarian Paxos and comparison framework:
Wyatt Lloyd's Eiger:
A research fork of Cassandra that provides causal+ consistency, read-only transactions, and write-only transactions across all the servers in each datacenter.
The Parrot stable and deterministic multi-threading system.
dbug: Systematic Testing of Distributed and Multi-Threaded Systems
TableFS: Enhancing Metadata Efficiency in the Local File System
A number of benchmarks have also been shared as open source and are listed on our Benchmarks Page.