# RESEARCH HIGHLIGHTS

Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" was featured by HPCwire.

ISTC-CC provides a listing of useful benchmarks for cloud computing.

ISTC-CC also provides a list of open source software releases.

The second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.

# ISTC-CC Abstract

**Exact and Approximate Computation of a Histogram of Pairwise Distances between Astronomical Objects**

*First Workshop on High Performance Computing in Astronomy (AstroHPC'12), June 2012.*

**Bin Fu, Eugene Fink, Garth Gibson and Jaime Carbonell**

Carnegie Mellon University

We compare several alternative approaches to computing correlation functions, a cosmological application used to analyze the distribution of matter in the universe. The computation involves counting the pairs of galaxies that lie within a given distance of each other and building a histogram that shows how the number of pairs depends on the distance.
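The counting step can be illustrated with a minimal sketch of the straightforward approach. It assumes that each object is a point in 3-D Cartesian coordinates and that the histogram uses fixed, caller-supplied bin edges. The function name `pair_distance_histogram` is hypothetical and does not come from the paper. It visits every unordered pair once, which is exactly the quadratic cost the abstract goes on to criticize.

```python
import bisect
import math
from itertools import combinations


# Hypothetical helper (not the authors' code): brute-force pair-distance histogram.
def pair_distance_histogram(points, bin_edges):
    """Count unordered pairs of points whose distance falls in each bin.

    Bin i covers the half-open interval [bin_edges[i], bin_edges[i + 1]);
    pairs whose distance lies outside all bins are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    # Visit every unordered pair exactly once: n(n-1)/2 distance evaluations.
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        i = bisect.bisect_right(bin_edges, d) - 1  # index of the bin containing d
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    galaxies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(pair_distance_histogram(galaxies, [0.0, 1.5, 3.0]))  # [1, 2]
```

When the first bin edge is 0, a running sum of the counts (for example with `itertools.accumulate`) gives the number of pairs within each upper edge. That is the cumulative "pairs within a given distance" form of the same data.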

The straightforward algorithm for counting the exact number of pairs runs in $O(n^{2})$ time, which is unacceptably slow for most astronomical and cosmological datasets, as these include billions of objects. We analyze the performance of several alternative algorithms, including an exact computation with $O(n^{5/3})$ average running time, an approximate computation with linear running time, and an approximate algorithm with sub-linear running time that samples the given dataset and computes the correlation functions for the samples. We compare the accuracy of these algorithms and analyze the tradeoff between their accuracy and running time. We also propose a novel hybrid approximation algorithm that outperforms each of the other techniques.
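The sampling idea can be sketched with a generic uniform-sampling estimator. This is an assumption on our part, not necessarily the exact scheme used in the paper. The sketch draws m of the n points without replacement and counts pairs only within the sample. It then scales each bin by n(n-1)/(m(m-1)), because a given pair survives sampling with probability m(m-1)/(n(n-1)), so the scaled counts are an unbiased estimate of the full histogram. The names `pair_histogram` and `sampled_pair_histogram` and the parameters `m` and `seed` are hypothetical. The kd-tree exact method and the hybrid algorithm are not shown.

```python
import bisect
import math
import random
from itertools import combinations


# Hypothetical helpers (not the authors' code): a generic sampling estimator.
def pair_histogram(points, bin_edges):
    """Exact pair counts per half-open bin [bin_edges[i], bin_edges[i + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        i = bisect.bisect_right(bin_edges, math.dist(p, q)) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def sampled_pair_histogram(points, bin_edges, m, seed=None):
    """Estimate the full-dataset pair histogram from a uniform sample of m points.

    A given pair survives sampling without replacement with probability
    m(m-1) / (n(n-1)), so scaling the sample counts by the inverse of that
    probability gives an unbiased estimate of the full counts.
    """
    n = len(points)
    if m >= n:
        # The "sample" would be the whole dataset: fall back to the exact count.
        return [float(c) for c in pair_histogram(points, bin_edges)]
    if m < 2:
        raise ValueError("need at least two sampled points to form a pair")
    rng = random.Random(seed)
    sample = rng.sample(points, m)  # uniform, without replacement
    scale = (n * (n - 1)) / (m * (m - 1))
    # Cost is O(m^2) distance evaluations instead of O(n^2).
    return [c * scale for c in pair_histogram(sample, bin_edges)]


if __name__ == "__main__":
    gen = random.Random(0)
    galaxies = [(gen.random(), gen.random(), gen.random()) for _ in range(2000)]
    edges = [0.0, 0.25, 0.5, 1.0, 2.0]
    print(sampled_pair_histogram(galaxies, edges, m=200, seed=1))
```

The sample size m controls the accuracy/running-time tradeoff the abstract describes. A larger m reduces the variance of each bin estimate, but the quadratic cost in m grows with it.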

**KEYWORDS**: Approximation, astrophysics, large-scale data, eScience, kd-tree, sampling, Hadoop.

**FULL PAPER: pdf**