Saturday, 27 August 2011

HPPC (High Performance Primitive Collections)

HPPC API overview

While HPPC is not strictly modeled after Java Collections Framework (JCF), we did try to make the APIs look similar enough for comfortable use. One particular JCF feature not found in HPPC is collection views, such as sub-lists. These can be easily implemented as decorators over HPPC collections if needed. On the other hand, HPPC collections offer a number of additional performance-oriented methods not found in JCF, please see code samples for an overview.
HPPC collections come in two flavours: for primitives and with generics. Technically, the primitive collections are automatically generated based on the generic ones. While HPPC is mainly about primitives, we also distribute the generic-based collections, so that the benefits of direct data store access are also available for collections of non-primitives.
Relationships between HPPC and Java Collections.
Java Collections HPPC (primitives) HPPC (generics)
bit sets BitSet BitSet n/a
array-backed lists ArrayList, Vector [type]ArrayList ObjectArrayList<T>
stacks Stack [type]Stack ObjectStack<T>
deques ArrayDeque [type]ArrayDeque ObjectArrayDeque<T>
hash maps HashMap [keyType][valueType]OpenHashMap ObjectObjectOpenHashMap<K, V>
[keyType]ObjectOpenHashMap<V>
Object[valueType]OpenHashMap<K>
The method-level API of the corresponding types is also similar to Java Collections, but distinct differences exist. Please consult the JavaDoc of a specific class for details.

Example code

Below are some code snippets for the most common programming tasks in which HPPC is likely to be helpful. Certain goals can be achieved in a number of different ways resulting in different performance and code maintainability.
While we try to indicate which variant is likely to be the fastest, your results may vary based on many different factors, including: runtime platform, JVM, structure of your code. Therefore, micro-benchmark your actual application on the target JVM, operating system and hardware to get accurate performance predictions        .

Iterating

There are three common patterns for iterating over all HPPC collections: using an iterator, with closure-like procedures and by accessing the internal storage buffers. The latter will most likely be the fastest, at the price of code elegance and dependency on HPPC internals.

Iterating over lists

Below are the three common and one list-specific iteration patterns. We present them in the order of performance, from the slowest (but most elegant one) to the fastest one.
  • Using an iterator slowest
    // Prepare some list to iterate over
    final IntArrayList list = prepare(10);
    // Lists implement the Iterable interface that returns [type]Cursor elements.
    // The cursor contains the index and value of the current element.
    for (IntCursor c : list)
    {
        System.out.println(c.index + ": " + c.value);
    }
  • Using direct get
    final IntArrayList list = prepare(10);
    // Another way to iterate over array list is to access each element
    // of the list using the get() method.
    final int size = list.size();
    for (int i = 0; i < size; i++)
    {
        System.out.println(i + ": " + list.get(i));
    }
  • Using closure-like procedures
    final IntArrayList list = prepare(10);
    // Lists also support iteration through [type]Procedure interfaces.
    // The apply() method will be called once for each element in the list.
    list.forEach(new IntProcedure()
    {
        public void apply(int value)
        {
            System.out.println(value);
        }
    });
  • Using direct buffer access fastest
    final IntArrayList list = prepare(10);
    // For the fastest iteration, you can access the lists' data buffer directly.
    final int [] buffer = list.buffer;
    // Make sure you take the list.size() and not the length of the data buffer.
    final int size = list.size();
    // Iterate of the the array as usual.
    for (int i = 0; i < size; i++)
    {
        System.out.println(i + ": " + buffer[i]);
    }

Iterating over deques

Iterating over deques (double ended queues) is similar to iterating over lists with an addition of last-to-first iteration.
  • Using an iterator slowest
    // Prepare some deque to iterate over
    final IntArrayDeque deque = prepare(10);
    // Deques implement the Iterable interface that returns [type]Cursor elements.
    // The cursor contains the index and value of the current element.
    // Please note that the for-each loop as below will always
    // iterate from the deque's head to the deque's tail.
    for (IntCursor c : deque)
    {
        System.out.println(c.index + ": " + c.value);
    }
  • Using closure-like procedures
    final IntArrayDeque deque = prepare(10);
    // Deques also support iteration through [type]Procedure interfaces.
    // The apply() method will be called once for each element in the deque.
    // Iteration from head to tail
    deque.forEach(new IntProcedure()
    {
        public void apply(int value)
        {
            System.out.println(value);
        }
    });
  • Using direct buffer access fastest
    final IntArrayDeque deque = prepare(10);
    // For the fastest iteration, you can access the deque's data buffer directly.
    // Note that this time it's a little more complicated than with array lists.
    final int [] buffer = deque.buffer;
    final int bufferSize = buffer.length;
    // Iteration from head to tail
    final int forwardStart = deque.head;
    final int forwardStop = forwardStart + deque.size();
    for (int i = forwardStart; i < forwardStop; i++)
    {
        System.out.println(buffer[i % bufferSize]);
    }
    // Iteration from tail to head
    final int backwardStart = deque.tail + bufferSize - 1;
    final int backwardStop = backwardStart - deque.size();
    for (int i = backwardStart; i > backwardStop; i--)
    {
        System.out.println(buffer[i % bufferSize]);
    }

Iterating over sets

Iterating over sets is in principle the same as for lists, the only major difference is iterating using direct buffer access shown below.
  • Using direct buffer access fastest
    final IntOpenHashSet set = prepare(10);
    // For the fastest iteration, you can access the sets's data buffers directly.
    final int [] keys = set.keys;
    final boolean [] allocated = set.allocated;
    // Note that the loop is bounded by states.length, not keys.length. This
    // can make the code faster due to range check elimination
    for (int i = 0; i < allocated.length; i++)
    {
        if (allocated[i]) {
            System.out.println(keys[i]);
        }
    }

Iterating over maps

Iterating over maps is similar to lists, the only difference is that the cursors are now [keyType][valueType]Cursor and procedures are [keyType][valueType]Procedure.
  • Using an iterator slowest
    // Prepare some set to iterate over
    final IntCharOpenHashMap map = prepare(10);
    // Maps implement the Iterable interface that returns [keyType][valueType]Cursors
    // The cursor contains the key, value and internal index of the current element.
    for (IntCharCursor c : map)
    {
        System.out.println(c.index + ": " + c.key + " -> " + c.value);
    }
  • Using closure-like procedures
    final IntCharOpenHashMap map = prepare(10);
    // Maps also support iteration through [keyType][valueType]Procedure interfaces.
    // The apply() method will be called once for each key/value pair in the map.
    // Iteration from head to tail
    map.forEach(new IntCharProcedure()
    {
        public void apply(int key, char value)
        {
            System.out.println(key + " -> " + value);
        }
    });
  • Using direct buffer access fastest
    final IntCharOpenHashMap map = prepare(10);
    // For the fastest iteration, you can access the sets's data buffers directly.
    final int [] keys = map.keys;
    final char [] values = map.values;
    final boolean [] states = map.allocated;
    // Note that the loop is bounded by states.length, not keys.length. This
    // can make the code faster due to range check elimination
    for (int i = 0; i < states.length; i++)
    {
        if (states[i]) {
            System.out.println(keys[i] + " -> " + values[i]);
        }
    }

Collecting values from closures

Pseudo-closures (anonymous class instances) are a convenient way to iterate over all values of a given container (list, set or map). Java imposes several restrictions on passing values to and from such anonymous types (in general, only constant values can be passed to the anonymous class's object via synthetic methods or constructors). Starting from version 0.3.2 of HPPC, forEach method will return the argument type it receives, which makes it possible to access any methods or fields of that type (even if it is anonymous). An example of applying this feature is shown below to count map items with value larger than 10.
  • Accessing anonymous class's fields in forEach calls.
    // Create a map.
    final IntIntOpenHashMap map = new IntIntOpenHashMap();
    map.put(1, 2);
    map.put(2, 5);
    map.put(3, 10);
    int count = map.forEach(new IntIntProcedure()
    {
       int count;
       public void apply(int key, int value)
       {
           if (value >= 5) count++;
       }
    }).count;
    System.out.println("There are " + count + " values >= 5");

Counting character bigram occurrences

A bigram is a pair of characters. Counting bigram occurrences in strings is a very common text processing task. It can be easily accomplished with an int to int map:
// Some character data
final char [] CHARS = DATA;
/*
 * We'll use a int -> int map for counting. A bigram can be encoded
 * as an int by shifting one of the bigram's characters by 16 bits
 * and then ORing the other character to form a 32-bit int.
 */
final IntIntOpenHashMap counts = new IntIntOpenHashMap(
    IntIntOpenHashMap.DEFAULT_CAPACITY,
    IntIntOpenHashMap.DEFAULT_LOAD_FACTOR);
for (int i = 0; i < CHARS.length - 1; i++)
{
    counts.putOrAdd((CHARS[i] << 16 | CHARS[i+1]), 1, 1);
}


kentucky wildcats football tickets
iowa state cyclones football tickets
kansas jayhawks football tickets

No comments:

Post a Comment