AQUAgpusph 4.1.2
Loading...
Searching...
No Matches
Functions
RadixSort.cl.in File Reference

Radix sort OpenCL methods. (See Aqua::CalcServer::RadixSort for details) More...

Functions

__kernel void init (__global unsigned int *perms, unsigned int n)
 
__kernel void histogram (const __global unsigned int *keys, __global unsigned int *histograms, const unsigned int pass, __local unsigned int *loc_histo, const unsigned int n)
 
__kernel void scan (__global unsigned int *histograms, __local unsigned int *temp, __global unsigned int *globsum)
 
__kernel void paste (__global unsigned int *histograms, __global unsigned int *globsum)
 
__kernel void sort (const __global unsigned int *in_keys, __global unsigned int *out_keys, __global unsigned int *histograms, const unsigned int pass, __global unsigned int *in_permut, __global unsigned int *out_permut, __local unsigned int *loc_histo, const unsigned int n)
 
__kernel void inversePermutation (__global unsigned int *perms, __global unsigned int *inv_perms, unsigned int n)
 

Detailed Description

Radix sort OpenCL methods. (See Aqua::CalcServer::RadixSort for details)

Note
The header CalcServer/RadixSort.hcl.in is automatically appended.

Function Documentation

◆ histogram()

__kernel void histogram ( const __global unsigned int *  keys,
__global unsigned int *  histograms,
const unsigned int  pass,
__local unsigned int *  loc_histo,
const unsigned int  n 
)

Perform the local histograms. The histograms are the number of occurrences of each radix. Since we are working in parallel, the number of ocurrences of each radix will be splited in blocks of dimension _ITEMS * _GROUPS: | it(0)gr(0)ra(0) | it(1)gr(0)ra(0) | ... | it(items)gr(0)ra(0) | | it(0)gr(1)ra(0) | it(1)gr(1)ra(0) | ... | it(items)gr(1)ra(0) | | ... | ... | ... | ... | | it(0)gr(groups)ra(0) | it(1)gr(groups)ra(0) | ... | it(items)gr(groups)ra(0) | | it(0)gr(0)ra(1) | it(1)gr(0)ra(1) | ... | it(items)gr(0)ra(1) | | ... | ... | ... | ... | | it(0)gr(groups)ra(1) | it(1)gr(groups)ra(1) | ... | it(items)gr(groups)ra(1) | | ... | ... | ... | ... | | it(0)gr(groups)ra(radix) | it(1)gr(groups)ra(radix) | ... | it(items)gr(groups)ra(radix) | where it is the thread, gr is the group, and ra is the radix.

Parameters
keysInput unsorted keys allocated into the device.
histogramsComputed histograms.
passPass of the radix decomposition.
loc_histoHistograms local memory to speed up the process.
nNumber of keys.

◆ init()

__kernel void init ( __global unsigned int *  perms,
unsigned int  n 
)

Initializes the permutations assuming that the keys are sorted.

Parameters
permsInitial permutations (null)
nNumber of keys.

◆ inversePermutation()

__kernel void inversePermutation ( __global unsigned int *  perms,
__global unsigned int *  inv_perms,
unsigned int  n 
)

Compute the inversed permutations, which allows to know the original position of a key from the sorted one.

Parameters
permsDirect permutations (from the unsorted position to the sorted one)
inv_permsInverse permutations (from the sorted position to the unsorted one)
nNumber of keys.

◆ paste()

__kernel void paste ( __global unsigned int *  histograms,
__global unsigned int *  globsum 
)

Paste the _HISTOSPLIT accumulated global sums into the splited accumulated histogram, to get a global accumulated histogram.

Parameters
histogramsAccumulated histogram. At the start of the method the first value of each split of the histogram is zero. When the method ends the first value of each split will be the accumulated number of keys in all the previous splits.
globsumAccumulated global sums (_HISTOSPLIT components).

◆ scan()

__kernel void scan ( __global unsigned int *  histograms,
__local unsigned int *  temp,
__global unsigned int *  globsum 
)

perform a parallel prefix sum (a scan) on the local histograms (see Blelloch 1990), retrieving the accumulated histogram. Each workitem cares about two memories. See also http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html

Parameters
histogramsHistograms, or number of ocurrences of each radix, divided by blocks, as shown in histogram() method.
tempLocal memory used to speed up the process.
globsumTotal number of keys at each group (output).
Note
The histogram histograms will be transformed in the accumulated histogram.
Remarks
This method is called two times:
  1. The first time _HISTOSPLIT global sums are computed, as well as _RADIX*_GROUPS*_ITEMS/_HISTOSPLIT accumulated histograms.
  2. The second time the previously computed global sums are transformed in a accumulated histogram.

◆ sort()

__kernel void sort ( const __global unsigned int *  in_keys,
__global unsigned int *  out_keys,
__global unsigned int *  histograms,
const unsigned int  pass,
__global unsigned int *  in_permut,
__global unsigned int *  out_permut,
__local unsigned int *  loc_histo,
const unsigned int  n 
)

Perform permutations using the accumulated histogram.

Parameters
in_keysInput unsorted keys.
out_keysOutput sorted keys.
histogramsAccumulated histograms.
passPass of the radix decomposition.
in_permutInput permutations.
out_permutOutput permutations.
loc_histoHistogram local memory to speed up the process.
nNumber of keys.
Warning
Radix sort needs several pass, so the output sorted keys of this pass must be the input unsorted keys of the next pass. Don't forgive to swap the OpenCL identifiers (for keys and permutations).