<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!--#include file="ssi/head.ssi"-->
</head>
<body>
<!--#set var="page" value="academic"-->
<!--#set var="subpage" value="research"-->
<!--#include file="ssi/body.ssi"-->
<h2>Non-Uniform Parallelism on a GPGPU</h2>
<p class="authorsLine">Benjamin Lerner, Trevor Jim and
Yitzhak Mandelbaum</p>
<h3>Downloads</h3>
<ul>
<li><a href="files/njpls.pdf">Experiences
coding Non-Uniform Parallelism using the CUDA GPGPU
Architecture</a> (presented at
<a href="http://domino.research.ibm.com/comm/research_projects.nsf/pages/plday.plday2008.html"
title="NJPLS 2008">NJPLS</a>, August 2008)</li>
</ul>
<h3>Overview</h3>
<p>It is well known that certain kinds of tasks are better
suited to parallel architectures than others. Heavily
imperative, sequential code does not parallelize well at
all, while data-parallel code runs exceptionally well.
The gray area in between, non-uniform parallelism, is
more difficult to map onto parallel architectures.</p>
<p>Recent years have seen increasing interest in
<i>General Purpose GPU</i> (GPGPU) computation, where
graphics cards are used as coprocessors to speed up
many kinds of software problems. In this work, we
examined how well non-uniform parallel problems fit onto
a GPU architecture, focusing specifically
on <i>parsing</i>, a well-studied problem with known
opportunities for parallelism. The goal was to see how
much of a speedup the GPGPU provided, and how
difficult it was to achieve that improvement.</p>
<p>Over the course of a ten-week internship at AT&amp;T
Research, I implemented the
<a href="http://en.wikipedia.org/wiki/Earley_algorithm"
title="Earley parsing algorithm">Earley parsing
algorithm</a> with several different optimizations for the
GPGPU. Ultimately, we failed to achieve any speedup (by
the end of the summer, we had reached amortized parity
with the simple CPU algorithm), but along the way we learned
several programming idioms for the GPGPU style of coding,
along with the difficulties it poses.</p>
<!--#include file="ssi/footer.ssi"-->
</div>
</body>
</html>