In R, we often get stuck by the limited processing power of our machines. Parallel processing can ease this considerably. Several R packages enable parallel processing, but here I will use only the `parallel` package, which ships with base R.

**Example:** Here I will walk through a simple use of the `parallel` package. Suppose I have a data frame with 1000 rows and two columns. I want to compute the column sums of each block of 100 consecutive rows, i.e., the sums over rows c(1:100), c(101:200), …, c(901:1000), in parallel rather than one block after another.
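For reference, the serial version of this task is a single `lapply()` call. This is a minimal sketch to make the target computation concrete; the variable names (`indices`, `serial_result`) are my own:

```r
# Serial baseline: column sums of each block of 100 consecutive rows
set.seed(100688)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Starting index of each block: 1, 101, 201, ..., 901
indices <- seq(from = 1, to = 1000, by = 100)

# One 2-element vector (sum of x, sum of y) per block, computed in sequence
serial_result <- lapply(indices, function(ind) colSums(df[ind:(ind + 99), ]))
```

The parallel version below distributes exactly these ten block computations across worker processes.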

```r
library(parallel)

# Create a dummy data frame with 1000 rows and two columns
set.seed(100688)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Computes the column sums of the 100 rows starting at index ind.
# Note the parentheses: ind:(ind + 99). Writing ind:ind+99 would be
# parsed as (ind:ind) + 99 and select a single wrong row.
sumfunc <- function(ind) {
  colSums(df[ind:(ind + 99), ])
}

no_cores <- detectCores() - 1   # leave one core free for the OS
cl <- makeCluster(no_cores)     # make cluster

# Generate the starting index of each block of 100 rows
indices <- seq(from = 1, to = 1000, by = 100)

clusterExport(cl, 'df')         # copy df to the worker processes

start_time <- Sys.time()        # start time of parallel computation
parallel_result <- parLapply(cl, indices, sumfunc)
total_time <- Sys.time() - start_time
cat('Total parallel time taken:', total_time, '\n')

stopCluster(cl)                 # always release the workers when done

# More than one object can be exported by passing a list of names:
# clusterExport(cl, list('xx', 'yy', 'zz'))
```
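On Unix-like systems (Linux, macOS), the same computation can be written more compactly with `mclapply()`, which forks the current R session instead of launching socket workers, so the workers see `df` directly and no `clusterExport()` is needed. This is a minimal sketch of that alternative, not part of the code above:

```r
library(parallel)

set.seed(100688)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))
indices <- seq(from = 1, to = 1000, by = 100)

# Forking is not available on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else max(1L, detectCores() - 1L)

result <- mclapply(indices,
                   function(ind) colSums(df[ind:(ind + 99), ]),
                   mc.cores = n_cores)
```

For a tiny job like this, the overhead of starting workers can easily outweigh the speedup; the pattern pays off when each block's computation is expensive.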

**Other Related Blogs:**

- How-to go parallel in R – basics + tips
- A brief foray into parallel processing with R
- Parallel computing in R on Windows and Linux using doSNOW and foreach
