## Background

In our previous paper, we compared two alternative machine-learning techniques from the Apache Mahout stable: Apache Spark's spark-itemsimilarity and its counterpart on Apache Hadoop MapReduce. We found that Apache Spark performed better both qualitatively and quantitatively, even for moderately sized sites. In this paper, we look at how to further improve the efficiency of these runs without compromising on quality. We examine how the two algorithms we studied last time perform when run on all available data and when run only on success data. In the e-commerce domain, success data is defined as data from users who have bought at least one item.
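The definition of success data can be illustrated with a minimal sketch. The `Event` record, its field names, and the action labels (`"view"`, `"add_to_cart"`, `"buy"`) are assumptions made for illustration; they are not the schema used in our runs.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """One click stream record (hypothetical schema)."""
    user_id: str
    item_id: str
    action: str  # assumed labels: "view", "add_to_cart", "buy"


def success_data(events: Iterable[Event]) -> List[Event]:
    """Keep every event that belongs to a user who bought at least one item."""
    events = list(events)
    # First pass: collect the users with at least one "buy" event.
    buyers = {e.user_id for e in events if e.action == "buy"}
    # Second pass: keep all events (views included) from those users.
    return [e for e in events if e.user_id in buyers]


events = [
    Event("u1", "i1", "view"),
    Event("u1", "i1", "buy"),
    Event("u2", "i1", "view"),
    Event("u2", "i2", "add_to_cart"),
]
filtered = success_data(events)
```

Note that the filter keeps the views of buyers as well as their purchases; only users with no purchase at all are dropped.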

## Data Gathering and setup

Relevant click stream data was collected. It captures user behavior, namely views and buys. Based on this, predictive analytics for item similarity was run using Apache Spark and Apache Hadoop MapReduce Log Likelihood for both cases (i.e. all data and only success data).
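For readers unfamiliar with the Log Likelihood measure named above, the following is a sketch of the standard log-likelihood ratio for a 2x2 co-occurrence table, written from the usual textbook formulation. It is an illustration under that assumption and is not taken from the code we ran; the function names and the sample counts are hypothetical.

```python
import math


def x_log_x(x: int) -> float:
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """LLR score for two items A and B.

    k11: users who interacted with both A and B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against small negative rounding errors.
    return max(0.0, 2.0 * (row + col - mat))


independent = log_likelihood_ratio(10, 10, 10, 10)  # no association
associated = log_likelihood_ratio(10, 0, 0, 10)     # perfect association
```

A score near zero means the two items co-occur about as often as chance would predict; larger scores indicate a stronger association.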

The data set we used contains the following information:

- Total data points (ALL DATA) = 110 million records of click stream data (views, buys, and add to carts)
- Total data points (SUCCESS DATA) = 22 million records of click stream data from users who are categorized as buyers (bought at least 1 item)
- 70 / 30 split between training data and test data (i.e. we split the data set in the first item above in the 70 / 30 ratio; we used 70% of the data to create recommendations and the remaining 30% to test)
- Total buyers (unique people who bought) = 300 K
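The 70 / 30 split can be sketched as follows. How the records are assigned to each side is not described above, so this sketch assumes a seeded random shuffle; the function name `split_70_30` and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(records: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Return (training, test) with 70% and 30% of the records.

    Assumption: a seeded random shuffle decides which side a record lands on.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


train, test = split_70_30(range(100))
```

A time-based or per-user split would be an equally plausible reading; the evaluation logic that follows does not depend on which one is used.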

We believe the above sample is representative of a mid-sized e-commerce company. We ran this sample first on all data and then again on only success data (defined above). We employed two algorithms (i.e. LLR and Spark) to compare the effect of running only with success data as against all data on these two algorithms. The analysis of our runs is described below.

## Quantitative Analysis

We gathered the following data:

- Number of incorrect recommendations (i.e. Number of products we recommended that users did not buy) – False positives
- Number of correct product recommendations (i.e. Number of products that users bought that we recommended) – True positives
- Total recommendations
- Users who bought products that we recommended.
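The four quantities listed above can be computed from two mappings: the items recommended to each user, and the items each user bought in the test set. The sketch below assumes both are available as dictionaries of sets; the function name, the dictionary keys, and the sample data are hypothetical.

```python
from typing import Dict, Set


def evaluate(recommended: Dict[str, Set[str]],
             bought: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count true positives, false positives, total recommendations,
    and users who bought at least one recommended product."""
    true_pos = false_pos = users_with_hit = 0
    for user, recs in recommended.items():
        # Recommended items that the user went on to buy.
        hits = recs & bought.get(user, set())
        true_pos += len(hits)
        false_pos += len(recs) - len(hits)
        if hits:
            users_with_hit += 1
    return {
        "total_recommendations": true_pos + false_pos,
        "true_positives": true_pos,
        "false_positives": false_pos,
        "users_with_hit": users_with_hit,
    }


# Illustrative data only, not results from our runs.
recommended = {"u1": {"a", "b", "c"}, "u2": {"d"}}
bought = {"u1": {"a", "z"}, "u2": {"e"}}
metrics = evaluate(recommended, bought)
```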

## Observations

### Total recommendations

We clearly see that the LLR algorithm on ALL data yields far more recommendations than any other variant. Using only success data drastically reduces the number of recommendations that LLR yields. In the case of Spark, however, using only success data does not drastically reduce the number of recommendations.

### Number of correct product recommendations (True Positives)

We clearly see that the LLR algorithm on SUCCESS data yields the most correct recommendations, followed closely by Spark on SUCCESS data, and then by LLR and Spark on ALL data. Using only success data with LLR drastically improves the quality of the recommendations that the algorithm yields. Even though the LLR algorithm on ALL data yielded the most recommendations in the previous graph, the quality of those recommendations was not good, as this graph shows.

### Number of incorrect product recommendations (False Positives)

As expected, given its low true positive count, LLR on ALL data produces significantly more false positives than the other variants. Thus, although it yields the most recommendations, most of them are useless. We also notice that the false positive rate of LLR on SUCCESS data is higher than that of SPARK, and that SPARK on SUCCESS data has the lowest false positive rate, which is the desired outcome.

### Accuracy / Precision

As seen from the above graph, when the results are taken holistically and the ratio of true positives (useful recommendations) to false positives (useless recommendations) is computed, the SPARK algorithm on SUCCESS data comes out the clear winner.
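The ratio used here and the conventional definition of precision are closely related, as the sketch below shows. The counts are made up for illustration and are not results from our runs; the function names are hypothetical.

```python
def tp_fp_ratio(true_pos: int, false_pos: int) -> float:
    """Ratio of useful to useless recommendations, as described above."""
    return true_pos / false_pos if false_pos else float("inf")


def precision(true_pos: int, false_pos: int) -> float:
    """Conventional precision: share of all recommendations that were correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical counts for two variants, for illustration only.
variant_a = (30, 70)  # (true positives, false positives)
variant_b = (20, 20)

ratio_a, ratio_b = tp_fp_ratio(*variant_a), tp_fp_ratio(*variant_b)
prec_a, prec_b = precision(*variant_a), precision(*variant_b)
```

Because precision equals r / (1 + r) for a ratio r, the two measures always rank variants in the same order. In this made-up example, variant B wins on both even though variant A has more true positives in absolute terms.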

### Inference

Hence we conclude that using only success data, which is only a fifth of the total data, yields better-quality results with both LLR and SPARK. The quality improvement (percentage improvement) from running only on SUCCESS data is significant for LLR, more so than for SPARK. Overall, SPARK behaves consistently irrespective of whether it is run on ALL data or SUCCESS data, with the quality of SPARK on SUCCESS data being marginally better. Since the success data set is significantly smaller, and the time taken to run these algorithms is directly proportional to the size of the data set, we see that running SPARK on SUCCESS data yields the best results.