Interspire Shopping Cart Blog - http://www.thestoresuiteblog.com
The Theory Behind Real-Time Recommendations
http://www.thestoresuiteblog.com/articles/13/1/The-Theory-Behind-Real-Time-Recommendations/Page1.html
By Chris Smith
Published on 08/5/2007
 
Mitch has been working on adding a 'Customers Also Bought' feature today and needed some help with the database query to get the right info back out. This feature is much the same as the one on amazon - where you view a particular product and it shows a list of other products that you may be interested in...

The Theory Behind Real-Time Recommendations
Working out what customers also bought was a little tricky. We are working off products that others have ordered - this data is already kept so we don't need any new database tables. First of all, the basics of the database schema we have to work with.

The 'products' table only has two fields of real interest in this exercise - which are 'productid' (an auto-increment field) and 'prodname'. The 'order_products' table has two fields of interest as well - 'orderorderid' which is a foreign key to the 'orders' table, and 'ordprodid' which is a foreign key to the 'products' table.

To work out what others have bought based on a productid, I basically broke the process down into a few steps:
  1. The first step was to work out all of the other products that have been bought except the one you are viewing (where 'X' is the productid):

    select ordprodid from order_products where ordprodid <> 'X';

    Pretty easy.
     
  2. The next step is to rank the other products in terms of popularity. That is - work out the most popular combinations.

    select ordprodid from order_products where ordprodid in
    (
        select ordprodid from order_products where ordprodid <> 'X'
    )
    group by ordprodid order by count(ordprodid) desc;

    The subselect (the bit inside the brackets) selects all products that are not the product we are viewing. The bit outside the subselect takes that info, groups it together and orders it for us.
     
  3. Next, we need to work out the product names and their id's to display a list. The query we end up with is:

    select productid, prodname from products where productid in
    (
        select ordprodid from order_products where ordprodid in
        (
            select ordprodid from order_products where ordprodid <> 'X'
        )
        group by ordprodid order by count(ordprodid) desc
    )
    limit 5;

    Done!

A pretty complicated query. One problem though is the limit. MySQL 4.1 and MySQL 5.0 don't support limit clauses inside a subselect (you get an error if you try). That is - we only want to fetch the 5 most popular combinations. The first subselect with the grouping and the ordering is going to rank everything - then the outside query is going to limit the results. Not what we want.

So in StoreSuite we ended up with:

select ordprodid from order_products where ordprodid in
(
    select ordprodid from order_products where ordprodid <> 'X'
)
group by ordprodid order by count(ordprodid) desc limit 5;

Then we fetch the results from that query and put them in to the next one:

select productid, prodname from products where productid in (1,2,3,4,5);

We end up with two queries but it's a lot more efficient this way than doing it all in one go. Next step is to add a lot of products and orders and see how this performs - but that's a blog for another time :) Mitch will be posting a few screenshots of how everything ended up looking. He's just polishing things up so they look pretty.