The Theory Behind Real-Time Recommendations
- By Chris Smith
- Published 08/5/2007
- Database Queries
- Unrated
Working out what customers also bought was a little tricky. We are working off products that others have ordered - this data is already kept so we don't need any new database tables. First of all, the basics of the database schema we have to work with.
The 'products' table only has two fields of real interest in this exercise - which are 'productid' (an auto-increment field) and 'prodname'. The 'order_products' table has two fields of interest as well - 'orderorderid' which is a foreign key to the 'orders' table, and 'ordprodid' which is a foreign key to the 'products' table.
To work out what others have bought based on a productid, I basically broke the process down into a few steps:
A pretty complicated query. One problem though is the limit. MySQL 4.1 and MySQL 5.0 don't support limit clauses inside a subselect (you get an error if you try). That is - we only want to fetch the 5 most popular combinations. The first subselect with the grouping and the ordering is going to rank everything - then the outside query is going to limit the results. Not what we want.
So in StoreSuite we ended up with:
select ordprodid from order_products where ordprodid in
(
select ordprodid from order_products where ordprodid <> 'X'
)
group by ordprodid order by count(ordprodid) desc limit 5;
Then we fetch the results from that query and put them in to the next one:
select productid, prodname from products where productid in (1,2,3,4,5);
We end up with two queries but it's a lot more efficient this way than doing it all in one go. Next step is to add a lot of products and orders and see how this performs - but that's a blog for another time :) Mitch will be posting a few screenshots of how everything ended up looking. He's just polishing things up so they look pretty.
The 'products' table only has two fields of real interest in this exercise - which are 'productid' (an auto-increment field) and 'prodname'. The 'order_products' table has two fields of interest as well - 'orderorderid' which is a foreign key to the 'orders' table, and 'ordprodid' which is a foreign key to the 'products' table.
To work out what others have bought based on a productid, I basically broke the process down into a few steps:
- The
first step was to work out all of the other products that have been
bought except the one you are viewing (where 'X' is the productid):
select ordprodid from order_products where ordprodid <> 'X';
Pretty easy.
- The next step is to rank the other products in terms of popularity. That is - work out the most popular combinations.
select ordprodid from order_products where ordprodid in
(
select ordprodid from order_products where ordprodid <> 'X'
)
group by ordprodid order by count(ordprodid) desc;
The subselect (the bit inside the brackets) selects all products that are not the product we are viewing. The bit outside the subselect takes that info, groups it together and orders it for us.
- Next, we need to work out the product names and their id's to display a list. The query we end up with is:
select productid, prodname from products where productid in
(
select ordprodid from order_products where ordprodid in
(
select ordprodid from order_products where ordprodid <> 'X'
)
group by ordprodid order by count(ordprodid) desc
)
limit 5;
Done!
A pretty complicated query. One problem though is the limit. MySQL 4.1 and MySQL 5.0 don't support limit clauses inside a subselect (you get an error if you try). That is - we only want to fetch the 5 most popular combinations. The first subselect with the grouping and the ordering is going to rank everything - then the outside query is going to limit the results. Not what we want.
So in StoreSuite we ended up with:
select ordprodid from order_products where ordprodid in
(
select ordprodid from order_products where ordprodid <> 'X'
)
group by ordprodid order by count(ordprodid) desc limit 5;
Then we fetch the results from that query and put them in to the next one:
select productid, prodname from products where productid in (1,2,3,4,5);
We end up with two queries but it's a lot more efficient this way than doing it all in one go. Next step is to add a lot of products and orders and see how this performs - but that's a blog for another time :) Mitch will be posting a few screenshots of how everything ended up looking. He's just polishing things up so they look pretty.
Register for Your Beta Invite
Be the first to test drive StoreSuite. Fill out the form below to receive an exclusive beta invite when StoreSuite's ready to test.
