Marketing Analytics: Data-Driven Techniques with Microsoft Excel (2014)

Part VIII. Retailing

Chapter 30. RFM Analysis and Optimizing Direct Mail Campaigns

When a catalog company such as J.Crew mails catalogs it wants its mailings to generate profits that outweigh its mailing costs. This chapter discusses how mail order companies can use information about a potential customer's recency (date of last purchase), frequency (average number of transactions per year), and monetary value (average amount purchased per year) to optimize the profitability of a mailing campaign. You will walk through all the steps involved in an RFM analysis and see how the Excel functions discussed in Chapter 3, “Using Excel Functions to Summarize Marketing Data,” make it easy to perform the analysis.

RFM Analysis

Many direct mail merchants use RFM analysis to predict the response rate and profitability generated by mailing a catalog to a customer. The basic idea is that customers who have bought more recently (R = recency), more often (F = frequency), and spent more dollars (M = monetary value) on the products previously are more likely to order in the future. In an RFM analysis each customer is given a 1–5 rating (5 = likely to purchase, 1 = less likely to purchase) on R, F, and M. A score of 5, for example, is given on recency if a customer ranks in the top 20 percent on recency, and a score of 1 is given if the customer ranks in the bottom 20 percent on recency. Then the combinations of R, F, and M that appear to be profitable based on the customer's recent history receive the next mailing.

Unfortunately, by coding each customer on a 1–5 scale with R, F, and M, you lose valuable information on each customer. For example, Customer 1 and Customer 50 might both score a 5 on monetary value but Customer 1 might have spent $10,000 and Customer 50 only $500. To make up for this shortcoming, you can utilize exact values of R, F, and M to guide the customers who receive a catalog. This type of analysis (call it an Exact RFM analysis) is more complex than the traditional RFM analysis but can often yield much larger profits than RFM analysis.

To demonstrate RFM analysis, suppose J.Crew is ready to do a mailing on January 1, 2014 to a subset of 5,000 customers. You are given data on 100,000 sales transactions involving these customers. Use the file RFMexample.xlsx to retrieve the data for this example and to illustrate how RFM analysis works. A subset of the data is shown in Figure 30.1. The first transaction, for example, involves Customer 4184 purchasing $30.00 of items on 9/30/2011.

Figure 30-1: RFM data


Computing R, F, and M

To begin, compute for each customer the following quantities:

·        The most recent transaction

·        The number of transactions per year for each customer

·        The average amount purchased each year by each customer

To accurately perform the calculation of these quantities, proceed as follows (see Figure 30.2):

1. Name each column with its row 6 heading by selecting Create from Selection from the Formulas tab.

2. In the Q7 array enter (see Chapter 3 for details) the formula =MAX(IF(Customer=O7,Date," ")). This creates an array that contains the transaction date if the transaction involves Customer 1 and a blank otherwise. Then the MAX function finds the last date for a transaction involving Customer 1 (9/1/2013). Copy this formula to Q8:Q5006 to compute for each customer the date of the most recent transaction.

3. Similarly, in the R7 array enter the formula =MIN(IF(Customer=O7,Date," ")). This creates an array that contains the transaction date if the transaction involves Customer 1 and a blank otherwise. Then the MIN function finds the first date for a transaction involving Customer 1 (1/31/2010). Copy this formula to R8:R5006 to compute for each customer the date of the first transaction.

4. Copy the formula =($Q$4-R7)/365 from S7 to S8:S5006 to compute the number of years the customer has been with you as of the date of the mailing. For example, as of 1/1/2014 Customer 1 has been with you 3.92 years.

5. Copy the formula =SUMIF(Customer,O7,Amount)/S7 from T7 to T8:T5006 to compute the average amount purchased per year by each customer. For example, Customer 1 purchased an average of $239.00 per year.

6. Copy the formula =P7/S7 from U7 to U8:U5006 to compute for each customer the average number of transactions per year. For example, Customer 1 was involved in an average of 3.83 transactions per year.

Figure 30-2: Calculation of R, F, and M


Now determine how each customer ranks on R, F, and M:

1. Copy the formula =RANK(Q7,Most_recent,1) from V7 to V8:V5006 to compute each customer's rank on recency. The last argument of 0 ensures that the customer with the most recent purchase gets a rank of 5,000, and so on. For example, Customer 1 is ranked as the 2,117th lowest on recency.

2. Copy the formula =RANK(U7,Frequency,1) from W7 to W8:W5006 to compute each customer's rank on frequency. For example, Customer 1 is ranked as the 497th lowest on frequency.

3. Copy the formula =RANK(T7,Monetary_Value,1) from X7 to X8:X5006 to compute each customer's rank on monetary value. For example, Customer 1 ranks 442nd lowest on monetary value.

4. Use VLOOKUP formulas to convert a customer's ranks on R, F, and M to the wanted 1–5 rating by copying the formula =VLOOKUP(V7,rfmlookup,2) from Y7 to Y7:AA5006. This converts ranks of 1–1,000 to 1, 1,001–2,000 to 2, 2,001–3,000 to 3, 3,001–4,000 to 4, and 4,001–5,000 to 5. The range AB5:AC10 (see Figure 30.3) is named rfmlookup. For example, Customer 1's R rank of 2,217 converts to 3, her F rank of 497 converts to 1, and her M rank of 442 converts to 1.

Figure 30-3: Calculation of R, F, and M response rates


To Which R, F, and M Values Should You Mail?

The next step in an RFM analysis is to identify the RFM combinations (there are 53 = 125 combinations) that appear (based on customer data) likely to yield a profit. First perform a break even analysis to determine a response rate that would enable J.Crew to break even. Define Profit = Expected profit per order, Response_Rate = chance a customer will respond to a mailing, and Mailcost = cost of mailing the catalog to a customer. J.Crew will break even if the expected profit received per customer equals the mailing cost or both of the following are true:


Therefore J.Crew should mail to segments for which Response_Rate is likely to exceed Mailcost/Profit. Assume J.Crew expects to earn a profit of $20 per order and mailing a catalog costs $0.50. Then J.Crew should mail to RFM combinations for which it expects a response rate exceeding 0.50/20 = 2.5 percent. To be on the safe side, assume J.Crew will mail to RFM combinations that have at least double the break-even response rate (or 5 percent). Column N (see Figure 30.4) tells you whether a given customer responded to the last mailing (1 = response and 0 = no response). For example, Customers 2 and 5 responded to the last mailing.

Figure 30-4: Response by customer


Now you are ready to determine the profitable R, F, and M combinations. To begin, list in the range AC14:AE138 the 125 possible RFM combinations that range from 1 1 1 through 5 5 5. Then proceed as follows:

1. Copy the formula =COUNTIFS(R_,AC14,F,AD14,M,AE14) from AG14 to AG15:AG138 to count the number of customers falling into each RFM category. For example, 260 customers fell into the 1 1 1 category.

2. Copy the formula =COUNTIFS(R_,AC14,F,AD14,M,AE14,actualrresponse,1) from AH14 to AH15:AH138 to calculate the number of customers in each RFM combination that responded to the last mailing. For example, 24 customers in 1 1 1 responded to the last mailing.

3. Copy the formula =IFERROR(AH14/AG14,0) from AF14 to AF15:AF138 to compute the response rate for each RFM combination. For example, the response rate for 1 1 1 is 9.2 percent. The IFERROR function is needed to prevent a #DIV/0! from appearing in R F M cells containing no observations.

4. Use Excel's Conditional Formatting Formula option to highlight all RFM combinations yielding a response rate of at least 5 percent. To do so first put the cursor in the upper-left corner (AC14) of the range you want formatted (AC14:AF138) and select the wanted range.

5. From the Home tab, select Conditional Formatting, choose New Rule, . . . and then select Use a Formula to determine which cells to format. This brings up the dialog box shown in Figure 30.5.

6. In this dialog box first enter a formula that when copied down and across the selected range will evaluate to TRUE for each cell you want formatted. You can use the formula =$AF14>=0.05 for this example.

7. Dollar sign Column AF to ensure that each selected cell will be highlighted if the entry in Column AF of the cell's row is at least .05.

8. To finish filling in the Select Rule Type dialog box, select Format and from Fill tab, select a color (orange in this example). All RFM combinations with response rates of at least 5 percent are now highlighted (refer to Figure 30.3).


If the database is small in size, the marketing analyst might want you to create terciles (only three categories) for R, F, and M to ensure that sufficient observations exist in each R, F, and M cell to accurately estimate an average response rate.

Figure 30-5: Using conditional formatting to highlight profitable R F M combinations


An RFM Success Story

On his website ( Arthur Hughes, author of the bestselling Strategic Database Marketing (McGraw-Hill, 2011), describes a successful application of RFM analysis. A Southern educational products company annually mailed a promotion for a video offer to its entire 2 million customer database and earned little profit on a 1.3 percent response rate. The company then mailed to a sample of 30,000 customers and classified each customer into the 125 RFM cells and recorded the response rate for each cell. This exploratory mailing lost money but led to a mailing to the 34 RFM cells that were profitable in the test mailing. The final mailing to 334,000 customers yielded a 2.76 percent response rate and a profit of $307,000.

Using the Evolutionary Solver to Optimize a Direct Mail Campaign

The 5 percent rate chosen as the mailing cutoff for the J.Crew example was arbitrarily chosen to be double the break-even response rate. In this section you will learn that mailing decisions can actually be optimized by mailing to customers who maximize the expected revenue or profit from the mailings.

Now suppose it is January 1, 2015 and J.Crew wants to mail to 10 percent of the customers. Suppose you have the following information for all customers (see the RFMtop10%.xls file and Figure 30.6).

Figure 30-6: Mailing to 10 percent of customers


The RFMtop10%.xls file contains the following information for each customer:

·        Number of purchases made by the customer in period January–June 2014

·        Amount spent by the customer during January–June 2014

·        Amount spent by the customer during July–December 2014 after a catalog was mailed

You can create a scoring rule of the following form:

(frequency weight) * (purchases made in the first 6 months of 2014) + (amount spent weight) * (amount spent in the first 6 months of 2014)

Assume you will mail a J.Crew catalog to the prospective customers that score in the top 10 percent by this formula. You can use the Evolutionary Solver to choose weights that maximize the amount of revenue earned during the last six months of 2014 from the selected customers. The recommended mailing strategy is to mail to anyone in a prospective database whose score exceeds the top 10 percent of scores from the database. The work proceeds as follows:

1. Enter trial weights in cells C10 and D10.

2. Copy the formula =$C$10*C12+$D$10*D12 from F12 to cell F13:F3045 to generate the score for each customer.

3. The formula =PERCENTILE(F12:F3045,0.9) in G8 determines the 90th percentile of the scores.

4. Copy the formula =IF(F12>$G$8,1,0) from G12 to G13:G3045 to determine whether each customer's score is in the top 10 percent.

5. Enter in cell G10 the formula =SUMPRODUCT(G12:G3045,E12:E3045) to compute the total revenue earned from the top 10 percent of the scores by counting only the purchases for each customer whose scores rank in the top 10 percent.

6. Use the Evolutionary Solver to find the weights that maximize the July–December 2012 revenue obtained from mailing to the top 10 percent of the scores. The Solver window is shown in Figure 30.7; constrain your weights to be between .01 and 10.

Figure 30-7: Solver window for mailing to best 10 percent of customers


The Solver finds that the best scoring rule is 7.92 * Frequency + .13 * (amount spent in the last six months). If this score is at least 1,480, you should mail to a customer. The weights imply that a 7.92/.13 = $61 increase in the amount spent has the same effect on the score as an increase of 1 in frequency. Note that if you wanted to mail to the top 20 percent or 30 percent, you might get a different set of weights. The advantage of the current approach over RFM is that the current approach does not throw away any information. Also, the weights are chosen to meet your goal of maximizing revenue (or if you want, profit) from the mailing, whereas RFM analysis just chooses the combinations that yield a response rate that exceeds the break-even level. In Exercise 2 you will find a rule that mails to 20 percent of all customers.


In this chapter you learned the following:

·        To conduct an RFM analysis, each customer is classified on a 1–5 scale on recency, frequency, and monetary value. A score of 5, for example, is given on recency if a customer ranks in the top 20 percent on recency, and a score of 1 is given if the customer ranks in the bottom 20 percent on recency.

·        The response rate for each of the 125 RFM combinations is calculated. Based on these response rates, you mail to the RFM combinations whose response rate exceeds the break-even level.

·        Alternatively, you can isolate the data (perhaps number of recent purchases and amount of recent purchases) that you believe will best predict the revenue and use the Evolutionary Solver to derive a scoring rule that correlates strongly with customers who are likely to spend the most money in response to a mailing.


1. The file RFMdata.xlsx contains the date and size of transactions for 256 customers of a mail order catalog company. RFM (recency, frequency, and monetary value) attempts to predict how a customer will perform in the future based on ranking for recency, frequency, and monetary value.

(a) Rate each person on a 1–4 scale on each attribute, with a rating of 4 being the best and 1 the worst.

(b) On each attribute a customer will get a rating of 4 if he is in the top 25 percent, and so on. Use the RANK function to determine where a customer ranks on each attribute. Thus if you rank customers so that the largest value of an attribute gets a 256, then anyone with a rank of 193–256 gets a 4 rating; a rank of 129–192 gets a 3 rating; 65–128 gets a 2 rating; and a rank of 1–64 gets a 1 rating. Assume that each customer has received 80 mailings and a mailing costs $0.50.

(c) In E8:E71 use conditional formatting to highlight in yellow each profitable RFM combination.

2. For the file RFMtop10%.xls develop a strategy to maximize revenue generated by mailing to 20 percent of the customers.