Knitting patterns price distribution

Ravelry allows knitting designers from all over the world to sell individual patterns directly to knitters. This post looks at the distribution of their prices. The Ravelry pattern database holds patterns published as books, e-book collections, PDF downloads, or club subscriptions; every pattern that is available online for purchase has its price shown on its Ravelry page.

The following code queries the Ravelry API for the 500 patterns with the most projects in each of the following categories: “hat”, “sweater”, “neck/torso”, “feet/legs”, “hands”, “home”, “component” (tutorials or special techniques rather than full patterns), “toys and hobbies”, and “pet”. From this data we get the URLs of the pattern pages and use standard web scraping to download their HTML code. This works for patterns because their pages are public (viewable by users who are not logged into Ravelry), but it would not work for projects or stash entries, which are private by default (viewable only by logged-in Ravelry users).
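The search URL used in the code packs all filters into its query string: `sort=projects` orders results by number of projects, `pc` is the pattern category, and `availability=ravelry%2B-free` is the percent-encoded form of `ravelry+-free`, which the code comment describes as selecting non-free patterns available as PDF downloads. As a minimal base-R sketch, the hypothetical helper `build_search_urls()` below assembles one such URL per category; the parameter values are copied from the post's query string and only base R's `URLencode()` is used.

```r
# Hypothetical helper: build one Ravelry pattern-search URL per category.
# Parameter values mirror the query string used in this post.
build_search_urls <- function(categories, page_size = 500) {
  base <- "https://api.ravelry.com/patterns/search.json"
  # reserved = TRUE percent-encodes "+" as "%2B" ("-" is left unchanged)
  availability <- URLencode("ravelry+-free", reserved = TRUE)
  query <- paste0("page_size=", page_size,
                  "&sort=projects&craft=knitting",
                  "&availability=", availability,
                  "&pc=", categories)
  # Name each URL after its category so results can be matched back later
  setNames(paste0(base, "?", query), categories)
}

urls <- build_search_urls(c("hat", "pattern-component"))
print(urls[["hat"]])
```

Building the URLs from named parts makes it easier to change one filter (for example a smaller `page_size` for a quick test) without editing a long string by hand.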




library(httr)   # GET(), config(), content()
library(XML)    # htmlTreeParse(), getNodeSet(), xmlValue()

# ravelry.token is an httr OAuth token created beforehand (not shown in this post)
categories <- c("hat","sweater","neck-torso","feet-legs","hands","home","toysandhobbies","pattern-component","pet")

# Get dataset of non-free patterns with a high number of projects, available as PDF downloads
# (the full run takes about 1h30 with 500 patterns per category)
search_url <- "https://api.ravelry.com/patterns/search.json?page_size=500&sort=projects&craft=knitting&availability=ravelry%2B-free&pc="
cat_search <- paste0(search_url, categories)

# Get lists of search results; the price attribute is NULL => use web scraping to get it
pat0 <- lapply(cat_search, GET, config = config(token = ravelry.token))
pat <- lapply(pat0, content)

# Extract pattern permalinks in each category into a two-column data frame
links_by_cat <- lapply(pat, function(x) vapply(x$patterns, function(y) y$permalink, character(1)))
permalinks <- data.frame(link = unlist(links_by_cat, use.names = FALSE),
                         category = rep(categories, lengths(links_by_cat)),
                         stringsAsFactors = FALSE)

permalinks_full <- paste0("http://www.ravelry.com/patterns/library/", permalinks$link)

# Shuffle the patterns so that a test run on the first n rows covers all categories
samp <- sample(nrow(permalinks))
permalinks_full <- permalinks_full[samp]
permalinks <- permalinks[samp, ]

# Web scraping to get the price from the pattern page
# Takes about 1 min for 50 links; use a smaller n (e.g. 1000) for a test run
# Note: the XML package may not fetch https pages itself; downloading each page
# with GET() first is a common workaround
n <- nrow(permalinks)
pattern_info <- lapply(permalinks_full[1:n], htmlTreeParse, useInternalNodes = TRUE)
names(pattern_info) <- permalinks$link[1:n]

Once we have the HTML code for each pattern page, we parse it for the prices. The path to the price information in the HTML tree can be found by looking at the source code of a typical pattern page. Since most patterns are priced in US dollars (around 75% of them in this dataset), all prices are converted to USD at the current exchange rate, using the R quantmod package.
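The parsing step can be tried on its own, away from the scraped pages. Below is a minimal base-R sketch, assuming (as the post's code does) that the scraped price text ends with a three-letter currency code such as `USD`; the function name `parse_prices()` and the sample strings are hypothetical. The pattern `[[:digit:]]+(\\.[[:digit:]]+)?` accepts an integer amount with an optional decimal part, and strings with no amount or no valid code yield `NA` instead of silently dropping a row.

```r
# Hypothetical helper: split scraped price strings such as "5.50 USD"
# into a numeric amount and a three-letter currency code.
parse_prices <- function(strs) {
  # Integer part, optionally followed by a dot and decimals
  m <- regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", strs)
  amount <- rep(NA_real_, length(strs))
  amount[!is.na(m) & m > 0] <- as.numeric(regmatches(strs, m))
  # Assumes the currency code is the last three characters of the string
  code <- substr(strs, nchar(strs) - 2, nchar(strs))
  code[!grepl("^[A-Z]{3}$", code)] <- NA_character_
  data.frame(price = amount, currency = code, stringsAsFactors = FALSE)
}

parse_prices(c("$5.50 USD", "4 EUR", "sold out"))
# price: 5.5, 4, NA; currency: "USD", "EUR", NA
```

Amounts written with a thousands separator (such as "1,200.00") would need extra handling, since only the digits before the comma are matched.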

 

library(quantmod)  # getFX()

# Extract the price text from each page; NA if the price node is missing
pattern_prices <- lapply(pattern_info, function(html) {
  nodes <- getNodeSet(html, path = "//strong[@class='price']/a/text()", fun = xmlValue)
  if (length(nodes) == 0) NA_character_ else nodes[[1]]
})

# Split each price string into an amount and a 3-letter currency code (last 3 characters)
num_prices <- lapply(pattern_prices, function(str) {
  amount <- regmatches(str, regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", str))
  c(price = if (length(amount)) amount else NA_character_,
    currency = substr(str, nchar(str) - 2, nchar(str)))
})

# nbr_projects() extracts the number of projects from a page (helper not shown in this post)
pattern_nbr_projects <- unname(sapply(pattern_info, nbr_projects))
price_data <- data.frame(do.call(rbind, num_prices), stringsAsFactors = FALSE)
price_data <- cbind(nbr_projects = pattern_nbr_projects, permalinks[1:n, ], price_data)
names(price_data) <- c("nbr_projects", "link", "category", "price", "currency")
price_data$price <- as.numeric(price_data$price)

# Local currency conversion is offered by Ravelry only to logged-in users
# => normalize prices here, with one exchange rate per currency.
# auto.assign = FALSE makes getFX() return the rates instead of assigning
# them to variables in the calling environment.
rates <- sapply(unique(price_data$currency), function(curr) {
  if (curr == "USD") return(1)
  fx <- getFX(paste0(curr, "/USD"), from = Sys.Date() - 7, auto.assign = FALSE)
  as.numeric(tail(fx, 1))  # most recent available rate
})
price_data$price_usd <- price_data$price * unname(rates[price_data$currency])

And finally, the global price distribution (all categories aggregated):

library(ggplot2)

# Prices above 20 USD fall outside the x-axis limits and are dropped (with a warning)
ggplot(price_data) +
  geom_histogram(aes(x=price_usd), fill='blue', alpha=0.5, binwidth=0.5) +
  scale_x_continuous(limits = c(0, 20), breaks = 0:20) +
  xlab("Pattern price in USD") +
  ylab("Number of patterns")


Pattern price distribution. Only patterns priced up to 20 USD are shown (there are a few expensive outliers, mostly kits that include both the pattern and the yarn).

It looks like the “99 cents is cheaper than $1” strategy is mostly used in the lower price range: between $3 and $5 there are many price points just below whole-dollar amounts, whereas between $6 and $9 there are far fewer.
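This observation could be checked numerically by counting, within a price range, the share of prices whose cents part is .90 or more. Here is a minimal base-R sketch with a hypothetical helper `share_just_below()` and a toy price vector (not the Ravelry data). On the real data it would make most sense on the original prices of USD-priced patterns, e.g. `price_data$price[price_data$currency == "USD"]`, since prices converted from other currencies rarely land exactly on .99.

```r
# Hypothetical helper: share of prices in [lower, upper) whose fractional
# part is at least `threshold` (e.g. 4.99 or 4.95, "just below" 5).
share_just_below <- function(prices, lower, upper, threshold = 0.9) {
  in_range <- prices[prices >= lower & prices < upper]
  if (length(in_range) == 0) return(NA_real_)
  # round() guards against floating-point noise in the subtraction
  frac <- round(in_range - floor(in_range), 2)
  mean(frac >= threshold)
}

# Toy prices, not the Ravelry data
toy <- c(3.99, 4.99, 4.50, 5.00, 6.00, 7.00, 7.99, 8.00)
share_just_below(toy, 3, 6)   # 2 of 4 prices end in .9x -> 0.5
share_just_below(toy, 6, 10)  # 1 of 4 -> 0.25
```

The threshold of 0.9 is an arbitrary choice; 0.95 would restrict the count to .95 to .99 endings.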

The breakdown by category:

ggplot(price_data, aes(x=category, y=price_usd, fill=category)) +
  geom_boxplot(alpha=0.5) +
  ylab("Price in USD") +
  ggtitle("Pattern price distribution in each category")
Prices by category. Pattern prices are fairly constant across categories.

Price does not depend much on category. Negative results are just as interesting as positive ones, though, so this graph is still proudly displayed! I was a bit surprised by this, since pattern design complexity can vary a lot between a one-size-fits-all accessory and a sweater.
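To put a number on "does not depend much on category", one option is to compare per-category medians and interquartile ranges instead of reading them off the boxplot. A minimal sketch using base R's `aggregate()` on a toy data frame with the same `category` and `price_usd` columns as `price_data`; the values are made up for illustration and are not results from this dataset.

```r
# Toy data with the same columns as price_data (values are illustrative only)
toy_prices <- data.frame(
  category  = rep(c("hat", "sweater", "pet"), each = 4),
  price_usd = c(4, 5, 5, 6, 6, 7, 8, 9, 3, 4, 5, 5)
)

# Median and interquartile range of prices in each category
summary_by_cat <- aggregate(
  price_usd ~ category, data = toy_prices,
  FUN = function(p) c(median = median(p), iqr = IQR(p))
)
summary_by_cat
```

On the real data, replacing `toy_prices` with `price_data` would give one row per category; similar medians with overlapping IQRs would support the reading of the boxplot.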
