Ravelry allows knitting designers from all over the world to sell individual patterns directly to knitters. This post looks at how these patterns are priced. The Ravelry pattern database holds patterns published as books, e-book collections, PDF downloads, or club subscriptions, and every pattern available for purchase online shows its price on its Ravelry page.
The following code queries the Ravelry API for the 500 non-free knitting patterns, available as Ravelry downloads, with the most projects in each of the following categories: “hat”, “sweater”, “neck/torso”, “feet/legs”, “hands”, “home”, “component” (tutorials or special techniques rather than full patterns), “toys and hobbies”, and “pet”. The search results do not include the price, so we take the URL of each pattern page from them and use standard web scraping to download the HTML of each page. This works for patterns, since their pages are public (viewable by users who are not logged into Ravelry), but it would not work for projects or stash entries, which are private by default (viewable only by logged-in Ravelry users).
```r
library(httr)  # GET(), config(), content()
library(XML)   # htmlTreeParse()

# `ravelry.token` is assumed to be an httr OAuth token created beforehand (not shown)
categories <- c("hat", "sweater", "neck-torso", "feet-legs", "hands", "home",
                "toysandhobbies", "pattern-component", "pet")

# Get dataset of non-free patterns with a high number of projects, available as pdf downloads
# (Full treatment 1h30 for 500/category)
search_url <- "https://api.ravelry.com/patterns/search.json?page_size=500&sort=projects&craft=knitting&availability=ravelry%2B-free&pc="
cat_search <- paste0(search_url, categories)

# Get lists of search results; price attribute is NULL => use web scraping to get it
pat0 <- lapply(cat_search, GET, config = config(token = ravelry.token))
pat <- lapply(pat0, content)
names(pat) <- categories

# Extract pattern permalinks in each category: one row per pattern
links <- lapply(pat, function(x) vapply(x$patterns, function(y) y$permalink, character(1)))
permalinks <- data.frame(link = unlist(links, use.names = FALSE),
                         category = rep(names(links), lengths(links)),
                         stringsAsFactors = FALSE)
permalinks_full <- paste0("http://www.ravelry.com/patterns/library/", permalinks$link)

# Random shuffle, so that a test run on the first links covers all categories
samp <- sample(nrow(permalinks))
permalinks_full <- permalinks_full[samp]
permalinks <- permalinks[samp, ]

# Web scraping to get the price from the pattern page
# Takes about 1 min for 50 links
n <- nrow(permalinks)  # 1000 ok
pattern_info <- lapply(permalinks_full[1:n], htmlTreeParse, useInternalNodes = TRUE)
names(pattern_info) <- permalinks$link[1:n]
```
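The permalink extraction step turns the nested search results into a flat table with one row per pattern. The sketch below runs that step offline on a hypothetical mock of the parsed responses; it assumes only the `patterns` and `permalink` fields that the scraping code reads, and `links_to_df` is an illustrative helper name, not part of any package.

```r
# Hypothetical mock of the parsed search responses: one element per category,
# each holding a `patterns` list whose entries carry a `permalink` field.
mock_pat <- list(
  hat     = list(patterns = list(list(permalink = "hat-a"),
                                 list(permalink = "hat-b"))),
  sweater = list(patterns = list(list(permalink = "sweater-a")))
)

# Flatten to one row per pattern, keeping the category it was found under
# and the full URL of its pattern page.
links_to_df <- function(pat, base_url) {
  links <- lapply(pat, function(x)
    vapply(x$patterns, function(y) y$permalink, character(1)))
  flat <- unlist(links, use.names = FALSE)
  data.frame(link = flat,
             category = rep(names(links), lengths(links)),
             url = paste0(base_url, flat),
             stringsAsFactors = FALSE)
}

mock_links <- links_to_df(mock_pat, "http://www.ravelry.com/patterns/library/")
print(mock_links)
```

Building the table with `rep(names(links), lengths(links))` works whether or not every category returned the same number of patterns, whereas the output of `sapply()` silently becomes a matrix when all lengths are equal.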
Once we have the HTML code for each pattern page, we parse it to extract the prices. The XPath to the price information in the HTML tree can be found by inspecting the source of a typical pattern page. Since most patterns are priced in US dollars (around 75% of them in this dataset), all prices are converted to USD at current exchange rates, using the R quantmod package.
```r
library(XML)       # getNodeSet(), xmlValue()
library(quantmod)  # getFX()

# Price text shown on each page (a number followed by a 3-letter currency code);
# keep the first match of the XPath
pattern_prices <- vapply(pattern_info, function(html)
  trimws(getNodeSet(html, path = "//strong[@class='price']/a/text()", fun = xmlValue)[[1]]),
  character(1))

price_data <- data.frame(
  # nbr_projects() is a helper not shown in this post (reads the project count from the page)
  nbr_projects = unname(sapply(pattern_info, nbr_projects)),
  link = permalinks$link[1:n],
  category = permalinks$category[1:n],
  price = as.numeric(regmatches(pattern_prices,
                                regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", pattern_prices))),
  currency = substr(pattern_prices, nchar(pattern_prices) - 2, nchar(pattern_prices)),
  stringsAsFactors = FALSE)

# Local currency conversion is proposed by Ravelry only for logged-in users
# => normalize prices here, converting every currency to USD
rates <- c(USD = 1)
for (curr in setdiff(unique(price_data$currency), "USD")) {
  # getFX() assigns the series (e.g. EURUSD) into the calling environment,
  # which is why a for loop is used here rather than sapply()
  getFX(paste0(curr, "/USD"), from = Sys.Date())
  rates[curr] <- as.numeric(tail(get(paste0(curr, "USD")), 1))
}
price_data$price_usd <- price_data$price * unname(rates[price_data$currency])
```
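The parsing and conversion logic can be checked offline, without scraping or querying exchange rates. This is a minimal sketch that assumes price strings of the form "<amount> <currency code>" and uses a hypothetical fixed rate table in place of the rates returned by `getFX()`; `parse_price` is an illustrative name.

```r
# Hypothetical price strings in the assumed "<amount> <currency code>" form
price_strings <- c("6.50 USD", "4.00 EUR", "5 GBP", "3.99 USD")

# Hypothetical fixed rates to USD (illustrative values, not real quotes)
rates_to_usd <- c(USD = 1, EUR = 1.10, GBP = 1.25)

# Split each string into a numeric amount (first number in the string)
# and a currency code (last three characters).
parse_price <- function(s) {
  s <- trimws(s)
  data.frame(
    price = as.numeric(regmatches(s, regexpr("[[:digit:]]+(\\.[[:digit:]]+)?", s))),
    currency = substr(s, nchar(s) - 2, nchar(s)),
    stringsAsFactors = FALSE)
}

prices <- parse_price(price_strings)
# Look up each row's rate by currency code, then convert
prices$price_usd <- prices$price * unname(rates_to_usd[prices$currency])
print(prices)
```

The regex `\\.` followed by digits, made optional as a group, accepts both "5" and "6.50" but not a stray run of dots, which a pattern like `\\.*` would allow.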
Here is the overall price distribution (all categories aggregated):
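```r
library(ggplot2)

# Half-dollar bins starting at 0 and closed on the left, so that e.g. 4.99 and 5.00
# land in different bins; prices above 20 USD fall outside the x limits and are dropped
ggplot(price_data) +
  geom_histogram(aes(x = price_usd), fill = "blue", alpha = 0.5,
                 binwidth = 0.5, boundary = 0, closed = "left") +
  scale_x_continuous(limits = c(0, 20), breaks = 0:20) +
  xlab("Pattern price in USD") +
  ylab("Number of patterns")
```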
It looks like the “99 cents is cheaper than $1” strategy is mostly used in the lower price range. Between $6 and $9, there are far fewer prices just below a whole-dollar amount, whereas between $3 and $5 it is the opposite.
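The “just below a whole dollar” observation could also be checked numerically rather than by eye: flag each price whose cents part is .90 or more, then compute the share of flagged prices in each dollar band. This is a minimal sketch on hypothetical toy prices; the .90 threshold and the band boundaries are assumptions. On the real data it would be better applied to the `price` column (original currency), since converting to USD blurs the cents.

```r
# Toy prices (hypothetical values, not the scraped data)
prices <- c(2.99, 3.50, 3.99, 4.95, 5.00, 6.00, 6.50, 7.00, 7.99, 8.00)

# "Just below" a whole dollar: cents part of .90 or more (rounded to whole cents)
cents <- round((prices - floor(prices)) * 100)
just_below <- cents >= 90

# Band by the whole-dollar price each one rounds up to, so 3.99 counts toward $4
band <- cut(ceiling(prices), breaks = c(2, 5, 9),
            labels = c("$3 to $5", "$6 to $9"))

# Share of just-below prices in each band
share <- tapply(just_below, band, mean)
print(share)
```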
The breakdown by category:
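```r
library(ggplot2)

# One box per category; ggtitle() sets the plot title (base graphics' title() does not apply to ggplot)
ggplot(price_data, aes(x = category, y = price_usd, fill = category)) +
  geom_boxplot(alpha = 0.5) +
  ylab("Price in USD") +
  ggtitle("Pattern price distribution in each category")
```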
Price does not depend much on category. But negative results are just as interesting as positive ones, so this graph is still proudly displayed! I was a bit surprised by this, since pattern complexity can vary a lot between a one-size-fits-all accessory and a sweater.
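To put numbers on the visual impression from the boxplot, one could tabulate the median and interquartile range of prices per category. This is a minimal sketch on a hypothetical toy data frame whose column names mirror `price_data`; the values are made up for illustration and are not results from the scraped data.

```r
# Toy data standing in for price_data (hypothetical values, not scraped results)
toy <- data.frame(
  category = rep(c("hat", "sweater", "pet"), each = 4),
  price_usd = c(4, 5, 5, 6,   5, 6, 6, 7,   4, 4, 5, 6),
  stringsAsFactors = FALSE)

# Median and interquartile range of the price in each category
medians <- tapply(toy$price_usd, toy$category, median)
iqrs <- tapply(toy$price_usd, toy$category, IQR)

summary_by_cat <- data.frame(category = names(medians),
                             median = as.vector(medians),
                             iqr = as.vector(iqrs),
                             stringsAsFactors = FALSE)
print(summary_by_cat)
```

Similar medians with overlapping interquartile ranges across categories would match the reading that price depends little on category.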