Pattern launch study

On Ravelry, users make heavy use of the bookmarking tools (“favorite” and “queue”) to remember the knitting patterns that caught their eye among the several millions that the database holds.

The popular Brooklyn Tweed design house released their Fall 2015 knitting patterns collection on Wednesday, September 16th; I used this opportunity to look at the bookmarking activity on these patterns in real time for several days after the launch.

Screen Shot 2015-09-26 at 11.26.49

Some of the 15 patterns in the Brooklyn Tweed Fall 2015 collection.

The pattern names are kept secret until the launch, so it’s not possible to “listen” to the Ravelry database to detect their apparition and start following them. One needs to first search the Ravelry database for recently published Brooklyn Tweed designs, and assuming that the Fall 2015 designs will be among the most recent results. This can, and should, be changed after the launch, editing the script to query the actual names of the collection patterns, because at some point there will be more recent patterns tagged “Brooklyn Tweed” that push those from this Fall collection out of the most recent results. The following code does just that, getting the permalinks (unique pattern names) for the 30 most recent “Brooklyn Tweed” search results:

# Search for recent BT patterns (will also have others made with BT yarns)
BT_query <- GET("https://api.ravelry.com/patterns/search.json?page_size=30&query=Olga%20Buraya%20Kefelian&sort=date",
             config=config("token"=ravelry.token))
content_to_follow <- content(BT_query)

permalinks_to_follow <- sapply(content_to_follow[[1]], function(x) x$permalink)

Once we have the links, the following code queries the API for each pattern for the properties we want to look at: current number of favorites, of users who queued the pattern, of comments, and of projects. It then appends the resulting data to a text file “BT_follow.txt”.

# get url from pattern unique name (permalink)
links_to_follow <- sapply(permalinks_to_follow, function(name) paste("https://api.ravelry.com/patterns/",name,".json&amp;quot;,sep=&amp;quot;&amp;quot;,collapse=&amp;quot;&amp;quot;))
# query the API
pat0 <- lapply(links_to_follow, GET, config=config("token"=ravelry.token))
pats <- lapply(pat0, content)

# filter the results for the properties we are interested in
pattern_data <- sapply(pats, function(x) x$pattern[c("permalink",
                                                    "queued_projects_count",
                                                     "projects_count&amp",                                                       "favorites_count",
                                                     "comments_count")])

# turn into a data frame object
pattern_df <- data.frame(matrix(unlist(pattern_data), nrow=length(links_to_follow), byrow=T))
# add time and date
pattern_df$time <- Sys.time()

# write to text file
write.table(pattern_df, file = "/Users/saskia/R/ravelry_explorer/BT_follow.txt",append=T,row.names=F,col.names=F,quote=F)

This code gets the data at the point in time where it is run; in order to get this data as a function of time, the code above in ran automatically every 30 minutes using a Cron job (useful advice here).

After letting the data file grow for a few days, it’s time for harvest !

# read from the text file
BT_time <- read.csv("BT_follow.txt",header=F,sep=";")
# add columns names
names(BT_time) <- c("pattern","queued_projects_count","projects_count","favorites_count","comments_count","day","time")

# Fall 2015 pattern unique names
names_BT <- c("ashland-2","bannock","birch-bay-2","cascades","deschutes-2","copse-2",
             "fletching-3","lander","lolo","mcloughlin","nehalem-2","riverbend-2",
             "sauvie","trailhead-2","willamette-7")

# Fall 2015 pattern official names
names_full <- c("Ashland","Bannock","Birch Bay","Cascades","Deschutes","Copse",
                "Fletching","Lander","Lolo","McLoughlin","Nehalem","Riverbend",
                "Sauvie","Trailhead","Willamette")

names(names_full) <- names_BT

# filter the data to keep only Brooklyn tweed Fall 2015 patterns
BT_time <- subset(BT_time,subset=(pattern %in% names_BT))
BT_time$pattern <- droplevels(BT_time$pattern)
# use pattern name as identifier instead of permalink
BT_time$pattern <- revalue(BT_time$pattern, names_full)
# get full date from day and hour/min, in Brooklyn Tweed's local time (PDT)
BT_time$date <- as.POSIXct(paste(BT_time$day, BT_time$time))
attributes(BT_time$date)$tzone <- "America/Los_Angeles"

And now we can plot the graph of the number of favorites on each pattern as a function of time. I chose 15 colors as far from each other as possible using this handy tool from the Medialab at Sciences Po.

ggplot(BT_time) + 
  geom_line(aes(x=date, y=favorites_count, color=pattern, group=pattern)) +
  geom_text(data=BT_time[BT_time$date ==lastdate,], 
            aes(x=date,
                y=favorites_count,
                color=pattern,
                label=pattern),
            vjust = -0.2) +
  xlab("Time") +
  ylab("Number of favorites") +
  theme(legend.position="none") +
  scale_colour_manual(values=BT_colors) +
  ggtitle("Evolution of number of favorites on BT Fall 2015 patterns")
The sun (almost) never sets on the knitting empire !

The sun (almost) never sets on the knitting empire! (click to enlarge)

Looks like the hottest time window is within the first day! The time dependence has a very similar shape for each pattern, probably because they are all released in the same context, with a consistent design concept throughout the collection.

People bookmark patterns all the time, although when it’s night in the USA, there is a small dip in activity. This is much more visible if we plot the favoriting rate (number of new favorites per hour):

# function getting the nbr of favorites per unit time
dfav_dt <- function(data){
  n <- dim(data)[1] # nbr lines
  favdiff <- c(0, diff(data$favorites_count)) # length n
  timediff <- difftime(data$date, c(data$date[1], data$date[1:n-1]),units="hours")
  # div by 0 on first item, but ggplot will just ignore those points
  timediff <- as.numeric(timediff) 
  favderiv <- SMA(favdiff/timediff)
  return(data.frame(deriv = favderiv, 
                    date = data$date))
}

# rate gets pretty constant after a few days, so keep only
# the points shortly after launch
m <- 3000
# get rates for each pattern
rates <- ddply(BT_time[1:m,],.(pattern), dfav_dt)

# normalize the data to more easily see the general behavior
norm_col <- function(data){
  maxd <- max(data$deriv, na.rm=T)
  return(cbind(data,norm_deriv=data$deriv/maxd))
}

rates <- ddply(rates,.(pattern), norm_col)

ggplot(rates) + 
  geom_line(aes(x=date, y=norm_deriv, color="red", group=pattern)) +
  xlab("Time") +
  ylab("Normalized favoriting rate") +
  theme(legend.position="none") +
  ggtitle("Evolution of favoriting rate on BT Fall 2015 patterns")

BT_fav_rate_t

Favoriting rate for the few days after the launch; the dip in activity when night moves over the USA is now quite visible. There are 15 normalized curves, one for each pattern in the collection. (click to enlarge)

The time dependence of the number of times a pattern was queued is very highly correlated with the time dependence of the number of favorites, so I’m not showing it here.

I’m letting the code run once per day now; I’ll post the new data in a few months to look at long term tendencies, and at the number of projects (there are obviously not a lot of them so soon after the launch). I’m guessing Christmas Crunch will be having interesting effects …

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s