Draw your own dataset

For anyone who wants to visually test classification algorithms on toy data, here is an RShiny app that lets you click and draw your own custom dataset! It can be accessed on shinyapps.io, and the source code is on GitHub.

It is mostly useful for small two-dimensional numeric datasets that would be inconvenient to build by combining samples from a few classic distributions.

Screenshot of the app

RShiny is a web application framework by RStudio that lets you build neat interactive web apps entirely in R. Shiny is quite practical because it takes care of most of the event handling and variable updating under the hood. All we need to do is declare the variables that can change interactively as reactive values, and keep track of where the changes happen.
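As a minimal sketch of that idea (a toy example, not part of the app described in this post), the code below keeps a click counter in a reactive value. The names `counter_ui`, `counter_server`, `bump`, and `count_text` are made up for this illustration.

```r
library(shiny)

# Hypothetical toy app: one button and one text output.
counter_ui <- fluidPage(
  actionButton("bump", "Add one"),
  textOutput("count_text")
)

counter_server <- function(input, output, session) {
  # Declare the state that changes interactively as a reactive value.
  state <- reactiveValues(count = 0)

  # Runs each time the button is clicked. The handler body is isolated,
  # so reading state$count here does not re-trigger this observer.
  observeEvent(input$bump, {
    state$count <- state$count + 1
  })

  # Re-rendered automatically whenever state$count changes.
  output$count_text <- renderText(paste("Clicks:", state$count))
}

# Launch the app only when running interactively.
if (interactive()) {
  runApp(shinyApp(counter_ui, counter_server))
}
```

Nothing in the server function calls the rendering code explicitly; Shiny notices that `state$count` changed and refreshes the output that depends on it.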

In this app, every time the user clicks on the canvas, a small group of points is created near the click position. The number of points, their class (color), and their spread can be adjusted in the parameters bar. The canvas can be cleared, or the most recent point undone, if necessary. Once the dataset is ready, it can be downloaded as a CSV file containing the x and y coordinates and the class of each point; the coordinates in the downloaded dataset are scaled to zero mean and unit variance. This GUI is defined in ui.R:


library(shiny)
library(shinydashboard)

dashboardPage(
  dashboardHeader(title="Draw Dataset"),
  dashboardSidebar(disable=TRUE),
  dashboardBody(
    box(width=12,
        title="Parameters",
        fluidRow(
          column(4, numericInput("num_points", "Number of points per click", 3)),
          column(4,
                 numericInput("sigma",
                              "Standard deviation of each click's point set",
                              step=0.01,
                              value=5)
          ),
          column(4,
                 selectInput("class", "Class (color)", choices=c("red"="firebrick1",
                                                                 "green"="forestgreen",
                                                                 "blue"="dodgerblue")))
        ),
        fluidRow(
          column(12,
                 downloadButton("save", "Save"),
                 actionButton("clear", "Clear", icon=icon("remove")),
                 actionButton("undo", "Undo", icon=icon("undo"))
          )
        )
    ),
    box(width=12,
        height=870,
        title="Draw your own dataset by clicking on this canvas",
        plotOutput("data_plot", click="plot_click")
    )
  )
)

The code in server.R, shown below, holds the data frame that will contain the dataset and updates it whenever a user-triggered event happens.
The dataset is a reactive object; the changes a user can trigger are handled in the observe({}) and observeEvent() blocks. The variables corresponding to user input are called input$something, where each "something" is an input ID defined in ui.R.


library(ggplot2)

server <- function(input, output, session) {

  addGroup <- function(data, n, center, sigma, class){
    # append a group of points distributed around the click coordinates
    # to a data frame holding all the points created and their color.
    new_group <- data.frame("x"=rnorm(n, mean=center$x, sd=sigma),
                            "y"=rnorm(n, mean=center$y, sd=sigma),
                            "class"=class)
    return(rbind(data, new_group))
  }

  # initialize reactive dataset holding the points created by the user
  v <- reactiveValues(data = data.frame())

  observe({
    # populate the dataset with points distributed around the click position
    if(!is.null(input$plot_click)) {
      # "isolate" reads the current value of v$data without making this observer depend on it;
      # otherwise assigning to v$data would re-trigger the observer and rbind the new group again, in an endless loop
      v$data <- isolate(addGroup(v$data, input$num_points, input$plot_click, input$sigma, input$class))
    }
  })

  observeEvent(input$clear, {
    # remove all points from the canvas and the dataset when clear button is clicked
    v$data <- data.frame()
  })

  observeEvent(input$undo, {
    # remove the latest drawn point from the dataset when undo button is clicked
    v$data <- v$data[-nrow(v$data), ]
  })

  output$save <- downloadHandler(
    # save the dataset as a csv file
    # scale to zero mean and unit variance before saving
    filename = function() {'DIYdataset.csv'},
    content = function(file) {
      write.csv(data.frame(scale(v$data[,c("x","y")]), "color"=v$data$class),
                file,
                quote=FALSE,
                row.names=FALSE)}
  )

  output$data_plot <- renderPlot({
    # display the base plot
    plot <- ggplot() + xlim(0, 100) + ylim(0, 100) + xlab("x") + ylab("y")
    # if data is not empty, add it to plot
    # points outside of plot boundaries are added to the dataset but not displayed
    if (nrow(v$data) > 0) {
      plot <- plot + geom_point(aes(x=v$data$x, y=v$data$y),
                                size=4,
                                colour=v$data$class,
                                show.legend=FALSE)
    }
    return(plot)
  }, height=800)

}
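To see what one click produces and what the saved file contains, the two steps can be tried outside Shiny. This is a sketch in base R only: `add_group` mirrors the `addGroup` helper above, while the click coordinates, the seed, and the names `pts` and `scaled` are made up for the illustration.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Same idea as addGroup() in server.R: n Gaussian points around a center.
add_group <- function(data, n, center, sigma, class) {
  new_group <- data.frame(
    x = rnorm(n, mean = center$x, sd = sigma),
    y = rnorm(n, mean = center$y, sd = sigma),
    class = class
  )
  rbind(data, new_group)
}

# Simulate two clicks at made-up canvas coordinates.
pts <- data.frame()
pts <- add_group(pts, n = 3, center = list(x = 20, y = 30), sigma = 5, class = "firebrick1")
pts <- add_group(pts, n = 3, center = list(x = 70, y = 60), sigma = 5, class = "dodgerblue")

# Standardize x and y column-wise, as the download handler does with scale().
scaled <- data.frame(scale(pts[, c("x", "y")]), color = pts$class)

print(colMeans(scaled[, c("x", "y")]))      # both means are zero up to rounding error
print(apply(scaled[, c("x", "y")], 2, sd))  # both standard deviations are 1
```

Note that `scale()` standardizes each column over the whole dataset, not per class, so the relative positions of the groups are preserved.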

The block in which the dataset receives new points uses "isolate" to avoid an infinite loop.
Without it, the observer would both read and write the dataset: it would add the new points, RShiny would then detect that the dataset has changed and re-run the observer, which would add points again, trigger another change, and so on.
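An alternative worth knowing, sketched here rather than taken from the app: `observeEvent()` runs its handler inside `isolate()` automatically, so a click handler written this way depends only on `input$plot_click`. The name `click_server` is hypothetical, and the sketch leaves out the plot, the download handler, and the other buttons.

```r
library(shiny)

click_server <- function(input, output, session) {
  # Reactive dataset holding the points created so far.
  v <- reactiveValues(data = data.frame())

  # Fires only when input$plot_click changes. Reads of v$data and of the
  # other inputs inside the handler do not create reactive dependencies.
  observeEvent(input$plot_click, {
    new_group <- data.frame(
      x = rnorm(input$num_points, mean = input$plot_click$x, sd = input$sigma),
      y = rnorm(input$num_points, mean = input$plot_click$y, sd = input$sigma),
      class = input$class
    )
    v$data <- rbind(v$data, new_group)
  })
}
```

With this form, changing the number of points or the standard deviation in the parameters bar does not add points by itself; only a new click does.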

RShiny is a great tool for quickly developing apps to demo models or visualize data; this app takes only about 100 lines of code!
