Adding sequence numbers using R in Azure ML

When preparing data, you sometimes need to add sequence numbers. If you are like me, you probably spent some time looking for a component in Azure ML to do this. I never found one.

It turns out this is really easy to do in R and, as a result, also very easy to do in Azure ML.

In your experiment, add an Execute R Script component and connect it to the data flow.

Edit the script and add a column to the dataset that equals:

seq.int(nrow(dataset1))
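
To see what that expression produces outside Azure ML, here is a small sketch in plain R. The data frame df and its value column are made up for illustration; they stand in for whatever arrives on the input port. It also shows one edge case worth knowing about: for a dataset with zero rows, seq.int() counts down from 1 to 0, while base R's seq_len() returns an empty vector.

```r
# Hypothetical stand-in for the dataset arriving on the input port
df <- data.frame(value = c(10.5, 7.2, 3.9))

# With a single positive number n, seq.int(n) counts 1, 2, ..., n
df$time <- seq.int(nrow(df))
print(df)

# Edge case: an empty dataset. seq.int(0) is treated as 1:0 and
# yields two values (1 and 0); seq_len(0) yields none.
empty <- df[0, , drop = FALSE]
n_seq_int <- length(seq.int(nrow(empty)))
n_seq_len <- length(seq_len(nrow(empty)))
```

If an empty input is possible in your experiment, seq_len(nrow(dataset1)) is the safer choice; for any dataset with at least one row the two give the same result.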

See my code example:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$time <- seq.int(nrow(dataset1)) # add a sequence number column (1..n)
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1")

On the third line the column is added and defined as a sequence number. The resulting dataset indeed has an extra column (called time) that looks like this:

The small histogram at the top and the details on the right confirm that it has only unique values and starts at 1; our sequence column has been added!
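
If you would rather check this in code than by eye, the same facts can be asserted in R. This is only a sketch: the data frame below is a made-up stand-in for the script's output, and whether you want such checks inside the Execute R Script component itself is a judgment call, since a failing stopifnot() raises an error.

```r
# Hypothetical stand-in for the data frame produced by the script
dataset1 <- data.frame(value = c(10.5, 7.2, 3.9, 8.1))
dataset1$time <- seq.int(nrow(dataset1))

# Check the properties the visualization shows
has_duplicates <- anyDuplicated(dataset1$time) != 0  # FALSE when all values are unique
first_value <- min(dataset1$time)                    # a sequence number should start at 1
last_value <- max(dataset1$time)                     # and end at the row count

stopifnot(!has_duplicates, first_value == 1, last_value == nrow(dataset1))
```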