Last year I made an analysis of four key Gartner Magic Quadrants (last year’s post, with the explanation of how I did it, can be found here). Two of them have been updated, and of course I have updated my analysis accordingly. See below, and note that Microsoft is again on top in the Operational Database Management Systems quadrant… 🙂
So here you are, creating your very own Power BI custom visual. You have read the documentation and run the tutorial (https://github.com/Microsoft/PowerBI-visuals/blob/master/Readme.md and https://github.com/Microsoft/PowerBI-visuals-sampleBarChart). You feel proud because you are done creating your awesome looking 3d-piechart-barchart-mashup visual. Then it happens. You run pbiviz start to view your visual and… BAM:
Ouch. Now, before you start banging your head against the wall until it hurts, here is the solution:
Most probably you have (as good practice dictates) changed the class name of your visual from the default ‘Visual’ to something more interesting, such as MyAwesome3DpieChartBarChartMashupTheDutchDataDudeIsSoGoingToHateThisVisual.
Well, you forgot to change the visualClassName specified in pbiviz.json, which the code needs in order to find the entry point for your awesome visual. So, quick fix: open pbiviz.json and change the visualClassName property to your class name (which is hopefully nothing like the one above). Save the file, re-run pbiviz start and done!
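If you want to catch this before running pbiviz start, a small check along these lines could help. It is only a sketch and rests on two assumptions that are mine, not from the documentation quoted above: that visualClassName sits under the top-level "visual" object in pbiviz.json, and that the visual is declared with `export class` in your TypeScript source. The function name is hypothetical.

```python
import json
import re
from typing import Optional


def find_class_mismatch(pbiviz_text: str, visual_source: str) -> Optional[str]:
    """Return a warning if visualClassName matches no exported class, else None."""
    config = json.loads(pbiviz_text)
    # Assumption: the property sits under the top-level "visual" object.
    configured = config["visual"]["visualClassName"]
    # Assumption: the visual is declared as `export class <Name> ...`.
    exported = re.findall(r"export\s+class\s+(\w+)", visual_source)
    if configured in exported:
        return None
    names = ", ".join(exported) if exported else "no classes"
    return f"visualClassName is '{configured}', but the source exports: {names}"
```

To use it on a real project, pass in the text of pbiviz.json and the text of the file that declares your visual (often something like src/visual.ts, but check your own project layout).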
(I know this is a very newbie / getting started type of error, but it took me more than 5 minutes of searching when I first encountered it. I figured it is worthwhile to save everyone’s time and to log it for my own future reference ;))
Esri is a leader in the GIS industry and ArcGIS is a very popular product to build great maps. Now, you can use ArcGIS maps in Power BI (in preview). See the official information here: https://powerbi.microsoft.com/en-us/blog/announcing-arcgis-maps-for-power-bi-by-esri-preview/. This is really cool; I know a lot of you have been asking for this for a long time!
You will find the option to enable this preview in PowerBI.com, not in Power BI Desktop. Log in to Power BI and open the settings. You can find the ArcGIS preview there and enable it by simply selecting the checkbox:
With that enabled, create a report with some geographical information (or edit an existing one). I used the Google Analytics data that keeps track of my blog. Google Analytics data can be loaded into Power BI simply by using the content pack. In edit mode in the report you will find the ArcGIS component in the Visualizations list:
Click it and create your map as you would with the normal map. I noticed it needs some time to build the map (probably due to the preview) but once it is done it is fully interactive with the other items on your report as you would expect:
You can change a lot of the ArcGIS options, such as switching out maps, changing symbol styles, adding reference layers, etc.
I love this – the awesome power of ArcGIS and Power BI combined! I cannot wait to see what you will create with this.
When going through data preparation sometimes sequence numbers need to be added. If you are like me, you probably spent some time looking for a component in Azure ML to do this. I never found it.
Turns out it is really easy to do this in R and as a result also very easy to do in Azure ML.
In your experiment, add an Execute R Script component and connect it to the data flow.
Edit the script and add a column to the dataset that holds a sequence number running from 1 to the number of rows.
See my code example:
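# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$time <- seq_len(nrow(dataset1)) # sequence number 1..n in column 'time'; exact original expression assumed
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1");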
On the third line the column is added and defined as a sequence number. The resulting dataset indeed has an extra column (called time) that looks like this:
The small histogram at the top and the details on the right confirm it has only unique values and starts at 1; our sequence column has been added!
Another great resource for learning about Power BI is the course on edX: Analyzing and Visualizing Data with Power BI. Granted, it has been around for a while, but I forgot to blog about it; maybe it is a bit easier to find now.
Enjoy the new skills you will learn with this!
A recurring theme with customers and partners is Power BI data refresh. More specifically, there is some confusion about which refresh scenarios require a Pro version and which can be done with the Free version of Power BI. I made the diagram below to help explain this. It shows the refresh scenarios against data storage solutions, such as SQL Azure, SQL in a virtual machine in Azure or SQL Server on premises. I used these as examples; there are other options as well. I think the overall idea carries over to other data storage solutions.

The diagram shows the refreshes that can be done using a Power BI Free account as orange lines and the refresh scenarios that need Power BI Pro as green lines. As shown in the diagram, if you want to refresh against on-premises sources or a database running in a VM in Azure, you will need a gateway and Power BI Pro. This applies not only to the creator of the report and schedule but also to every consumer.

If you use PaaS solutions for data storage in Azure, such as SQL Azure, it becomes a bit more complicated, and it really depends on the type of refresh required. If you need to refresh more often than once a day (either up to 8 times per 24 hours or live), you will need Power BI Pro. If you just want to refresh against a PaaS source such as SQL Azure and once a day is enough, you can do that using Power BI Free. Again, the license requirement carries over from author to viewer; if the author of the report requires Pro, then the viewers also need Pro.
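For those who prefer code over diagrams, the rules above can be sketched as a small decision function. This is only my reading of the diagram, not an official licensing check. The source labels and the function name are hypothetical, and applying the cap of 8 scheduled refreshes to every source type is an assumption on my part.

```python
from typing import Tuple

# Hypothetical labels for the three example storage options in the post.
GATEWAY_SOURCES = {"on_premises", "azure_vm"}
PAAS_SOURCES = {"azure_paas"}  # for example SQL Azure


def required_license(
    source: str, refreshes_per_day: int = 1, live: bool = False
) -> Tuple[str, bool]:
    """Return (license, needs_gateway) following the rules described above."""
    # Assumption: scheduled refresh is capped at 8 per 24 hours for all sources.
    if not 1 <= refreshes_per_day <= 8:
        raise ValueError("scheduled refreshes are assumed to be 1 to 8 per 24 hours")
    if source in GATEWAY_SOURCES:
        # On premises or a VM in Azure: gateway plus Pro.
        return ("Pro", True)
    if source in PAAS_SOURCES:
        # PaaS: Pro only when refreshing more than once a day or live.
        if live or refreshes_per_day > 1:
            return ("Pro", False)
        return ("Free", False)
    raise ValueError(f"unknown source type: {source}")
```

Whatever license the function returns for the author applies to the viewers of that report as well.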
Hope this helps. If you have any questions or feedback, please comment below!
All this talk about Big Data and Advanced Analytics is well and good; in fact, it is what I spend most of my time on. The technology is there and has great potential. The biggest question now is how to use these technologies to their full extent and maximize their benefits for your organization. The answer lies in becoming a data driven organization.
A data driven organization is an organization that breathes data, not only in the sense of producing data, but also in the sense of analyzing, consuming and really understanding data, both its own and the data others can provide. To become a data driven organization, you will need to change People, Process and Technology. There is enough talk about the Technology in the market already (and on this blog), so I will come back to that later and not go into much detail now. Let’s look at the other two: People and Process. I view Process as very much related to People: bringing in new skills without the proper Process in place for how to work with them, and for the new People to work together, will not be very useful.
So, what People do you need? In other words: what roles do you need in a data driven organization? I see four required roles in any organization that wants to be more data-driven. This is not to say that these four roles should be four different people; it is very well possible that someone might take on more than one role. I am however confident that very few people will be able to do all four roles, since each requires specific skills, focus and passion.
The four roles are: Wrangler, Scientist, Artist and Communicator. Let’s look at the four roles in more detail.
The Wrangler, or data wrangler as others call this role, is responsible for identifying, qualifying and providing access to data sets. In this sense the data is the wild horse that the wrangler tames. The Scientist role needs this role in order to work with qualified, trustworthy and managed data sources. In many situations, this looks a lot like the current data management roles already present in organizations. This role lives mostly in IT. Keywords here are databases, connection strings, Hadoop, protocols, file formats, data quality, master data management, data classification.
The Scientist is more popularly called the Data Scientist. A lot of people seem to believe that as long as you hire a Data Scientist you are a data driven company. This is much the same as saying that if you have Hadoop you ‘do Big Data’. This is about as smart as saying that if you got your driver’s license you make an excellent Formula 1 driver. It is just not true, sorry. Note also that the opposite applies; if you are a great Formula 1 driver you could be a very bad driver on open roads. Running Hadoop does not mean you use Big Data. Hiring a Data Scientist does not mean you are a data driven company.
A Scientist is someone who applies maths, a lot of maths, to convert data into information. He or she applies statistical models and things like deep learning, data mining and machine learning to make this happen. Scientists are the rock stars of this data-focused world since they are the ones actually making the magic happen. However, they cannot do it alone. They need good quality and trustworthy data, which is what the Wrangler supplies. Also, these Scientists happen to be poorly understood by the rest of the organization. Try this experiment: have your (Data) Scientist stick around the water cooler for 15 minutes every day and let him / her talk to people (I know, for some this is hard already). Then, check how quickly the person the Scientist is talking to disconnects. My experience is that someone who is not a fellow Scientist or Communicator will not make it for 15 minutes. Just try it, you will see what I mean.
Keywords here are data mining, R, Python, machine learning, statistics, algorithms.
The Artist role converts the information the Scientist brewed up into insight that the consumers can understand and use. This role focuses on aesthetics and on the data visualization that best brings the message across. While the Wrangler is a very IT focused role and the Scientist is very mathematical, the Artist often comes from the creative arts world. The Artist just loves making things understandable and loves making the world a better place by creating beautiful things, such as great looking reports and dashboards. They often employ storytelling and other powerful visual methods such as infographics to convey their message to the consumers.
Keywords: data visualization, dashboard design, signaling colors, storytelling.
The last role in data driven organizations is a chameleon. If you look at the types of people in the Wrangler, Scientist and Artist roles, it is clear that these are very different people, with different backgrounds and different passions. Just as some of them find it hard to talk to the rest of the organization, they can find it difficult to talk among themselves and work together. In order to make sure there is no communications breakdown, many organizations invest in a Communicator: someone who has enough understanding of the passion of the people in the other roles to be able to level with them, understand their needs and explain the needs of others to them. Subtypes of the Communicator are the Wrangler-Scientist communicator and the Scientist-Artist communicator.
This concludes the roles I see in a data driven organization; of course these roles will need to be supported with the right Processes and Technology. Having a Technology platform instead of disparate tools will help you to achieve this and make the best of the investments you are making in these roles.
Many customers have asked me questions about Azure Machine Learning (Microsoft’s fully managed machine learning and data mining solution) and more specifically about its pricing. In this post I will try to explain how the pricing works and what components you need to be aware of.
Azure Machine Learning is offered in two tiers: Free and Standard. The Free tier is obviously, well, free. As you would expect, however, it is limited compared to Standard. The differences are mostly in performance (multiple nodes for execution in Standard vs. just one node in Free) and storage (10 GB in Free, unlimited in Standard). There is no SLA for the Free tier, you cannot set up a production web API to automate experiments in Free, and the staging web API is throttled.
For the standard tier, the following items need to be taken into consideration:
- Seat; Azure ML has a monthly fee per seat, which translates to a user (mostly your data scientist) using the Azure ML web interface to develop and tune experiments. This price is per month per subscription/seat.
- Studio usage; This is an hourly price for usage of the Azure ML web interface; so you will need to estimate the number of hours the users will be logged in. If they work 40 hours per week on the project then you would need to take into consideration 40 hours of studio usage time per week per data scientist.
- API Usage; Azure ML allows you to bring an experiment online through the use of RESTful web services. This means you can automate scoring and training and have applications, websites, etc. use the experiment without human intervention. With this you could do automated credit scoring, recommendation or churn prediction directly from your app or website. In order to make this work you will need to create a web service in Azure ML (also called an API). Azure ML charges per hour for compute used in an API that is in production, so that is the fee you will need to pay per hour the web service / API is ‘online’ and usable. Also, you will need to pay per 1000 transactions. Transactions in this case are interactions with the API, such as one recommendation, one churn prediction or one credit score.
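To get a feel for how these three components add up, here is a rough estimation sketch. It contains no real prices: every rate is a parameter you fill in yourself from the official pricing page. The function name, the parameter names and the choice to add everything up per month are my own assumptions.

```python
def estimate_monthly_cost(
    seats: int,
    studio_hours: float,
    api_compute_hours: float,
    api_transactions: int,
    *,
    seat_fee: float,
    studio_rate: float,
    api_compute_rate: float,
    rate_per_1000_transactions: float,
) -> float:
    """Add up the Standard tier components for one month (all rates supplied by the caller)."""
    seat_cost = seats * seat_fee                       # monthly fee per seat
    studio_cost = studio_hours * studio_rate           # hourly studio usage
    compute_cost = api_compute_hours * api_compute_rate  # hours the production API is online
    transaction_cost = api_transactions / 1000 * rate_per_1000_transactions
    return seat_cost + studio_cost + compute_cost + transaction_cost


# Placeholder rates only, not real prices: two data scientists working
# 40 hours per week for 4 weeks, plus one production web service.
example = estimate_monthly_cost(
    seats=2,
    studio_hours=2 * 40 * 4,
    api_compute_hours=100,
    api_transactions=50_000,
    seat_fee=10.0,
    studio_rate=1.0,
    api_compute_rate=2.0,
    rate_per_1000_transactions=0.5,
)
```

Keeping the rates as keyword-only parameters makes it harder to mix them up and easy to rerun the estimate when the published prices change.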
Hope this clarifies a bit. Please refer to the official page linked above for more details and for the pricing details.
If someone had told me a few years ago that Microsoft would go multi-platform and that SQL Server would also be released on Linux, I would have called them crazy. However, since the arrival of Satya Nadella as CEO and people like Scott Guthrie (CVP for Cloud & Enterprise), a fresh wind has been blowing through Microsoft!
As a Microsoft employee I of course sensed that something was coming, but it is still always great to see the news land in the outside world: yesterday Microsoft announced the Linux version of SQL Server in preview. For now this covers only the RDBMS and not the other SQL tools such as SSRS and SSAS, but it is a nice start towards being able to run the leading database (according to Gartner) on Linux as well!
I would love it if, after testing this version, you would comment on the blog with your first impressions of the performance of this preview version.
Scott’s blog post about the announcement: http://blogs.microsoft.com/blog/2016/03/07/announcing-sql-server-on-linux/
Inception style insight: analyzing Gartner Magic Quadrants for BI, Analytics and Databases using Power BI
Disclaimer: Yes, I know the Gartner Magic Quadrants normally should be bought. The information in this post, however, is based on web searches much like this one: http://www.bing.com/images/search?q=gartner+magic+quadrant+bi&FORM=AWIR which happens to return the Gartner Magic Quadrants…
A while ago I used Power BI to analyze the BI market. Back then I did that in Excel. Given the recent movement in the Magic Quadrants for databases, analytics and BI I figured I needed to do a re-run. This also enables me to show off the new Publish to Web feature of Power BI. You can see the result below; I found images of the following Gartner Magic Quadrants:
- Advanced Analytics
- Business Intelligence
- Data Warehouse Database Management Systems for Analytics
- Operational Database Management Systems
Then I followed the same process as in the previous blog post on this subject (that means a lot of manual labor :)) and made a Power BI report on top of it. I also added the logos of the companies involved. The report has two tabs: the first focuses on one Magic Quadrant at a time, while the second allows you to see the position of a certain vendor in all four MQs I analyzed. Both tabs allow you to look back in time and see how the positions developed over time.
Enjoy this bit of Inception-style Insight :). If you have ideas on how this could be improved, let me know! Have a look below or open the report full screen.