Data Scientists – Hero or Hype?

Jim concludes his Analytics Mindset Series by elucidating one of the hottest career disciplines: Data Scientists – Hero or Hype?

Of all the enabling technologies built on the internet, few have been as historically important as the browser and the search technology layered on top. Since we first laid hands on Netscape in the mid-1990s, humankind has been captivated by the unlimited universe that the internet has revealed. “Googling” has indeed become a verb spoken by the common person in daily life, compressing a complex capability that requires unfathomable networks of computers and code into a single word. Browsing and searching are native to our relationship with technology, just as running water is to the feeling of home – in their absence, civilization itself exists in a diminished state.


This first era of the internet has been referred to as the Internet of Information (IoI), having enabled all variety of services with a primary mission around the sharing of ideas and associated commerce. Within this paradigm, the computer programmer is the oracle behind the curtain making the magic happen. Now, this information age is not done yet – many believe it is near full maturity – but a new dawn is upon us. This new era has crept up over the past decade and is now evident in all of the smart devices, robotics, and connected networks that power everything from Wall Street to Main Street. One might call this the Internet of Enablement (IoE) as it is transforming capabilities and human capacity in all aspects of daily life.

In this IoE, the data scientist is clearly a workhorse – and what code is to the programmer, data is to the data scientist. Most assuredly, given the Artificial Intelligence (AI) infused technology that exists in our homes, offices, markets, streets – everywhere – we have unprecedented amounts of data to contend with today. Data scientists are not flustered by this as they know the real truth – without gobs of data over time, AI and its offspring Machine Learning (ML) are not practical.

Demand for data scientists has never been greater, and the educational system has responded accordingly. Dozens of universities now offer Bachelor's and Master's programs that churn out these algorithmically minded citizens, most of whom are scooped up by hiring enterprises as soon as the cap and gown come off. With six-figure starting salaries and mid-career salaries approaching $300k, there seems to be unbounded love for these darlings of the IoE.

Just hire one (or three), point them to the data warehouse, and let ‘er rip – yes?

Caveat emptor. As we learned in the California Gold Rush of 1849, a pan and pick-axe does not a successful 49er make. Given the demand for data scientists, there seems to be an unnatural spike in that job title in the marketplace. True, data scientists have a strong foundation in mathematics – but a B.S./M.S. in Statistics is table stakes on this journey. The successful data scientist will have an algorithmic mindset evidenced by their ability to deploy mathematical frameworks against business-driven models. They must go “beyond the math” and be adept at using modern programming languages (R, Python, SAS) and visualization tools to deliver business-consumable outcomes. They are data whisperers, able to see patterns in the data to “find the story” and make order out of seeming chaos. Finally, the best data scientists can tell that story, getting beyond the jargon and connecting directly with business sponsors to profound effect.

Domo Arigato, Mr. Roboto? On the analytics maturity scale, the ability to prescribe actions based on model outputs is a goal that you would want to achieve in your analytics program. Data scientists have a role in achieving this goal as they have the skills to interpret model outputs within the context of a business objective. In consultation with business experts, they can deliver action-oriented results that drive transformation. Today's ML systems are quite sophisticated, some demonstrating that model output assessment and prescription of next steps are possible without human interaction. Certainly, we see this today in factories where sensors capture real-time data and take action to avoid unwanted outcomes. Or, in algorithm-driven trading systems that take input from relevant market ecosystems to decide what to buy or sell, when, and in what quantity. However, the ability to interpret across broad use cases and within nuanced circumstances (think healthcare) is still beyond most automated capabilities, thus human skill remains front and center in most implementations.


Beyond IT. The habitat for data scientists should not be restricted to technology teams. We should not mistake them for developers or architects, who are generally considered native to Information Technology departments. Rather, adopt the broader perspective of the citizen data scientist, whose habitat is within operational and corporate groups and who is charged with leveraging data and algorithmic approaches to solve business objectives. These citizen data scientists require most of the skills that make their IT counterparts effective but have the advantage of working within the business units themselves. Despite (potentially) lacking some educational aspects of the discipline, citizen data scientists are nonetheless an important way to accelerate your analytics maturity as an organization.

So, to answer the initial question – hero most certainly, with proper expectations and support. If you find yourself trying to sell the notion of a data scientist with something akin to “greatest thing since …”, reel yourself back in or risk crossing over to hype territory.

This blog was originally published on LinkedIn


Algorithms Hold the Key to Your Analytics Success

Here is Jim's fourth take in the Analytics Mindset Series.

If you've been following along my recent blog path, this topic should come as no surprise to you – skip to the next paragraph, oh faithful one. For those reading for the first time – welcome – allow me to quickly summarize those prior musings to make sure you are up to speed. In my prior blogs, my basic positions were that a) advanced analytics should be the eventual goal of your data project; b) the BI tool itself is not the most important thing to focus on; and c) you should take advantage of cloud analytics to achieve the greatest returns on your project.

Given those premises, and keeping in mind the ultimate goal of maturing your analytics capability, let’s chat about the what, why, and how of algorithms.

What? “A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.” A generic start, but at least something from which we can build understanding. From this description, one could see that programming code itself is algorithmic in nature, and that is quite true. Thus, at the most basic level, an algorithm in our context is most often expressed as code written to perform specific functions, executed within the context of your application to produce outputs that are useful in an analytical way. So, if an algorithm is “just” code, what makes it so important?


Why? As we know, computer code is an assemblage of logic statements, calculations, and actions that determine how the software reacts to various inputs (data from keyboards, clicks, or other systems). Algorithms are coding objects that have a specific purpose in a larger scheme; that is, they take specific inputs and manipulate them through a scientific or modeled approach to produce outputs that have value in a larger context. Analogy – let's take Nanna's Irish soda bread recipe. The individual ingredients (code) have no special value in and of themselves. Yet, when they are assembled in specific proportions and sequence, it is then that the magic is revealed. One could say that the recipe is an algorithm that unlocks the wonder of that loaf. Of course, the algorithms you deploy might not be as tasty – but certainly more germane to your business objectives, and they include additional elements that computers excel at.
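To make that idea concrete, here is a trivial, hypothetical example in Python – specific inputs, a defined sequence of steps, and an output that is more analytically useful than the raw ingredients (the function and the sample numbers are illustrative only):

```python
def moving_average(values, window=3):
    """A tiny algorithm: smooth a numeric series with a trailing average.

    Inputs: a list of numbers and a window size.
    Output: a list the same length, each entry averaging the last
    `window` values seen so far.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Smooth a noisy (made-up) weekly sales series
print(moving_average([10, 12, 8, 14, 11]))
```

The ingredients (a loop, a slice, a sum) are ordinary; assembled in this proportion and sequence, they produce something a business consumer can act on.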

How? When you get down to it, this is the crux of the matter. How does one create an algorithm? Where do the inspiration and raw materials come from? Once created, how do I use the algorithm within the context of my analytics framework? Let's examine these one at a time…

Where is your pain?  Let’s say you are working with the Sales team to improve their strike ($) and hit (win) rates. Currently, the Sales team accepts proposals from customers and treats them with equal importance. The VP of Sales is looking for a way to understand the nature of a customer proposal and derive a score such that it would help the team prioritize those opportunities that provide the greatest value. This is the making of an algorithm, wouldn’t you say? There are data elements to consider (customer history, market history, complexity, potential revenue, etc.) as you work with business experts, data scientists, and data engineers to test models that provide outputs that (hopefully) indicate the relative attractiveness of that proposal. The model could be used to inform where limited resources should be applied, among other things.

It's alive! To bring your scoring model to life, the data science team can no doubt use any number of tools available in the marketplace. Open source has provided abundant resources – like R, Python, and Spark – through which your algorithm can come to life. However, many enterprises prefer to scale their solution using licensed products that provide integrated capabilities across development, visualization, and governance requirements. Either way, this is the step where you will use data to train and improve the model over time as you drive to production readiness.

Hey, look at me! We have this fantastic algorithm that Sales loves, now we need to surface it for greatest impact and accessibility. Certainly, you can embed the scoring metric within the Sales pipeline dashboard along with other key data about pending deals. Better still, create a specific “top deal targets” page on your intranet or mobile app that sales teams (and C-suite) can easily see on a daily basis – a good way to drive interest to these algorithmic endeavors and, along with it, additional budget support. Another option – integrate the scoring metric directly with your Sales/CRM tool so that busy teams don’t have to go into another screen to see this data, it is right there in the tool they use on a daily basis.


Analytics frameworks that utilize dashboards and tabular reports (I’m looking at you Finance) have been standard fare for over a decade and are still useful in the modern approach. Through the adroit use of algorithms, however, you have the ability to unlock insights that form the basis of your enterprise advantage. Unleash that advantage and you just might become the analytics hero of your organization.

This blog was originally published on LinkedIn


Your Analytics Ought to be in the Cloud

Jim picks up yet another interesting topic for discussion in the Analytics Mindset Series (3/5).

Way back in 2010, I was quoted in a Computerworld article noting that BI could be the “…next killer SaaS application.” It was a somewhat controversial statement to make in those “early” days of cloud adoption; indeed, even I was not fully convinced that all of the necessary pieces would ever be there to make cloud analytics work properly. At the time, bandwidth, security, cost, processing capacity, and privacy were some of the concerns that came to the forefront for those who had reservations about analytics in the cloud.

In the subsequent years, cloud adoption has taken off in most corners of enterprise technology enablement. Consider: most (if not all) major software companies deliver their wares in cloud-first or cloud-only formats, with the lion's share of R&D dollars following that model. Customers are modernizing their back-office operations by adopting cloud tools at a record pace across all their key systems, such as ERP, Workforce Management, and CRM. In doing so, enterprises are expecting the promise of lower costs, unbounded capacity, and superior innovation to come together to deliver their corporate objectives. Certainly, when you examine the revenue trends of the leading cloud providers today, one could come to the conclusion that customers are seeing these benefits and are hungry for more.

Given what appears to be a mad sprint to the cloud, why has analytics been slow to join the dance? Certainly, it cannot be a lack of interest in the space, as BI adoption continues to rank high in CIOs' “critical application” surveys. Truly, you cannot throw a rock these days and not hit a news article where some C-suite leader is touting the benefits of “AI” and “Machine Learning” within their organization. Given all of this talk around leveraging data to engender digital transformation, you might conclude that the cloud should be the place to plant the “data flag” that powers your enterprise – right?

Yes! Let’s consider the case for analytics in the cloud with these thoughts in mind.


Already there. Without even realizing it, you are probably enjoying cloud analytics today. For example, are you using a leading cloud CRM that has a sophisticated analytics capability bolted onto it (named after some guy who discovered E = mc²)? Or, how about a leading BI dashboarding tool that has power written all over it? Maybe you are familiar with a website toolkit that provides deep analytics on your site traffic (it can make you googly-eyed)? Yup, analytics has been in the cloud for years, powered by the leading enterprise software companies and their platforms, and you were right there with them. Moreover, if you were to take inventory of where your enterprise data resides, I would be willing to wager that more than half is already in some cloud environment today. Thus, it would follow that integrating all of that data on a similar or adjacent cloud architecture could provide synergies.

Getting there. So, you want to develop a custom data warehouse solution in the cloud that serves up analytics to support your business objectives. As with all analytics projects, it starts with the data. Anyone who has delivered successful analytics projects of size knows that sourcing, assembling, and architecting the necessary data is (easily) the most important technical challenge to conquer.

The good news is that modern cloud environments from leading vendors have very sophisticated analytics platforms embedded within, providing an environment approaching smartphone app store convenience. Each has its specific strengths, certainly, but all can provide the basic toolkits to get data in, cleaned, and richly presented to your consumers. Some of these cloud platforms are cousins of their on-premise incarnations – if those exist – and might be slightly behind the capability curve, but vendors are keenly aware of that gap and are mindful to close it.

Power up. Thinking you will have to sacrifice performance in this “fluffy” new world? Think again. There are numerous database/data warehouse tools that have been developed in the cloud over the past five years, and they are native, angry, high-performing beasts. Seriously. I have spoken to customers and reviewed statistics, and these cloud-native platforms (most built from scratch) have shown they can outperform even purpose-built appliances. Better still, a leading cloud-native data warehouse (let's call them “flakers”) takes full advantage of the paradigm – independently scalable compute and storage, agile node management, and automated maintenance (DBAs not needed here). This is where I lament that I had to work hard “back in the day” to build stuff – these kids today…


From my perspective, the move to cloud analytics is a clear mandate. For those free of legacy analytics or starting anew, move with purpose and alacrity to the cloud. If you have “lift and shift” applications to deal with, find a partner that can help you properly scope and plan your migration. When done correctly, the results should be more “move and improve” than just a change of scenery and a fresh coat of paint. Cloud is the foundation of most enterprise digital transformations and analytics ought to be a VIP in that journey.


Choosing a BI Tool – Does It Matter?

Welcome to yet another blog post. We are adding an interesting topic, ‘Choosing a BI Tool – Does It Matter?’, to our Analytics Mindset Series.

In that discussion, I made this simple statement:  Don’t argue over BI.  That is, don’t make your analytics project about choosing “the best” Business Intelligence tool. Mostly, I argued, that is a personal preference statement, one that will not ultimately decide the fate of your project. Today’s user experience is mostly terrific in all of the leading tools, providing a rich visual experience that can produce a gazillion chart styles, dashboard templates, complex expressions, and guided data discovery pathways. Thus, trying to convince someone of the superiority of your favorite tool will probably be as fruitful as that argument supporting Android OS – move on mister, not giving up my Apple. The point – find something that fits your talent pool, budget, and licensing model, and get on with the work at hand.

Well, sort of.  You see, this point was made within the larger context of a specific subject, namely advanced analytics. When considered within that topic, amongst all of the decisions to be made that have a large influence on the success of those initiatives, the BI tool does not rank so high.  Indeed, when you consider the data, process, architecture, and algorithmic deployment challenges that demand attention within advanced analytics, this continues to be an entirely sensible standpoint to maintain.

Yeah, like I said – sort of. You see, there are a few considerations that – when taken within the context of your project – should result in some discussion about which BI is best for you. Here are some of those considerations.

Are you pre-packaged or custom?  Are your analytics needs mostly satisfied with pre-built constructs – users turning dials to adjust key dimensions (time, product, geography) – driving them down specific process steps? Or, are your needs more like an empty Bento Box that users want to fill with data from here, there, and everywhere, and then analyze together in a meaningful concoction that only they (and their department) can appreciate? Some might call this use case self-service or data discovery, and as expected most leading BI tools provide for this, but how they do it can differ greatly.

In fact, some BI tools take the modern approach to data discovery as the default journey/view within their user interface, making self-service analytics a core strength. Other tools prefer to maintain a balance between the traditional dashboard construct (pre-packaged model) and the ability to go off and perform self-directed analyses. Having an understanding of which use case holds more importance in your project will guide you to the best choice.

How big is your data? BI tools of yore would, for the most part, write and optimize the SQL needed to return the data requested. That SQL would then be executed against a relational database (most often) and return results – after some time. This approach still applies in some tools today and works well when you have data that spans multiple use cases, domains, or subject areas. The modern approach, however, is to bring the necessary modeled data into memory so that queries run entirely there, avoiding the performance penalty that databases and disk drives often suffer. The memory-optimized model is great as users do not wait long for most queries to return results.  Now, if you have terabytes of data to interrogate, shoving all of that into a single memory model might prove impractical.  Many tools can accommodate both models, but results will differ – so be sure to ask the vendor for real performance metrics.
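The two query models can be sketched in a few lines – here with Python's built-in sqlite3 standing in for the relational database that executes pushed-down SQL, and a plain dict standing in for the in-memory model. Both are toy stand-ins to show the shape of each approach, not a benchmark:

```python
import sqlite3

rows = [("East", 120), ("West", 80), ("East", 40), ("West", 60)]

# Model 1: the BI tool generates SQL and pushes it down to the database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount INT)")
db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
pushed = dict(db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Model 2: the modeled data is loaded once; queries run entirely in memory.
in_mem = {}
for region, amount in rows:
    in_mem[region] = in_mem.get(region, 0) + amount

assert pushed == in_mem  # same answer, very different performance profiles
print(in_mem)
```

With four rows there is no difference worth measuring; with terabytes, the question of where the aggregation runs (and whether it all fits in memory) is exactly what to press the vendor on.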

Stay still, you’re making me dizzy. Well, not literally, but maybe you need to have your analytics running in various form factors: web/desktop, web/mobile, mobile app, tablet.  Most leading vendors claim they support all of these, and that’s true.  But just like we know that store brand cola just won’t cut it in our favorite cocktail (wink), some tools translate better than others in this any device world. If tablet functionality is important, ask the vendor to demonstrate applications in both web and mobile formats so you can be the judge.

Please, sir, I'll have another. License, that is. Vendors deploy licensing models that vary across per-user, per-seat, per-node/core, subscription, and many more. Yes, it can get confusing and costly, especially when you consider named (better for the vendor) versus concurrent (better for you) licenses within these models. Get the real numbers around how big your analytics project might become so that you can determine the best approach for you.
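“Get the real numbers” can start as a back-of-envelope model. The prices and the 5:1 named-to-concurrent ratio below are purely illustrative assumptions – plug in your vendor's actual quote and your own usage data:

```python
def license_cost(total_users, named_price, concurrent_price,
                 concurrency_ratio=5):
    """Compare annual cost of named vs. concurrent licensing.

    `concurrency_ratio` is how many named users one concurrent seat
    can realistically serve (an illustrative assumption; measure your
    own peak usage to get a real figure).
    """
    named_total = total_users * named_price
    seats_needed = -(-total_users // concurrency_ratio)  # ceiling division
    concurrent_total = seats_needed * concurrent_price
    return {"named": named_total, "concurrent": concurrent_total}

# 500 analysts; hypothetical prices: $300/named user, $1,000/concurrent seat
print(license_cost(500, 300, 1000))
```

Run it across your realistic growth scenarios and the named-versus-concurrent question usually answers itself.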

So, I confess – there might be a reason to debate BI tools within the context of your analytics project. But please, don’t make this a drag-out knock-down deathmatch thing.  We have bigger fish to fry – such as, pondering whether all this can/should be done in the cloud? Hmm, an interesting topic that one…

This blog was originally published on LinkedIn


Advanced Analytics – I Can Do That, Right?

Let's set the stage for Mega Corporation (sounds important, very fictitious). It seems that Mega leadership has realized that existing dashboards, KPIs, and scorecards are useful but not quite making the grade. The C-suite is all jazzed about this concept of Digital Transformation and wants some of that for Mega. Putting aside what that term means, we now have the makings of a new project (we'll call it Titan) that needs definition, scope, talent, and resources. Of course, you are the person for the job as you have run numerous successful analytics projects in the past. Easy pickings, right?

First, you get the development A-team all set, ready to design, code, and deliver this critical project. Then, you gather stakeholder agreement on the problem statement and desired outcomes and agree on budget and resourcing. Finally, you document everything in the business case, functional specifications, design architecture, and countless other artifacts that your thorough Software Development Lifecycle calls out. You are ready to pull the trigger on another successful project.

Maybe. Modern analytics projects – often preceded by the adjective advanced – are not going to succeed using the same formula you've always used. Certainly, the probability of meeting desired outcomes might not be as high unless proper consideration is given to emergent capabilities. Initially, you might think that the relationship between analytics and advanced analytics is akin to the difference between algebra and honors algebra (sorry if this dredges up awful high school memories). You know, the same material just done faster with a higher degree of difficulty. Nope…think like that and you might be on a path to ruining your perfect project delivery record (side note – if you are perfect so far, you are either running your first project or have only worked in academia where “the college try” counts as a win…but I digress).

Let’s dive into why these projects require additional consideration.

Advanced analytics require high-calorie data diets. Data is the oxygen for any analytics project, we all know that. We also know that for years we’ve collected all sorts of data and analyzed much of it – what’s different now? A lot.

To satisfy the demands of advanced analytics, you need real-time data feeds – not (gasp!) daily. You also need data that often does not reside within the firewall of your enterprise. Think social data, web data, public data, syndicated data, and even sensor data. This assortment must be brought to bear in a sophisticated manner, or you risk diminishing the impact your project will achieve.

Not all data will require real-time processing, agreed. But in this new world of advanced, the data that lends that spice to your “secret sauce” will most assuredly not age well. Consider your data architecture carefully as it will demand modern approaches to ETL/ELT/acquisition. Indeed, focusing on your data needs and integration strategy will be time well spent.
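At the architecture level, moving from nightly batch to streaming often means moving the transform step into the flow of events. A toy, generator-based sketch – the event shape, readings, and alert threshold are all made up for illustration:

```python
def sensor_events():
    """Stand-in for a real-time feed (sensors, web, social)."""
    for reading in [71, 73, 98, 70, 101, 72]:
        yield {"temp_f": reading}

def transform(events, alert_above=95):
    """Enrich each event in-flight instead of in a nightly batch job."""
    for event in events:
        event["alert"] = event["temp_f"] > alert_above
        yield event

# The perishable "spice": alerts surfaced now, not in tomorrow's batch
alerts = [e for e in transform(sensor_events()) if e["alert"]]
print(alerts)
```

The same filter run against a daily batch would return the same rows – a day too late to act on them, which is the whole point of the "secret sauce will not age well" argument.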

Don’t argue over BI. Lord, the argument over which tool is best to present the data is way past its prime. Put a pin in it, we’re done with this. Truly, all of the modern leaders in data user interfaces (about a dozen) have pros and cons, depending on what you value most. Despite that fact, all of them can serve you well in delivering compelling visuals, statistics, and actionable information to your customers. None of them, by themselves, will derail the mission.

What happens frequently is the customer (department, division, etc.) has their favorite horse in this race and you (IT) have the “company standard”. If you are lucky, they are one and the same – great, move on. If not, then quickly come to the understanding that picking the “best” BI tool is like picking the best paint color for your car – it's personal, not performance, related. As long as you go with one of the market leaders, you will be fine (avocado is not a car color, sorry).

Algos rule the day. Dashboards are nice and scorecards are cool, but we are well beyond that when we talk advanced. Think scoring algorithms, predictive models, and automated learning. If you are really looking to transform Mega Corporation and impress customers, deploy these bad boys to the edge so that they have a real financial impact.

Think about it. Much of what delights (frightens) us about Google and Amazon is their ability to know what we mean or want at any given moment. They've long understood that having the best data (see point #1 above) and combining it with the smartest algorithms (models) can often lead to competitive advantage and customer wonderment. Imagine delivering algorithms that increase sales margins, improve employee retention, and discover new markets. Yeah, take a bow.

So, don’t burn your old analytics playbook – there’s a lot of good stuff there that still works today. Just modify the recipe a bit to ensure that you are bringing your advanced game.

This blog was originally published on LinkedIn
