IBM Datapalooza Takes Aim At Data Scientist Shortage

IBM announced in June that it has embarked on a quest to create a million new data scientists. It will be adding about 230 of them during its Datapalooza educational event this week in San Francisco, where prospective data scientists are building their first analytics apps.

Next year, it will take its show on the road to a dozen cities around the world, including Berlin, Prague, and Tokyo.

The prospects who signed up for the three-day Datapalooza convened Nov. 11 at Galvanize, the high-tech collaboration space in the South of Market neighborhood, to attend instructional sessions, listen to data startup entrepreneurs, and use workspaces with access to IBM’s newly launched Data Science Workbench and Bluemix cloud services. Bluemix gives them access to Spark, Hadoop, IBM Analytics, and IBM Streams.

Rob Thomas, vice president of product development, IBM Analytics, said the San Francisco event is a test drive for IBM’s 2016 Datapalooza events. “We’re trying to see what works and what doesn’t before going out on the road.”

Thomas said Datapalooza attendees were building out DNA analysis systems, public sentiment analysis systems, and other big data apps.

Read more at IBM Datapalooza Takes Aim At Data Scientist Shortage

Share your opinions in the comment box and subscribe to get more updates in your inbox.


How can Lean Six Sigma help Machine Learning?

Note that this article was submitted to and accepted by KDnuggets, the most popular blog site about machine learning and knowledge discovery.

I have been using Lean Six Sigma (LSS) to improve business processes for the past 10+ years and am very satisfied with its benefits. Recently, I’ve been working with a consulting firm and a software vendor to implement a machine learning (ML) model to predict the remaining useful life (RUL) of service parts. The result I find most frustrating is the low accuracy of the resulting model. If we measure the deviation as the absolute difference between the actual part life and the predicted one, the resulting model has average deviations of 127, 60, and 36 days for the three selected parts. I could not understand why the deviations are so large with machine learning.
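For reference, here is a minimal sketch of how that average deviation metric can be computed. The RUL values are hypothetical stand-ins (the article’s underlying data is not published), and NumPy is assumed as the tooling.

```python
import numpy as np

# Hypothetical actual vs. predicted remaining-useful-life values (in days) for one part type.
actual_days    = np.array([410, 385, 522, 298, 455])
predicted_days = np.array([300, 510, 460, 410, 395])

# Average deviation as described in the article: the mean absolute difference
# between the actual part life and the model's prediction.
avg_deviation = np.mean(np.abs(actual_days - predicted_days))
print(f"Average deviation: {avg_deviation:.0f} days")
```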

After working with the consultants and data scientists, it appears they can improve the deviation by only 10%. This puzzles me a lot. I thought machine learning was a great new tool for making forecasts simple and quick, but I did not expect it to produce such large deviations. To me, such a deviation, even after the 10% improvement, still renders the forecast useless to the business owners.

Read more at How can Lean Six Sigma help Machine Learning?

Leave your comments below and subscribe to get updates in your inbox.


Great Suppliers Make Great Supply Chains

As an analyst who covers supply chain management (SCM) and procurement practice across industries, I tend to keep my keyboard focused on the disruptive themes that continue to redefine them. That said, if you’re expecting me to go on about the unprecedented growth of the SCM solution markets, the accelerated pace of innovation, tech adoption, social change, etc., don’t hold your breath. I can’t, as the data argue otherwise. Too many of us conflate diversification with acceleration, and there’s a difference.

The most notable, defining advances of the last decade (Amazon, Twitter, Google, etc.) share something in common: they do not require consumer investment. If you take those monsters out of the equation and focus on corporate solution environments, the progress, while steady, has not been remarkable. Let’s just say there remains plenty of room for improvement, especially in supply chain and procurement practice areas.

I fell onto this tangent unexpectedly. It happened while interviewing Dan Georgescu of Ford Motor Company, an adjunct professor of operations and supply chain management and a highly regarded expert in automotive industry supplier development. “For supply chains to be successful, performance measurement must become a continuous improvement process integrated throughout,” he said. “For a number of reasons, including the fact that our industry is increasingly less vertically integrated, supplier development is absolutely core to OEM performance.”

Read more at Great Suppliers Make Great Supply Chains

If you have any comments about this topic, share them with us below. Subscribe to get updates in your inbox.


One-Page Data Warehouse Development Steps

A data warehouse is the foundation of business intelligence (BI). It not only stores your production data but also provides the basis for the business intelligence you need. Almost all of the books today describe very elaborate and detailed steps for developing a data warehouse, but none of them covers the steps in a single page. Here, based on my experience in data warehousing and BI, I summarize these steps on one page. They give you a clear road map and an easy plan to follow for developing your data warehouse.

Step 1. De-Normalization. Extract an area of your production data into a “staging” table containing all data you need for future reporting and analytics. This step includes the standard ETL (extraction, transformation, and loading) process.
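As a rough illustration of this step (not from the article), the sketch below de-normalizes a few hypothetical production tables (orders, customers, products) into a single staging table using pandas and SQLite; the table and column names are assumptions.

```python
import sqlite3  # stand-in for the production and warehouse databases

import pandas as pd

# Hypothetical production database with orders, customers, and products tables.
conn = sqlite3.connect("production.db")

# Extract and de-normalize: join everything future reports will need into one wide staging table.
staging = pd.read_sql_query(
    """
    SELECT o.order_id, o.order_date, o.quantity, o.unit_price,
           c.customer_name, c.region,
           p.product_name, p.category
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id
    """,
    conn,
)

# Transform: basic cleansing and derived columns before loading.
staging["order_date"] = pd.to_datetime(staging["order_date"])
staging["revenue"] = staging["quantity"] * staging["unit_price"]

# Load into the warehouse's staging area.
warehouse = sqlite3.connect("warehouse.db")
staging.to_sql("stg_sales", warehouse, if_exists="replace", index=False)
```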

Step 2. Normalization. Normalize the staging table into “dimension” and “fact” tables. The data in the staging table can be disposed of after this step. The resulting “dimension” and “fact” tables form the basis of the “star” schema in your data warehouse and support your basic reporting and analytics.
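Continuing the same hypothetical example, this sketch splits the staging table into dimension and fact tables with surrogate keys, the star-schema shape this step describes; again, the names are placeholders rather than anything prescribed by the article.

```python
import sqlite3

import pandas as pd

warehouse = sqlite3.connect("warehouse.db")
staging = pd.read_sql_query("SELECT * FROM stg_sales", warehouse)

# Dimension tables: one row per distinct customer and product, each with a surrogate key.
dim_customer = staging[["customer_name", "region"]].drop_duplicates().reset_index(drop=True)
dim_customer["customer_key"] = dim_customer.index + 1

dim_product = staging[["product_name", "category"]].drop_duplicates().reset_index(drop=True)
dim_product["product_key"] = dim_product.index + 1

# Fact table: measures plus foreign keys into the dimensions.
fact_sales = (
    staging
    .merge(dim_customer, on=["customer_name", "region"])
    .merge(dim_product, on=["product_name", "category"])
    [["order_id", "order_date", "customer_key", "product_key", "quantity", "revenue"]]
)

for name, table in [("dim_customer", dim_customer),
                    ("dim_product", dim_product),
                    ("fact_sales", fact_sales)]:
    table.to_sql(name, warehouse, if_exists="replace", index=False)
```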

Step 3. Aggregation. Aggregate the fact tables into advanced fact tables with statistics and summarized data for advanced reporting and analytics. The data in the basic fact tables can then be purged once they are older than a year.
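A final sketch of the aggregation step under the same assumptions: it rolls the detail fact table up into a monthly summary table and then purges detail rows older than a year.

```python
import sqlite3

import pandas as pd

warehouse = sqlite3.connect("warehouse.db")
fact_sales = pd.read_sql_query("SELECT * FROM fact_sales", warehouse,
                               parse_dates=["order_date"])

# Advanced fact table: monthly quantity and revenue per product.
monthly = (
    fact_sales
    .assign(month=fact_sales["order_date"].dt.to_period("M").astype(str))
    .groupby(["month", "product_key"], as_index=False)
    .agg(total_quantity=("quantity", "sum"),
         total_revenue=("revenue", "sum"))
)
monthly.to_sql("fact_sales_monthly", warehouse, if_exists="replace", index=False)

# Purge detail rows older than a year once they are summarized.
cutoff = pd.Timestamp.today() - pd.DateOffset(years=1)
recent = fact_sales[fact_sales["order_date"] >= cutoff]
recent.to_sql("fact_sales", warehouse, if_exists="replace", index=False)
```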

Read more at One-Page Data Warehouse Development Steps

What do you think about this topic? Share your opinions below and subscribe to get updates in your inbox.



The Bank of England has a chart that shows whether a robot will take your job

[Chart: robot jobs]

The threat is real, as this chart of the historical rise and fall of various jobs makes clear. Agricultural workers were largely replaced by machinery decades ago. Telephonists have only recently been replaced by software programmes. This looks like good news for accountants and hairdressers: their unique skills are either enhanced by software (accountants) or not affected by it at all (hairdressers).

The BBC website contains a handy algorithm for calculating the probability of your job being robotised. For an accountant, the probability of vocational extinction is a whopping 95%. For a hairdresser, it is 33%. On these numbers, the accountant’s sun has truly set, but the hairdresser’s relentless ascent is set to continue. For economists, like me, the magic number is 15%.

This is another data analysis of jobs that will be phased out over time, and an interesting look at historical job data. However, after glancing through the bank report referenced in the article, I am not sure robots are the reason for the job replacement. For example, some jobs could have been replaced by cheap labor in foreign countries. The bank report shows only the jobs likely to be phased out due to technological advancement. People could also simply have become more productive. So, do not take robots too seriously!

Read more at The Bank of England has a chart that shows whether a robot will take your job

What do you think about this article? Share your opinions in the comment box and subscribe to get updates.


Big data analytics technology: disruptive and important?

Of all the disruptive technologies we track, big data analytics is the biggest. It’s also among the haziest in terms of what it really means to the supply chain. In fact, its importance seems more to reflect the assumed convergence of two trends: massively increasing amounts of data and ever-faster analytical methods for crunching that data. In other words, the 81 percent of supply chain executives surveyed who say big data analytics is ‘disruptive and important’ are likely just assuming it’s big rather than knowing first-hand.

Does this mean we’re all being fooled? Not at all. In fact, the analogy of eating an elephant is probably fair since there are at least two things we can count on: we can’t swallow it all in one bite, and no matter where we start, we’ll be eating for a long time.

So, dig in!

Getting better at everything

Searching SCM World’s content library for ‘big data analytics’ turns up more than 1,200 citations. The first screen alone includes examples for spend analytics, customer service performance, manufacturing variability, logistics optimisation, consumer demand forecasting and supply chain risk management.

Read more at Big data analytics technology: disruptive and important?

Share your opinions regarding this topic in the comment box below and subscribe for more updates.


Data Lake vs Data Warehouse: Key Differences

Some of us have been hearing more about the data lake, especially during the last six months. There are those that tell us the data lake is just a reincarnation of the data warehouse—in the spirit of “been there, done that.” Others have focused on how much better this “shiny, new” data lake is, while others are standing on the shoreline screaming, “Don’t go in! It’s not a lake—it’s a swamp!”

All kidding aside, the commonality I see between the two is that they are both data storage repositories. That’s it. But I’m getting ahead of myself. Let’s first define data lake to make sure we’re all on the same page. James Dixon, the founder and CTO of Pentaho, has been credited with coming up with the term. This is how he describes a data lake:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

And earlier this year, my colleague, Anne Buff, and I participated in an online debate about the data lake. My rally cry was #GOdatalakeGO, while Anne insisted on #NOdatalakeNO. Here’s the definition we used during our debate:

“A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.”
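To make the “structure is not defined until the data is needed” idea concrete, here is a tiny schema-on-read sketch of my own (not from the debate): raw JSON events land in the lake untouched, and structure is applied only when a question is asked.

```python
import json

# Hypothetical "lake": raw events kept in their native form (JSON lines), no schema imposed up front.
raw_events = [
    '{"user": "a17", "action": "click", "ts": "2015-11-01T10:02:00"}',
    '{"user": "b42", "action": "purchase", "amount": 19.99, "ts": "2015-11-01T10:05:00"}',
]
with open("events.jsonl", "w") as lake:
    lake.write("\n".join(raw_events))

# Schema-on-read: structure and requirements are applied only when the data is needed.
with open("events.jsonl") as lake:
    purchases = [json.loads(line) for line in lake if '"purchase"' in line]

total_spent = sum(event.get("amount", 0.0) for event in purchases)
print(f"Purchases: {len(purchases)}, total: {total_spent:.2f}")
```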

Read more at Data Lake vs Data Warehouse: Key Differences

What do you think about this topic? Share your opinions below and subscribe to get updates in your inbox.



INFO GRAPHICS WITH EXCEL

I’m not always the biggest fan of info graphics. Many of the poster-sized info graphics released these days have issues. But lately I’ve received several requests about how to create info graphics with Excel, and many people don’t know where to start.

How Info Graphics are Different
Info graphics differ somewhat from your usual dashboard-style reporting. When we report with business tools, we use the data points (charts, tables, etc.) to investigate a problem or monitor a system. That is, we use data to find results. Info graphics are used when we already know the results and want to present them in an interesting, sometimes even artistic, way. Info graphics, then, are more about style and appearance; they wouldn’t necessarily find a good home on a dashboard. But they do work well in magazines, newspapers, and some student projects.

Info Graphics and Excel
Many info graphics are made with graphic editing programs like Adobe Illustrator. As far as I know, those illustrations are static, so a change in the underlying data won’t automatically be reflected in the graphic; you would have to redraw it. Excel provides a benefit here: if we use Excel’s charts to make our info graphics, we can update the underlying data and the result appears automatically.
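As a quick illustration of that data-driven benefit, the sketch below builds a chart whose source range lives in the workbook, so editing the cells updates the chart the next time the file is opened. It uses Python’s openpyxl library rather than working inside Excel itself, which is my assumption, not the article’s approach.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active

# Hypothetical data the graphic will be driven by; edit these cells and the chart follows.
rows = [("Region", "Sales"), ("North", 120), ("South", 90), ("East", 150), ("West", 70)]
for row in rows:
    ws.append(row)

chart = BarChart()
chart.title = "Sales by Region"

# The chart references the worksheet range, not a static picture of it.
data = Reference(ws, min_col=2, min_row=1, max_row=len(rows))
categories = Reference(ws, min_col=1, min_row=2, max_row=len(rows))
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)

ws.add_chart(chart, "D2")
wb.save("infographic.xlsx")
```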

Read more at INFO GRAPHICS WITH EXCEL

Write your opinions about this article below and subscribe to get the latest articles in your inbox.


Six signs that your Big Data expert, isn’t


This is the best article I have read so far about Big Data. It says what I have been advocating to people.

1. They talk about “bigness” and “data,” rather than “new questions”

… It seems most of the tech industry is completely drunk on “Big Data.”

… most companies are spending vast amounts of money on more hardware and software yet they are getting little, if any, positive business value.

… “Big Data” is a terrible name for the revolution going on all around us. It’s not about Bigness, and it’s not about the Data. Rather, it’s about “new questions,” being facilitated by ubiquitous access to massive amounts of data.

… If all you’re doing is asking the same old questions of bigger amounts of the same old data, you’re not doing “Big Data,” you’re doing “Big Business Intelligence,” which is itself becoming an oxymoron.

Continue reading


10 Web Scraping Tools

Web scraping tools are developed specifically for extracting information from websites. They are also known as web harvesting tools or web data extraction tools. These tools are useful for anyone trying to collect some form of data from the Internet. Web scraping is a data collection technique that doesn’t require repetitive typing or copy-pasting.

These tools look for new data manually or automatically, fetching new or updated data and storing it for easy access. For example, one may collect information about products and their prices from Amazon using a scraping tool. In this post, we’re listing the use cases of web scraping tools and the top 10 web scraping tools for collecting information with zero coding; a short code sketch of what these tools automate follows the tool list below.

Use Cases of Web Scraping Tools:

  1. Collect Data for Market Research
  2. Extract Contact Info
  3. Look for Jobs or Candidates
  4. Track Prices from Multiple Markets

Tools:

  1. Import.io
  2. Webhose.io
  3. CloudScrape
  4. Scrapinghub
  5. ParseHub
  6. VisualScraper
  7. Spinn3r
  8. 80legs
  9. Scraper
  10. OutWit Hub
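For readers curious about what these tools automate under the hood, here is a minimal hand-rolled scraping sketch using the requests and BeautifulSoup libraries. The URL and the CSS selectors are hypothetical, and any real site’s terms of service still apply.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical product-listing page; replace with a site you are permitted to scrape.
URL = "https://example.com/products"

response = requests.get(URL, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Assumed markup: each product sits in a <div class="product"> with name and price children.
for product in soup.select("div.product"):
    name = product.select_one(".name").get_text(strip=True)
    price = product.select_one(".price").get_text(strip=True)
    print(name, price)
```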

Read more at 10 Web Scraping Tools

What do you think about this topic? Leave your comment below and subscribe to get updates.
