Avoiding Data White Elephants

As an IT professional working in data management (dare I say with Big Data), you currently face some particularly challenging & critical technology choices that can have a profound impact on your business operations, so you need to avoid any more white elephants. According to Wikipedia, “A white elephant is a possession which its owner cannot dispose of and whose cost, particularly that of maintenance, is out of proportion to its usefulness.” Does that ring any bells with your incumbent technology stack? Then read on.

It was the CMO of my respected competitor Talend who prompted me to write this post when he claimed: “Talend is the only one-stop shop for a modern, agile, integration platform (with all the bell & whistles) at a great price.” (LinkedIn, 13/8/15). Whilst understandably narcissistic, that claim is clearly disingenuous.

A more honest approach will deliver a long & rewarding career, so allow me to set the record straight in a broader market context. As you know, every technology choice available to you has dependencies, which fall into two categories, ‘outbound’ & ‘inbound’, with a question associated with each: “Can we get rid of this stuff without taking our business down with it?” & “What’s the catch if we get this new stuff in?” With regard to the latter, there is a bunch of really good new stuff out there, but it comes with a heap of prerequisites plus downstream costs which you’re not told about before you sign up for it!

For example, why change your data store to get high-performance analytic capability? Any BA or IA will know you can use what you’ve got if you build the right architecture & data platform around it. Feel sorry for those who live in hope that Hadoop or SAP HANA will cure their problems; the recent 30-day trial offer is like giving Crossrail a free JCB for a month to get the Underground kick-started!? Don’t you just have to love your SI for that very reason? Yeah, right: they love you for the ratio of services to software they make out of your struggles. You really can turn that ‘norm’ on its head with a modern data management platform, but be diligent with your choice.

There is no one stop for all your needs. What’s really needed here is an option that has zero dependency on additional components; is non-intrusive from both technology & business process aspects; covers not just Data Integration but also Data Quality and Master & Reference Data Management; runs anywhere (on-premise/cloud) with no mods; can be implemented with incremental results in an Agile, sprint-like way; and has an SME price tag with no multi-year TCO sting in the tail like the aforementioned.

OK Sherlock, where can this be found? Request a Proof of Value (PoV) for the Semarchy “Evolutionary Data Convergence Platform” & you’ll get palpable results in days. We use our own technology to run Semarchy the company; ask our CEO & he’ll be happy to demo how Semarchy drink their own champagne. This is no gimmick: we’re refreshingly honest brokers, and you can ask our customers…

BI & Analytics is like decorating…

Imagine you decide to renovate your house…

You (or your spouse) believe your house just needs a fresh coat of paint so you spend a lot of time choosing the colour, quality & price of your paint & finally decide on a practical compromise – then you crack on with the job, only to find that the woodwork is rotten, the roof & gutters are leaking and actually you really need some new doors & windows!? You dig deep into your pockets, swallow your pride & pay someone else (contractor) to do the now significant job – sound familiar?

Well, at work you probably run the same gauntlet in slightly different livery:

Imagine you (or your boss) have read the latest Gartner (or whatever) report saying that the top investment priority in business today is, rightly, BI & Analytics. So you decide to get in the latest & greatest reporting suite known to the software world, after a lengthy & costly evaluation of all the weird & wonderful products out there with names that stretch your imagination. You finally get it installed & working after negotiating a rocket-science payment contract, because you (or your boss) chose to run both on-premise & in the cloud and corporate paranoia (legal) policy dictates that only the most complicated terms will suffice. Only then do you find that the data you want to present in the slick new dashboard is incomplete, out of date or not available in the right format, i.e. your data plumbing is broken or non-existent.

Oh well, you then get authorisation to go out to tender for a complete new data warehouse with associated integrated (aka proprietary) slicing & dicing accessories. However, this has to be done by your incumbent SI, as it’s too big for you to handle & they have a stranglehold on your senior management anyway, who are paid too much to think for themselves; plus the SI is in bed with a software mega-vendor. Eventually you get the whole thing completed, just in time for a management reshuffle. The new boss, coming in from a competitor, realises that the business model needs to be adapted to take advantage of multi-channel, e-commerce, digital media or whatever trend, & you’re the one who has to give the bad news that your systems are so inflexible or costly to maintain that you need a new one. Whereupon you’re told to just turn this ‘sow’s ear into a silk purse’ with some fancy selective reporting techniques, & so the whole cycle begins again, like painting the Forth Bridge!?

This is not such a far-fetched story, as I’ve lived through similar project scenarios & I’m sure you have too. Break the cycle of deprivation: get a decent software platform like Semarchy Convergence Suite & you’ll be amazed how much longer your career will last…

Peter

Who Owns Enterprise Analytics & Data?

Following an Informatica blog by Myles Suer relating to a CFO Magazine article by Frank Friedman, here’s my sequel:

Frank Friedman clearly delivers good reasons for nominating the CFO as owner of this critical issue, which is made highly visible by its very nature.
His rationale is based on classic experience & a deterministic approach typical of an accountant (even if he’s not one), so it stops short of the limitations that every company will face once the basic steps of BI Reporting & Analytics have inevitably been taken.
After all, who can’t read a chart or use a spreadsheet? Everyone knows the world runs on Excel charts, so these skills are a prerequisite to being in business, not just the preserve of CFOs.
The point is, it’s really easy to do the bean-counting, which usually results in “paralysis by analysis” & does not address the fundamental next step, which is to do something about it to improve the business. This requires an effective feedback/correction mechanism that will be non-trivial & require a deep understanding of the entire business value chain. (Every engineer knows that a measurement dashboard is easy to put in place; the hard bit is establishing a closed-loop feedback control system to keep the system stable & improve performance.)
Any company will be extremely lucky if there is such a business-savvy individual in the organisation with the knowledge & authority to fix things, & even luckier if the multiple individuals needed to cover the various processes can communicate & work well together.
Not wishing to oversimplify the issue, those organisations that focus on “generation” rather than just “inspection or analysis” of their business are the ones that proliferate & easily outperform their competition, by providing a creative environment in which to leverage their most valuable assets: their skilled & talented people with specific end-to-end experience of the business they are in…
My vote goes to those who are not only willing to challenge the norm but also provide solutions using modern software tools & stand up to be counted in the CFO’s spreadsheet!
Peter
Business Engineer

If Data Projects Weather why not Corporate Revenue?

Further to an Informatica blog by Stephan Zoder (http://blogs.informatica.com/perspectives/2014/10/29/if-data-projects-weather-why-not-corporate-revenue/#fbid=syxgmrw49EV) drawing parallels between weather forecasting & business forecasting by corporates, my sequel suggests it’s not just about the data:

Weather forecasters crunch data in a model that is constantly refined, not only in terms of the quality, volume & accuracy of the data used but, more importantly, through significant efforts to refine & improve the granularity of the model itself.
That can’t be said of (m)any corporates trying to forecast revenue. Excel has its limitations!

Predicting weather is based largely on measurable scientific & logical parameters. The same can’t be said about forecasting revenue, where the most influential ‘weighting factors’ are biased much more towards the emotional, such as “loyalty”, “preferred supplier” & “budget allocation”. These depend on people with allegiances, egos & political constraints that do not fit into any spreadsheet. Remember too that this equation has two sides, internal & external, so it’s doubly volatile & exponentially unstable. As a result there is usually no connection between the folks who do the revenue generation & those who count the results, and this mismatch correlates directly to a flawed model. It’s a very rare animal that can combine any permutation of hard skills (technology, accounting, product development) with soft skills (HR, sales, management), so no doubt this syndrome is only set to proliferate.

There are clearly some vague analogies with the weather, but the comparison ends at a point that companies should easily recognise yet fail to act on. Each time they get their forecast wrong they end up either firing a whole lot of valuable FTEs (are they considered as real people?) or panic-hiring a bunch of rookies ready to ride the next chunder from development (aka sow’s ear turned into silk purse for FYXX kick-off) or the next wave of IT rhetoric around Data (white) Elephants or whatever. They do not analyse where mistakes were made, which assumptions need to be narrowed & which critical components need to be added to the model.
If they did, they would find that most such errors emanate from the soft squishy bits loosely called management, who dictate the flawed guidelines & inadequate spreadsheets that occupy their introverted daily lives, with a focus on margin & utilisation rather than more relevant “feelgood” or “creativity” factors. If they would just focus on ‘generation’ rather than ‘inspection’, the results would take care of themselves. Then who would care about the forecast, other than that it’s going up!

Given good leadership that is firmly in touch with what the company is all about at its customer-facing sharp end, most of these ailments would give way to innovation, expansion & a healthy self-generating work ethos.
In summary, I’ll wager that every listed company will “forecast less business more accurately” towards the end of each & every quarter until they go broke profitably… It doesn’t take a rocket scientist to work out what they should have done instead, given the right tools.

Peter
A not at all cynical businessman having worked for the likes of Steve Jobs & witnessed what it takes to do good things

Semarchy comes of age

As a newcomer to the UK market, Evolutionary MDM vendor Semarchy has earned the right to sit at the top table with the other so-called ‘mega-vendors’ of Data Integration, Data Quality & Master Data Management software such as IBM, Oracle & Informatica.

This is not a self-opinionated claim but rather the result of customer-driven selection criteria in response to RFIs, competitive bids & general success in this market.

To understand their claims in more detail, please refer to this enlightening review of their software platform by a leading independent consulting firm that operates exclusively in the MDM space:

————————————————————————————————————————————————————

This report is a summary of the considered professional observations for the Semarchy “Convergence for MDM” software product suite, made by the top UK/European independent specialist MDM consulting firm who also represent & use other market leading vendor solutions in this specific sector:

Conclusions

We liked the product. It seems modern, well thought-out, and without legacy baggage.
It’s not easy to compare directly with the Gartner leading-quadrant vendors’ solutions, but the others look clunky by comparison.
As a browser-delivered product rather than a physical install on each user’s machine, it does have appeal, and it seems to have all the right elements in all the right places (in terms of a sound architecture, deployment story, developer story, customization and so on).

What we did come away with, is that it would be an ideal product for Local Gov’t, CLA, Health & SME types rather than the corporates of this world – and that’s the market they’re focused on at present.

We were also impressed that they’ve released to the Amazon (web service) marketplace – that effectively gives anyone the ability to start up a machine with it running almost at the press of a button – meaning that it’s ideal for people to “try out” without needing any heavy infrastructure or support.

Supporting detail

Repository style – imports data into the hub for matching, viewing, editing etc.
  Allows payload data as well
Delivered as a web application
  Runs on Tomcat with Oracle back end. Quick setup.
  Design part using Eclipse RAP – so has the look and feel of “Eclipse in the Cloud”
  End User part has a modern “Windows 8” styling.
Basic workflow is:
  Design your data model (entities)
  Design Matchers
  Design Enrichers
  Design Validators
  Design Business Objects (hierarchies of entities, e.g. Account Manager with Customers)
  Design View / Edit forms
  Design Workflows
  Deploy Model and Application (versioned)
  Version / Snapshot Data as required.
  Repeat
Data Modelling
  GUI designer built into the tool
  Simple, complex and composite attributes
  Hierarchies of entities
    Has-a
    Is-a
    Parent/child
  Relationships between entities
    Yes! Has-a, Is-a.
Publishing events
  There is a Data Integration layer (separate product) which sits over the MDM layer, and can track batches submitted, poll for changes and such.
    http://www.semarchy.com/data-integration/
  The DI layer orchestrates things like push events
    Java API
    Web Service API (SOAP)
    Not MQ – need to write that using the Java API
Data Quality reports
  Yes! Using the additional PULSE tool
    http://www.semarchy.com/data-governance/
  Pulse provides “Before and After” views of data quality – so that you can see the improvements that MDM has made
  Pulse is another layer over the core MDM tool
    Pulse Profiling for profiling data before MDM
    Pulse Metrics for reporting on MDM data after matching
  See diagram and dashboard screenshots on website:
    http://www.semarchy.com/en/convergence/pulse/
Data authoring
  Yes, using custom web forms designed and deployed as part of a versioned “application”
  Full LDAP role integration determining Read/Write/View/Edit functionality at a Field > Entity level, with custom filtering
Security
  LDAP Integration
    Semarchy only knows about roles, not users
    Can read LDAP properties and store them in variables for use in filters and the SemQL query language
  APIs also participate in the security.
State Management
  Data has a number of states
    Certified (i.e. it’s participating in the golden view)
    Rejects/Errors
      Create workflow tasks to resolve / resubmit
    Duplicate
    Machine Merged
    User Merged
    User Split
  User actions always override machine, even for future updates.
Matching strategy
  Simple match (e.g. by ID)
  Fuzzy, using: Levenshtein, Jaro-Winkler distance, Name normalization and Soundex
  Can combine multiple fields and algorithms using SemQL language (which essentially gets translated back to Oracle SQL)
Match tuning capabilities
  There isn’t a match tuning capability, except by trial and error, for example, setting the Levenshtein distance to 65, then 70, then 75 etc… & looking at the output results.
  Internally, they take the generated SQL and use that to tune. The generated SQL is very readable (we saw examples), and with comments identifying the user definable parts.
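
For anyone who wants to see what that trial-and-error tuning looks like in practice, here is a minimal Python sketch. It is not Semarchy's implementation and it is not SemQL: it assumes the "65, then 70, then 75" figures are thresholds on a 0-100 similarity score derived from Levenshtein edit distance, and the sample names and the function names (`levenshtein`, `similarity`, `matched_pairs`) are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to a 0-100 score (100 means identical).

    The 0-100 scale is an assumption made for this sketch.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a.lower(), b.lower()) / longest)


def matched_pairs(names, threshold):
    """Return every pair of names whose similarity meets the threshold."""
    return [(x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
            if similarity(x, y) >= threshold]


# Hypothetical sample records, invented for illustration only.
NAMES = ["Jon Smith", "John Smith", "Jon Smyth", "Joan Smithe"]

if __name__ == "__main__":
    # Sweep the threshold and eyeball which pairs survive at each setting.
    for threshold in (65, 70, 75, 80, 85, 90):
        print(threshold, matched_pairs(NAMES, threshold))
```

Raising the threshold can only shrink the list of matched pairs, so the sweep shows where genuine duplicates start to drop out and where false matches disappear.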
Trust / Survivorship across sources
  Yes, lots and custom
  “Custom ranking” allows complex logic, such as “If this field is > 1 year old, trust the other system instead…”, or “If this field is supplied then also use that field”.
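
The first of those rules can be sketched in a few lines of Python. This is only an illustration of the idea of a survivorship rule, not how the product expresses it: the `SourceValue` type, the `survivor` function, the 365-day cut-off and the system names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SourceValue:
    source: str       # name of the contributing system (hypothetical)
    value: str        # the attribute value that system supplied
    updated_on: date  # when that system last changed the value


# "More than one year old" is taken here to mean more than 365 days.
MAX_AGE = timedelta(days=365)


def survivor(preferred: SourceValue, other: SourceValue, today: date) -> SourceValue:
    """Trust the preferred source unless its value is over a year old."""
    if today - preferred.updated_on > MAX_AGE:
        return other
    return preferred


if __name__ == "__main__":
    today = date(2015, 8, 13)
    crm = SourceValue("CRM", "old@example.com", date(2014, 1, 1))
    erp = SourceValue("ERP", "new@example.com", date(2015, 6, 1))
    # The CRM value is stale, so the ERP value wins.
    print(survivor(crm, erp, today).source)
```

A real rule set would also need to decide what happens when both values are stale or one is missing; the sketch leaves that out on purpose.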
Auditing
  Model audits (who changed what)
    Created By / Updated By + timestamps
  Data steward audits
    Full data lineage.
    Who read / changed what and when, from which source system etc…
Data lifecycle management
  Soft deletes only.
  Deletes from source systems indicated by a flag.
API / Interface options
  The DI layer provides SOAP web services
  There is a Java API also available for building custom interfaces as an alternative to the defined “Application”
    The Java API uses the underlying workflow and entities
    Their customers have done this – using Vaadin (GWT) to write their own custom app over the top of the Semarchy Java API
  Plugin API for Enrichers & Validators
    e.g. Google Geocode address
    Java interface, JUnit tests, with good documentation
  They also provide training for partners
“Code” generation from other tools
  The Model itself is represented by XML, and you can transfer a model between environments by importing / exporting this model.
  So, in theory, as long as you are able to create this XML file, you could codegen.
Deployment to different environments (DEV / UAT / PROD)
  Versioning
    Both the model and data can be versioned
    This also versions the APIs
    Data can be snapshotted (point in time)
      Allows things like a product catalogue to be snapshotted, so that downstream applications use the current catalogue while upstream data managers maintain the upcoming catalogue.
    The data is viewable using multiple versions (without duplicating the data)
      This means that a downstream application could be using V1 of the data, while another application could be using V2 of the data
  Branching
    If a version is applied with bugs, then you can go back to a previous version, and start changing that instead.
  No destructive changes on the hub.
  Production only accepts “Frozen” versions – no updates in production
Functional Testing options
  Internally, test is done by having a set of known input records, and a set of expected output records, then a series of scripts to compare the output of MDM with the expected outputs.
  No automated testing built in to the application
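
Since nothing is built in, the comparison step is something you would script yourself. Below is a rough Python sketch of that idea, assuming the hub's output and the expected records have both been exported as lists of dictionaries sharing an `id` key. The function name `compare_golden_records` and the record layout are hypothetical, not part of any Semarchy tooling.

```python
def compare_golden_records(actual, expected, key="id"):
    """Compare MDM output rows with expected rows, matched on a key field.

    Returns the keys that are missing from the output, the keys that were
    not expected, and the keys whose records differ in any field.
    """
    actual_by_key = {row[key]: row for row in actual}
    expected_by_key = {row[key]: row for row in expected}

    missing = sorted(expected_by_key.keys() - actual_by_key.keys())
    unexpected = sorted(actual_by_key.keys() - expected_by_key.keys())
    changed = sorted(k for k in expected_by_key.keys() & actual_by_key.keys()
                     if expected_by_key[k] != actual_by_key[k])
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


if __name__ == "__main__":
    # Hypothetical records, invented for illustration only.
    expected = [{"id": 1, "name": "Jon Smith"}, {"id": 2, "name": "Ann Lee"}]
    actual = [{"id": 1, "name": "John Smith"}, {"id": 3, "name": "Bob Ray"}]
    print(compare_golden_records(actual, expected))
```

An empty result in all three lists means the run reproduced the expected output; anything else points straight at the records to investigate.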
Scalability
  Recommended architecture for clustering etc. (documented on website)
    Essentially, multiple passive instances for end users (taking updates, viewing data etc…), and one active instance for orchestrating match jobs etc
    Talking to an Oracle RAC cluster on the back end.
Cloud
  Yes – running on AWS, RSD etc…
  They have released to AWS Marketplace
  Press release:

http://www.semarchy.com/news/semarchy-announces-cloud-mdm-success-at-the-gartner-enterprise-information-master-data-management-summit/