Data Management | "Evolutionary MDM"

As an IT professional working in the data management space (dare I say with Big Data) you currently face some particularly challenging & critical technology choices that can have a profound impact on your business operations so you need to avoid any more white elephants: Wikipedia, “A white elephant is a possession which its owner cannot dispose of and whose cost, particularly that of maintenance, is out of proportion to its usefulness.” – Does that ring any bells with your incumbent technology stack? Then read on.

It was my respected competitor Talend’s CMO that prompted me to write this post when he claimed: “Talend is the only one-stop shop for a modern, agile, integration platform (with all the bell & whistles) at a great price.” LinkedIn 13/8/15 which whilst being understandably narcissistic is clearly disingenuous.

A more honest approach will deliver a long & rewarding career, so allow me to set the record straight in a broader market context: As you know, every technology choice available to you has dependencies which fall into two categories; ‘outbound’ & ‘inbound’ with a question associated with each; namely “Can we get rid of this stuff without taking our business down with it?” & “What’s the catch if we get this new stuff in?” With regard to the latter, there is a bunch of really good new stuff out there but it comes with a heap of pre-requisites plus downstream costs which you’re not told about before you sign up for it!

For example why change your data store to get high performance analytic capability, any BA or IA will know you can use what you’ve got if you build the right architecture & data platform around it. Feel sorry for those that live in hope that Hadoop or SAP HANA will cure their problems, the recent 30-day trial offer is like giving Crossrail a free JCB for a month to get the Underground kick started?! Don’t you just have to love your SI for that very reason, yeah right they love you for the ratio of services to software they make out of your struggles. You really can turn that ‘norm’ on it’s head with a modern data management platform but be diligent with your choice.

There is no one stop for all your needs but what’s really needed here is an option that has zero dependency on additional components is non-intrusive from both technology and business process aspects, covers not just Data Integration but Data Quality, Master & Reference Data Management & runs anywhere (on-premise/cloud) with no mods’ plus can be implemented with incremental results in an Agile sprint like way that has an SME price tag with no multi-year TCO sting-in-the-tail like the aforementioned.

OK Sherlock, where can this be found? – Request a Proof of Value PoV for the Semarchy “Evolutionary Data Convergence Platform” & you’ll get palpable results in days – we use our own technology to run Semarchy the company, ask our CEO he’ll be happy to demo how Semarchy drink their own champagne – this is no gimmick, we’re refreshingly honest brokers – you can ask our customers…The final word

As a newcomer to the UK market, Evolutionary MDM vendor Semarchy has earned the right to sit at the top table with the other so called ‘mega-vendors’ of Data Integration, Data Quality & Master Data Management software such as IBM, Oracle & Informatica.

This is not a self-opinionated claim, rather a result of customer driven selection criteria in response to RFI’s, competitive bids & general success in this market.

To understand their claims in more detail, please refer to this enlightening review of their software platform by a leading independent consulting firm who operate exclusively in the MDM space:-

————————————————————————————————————————————————————

This report is a summary of the considered professional observations for the Semarchy “Convergence for MDM” software product suite, made by the top UK/European independent specialist MDM consulting firm who also represent & use other market leading vendor solutions in this specific sector:

Conclusions

We liked the product. It seems modern, well thought-out, and without legacy baggage.
Compared to the Gartner leading quadrant vendor’s solutions, it’s not easy to compare directly but the others look clunky by comparison.
As a browser delivered product rather than a physical install on each users machine, it does have appeal, and it seems to have all the right elements in all the right places (in terms of a sound architecture, deployment story, developer story, customization and so on).

What we did come away with, is that it would be an ideal product for Local Gov’t, CLA, Health & SME types rather than the corporates of this world – and that’s the market they’re focused on at present.

We were also impressed that they’ve released to the Amazon (web service) marketplace – that effectively gives anyone the ability to start up a machine with it running almost at the press of a button – meaning that it’s ideal for people to “try out” without needing any heavy infrastructure or support.

Supporting detail

Repository style – imports data into the hub for matching, viewing, editing etc.
Allows payload data as well
Delivered as a web application
Runs on Tomcat with Oracle back end. Quick setup.
Design part using Eclipse RAP – so has the look and feel of “Eclipse in the Cloud”
End User part has a modern “Windows 8” styling.
Basic workflow is:
Design your data model (entities)
Design Matchers
Design Enrichers
Design Validators
Design Business Objects (hierarchies of entities, e.g. – Account Manager with Customers)
Design View / Edit forms
Design Workflows
Deploy Model and Application (versioned)
Version / Snapshot Data as required.
Repeat
Data Modelling
GUI designer built into the tool
Simple, complex and composite attributes
Hierarchies of entities
Has – a
Is – a
Parent/child
Relationships between entities
Yes! Has a, Is a.
Publishing events
There is a Data Integration layer (separate product) which sits over the MDM layer, and can track batches submitted, poll for changes and such.
http://www.semarchy.com/data-integration/
The DI layer orchestrates things like push events
Java API,
Web Service API (SOAP)
Not MQ – need to write that using the Java API
Data Quality reports
Yes! Using the additional PULSE tool
http://www.semarchy.com/data-governance/
Pulse provides “Before and After” views of data quality – so that you can see the improvements that MDM has made
Pulse is another layer over the core MDM tool
Pulse Profiling for profiling data before MDM
Pulse Metrics for reporting on MDM data after matching
See diagram and dashboard screenshots on website:
http://www.semarchy.com/en/convergence/pulse/
Data authoring
Yes, using custom web forms designed and deployed as part of a versioned “application”
Full LDAP role integration determining Read/Write/View/Edit functionality at a Field > Entity level, with custom filtering
Security
LDAP Integration
Semarchy only knows about roles, not users
Can read LDAP properties and store them in variables for use in filters and the SemQL query language
APIs also participate in the security.
State Management
Data has a number of states
Certified (i.e. it’s participating in the golden view)
Rejects/Errors
Create workflow tasks to resolve / resubmit
Duplicate
Machine Merged
User Merged
User Split
User actions always override machine, even for future updates.
Matching strategy
Simple match (eg, by ID)
Fuzzy, using: Levenshtein, Jaro-Winkler distance, Name normalization and Soundex
Can combine multiple fields and algorithms using SemQL language (which essentially gets translated back to Oracle SQL)
Match tuning capabilities
There isn’t a match tuning capability, except by trial and error, for example, setting the Levenshtein distance to 65, then 70, then 75 etc… & looking at the output results.
Internally, they take the generated SQL and use that to tune. The generated SQL is very readable (we saw examples), and with comments identifying the user definable parts.
Trust / Survivorship across sources
Yes, lots and custom
“Custom ranking” allows complex logic, such as “If this field is > 1 year old, trust the other system instead…”, or “If this field is supplied then also use that field”.
Auditing
Model audits (who changed what)
Created By / Updated By + timestamps
Data steward audits
Full data lineage.
Who read / changed what and when, from which source system etc…
Data lifecycle management
Soft deletes only.
Deletes from source systems indicated by a flag.
API / Interface options
The DI layer provides SOAP web services
There is a Java API also available for building custom interfaces as an alternative to the defined “Application”
The Java API uses the underlying workflow and entities
their customers have done this – using Vaadin (GWT) to write their own custom app over the top of the Semarchy Java API
Plugin API for Enrichers & Validators
g. Google Geocode address
Java interface, Junit tests, with good documentation
They also provide training for partners
“Code” generation from other tools
The Model itself is represented by XML, and you can transfer a model between environments by importing / exporting this model.
So, in theory, as long as you are able to create this XML file, you could codegen.
Deployment to different environments ( DEV / UAT / PROD)
Versioning
Both the model and data can be versioned
This also versions the APIs
Data can be snapshotted (point in time)
Allows things like a product catalogue to be snapshotted, so that downstream applications use the current catalogue while upstream data managers maintain the upcoming catalogue.
The data is viewable using multiple versions (without duplicating the data)
This means that a downstream application could be using V1 of the data, while another application could be using V2 of the data
Branching
If a version is applied with bugs, then you can go back to a previous version, and start changing that instead.
No destructive changes on the hub.
Production only accepts “Frozen” versions – no updates in production
Functional Testing options
Internally, test is done by having a set of known input records, and a set of expected output records, then a series of scripts to compare the output of MDM with the expected outputs.
No automated testing built in to the application
Scalability
Recommended architecture for clustering etc. (documented on website)
Essentially, multiple passive instances for end users (taking updates, viewing data etc…), and one active instance for orchestrating match jobs etc
Talking to an Oracle RAC cluster on the back end.
Cloud
Yes – running on AWS, RSD etc…
They have released to AWS Marketplace
Press release:

http://www.semarchy.com/news/semarchy-announces-cloud-mdm-success-at-the-gartner-enterprise-information-master-data-management-summit/

"Evolutionary MDM"

Data Management Made Easy

Tag Archives: Data Management

Avoiding Data White Elephants

Semarchy comes of age