As a newcomer to the UK market, Evolutionary MDM vendor Semarchy has earned the right to sit at the top table with the other so called ‘mega-vendors’ of Data Integration, Data Quality & Master Data Management software such as IBM, Oracle & Informatica.
This is not a self-opinionated claim, rather a result of customer driven selection criteria in response to RFI’s, competitive bids & general success in this market.
To understand their claims in more detail, please refer to this enlightening review of their software platform by a leading independent consulting firm who operate exclusively in the MDM space:-
————————————————————————————————————————————————————
This report is a summary of the considered professional observations for the Semarchy “Convergence for MDM” software product suite, made by the top UK/European independent specialist MDM consulting firm who also represent & use other market leading vendor solutions in this specific sector:
Conclusions
We liked the product. It seems modern, well thought-out, and without legacy baggage.
Compared to the Gartner leading quadrant vendor’s solutions, it’s not easy to compare directly but the others look clunky by comparison.
As a browser delivered product rather than a physical install on each users machine, it does have appeal, and it seems to have all the right elements in all the right places (in terms of a sound architecture, deployment story, developer story, customization and so on).
What we did come away with, is that it would be an ideal product for Local Gov’t, CLA, Health & SME types rather than the corporates of this world – and that’s the market they’re focused on at present.
We were also impressed that they’ve released to the Amazon (web service) marketplace – that effectively gives anyone the ability to start up a machine with it running almost at the press of a button – meaning that it’s ideal for people to “try out” without needing any heavy infrastructure or support.
Supporting detail
- Repository style – imports data into the hub for matching, viewing, editing etc.
- Allows payload data as well
- Delivered as a web application
- Runs on Tomcat with Oracle back end. Quick setup.
- Design part using Eclipse RAP – so has the look and feel of “Eclipse in the Cloud”
- End User part has a modern “Windows 8” styling.
- Basic workflow is:
- Design your data model (entities)
- Design Matchers
- Design Enrichers
- Design Validators
- Design Business Objects (hierarchies of entities, e.g. – Account Manager with Customers)
- Design View / Edit forms
- Design Workflows
- Deploy Model and Application (versioned)
- Version / Snapshot Data as required.
- Repeat
- Data Modelling
- GUI designer built into the tool
- Simple, complex and composite attributes
- Hierarchies of entities
- Has – a
- Is – a
- Parent/child
- Relationships between entities
- Yes! Has a, Is a.
- Publishing events
- There is a Data Integration layer (separate product) which sits over the MDM layer, and can track batches submitted, poll for changes and such.
- http://www.semarchy.com/data-integration/
- The DI layer orchestrates things like push events
- Java API,
- Web Service API (SOAP)
- Not MQ – need to write that using the Java API
- Data Quality reports
- Yes! Using the additional PULSE tool
- http://www.semarchy.com/data-governance/
- Pulse provides “Before and After” views of data quality – so that you can see the improvements that MDM has made
- Pulse is another layer over the core MDM tool
- Pulse Profiling for profiling data before MDM
- Pulse Metrics for reporting on MDM data after matching
- See diagram and dashboard screenshots on website:
- http://www.semarchy.com/en/convergence/pulse/
- Data authoring
- Yes, using custom web forms designed and deployed as part of a versioned “application”
- Full LDAP role integration determining Read/Write/View/Edit functionality at a Field > Entity level, with custom filtering
- Security
- LDAP Integration
- Semarchy only knows about roles, not users
- Can read LDAP properties and store them in variables for use in filters and the SemQL query language
- APIs also participate in the security.
- State Management
- Data has a number of states
- Certified (i.e. it’s participating in the golden view)
- Rejects/Errors
- Create workflow tasks to resolve / resubmit
- Duplicate
- Machine Merged
- User Merged
- User Split
- User actions always override machine, even for future updates.
- Matching strategy
- Simple match (eg, by ID)
- Fuzzy, using: Levenshtein, Jaro-Winkler distance, Name normalization and Soundex
- Can combine multiple fields and algorithms using SemQL language (which essentially gets translated back to Oracle SQL)
- Match tuning capabilities
- There isn’t a match tuning capability, except by trial and error, for example, setting the Levenshtein distance to 65, then 70, then 75 etc… & looking at the output results.
- Internally, they take the generated SQL and use that to tune. The generated SQL is very readable (we saw examples), and with comments identifying the user definable parts.
- Trust / Survivorship across sources
- Yes, lots and custom
- “Custom ranking” allows complex logic, such as “If this field is > 1 year old, trust the other system instead…”, or “If this field is supplied then also use that field”.
- Auditing
- Model audits (who changed what)
- Created By / Updated By + timestamps
- Data steward audits
- Full data lineage.
- Who read / changed what and when, from which source system etc…
- Data lifecycle management
- Soft deletes only.
- Deletes from source systems indicated by a flag.
- API / Interface options
- The DI layer provides SOAP web services
- There is a Java API also available for building custom interfaces as an alternative to the defined “Application”
- The Java API uses the underlying workflow and entities
- their customers have done this – using Vaadin (GWT) to write their own custom app over the top of the Semarchy Java API
- Plugin API for Enrichers & Validators
- g. Google Geocode address
- Java interface, Junit tests, with good documentation
- They also provide training for partners
- “Code” generation from other tools
- The Model itself is represented by XML, and you can transfer a model between environments by importing / exporting this model.
- So, in theory, as long as you are able to create this XML file, you could codegen.
- Deployment to different environments ( DEV / UAT / PROD)
- Versioning
- Both the model and data can be versioned
- This also versions the APIs
- Data can be snapshotted (point in time)
- Allows things like a product catalogue to be snapshotted, so that downstream applications use the current catalogue while upstream data managers maintain the upcoming catalogue.
- The data is viewable using multiple versions (without duplicating the data)
- This means that a downstream application could be using V1 of the data, while another application could be using V2 of the data
- Branching
- If a version is applied with bugs, then you can go back to a previous version, and start changing that instead.
- No destructive changes on the hub.
- Production only accepts “Frozen” versions – no updates in production
- Functional Testing options
- Internally, test is done by having a set of known input records, and a set of expected output records, then a series of scripts to compare the output of MDM with the expected outputs.
- No automated testing built in to the application
- Scalability
- Recommended architecture for clustering etc. (documented on website)
- Essentially, multiple passive instances for end users (taking updates, viewing data etc…), and one active instance for orchestrating match jobs etc
- Talking to an Oracle RAC cluster on the back end.
- Cloud
- Yes – running on AWS, RSD etc…
- They have released to AWS Marketplace
- Press release:
http://www.semarchy.com/news/semarchy-announces-cloud-mdm-success-at-the-gartner-enterprise-information-master-data-management-summit/