What “fixing data before S/4HANA migration” looks like as a project plan

June 16, 2026 by Divyesh Wani

Most enterprises planning an S/4HANA migration eventually run into the same operational question: what do we do about procurement data?

Over the last two editions of Beyond Procurement, we’ve worked through two realizations most CPOs reach, sometimes after watching a peer’s migration go sideways. 

The first: procurement data work is easier and cheaper to do before the migration than after.

Cleaning vendor masters, deduping SKUs, and rebuilding taxonomy mid-migration means doing it under deadline pressure, with the migration partner’s bill running, and with stakeholders watching every slip. Doing it after means inheriting whatever the migration produced and trying to retrofit visibility onto a system already in production.

The second: the clock for this work isn’t the migration date; it’s the 2026 budget cycle.

Migration timelines slip routinely. Budget approval windows don’t. If procurement data work doesn’t have its own line item in the 2026 plan, it ends up funded from whatever’s left over from the migration partner’s program, which, in practice, means it doesn’t get funded at all.

If you’ve reached the same two conclusions, you’re past the why. The useful question now is what the project actually looks like.

We structure it as a 6–8-week procurement-owned program that runs alongside the migration partner’s program rather than within it. The deliverable is spend visibility, usable as soon as it lands, and independent of when the migration actually completes.

Where the migration partner’s data workstream ends, and yours begins

When the migration partner describes a data workstream inside the migration plan, the purpose is narrow: get your records loadable into S/4HANA without rejection errors. Format, length, mandatory presence, and referential integrity. That standard matters for migration mechanics, but procurement reporting needs a different one.

A vendor entered ten times under ten variations of ABC Corp will load cleanly and still show as ten suppliers in your spend cube. A material code attached to a free-text description like Brake Pad Set 45022-SNA-A00 Akebono Front Axle ECN-112 will load cleanly and remain invisible to any category analytics that hasn’t first parsed brand, position, supplier, and revision level.

Procurement’s part of the data problem sits across four domains:

→ Vendor master – One supplier, one record. Legal entity validated, tax ID populated, parent–subsidiary resolved. Typical starting dedup rate runs around 40%. The target is 95% or higher.

→ Material master – Every SKU classified to UNSPSC at the 8-digit commodity level. Free-text collapsed to controlled vocabulary. Industry overlays where UNSPSC is too coarse: POSM for FMCG, sub-assembly hierarchies for automotive, dosage forms for pharma. Typical starting coverage below 20%. Target: 97% or higher.

→ Contract master – Active agreements tied to the correct vendor record, with terms, renewals, and clause references in a queryable form. Most procurement organizations discover during an audit that meaningful commitments live in inboxes, not systems.

→ Transaction history – 12 to 24 months of PO line data, normalized to one currency basis. FX errors resolved. Embedded costs (logistics, tooling, prototype) separated from unit prices so cross-market benchmarking can run on comparable numbers.

None of these match what a migration partner’s data team is scoped or paid to deliver.
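The transaction-history item above (one currency basis, embedded costs separated from unit prices) can be sketched in a few lines. This is a minimal illustration, not the program's actual implementation: the `POLine` fields, the `FX_TO_EUR` table and its rates, and the choice of EUR as the reporting currency are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical rates to one reporting currency (EUR); values are illustrative only.
FX_TO_EUR = {"EUR": Decimal("1"), "USD": Decimal("0.90"), "GBP": Decimal("1.15")}


@dataclass
class POLine:
    po_id: str
    currency: str
    quantity: int
    line_total: Decimal                # invoiced amount, local currency
    logistics: Decimal = Decimal("0")  # embedded costs, local currency
    tooling: Decimal = Decimal("0")
    prototype: Decimal = Decimal("0")


def comparable_unit_price(line: POLine) -> Decimal:
    """Unit price in EUR with embedded costs separated out."""
    if line.currency not in FX_TO_EUR:
        # Surface the gap instead of silently defaulting to a rate of 1.
        raise ValueError(f"{line.po_id}: no FX rate for {line.currency}")
    if line.quantity <= 0:
        raise ValueError(f"{line.po_id}: quantity must be positive")
    goods_only = line.line_total - line.logistics - line.tooling - line.prototype
    unit_price = goods_only * FX_TO_EUR[line.currency] / line.quantity
    return unit_price.quantize(Decimal("0.01"))


us_line = POLine("PO-1", "USD", 100, Decimal("1200"),
                 logistics=Decimal("150"), tooling=Decimal("50"))
eu_line = POLine("PO-2", "EUR", 100, Decimal("900"))
print(comparable_unit_price(us_line), comparable_unit_price(eu_line))
```

In this made-up example the two lines look 33% apart on raw totals, yet land on the same comparable unit price once logistics and tooling are stripped out and both are expressed in one currency.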

The five phases of the data program

Two failure patterns recur in pre-migration data programs.

The first is starting classification before the data has been profiled. The LLM extraction runs on garbage, produces confident-sounding wrong answers, and propagates errors downstream where they’re harder to fix.

The second is running an analytics pass on raw data early “to see where we are.” The numbers are wrong in non-obvious ways, the CFO sees them, trust erodes, and the program stalls in week three.

The sequence below outlines the methodology used by Powerweave’s data intelligence team across procurement engagements. Each phase is a precondition for the next.

Phase 1: Discovery and understanding (weeks 1–2)

Stakeholder workshops with procurement, IT, finance, and operations, two to three hours per function. System mapping across Coupa, Ariba, SAP, Oracle, and any source of record outside those. Gap analysis against the procurement reporting that the steering committee has asked for, but hasn’t been getting. The deliverable is a written Understanding Document covering the target master schema, controlled vocabularies, the taxonomy standard the program will follow (UNSPSC, eClass, a custom hierarchy, or a hybrid), governance rules, and the KPIs by which the program is measured. Nothing else moves until that document is signed off on.

Phase 2: Data profiling and format detection (week 3)

Automated field profiling runs on actual source extracts, not samples. Every field is classified as Mandatory / Recommended / Optional and scored on a 1–5 data quality scale. Language detection flags multilingual content that will need to be routed to Phase 3. Pattern recognition surfaces the format anomalies that distort spend reports without anyone noticing: FX formula failures (the #REF! and #ERROR! cells), inconsistent date formats, embedded units in numeric fields, and mixed encoding in supplier names. Format errors are resolved at source, where the rule can be set, or queued for transformation in Phase 5, where they can’t be. Gaps that will block downstream analytics are flagged with proposed mitigations: LLM extraction, manual enrichment, or a governance rule.
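As a rough sketch of what field profiling looks for, the snippet below counts spreadsheet error cells, embedded units, and mixed date shapes in a single column. It assumes the extract has already been read into a list of cell values; `profile_field`, the regular expressions, and the set of date shapes are illustrative names and choices, not the tooling described above, and the 1–5 quality score is deliberately left out because its scoring rule is not given here.

```python
import re
from collections import Counter

# The two error tokens named above; extend with whatever your extracts contain.
ERROR_TOKENS = {"#REF!", "#ERROR!"}

# Hypothetical date shapes for the example; real extracts may need more.
DATE_SHAPES = {
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "NN/NN/YYYY": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "NN.NN.YYYY": re.compile(r"^\d{2}\.\d{2}\.\d{4}$"),
}

# A number followed by letters, e.g. "15 kg" in a field meant to be numeric.
EMBEDDED_UNIT = re.compile(r"^\d+(?:[.,]\d+)?\s*[A-Za-z]+$")


def profile_field(values):
    """Return simple quality counters for one column of raw cell values."""
    cells = ["" if v is None else str(v).strip() for v in values]
    filled = [c for c in cells if c]
    date_shapes = Counter(
        name for c in filled for name, rx in DATE_SHAPES.items() if rx.match(c)
    )
    return {
        "fill_rate": len(filled) / len(cells) if cells else 0.0,
        "error_cells": sum(c in ERROR_TOKENS for c in filled),
        "embedded_units": sum(bool(EMBEDDED_UNIT.match(c)) for c in filled),
        "date_shapes": dict(date_shapes),
    }


price_profile = profile_field(["12.50", "#REF!", "15 kg", "", "9,75", "#ERROR!"])
date_profile = profile_field(["2025-01-31", "31/01/2025", "31.01.2025", "2025-02-01"])
print(price_profile)
print(date_profile["date_shapes"])
```

More than one key in `date_shapes` for the same column is the signal that dates were entered under inconsistent conventions and need a rule before any period-based reporting runs.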

Phase 3: LLM feature extraction (weeks 4–5)

Free-text fields run through LLM-based feature extraction tuned to your sector’s patterns. POSM type vocabulary (Gondola End, Shelf Tray, Wobbler) for FMCG. OEM part numbers with revision and supersession chains for automotive. NDC, GTIN, INN, and strength/form/pack structures for pharma. Multilingual NLP classification resolves regional variants in the same pass: Kopstelling, Tête de gondole, and Testata di Gondola collapse to a single canonical Gondola End Display across NL, FR, and IT inventories. The model returns structured attributes: brand, type, dimension, regulatory grade, and supplier canonical name. Each is tagged with a confidence score. Low-confidence outputs are queued for human review rather than committed.
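The extraction model itself is out of scope for a short example, but the two steps around it can be sketched: collapsing regional variants to one canonical term, and holding low-confidence outputs back for review. Everything named here (`canonical_term`, `ExtractedAttribute`, `route`, and the 0.80 cut-off) is hypothetical; the source does not state a review threshold for this phase.

```python
import unicodedata
from dataclasses import dataclass


def _key(text: str) -> str:
    # Normalize accents and case so "TÊTE DE GONDOLE" and "Tête de gondole" match.
    return unicodedata.normalize("NFC", text).strip().casefold()


# Three variants from the example above; a real vocabulary would be far larger.
CANONICAL_POSM = {
    _key(term): "Gondola End Display"
    for term in ("Kopstelling", "Tête de gondole", "Testata di Gondola")
}

REVIEW_BELOW = 0.80  # assumed cut-off, for illustration only


def canonical_term(raw: str):
    """Return the canonical term, or None if the variant is unknown."""
    return CANONICAL_POSM.get(_key(raw))


@dataclass
class ExtractedAttribute:
    record_id: str
    name: str
    value: str
    confidence: float  # as reported by the extraction step


def route(attributes, review_below=REVIEW_BELOW):
    """Split model outputs into (committed, review_queue)."""
    committed = [a for a in attributes if a.confidence >= review_below]
    review_queue = [a for a in attributes if a.confidence < review_below]
    return committed, review_queue


outputs = [
    ExtractedAttribute("SKU-1", "type", canonical_term("Kopstelling"), 0.95),
    ExtractedAttribute("SKU-2", "brand", "Akebono", 0.55),
]
committed, review_queue = route(outputs)
print(len(committed), len(review_queue))
```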

Phase 4: Harmonization, dedup, and taxonomy (weeks 6–7)

Now that every record carries structured attributes, deduplication can run on the cleaned fields rather than on fuzzy text. The matching engine works across names, descriptions and codes, with weights tuned per sector. The sector tuning matters: in automotive, part number plus revision level is the deciding key; in FMCG, the part reference unifies the same POSM across language and market variants; in pharma, INN plus strength, form and pack identifies the unique product.
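A weighted match score of the kind described above might look like the following. It is a sketch under stated assumptions: the weights are invented for an automotive-style case where the code (part number plus revision) dominates, the record layout is hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure a production matching engine would use.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the text above says these are tuned per sector.
WEIGHTS = {"name": 0.3, "description": 0.2, "code": 0.5}


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def match_score(rec_a: dict, rec_b: dict, weights=WEIGHTS) -> float:
    """Weighted similarity across name, description, and code."""
    parts = {
        "name": text_similarity(rec_a["name"], rec_b["name"]),
        "description": text_similarity(rec_a["description"], rec_b["description"]),
        # Codes are compared exactly: a different revision is a different item.
        "code": 1.0 if rec_a["code"] == rec_b["code"] else 0.0,
    }
    return sum(weights[field] * parts[field] for field in weights)


a = {"name": "Brake Pad Set", "description": "Akebono front axle",
     "code": "45022-SNA-A00/ECN-112"}
b = {"name": "BRAKE PAD SET", "description": "Front axle, Akebono",
     "code": "45022-SNA-A00/ECN-112"}
c = dict(a, code="45022-SNA-A01/ECN-112")  # made-up sibling part number

print(round(match_score(a, b), 2), round(match_score(a, c), 2))
```

With these weights, two records that share a code but word the description differently still score high, while an identical description with a different part number cannot score above 0.5.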

Clusters above the confidence threshold merge into a golden record per item, with missing fields filled in from other records in the cluster. Between 70 and 89% confidence, clusters route to a steward for review. Below that, no merge. Every merge retains a full audit trail: source IDs, the rule that triggered it, the date, and the steward who confirmed it. Silent merges destroy audit trails.
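The merge rules above translate into a small decision function plus a golden-record builder that keeps its audit trail. This is a sketch, not the program's actual engine: the 90% auto-merge threshold is an assumption inferred from the 70–89% review band, the "first non-empty value wins" fill rule is one possible policy, and the vendor records and tax ID are made up.

```python
from datetime import date

AUTO_MERGE_AT = 90  # assumption: implied by the 70-89% review band
REVIEW_FROM = 70


def decide(confidence_pct: float) -> str:
    """Map a cluster confidence to merge / review / no_merge."""
    if confidence_pct >= AUTO_MERGE_AT:
        return "merge"
    if confidence_pct >= REVIEW_FROM:
        return "review"
    return "no_merge"


def build_golden_record(cluster, rule, merged_on, steward=None):
    """Merge a cluster, filling gaps from other records, and return (golden, audit)."""
    golden = {}
    for record in cluster:
        for key, value in record.items():
            if key == "source_id":
                continue
            if golden.get(key) in (None, "") and value not in (None, ""):
                golden[key] = value          # first non-empty value wins
            else:
                golden.setdefault(key, None)  # keep the field visible even if empty
    audit = {
        "source_ids": [r["source_id"] for r in cluster],
        "rule": rule,
        "date": merged_on.isoformat(),
        "steward": steward,
    }
    return golden, audit


cluster = [
    {"source_id": "V001", "name": "ABC Corp", "tax_id": "", "country": "NL"},
    {"source_id": "V007", "name": "ABC Corporation", "tax_id": "NL123", "country": ""},
]
golden, audit = build_golden_record(cluster, "name+tax_id", date(2026, 6, 16))
print(decide(92), decide(75), decide(40))
print(golden, audit["source_ids"])
```

Returning the audit entry alongside the golden record, rather than writing it somewhere separately, makes it hard to perform a merge without also recording where it came from.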

Taxonomy mapping runs in parallel. UNSPSC at the 8-digit commodity level is the default. Where UNSPSC is too coarse (POSM types, automotive sub-assemblies, pharma dosage forms), a custom hierarchy is built alongside. Mappings between UNSPSC, eClass, GPC and internal codes are bi-directional and versioned, so a future system change doesn’t force reclassification from scratch.
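One way to picture a bi-directional, versioned mapping is a crosswalk that stores every link in both directions under a version label. The class below is a minimal sketch; `Crosswalk`, the version string, and the codes are illustrative and have not been checked against any published taxonomy release.

```python
from collections import defaultdict


class Crosswalk:
    """Versioned, bi-directional links between (scheme, code) pairs."""

    def __init__(self):
        # version -> (scheme, code) -> set of linked (scheme, code)
        self._links = defaultdict(lambda: defaultdict(set))

    def link(self, version, a, b):
        """Record that a and b denote the same category in this version."""
        self._links[version][a].add(b)
        self._links[version][b].add(a)

    def lookup(self, version, source, target_scheme):
        """Codes in target_scheme linked to source, for one mapping version."""
        linked = self._links.get(version, {}).get(source, set())
        return sorted(code for scheme, code in linked if scheme == target_scheme)


xw = Crosswalk()
# Illustrative codes only.
xw.link("2026.1", ("UNSPSC", "43211503"), ("INTERNAL", "IT-0042"))

print(xw.lookup("2026.1", ("UNSPSC", "43211503"), "INTERNAL"))
print(xw.lookup("2026.1", ("INTERNAL", "IT-0042"), "UNSPSC"))
```

Because lookups are keyed by version, a later release of the mapping can be added next to the old one, and historical spend can still be read under the mapping that was in force when it was classified.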

Enrichment fills the gaps: country, DUNS, VAT, parent–subsidiary inferred from D&B. External validations against GS1 for GTINs, OEM catalogs for automotive part numbers, ISO codes for country and currency. Compliance flags where relevant — controlled-substance and Ph.Eur. grade for pharma; REACH and RoHS for automotive; FSC and recyclable material for FMCG.
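Part of GTIN validation can be done offline before any external lookup, because every GTIN carries a check digit. The function below applies the standard GS1 check-digit calculation (weights of 3 and 1 alternating from the right). It is a structural check only: a GTIN that passes may still not exist in the GS1 registry, and the function name is ours.

```python
def gtin_is_valid(gtin: str) -> bool:
    """Check length and GS1 check digit for GTIN-8, -12, -13, and -14."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in gtin]
    body, check_digit = digits[:-1], digits[-1]
    # Starting from the digit next to the check digit, weights alternate 3, 1, 3, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check_digit


print(gtin_is_valid("4006381333931"))  # well-formed GTIN-13
print(gtin_is_valid("4006381333932"))  # same body, wrong check digit
```

Running this check first keeps obviously mistyped codes out of the external validation queue, where each lookup is slower and may be metered.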

Phase 5: ETL pipeline and governance (week 8, ongoing thereafter)

Clean, harmonized data publishes into the analytics layer through an ETL pipeline built for repeatable runs (ingest, validate, transform, govern, publish) rather than a one-time load. Dashboard-ready pre-aggregated views are built: spend by supplier, category, market, and site. Anomaly alerts run on price, quantity, and FX. The governance KPI report is generated monthly from then on. Audit trails, KPI monitoring, and ongoing quality are part of the pipeline, not a separate workstream.
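A price anomaly alert can be as simple as comparing each PO line with the median price for the same item. The sketch below assumes unit prices are already normalized to one currency; the 25% tolerance, the `price_alerts` name, and the sample history are all illustrative, and a production rule would likely also account for volume tiers and contract price changes.

```python
from statistics import median


def price_alerts(history, tolerance=0.25):
    """history maps item -> [(po_id, unit_price), ...]; returns flagged lines."""
    alerts = []
    for item, lines in history.items():
        baseline = median(price for _, price in lines)
        if baseline == 0:
            continue  # no meaningful relative deviation against a zero baseline
        for po_id, price in lines:
            deviation = abs(price - baseline) / baseline
            if deviation > tolerance:
                alerts.append((item, po_id, round(deviation, 3)))
    return alerts


history = {
    "brake-pad-set": [("PO-1", 10.0), ("PO-2", 10.2), ("PO-3", 9.9),
                      ("PO-4", 10.1), ("PO-5", 15.0)],
}
alerts = price_alerts(history)
print(alerts)
```

The median is used instead of the mean so that the outlier being hunted does not drag the baseline toward itself.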

At the end of Phase 5, the program reports measurable output against the targets set in week one. Brand consistency at 100% from a typical baseline of near 60%. Supplier dedup at 95%+ from a baseline near 40%. Taxonomy coverage at 97%+ from a baseline under 20%. FX error rate at zero.
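Targets like these are only defensible if the measurement rule is written down. As one example, taxonomy coverage could be computed as the share of SKUs carrying a commodity-level UNSPSC code. The definition below is an assumption for illustration: it counts an 8-digit code as commodity-level only when it does not end in "00", and the field names and sample codes are hypothetical.

```python
import re

EIGHT_DIGITS = re.compile(r"^\d{8}$")


def is_commodity_level(code) -> bool:
    # Assumption: an 8-digit code ending in "00" stops at class level or above.
    return bool(code) and bool(EIGHT_DIGITS.match(code)) and not code.endswith("00")


def taxonomy_coverage(skus) -> float:
    """Percentage of SKUs classified at commodity level."""
    if not skus:
        return 0.0
    classified = sum(1 for sku in skus if is_commodity_level(sku.get("unspsc")))
    return 100 * classified / len(skus)


skus = [
    {"sku": "A", "unspsc": "43211503"},  # commodity level
    {"sku": "B", "unspsc": "43211500"},  # class level only
    {"sku": "C", "unspsc": None},        # unclassified
    {"sku": "D", "unspsc": "4321"},      # truncated code
]
print(taxonomy_coverage(skus))
```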

These are numbers a CPO can put in a steering committee paper and defend.

Why the data program should run alongside the migration partner’s program, not inside their scope

Inside the migration partner’s scope, procurement data work competes with the migration schedule for attention. It loses that competition reliably. Migration partner commercial structures are organized around go-live dates rather than spend cubes. When something has to give as deadlines approach, the procurement-specific data work is what gets cut, because cutting it doesn’t slip the migration date.

There’s a competence dimension too. Migration partner data engineers are skilled at structural data work: referential integrity, format conversion, and schema enforcement. That’s a different skill set from procurement-specific data work. Resolving Kopstelling, Tête de gondole, and Testata di Gondola into a single canonical Gondola End Display across NL, FR, and IT inventories isn’t migration data engineering. It’s procurement taxonomy work, executed by people who have previously classified spend in your sector, usually using industry vocabularies (POSM, VDA, ATC, GS1 GPC) that aren’t in a generalist data team’s toolkit.

Two practical conclusions follow.

The work is genuinely parallelizable. The master records, transaction history, taxonomy build, and visibility output do not depend on the migration being live to start, to progress, or to deliver value.

The procurement workstream needs its own scope, its own budget line, and its own name on the steering committee plan. Inside the migration partner’s stream, it doesn’t survive contact with the timeline.

None of this is a critique of any specific migration partner. It’s a structural observation about how their scopes are written and how they execute under deadline pressure.

What the program produces, independent of the migration date

The strongest argument for funding this work in the 2026 budget cycle, rather than waiting for the migration to firm up, is that the deliverable has standalone value.

A global beverage major running this program with Powerweave reached:

  • 100% indirect spend visibility across key suppliers 
  • 22% of previously uncategorized spend identified 
  • 8,496 SKUs standardized across 274 catalogs

None of it was contingent on a migration date. The cleaned catalogs were loaded directly into Coupa and used the following day.

Powerweave dramatically improved our procurement catalogs’ quality and visibility. Their structured process, supplier coordination, and AI-driven approach saved us valuable time and ensured compliance across multiple markets. We now have complete clarity and confidence in our procurement data.

— Procurement Operations Lead, Global Beverage Company

This is the budget case that holds up in a steering committee. The program produces a clean spend cube, audit-ready supplier data, and a category benchmarking baseline in the same quarter the budget is approved.

When the migration does complete, the procurement data going into S/4HANA is already clean, which de-risks the migration itself rather than relying on a final-month data scramble.

The two timelines run on separate clocks. The procurement clock is the shorter one.

Download the full case study here.

Where to start

If you want to scope your version of this program:

  • What your four data domains look like at the first audit
  • Where the deduplication work concentrates in your industry
  • How an 8-week plan maps against your headcount and the 2026 budget calendar

That’s a useful conversation to have. We call it a Pre-migration Data Readiness Review.