Field Notes

Metadata: The Hidden MVP to AI Localization Success

Stephanie Episode 7


Metadata sounds like the boring part of localization until you realize it can be the difference between a scalable operation and a constant fire drill. We get specific about what’s at stake when a major share of a multi-billion-dollar industry goes to coordination, project management overhead, and transactional friction rather than value creation. If you’ve ever felt like your team is moving fast but still not getting ahead, this conversation puts a spotlight on the hidden system underneath the work.

We also unpack where AI fits realistically. AI can summarize messy inputs, assist classification, and spot anomalies or risk patterns across disconnected tools. What it cannot do reliably is act as a deterministic engine for pricing, exact routing, or vendor choice without well-designed rules and clean data. That difference is crucial as translation cost drops and the overhead layer becomes a larger percentage of total spend. The big opportunity shifts to workflow orchestration, connectors, and the metadata that tells systems what something is and what should happen next.

From there, we get practical: start by identifying and defining your critical metadata, beginning with language codes that are often dangerously vague. We talk about tracking where PM and coordinator time is actually consumed, and we explore risk scoring as a metadata field that can route content to MT-only, MT plus AI review, or high-touch human workflows based on probability and consequence. We close with why organizations avoid metadata work (ownership fragmentation, overloaded teams, institutional inertia) and a simple approach to rank metadata categories by risk and variability so you can prioritize cleanup.

If this helped you rethink localization automation and AI orchestration, subscribe, share the episode with a teammate, and leave a quick review. What’s the messiest metadata problem you want to fix first?

Why Metadata Matters

Stephanie Harris-Yee

Hello, I'm Stephanie, and I'm here with another episode of Field Notes with Erik Vogt. In this episode we're going to be talking about one of the more, I don't know, obscure might be the word, topics: metadata. So, Erik, over to you.

The Billions Lost To Overhead

Stephanie Harris-Yee

I know that you said in the past that metadata is becoming one of the biggest untapped opportunities in localization. So what is really at stake here?

Erik Vogt

As we've talked about before, if you look at a $70 billion industry, and I'm just rounding here, and assume that roughly a tenth of it is spent on transactional overhead (project management, project coordination, the transaction layer as opposed to the value creation itself, which is often the human in the loop or the technology being deployed), that means there's something like $7 billion being spent on coordination work of some kind. Metadata becomes one of the most important pieces of the puzzle when we're trying to figure out how to make that $7 billion load more efficient. And if we could make that, let's say, 30% more efficient, we're talking about a $2 billion potential in our industry. So it's certainly nothing to sneeze at. It's absolutely worth paying attention to.
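
The arithmetic behind those figures can be written out in a few lines. This is only a back-of-the-envelope sketch using the rounded numbers quoted in the episode; the constant names are illustrative, and the 30% gain is the hypothetical improvement that yields the roughly $2 billion figure.

```python
# Rounded figures quoted in the episode, not measured data.
INDUSTRY_SIZE_USD = 70e9   # "a $70 billion industry"
OVERHEAD_SHARE = 0.10      # "roughly a tenth" goes to transactional overhead
EFFICIENCY_GAIN = 0.30     # hypothetical improvement in that overhead layer

overhead_usd = INDUSTRY_SIZE_USD * OVERHEAD_SHARE
savings_usd = overhead_usd * EFFICIENCY_GAIN

print(f"Coordination overhead: ${overhead_usd / 1e9:.1f}B")  # $7.0B
print(f"Potential savings:     ${savings_usd / 1e9:.1f}B")   # $2.1B
```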

Stephanie Harris-Yee

So then

Where AI Helps And Fails

Stephanie Harris-Yee

with that said, where does AI actually fit into that whole picture? So how does AI change the way we think about metadata?

Erik Vogt

Well, over my career I've watched several ERP systems either be deployed, fail, and get canceled entirely, or be stripped down so much that they delivered only a fraction of what they were designed to handle, so the data being managed never delivered the value people were looking for. It's different systems that don't talk to each other. We talked about connectors before. We talked about the complexity of these different pieces of information that are spread out all over the place. What is really going on? How can we really address this problem space? This is the complaint that is driving a lot of our inability to get more done faster. So how does AI play into this? Well, AI is good at some things and not very good at others. One of the things it's very good at is summarizing, but it's not deterministic. It's good at taking blobs of stuff and distilling them down, or taking a lot of ambiguity and bringing clarity to it, but it's not very good at directing things. Getting the exact pricing calculated for a particular task is very difficult for AI to do reliably. It's also hard to do routing. Almost all the discussion about AI in our industry is about how to make translation more efficient or more accurate. I'm talking about the layers outside of that. Yes, we are talking about making MT more efficient. Yes, we're talking about cleaning it up, automatic LQAs. All that's very important. That's the essential part of our business. But as the task cost goes down, the transactional friction as a percentage of total spend will go up. If we do things the same way, handling a two-hour task with the same project management overhead as a 20-hour task, we're going to end up with a massive mismatch in cost allocations.
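
The 20-hour versus two-hour comparison can be made concrete with a small calculation. This is a sketch under an assumption the episode does not state: that coordination takes a fixed two hours per request regardless of task size. The function name and that figure are illustrative only.

```python
def overhead_share(task_hours: float, pm_hours: float) -> float:
    """Fraction of total effort that goes to project management."""
    return pm_hours / (task_hours + pm_hours)

PM_HOURS = 2.0  # hypothetical fixed coordination effort per request

before = overhead_share(task_hours=20.0, pm_hours=PM_HOURS)
after = overhead_share(task_hours=2.0, pm_hours=PM_HOURS)

print(f"20-hour task: {before:.0%} overhead")  # 9%
print(f"2-hour task:  {after:.0%} overhead")   # 50%
```

The overhead itself has not grown; it simply dominates once the task it wraps gets cheaper.
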
So things like licensing costs and project management time become a bigger share. Now, where does AI fit into this? AI is often highlighted as this sort of magic thing, but we have to break down where exactly it fits into this ecosystem. What it really does well is summarization, as I was mentioning earlier. So when we think about that as a problem space, what are we actually summarizing that we can make more efficient? One question is what workflow a request should follow. Some systems are designed to look at an asset, summarize what it is, and then pick the right domain that it belongs to. Being able to correctly assess and then tag what this thing is can help us do the routing more efficiently. It can also handle things like looking for anomalies. It can identify risk patterns. It's good at that kind of thing, but it's not necessarily very good at saying which vendor or which MT model to use. It can be a facilitator in the classification step of a process, but you still need to build logic into the system to make that work. The orchestration needs data, and it needs access. AI can be a component that delivers part of that data or helps inform the orchestration model as to how to deliver certain output.
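
One way to picture that split is a classifier that only proposes a tag, with an explicit rule table making the routing decision. The sketch below is illustrative, not a description of any particular product: the domain labels, workflow names, and confidence threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Classification:
    domain: str        # label proposed by a classifier (AI or human)
    confidence: float  # 0.0 to 1.0, as reported by that classifier

# Hypothetical routing table: explicit rules owned by the localization team.
WORKFLOW_BY_DOMAIN = {
    "marketing": "transcreation",
    "legal": "human_translation_with_review",
    "support": "mt_with_post_editing",
}
MIN_CONFIDENCE = 0.8  # hypothetical threshold; below it a person confirms the tag

def route(c: Classification) -> str:
    """Pick a workflow from the rule table; the classifier never decides alone."""
    if c.confidence < MIN_CONFIDENCE or c.domain not in WORKFLOW_BY_DOMAIN:
        return "manual_triage"
    return WORKFLOW_BY_DOMAIN[c.domain]

print(route(Classification("legal", 0.93)))   # human_translation_with_review
print(route(Classification("legal", 0.55)))   # manual_triage
print(route(Classification("gaming", 0.99)))  # manual_triage
```

The same input always produces the same route, which is the deterministic behavior the AI step alone does not guarantee.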

Stephanie Harris-Yee

And so I'd imagine with that, cleaner metadata is better. So can you explain a little bit how cleaner metadata and AI lead to more meaningful orchestration and just not the automation hype that we hear?

Erik Vogt

So

Orchestration As Metadata In Motion

Erik Vogt

let's break this down. AI orchestration is metadata in motion. The metadata are containers of information about an object: some asset or some segment or whatever. And the metadata can also be a set of rules. The AI can be an interpreter of that and can help understand it. And then the orchestration is the execution of that. So to build this up, you use metadata to build a map, and that's all the instructions under the hood. AI can help you navigate, and orchestration is really the driving of the car through that system. We could also think about this with a metaphor from computer vision, where the data is a bunch of numbers about how far away objects are from the vehicle's sensors, the AI synthesizes all that information into meaningful recommendations, and the orchestration is the actual choice to steer the wheel to the left or to the right to avoid the fire hydrant, or to stop before the car in front of you stops. If you have bad metadata, then AI has the wrong information upon which to make decisions. It can misclassify the content, it can pick the wrong MT model, it creates work, it creates risk, and it basically automates the wrong steps, because it automates based on the wrong information. On the opposite side, good metadata plus AI can help drive self-improving workflows. These are places where AI can help enrich data. In fact, you can use AI to help enrich the metadata, with supervision, obviously. It can help identify missing or inconsistent metadata. So that's an information layer where we're using AI to help find where our systems lack the structured data they need. And when I'm talking about structured data, I'd like to expand the bubble a little bit, because it isn't just the TM. The TM is structured data: source equals target, and then there might be some product information, who worked on it, and so on.
All that stuff we're used to analyzing. But we also have other data about, say, the thing we're translating about. We can make a knowledge graph that says this product has these characteristics, so when translating, make sure that references to this product agree with this set of structured information about it. There's so much potential here in the tools that are developing data lakes, or data that informs the translation. It's more than just segment A equals B. It's all this other information that could potentially enrich that translation layer. But AI can also help with the actual meta layer, the layer about the workflow and how the different tools are talking to each other.
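
The "missing or inconsistent metadata" check mentioned in this answer has a purely mechanical part that needs no AI at all. As a rough sketch, assuming two systems hold a record for the same project, it could look like the following; the field names, sample records, and required-field list are hypothetical.

```python
from __future__ import annotations

# Hypothetical list of fields every request record is expected to carry.
REQUIRED_FIELDS = ("project_id", "source_locale", "target_locale", "domain")

def missing_fields(record: dict) -> list[str]:
    """Required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

def inconsistent_fields(a: dict, b: dict) -> list[str]:
    """Fields that both records fill in but disagree on."""
    return sorted(k for k in a.keys() & b.keys() if a[k] and b[k] and a[k] != b[k])

# The same project as seen by two disconnected tools (made-up sample data).
tms_record = {"project_id": "P-1001", "source_locale": "en-US",
              "target_locale": "fr-CA", "domain": ""}
cms_record = {"project_id": "P-1001", "source_locale": "en-US",
              "target_locale": "fr-FR", "domain": "support"}

print(missing_fields(tms_record))                   # ['domain']
print(inconsistent_fields(tms_record, cms_record))  # ['target_locale']
```

An AI-assisted step would sit on top of a check like this, for example by proposing a value for the empty `domain` field for a person to confirm.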

Stephanie Harris-Yee

Okay, this

Practical Metadata Hygiene That Works

Stephanie Harris-Yee

might be kind of a two-part question, but first, what does this look like in practice if you're actually going to go about this? And then, what should people realistically expect when they focus on metadata and metadata hygiene with AI-assisted classification?

Erik Vogt

So, step one is to identify your metadata. Any taxonomist, or anyone structuring information, needs to identify what it is that needs to be structured. A super simple layer is the language ID. Our industry already runs into this problem: a very simple metadata structure where we don't even know exactly what we mean by the language codes. Many times localization teams will start off with "we want ES, IT, FR, and DE," and we sort of intuitively know what those are in general. Next question: French for Canada, French for France, French for New Guinea? Spanish for Latin America, or just for Puerto Rico? We've had requests for very specific variants. All versions of Spanish aren't the same. So if everybody started off with structured metadata, understanding that we need to be precise about what we're classifying, we could start to have a much better conversation about what this actually means. But there are also other types of metadata that we should be talking about, like where our PM time is being consumed. Coordinator time, project management time, localization engineering time: these are transactional tasks that require a lot of human labor to execute certain steps, and we should be able to track that internal overhead. In order to measure those, you have to decide that they matter, then you measure them, and then you can start optimizing them. There are other things. A presentation I just saw at Slator yesterday was talking about risk. That's another layer of metadata that I think we as an industry could do a lot with. How risky is it that this is wrong? What are the consequences of failure? In this presentation by SAP, they classified risk in terms of consequence and probability. And that's how risk managers think about things. So, what if all of our projects had a risk value associated with them? And we used that to route things.
Like low risk goes to MT only, for example, or medium risk might go through MT plus AI review, with maybe superficial human monitoring of that system. And high risk, high touch, would not be touched by MT at all, but be 100% hands-on. So risk, that's metadata. We'd expect to be able to make better decisions with better data about these systems.
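
The two metadata fields discussed in this answer, a precise language variant and a risk value, can be sketched together as a small routing function. Everything here is an illustrative assumption rather than the SAP method referenced above: the 1-to-3 scales, the thresholds, the workflow names, and the use of BCP 47 style tags such as `fr-CA` or `es-419`. The specificity check is deliberately simplistic and only tests that a tag carries more than a bare language code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    target_locale: str  # e.g. "fr-CA" or "es-419", not just "fr" or "es"
    probability: int    # 1 (an error is unlikely) to 3 (an error is likely)
    consequence: int    # 1 (minor impact) to 3 (severe impact)

def is_specific(locale: str) -> bool:
    """True when the tag has at least one non-empty subtag after the language."""
    parts = locale.split("-")
    return len(parts) >= 2 and all(parts)

def risk_score(r: Request) -> int:
    """Probability times consequence; ranges from 1 to 9 on these scales."""
    return r.probability * r.consequence

def workflow(r: Request) -> str:
    if not is_specific(r.target_locale):
        return "clarify_locale"      # a vague language code blocks routing
    score = risk_score(r)
    if score <= 2:
        return "mt_only"
    if score <= 4:
        return "mt_plus_ai_review"
    return "human_only"

print(workflow(Request("fr", 1, 1)))      # clarify_locale
print(workflow(Request("fr-CA", 1, 2)))   # mt_only
print(workflow(Request("es-419", 2, 2)))  # mt_plus_ai_review
print(workflow(Request("es-PR", 3, 3)))   # human_only
```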

Why Companies Avoid The Work

Stephanie Harris-Yee

So why don't more companies focus on this? It seems like the upside is very large. So what's holding people back?

Erik Vogt

Yeah, I've seen several ERP implementations fail, largely because the amount of data we have in our industry is massive and complicated and difficult. And I think there's an ergonomic element to that. How do people interact with this data? How do we collect it? How do we validate that it's real? It's also tedious, it's invisible, and there's ownership fragmentation. Lots of organizations don't really have a core owner of this. Maybe there's a company with a chief information officer, but many of us don't have the luxury of a chief information officer who can say, "Hey, team A, I need this and this from you. Team B, I need this and this from you," so that all of this fits into an architecture that works together. Also, let's be honest, PMs are overloaded. I think they're generally scheduled at 110 to 140% of their capacity. So they're usually working long hours and dealing with a lot of uncertainty. They don't have time to really think about this metadata structurally, and they're also reacting to the other systems that impose metadata on them: many different TMSs, many different CMSs, many different LMSs. With all these different systems that I've talked about before, the complexity makes it really hard to take a step back and say, what is the system we can organize this information with? And I think there's a kind of fear of breaking things. There's institutional amnesia and institutional inertia, and both tend to slow this down. I remember once I inherited two different project teams who were previously competitors with each other. It's funny, the two teams were working for the exact same client. I mean, literally the exact same subdivision of the exact same client. They had totally different coding mechanisms, totally different units of measurement, totally different ways of structuring things.
Reconciling this took a couple of years, because we had to slowly unpack it. You could take one system and impose it on the other, but very often both teams have some influence and they resist giving up their preferred methodology. So it ends up as a war of attrition internally, as teams struggle to hold on to the system they're comfortable with at the expense of the bigger picture. So, yeah, there are a lot of things that make metadata much harder to deal with. It's hard for businesses to grapple with this. It's non-trivial.

A Simple Plan To Start

Erik Vogt

However, AI is giving us this interesting new way of looking at things. I used AI to create a matrix of risk versus variability. I was looking at all the different ways we can think about metadata, and then I asked it to classify where the variability is: where is the biggest chance that metadata will mismatch? Language codes, for example: that's catastrophic. If you get the wrong language code map, you get nowhere. Project ID is also very critical. There are others that can be a little sloppier and maybe a little less risky. Anyway, my recommendation for anybody who's interested is to take a step back, think about what matters, look at the metadata, maybe do an analysis of how important each of these metadata categories is, use AI to help find it or plan for how to collect it or clean it up, and then think about how this could be used in a structural way, either to enhance your operational throughput or to improve the product of your output, the actual translation itself, which I think most of our industry is talking a lot about. I'm here to raise the flag on a whole bunch of hidden stuff that we tend to be fretting about in the background, but not necessarily surfacing as a real business problem.
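
A risk-versus-variability matrix like the one described can be reduced to a simple ranking. The sketch below is not the matrix from the episode: the categories beyond language codes and project ID, the 1-to-5 scores, and the choice to multiply the two dimensions are all hypothetical, and a real team would score categories from its own incident history.

```python
# Hypothetical 1-5 scores per metadata category.
CATEGORIES = {
    "language_code":  {"risk": 5, "variability": 5},
    "project_id":     {"risk": 5, "variability": 3},
    "content_domain": {"risk": 3, "variability": 4},
    "requester_name": {"risk": 1, "variability": 2},
}

def priority(scores: dict) -> int:
    """Cleanup priority: high risk combined with high variability comes first."""
    return scores["risk"] * scores["variability"]

ranked = sorted(CATEGORIES, key=lambda name: priority(CATEGORIES[name]), reverse=True)

for name in ranked:
    print(f"{name:15} {priority(CATEGORIES[name])}")
```

With these made-up scores, language codes land at the top of the cleanup list and free-text fields such as a requester name at the bottom, which matches the ordering suggested in the answer above.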

Stephanie Harris-Yee

Thank you again, Erik. And I think this has been a good episode. So we'll see you next time.

Erik Vogt

Thanks, Steph.