I have spent months, if not years, preparing contextual data ahead of 3rd party cookie deprecation. The crunch is on and I've ramped up efforts to map first party contextual signals to the main taxonomy adopted across my programmatic service providers: the IAB Content Taxonomy.
The IAB Content Taxonomy is a mashup of segments that span all kinds of content classifications:
Classic editorial segments like "Pets" and "Pop Culture"
News segments like "Elections" or "Celebrity Deaths"
In-market intent for services like "Roadside Assistance" and "Internet Service Providers"
Products like "Party Supplies and Decorations" and "Deodorant and Antiperspirant"
Ad hoc industry rollups like "Advertising Industry" and "Hospitality Industry"
To say the least, it covers a lot of ground but not with consistent depth or accuracy compared to other taxonomy standards. A bit of superficial segment analysis easily reveals this framework's coverage gaps and biases:
Video game segments represent 24 of the 705 categories in Content Taxonomy 3.0. That translates to about 3.4% of all segments.
Real Estate should offer a huge menu of advertiser alignment and value but represents just 12 segments.
Law has no granularity beyond the parent category, yet there are many areas of law that could reveal information about a user's social, economic and political interests.
A personal beef of mine: there is no detail in the IAB Taxonomy segment for Farming topics (haha see what I did there...). But seriously, you are out of luck if you want to categorize beyond "Agriculture."
Animals are only covered if they are pets. More generally, reference publishing is hugely underrepresented, which is ironic given its long history of organizing and classifying information.
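The share figures above are simple arithmetic on segment counts. Here is a minimal sketch using only the two counts and the total quoted in this post; the helper name `segment_share` and the dictionary labels are my own, not anything published by the IAB.

```python
def segment_share(segment_count: int, total_segments: int) -> float:
    """Return the percentage of the taxonomy taken up by one group of segments."""
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    return 100.0 * segment_count / total_segments


# Total category count for Content Taxonomy 3.0, as quoted in this post.
TOTAL_SEGMENTS = 705

# Segment counts quoted in this post (labels are illustrative, not official names).
quoted_counts = {"video game segments": 24, "real estate segments": 12}

for label, count in quoted_counts.items():
    share = segment_share(count, TOTAL_SEGMENTS)
    print(f"{label}: {count} of {TOTAL_SEGMENTS} ({share:.1f}%)")
```

Running the same calculation over every top-level branch of the taxonomy would show at a glance where the depth is and where it isn't.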
I suspect a video game developer with a network of apps across a range of categories uses granular genre descriptors to classify its portfolio. I also expect a real estate site needs a heck of a lot more to classify its content than "Houses" vs "Apartments." In fact, their user journey starts with intent (buying vs renting) and potentially delves into different property sub-genres distinguished as commercial (high value business context) vs residential (tangential home ownership interests). Real estate apps are not the only type of real estate content online. For example, HowStuffWorks explains Escrow and the Closing Process. This type of content isn't represented in the IAB taxonomy at all; nor is brokering or real estate law.
And yet roughly 3.4% of the taxonomy is dedicated to video game genres...
As publishers, we have a problem. In the current state, our first party data signals are doomed to fail as open market signals - because we don't have a taxonomy that represents the diversity of modern publishing or user interests. To be successful, contextual targeting has to deliver performance for advertisers which means aligning ads with relevant content based on explicit and implied user interests that are classified by a comprehensive, granular taxonomy.
So what happens next:
We all create our own proprietary taxonomies and sell campaigns targeted to our first party data? Don't we wish we had enough scale and relevance to sell 100% of our inventory based on context...
Publisher provided signals fail for lack of coverage and accuracy and Topics emerges as the industry dominant user interest signal - there goes any hope we had for Safari and Firefox.
3rd parties charge us to classify and package our inventory on our behalf even though we should be able to do it ourselves with more accuracy and efficiency than a vendor. Don't even talk to me about scraping publisher data just to sell it back to the publisher...
Publishers didn't self-select an industry taxonomy for their content - in typical ad tech fashion, our vendors chose for us. I doubt there are many actively maintained, openly published taxonomies that cover the breadth of ad supported content. I have personally asked vendors to look at the AdWords taxonomy because it is one of the most heavily invested taxonomies in the world and it is already used at scale to align advertising with content. Whether or not you like where it comes from, it's undeniable that it's robust and is doing the job our industry needs. Ultimately, I suspect most vendors building proprietary taxonomies are in fact secretly co-opting the AdWords taxonomy.
The industry hails contextual as the most important, large scale data solution post cookiepocalypse. I'm afraid that, in spite of all the hopeful discourse, none of our service providers actually believe publisher first party contextual signals will be adopted by advertisers. They aren't incentivized to help us help ourselves on this front, which includes giving feedback to the IAB on what a successful foundation should look like. Ultimately, publishers are in the middle of another service race and I expect we'll all pay in one form or another for 2nd and 3rd party contextual services (e.g. ChatGPT based article scans mapped to some "proprietary" variation of the AdWords taxonomy) - a ridiculous cost given the experts in content creation have grouped and labelled their own content since the beginning of the internet.
Maybe not all contextual is "doomed" but I think once again we are failing as an industry on data democratization, transparency and efficiency - all because we don't have the right words to communicate with specificity what we are publishing. That seems like an easy problem to collectively solve and something we as a publishing community should own. Imagine a world where 3rd party data services were actually held accountable to a publisher's standard of information. Advertisers should be able to do that from the bid stream just by comparing a 3rd party classification to the publisher contextual signal. At the very least, this type of analysis would expose contextual fraud on either side of the fence. Ideally, it would deliver confidence in publisher signal quality and performance and, over time, eliminate 3rd party fees. To realize these goals, we must have a taxonomy as deep and descriptive as our competitors'.
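To make the bid stream comparison concrete, here is a rough sketch of what that analysis could look like. It assumes both the publisher signal and the 3rd party classification have already been mapped to the same taxonomy labels, and it scores agreement with Jaccard similarity, which is my choice of metric rather than any industry standard. The `Impression` shape, the function names, the 0.5 threshold and the example URLs are all hypothetical; this is not an OpenRTB parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One bid request reduced to the two signals being compared (hypothetical shape)."""
    url: str
    publisher_categories: frozenset  # labels declared by the publisher
    vendor_categories: frozenset     # labels assigned by a 3rd party classifier


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared labels divided by all labels; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_disagreements(impressions, threshold=0.5):
    """Return (url, score) pairs whose agreement score falls below the threshold."""
    flagged = []
    for imp in impressions:
        score = jaccard(imp.publisher_categories, imp.vendor_categories)
        if score < threshold:
            flagged.append((imp.url, score))
    return flagged


# Made-up sample data for illustration only.
sample = [
    Impression("https://example.com/escrow-explained",
               frozenset({"Real Estate"}), frozenset({"Real Estate"})),
    Impression("https://example.com/farm-equipment",
               frozenset({"Agriculture"}), frozenset({"Video Gaming"})),
]

for url, score in flag_disagreements(sample):
    print(f"{url}: agreement {score:.2f}")
```

A low score doesn't say which side is wrong, only that the two signals disagree and someone should look. Note that a coarse taxonomy inflates agreement: if everything farming-related can only be labelled "Agriculture," both sides will match even when neither describes the page well.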
I don't know what the next steps are to do better, but I think this issue is important. I'm willing to put some work behind it, but I alone can't come up with the words to categorize our entire publishing industry, nor am I the right person. I'm not a taxonomist or a leader in information science. I do know programmatic and I know the open market will need stronger signals than what we are able to send today...