Concerns from adserver perspective #20

Closed
janwinkler opened this issue Aug 16, 2019 · 3 comments

@janwinkler

Hello everyone,

As an adserving company, we develop software that allows our clients to track conversions. We understand that privacy is important and would therefore like to contribute to this initiative. However, the proposal does not seem to cover the current market situation on how tracking is done. Here are the main issues we see with the proposal:

  1. In the example it is outlined that a website would trigger the click to a shop and the browser would later report the conversion back to the website. That is not how the advertising industry works. In a normal advertising scenario there will be a construct like:
  • Website implements adcode of its adserver
  • AdServer delivers creative code from SSP
  • SSP delivers creative code from DSP
  • DSP delivers creative code from Agency Adserver
  • Agency AdServer delivers creative code from advertiser adserver
  • Advertiser adserver delivers the actual ad
    In short: In order to display an ad there will always be (much) more than only two parties involved. In most cases, the website itself doesn't care about the conversions happening. In most cases it will be the agency and the advertiser, sometimes the DSP, who want to track the conversion.
    Hence: The attribution needs to be able to cover more than one conversion target.
  2. In order to track a conversion, at least two pieces of information are always necessary: a) where the click happened (the placement on the website, e.g. whether it is a banner at the top of the page, left side, right side, native ad, ...) and b) which creative was clicked (in most cases a campaign includes many creatives). In most cases a third identifier is necessary as well (e.g. for the advertiser to identify which website the click was on).
    Hence: Having only one campaign identifier is not sufficient for tracking. It would only tell which campaign triggered the conversion, but not which creative, which placement or which website.

  3. In most cases adtech companies work with incrementing numbers for all identifiers. This means that a normal adtech solution uses IDs that are much higher than 64. Limiting the identifier to 64 values is not sufficient for tracking. We recommend increasing the limit to at least 65k (16 bits). We understand that numbers this high are a potential privacy risk and are open to discussing lower numbers. However, a limit of 6 bits is far too little for any tracking logic, as most adtech companies deal with thousands of live campaigns, websites, placements and creatives.

  4. Conversion tracking is not only limited to clicks but also to impressions. It is important to be able to track whether a user who saw an ad later converted into a sale (although the user did not click on the ad). In many cases we see that users see an ad and will then later search for the brand name and buy something. Advertisers need to be able to attribute this sale to the impression.

  5. Using HTML attributes seems inflexible and involves many changes on the publisher side. At the same time it does not allow for click conversions via JavaScript and multiple redirects.

  6. In order to prevent conversion fraud, it is necessary for the advertiser to report order numbers or similar identifiers to its reporting system so it can later check which orders were cancelled by the buyer and which were paid.

  7. In some cases it is essential for an advertiser to count only one conversion; in other cases it is essential to count all conversions of the same user. Hence it should not be up to the browser to decide which conversion is fired and which is not.

  8. It is essential for the advertiser to understand what kind of conversion happened, e.g. whether it was a sale, a form submission or any other event. Therefore the conversion needs a "type" or some kind of identifier telling the advertiser about the conversion itself.
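
To make the identifier-size argument above concrete, here is a minimal arithmetic sketch. The helper names bits_needed and id_space are made up for illustration, and the figure of 5,000 live creatives is only an assumed example, not a number taken from any adserver.

```python
def bits_needed(distinct_ids: int) -> int:
    """Smallest number of bits that can represent `distinct_ids` different values."""
    if distinct_ids < 1:
        raise ValueError("need at least one ID")
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return max(1, (distinct_ids - 1).bit_length())


def id_space(bits: int) -> int:
    """Number of distinct values that fit into `bits` bits."""
    return 1 << bits


print(id_space(6))        # 64 values with a 6-bit identifier
print(id_space(16))       # 65536 values with the suggested 16 bits
print(bits_needed(5000))  # 13 bits for an assumed 5,000 live creatives
```

The same calculation applies to each identifier separately (campaign, creative, placement, website), which is why the required bits add up quickly.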

Our proposal:
In order to cover the above issues, we propose to go a different direction:

  • All links, redirects and tracking mechanisms stay in place as they are. Publishers, marketers and advertisers would not need to change any codes.
  • Browsers can decide to block cookies and similar technologies in order to prevent user-identifiable tracking.
    In order to enable tracking, we propose a combination of HTML and custom HTTP headers:
  1. Once a click happens, the user is redirected through multiple adservers until he reaches the shop.
  2. Each adserver can pass a number of custom headers with the HTTP redirect. For example:
    x-tracking-campaign: 12345
    x-tracking-creative: 52736
    x-tracking-placement: 59999
    x-tracking-info: 72517
  3. The browser will save this tracking information along with the domain of the server that sent these headers. It is essential that the browser stores all tracking info for all campaign IDs for a certain amount of time (7 days).
    Note: It should also be possible for an adserver to simply fire a pixel that, when loaded by the browser, sends these HTTP headers. This way the adserver can enable post-view conversion tracking in case no click happens. In addition, there should be a way for the adserver to tell the browser how view data vs. click data is treated (e.g. whether a click is more important than an impression, whether the first tracked event is more important than the last, or whether the last event is more important than the first regardless of the type, ...).
  4. Each time a conversion happens on the page, the conversion will be measured via 1x1 pixels. These pixels will contain additional HTML attributes (e.g. ad-tracking-campaign="12345" ad-track-conversion-type="7261").
  5. The browser will see that there is a pixel in the page that contains the ad-tracking-campaign attribute and can see the domain of the pixel that should be loaded.
  6. The browser will block the pixel on the site in order to prevent direct association of IP + time with the click that happened before. Instead, the browser will add a conversion event to its list and fire it later (after 24-48 hours). The list will typically contain the domain, campaign ID, conversion type and the data from the click or impression event that caused the match.
  7. Once the conversion event is due to be fired, the browser will call the domain of the pixel that was blocked before and send the HTTP headers associated with the campaign ID 12345.
    Example: If the pixel was <img src="https://www.myadserver.com/abc/def/something" ad-tracking-campaign="12345" ad-track-conversion-type="7261" ..>, the browser will send the following HTTP GET to www.myadserver.com via HTTPS:
    GET /.well-known/ad-tracking/conversion-7261
    x-tracking-campaign: 12345
    x-tracking-creative: 52736
    x-tracking-placement: 59999
    x-tracking-info: 72517
    (Alternative: all IDs could be sent via the HTTP request URL.)
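
The steps above can be sketched roughly as follows. This is only an illustration of the proposed flow, not an existing browser API: the TrackingStore class, its method names and the dictionary used to describe the outgoing request are all hypothetical, and the 24-48 hour delay and the click-vs-impression priority rules are left out.

```python
from dataclasses import dataclass, field

RETENTION_SECONDS = 7 * 24 * 3600  # step 3: keep tracking info for 7 days


@dataclass
class TrackingStore:
    """Hypothetical browser-side store for the x-tracking-* headers."""
    # (adserver domain, campaign id) -> (x-tracking-* headers, time stored)
    events: dict = field(default_factory=dict)

    def record(self, domain: str, headers: dict, now: float) -> None:
        """Steps 2-3: keep the x-tracking-* headers sent by `domain`."""
        tracked = {
            name.lower(): value
            for name, value in headers.items()
            if name.lower().startswith("x-tracking-")
        }
        campaign = tracked.get("x-tracking-campaign")
        if campaign is not None:
            self.events[(domain, campaign)] = (tracked, now)

    def conversion_request(self, domain, campaign, conversion_type, now):
        """Step 7: describe the report for a blocked pixel, or return None."""
        entry = self.events.get((domain, campaign))
        if entry is None:
            return None  # no stored click or impression for this campaign
        tracked, stored_at = entry
        if now - stored_at > RETENTION_SECONDS:
            return None  # stored event is older than the retention window
        return {
            "method": "GET",
            "url": f"https://{domain}/.well-known/ad-tracking/conversion-{conversion_type}",
            "headers": dict(tracked),
        }


store = TrackingStore()
store.record(
    "www.myadserver.com",
    {
        "x-tracking-campaign": "12345",
        "x-tracking-creative": "52736",
        "x-tracking-placement": "59999",
        "x-tracking-info": "72517",
        "Location": "https://shop.example/",  # ordinary redirect header, ignored
    },
    now=0,
)
request = store.conversion_request("www.myadserver.com", "12345", "7261", now=3600)
print(request["url"])
```

Keying the store on the pair of domain and campaign ID mirrors step 3, where the tracking information is saved together with the domain of the server that sent it.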

The advantage of this proposal is that it would be fully compatible with all browsers, whether they support it or not: A browser that does not support this feature will still fire the normal pixels, and tracking will still work as before. A browser that supports the feature will block the tracking pixels and use the privacy-tracking logic.

Additional privacy considerations:
In order to prevent fingerprinting, we recommend limiting the use of subdomains: An adserver could use the subdomain of a short-term conversion to transfer user identification, e.g. if the click URL includes the domain user8122531.ads.adserver.com. On the conversion page, the adserver would typically use a script to write the tracking pixel into the page. The adserver could associate the IP of the user with the domain, and the script could then write a tracking pixel with the same domain into the page. A later call to this tracking pixel would still reveal the user's full ID and must therefore be blocked to preserve privacy. Hence we recommend allowing only a certain number of characters per subdomain, or even whitelisting only certain subdomains (in this case ads.adserver.com).
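
A minimal sketch of such a subdomain check, under stated assumptions: the function names, the limit of 4 characters per label and the whitelist contents are made up for illustration, and the registrable domain (adserver.com) is passed in by the caller rather than derived from a public suffix list.

```python
def subdomain_labels(host: str, registrable_domain: str) -> list:
    """Return the labels of `host` that sit in front of `registrable_domain`."""
    host = host.lower().rstrip(".")
    base = registrable_domain.lower()
    if host == base:
        return []
    if not host.endswith("." + base):
        raise ValueError(f"{host} is not under {base}")
    return host[: -len(base) - 1].split(".")


def is_allowed_tracking_host(host, registrable_domain,
                             whitelist=frozenset(), max_label_len=4):
    """Allow whitelisted hosts, or hosts whose subdomain labels are all short."""
    if host.lower() in whitelist:
        return True
    labels = subdomain_labels(host, registrable_domain)
    return all(len(label) <= max_label_len for label in labels)


print(is_allowed_tracking_host("ads.adserver.com", "adserver.com"))              # True
print(is_allowed_tracking_host("user8122531.ads.adserver.com", "adserver.com"))  # False
```

A pure whitelist is the stricter of the two options, since a length limit still leaves a few characters per label that a server could vary.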

@johnwilander
Collaborator

johnwilander commented Aug 20, 2019

Hi and thanks for filing your concerns!

First, there are reasons for not sending attribution data to third parties:

  • The user perspective. Users need to have a reasonable chance of understanding with whom data about their activities on the web is shared, even if there are privacy-preserving protections in place. Users don't know about the numerous third parties that are involved in online ads. What they do know is that they visited news.example or search.example and clicked/tapped an ad there to go to shop.example.
  • First party control. We've already discussed in "Allow PCM metadata in nested iframes to allow third-party serving of ads" (#7) that third parties should be able to provide the link metadata adDestination and adCampaignID. If we were also to send attribution data to third parties, first parties would have no control over who claims what on their website. Even worse, if third parties were abusing PCM, first parties wouldn't have a way to detect it. All the data would flow to other players. We want first parties to be in control of attribution. In addition, first parties should be able to make business deals to have their attribution data analyzed. If they never see the data, they can't.

Second, addressing some of the things you bring up:

  • The goal of PCM is not to "cover the current market situation on how tracking is done." In fact, it is intended to change the current situation for the better.
  • If "the website itself doesn't care about the conversions happening," the website here being the click source, we'd rather explore the briefly mentioned JavaScript API with which attribution can be sent directly to the click destination. That is the website that should care.
  • "Having only one campaign identifier is not sufficient for tracking." We are not trying to reinvent tracking at the level it happens in browsers without privacy protections today. We are setting the limit to 12 bits. What goes into the 4 to 8 bits of campaignID (see the 4+4+4 bit budget in "Let browsers have different privacy settings", #11) is up to the click source and destination to decide. But there's not going to be more entropy to spend, whatever we call the parameter.
  • "Conversion tracking is not only limited to clicks but also to impressions." We know. 🙂 This proposed standard deals with clicks. A future one may deal with impressions.
  • The HTTP headers you bring up – x-tracking-campaign, x-tracking-creative, x-tracking-placement, and x-tracking-info – seem ripe for cross-site tracking abuse. If that much data is associated with a click on the web, how do we make sure that the data is not doctored to individually identify the user?
  • "The browser will block the pixel on the site" – how is this possible without collateral real image and ad blocking? How can the browser know what is a tracking pixel and what is an image without making the request?
  • The proposal as it stands already trims subdomains.

@janwinkler
Author

@johnwilander
While I understand most of your points from a data protection perspective, I don't see that developing a mechanism that is designed to be "too hard/strict/inflexible" will get any market adoption. Instead, marketers will search for and find other ways to track the same data they are already tracking (e.g. server-side first-party tracking, which browsers are not able to block). Trying to change a multi-billion-dollar industry can only work if the alternative still provides the same minimum of flexibility/features that marketers already have. The ONLY way to get to more privacy is to provide marketers with tools they can use to get the same/similar result as before, but with the benefit of "integrated" data protection.

Regarding the things you mention:

  1. User perspective: Yes the user does not understand that there are 50 other parties on the website. But if these parties do not process any personal data, the user doesn't care about them. The only concern for the user is when his/her personal data is processed and that can be limited.

  2. First party control: As written before, that is not how it works. Several reasons:
    Besides the really big players (Google, Facebook, ...), NO publisher has direct contact with its advertisers; there is always some third party in between. Just imagine the biggest advertising network, Google AdSense, as an example:
    a) It is just not possible for every publisher to know every advertiser (there are too many advertisers).
    b) Same for advertisers: it is just not possible for an advertiser to work with publishers directly (there are too many publishers).
    c) It is not practical for a publisher to have direct links to an advertiser (publishers want ads to rotate, campaigns to start/stop automatically, volume and frequency capping to apply, etc.). There is always a need for some tech in between (adserver, SSP, ad network, ...).
    Hence third-party tracking is essential.

  3. More identifiers / 12 bits / Entropy / HTTP headers: All basically touch the same issue, "how much data is needed vs. how much data is possible without being able to find out which user it was". As written before, 12 bits are not sufficient for tracking the necessary data and would therefore push marketers to use other ways to get the same old data. As a bare minimum we see 2x12 bits, allowing marketers to have 4k active creatives on 4k active placements.

  4. "The browser will block the pixel on the site" – how is this possible without collateral real image and ad blocking? How can the browser know what is a tracking pixel and what is an image without making the request? --> This is set with the HTML attribute ad-tracking-campaign="...". Hence the browser does not need to fire the pixel but only needs to take the content of the attributes and save it along with the domain.

  5. Subdomain: Unfortunately I can't find anything about it. Can you point me to the corresponding section?
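
To illustrate the 2x12 bits mentioned in point 3, here is a small sketch of how two 12-bit identifiers could share one 24-bit value. The function names and the choice of putting the creative ID in the upper bits are assumptions for illustration only.

```python
FIELD_BITS = 12
FIELD_MASK = (1 << FIELD_BITS) - 1  # 4095, the largest 12-bit value


def pack_ids(creative_id: int, placement_id: int) -> int:
    """Combine two 12-bit IDs into one 24-bit value (creative in the upper bits)."""
    for value in (creative_id, placement_id):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each ID must fit into 12 bits (0..4095)")
    return (creative_id << FIELD_BITS) | placement_id


def unpack_ids(packed: int) -> tuple:
    """Split a 24-bit value back into (creative_id, placement_id)."""
    return packed >> FIELD_BITS, packed & FIELD_MASK


print(1 << FIELD_BITS)                  # 4096 creatives or placements each
print(1 << (2 * FIELD_BITS))            # 16777216 creative/placement combinations
print(unpack_ids(pack_ids(1234, 567)))  # (1234, 567)
```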

Best regards,
Jan

@johnwilander
Collaborator

Thank you for sharing your thoughts! I'm sorry I didn't get back to you earlier. However, this issue touches on a large number of concerns. Please file individual concerns for consideration.
