Transcript: CSPS Virtual Café Series: Innovative Approaches to Collecting Real-Time Sentiment Data
[The CSPS logo appears onscreen alongside text that reads "CSPS Virtual Café Series" in English and French.]
[The screen fades to Kyle Burns in a video chat panel.]
Kyle Burns: Hello and welcome to the Canada School of Public Service Virtual Café Series. This is a series that features leading academic experts, non-governmental leaders, media experts, and public servants who share and debate their ideas and perspectives on a whole range of economic, social, and technological topics. For this session, we will be hearing about innovative approaches to collecting real-time sentiment data. My name is Kyle Burns and I'm the Director General of Public Sector Innovation at the Canada School of Public Service, and I'm pleased to be here today to moderate this event. Please note that today's session is in English; however, simultaneous interpretation is available.
Before we begin, I'd like to acknowledge that I'm joining you today from Ottawa, which is situated on the traditional unceded territory of the Anishinaabe people. While participating in this virtual event, I would ask that we recognize that we all work in different places, and that we therefore work in different traditional Indigenous territories. Additionally, before we get going, I'll just mention a few housekeeping items. The first is, to make your viewing experience better, we encourage you to disconnect from your VPN if possible, and reconnect to the event. Please note that we do have simultaneous interpretation and CART services available to you for this event. Please refer to the reminder e-mail that you received from the School or visit our vExpo to learn how to access these features. And I'll also note that we'll be taking questions throughout the event via the Collaborate chat platform. And it's really simple to submit questions. You simply click on the bubble icon at the top right-hand of your screen. You won't see your questions appear in the chat, but we will be receiving them and we'll try to get to as many of them as time will permit today. We encourage you to participate in the language of your choice.
So, now I'm pleased to introduce our guest, Danielle Goldfarb. In baseball, Danielle would be described as a five-tool player. She's extraordinary. She is an accomplished and innovative executive advisor, analyst, economist, and educator. She's known for her ability to lead and develop innovative research and analysis on topics related to public policy, technology, alternative data, global trade, and the global economy. Danielle has been a public policy researcher for over two decades, and she's currently the VP of Public Policy, Global Affairs and Economics at RIWI, and RIWI stands for Real-Time Interactive Worldwide Intelligence.
[Danielle Goldfarb appears in a separate video chat panel.]
Danielle, I'd like to extend to you a very warm welcome. Thank you for joining us today.
Danielle Goldfarb: Thanks so much, Kyle. It's great to be here and to speak to everybody who's attending. I'm really excited to talk to you today a bit about the fact that we've all gone online, our society has digitized and our economies have digitized. And with that, we've seen this explosion of global data, whether that's text data, satellite data, or mobility data. This presents huge opportunities for policymakers to seize these data at the same time as it presents major challenges. So, I'm really excited to tell you about one of the tools and technologies that exists, which I've had the privilege of working with over the last five years, that leverages the fact that people are online and uses that to create real-time and more inclusive datasets. During the pandemic, we've really seen an acceleration of some of these new datasets and tools.
And so, I'm going to dive right in and share a bit more with you about the big picture, the landscape we're facing, and kind of what we're seeing from RIWI's perspective as we explore the uses of this technology. Maybe I'll just mention that RIWI is a Canadian technology developed at the University of Toronto, now used around the world in every country except for North Korea. We're not able to use it in North Korea, no Internet. But we're using it all around the world, including in Canada, and I'll share some examples as we go along. So, maybe I'll go to the next slide now.
[A slide is shown with the title "Harnessing the Digital Data Revolution for Real-time and More Inclusive Data" in English and French.]
And I wanted to start by sharing with you a little bit about what we use in terms of our legacy data, the data that we use as our information anchors. If we can move to the next slide, it'll show you there. Okay, maybe I'll just speak to it until we get there. Basically, with the data that we have right now, we're very lucky in Canada. We've got a great statistical agency. It provides our information anchors.
[A slide is shown displaying a list of policy announcements from May 2023 chronologically. The release dates and reference periods are listed beside each one.]
So, you'll see in this slide, our data comes out.
[Text appears above the chart that reads "Data lags reality, challenging ability to make timely interventions" in English and French.]
These are the dates that data was released in May 2023. But if you look at the reference periods, they are all from several weeks or months earlier. And we saw during the pandemic that when policy changes are really rapid, you need more timely measures to be able to make timely policy interventions. So, that's one of the challenges with our existing data.
Now, if we go to the next slide, we have to start asking, as our economies digitize... keep moving along with the bullet points. Yeah, as our economy becomes more digitized and more complex, are the data sources that we're using now as reliable as they have been historically? As relevant? Are they as granular as they need to be? This is referring to the situation in the U.S.; it's a National Bureau of Economic Research quote. But what they found, and what has also been found in Canada and pretty much all over the world, is that it's been very, very difficult to employ traditional methods to collect survey data from people. Response rates have fallen considerably, raising questions about whether we're really getting a complete picture of what's happening. And people are demanding more timely data and more granular data as the economy has become more complex, our society has become more complex, and the issues that we're dealing with are becoming more complex.
Next slide, please.
[A slide is shown with the text "As society has digitized, an explosion of new data is now available" in English and French. A graph appears below showing the number of Google searches in Italy of "Non Sento Odori" ("I can't smell") in March 2020.]
And so, at this moment, as we saw during the pandemic, it was very difficult for us to rely on some of our traditional data sources and really understand what was happening in a timely fashion. At the same time, we've been exposed to a huge explosion of data. This is an example here that I always thought was quite interesting: it was published in the New York Times, basically using online search data to identify a new COVID symptom before it was actually reported on and validated as an actual COVID symptom. Now, we know online search data can also lead us astray, as there was the infamous case of Google Flu Trends and so on. But nevertheless, we do have these new data sources, and the question is how we can leverage them to improve our ability to measure and understand what's happening. Next slide, please.
[A graph shows the number of Statistics Canada job vacancies and the number of Indeed job postings for January 18th – 23rd, 2022 and July 18th – 23rd, 2022.]
And so, we can now see some trends in real time, which some of our traditional datasets don't allow us to see. And this is work that's done by Brendan Bernard using Indeed.com, which is a job postings website; probably most of you are familiar with it. Basically, the amazing thing about the dataset that he used is that it tracks the official data very closely, but it's available in advance of the official data. And in fact, there was a period during COVID when the official data were not collected, and you can see that the pink line is continuous because he was able to pull together that dataset throughout that time. Now, there are, again, challenges and biases in that dataset. But nevertheless, these new datasets are out there and available. Next slide, please.
[A graph shows the Zillow home value index in 22 different U.S. states in September 2022.]
And so, we have the ability here to make better predictions and interventions, in theory, if we leverage these tools well and understand what they can do and what they can't do. And this is data from Zillow. So, Zillow is a housing platform, and it's now being used to make predictions about inflation. It's been quite helpful in predicting, far in advance of the official data, what inflation is actually going to look like. So, again, we just need to be aware of these datasets. Next slide, please.
[A graph shows the number of Baidu searches in China of "Antigen", "Lianhua qingwen", "Covid", "Antipyretics", "Fever", and "Funeral services" between October and December 2022.]
And these new datasets can also allow us to validate or invalidate other sources. This is the example from right after China did its about-face on its zero-COVID policy and said, we're going to open the economy up, we're going to open society up again. And then they reported that there were no deaths due to COVID. And of course, nobody believed them. But instead of just saying, we've got to wait for the official data or a revision, we could look here at Baidu's online search data, we could look at satellite data of morgues, and we could use self-reported data that actually showed us a much more accurate picture of the state of affairs as it was happening in real time. So, this is really important and can be used for all kinds of purposes of understanding what's happening in the world. Next slide, please. And... go back one, please.
[A graph shows responses to the question "Prior to this survey, when was the last time you answered survey questions?" The question is given in English and French.]
So, the other advantage of this new digital data, this revolution and this ability to get online and access people online, is that we now have new ways to access populations that were typically not included or under-reflected in some of our traditional ways of gathering data. And this is one of the big things that we're doing at RIWI: we're really trying to access people. We want to access the engaged population, but we also want to make sure that we are gathering the broadest dataset that we can get access to, that we can really engage people who maybe weren't asked for their opinion in the past. And so, when we go out, and I'll tell you a little bit more about how we do that, we are able to capture a significant share of respondents who tell us that they have never answered a survey, or certainly haven't been doing so on a regular weekly basis, or they're not on a survey panel answering repeated surveys. And so, this is our ability to access populations that may be more mistrustful of government, that maybe don't have a bank account, or that may not want to trust public health guidance around vaccination. We really want to get access to those groups of the population because, as policymakers, those of you in the audience who are thinking about these issues, we can be led astray if we are not hearing from that full spectrum of opinion. Next slide, please.
Okay, so here's how we are doing it, and there are lots of other different approaches; this is just one of the ways we are doing it. Just hold off and I'll tell you when to go ahead on the points. So, basically, we have a really inclusive approach. Our idea here is that we want to increase the coverage and minimize the bias associated with our dataset. The online population offers us the ability to access anyone online. So, the founder of RIWI invented the concept of web interception, basically this idea that people are online, going about their daily activities, as we all are, whether it's on their phone or on their desktop, and, in theory, we want to be able to access anybody online randomly. That's our objective: to broaden the potential respondent pool and intercept people in so many different ways that we are minimizing the biases associated with conventional methods. So, if you go down to the next point, it's not like we are going online and saying, okay, we're going to recruit everybody on Facebook. No, we have hundreds of thousands of different ways in which we're intercepting people online, and Facebook.com would not be one of those ways. We're doing it through all the different ways that people are engaging with the Internet. And if you go on to the next point, we're also trying to do this in ways that minimize whatever biases we can. Now, I should say straight out, there's no technology that has no biases. This technology has biases, too. But we're trying our best to remove any biases we can. So, one bias associated with some conventional approaches to data collection is to restrict collection to, let's say, a one-week reference period, and to only contact people between the hours of 9 and 5, potentially via a phone survey. We want to remove any time-of-day bias, so we are running these continuously. And the other thing that we do is run the surveys on an ongoing basis, so that if there is an event, say a pandemic or a banking shock, any number of events that we have faced in the last few years, we want to be able to do an event study and to know if that event has impacted people's sentiment and behaviour. So, we do this on a continuous basis, as in the sketch below. And next point, please.
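To make the event-study idea concrete, here is a minimal Python sketch under stated assumptions: a continuously collected daily sentiment series (coded from -1 to 1) is compared across the windows before and after an event date. The data, field names, and window size are invented for illustration; this is not RIWI's actual pipeline.

```python
from datetime import date, timedelta
from statistics import mean

# Daily average of coded sentiment responses (-1 to 1), one value per day
# (illustrative numbers only).
daily_sentiment = {
    date(2023, 3, 1) + timedelta(days=i): s
    for i, s in enumerate([0.10, 0.12, 0.09, 0.11, -0.20, -0.25, -0.22, -0.18])
}

def event_study(series, event_day, window=3):
    """Compare mean sentiment in the windows just before and after an event."""
    pre = [v for d, v in series.items()
           if event_day - timedelta(days=window) <= d < event_day]
    post = [v for d, v in series.items()
            if event_day <= d < event_day + timedelta(days=window)]
    return mean(pre), mean(post)

pre, post = event_study(daily_sentiment, date(2023, 3, 5))
print(f"pre-event mean: {pre:+.2f}, post-event mean: {post:+.2f}")
```

Because collection never pauses, the "before" window already exists when an unexpected event hits; that is the advantage over fielding a survey only after the fact.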
The next point... okay, so we also are thinking very carefully here. Because we're intercepting people online and they don't have to be doing this, our goal is to keep them engaged and to keep this dataset as broad-based and as inclusive as possible. We want to make it very easy to answer. We are very careful about how we design the questions and how the interface looks, so that it's very easy and simple to answer. We want to make sure that if you're in a low-bandwidth area, this will run seamlessly no matter what, that it'll be very, very easy to use. And that's, again, all with the purpose of trying to minimize the coverage bias, trying to get as many people involved as possible. Can we go ahead to the next point, please? And, yes, we optimize the question design for low-literacy environments as well. We also use a lot of sensitivity around question design, because we're asking questions about everything from mental health to very, very politically sensitive topics, for example, asking people in Iran about their support for the protests, or asking people in Russia whether they support Vladimir Putin's approach to the war. So, we're asking very, very sensitive questions and we think very carefully about how we design those questions. And I can talk more about that if people are interested. Next point, please.
And here, this is a very important point. I should probably have put it in bigger font. There are a lot of issues with data privacy and so on, and inherent in the design of this particular technology was a decision that we made to make sure we are not capturing anyone's social insurance number, e-mail address, or any kind of personal identifiers. And the reason is that it's not like we're capturing those and then anonymizing them; we're never capturing them in the first place. That's because we really want people to feel as comfortable as possible. We can't eliminate the fact that they may not feel trustful, but we are trying to minimize social desirability bias. We're trying to maximize the chance that they're going to tell us the truth and that they're going to feel comfortable staying and answering these questions. Next slide, please... and we're doing it everywhere.
Okay, so how do we do this? We're intercepting people, and it's agnostic to the device that they're on. So, you can be on a tablet, you can be on a desktop, you can be on a smartphone; you can be in Ukraine, leaving your home because the war is ongoing, and you could be intercepted on your smartphone, for example. And so, if you just go ahead, I'll show you both the English and French. This is a study we ran recently in the G7 countries on political polarization. We're actually running it on an ongoing basis and looking to see whether polarization is worsening. Anyhow, Canada and Japan are the least polarized of all the countries, but we're going to be tracking it over time to see, is this changing, is it worsening? And this is what it looks like on a smartphone. I just took a picture from my own smartphone, and it's very simple, very straightforward. So, we're intercepting people and asking them some very basic questions about their age and gender as a first question. As soon as they answer that, they go on to the next set of questions, but we also capture their latitude and longitude by their IP provider, so not their actual physical location, like they're in this coffee shop in this place, but their general location, their city, for example. So, we would know if they were in Quebec City or something like that. But we would not have personal identifiers. And then we would ask them a series of very, very simple questions, trying to make them very easy to understand. Next slide, please.
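As an illustration of the anonymized record just described, here is a hypothetical sketch of what a single intercept response might contain. All field names are assumptions; the only details taken from the talk are the age-and-gender first question, the coarse IP-derived location, and the absence of personal identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InterceptResponse:
    age_bracket: str   # e.g. "25-34", asked as the first question
    gender: str        # self-reported
    approx_lat: float  # coarse latitude from the IP provider (city-level)
    approx_lon: float  # coarse longitude; never a precise physical location
    answers: tuple     # coded answers to the substantive questions
    # Note: no name, e-mail, social insurance number, or device identifiers;
    # per the talk, such fields are never captured in the first place.

r = InterceptResponse("25-34", "female", 46.81, -71.21, (1, 0, 2))  # ~Quebec City
print(r)
```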
And the amazing thing is that as we've continued to use this, it has continuously surprised us in predicting various geopolitical or other events that have surprised other polls, those using more conventional polling approaches. So, the first one was the fall of Mubarak in 2011. Next, Brexit in 2016, where the polls contacted a lot of highly engaged young people but didn't speak to disengaged young people. When we used this approach, we were able to contact the more naturally disengaged populations and predict that Brexit was likely to happen. Similarly, in 2016, when many people around the world were surprised by President Trump's election win, RIWI's CEO contacted everybody associated with the company the morning of the election and said that our respondents were telling us that Trump was going to win. And that's because this technology was able to access the disengaged populations that weren't included in pollsters' models because they hadn't voted in previous elections. And if you keep going on the rest of the points there, there are other recent examples. The most recent was over the last couple of weeks, where everybody thought that Erdogan would not win re-election in Turkey and there was going to be a major change in that country's orientation, and the RIWI respondents were telling us, no, there's a huge amount of support for Erdogan, we were going to see an Erdogan win. We were seeing this weeks ago in the dataset. And then, of course, we'll see what we see in the 2024 U.S. election as well. So, it's been pretty amazing from my perspective: every time you think we've got to get one of these wrong, we're still accessing this disengaged population and therefore really being able to more accurately see what's going to happen. Next slide, please. And this is just a quote from the Brexit study that we did, which said that this method, the RIWI method, naturally contacts politically disengaged young people compared to standard methodologies. These are researchers who compared the demographic profiles of the respondents to the Brexit survey with the gold-standard face-to-face British election survey, and they concluded, okay, this is actually technology that's seeing something different, that's able to naturally contact these disengaged populations. Next slide, please.
This I thought was interesting. This is a study with the Public Policy Forum. What we did is we showed people an image of the freedom convoy and we asked, what are these people doing? Some of the responses were predictable. We had some people saying, okay, yes, they're defending freedom, this is all about freedom. And on the other hand, we had people saying, these people are causing chaos and disruption in Ottawa and ruining our lives. But we also had this substantial group of people, and this was actually done for young people, which they defined as under age 35, who gave us the following types of answers, and you can show all of the answers: "Sorry, I don't know", maybe they're celebrating Canada Day, maybe they're protesting the government, I'm not sure, looks like a peaceful protest. So, we had a lot of people who clearly did not know what this was. And it's shocking to me, as somebody who religiously wakes up every morning and reads the newspaper, actually even the physical newspaper, but many people are operating in different news environments, living in these silos, and may not have been exposed to this. And so, I think this is one of the things that people are most surprised about when you use a technology like this: oh my goodness, how many people are not knowledgeable about a particular thing, and we really need to open our eyes to what the true sentiment is out there. And sometimes it's a case of "be careful what you ask for" with the responses that come in. Next slide, please.
[A graph shows the percentage of participants who avoided social gatherings compared to the number of COVID-19 cases between December 22nd, 2020 and June 14th, 2021.]
This is a study that we did with the Fields Institute of Mathematics at the University of Toronto during the pandemic. And you can show... yeah, great. You can show the whole thing there. We were using this to collect real-time data. This slide's a little busy, but basically what the Fields Institute was doing was modeling the trajectory of the pandemic for the Scientific Advisory Table to the Ontario government. They were using all kinds of data; it was a really interesting set of projects. But the testing data, as we all know, was both lagged and did not have complete coverage, and then we stopped testing at one point. So, the modeling group used Google mobility data to know where people were going, but then they needed to understand how people were actually behaving wherever they were. Maybe they were going out more, but maybe they were still avoiding gatherings, or they were masking, right? They didn't know this behavioural component. And so, we worked with them to ask, on a daily basis, are you compliant with the public health guidance, are you masking, are you avoiding social gatherings, and so on. What's interesting about this is you can see it in real time, and we were told by the modelers that at various stages of the pandemic, they thought they knew what was going to happen next, but this opened their eyes to the fact that people stopped complying at various points despite the policy guidance, and we were going to have outbreaks. And this was data collected at the postal code level, so they were able to get very granular information about where the outbreaks were likely to be, including amongst disengaged populations that were potentially unvaccinated and/or exposed to misinformation related to vaccination. So, I think this is a really interesting example. You can see that there was a share of the population that was never compliant, a share that was always compliant, but there's that share in the middle that stopped complying at various stages, and it didn't necessarily align with the public health guidance. These decision-makers found it really important to have access to this kind of information to be able to make timely decisions. We'll go to the next slide.
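A minimal sketch, with invented data, of turning daily yes/no compliance answers into the kind of regional time series described above; RIWI's actual aggregation is not public, so this only illustrates the shape of the computation.

```python
from collections import defaultdict

# (day, postal_prefix, avoided_gatherings) -- invented records standing in
# for daily intercepted responses tagged with a coarse postal-code region.
responses = [
    (0, "K1A", True), (0, "K1A", False), (0, "M5V", True),
    (1, "K1A", True), (1, "M5V", False), (1, "M5V", False),
]

by_region_day = defaultdict(list)
for day, region, compliant in responses:
    by_region_day[(region, day)].append(compliant)

# Share of respondents in each region, each day, reporting compliance.
for (region, day), vals in sorted(by_region_day.items()):
    share = sum(vals) / len(vals)
    print(f"day {day}, {region}: {share:.0%} avoiding gatherings (n={len(vals)})")
```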
[A graph shows the share of participants who expected local employment to increase, decrease, or stay the same over a six-month period.]
This is work with the Bank of Canada, actually, on labour market conditions in Canada. Traditionally, what central banks have done is run a survey, either annually or quarterly. But things happen in between; as we know during the pandemic, there was the banking shock that happened in March. Traditionally, we would just survey in February and then again in May, but then you don't know what's happening in March. So, this is just an example, published in a recent Bank of Canada publication, of what Canadians' views on the labour market are in real time. We're also using it for other things central bankers care a lot about: consumer demand, inflation expectations, and so on. And because it's done on a daily basis, it gives them an alternative perspective on whether their conventional survey data is actually missing something, because this is an alternative group of respondents that may not be reflected in the conventional datasets. Okay, so next slide, please.
[A graph shows the number of participants who lost their job or had someone close to them lose their job between November 2022 and May 2023.]
Okay, this is something we're tracking here. This is data from Canada; it's a couple of weeks old, but it doesn't look too different now, and if you want, I can pull up the most recent data for you. So, we're asking: have you or a close friend or family member lost their job in the past two weeks? Our assumption throughout all of this is that by getting people who are closest to the ground to report what's happening around them, we're going to get much more accurate information. And we're really looking for change, right? In the last three or four weeks, we do see that there is a little bit of a change, not as much as probably... we're kind of questioning, are we heading into recession? So, what we're doing is looking here: can we see any evidence of that? So, gone are the days of having to wait. I mean, I wouldn't say gone are the days; we still have to look and see whether there were two consecutive quarters of negative GDP growth to conclude that there was officially a recession. But if you're making decisions today, you can't wait until you see that kind of information. I mean, you can, but there's a cost to waiting, in terms of people's lives and livelihoods that are impacted. And so, this is why I think this kind of information is so helpful. Next slide, please.
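A minimal sketch of how one might look for change in a tracker series like this one: compare the average of the most recent weeks against the prior weeks. The series values and the change threshold are illustrative assumptions, not RIWI's published methodology.

```python
# Weekly share of respondents answering "yes" to the job-loss question
# (invented numbers for illustration).
weekly_yes_share = [0.14, 0.15, 0.14, 0.16, 0.18, 0.19]

def recent_change(series, window=3):
    """Difference between the latest window average and the prior one."""
    recent = sum(series[-window:]) / window
    prior = sum(series[-2 * window:-window]) / window
    return recent - prior

delta = recent_change(weekly_yes_share)
print(f"change in reported job-loss share: {delta:+.1%}")
if delta > 0.02:  # arbitrary illustrative threshold
    print("possible early signal of labour-market weakening")
```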
[A graph shows the military tension score for Ukraine and Russia between May 2022 and May 2023.]
This, I just thought it would be interesting for you to see a couple of examples of what we're doing around the world. We're doing this in Ukraine, Russia, China, Taiwan, Israel, Iran, major geopolitical conflict zones around the world. And we've been asking people, since ten days before Russia invaded Ukraine, do you think that military tensions between Ukraine and Russia will be intensifying or becoming less intense? We average the responses and take a seven-day moving average, and everything above zero means people think things are going to intensify. So, you can see big spikes at various times during the war. And you can see that the Ukrainians, unlike the Russians, and there are very different narratives, of course, being told in the two countries, are seeing much more of the fighting around them, and they have predicted every turning point or inflection point in the conflict. There's a spike a couple of weeks ago; it's actually come down since, if you look. I just checked the data this morning and it's come down a bit. But there is no question that Ukrainians are seeing this on the ground. And this is another thing that we're able to do: really see what's happening in real time, because Ukrainians are actually amongst the most digitally connected; everybody has a smartphone in Ukraine, so they're really able to respond. And we're also using this for humanitarian purposes, to understand who's leaving their homes, where they are going, and what their needs are, information that's going to be very helpful as Ukraine is reconstructed, hopefully, after the war ends. Next slide, please.
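The index just described lends itself to a short sketch: responses are coded so that "intensify" is positive and "less intense" is negative, averaged per day, then smoothed with a seven-day moving average. The specific coding values below are assumptions; only the above-zero interpretation comes from the talk.

```python
# Assumed response coding (the talk specifies only that above zero means
# respondents expect tensions to intensify).
CODES = {"intensify": 1.0, "stay the same": 0.0, "less intense": -1.0}

daily_answers = [  # one inner list of answers per day (invented data)
    ["intensify", "intensify", "less intense"],
    ["intensify", "stay the same"],
    # ... one entry per day of continuous collection
]

# Average coded response per day.
daily_score = [sum(CODES[a] for a in day) / len(day) for day in daily_answers]

def moving_average(xs, window=7):
    """Trailing moving average; shorter windows at the start of the series."""
    return [sum(xs[max(0, i - window + 1): i + 1]) / (i - max(0, i - window + 1) + 1)
            for i in range(len(xs))]

smoothed = moving_average(daily_score)
print(smoothed)  # values above zero: respondents expect tensions to intensify
```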
So, in conclusion, the data landscape has really changed. We saw this happening pre-pandemic, but it really accelerated during the pandemic. And we have a new set of tools here. Many of those tools are resident in technology companies, or they're open-source tools; they're not necessarily resident in government. They raise opportunities and they raise challenges. And we can use these tools, as we're doing, to reach previously under-reflected populations. Next point, please. And we can also measure what's happening in real time, in some cases in advance, and we can, in theory, leverage these tools for better, more timely policy interventions. In my work, I talk to a lot of people in the private sector who are using these tools: sensor data, alternative satellite data, text data. And there are examples in the public sector too, really great examples of what's happening, but a lot more can be done to use these tools. We're trying our best at RIWI to address some of the biases that these new tools present, but a lot more work needs to be done to understand what the opportunities are, what the biases and challenges are, and how we mitigate some of those as we go forward in this new landscape. So, yeah, that's it, and I'd love to hear your questions.
Kyle Burns: Well, you will get to hear their questions, and they've been sending some in, but not before I get to ask a few more questions. And I will say, Danielle, that that was amazing. I'm going to give you an opportunity to have a quick sip of water, if you need to. But I'm also going to say that you and I have had some really fascinating conversations recently, and I think you're probably glad that you're not sitting beside me on an airplane on a long flight, because I just wouldn't stop asking questions. So, I will encourage those who are on the line to use the bubble up top. And we do have questions already rolling in; not surprisingly, they're really good ones. But I'm going to reflect a little bit before we turn to some of those questions. I've been in the federal public service for over 20 years now, and we've always been asked, as public servants, for our advice. More and more, we're being asked for timely advice. It's got to be fast, it's got to be accurate, it's got to be trustworthy, inclusive, credible. And that's just naming a few of the things. And we have people in this conversation, I'm sure, who are conducting legislative surveys, who are conducting other types of surveys, who are involved in the census. These things take time. And our Deputy Minister has conveyed to us his view, which is that a decision delayed is a decision made.
And so, having the type of information that RIWI is able to provide is exceptionally powerful. And we will get into some questions around biases and whatnot, but I wonder if we could just start off with a little bit about your methodology and the way in which you turn the data you gather into the type of advice that you're able to produce. It's fascinating that you've been able to predict so many different elections, and that you're able to gather information from some of the world's most vulnerable people. I think back to an earlier conversation we had where you said it's actually quite possible to gather data from refugees; in fact, you have been able to gather data from refugees in transit. I've had a chance to try out your survey methods, and the design question is one that will certainly come up as well. But all that to say that we don't have a long flight somewhere, so I'm just going to pause and maybe give you a chance to speak a bit to the methodology. And I might ask another couple of questions before we invite others into the questions as well.
Danielle Goldfarb: Okay, great. Yeah, so, I mean, I think the methodology, the base of the idea, is that at every stage of this data collection process, we can introduce biases, or we can minimize and then correct for some of those biases. And before I was in the data business, I didn't really understand all the different places at which one could introduce bias. So, the first thing that we're doing in our approach and method is that we are trying to expand the potential universe of people that are intercepted online. So, we're intercepting them. One way we might do this is if you're surfing online and you come across an error message, right? You're surfing online, a website does not exist. We could show you a survey at that point, and then you could choose whether to opt in or not. So, you're doing something else, it intercepts you, and we're doing this across hundreds of thousands of different web domains. There's this whole long tail of web domains out there that are parked, inactive web domains. You might otherwise see an ad or an error message, but instead we would show you a survey. We occupy these for fractional periods of time and we're constantly rotating them. There are machine learning algorithms behind the scenes doing this kind of work and making sure that we're getting access to this wide range of web domains. We're also doing this in other ways; you might be playing a game on your phone, on an app, and we could intercept you while doing that. So, there are so many different ways in which we're intercepting people while they're online. And our goal is to maximize the number of ways in which we're doing that so that we are minimizing the bias from any particular one source. So, that's one of the things we're doing, and that's our method. And then, as I mentioned, we're also making it anonymous, making it easy to answer, and employing some of these other design features that I think anybody can employ, so that people find the experience easy. Because people are not expecting to be intercepted, we're forced to make it easy for them to answer. I think that's another thing when you're accessing populations that might be more mistrustful, and so on: you need to make these things very easy to respond to. And we do that even with a picture or a video; there are different things that we show people. You need to make it engaging, otherwise people will not answer.
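A toy sketch of the bias-spreading idea behind web interception: rotate surveys across a very large pool of intercept points so that no single source dominates the sample. The random rotation below is a simple stand-in for the machine-learning allocation mentioned in the talk, and the domain names are invented.

```python
import random

# A large pool of intercept points (parked domains, error pages, in-app
# placements, ...) -- names are purely illustrative.
intercept_points = [f"parked-domain-{i}.example" for i in range(100_000)]

def next_intercept_batch(k=50, rng=random.Random(42)):
    """Pick a fresh random slice of intercept points for the next period,
    so exposure is spread rather than concentrated on one site."""
    return rng.sample(intercept_points, k)

for domain in next_intercept_batch(k=3):
    print("serving survey via:", domain)
```

The design intent, per the talk, is that averaging over many short-lived, rotating placements dilutes whatever audience bias any single placement carries.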
Kyle Burns: Well, and I'll attest to that. I set aside several minutes to try out the demo, thinking, okay, I've got to make sure I've got enough time to finish it, and it was over before I knew it. And so, it is quite easy. I'm going to stop being selfish.
I'm going to turn to some of the questions that we've been receiving from our participants, and they're really thoughtful. I'm going to read this first one. So, it says, "I'm curious about the conversations you had about privacy. I keep thinking about when U2's album just showed up in iTunes uninvited, and the backlash that followed. Would these interceptions come across as intrusive? And what was the conversation around that?"
Danielle Goldfarb: That's a very interesting question. Yeah, we're always having these conversations, absolutely. So, I think a couple of things. First of all, one thing that's very important in these discussions is consent. And one thing I didn't mention is that the person who created this technology comes from a public health background, so consent and privacy were top of mind when he developed the technology. His name is Neil Seeman; he's a public health researcher at the University of Toronto. And his point was, we have to make sure that we're receiving consent at every stage of this, that people have the option to opt out at all stages. And so, what happens is you have to choose to respond to this. If not, you just "X" out in the corner and you're done; you don't respond to the survey. I don't know if you tried that, Kyle, but if you don't want to respond, you can just exit out and you're done, and that's it. And you can do that at any stage of the process. So, let's say you answered question one and you don't want to answer question two because, I don't know, it's asking you about something you find too sensitive: you just "X" out and you're gone from the survey. So, it is something that we focus really hard on, making sure that people do not feel compelled to stay on if they don't think it's a valuable use of their time. I don't know if that answers the question fully, but these are, I think, ongoing conversations, and we always have to be thinking about: how are we engaging with people? Is it respectful? Do we have consent? Is it an appropriate set of questions for the audience that we are asking? Are we putting people at risk by asking some of these questions? When you're asking people in Iran whether they support the protests, all kinds of questions like this, we have to be very careful about how we ask those questions.
Kyle Burns: Well, it really does speak to the way you're able to build trust because even the phrase, "web interception", I mean, I try not to be intercepted as much as possible. But you have been able to get to some of the most vulnerable communities. And I think that does speak to the design.
And maybe I'll turn to the next question because it's a slightly different take on the U2 question. And it reads, "To what extent do you consider online fora or other sources for having explicit rules and mining them for data other than the intended purpose? Is that a breach of privacy?"
Danielle Goldfarb: Yeah, these are all good questions.
Kyle Burns: Fascinating.
Danielle Goldfarb: I don't have easy answers.
Kyle Burns: Yeah.
Danielle Goldfarb: Yeah, I mean, you raise a very important point. There's a huge amount of work being done, that I see in my line of work, in terms of social media analysis and text analysis, and there are all kinds of researchers pulling all this material off and studying it, and I don't have a good answer as to whether it's a breach of privacy or not. I think this is an ongoing conversation for all of us: some people might consider it a breach of privacy, and other people might consider that their stuff is online and can be used for other purposes. You'd probably find generational differences in views on this, for example. Like, my kids think if they put something online, that's it, people can use it, it's fair game. But not everybody agrees with that. So, I think this is a very important question, because there is so much work, particularly in the private sector, being done to mine all kinds of online fora for a whole range of issues. And then even when you think about government: there's work being done in the U.S., for example, monitoring online fora to try to predict mass shootings. To what degree do law enforcement and investigators have the right to use those kinds of data to prevent a terrible incident from happening, and to apply machine learning tools to that, and so on? I mean, I think these are some of the central questions of our time, actually: what kind of rights to privacy do we have online?
Kyle Burns: Yeah, I'm actually really glad that our participants are asking these really tough questions because it just, I think it reflects the investment that they have in these issues and the care and thought that they approach them with. Maybe on the flip side, we had a participant ask, how do you factor in segments of the population who are not online?
Danielle Goldfarb: Right.
Kyle Burns: Maybe accessibility or barriers, and they want to ensure that the data does not lead to a perpetuation of underservice caused by that.
Danielle Goldfarb: Yes, 100%. So, that's a huge issue. If you have a particular policy question that requires you to get access to parts of the population that will not have a device, that are not online, absolutely, you would need to use a mixed-methods approach where you're accessing those people by another method. What we're doing with this approach is saying, let's leverage the fact that people are online to try to access a wider array of people than we can access using conventional methods. But for example, even when you're working with refugees, most Ukrainians have smartphones and can tell us where they're going. Fewer Syrians maybe would have smartphones, but still a lot do. But are you capturing every last person? No, you're not. And there are other biases as well, because it's the online population and because of the way Internet use works: you're going to be biased toward younger people and toward male populations. And different parts of the world have different biases in terms of the way they use the Internet. So, if you want a complete picture, you will need to use a variety of methods. And actually, one of the things I didn't emphasize enough, I think, is that, to me, what's most important, what I'm learning from all the research and work that I've done, and I imagine through the work of many in the audience as well, is that you really need to rely on multiple datasets and data sources to get a complete picture of what's happening. What I'm trying to highlight is that there are some new ways we can get access to new types of datasets that can help us complete the picture of what we're seeing out there. But you don't want to rely on any one dataset to draw any conclusions. You want to be able to pull together all the available sources of intelligence that now exist to be able to make policy decisions.
Kyle Burns: Yeah, no, so true. And I think that's an important takeaway, is that you're not sort of suggesting that you've cornered the market on predictive analysis, but in fact that you do really good work and that it could be supplemented with others who are in this world.
I think the next question I'm going to ask may sound simple, but I wonder if it has a multi-layered answer. And that's, "How far in advance can you predict?" Now, that didn't end with predict an election, or predict a pandemic, or predict an economic event. So, I'll leave it to your experience base to sort of interpret that question.
Danielle Goldfarb: Okay, how far in advance can you predict? That is a good question. I'm not sure that this particular technology is going to be able to tell us, for example, something that's going to happen a couple of years from now. Can it predict the U.S. 2024 election today? Probably not, right? But could it give us indications of where people are at today, where their intentions are going forward? I think it really depends on the question. So, let's take a question around inflation expectations. If you ask people, what's going to happen to prices, let's say, in the next 12 months, and people have persistent expectations of inflation, that could actually tell you something about what's going to happen to inflation a number of months down the road, right? Whereas if you're asking them, how much did you buy today in terms of entertainment and things like that versus last month, you're going to get a picture of what's happening now and how they're thinking about now. So, it really depends on the way you ask questions, and also how you ask the question. This is another thing I found interesting when I joined the company. We do ask people, who are you going to vote for, but we also ask people, who do you think is going to win? And "who do you think is going to win?" is a more predictive question. In almost every single election that we have looked at around the world, that is the more predictive question. So, how far in advance? In that situation, we knew weeks in advance what was likely to happen. But there are also always events that happen that disrupt your expectations of what's going to happen. So, it really, really depends. I hope this is a somewhat satisfactory answer on this question. But it depends on the circumstances.
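A small sketch of why the expectation question can beat the intention question: each respondent's answer to "who do you think is going to win?" aggregates what they observe around them, including among people who won't say, or don't know, how they'll vote. The numbers below are invented purely to illustrate how the two questions can point in different directions.

```python
# Invented survey aggregates for a two-candidate race.
intentions   = {"A": 0.44, "B": 0.46, "undecided": 0.10}  # "who will you vote for?"
expectations = {"A": 0.58, "B": 0.42}                      # "who do you think will win?"

# The intention question (ignoring the undecided) and the expectation
# question can disagree; per the talk, the latter has been more predictive.
intent_leader = max((k for k in intentions if k != "undecided"), key=intentions.get)
expect_leader = max(expectations, key=expectations.get)
print(f"intention question points to {intent_leader}; "
      f"expectation question points to {expect_leader}")
```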
Kyle Burns: Yeah.
Danielle Goldfarb: What you can predict and when you can predict it, and I'm just thinking about all kinds of impacts of different things. It just depends. I mean, another example I can give you is that, with respect to the Chinese economy, when it was re-opening post-zero-COVID, everybody thought the economy would grow very rapidly; there was all this pent-up demand, there would be this huge upswing. And right away, we were seeing initial signs that this was not going to be the case. But of course, the GDP numbers don't come out until months and months after the fact. China is somewhat opaque and has stopped releasing some of its economic data, so it's hard to get a picture of exactly what's going on. And so, did we predict that months in advance? Well, we had indications months in advance that things weren't going to be quite as robust as some people thought, right? So, the answers are always more nuanced.
Kyle Burns: Absolutely, and I think that when you do have sort of milestone reports, like the one that you've just described, the trend analysis that sort of provides a bit of foreshadowing for those reports could be highly valuable and give you a sense of where things are heading, especially as nations around the world are paying so much attention to supply chains.
The questions just keep rolling in and each one is more fascinating than the next. So, maybe I'll turn to a security-type question. And this one is, "Who does the application security for the system? In some contexts, the threat model would include a nation state actor wanting to abuse it." Does that resonate for you, and have you come across these types of questions in the past or as you're designing? I'll stop talking and leave it to you.
Danielle Goldfarb: Wait, so is the question... okay, here, who does the application security? Is the question really, could this be used by, for example, authoritarian regimes to do nefarious things?
Kyle Burns: Yeah, the way I was interpreting it is that you talked about doing polling in every country, I think, except for North Korea. And you mentioned Iran and Syria, and some other places where people have restrictions or limited freedoms. So, yeah, that's the way I was interpreting it. And is there security in place that will prevent nation-states from abusing it?
Danielle Goldfarb: Yeah, so, we are not conducting any work on behalf of such actors, and we take precautions. So, for example, any data that we collect in China is actually not stored in China itself; it's stored in systems outside of the country. That's just one example of the kind of security precautions that we take to make sure, first of all, that it's not subject to state censorship and that we're able to conduct this kind of research. But in theory, yes, and not speaking to this technology itself, lots of different technologies can be used and abused by agents that we don't want to be doing this. And I think this is one of the big challenges: to make sure that we utilize these tools to the best of their potential in terms of stopping such misuse. So, some of the work we're doing is related to human rights issues, understanding what's happening in areas where we know there are human rights abuses taking place, and gathering data to try to understand that. That's some of the work that we do behind the scenes. We also do a lot of work around misinformation and disinformation, understanding the degree to which people believe conspiracy theories, for example, and how we combat and address some of those issues. We just did a study on belief in conspiracy theories in China around U.S. bioweapons labs in Ukraine, the astounding number of people who believe that that's actually happening, and then the messaging that you can use to combat that and which messaging is most effective. So, we're trying to work on addressing those challenges. But absolutely, there is going to be increasing use of similar types of approaches and technologies for ends that we do not want to perpetuate.
Kyle Burns: Yeah, for sure, for sure. We are so close to time, but I'm going to ask one that may not require a long response so that we can slip in one more question. And that is just around the frequency at which the interceptions are sent for each question answered, and then a follow-up question. I suspect that this person is maybe involved in survey design. "Is there a certain percentage of questions that are typically completed?"
Danielle Goldfarb: So, at what frequency are interceptions sent? It really depends on the research objective, right? Take some of the real-time economic tracker-type questions. If you want to understand the labour market in real time, we know we want to get, let's say, 75 or 100 respondents every day in Canada. And so, we just know how many people we need to intercept in order to get 100 people completing the survey, because we know roughly how many people typically drop out throughout the survey. As for whether there's a certain percentage of questions that are typically completed: we just go out and intercept people until we get the questions completed. The optimal survey length is, let's say, ten questions, but we do longer surveys and we do shorter surveys, and so on. That's typically how it works; the arithmetic is sketched below. Sorry, that's maybe not a short answer, but it's very flexible.
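The sample-planning arithmetic described here is simple enough to sketch: keep intercepting until the daily completion target is met, given known opt-in and drop-off rates. All rates below are illustrative assumptions, not RIWI's actual figures.

```python
# Target and rates are invented for illustration.
target_completes_per_day = 100
opt_in_rate = 0.02        # share of intercepted people who start the survey
completion_rate = 0.60    # share of starters who answer every question

# Interceptions needed per day to hit the completion target.
needed_intercepts = target_completes_per_day / (opt_in_rate * completion_rate)
print(f"intercept roughly {needed_intercepts:,.0f} people per day "
      f"to yield {target_completes_per_day} completed surveys")
```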
Kyle Burns: No, that's perfect.
Danielle Goldfarb: If your goal is like... for example, during the pandemic, our goal was to get as many Ontarians to respond in a very short period of time to understand what was happening in real-time. Then, we just go out and get... we just go out until we get that number. But that's different than if you're doing kind of a daily tracker where you really want to have real-time information, then you just do it at a different pace. It really depends on your research objective.
Kyle Burns: And that makes perfect sense. And so, Danielle, you've been so generous with your time, both for me as we were leading up to this event, and with our participants who really do ask the insightful questions. And you know what? Before I wrap up a little bit, I am going to make a very selfish plug. But actually, it's for those in the audience that may be interested in learning more about data. I'm sure they're going to want to follow up on what you've been talking about and what RIWI is doing. And I'm also going to just note that we have an incredible Government of Canada Data Community, which is one of my teams within Public Sector Innovation, and it's a really vibrant community that is looking at the application of data, much as you are. But with that, I'm going to stop my own selfish plug and thank you for your selflessness. Danielle, you have been so thoughtful and willing to participate and share your knowledge and insights with us, and in today's digital age, that's just such a gift to us. I'd also like to thank our participants. These events work best when participants are active, so thank you for being here today and asking great questions. And before you get on with the rest of your days, I will encourage all of you to provide your feedback to us. We like to tailor our events to meet your needs, and your feedback is really important to us. And I'll also just note that we do have some incredible events that are on the horizon for the month of June and out into the rest of the year, so please do check them out. We do try to tailor them to your needs. And with that, I will say thank you once again to Danielle, to our tech team, and to our participants, and wish everyone a terrific day.
[The CSPS logo appears onscreen alongside text that reads "CSPS Virtual Café Series" in English and French.]
[The Government of Canada logo appears onscreen.]