Number Soup: Behind the Research Interview Transcript
In this edition of our “Behind the Research” series, we discuss the methods and analytical tools used to collect, process, and interpret the data that served as the basis for “Number Soup: Case Studies of Quantitatively Dense News”—a study undertaken jointly with researchers at PBS NewsHour.
Hi everyone. My name is Elliott, and I am the writing and communications lead here at Knology. I am joined right now by three Knology researchers—Jena [Barchas-Lichtenstein], Bennett [Attaway], and John [Voiklis]—who are here to talk with me about a recent peer-reviewed article of theirs published in Journalism Practice. The paper, which goes by the fantastic title "Number Soup," is part of an NSF-funded project called Meaningful Math, which we are jointly undertaking with PBS NewsHour. In another article featured on our website, we provide a summary of the paper, highlighting its main findings and implications. Since we're covering those things elsewhere, here, for this feature, we're going to do something else. What I'd like to do with all of you is really take a deep dive into the methodology behind the study, so that those who read the full paper can get an understanding of the kinds of analytical tools that you all used to gather, process, and interpret your data. Because you created this really rich data set for the study! And I think a lot of people who read the article would just be interested to know how you assembled this dataset, and how you made use of it. Maybe even in talking about some of these things, we could help others who might want to take the same kind of approach that you're bringing to this project. Before we get started, I just wanted to give each of you a few seconds to introduce yourselves and talk about the particular role that you played in this research.
Hey Jena, do you want to go first since you're the lead author?
Okay, hi, everyone. I'm Jena Barchas-Lichtenstein, I use she and they pronouns. If you can't see me, I am a white person who's like fortyish, with really big glasses and really big hair. And I am the co-PI of the grant alongside Patti Parson at PBS NewsHour, so she is the lead person responsible for all production activities on the news side. I am ultimately responsible for all of the research. I am also the first author of this paper. That said, it used to be combined with what ultimately became a separate paper, of which John is the first author.
I guess that's my cue. I believe officially, I am listed as senior personnel on this grant, which in reality means that I'm Jena's thought partner. We come up with a lot of ideas together, run with them a little bit on our own, and then, you know, try to reconcile things.
In all but bureaucracy, we're equal partners.
Yes, and I've been bringing a lot of methodological ideas to this because it needed a sizable toolbox.
And I'm Bennett Attaway, I helped with the process of co-developing the codes along with Jena, John, and another intern who used to work here. And I did most of the coding for the stories and clauses in our dataset.
Awesome, thanks, guys. So the first question I wanted to ask is about the subtitle of the paper, which is "Case Studies of Quantitatively Dense News." At one point, in the article, you write about how you wanted to take a closer look at phrases and full news reports that rely particularly heavily on quantification. So one of the things I'm wondering is just like, how did you determine whether something is quantitatively dense? Like, how did you define that term? Or, you know, what makes a given report heavy or light in terms of the numerical information it contains?
I can take the first pass at this question. We had a fairly complicated coding scheme. We broke up articles into chunks, and each chunk could receive any of a whole bunch of codes—things like "this is about official statistics," or "this is about percentages." And, really simply, any story that averaged more than two codes per unit of text, we classified as dense. I think that was about a dozen stories. By contrast, there were stories in our dataset that had barely any codes at all, so two per chunk of text is really quite a lot.
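To make the density rule concrete, here is a minimal sketch of the classification John describes: a story counts as "quantitatively dense" if its chunks average more than two codes each. The chunkings, counts, and function name below are invented for illustration, not taken from the study's actual pipeline.

```python
def is_dense(codes_per_chunk, threshold=2.0):
    """Return True if a story averages more than `threshold` codes per chunk.

    `codes_per_chunk` is a list with one entry per chunk of text,
    giving how many codes that chunk received.
    """
    if not codes_per_chunk:
        return False
    return sum(codes_per_chunk) / len(codes_per_chunk) > threshold

# Two hypothetical stories, each broken into five chunks.
sparse_story = [0, 1, 0, 0, 1]   # mean 0.4 codes per chunk
dense_story = [3, 2, 4, 2, 3]    # mean 2.8 codes per chunk

print(is_dense(sparse_story))  # False
print(is_dense(dense_story))   # True
```

By this rule, only stories that pack multiple kinds of quantification into most of their chunks clear the bar, which matches the point that two codes per chunk "is really quite a lot."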
Yeah, we had almost 300 articles in total, so 12 were really bad examples. Or were really good examples. However you want to put it.
I wouldn't say containing a lot of quantification, or requiring a lot of quantitative reasoning on the part of the reader, is bad per se. I would say that it makes it more difficult for your average audience to understand. It could be appropriate in something like a finance-specific publication, which is where I know at least one of our articles came from.
So things that are aimed at a more specialist audience. These terms—codes and clauses—actually kind of get me into the next question I was wondering about. So John, you mentioned that you have almost 300 stories that comprised your dataset for the study, which is an awful lot of content. And in the paper, you talk about how you broke all of these stories down into something like 9,500 clauses. And you have some codes that are at the story level, and other codes that are at the clause level. I think everyone basically knows what a story is, but what about a clause? Especially in this journalistic context that you're working in? Could you maybe just help us understand what a clause is, how you defined it, and how you went about breaking down these stories into their component clauses?
Jena and I agonized over this, because there's a linguistic version of a clause. And there's what we did.
As a linguist, I have strong feelings. But at the end of the day, we needed something that would be consistent and relatively machine parsable. Because just breaking it up alone would have been days and days of agonizing work by hand.
Yeah, I wasn't about to copy-paste every single thing by hand, much less do it for grammatical clauses, which I would actually have to think about: "is this a grammatical clause?" So we ended up separating by periods and semicolons. And I was able to do that automatically with the HTML that we scraped from the news websites—oh, I should explain. We sampled the top Google News stories in a handful of content areas—business, economics, science, health, and politics—which are known to be more quantitatively focused than, say, your entertainment page. We scraped the top results every day, and we also added in some content from our partner on the grant, PBS NewsHour. And so I took that scraped HTML, stripped out a bunch of the formatting, and parsed it into clauses. I still had to do some cleanup by hand afterwards, because we were getting content from a wide variety of news sites, and it's hard to account for all the potential differences in how things are laid out.
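The published paper doesn't include Bennett's pipeline, but the splitting rule he describes—strip the HTML formatting, then separate on periods and semicolons—could be sketched roughly like this, using only the standard library. The class and function names, and the sample snippet, are invented for illustration:

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text from scraped article HTML, skipping scripts/styles."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def split_clauses(html):
    """Split article text into units on periods and semicolons."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts)
    units = re.split(r"[.;]", text)          # the rule described above
    return [u.strip() for u in units if u.strip()]

html = "<p>Cases rose eightfold Saturday; officials reported 400 infections. Six died.</p>"
print(split_clauses(html))
# ['Cases rose eightfold Saturday', 'officials reported 400 infections', 'Six died']
```

A naive splitter like this will also break on abbreviations like "U.S." or "Dr.", which is exactly the kind of thing the hand cleanup Bennett mentions would have to catch.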
Also, I have to give another linguist caveat about anything that was video or audio—okay, so people don't speak in sentences; punctuation is a thing that only exists in writing. There are a lot of misunderstandings about that. For that reason, so that we would have something comparable, we used—wherever possible—the official transcript provided by the news outlet, on the assumption that whoever was doing the transcribing would use their judgment in much the same way they do when turning a spoken quote into a written one. That way we would have fairly comparable units of text. Not perfect, but I think a pretty solid compromise.
Yeah, it sounds like this is one of the initial challenges of the project, right? Like, having these definitions of things that you can use for different kinds of content, whether it's text based, or video or something else even, right?
We went through multiple iterations of developing the codebook as well—the rules for when we would say that a certain clause contained a certain type of quantification.
And we should say that a "code" here is a classification category. I think "code" is used in that sense only in the behavioral and social sciences.
I used to be a software engineer and it confused me so much.
Yeah, yeah. Well, that's actually a nice segue into the next question I wanted to ask all of you. So you've got these 9,500 clauses—you segment the stories into individual clauses, and then you code the clauses. And you have some story-level codes, and some clause-level codes. And this is just another thing that I bet audiences are interested in learning more about: what were the different codes? And how did you go about assigning codes to particular clauses? How did all that work?
Sure. So we had a number of different codes. I think Jena already mentioned a couple of them. We had some that are pretty basic, like “magnitude and scale,” which just meant there's a number here—such and so many people, $50, something like that. And then we had “proportion or percentage,” which is also pretty self-explanatory. A code that came up in a lot of different stories was “comparison,” which is pretty much what it says on the tin. We had “risk and probability,” something that you hear about a lot, even if it's maybe not given as a specific number—you'll often hear things like “the unemployment rate is forecast to rise in the upcoming months.” We had a code for “research methods,” which could include enumerating everything or doing a sample. And because we did this data collection at the very beginning of the COVID pandemic, we had a lot of stories that reported on case counts. Those case counts often got another code, for “official statistics”—which is anything that's released by the government or a government-like organization. Things like the census would be in there too, and would also get the research methods code, since that's enumeration. Then we had a code for “central tendencies and exceptions”—saying something like "the average American makes this many dollars," or "this summer was abnormally hot compared to a typical year," would get that code. And similar to that, we had “variability,” which was talking about how subgroups differ from the overall group—saying something like "white Americans agreed with this statement," or "across the board, people are supporting this candidate." So those are the codes that we had.
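The codebook Bennett walks through can be summarized as a simple mapping. The code names come from the interview, but the example clauses and the toy keyword-matching "coder" below are invented for illustration—the study's actual coding was done by hand, not by keyword rules:

```python
# Code names as described in the interview; example clauses are invented.
CODEBOOK = {
    "magnitude and scale": "More than 400 cases were reported.",
    "proportion or percentage": "Unemployment fell to 3.5 percent.",
    "comparison": "Cases rose faster than last week.",
    "risk and probability": "The unemployment rate is forecast to rise.",
    "research methods": "The survey sampled 1,000 adults.",
    "official statistics": "The health ministry reported the count.",
    "central tendencies and exceptions": "This summer was abnormally hot.",
    "variability": "White Americans agreed at higher rates.",
}

def assign_codes(clause, rules):
    """Toy coder: tag a clause with every code whose keyword rule fires.

    A clause can receive multiple codes, which is how density arises.
    """
    return [code for code, keywords in rules.items()
            if any(k in clause.lower() for k in keywords)]

# Hypothetical keyword rules for two of the codes.
rules = {"magnitude and scale": ["cases", "$"],
         "proportion or percentage": ["percent", "%", "fold"]}
print(assign_codes("An eightfold jump with more than 400 cases.", rules))
# ['magnitude and scale', 'proportion or percentage']
```

The key structural point the sketch captures is that codes are not mutually exclusive: a single clause can pick up several at once.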
As to how we assigned them: we went through several rounds as a team, with stories that were not from this sample but that we'd collected earlier. We'd code them separately, and then talk together to make sure we were in agreement about which codes should be assigned. And after we'd done a couple practice rounds of that—we even checked interrater reliability, which just tells you how consistent people were with each other—then I ended up being the one to assign codes to most of the stories and clauses in the dataset.
One thing that I can add really quickly, which you didn't ask, is that the set of codes itself was developed iteratively, by reading news stories and talking about them. We had it reviewed by journalists, we had it reviewed by math professors—so a lot of different kinds of expert opinion went into the consensus on what the codes should be.
So after looking at things, you had to think: what do we call this particular code? What's the quantitative concept behind it?
And what's a bucket that's big enough to tell us something meaningful, but discrete enough that we can actually do this in a consistent way?
So, replication. Replication is a big—I mean, you were talking before about this being useful to others. This needs to be a method that can be useful to others, including the way we arrived at consensus. And, you know, we had a very high level of consensus by the end. Lots of times with these things, the issue of objectivity comes into play—which is a sort of ridiculous notion, because the only way to achieve objectivity is intersubjectivity. Interrater reliability is a way of measuring whether there's an intersubjective reality—meaning, do people agree? When they're working independently, do they come to the same conclusions? And yes, it took some training, because this is not a normal pattern of thought. You don't sit around saying, "these are the kinds of quantifications in this article." So yes, that required training, but after training, the coders could maintain that level of intersubjectivity.
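The interview doesn't say which reliability statistic the team used, but a common choice for two coders is Cohen's kappa, which corrects raw percent agreement for the agreement you'd expect by chance. Here is a generic sketch with invented labels, not the study's actual computation:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one label per item.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters independently pick each label.
    expected = sum((counts_a[label] / n) * (counts_b[label] / n)
                   for label in labels)
    return (observed - expected) / (1 - expected)

a = ["magnitude", "comparison", "magnitude", "risk", "magnitude", "comparison"]
b = ["magnitude", "comparison", "magnitude", "risk", "comparison", "comparison"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Values near 1 mean the coders agree far beyond chance; values near 0 mean their agreement is no better than guessing, which is the "intersubjectivity" check John describes.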
And did you have to do the coding manually by yourself, going through them kind of one by one? Because that's a lot of clauses.
Yes, I did have to read every sentence. We did use a piece of software called Dedoose that allowed me to mark sentences up with the codes they contained. So I didn't have to type it all into an Excel spreadsheet myself or anything, but I did have to read through everything. Yes.
The software also allows you to manage different levels of data hierarchy, so that you could have, you know, clauses within a document, you could pull up all of the things marked with a particular code, if you wanted to check that they were consistent. So it's set up for this kind of analysis, but it doesn't do the coding for you, it just puts the database structure in place for you to manage it more effectively.
Yeah, they could fly out of your control very quickly. Yeah.
So that is a lot of work, even just assembling this and assigning the codes, because as you talked about in the article, there are some clauses that have lots of codes. So you have to figure out, you know, is it magnitude, or proportion, or variability, right? And so in the paper...
Just as an aside—as you said, it is a lot of work—I want to take advantage of you flagging that, for an ethics consideration that I wish people gave more attention to. We were doing a lot of this coding—I mean, Bennett was doing a lot of coding—in the second half of March 2020. We'd just been sent home from work, and half of the stories were about COVID. And I would say a huge part of our weekly check-in—and really, it was bi-directional—was both of us going, "how are you holding up?" "Is this topic too stressful?" "Do you need to do something else and take a break for a little while?" Because it's almost hard to remember now, but reading those stories at that time was—at least for me—viscerally anxiety-inducing. I couldn't read more than about two or three of them in a row without needing to do something else.
And I think sometimes we don't talk enough about the emotional work of certain kinds of research, and the ethical challenge of that. So that was something I really tried to be very mindful of—that the topic was hard, especially at that time, and just making sure that we kind of both built ourselves in the breaks that we needed to be able to manage that.
Sorry, that was very intense. But I just, I wish, I wish people just had these conversations.
No, I would totally agree that the emotional labor is a huge part of it, and a huge part of what makes it so difficult. As someone who researches and writes on medical topics myself, right? Sometimes you don't want to, and sometimes you need to put things down for a while, because you're learning about people getting sick and dying. And you're worrying about yourself. So, yeah. So there are these 166 clauses that you focus on in "Number Soup." And you focused on these because they're the ones with the most codes—they had four codes, or five codes, or in some cases, six codes. And you found that these clauses came from 72 different stories. And these 72 stories, as I understand it, are kind of like the worst offenders. Although I take what Bennett said earlier about how it depends where it's published, because you can assume different levels of statistical literacy. But these are the ones that are just jam-packed with statistics, right?
The least accessible for sure.
Yeah. Um, I was wondering if you could maybe walk us through like, one of the worst instances of bad quantitative reporting that you encountered in your research—like if there's a particularly egregious example you want to highlight and also just kind of like, what was going through all of your heads as you read examples like this, and you're just like, “oh, man, oh, no. What's happening?”
Do you want to—if I read, will one of the two of you jump in anytime I say something that you would code, and explain it? Does that work? I'll read—this is one sentence, mind you. "Seoul, South Korea—Associated Press. South Korea reported an eightfold jump in viral infections Saturday."
Okay, so that's proportion.
And also official statistics, right, it's South Korea...
Because South Korea reported it—presumably that's their health department.
"With more than 400 cases, mostly linked to a church and a hospital."
So that's magnitude.
And also probably variability, because there's a sense in there, that...
Because it's saying that they're mostly concentrated in this specific region.
It's not just sort of these 400 random people. "While the death toll in Iran climbed to six." That's, you know, more of the same.
It's just more magnitude and official statistics.
"And a dozen towns in Italy effectively went into lockdowns as health officials around the world battle a new virus that has spread from China." I feel like we tagged something else in there too.
Funny story: I tried to rewrite this one as a better example. And I went back to the numbers for that week, and I actually couldn't figure out what the "eightfold" was referring to, or what the "400" was referring to. The only thing I found that would have corresponded to an eightfold jump, with a number around 400, was over a three-day period. And it seemed weird to say “eightfold jump Saturday” if it was an eightfold jump from three days earlier, rather than from one day earlier. So I actually don't even know exactly what they were trying to say, except, "holy crap, suddenly there's a lot of COVID cases in Seoul." Or not Seoul—in South Korea, excuse me, but...
They probably would have been better served by doing it that way.
I genuinely couldn't figure out how to rewrite it better.
Do you think that's a case where, you know, they're just taking directly what the government source said?
I mean, yes, but I do...
I mean they're taking it from three governments, right? They're taking it from Korea, Iran, and Italy.
I do want to take a second and channel our journalist co-authors and cut them a lot of slack for a bunch of reasons. Right? Number one, this is a moment of complete global crisis. They're just trying to get stuff out as quickly as possible. Number two, there's kind of no consensus yet on which numbers we should be paying attention to right at that point in time. We have a much better sense now of, kind of which numbers are actually useful to pay attention to—what's a good comparison. And so for both of those reasons, you know—I don't—I completely understand why it's written the way it is. That said, yes, they were trying to relay information from three different governments as quickly as possible, for sure.
Yeah, and this actually, like, from a methodological standpoint, is one of the things that makes the paper especially fascinating for me. So you've done all of this kind of quantitative heavy lifting and processing all of these stories, but then when you zoom in on these particular examples, right, then you kind of switch gears a little bit, and it becomes a bit more qualitative. And that opens up another question that I was, you know, wanting to ask you about, which is like, how did you find it like combining these different methods or using tools that are, you know, typically placed either in the qualitative, or the quantitative methods boxes and like, bringing them together? Was it challenging, like having to switch gears in that way? Or was it just something that kind of happened naturally?
This is a John question, 100%.
Yeah, so I'm trying to tamp down all the strong feelings welling up about this, which is: I don't think qual and quant exist; they don't exist. They're just methods. They're just tools. We are just looking at behavior, or data, or a phenomenon in the world. There are different lenses you can put on it, depending on the questions you ask—or, you know, the questions you would like to ask. I mean, I did not even hear about this argument until I came to Knology. Because, at least in the way that cognitive science and psychology are practiced nowadays, there has to be a constant, seamless interplay between all sorts of methods and tools. There's just no way to understand human beings by putting just one particular lens on them. I mean, since before the behaviorists, there was a physical stimulus and a physical response. So that means you are looking at something—nobody's filling out a survey—you are looking at behavior. And there are many ways of understanding behavior. That said, there was some mindfulness about it here, in that we definitely wanted to bring together some of the tools of data science. Going through the corpus, and creating the corpus, is computational corpus analysis—you know, part of data science. But then we added the layer of hand coding. That's human judgment coming into it. But even that, to assure replicability and to assure that we'd reached that level of intersubjectivity—that's, again, a quantitative method.
And then this piece is almost—in some ways—almost pure discourse analysis, which is what I led, but the choice of what to analyze came directly from all of the steps that John just described. And not only were the quantitative and the qualitative seamless, but the link from the applied side of things was also very seamless. It's not a coincidence that half the authors of this paper are journalists. We couldn't have done it without that perspective either.
We were originally going to publish John's paper and Jena's paper combined as one, but we thought it would be easier to find homes for the content if we split them, aiming this one at a more journalism-focused journal. John's research is published in a paper in Numeracy, which you can check out: "Surveying the Landscape of Numbers in US News."
Yeah, we realized the audiences who wanted, sort of, the deep dive in the practical tips for journalists, and the audiences who care about the representation of numbers in and of itself might not be exactly the same audience.
But if you're interested in, overall, what kinds of stories tended to have more quantification—or whether we could boil down that million codes I mentioned into some smaller, easier-to-understand representation—then go ahead and check out John's paper.
Yes, because there it's about "can we quantify the qualitative structure?"—can we make it into a quick and easy category that we could share with each other in conversation, where ideas have to be very compressed, and then allow you to pull threads from there.
On that point, and also, while we're discussing methodologies, kind of as a last question here, I wanted to ask if there were any particular methodological insights that you came away from this research with? You know, did the study give you any new, new ideas about how to do what we might call mixed-methods or transdisciplinary research more generally? Did you learn anything that you think might inform other current or future projects that you're involved with here at Knology? Just wondering if you have any insights to share with our audience about, you know, how all these things work, and what you might have learned methodologically from doing the research.
I learned that 300 articles is not enough to do data science on, because we were trying to build a system that could at least take a first pass at assigning some of the codes. It quickly became obvious that we weren't going to do it at the clause level. It then became obvious that we weren't going to do it at the story level either. I mean, it is possible, and we don't need to get into the technical details of that, beyond saying: if you already have a model of how journalism uses language, you have enough of a head start that a small sample might be a way of looking at how they're doing it differently. But I'll leave the rest to others. That was what I learned, and others should be aware of it.
My biggest lesson is—I mean, I've been here at Knology for five years. John and I started one day apart. And this is an example of a bigger lesson, I think, which is that thinking with someone who has very different training and a very different methods toolkit than you is always a good idea. You don't have to know how to do everything. If you know a little bit about everything, then you can break it up so that you are doing the work that you're best at, or most comfortable doing, but still get all the benefits of that larger toolbox. So I feel like my knowledge of many of John's methods is pretty abstract and high-level—I couldn't do most of it. But I know enough that we can have a really productive conversation about, like, how do we get at this thing—and vice versa. So, you know, we spent a lot of time planning and preparing and sparring, and fun things happened at the end of it.
And of course, this is ongoing, right. So you have opportunities for continued collaboration in the future on this project?
Honestly, also, I mean, I feel like I'm leaving Bennett out. But his background in programming in terms of thinking about what is or isn't technically feasible, what are the things that a computer is going to do kind of well and quickly versus what should we not bother trying to automate? For this one in particular, I don't think we could have done it without someone with that kind of very comfortable programming background. I know John has some but not quite the same.
I'm very slow as a programmer, and I write inefficient code. So... anyways... You know, there's a lot of talk about what has been lost from not having offices. I'm not sure how much of coming up with the codebook would have been possible without all of us really being together all the time. It would be very interesting to see how we could do it virtually, because that one was very much a process where we were all at each other's elbows.
I can say that all of the analysis and everything after that we did remotely, and it went very smoothly.
Yeah, it worked out okay, right? I mean, we spent some time reading sentences to one another over Slack calls. But that worked out. It was sort of like sitting next to each other—except, you know, no worries about coughing and...
Yes, which would have been disastrous at that time.
Yeah, moderately terrifying.
Well, on that happy and fitting note. I think we can draw things to a close here. I just want to say thanks so much to all of you for taking the time to help us peer behind the curtains of this fascinating study. It's always great when we have the opportunity to talk about the nuts and bolts of what we do here at Knology. And, again, I just want to thank you for giving everyone such a really detailed understanding of, just like, how you designed and carried out this study. It's such an important piece of research. So thanks for taking the time to get into it with me. And take care y'all.
These materials were produced for Meaningful Math, a research project funded through National Science Foundation Award #DRL-1906802. The authors are solely responsible for the content on this page.
Photo by Annie Spratt on Unsplash
Transcript supported by Otter.ai