Online tableau training, tableau tips, & video tutorials

Technical Features

Memberships

An Introduction to String Calculations in Tableau

Preview

Instructor

Ryan Sleeper

Create custom row-level calculations from text using string functions

Learn how to access all the different string functions from within a calculated field, how to do a row-level calculation that creates a custom segmentation, why sometimes a data type of string shows up as a measure, and more!

Hi, this is Ryan with Playfair Data TV. And in this video, I’m going to be providing an introduction to string calculations.

String calculations are one of two types of what are called Basic calculated fields in Tableau. The first type of Basic calculated field is called Aggregates. And the second type is called Row Level. String calculated fields would be considered a row level, because it will put a new column, and there will be an entry on every single row in the underlying data source.

To introduce the different functions, I’m going to start a calculated field over here in Tableau Desktop. And just so you know, you can use this little data dictionary on the right side of a Tableau calculated field to filter the different types of functions. So there’s a whole list of string functions in here. There’s a few dozen things.

I’m going to let you explore that on your own. I realize that not all these are relevant for everybody. So I’ll let you explore that. But you can look through all these different functions, get a definition of what it is going to do, as well as how it is used in the syntax.

But just to show you one simple example I set up before I started filming this video, these string calculations are– they’re literally infinite what you can do with this. So this is a fairly bad use case, I admit. But just to show you the functionality, I’ve used one of those string functions called STARTSWITH. And I’m using the Sample Superstore dataset.

And I’m pretending that I want to segment my sub-categories into new buckets. So I called this Sub-Category Segments. And I’m saying if the Sub-Category starts with the letter A, I’m going to call that my “A Sub-Categories”.

Moves onto the next line. If the Sub-Category starts with a B or the Sub-Category starts with a C, we’re going to call that our “B and C Sub-Categories”. Everything else, so if it doesn’t start with an A, B, or a C, we’re just going to classify that as “Other”.

Couple of nuances to string calculations. First of all, within this STARTSWITH function, this letter here is not– it’s not case sensitive. So you could have lowercase a, lowercase b. It would still properly classify things for you.

Also for these outcomes, you can use both double quotation marks or single tick marks. However, if the outcome has an actual apostrophe, that single tick mark is going to throw things off. So in other words, if I put an apostrophe within that outcome, now we get an error message. It’s trying to close the string. And then that ‘s’ is standing out there on its own. So we’ve messed up how Tableau is interpreting this. For that reason, I’ve gotten into the habit of always using double quotes for my string outcomes.

So there is the formula, just like I like to quality check my calculated fields that are built with numbers, I also like to quality check my calculated fields that are built with strings. But the process is a little bit different. Instead of opening a calculator and doing the math, I put the original dimension onto the Rows Shelf followed by the newly-created version.

So here’s Sub-Category Segments. Note the data type is String. But just like my calculated fields built with measures, there’s a little equal sign before the data type telling me that that is a calculated field.

I will put that onto the Rows Shelf second so that I should theoretically be able to read this table at a glance. And every time the Sub-Category on the left starts with an A, it should be segmented as an A Sub-Category. Anytime it starts with a B or a C, it will be categorized as a B and C Sub-Category. Everything else will be categorized as Other.

And it looks like it is in fact working. Now that I’ve gained that trust with that calculated field, I no longer need the original. And I can use this dimension anywhere I want. I’ve just created this segmentation on the fly.

Maybe I’ll throw together a quick bar chart that looks at average Profit by these sub-categories. Let’s see. I guess that’s somewhat interesting. Let me throw on another measure, see if I can quickly find any interesting insight that we didn’t have visibility into.

So that’s kind of interesting, the difference between average Profit and average Sales for our different Sub-Category segmentation. If I’m the manager of the B and C sub-categories, I’d really like to know that I’m leading the company in both average Sales amounts as well as average Profit amounts. That’s an insight that we just simply didn’t have visibility into until we made this calculated field using that string function STARTSWITH.

And again, these are called Row-Level calculations, because now if I look at the underlying data, we should see a new column called Sub-Category Segments. And there is an entry on every single row in the dataset. So as I go all the way down, every single one gets its classification.

Two more things I wanted to point out with these calculated fields. There’s one alternative way to creating this segmentation. You could use groups. So instead of making a calculated field, you could look at your original dimension, highlight the dimension members that should be in the same segment, and click this paperclip icon. That groups them together.

Once they are grouped, you can modify the alias. And I can call that “A Sub-Categories”. I’ll do the same thing with B and C Sub-Categories. That groups them, concatenates their names. But I can edit the alias and call it the same thing as I did in my calculated version. And then just to be thorough, I’ll go ahead and do this for everything else. This was our Other category.

And note we get another new dimension. This time it is a group. Just like our calculated version, we can use it as its own dimension to create those new insights that we didn’t have visibility into before.

The gigantic difference between the two is the grouping will be static, while the calculated version will be dynamic. So in other words, if I refresh my data source overnight and a new sub-category that starts with the letter A suddenly appears, this version, the calculated version, we did to start the video will automatically pick up that new sub-category. The static group will not. We would have to manually go in and add that sub-category to that grouping to get it to work.

One last thing I’ll mention, and actually two more things. You may see a data type of String occasionally appear down here in the Measures area. So your outcomes, that would mean that your outcomes are strings. That’s why the data type is String. But a lot of people get confused on why or how a qualitative outcome could be down here and classified as a measure.

The difference is it will appear up here in the Dimensions area if the results are static. So every single time an A sub-category starts with the letter A, it will be segmented as A Sub-Category. If the outcomes are dynamic, it will appear as a measure.

So for example, if I was creating a calculated field to segment things with a qualitative outcome but they were based on numbers– something like if our customers had greater than $5,000 of sales, then call them “Above $5,000”– well, that number in that classification of the customers is going to be dynamic, which is why it might show up down here.

And then the last thing I have to say is I want to make a recommendation. Tableau is extremely good as a prototyping tool. You saw how simple it was to make a calculated field. If we wanted to experiment with this specific segmentation, Tableau is fantastic at that. However, once I knew that I wanted to use this segmentation over and over again, I would recommend that you try to move this calculated field out of Tableau and do it in the data source itself before you open Tableau.

String happens to be the heaviest data type for Tableau to process. So every time we’re reusing this newly created segmentation, Tableau’s having to look through all of our records and classify and crunch not only numbers, but it’s having to crunch text this time, which it takes a little bit longer to do. So once you know that you do in fact want to segment your sub-categories in this example in this way, there are better tools for the job. I would try to prepare that, add a column in my database table before I even connect to it in Tableau.

This has been Ryan with Playfair Data TV – thanks for watching!