Name: Wrangle with confidence: Clean, shape & transform data visually with Spotfire®
Uploaded: 2025-06-17T16:16:05.301Z
Duration: 45 min 9 s
Description: Wrangle with confidence: Clean, shape & transform data visually with Spotfire®

Transcript for "Wrangle with confidence: Clean, shape & transform data visually with Spotfire®": We're thrilled to have you with us. I'm JP Richard-Charman, and I'll be one of your speakers today as well as your host for this session. Before we get started, I just want to cover a few housekeeping items to ensure you have the best experience. The webinar will last up to forty-five minutes, with a Q&A held at the end. If you have any questions during the presentation, please use the Q&A panel located on the right side of your screen. We'll address as many questions as we can during the Q&A segment. We've also made a few assets linked to today's webinar available in the docs section of our webinar platform, right next to the Q&A panel on the right-hand side of your screen, so please feel free to access them. After today's session, a recording of the webinar will be made available on demand, and we'll email you the link shortly after the event. With that, let's dive in. I'm excited to introduce our main presenter, Arnaud Varin, principal product manager here at Spotfire. Spotfire is the visual data science platform that makes smart people smarter by combining advanced analytics and industry-specific visualizations to solve complex business problems. Spotfire has been in the BI and analytics space for a long time. However, the types of use cases that we address are not the traditional BI use cases. Visual data science is at the intersection of visualizations and advanced analytics, a differentiated approach to what has previously been available in the market. Spotfire continues to be the only vendor in the data science space that approaches complex problem solving with a visual-first approach. If you wouldn't mind going forward a few slides, Arnaud. Thank you. Now, we continue to reimagine the space with our continuous approach to innovation.
And with that, as you're aware, comes our packaging. Spotfire offers a comprehensive visual data science suite divided into three core products. First, Spotfire Analytics, which is truly the core of Spotfire, for analyzing data and building analytic applications using interactive visualizations and advanced analytics. Next, Spotfire Data Science, the robust platform that combines machine learning, modeling, and advanced data science capabilities tailored to various industries. And last but certainly not least, Spotfire Enterprise, for enterprise organizations that need to scale easily and on demand, designed for organizations to publish and share analytic applications securely and widely. Now let's dive into our different offerings in a bit more detail, starting with Spotfire Analytics. Spotfire Analytics is at the core of our platform, offering a seamless, interactive visual analytics experience with several capabilities. Interactive visual analytics, where you can explore disparate data sources within a unified analytics environment, and drill down and navigate across linked data tables using fully interactive visualizations to increase speed to insight. Visual data wrangling, our main topic for today's webinar, where you're able to wrangle and fix data while analyzing it, with easy and nondestructive data blending, shaping, and cleansing directly within the visual analytics environment. Powerful geoanalytics, where you're able to connect to spatial data, including leading databases, real-time data, GIS data, map services, and spatial files. And AI-powered recommendations, suggesting the most insightful visualizations, matched with advanced analytics algorithms, to help uncover hidden patterns in data. This all allows businesses to make faster data-driven decisions with minimal manual effort.
Next, we have Spotfire Data Science, which extends our analytics capabilities with machine learning and predictive modeling and enables our key industry customers to solve their industry-specific, mission-critical challenges and drive speed to insight like never before. Spotfire Data Science includes all the functionality I just mentioned that is part of Spotfire Analytics, and much more: data understanding and preparation, modeling and prediction, and process improvement, notably statistical process control and reliability analysis. And last but definitely not least, Spotfire Data Science includes industry-specific solutions with specialized visualizations and connectivity for industries such as energy and manufacturing. Spotfire Enterprise fosters collaboration while ensuring security and governance. Organizations are able to scale their analytics operations while maintaining security and efficiency. You're able to publish and share insights across the organization. Our built-in security and governance controls ensure that data and analysis access aligns with privacy regulations and internal policies. Through integration and extensibility, Spotfire connects with diverse data sources, third-party tools, and APIs for tailored solutions. Automation keeps analyses up to date and distributes them on schedule or in response to key events, ensuring stakeholders always have the latest insights. Now, a key component of all of our products is the ability to go beyond basic data preparation, being able to deeply explore and analyze data through interactive transformation. In other words, you're able to go far beyond cleaning data: you also get the ability to discover insights through dynamic data interaction with our visual data wrangling capabilities.
Now with that, I'd like to hand over to Arnaud to dive into more detail on visual data wrangling in Spotfire. Over to you, Arnaud.
Thanks, JP. Let me just switch screens here. There we go. All right. So, visual data wrangling. When most people think about working with data, they imagine a straight line. First you prepare the data, then you clean it, and only after that can you maybe do something with it and get to analyze it. But in the real world, especially in complex domains like engineering, manufacturing, or life sciences, that's just not how things work. Insights rarely come all at once. They unfold. You notice something unusual in the charts. You zoom in. You join with another dataset. You spot a correlation. That opens up more questions, so you transform the data again. It is a constant loop of discovery, refinement, and deeper understanding. That's what we mean by data discovery being iterative, and that's exactly how Spotfire is designed to work. Spotfire encourages you to explore first and refine as you go. You don't need to write code or wait for someone else to prepare your data. You start with visuals or let the AI suggest something smart, and from there, every interaction moves you forward. This approach, what we call visual data science, is what makes Spotfire so effective for subject matter experts. It empowers engineers and scientists to lead the analysis even if they don't have a deep statistical background. Because when you can see your data, interact with it, and shape it as you go, discovery happens naturally. This visual and iterative approach to data discovery isn't just a design choice we made. It's something that's been embraced for years by scientists, engineers, and analysts working on some of the most complex problems in the world.
And Spotfire has become a go-to tool in industries like energy, manufacturing, life sciences, and semiconductors, not because it's flashy, but because it fits the way these professionals think. It helps them stay close to their data, respond to what they see, and move quickly from questions to insights and from insights to questions. A few examples. In oil and gas, reservoir engineers use Spotfire to combine production data with geological models and well logs, helping them identify optimal drilling zones and track decline curves in real time. In semiconductors, yield engineers visualize and correlate wafer defects with manufacturing conditions, spotting spatial patterns that point to underlying process issues. In pharma, scientists combine compound structure data with assay results to identify promising candidates and track dose-response patterns across trials. And in manufacturing, process engineers monitor equipment data and batch records to catch anomalies before they become failures, often drilling through thousands of rows and thousands of columns of sensor data to identify one key pattern. These are not simple use cases. They require agility, transparency, and the ability to explore at the speed of thought, and that's where Spotfire shines. It puts control into the hands of domain experts so they don't have to wait on data scientists or IT to uncover the answers they need. So when we say that Spotfire is trusted by experts in complex industries, it's because we've been helping them solve their real-world challenges visually and interactively for decades now. Now I want to talk about one of the biggest differentiators of Spotfire. This is the ability to wrangle data as part of the analysis, not as a separate step you do somewhere else or before you begin. In traditional workflows, data wrangling tasks like cleaning, joining, filtering, and reshaping the data are done upfront, often by someone else, and locked in before you even start your analysis.
This can slow things down, and worse, it assumes you already know what questions you're going to ask. But in Spotfire, the moment you spot something interesting, like an outlier, a missing value, or a trend break, you can act on it immediately, right from the chart. You can filter it down, slice it, replace it, or drill deeper, and the changes are instantly reflected in your analysis. You can also blend datasets on the fly, compute new columns, or respond to AI-powered suggestions that help you clean or reshape your data with a click. And all of this happens visually. No code, no switching tools, no interrupting your flow. And it's fully traceable in the data canvas, which shows you every transformation you've made to the data, in order, with the ability to go back and adjust any step at any time. This makes Spotfire incredibly powerful for users who need flexibility and control. You don't have to stop exploring just because your data is not perfect. You fix it and keep going. That's what we mean when we say wrangle while you explore, not before. So what does this actually look like in practice? Let me walk you through a typical workflow that a Spotfire user might go through, whether they are an engineer, a scientist, or an analyst. You start by opening a dataset, maybe from Excel, a database, or a live sensor feed, and Spotfire instantly visualizes it. No need to configure anything upfront. You're looking at your data right away. You might notice a spike in a line chart or a strange cluster in a scatter plot, so you zoom in, filter, or highlight it, and that's already the beginning of your discovery. Maybe that anomaly makes you realize you need to bring in a second dataset, like equipment settings, lab results, or customer data. You join it on the fly, visually. Then Spotfire suggests a few visuals you might want to try, or flags values that look inconsistent. You accept a few recommendations, tweak a column, maybe create a calculated metric.
And all the while, your visualizations are updating in real time, giving you instant feedback and helping you ask better questions. What's important is that everything happens in one place. You never leave the Spotfire visual environment, and you never lose track of your steps. You're wrangling and exploring at the same time, and the tool is working with you, not slowing you down. Now that we've covered the philosophy and the real-world context, let's see Spotfire in action. All right. In this quick demo, I'm playing the role of a process engineer at a manufacturing plant. My role is to understand and improve production quality by investigating batch quality issues and discovering relations between quality and sensor behavior. I'm starting from scratch here. I launched Spotfire without any data connected yet. Everything you see is built live from zero, so that you can see how visual data wrangling, and visual data science in general, runs naturally through the analysis. First, let's bring in some data. In Spotfire, you can load local files such as Excel, CSV, and text files, or you can connect to a large catalog of data sources: databases, cloud storage, even streaming sensor data. And the experience is consistent no matter where your data lives. For the purpose of this demo, I'm using a local file. Here, a batch overview file, which contains data about my production batches for quarter one, from last December to February. If I open the data panel here, you see that we have production start and end times, operator names, and quality results, meaning whether a batch passes the quality check or not. We have different shift types and a batch ID. When you are in the data panel and you expand it here, this is where you manage your columns.
This is where you can review and, if necessary, change the category of a column, the data types, the ordering, and the formatting used by default in your analysis. I also have sensor data logged during production that I want to use as well. So I will bring in the sensor readings here, and I open the file. Here, you see Spotfire offers me the choice to add this new data as a new data table, independent from the original data table we imported, as rows added to the initial data table, or as columns added to the original data table. And that's what we want. We want to augment the initial dataset with sensor readings. So I click add as columns. Of course, we can inspect the join setup here. We see that it uses the batch ID to match the rows. Here I see the new columns that will be added to my original dataset, and the join settings. We can clearly understand, visually, the different ways to join the data. But here, the auto-detection works perfectly, so I just click OK. Everything we've done so far, loading data and joining files, is tracked automatically in the Spotfire data canvas here. This view gives us a visual data pipeline of all the steps taken on the data. At any point, I can trace back or change my wrangling steps, and anything we do along the analysis will be recorded there, in a nondestructive way. I can always go back in time to change things and return to my original data if I need to. Now we can begin the actual analysis to investigate if and why some batches fail. We can explore data in different ways in Spotfire. We can start from the data to get AI-based visual suggestions. We can start from visualizations directly if we know exactly what chart we want. We can use search, or we can use Spotfire Copilot for guided analysis, asking the AI questions about the data and where to start, for instance.
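For readers who want to see the logic of the "add as columns" step outside Spotfire, it can be sketched in plain Python as a left join on the batch ID. This is only an illustration of the idea, not how Spotfire implements it; the function name `add_columns`, the column names (`batch_id`, `temperature`), and the sample values are all assumptions made up for this sketch.

```python
from collections import defaultdict


def add_columns(base_rows, new_rows, key):
    """Left join: keep every base row, attach the new table's columns by key."""
    # Columns the new table contributes (everything except the join key).
    new_cols = [c for c in (new_rows[0] if new_rows else {}) if c != key]

    # Index the new rows by key; one batch may have many sensor readings.
    matches = defaultdict(list)
    for row in new_rows:
        matches[row[key]].append(row)

    joined = []
    for row in base_rows:
        # Unmatched base rows are kept, with None in the added columns.
        hits = matches.get(row[key]) or [dict.fromkeys(new_cols)]
        for hit in hits:
            joined.append({**row, **{c: hit.get(c) for c in new_cols}})
    return joined


# Invented sample data, for illustration only.
batches = [
    {"batch_id": "B001", "shift": "Day", "quality": "Pass"},
    {"batch_id": "B002", "shift": "Night", "quality": "Fail"},
    {"batch_id": "B003", "shift": "Day", "quality": "Pass"},
]
sensors = [
    {"batch_id": "B001", "temperature": 71.2},
    {"batch_id": "B002", "temperature": 88.9},
    {"batch_id": "B002", "temperature": 93.4},
]

joined = add_columns(batches, sensors, "batch_id")
for row in joined:
    print(row)
```

The function builds a new list and leaves `batches` untouched, which mirrors the nondestructive behavior described for the data canvas: the original table can always be recovered.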
Here, I want to start from the data. So let's see what the AI in Spotfire recommends. When I select a column here, Spotfire's AI looks at relationships and patterns in the data to suggest visualizations to me. This is a great way to get started, and I can just drag and drop the visual. I select operator, and I get different recommendations. Also notice this recommendation here. Spotfire detected a data quality issue: operator names with double spaces. I can fix this issue with just one click, and I'm done. This is fixed. This is yet another example of how data wrangling is done visually in Spotfire, here with the help of AI. The category names were inconsistent, just a few extra spaces here and there, and Spotfire lets me fix this with one click. Now let's say I know what I want. I want to create a bar chart, and I want to look at shifts and their relation to quality results. So, just using drag and drop, I configure my visualization. I also want to color this visual right here. And you see, from the visual, I manipulate the data: the different types of data I want to see, or the aggregations, all from the visuals, just by dragging the data. So here we are looking at failed and passed quality checks based on the type of shift. And you see, again, something in my data looks off here. One of the shift categories is spelled "night" with two t's instead of one. And I can fix that. I select the data here and create a details visualization as a table to look at the raw data. From there, I can replace all the occurrences of this value with the right one, night, and hit apply. And the visual updates instantly. Again, we wrangle data directly from the visualization. No need to open a script editor to write expressions or use other tools outside. This is done from the visuals. OK. This bar chart tells us something. Most failures, I mean, all failures actually, happen during the night shift.
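The two fixes just described, collapsing extra spaces in operator names and replacing a misspelled shift value, can be sketched in plain Python as follows. This is a minimal illustration of the cleaning logic only, not Spotfire's implementation; the helper names, the column names, and the sample values (including the operator name) are invented for the example.

```python
import re


def collapse_spaces(value):
    """Replace any run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def replace_value(rows, column, old, new):
    """Return new rows where every occurrence of `old` in `column` becomes `new`."""
    return [{**row, column: new} if row[column] == old else row for row in rows]


# Invented sample data, for illustration only.
rows = [
    {"operator": "Ana  Lopez", "shift": "Nightt", "quality": "Fail"},
    {"operator": "Ana Lopez", "shift": "Night", "quality": "Fail"},
    {"operator": "Ben Ortiz", "shift": "Day", "quality": "Pass"},
]

# Step 1: fix the double spaces in operator names.
cleaned = [{**row, "operator": collapse_spaces(row["operator"])} for row in rows]
# Step 2: replace the misspelled shift category.
cleaned = replace_value(cleaned, "shift", "Nightt", "Night")

print(sorted({row["operator"] for row in cleaned}))
print(sorted({row["shift"] for row in cleaned}))
```

Both steps return new rows rather than editing `rows` in place, so the original values stay available, in the same spirit as the nondestructive wrangling shown in the demo.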
And this insight, our first insight here, leads us to new questions. Is this issue more prominent in recent batches? Is it always this way? What's going on with these failed batches? Let's look at it over time, just by adding a time dimension, the timestamp, here. I'm trellising this visualization, splitting it into panels, to see whether it was a particular month or whether it's general. And it looks like this observation holds across the three months of data that we have. Now I want to dig deeper, add newer batch records, and compare this data with more recent performance. So I have another dataset here, with batch records for quarter two, from March to May 2025, that I add. And I hit OK. Here again, Spotfire automatically recognizes that the data structure of the file I just imported matches our original dataset and suggests appending this data as rows. Of course, again, I can inspect how the operation is set up. No additional columns are added here, and there is the identification of the origin of the rows. But what was detected just works for me, so I hit OK. Here we are extending our dataset with more recent records. Now, getting back to our earlier bar chart, you notice it updated to include the new rows automatically, and it looks like the pattern continues: the night shift still leads to failures. But what's the reason? We should look at the sensor readings now. So I go back to my data panel, select the different readings we have, and I want to look at them over the timestamp information. Spotfire suggests creating a line chart, which I just drag onto my visualization canvas. Here, it's by day of month. I want more precision. Again, I manipulate the data all through the visuals. Here, I'm at second-level precision. And I want to investigate the correlation between the shift, batch failures, and sensor readings.
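The "add as rows" step described earlier in this part of the demo, where the quarter-two file is appended because its structure matches the original table, can be sketched in plain Python like this. It is a simplified illustration under assumptions, not Spotfire's logic: the function name `add_rows`, the optional `origin_column` parameter, the column names, and the sample rows are all hypothetical.

```python
def add_rows(base_rows, new_rows, origin_column=None):
    """Append new_rows to base_rows when both tables share the same columns."""
    base_cols = set(base_rows[0])
    new_cols = set(new_rows[0])
    if base_cols != new_cols:
        # The structures differ, so appending as rows is not appropriate.
        raise ValueError(f"column mismatch: {sorted(base_cols ^ new_cols)}")

    combined = []
    for label, rows in (("original", base_rows), ("added", new_rows)):
        for row in rows:
            out = dict(row)  # copy, so the input tables are left unchanged
            if origin_column is not None:
                out[origin_column] = label  # record where each row came from
            combined.append(out)
    return combined


# Invented sample data, for illustration only.
q1 = [
    {"batch_id": "B001", "shift": "Day", "quality": "Pass"},
    {"batch_id": "B002", "shift": "Night", "quality": "Fail"},
]
q2 = [
    {"batch_id": "B101", "shift": "Night", "quality": "Fail"},
]

combined = add_rows(q1, q2, origin_column="origin")
for row in combined:
    print(row)
```

Any chart computed from `combined` would then include the newer batches, which is the effect seen when the bar chart updates after the rows are added.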
So what I do here is simply mark the failed batches in the bar chart, and they get automatically highlighted in the line chart. As you can see, in Spotfire, when you mark data in one chart, this data is automatically highlighted in all the other charts as well. That's what we call automatic brush linking. And in this case, it clearly highlights the failed batches in the line charts. We can zoom into the data to get more precision and look at it over time. And what we clearly see here is that during failed night batches, temperature and vibration fluctuate abnormally. The temperature may get very high, and that may be the cause of the failure. So we've identified a clear pattern here. Night batches are more likely to fail, and the sensor data supports that there is process instability at night. And, again, these new insights lead us to new questions. Is it the equipment, the operators, something else? The next step might be to compare operators. So we could decide to open the filter panel and compare the different operators or look through different batches. I could continue like this and drill deeper, but I will stop there, as I hope you now get the Spotfire experience and what we mean by visual data wrangling. Very quickly, we have insights, each informing the next step of our analysis. We started from scratch, and along the way, we joined files. We fixed data issues. We added new data again. We created visuals. We filtered data and used AI-powered suggestions, all visually, all iteratively. My quick demo was not just about data wrangling or data preparation as you would do it before the fun begins in traditional workflows. In Spotfire, it's something you do as you explore and discover. Spotfire truly makes data wrangling part of the analysis process, and I hope I successfully demonstrated that to you. If we get back to the slides. Oops.
Thank you very much, Arnaud.
I have one more thing.
Amazing.
Well, thank you as well for that incredible demo. I'm sure that everyone watching can agree with me that it was a fantastic demo, really showing a full workflow of how you wrangle the data from the visualization. But with that, I'll let you finish on this last slide.
Thank you. Just as a conclusion to what I said today, to bring it all together: if you take one thing away from today, let it be this. With Spotfire, you don't have to choose between speed and depth, or flexibility and control. As you just saw, you can explore data, visually transform and shape it as you go, and uncover insights in real time, all without writing code, without switching tools, without slowing down your process. So whether you're analyzing wells, monitoring yields, investigating anomalies, or optimizing a process, Spotfire helps you work the way you think: interactively, visually, with full transparency. We think this is a different approach to data. It's not just data preparation. It's not just data visualization. It's a smarter, more connected way to get from raw data to real insights, and it's already being used by thousands of engineers, scientists, and analysts solving the world's toughest problems today. Over to you, JP. I'm done.
Perfect. Thank you very much. If you wouldn't mind going to the next slide, Arnaud. Thank you once again for that insightful presentation, and thank you once again to everyone who joined us today. Now, before we get on to our Q&A segment, just a few things that we wanted to share. In terms of webinars, we have two great series that are being added to on a regular basis. You can see those up on the screen at present. So whether you're looking to find out more about Spotfire or looking to learn about what's new, don't hesitate to register for the full series. We're also constantly adding new events to the events section on spotfire.com.
As I mentioned at the very start, a recording of today's webinar will be made available soon, so please do keep an eye on your inbox for the on-demand link. If you wouldn't mind going to the next slide, Arnaud. In terms of next steps, if you're interested in learning more, feel free to visit our website at spotfire.com or contact us directly. There are lots of ways to interact with us, whether it is via our socials or through our community, the Spotfire Community. Additionally, our blog has lots of great content where we share the latest on visual data science and dive into Spotfire Data Science in more detail. And last but not least, if there are any enhancements that you would like to see, or any ideas that you would like to share with us, please don't hesitate to visit our ideas portal and submit your ideas there, and we'll look to ensure that your ideas are incorporated into the Spotfire roadmap. Additionally, as I mentioned at the very start, we have some documents linked to today's session, so please don't hesitate to access these. We have a brand-new data sheet on visual data wrangling, as well as three quick-tip videos on visual data wrangling, which are fantastic and go great in tandem with Arnaud's demo. We also have a new Spotfire Data Science trial, which you can access directly on our website. We've added the link to that trial in the docs section of this webinar as well. Now with that, I'd like to move on to our Q&A segment. Just a little reminder: if we don't get to your question, we'll follow up with answers via email. We've had a few questions come through. The first one is: how does Spotfire's approach to visual data wrangling differ from traditional ETL tools?
Yep. Hopefully, what I demoed clearly showed the difference.
I think ETL tools are great at managing and manipulating data, but they are separate from the analysis process. It's two completely different sets of capabilities. In ETL tools, you prepare the data. And, usually, if you are the analyst or the scientist who analyzes the data, it's not even you who will use the ETL tool. It's someone else who will do this part for you. So it's more likely to be someone else, or a data management team, who would then pass it along to you for analysis. I think Spotfire flips that model through this tight integration between data analysis, exploration, discovery, and data wrangling as part of it. Wrangling is part of the analysis in Spotfire.
Perfect. Thank you, Arnaud. Throughout the presentation, you mentioned complex data challenges, and this next question is linked to that. What types of messy or complex data challenges does Spotfire handle particularly well? And are there any real-world examples that you can share?
Yeah. I showed simple examples in my demo of messy data that you can quickly fix, either directly from the visuals, replacing a value, for instance, in just two clicks maybe, or through the AI recommendation system we have as well. But when it comes to complex data, a key strength of Spotfire is this multi-source aspect. Right? You can analyze one single data file, but where it is strong is when you analyze multiple sources at the same time: when you join the data, when you relate the data together, maybe using visuals to relate the data and find correlations between multiple data sources.
Beautiful. Now, linked to technical and nontechnical users: how does Spotfire enable collaboration between technical and nontechnical users during the data preparation process?
Yeah. So, when you use Spotfire Enterprise, you get the Spotfire library.
And then the enterprise Spotfire system becomes a kind of shared workspace where you will find domain experts sharing their own assets, analysts doing the same, and also data scientists who can contribute. I haven't demoed this today, but we have pretty strong capabilities when it comes to predictive modeling, machine learning, and calculations, from Python or R to simple expressions. So data scientists can create their own functions and share them throughout their teams. People at each level of expertise can share and collaborate using this library. You may be a process engineer who creates transformations visually. You may be, like I said, a data scientist who shares some workflows through R or Python data functions. That's the way it works: sharing visuals and functions through the library. Another thing is the data canvas. Like I showed you in my demo, when it comes to data analysis, everything is tracked on the data canvas. So from the data canvas, there is no confusion about what has been done and when in the analysis process, and who did what, basically. We also have a versioning system. So when you have an analysis that is shared with your analysts, every new update you make to the analysis is versioned, and you can add a comment on what changes have been made. All of this makes collaboration easy.
OK. Thank you very much for that. This next question is very much linked to what you just mentioned in terms of collaboration. With the increasing variety and volume of data sources, how scalable is Spotfire's visual data wrangling across enterprise environments?
Yeah. Spotfire is designed for enterprise-scale deployment. We have Spotfire analytic applications that are built by our customers or partners.
Some of them are consumed by thousands of users at the same time. So Spotfire is scalable at the enterprise level. It can also connect to virtually any data source, from Excel, of course, to Snowflake to real-time IoT systems. And when it comes to data wrangling, what I showed you today, even if it was simple, works across small datasets and large-scale, distributed datasets. Behind the scenes, Spotfire pushes the computations to the right place, whether that is the Spotfire in-memory engine or the database, to find the best way to optimize performance. Plus, I think we have great integration with the standard governance and security frameworks in place, meaning that Spotfire can scale across teams without sacrificing control. That's it.
Thank you very much, Arnaud. Just a couple of last questions here. Can Spotfire's visual data wrangling capabilities be automated or reused?
Yes. Everything I did today in the demo can be automated as the data updates. If I save the analysis I created today to the Spotfire library, I can set up automatic updates. I can say, OK, I want to update this data every day, every hour, every week, whatever. Which means that every time the data is updated, all the transformation steps that I did are reapplied automatically. That's one case of automation. When it comes to reusability, there are two different strategies that we see with Spotfire. The first is to create templates. That's the main use case: analysis templates that contain entire workflows to apply to similar datasets. So you create a Spotfire analysis with different data wrangling, data preparation, or prediction steps, you save it as a template in the library, and you reuse it with similar datasets. The other strategy is to use actions, the action mods framework that we released recently.
And with actions, you can package complete analytics workflows that contain data manipulation, preparation, computations, and visualizations in one package. So you could have one action that, for instance, applies some data cleansing, then does some prediction, and then automatically creates a visualization with the appropriate visualization types to understand the results. So I see action mods, more and more, as a way to reuse analytical workflows: not only data wrangling steps, but visualizations as well.
Fantastic. Well, thank you very much, Arnaud. Our last question for today is on the topic of real time. Does Spotfire support real-time data wrangling or streaming sources?
Yes. Spotfire has handled real-time data for quite some time now, thanks to our built-in streaming capabilities. So you can wrangle streaming data just like static data. You can filter it, aggregate it, and combine it with historical data in the same fashion as I showed in the demo today.
Fantastic. Well, thank you very much, Arnaud. We don't seem to have any other questions at this moment in time. But if any questions come to mind, please don't hesitate to reach out to us directly, and we can answer those as well. Once again, I just wanted to thank you all for joining us for today's session, and we hope to see you at one of our future webinars. I also wanted to highlight that our next webinar is next Tuesday. It is a joint webinar between Spotfire and BARC Research, where we will be diving into visual data science specifically, and the need for visual data science within the market. And with that, I just wanted once again to wish you a great day. With that, we will close the webinar. Thank you again.