Transcript
Sam: Welcome to our Semi Insightful podcast. I’m your host, Sam Duchscherer, and today I’m joined by Yoram Barak. Yoram, welcome.
Yoram: Glad to be here, Sam.
Sam: Thanks, Yoram. Looking into your background, I see that it ranges from animal science to human nutrition to biotechnology-driven innovation. So how on earth did you get pulled into the semiconductor industry, specifically around solving problems with alarm fatigue and maintenance management?
Yoram: Great question, Sam. I have a PhD in animal sciences, that’s correct, focusing on environmental microbiology, and I moved into a more specialized area in biotech: how to engineer enzymes. I did a post-doc around it, on both environmental and medical aspects, and from there I moved to an industry that does high-throughput screening for engineering proteins, enzymes specifically, for industrial applications (medical, bioindustrial, and so on). In that space, I realized the challenges R&D systems have with automation capability, and this grew on me to the point where I was, in many cases, the point person for how to enhance automation solutions in R&D systems. That’s what I did in biotechnology as well as in the pharmaceutical and chemical industries I worked in.
At one point, through my innovation role doing Industry 4.0 projects, Applied reached out to BASF, where I was at the time, to see if there was an opportunity there, and I said that’s a great solution space. Applied has a great portfolio. I didn’t really understand back then why the semiconductor industry would do anything with the chemical or biotech industry, but I learned through our APG Pharma team, and its roadmap and trajectory, that they were looking to penetrate more segments and industries, and I liked it. And when the opportunity came to join that team, I said, okay, let’s try it out.
I don’t have a background in software, or in development or marketing either, but it grew on me, and alarm management and maintenance management are the bread and butter of our CIM stack. They are, I would say, the lubricating items: very necessary, maybe not at the forefront of what you must have in your CIM stack for mission-critical operations, but they are the ones that make those operations work more efficiently.
I’m also the GPM for Asset Trace, which is a traceability engine for durables and consumables, and for SmartFactory Monitor, which helps you keep the hardware and operating systems behind our applications, as well as third-party applications and systems, in check throughout your CIM operations.
Sam: Wow, that’s great. Very, very diverse background. I love that. I will say that my follow up question then based off kind of your experience: Have you seen a change in the last few years particularly on, you know what makes smarter alarm management and faster resolution so much more important? I think you worded it as you know, it’s kind of the bread and butter of our CIM stack, right? Just elaborate a little bit more on that if there’s been a change in the last few years.
Yoram: There’s an absolute change in the last few years. In the past, our automation stack focused heavily on integration, which is still the main focus and what we largely have to offer as the major player in that industry. In recent years, what has come to bear more profoundly is the AI capability that you can put on top of it to accelerate things. It’s no longer only about improving yield and productivity incrementally; there is an opportunity to accelerate things far more rapidly than ever before.
And alarm management, you know, the talking point of faster resolution through alarm management is really how quickly you can get from the alarm onset by, let’s say, FD, SPC, or run-to-run systems, to the command, if you will, of “hold” or “stop,” through the severity identification of this alarm; then over to something like Knowledge Advisor to quickly go through the OCAP steps and guide the resolution (the disposition of what needs to be done with this alarm); and then close the whole loop to get the systems back up and running, in what we call a faster green-to-green element. So, when the systems go down, how quickly can you bring them up, in essence, almost autonomously in the future? That’s what’s changed in recent years: enabling that largely through AI and our system stacks.
Sam: Based off that example that you listed, can you just explain a little bit more the difference between alarm containment and alarm resolution? I think you kind of mentioned how run-to-run and FD, you might want to do alarm containment, and then maybe alarm resolution you move to Knowledge Advisor. But can you just explain the difference between the two or how this distinction really matters operationally?
Yoram: Yeah, absolutely. So once alarm severity is identified as critical and you need to put the lot on hold or stop something in the process, that’s something that alarm management can already do. Now there’s the question of what’s next. If you don’t expedite the resolution part, engineers are still scratching their heads about what to do next, and that’s where Knowledge Advisor comes into play and does it very rapidly. What took hours in a shift before can now be done in minutes. And in the future, as pointed out, if we are successful in deploying AI capability into Knowledge Advisor, it will do it autonomously. So, think about the self-driving fab as a concept, from those full-auto type manufacturing systems all the way to the semi-automated and even manual operation factories of today, which can accelerate the way they do things just through this AI layer in the future.
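To make the distinction Yoram draws here concrete, the following is a minimal Python sketch, not a description of any Applied product. The alarm code, the OCAP table, and every function and field name are hypothetical and invented for illustration. Containment only puts the lot on hold; resolution walks the OCAP steps, releases the lot, and closes the alarm, which is what makes the green-to-green time measurable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical OCAP (out-of-control action plan) table, for illustration only.
OCAP_STEPS = {
    "CHAMBER_PRESSURE_HIGH": [
        "Check pressure gauge",
        "Run leak check",
        "Requalify chamber",
    ],
}

@dataclass
class Alarm:
    code: str
    severity: str  # assumed levels: "info", "warning", "critical"
    raised_at: datetime
    lot_on_hold: bool = False
    steps_done: List[str] = field(default_factory=list)
    resolved_at: Optional[datetime] = None

def contain(alarm: Alarm) -> Alarm:
    """Containment: limit the damage. Critical alarms put the lot on hold."""
    if alarm.severity == "critical":
        alarm.lot_on_hold = True
    return alarm

def resolve(alarm: Alarm, now: datetime) -> Alarm:
    """Resolution: record the OCAP steps, release the lot, close the alarm."""
    alarm.steps_done = list(OCAP_STEPS.get(alarm.code, []))
    alarm.lot_on_hold = False
    alarm.resolved_at = now
    return alarm

def green_to_green(alarm: Alarm) -> Optional[timedelta]:
    """Time from alarm onset to resolution; None while the alarm is open."""
    if alarm.resolved_at is None:
        return None
    return alarm.resolved_at - alarm.raised_at

# Usage: a critical alarm is contained at once and resolved 12 minutes later.
onset = datetime(2024, 1, 1, 8, 0)
alarm = contain(Alarm("CHAMBER_PRESSURE_HIGH", "critical", onset))
resolve(alarm, onset + timedelta(minutes=12))
print(green_to_green(alarm))
```

In this sketch, an alarm that has been contained but not resolved has no green-to-green time at all, which mirrors the point that containment alone does not bring the tool back up.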
Sam: Are there any particular KPIs that you think can tell whether alarm and maintenance systems are working together effectively?
Yoram: Yeah, the critical part in operation is when… Well, when everything works well, you don’t really pay too much attention to what’s going on. It’s a self-driving or autonomous system, almost. It’s what happens when unscheduled, unplanned downtime takes place. How quickly can you get it back up and running? And this is where integration between alarms, the process quality type stack, the downstream productivity stack, and the MES is crucial, and even the maintenance system. Think about it. It’s a work order or work request management system that works at high velocity across more than 1,000 tools, the kind of very high availability, high fidelity system that works in a CIM stack like ours. That’s a very high-volume type manufacturing system. Everything has to work across the board on time, 99.9% of the time. And if you don’t have it, then you leave money on the floor, because in these high availability, high volume manufacturing systems, if they don’t produce the millions of chips per month or quarter that they are targeting, then they lose money, because the capital intensity they put forward and the operational cost they have are immense. This system has to work 99% of the time, and that’s where our system plays a major role in doing so.
Sam: So, you’ve painted the vision for me, of what good looks like when implementing these systems and they’re fully mature. But I guess as customers start to do this implementation of especially adding AI into these process flows, where do you think that they should start? Where would AI add the most value with regards to alarm and maintenance management?
Yoram: With alarms specifically, there are several layers where AI can help. Think about the training portion of bringing new engineers up to speed on identifying what’s important and what’s not. Chatting with alarm data, documentation, and best-known methods to classify things, if that’s not already done automatically, minimizes what we call alarm fatigue. It’s well known that some of the new systems have more than 1,500 sensors on each tool, which is remarkable. Each one of them has its own drifts and challenges during wafer manufacturing. Spotting nuisance alarms, or identifying alarm severity in real time, can be a challenge, so you have to reclassify and make sure that you minimize this fatigue by occasionally correcting for these drifts.
The detection of misconfigured alarm severity is, as pointed out, also a very good use for AI (especially in high volume), as is identifying unrecognized alarm patterns in order to propose new alarm codes or rules as systems mature and change over time. There’s some drift, some change management in place, so the ability to make recommendations on new alarm rules or modifications is a big opportunity for AI systems.
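As a rough illustration of what flagging misconfigured severity could look like, here is a small Python sketch. It uses a simple rule-based count, not AI, and it does not represent any vendor's method. The record layout, the thresholds, and the function name are all assumptions made for this example. The idea is that an alarm configured as critical that fires often but almost never leads to action is a candidate nuisance alarm for an engineer to review.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical history record: (alarm_code, configured_severity, action_taken)
Record = Tuple[str, str, bool]

def flag_severity_mismatches(
    history: Iterable[Record],
    min_count: int = 20,            # assumed: ignore rarely seen alarms
    max_action_rate: float = 0.05,  # assumed: "almost never acted on"
) -> List[str]:
    """Return critical alarm codes that fire often but rarely need action.

    The result is a list of candidates for human review, not an automatic
    reclassification.
    """
    fired: Counter = Counter()
    acted: Counter = Counter()
    for code, severity, action_taken in history:
        if severity != "critical":
            continue
        fired[code] += 1
        if action_taken:
            acted[code] += 1
    return sorted(
        code
        for code, n in fired.items()
        if n >= min_count and acted[code] / n <= max_action_rate
    )

# Usage with made-up history: "A" fires 30 times and is never acted on.
history = (
    [("A", "critical", False)] * 30
    + [("B", "critical", True)] * 30
    + [("C", "critical", False)] * 5
)
print(flag_severity_mismatches(history))
```

Returning candidates for review, instead of changing severities directly, keeps a human in the loop, in line with the guardrails discussed in the conversation.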
Now that being said, it’s also a challenge. We know that with technology advancement, there are also some limitations with AI systems. We have to put good guardrails around them, so that they are not too intrusive to other systems within manufacturing, and not just for security but also to prevent hallucinating type behavior.
Sam, you’re probably very familiar with the concept of data corruption, right? If you ask simple questions in your ChatGPT or any other—Gemini at home—you ask the question again and again, it will always give you something different in return. So how to make sure that you are tightening this to a point where the answers are robust, consistent, and reproducible, and that can be trusted, is a big challenge for AI systems today. And that’s where I feel—and then I think the whole industry feels—that the human in the loop is still required for many systems these days. But, you know, technology evolves so quickly that it could be that self-driving fabs will be arriving faster than we can imagine.
Sam: Yeah. So you touched on a lot of things there, especially the guardrails. I agree with you that it has to be human in the loop, given the worries about determinism and data corruption. Would you advise your customers to put all these guardrails in place before moving to production?
Yoram: It’s hard to know what all the guardrails are, right? But you have to have some confidence level. And this is a message we also received at a recent conference in Europe. There was an interesting use case for alarms coming out of fault detection, a natural alarm management system on its own. The presentation was a continuation of a previous one the presenter gave. In the past three years, when OpenAI had just come out with ChatGPT, whatever version it was, they had the opportunity to start using it, and it wasn’t that great in the initial round: about 50 to 70% correctness with a lot of human involvement in training. They weren’t so happy with that, so it didn’t even leave the R&D environment.
But now last year, it showed a more progressive tightening as the technology evolved with small models available and much tighter control and more, if you will, self-operated correction by the AI to a point where this year he was able to say yes, we achieved 95 to 100% correctness in the response. And then this was more on the alarm severity categorization, but also making, they call it, a predictor for alarms in the future to a point that they feel that they can go to production in the next year for a small POC. So that’s very encouraging for us and the whole industry. That goes in the right direction if a large manufacturer feels comfortable rolling this out into production.
Sam: So would you think that is an early win that you’ve been a part of that helps convince teams about these AI powered alarms being there to help team members rather than police them? Or are there any other early wins that come to mind when you think about how we start this AI journey with alarm and maintenance systems?
Yoram: The proof is in the pudding, right, Sam? If our customer says, yes, it’s working, we can trust it (even if they don’t do it with our systems), then it’s a community of experts passing that trust upward to management that these systems can work, and now it’s about rolling it out with our unique IP. What we are developing internally to enable them to do it on a broader spectrum is not just Alarm Management, Maintenance Management, or Asset Trace as we know them. It’s the fully integrated systems that play a role in the fab, and that’s what our customers are looking for from us: secure systems, high availability, high volume type manufacturing that has added value on top of it. And naturally, as you know, that has to be cost effective. They don’t want to spend on tokens all the money they gained, say, from improved yield or productivity.
Sam: Yeah, makes sense.
Yoram: So, everything needs to work to be in place. And to my point initially, the workforce enablement also has to take place. It’s people, technology, and systems that need to work, and the business of course. All three must work in parallel to enable that to be adopted widely.
Sam: Yeah, absolutely. Absolutely. All right, Yoram, due to time, I like to end these podcasts with some lightning round questions. So changing gears completely into something more fun, and hopefully laughable, short answers. Are you ready for that? I think everyone who has done them has done quite well, so there’s no need to be alarmed.
Yoram: Well, splendid. Let’s do it.
Sam: All right, the first one is if alarm management were a superhero, what would its real superpower need to be?
Yoram: Oh, that’s a good one. It should be the vocality, right? It has to reach out to everyone regardless of where they are in the manufacturing, whatever they’re tuned in to, and be able to alert them to what’s important.
Sam: Oh, that’s a good answer. I love that. So being able to yell really loud but also be clear enough so everyone can hear you.
Yoram: Exactly.
Sam: Yeah. Wonder what the superhero would look like for that. All right next one. Have you personally seen alarms bounce between teams like a hot potato? And hopefully you and my audience listening know that analogy.
Yoram: Exactly. Absolutely right. As pointed out, in non-fully automated system operation, when you think about it, where the alarm, or alarm management, puts something on hold or stop, there is head scratching about what to do next. And that’s the hot potato: the question of who’s doing what gets scrambled around. Even if there’s a process in place, there’s an opportunity to improve that and then automate it. And that’s really where we come into play.
Sam: Oh, nice, nice. All right. The last lightning round is a little bit different. It’s actually based on your animal science background. So maybe some people might not get it, but we’ll see how it goes, right? If semiconductor alarm management were like the gut microbiome, how would you define alarm homeostasis?
Yoram: That’s a very good one. For the gut microbiome to be in balance and serve the purpose it has for your health and benefit, everything needs to be in concert, right? And managing the microbiome is one of the toughest things to do. You can see many publications on that topic in recent years. Think about alarm management as the orchestrator for the types of gut microbes, the good, the bad, and the ugly microbes that live in the gut, sorting between them by criteria, very similar to the severity that we are talking about in alarm categorization in manufacturing.
And you have kind of a code base that tells you, “Hey, now this is a not good situation. You need to do A, B, C to correct for that.” And then that’s what alarm management can do for you in essence. It doesn’t really recognize the problem, but it does put the categorization in place and notifies everyone about what’s going on. It’s a link between the microbiome and the brain, or the neuro microbiology type aspects, that tells you there’s a problem, you need to do something, and the brain goes back and says do A, B, C.
Sam: Yeah, yeah, no, I’m glad I ended up asking that question. I think when last-minute we wrote it down, I wasn’t sure about it, but it actually was a great answer. Really ties it together, your background and alarm management. So that’s great.
Well, Yoram, thanks so much for this insightful conversation. I personally learned a lot. And for our audience, if you enjoyed this podcast, be sure to follow this series.
All right. Thanks, Yoram.
Yoram: Thanks, Sam. Great to be here.
