8 Debugging Tips for IntelliJ IDEA Users You Never Knew Existed
https://lightrun.com/eight-debugging-tips-for-intellijidea-users-you-never-knew-existed/ (Sun, 14 Jun 2020)

As developers, we’re all familiar with debuggers. We use debugging tools on a daily basis – they’re an essential part of programming. But let’s be honest. Usually, we only use the breakpoint option. If we’re feeling frisky, we might use a conditional breakpoint.

But guess what: the IntelliJ IDEA debugger has many powerful features that make debugging easier and more efficient. To help, we've compiled a list of tips and tricks from our very own developers here at Lightrun. We hope these tips will help you find and resolve bugs faster.

Let’s get started.

1. Use an Exception Breakpoint

Breakpoints are places in the code where the program suspends so that you can debug it. They let you inspect the program's behavior and state in order to identify the error. IntelliJ IDEA offers a wide variety of breakpoints, including line breakpoints, method breakpoints and exception breakpoints.

We recommend using the exception breakpoint. This breakpoint type suspends the program when a given exception type is thrown, rather than at a pre-defined place. The IntelliJ exception breakpoint is especially useful because you can also filter by the class or package the exception is thrown from.

So you can define a breakpoint that stops on any line that throws a NullPointerException while ignoring exceptions thrown from files that belong to other libraries. All you have to do is specify the package that contains your project's files. This helps you focus the analysis on your own code's behavior.

Exception breakpoint in IntelliJ IDEA
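To make this concrete, here is a minimal, hypothetical sketch (the class, package and field names are made up): with an exception breakpoint for NullPointerException filtered to the com.example.agents package, the debugger suspends on the line below when the id is unknown, but not on NPEs thrown from inside third-party libraries.

```java
package com.example.agents;

import java.util.Map;

public class AgentService {

    private final Map<String, String> agentNames;

    public AgentService(Map<String, String> agentNames) {
        this.agentNames = agentNames;
    }

    public String agentDisplayName(String agentId) {
        String name = agentNames.get(agentId);
        // If the id is unknown, get() returns null and the next line throws a
        // NullPointerException. An exception breakpoint filtered to this package
        // suspends right here, with the full state of the call available.
        return name.toUpperCase();
    }
}
```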

Lightrun offers snapshots – breakpoints that do not stop the program from running. Learn more here.

2. Use Conditions in Your Breakpoints

This is one of the most under-utilized tools in debuggers, and possibly one of the most effective. Use conditions to narrow down issues far more easily and save yourself the work of hunting for them. For example, in a loop you can define a breakpoint that only stops on the iteration that exhibits the bug, instead of manually stepping through every iteration until you run into the issue!

In the loop below, you can see that the breakpoint stops the service when the agent id value is null. So instead of letting the code throw a NullPointerException, we can inspect the current state of the VM (virtual machine) before it does.

Notice that a condition can be very elaborate and even invoke methods as part of the condition.

Breakpoint condition in IntelliJ IDEA
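The exact code from the screenshot isn't reproduced here, but a loop like the one described might look like the following sketch. Setting a breakpoint on the line inside the loop with the condition agentId == null suspends only the problematic iteration.

```java
import java.util.Arrays;
import java.util.List;

public class AgentReport {

    public static void main(String[] args) {
        List<String> agentIds = Arrays.asList("a-1", "a-2", null, "a-4");

        for (String agentId : agentIds) {
            // Breakpoint here with the condition: agentId == null
            // The debugger suspends only on the iteration that would fail,
            // not on every pass through the loop.
            System.out.println("Processing agent " + agentId.trim());
        }
    }
}
```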

Lightrun offers conditions for all its actions: snapshots, logs etc. Learn more here.

3. Enable the “Internal Actions” Menu for Custom Plugin Development  

If you’re writing a custom IntelliJ IDEA plugin, enable Internal Actions (Tools -> Internal Actions) for easy debugging. This feature includes a lot of convenient options, like a component inspector and a UI debugger. It’s always handy to have a wide set of tools at your disposal, providing you with options you may have never thought of yourself.

To enable Internal Actions select Help -> Edit Custom Properties. Then type in

idea.is.internal=true

and save. Upon restart you should see the new option under the Tools menu.

Internal Actions menu for custom plugin development in IntelliJ IDEA

4. Use the “Analyze Thread Dump” Feature

A thread dump is a snapshot that shows what each thread is doing at a specific time. Thread dumps are used to diagnose system and performance issues. Analyzing thread dumps will enable you to identify deadlocks or contention issues.

We recommend using IntelliJ’s “Analyze Thread Dump” feature because its convenient browsing capabilities make the dump easy to analyze. “Analyze Thread Dump” automatically detects a stack trace in the clipboard and instantly displays it with links to your source code. This capability is very useful when traversing stack dumps from server logs, because you can instantly jump to the relevant files just like you can with a local stack trace.

To access the feature, go to the Analyze menu. The IDE can also offer it automatically when it detects a stack trace in the clipboard.
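As a quick illustration of the kind of problem a thread dump exposes, this self-contained sketch deadlocks two threads by acquiring the same pair of locks in opposite order. Take a thread dump of the running program and paste it into the IDE: both worker threads show up as BLOCKED, each waiting on the monitor the other one holds.

```java
public class DeadlockDemo {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // worker-1 takes LOCK_A first, then waits for LOCK_B.
        Thread t1 = new Thread(() -> lockInOrder(LOCK_A, LOCK_B), "worker-1");
        // worker-2 takes LOCK_B first, then waits for LOCK_A: a classic deadlock.
        Thread t2 = new Thread(() -> lockInOrder(LOCK_B, LOCK_A), "worker-2");
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second) {
        synchronized (first) {
            sleepQuietly(100); // give the other thread time to grab its first lock
            synchronized (second) {
                System.out.println(Thread.currentThread().getName() + " finished");
            }
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```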

5. Use the Stream Debugger

Java 8 streams are very cool to use but notoriously hard to debug. Streams condense multiple functions into a single statement, so simply stepping over the statements with a debugger is impractical. Instead, you need a tool that can help you analyze what’s going on inside the stream.

IntelliJ has a cool tool for exactly this: the stream debugger. You can use it to visually inspect the results of each stream operation. When you hit a breakpoint on a stream, press the stream debugger icon in the debugger. You will see a UI that maps the values of the stream elements at each stage/operation of the stream. Each step is visualized, so you can follow the operations in the stream and detect the problem.
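For instance, a pipeline like this hypothetical one is hard to follow by stepping line by line, because the whole thing is a single statement; the stream debugger shows what each map/filter stage produced.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDebugDemo {

    public static void main(String[] args) {
        List<String> agentIds = Arrays.asList("a-1", "A-2 ", "", "a-3", "b-7");

        // One statement, several intermediate operations: stepping over it
        // tells you nothing about what each stage produced.
        List<String> normalized = agentIds.stream()
                .map(String::trim)
                .filter(id -> !id.isEmpty())
                .map(String::toLowerCase)
                .filter(id -> id.startsWith("a"))
                .collect(Collectors.toList());

        System.out.println(normalized); // [a-1, a-2, a-3]
    }
}
```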

Stream debugger in IntelliJ IDEA (1)

Stream debugger in IntelliJ IDEA (2)

Stream debugger in IntelliJ IDEA (3)

6. Use Field Watchpoints

The field watchpoint is a type of breakpoint that suspends the program when the defined field is accessed or modified. This can be very helpful when you discover during an investigation that a field has a wrong value and you don’t know why. Watching the field can help you find the origin of the fault.

To set this breakpoint, simply add it at the line of the desired field. The program will suspend when, for example, the field is modified:

Field watchpoints in IntelliJ IDEA
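As an illustrative sketch (the class and field are made up), placing a field watchpoint on the balance field below suspends the program on every read or write of the field, which quickly reveals the unexpected modification:

```java
public class Account {

    // Set the field watchpoint on this line: the debugger suspends whenever
    // balance is accessed or modified, showing the stack trace of the caller.
    private long balance;

    public void deposit(long amount) {
        balance += amount;
    }

    public void applyFee(long fee) {
        // A surprise write like this one is exactly what a watchpoint helps
        // you catch when you "know" the field shouldn't change here.
        balance -= fee;
    }

    public long getBalance() {
        return balance;
    }
}
```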

7. Debug Microservices with the Lightrun Plugin

Lightrun’s IntelliJ plugin enables adding logs, snapshots and performance metrics, even while the service is running. That means you can add instrumentation to production and staging environments. You can debug monoliths, microservices, Kubernetes (K8s), Docker Swarm, ECS, Big Data workers, serverless, and more. Multi-instance support is available through a tagging mechanism.

The Lightrun plugin saves time: instead of going through multiple iterations of local environment reproduction, restarts and redeployments, you can debug straight in production.

Lightrun plugin for IntelliJ IDEA

Want to learn more? Request a demo.

8. Use a Friend – Real or Imaginary

When it comes to brainstorming, 1+1=3. And when it comes to dealing with complex debugging issues, you are going to need all the brainpower you can get. Working with someone provides a fresh set of eyes that views the problem differently and might spot details you missed. Or you complement each other until you reach the solution together. Just by asking each other questions and challenging some of each other’s assumptions, you will reach new conclusions that help you find the problem. You can also use each other for “Rubber Duck Debugging”, or as we like to call it, “Cheetah debugging”.

Cheetah debugging

We hope these tips by our own developers will help you with your debugging needs. Feel free to share your debugging tips and best practices with us and to share this blog post to help others.

As we mentioned in tip no. 7, Lightrun’s IntelliJ plugin enables developers to debug live microservices without interrupting them. You can securely add logs and performance metrics to production and staging in real-time, on-demand. Start using Lightrun today, or request a demo to learn more.

When Disaster Strikes: Production Troubleshooting
https://lightrun.com/when-disaster-strikes-production-troubleshooting/ (Wed, 04 May 2022)

Tom Granot and I have had the privilege of Vlad Mihalcea’s online company for a while now. As a result, we decided to do a workshop together covering a lot of the things we learned in the process. The workshop will be pretty informal and ad hoc: just a bunch of guys chatting and showing off what we can do with tooling.

In celebration of that, I thought I’d write about some of the tricks we’ve discussed amongst ourselves in the past. It should give you a sense of what to expect when you join us for the workshop, but it’s also useful on its own.

The Problem

Before we begin, I’d like to take a moment to talk about production and the role of developers within a production environment. As a hacker, I often do everything myself. That’s OK for a small company, but as companies grow, we add processes.

Production doesn’t go down in flames as much anymore, thanks to staging, QA, CI/CD and the DevOps folks who rein in people like me…

So we have all of these things in place. We passed QA, staging and everything’s perfect. Right?

All good, right? Right???

Well… Not exactly.

Sure, modern DevOps has made a huge difference to production quality, monitoring and performance. No doubt. But bugs are inevitable, and the ones that slither through are the worst types of vermin. They’re hard to detect and often only happen at scale.

Some problems, like performance issues, are only noticeable in production against a production database. Staging or dev environments can’t completely replicate modern complex deployments. Infrastructure as Code (IaC) helps a lot with that, but even with such solutions, production operates at a different scale.

It’s the One Place that REALLY Matters

Everything that isn’t production is in place to facilitate production. That’s it. We can have the best and most extensive tests, with 100% coverage in our local environments. But when our system is running in production, its behavior is different. We can’t control it completely.

A knee-jerk reaction is “more testing”. I see that a lot. If only we had a test for that… The solution is to somehow think of every possible mistake we could make and build a test for it. That’s insane. If we knew the mistake, we could just avoid it. The idea that a different team member will have that insight is also wrong: people make similar mistakes, and while we can eliminate some bugs this way, more tests create more problems… CI/CD becomes MUCH slower, which results in longer deploy times to production.

That means that when we do have a production bug, it will take much longer to fix because of redundant tests. The whole CI quality process we need to go through will take longer. It also means we’ll need to spend more on CI resources…

Logging

Logging solves some of the problems. It’s an important part of any server infrastructure. But the problems are similar to the ones we run into with testing.

When we write a log statement, we don’t know whether it will turn out to be important. Then, in production, we might find the one we need is missing. Overlogging is a huge problem in the opposite direction. It can:

  • Demolish performance & caching
  • Incur huge costs due to log retention
  • Make debugging harder due to verbosity that’s hard to wade through

And even then, the logs might still be missing the information we need…

I recently posted to a reddit thread where this comment was also present:

“A team at my company accidentally blew ~100k on Azure Log Analytics during the span of a few days. They set the logging verbosity to a hitherto untested level and threw in some extra replicas as well. When they announced their mistake on Slack, I learned that yes, there is such a thing as too much logging.”  – full thread here.

Again, logging is great. But it doesn’t solve the core problem.

Agility

Our development team needs to be fast and responsive. We need to respond quickly to issues. Sure, we need to try and prevent them in the first place… But like most things in life the law of diminishing returns is in effect here too. There are limits to tests, logs, etc.

For that we need to fully understand the bug fast. Going through the process of reproducing something locally based on hunches is problematic at best. We need a way to observe the problem.

This isn’t new. There are plenty of solutions for looking at issues in production; APM tools, for example, give us invaluable insight into our performance in production. They don’t replace profilers, but they provide the one data point that matters: how fast is the application that our customers are actually using?

But most of these tools are geared towards DevOps. It makes sense. DevOps are the people responsible for production, so naturally the monitoring tools were built for them. But DevOps shouldn’t be responsible for fixing R&D bugs or even understanding them… There’s a disconnect here.

Enter Developer Observability

Developer observability is a pillar of observability targeted at developers instead of DevOps. With tools in this field, we can instantly get feedback tailored to our needs and reduce the churn of discovering the problem. Before these tools, if a log didn’t exist in production and we didn’t understand the problem… we had to redeploy our product with “more logs” and cross our fingers…

In Practice and The Workshop…

I got a bit ahead of myself, spending longer explaining the problem than I will spend explaining the solution. I tend to think that’s because the solution is so darn obvious once we “get it”. It’s mostly a matter of details.

Like we all know: the devil is in the details…

Developer observability tools can be very familiar to developers who are used to working with debuggers and IDEs. But they are still pretty different. One example is breakpoints.

It’s Snapshots Now

We all know this drill. Set a breakpoint in the code that doesn’t work and step over until you find the problem. This is so ingrained into our process that we rarely stop to think about this at all.

But if we do this in a production environment, the server will be stuck waiting for us to step over. This might impact all the users on the server, and I won’t even get into the security/stability implications (you might as well take a hammer and demolish the server; it’s that bad).

Snapshots do everything a breakpoint does. They can be conditional, like a conditional breakpoint. They contain the stack trace and you can click on elements in the stack. Each frame includes the value of the variables in this specific frame. But here’s the thing: they don’t stop.

So you don’t have “step over” as an option. That part is unavoidable since we don’t stop. You need to rethink the process of debugging errors.

currentTimeMillis()

I love profilers. But when I need to really understand the cost of a method I go to my trusted old currentTimeMillis() call. There’s just no other way to get accurate/consistent performance metrics on small blocks of code.

But as I said before, production is where it’s at. I can’t just stick micro-measurements all over the code and review them later.

So developer observability tools added the ability to measure things: count the number of times a line of code was reached, or literally perform a tic-toc measurement, which is equivalent to the currentTimeMillis() approach.
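For reference, the manual version of that measurement is the minimal sketch below (the method being timed is just a stand-in). A tic-toc action in a developer observability tool gives you the same number, but from the live process and without redeploying.

```java
public class TicToc {

    public static void main(String[] args) {
        long start = System.currentTimeMillis(); // "tic"

        expensiveOperation();

        long elapsed = System.currentTimeMillis() - start; // "toc"
        System.out.println("expensiveOperation took " + elapsed + " ms");
    }

    // Stand-in for the block of code whose cost we want to understand.
    private static void expensiveOperation() {
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sum += i;
        }
        System.out.println("checksum: " + sum);
    }
}
```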

See You There

“Only when the tide goes out do you discover who’s been swimming naked.” –   Warren Buffett

I love that quote. We need to be prepared at all times. We need to move fast and be ready for the worst. But we also need practicality. We aren’t original; there are common bugs that we all run into left and right. We might notice them faster, but mistakes aren’t original.

In the workshop we’ll focus on some of the most common mistakes and demonstrate how we can track them using developer observability. We’ll give real world examples of failures and problems we ran into in the past and as part of our work. I’m very excited about this and hope to see you all there!

 

Debugging Microservices: The Ultimate Guide
https://lightrun.com/debugging-microservices-the-ultimate-guide/ (Mon, 20 Jul 2020)

Microservices have come a long way from being a shiny new toy for hypesters to a legitimate architecture that transforms the way modern applications are built. Microservices are loosely coupled, independently deployable and scalable, and allow a highly diverse technology stack – and these are just some of their biggest advantages. The same qualities are also some of their biggest disadvantages, especially when it comes to debugging microservices.

That’s because all the world’s a trade-off. And all those great advantages come with a price tag attached. For a long time the tag has been too high for many teams. In this blog post I am going to discuss the issue that was (and still is, in some cases) a very significant part of the aforementioned price – difficulties in debugging microservices. Then, I will recommend tools (one of which is our own production debugger Lightrun) and platforms that can help overcome these problems, because microservices aren’t going anywhere.

What is Microservices Architecture

Before we start, though, let’s clearly define a few things. First of all, “microservices” and “serverless” are two different things. Yes, microservices are often built using serverless architecture, and serverless architecture is often adopted with microservices in mind. And yet, the main goal of serverless is to reduce the total cost of ownership of an application – i.e. reduce the cost of managing servers and the usage bill – and it has nothing to do with microservices. It is still possible to build a monolithic web application running entirely on AWS Elastic Beanstalk or Azure App Service, or to deploy a microservice on top of an nginx server running on EC2.

After this subtle but legally important distinction, note that I will still address serverless debugging issues alongside microservices debugging issues, since they overlap very often.

Another important thing to mention is that the microservices architecture is just a subclass of a broader, more comprehensive cloud-native paradigm, which introduces even more challenges for development teams (out of the scope of this post). But whether you deploy microservices in the public cloud or totally on-prem, you will face the same difficulties. (I don’t assume your application’s gender, age, religion or cloud. Nor, of course, the language it is written in.)

Microservices Architecture Creates Microservices Debugging Challenges

Imagine that your huge, cumbersome monolith full of shitty legacy code and written in some old-fashioned, boring dinosaur language starts falling apart into beautiful, tiny microservices. What can go wrong?

A lot of things. And when they do go wrong you will find out that, all of a sudden, you can’t put a breakpoint in that new tiny beautiful microservice! You can’t make your favorite IDE debugger just stop there, you can’t see the stack, the values of variables, the process memory, you can’t pause threads and step through the code line by line (well, I do assume language(s) here, sorry for that).

You can’t do all this because the suspicious code is no longer just some class instance running, at worst, in another thread of the process your IDE is attached to. It now runs in a dedicated Docker container or Kubernetes pod, possibly written in another language: stateless, asynchronous, lonely.

Or even worse, it is now a Lambda function, which is born and dies hundreds of times in a second somewhere in a distant cloud, throwing NPEs every time it starts. How in the world am I supposed to debug a microservice like that? What have I done?

debugging microservices is scary, learn how from this guide

This post comes to the rescue. There are a lot of techniques, tools, and even startup companies that have emerged to address this problem. It is a vibrant and constantly evolving (which is another way to say poor and incomplete) ecosystem that I will review in two steps: debugging microservices locally (this blog post) and debugging microservices in production.

How to Debug Microservices Locally

Let’s see how it is possible to debug microservices when you either develop them locally or try to reproduce and fix a bug. Before I get into solutions, let’s outline the challenges you will face doing that.

Debugging Microservices Locally: The Challenges

Fragmentation

In a good old-fashioned monolith, the functionality you were trying to debug (e.g. adding an item to a shopping cart) was implemented by a couple of classes, which made it easy to gain a holistic view.

Now, the same functionality is implemented by a couple of separate microservices, and each can be either a Docker container, a Kubernetes pod, or a serverless function. You are supposed to run all of them simultaneously in order to reproduce a bug and then, after a fix, perform an impact analysis.

To top it all off, to recreate the exact picture, each one of these services must be of the same version it is in production – either all of them at one common version or, even worse, each at the specific version it was running in the production environment where the bug was reported. Creating this environment is a huge challenge, and if you don’t do it right, you won’t be able to properly debug your microservices.

Asynchronous Calls

Direct synchronous method invocations (or, at worst, message queues between threads) are replaced in microservices with either synchronous REST or gRPC API calls. Even worse, sometimes they are replaced with an asynchronous event-driven architecture based on plenty of available message queues (async gRPC is also an option).

Too bad that the issues occurring with in-process message queues are nothing like what you face with distributed message queues: the configuration is complicated and has a lot of nuances, latency and performance are not always predictable, operational costs are very high (yes, Kafka, I am looking at you), and you may run out of budget very quickly if you are using managed solutions.

Distributed State

Forget about the stack trace, forget about logs. Actually no, don’t forget about logs; forget about understanding anything by digging into those of a single microservice. The magic ERROR lines you are looking for may be printed to the logs of some other microservice at an undefined time offset, mixed up with totally unrelated ERROR lines printed while handling a different HTTP request. In other words, recreating the application state that led to a bug is often mission impossible.

Different Languages

Back in the day it was one language to write them all; now it is a Noah’s ark of languages, and you might have no idea WTF is going on with that “undefined has no properties” error that some weakly, dynamically typed language loves to throw (who let that become a backend language, for crying out loud?).

Technical Difficulties in Running Microservices Locally

Well, that’s what Docker was invented for in the first place, right? Docker-compose up and we are done. OK, but what about a Kubernetes cluster? A Kafka cluster? A bunch of Lambda functions? And then your laptop ~melted~ needs more RAM and CPU.

Now it is easy to see why until recently many teams just gave up. For some it cost days, for some it was weeks of frustration, anger and suppressed aggression – and I didn’t even get to production debugging of microservices. The industry reacted quickly to this mess and came up with plenty of solutions addressing these issues. Granted, these are still not even close to providing the speed and convenience of debugging a monolith with an IDE debugger, but the gap is slowly closing. Let’s take a close look at what you can do.

How You Can Debug Microservices

So what is in our microservices debugging kit as of July 2020? Let’s look at the main tools and platforms out there, and how they can help you.

Cloud Infrastructure-as-Code Tools

There are plenty of configuration orchestration tools, which include, among others, Terraform and AWS CloudFormation, as well as configuration management tools like Ansible or Puppet, which automate the deployment and configuration of complex applications. These tools let you create a quick and seamless microservices debugging environment – subject to your budget constraints, of course. To optimize costs, you can offload only some of the services to a remote cloud and run the rest locally on your machine.

Centralized Logging

All microservices should send their logs to a centralized, preferably external, service. This way you can investigate, trace and find the root cause of a bug much more easily than by switching between multiple log files in your local text editor. You can choose from plenty of managed services like Logz.io and Datadog, deploy your own ELK stack, or just send the logs to ~/dev/null~ cold S3 storage. If you do not know when you will need the logs, this is a much cheaper option and you can always fetch them later. The most important thing is to implement a Correlation Identifier, and then there are more best practices you should definitely read about.
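As a sketch of the Correlation Identifier idea (assuming SLF4J with a Logback backend; the header name and wiring are illustrative), each request gets an id that is put into the MDC, so every log line emitted while handling that request, in every service, carries the same id and can be searched for in the centralized store:

```java
import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class CorrelationIdDemo {

    private static final Logger log = LoggerFactory.getLogger(CorrelationIdDemo.class);

    // In a real service this would live in a servlet filter or middleware,
    // reusing the id from an incoming "X-Correlation-Id" header when present.
    public static void handleRequest(String incomingCorrelationId) {
        String correlationId = incomingCorrelationId != null
                ? incomingCorrelationId
                : UUID.randomUUID().toString();
        MDC.put("correlationId", correlationId);
        try {
            log.info("adding item to cart");      // include %X{correlationId} in the log pattern
            log.info("calling payment service");  // and pass the id downstream in a header
        } finally {
            MDC.clear(); // don't leak the id to the next request handled by this thread
        }
    }

    public static void main(String[] args) {
        handleRequest(null);
    }
}
```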

Serverless Frameworks IaC

Some of your microservices might be implemented using serverless solutions like FaaS and/or other managed services like API Gateway. There are two main players providing Infrastructure-as-Code frameworks for serverless: the cloud-agnostic Serverless Framework and AWS SAM, which is just an abstraction layer over CloudFormation. Back in the day, it was a real mess to develop and debug FaaS, but these days both allow local debugging, and SAM even allows using a local debugger in popular IDEs (Visual Studio, IntelliJ IDEA) with its handy AWS Toolkit. A real time saver!

Local Containers

Running Docker Compose locally is trivial unless you’re using a sophisticated architecture, such as a Kafka cluster alongside your Docker containers. Then things start getting complicated, though still feasible – take a look.

When it comes to Kubernetes though, it is much more difficult. There are some tools that try to simplify local Kubernetes deployment, such as Microk8s and Minikube, but both require a lot of effort to be invested – well, you should not expect your life to be easy when dealing with Kubernetes anyway.

Dedicated “Debuggers for Microservices”

Not very convincing so far, right? I mean, after a lot of effort you can (barely) create the microservices debugging environment and see logs in a manner that makes sense – things you hardly have to think about when debugging a monolith. But what about the debugging capabilities that really matter – setting breakpoints throughout the application, following variable values on the fly, stepping through the code, and changing values at run time?

If your microservices leverage the Kubernetes platform, you can get all of these, at least to an extent. There are two powerful open source tools, Squash and Telepresence, which allow you to use your local IDE debugger features when debugging the Kubernetes environment, preventing your laptop from melting down when running Minikube.

Squash builds a bridge between some of the popular IDEs and debuggers (here’s the full list) and uses a sidecar approach to deploy its client on every Kubernetes node (the authors claim very low performance and resource consumption overhead). This allows you to use all the powerful features of the local debugger such as live debugging, setting breakpoints, stepping through code, viewing the values of variables, modifying them for troubleshooting, and more. You can find a thorough guide here.

Telepresence operates quite differently: it runs a service you want to debug locally, while connecting it to a remote Kubernetes cluster, so you can develop/test it locally and use any of your favorite local IDE debuggers seamlessly. A bunch of tutorials, FAQs and docs can be found here.

Unless I missed something (let me know in the comments), that’s what you have in your hands as of mid-2020 when it comes to debugging microservices locally. It’s far from ideal, but it is much better than just a couple of years ago, and it is constantly getting better.

In the next blog post I will discuss the tools and best practices for debugging microservices in production!

Spoiler: a great tool to debug microservices in production is Lightrun. You can add on-demand logs, performance metrics and snapshots (breakpoints that don’t stop your application) in real time without having to issue hotfixes or reproduce the bug locally – all of which makes life much easier when debugging microservices. You can start using Lightrun today, or request a demo to learn more.
