8 Debugging Tips for IntelliJ IDEA Users You Never Knew Existed
https://lightrun.com/eight-debugging-tips-for-intellijidea-users-you-never-knew-existed/ – Sun, 14 Jun 2020

As developers, we’re all familiar with debuggers. We use debugging tools on a daily basis – they’re an essential part of programming. But let’s be honest. Usually, we only use the breakpoint option. If we’re feeling frisky, we might use a conditional breakpoint.

But guess what: the IntelliJ IDEA debugger has many powerful, cutting-edge features that make debugging easier and more efficient. To help, we’ve compiled a list of tips and tricks from our very own developers here at Lightrun. We hope these tips will help you find and resolve bugs faster.

Let’s get started.

1. Use an Exception Breakpoint

Breakpoints are markers in the code where the debugger suspends the program to enable debugging. They let you inspect the program’s behavior and state to try to identify the error. IntelliJ IDEA offers a wide variety of breakpoints, including line breakpoints, method breakpoints, and exception breakpoints.

We recommend using the exception breakpoint. This breakpoint type suspends the program when a given exception type is thrown, rather than at a pre-defined place. We especially recommend the IntelliJ exception breakpoint because you can also filter by the class or package the exceptions are thrown from.

So you can define a breakpoint that stops on any line that throws a NullPointerException while ignoring exceptions thrown from other libraries’ code. All you have to do is restrict the filter to the package that contains your project’s files. This helps you focus your analysis of the code’s behavior.

Exception breakpoint in IntelliJ IDEA
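To illustrate, here is a minimal sketch (all class, map, and method names are hypothetical) of the kind of code an exception breakpoint on NullPointerException would suspend:

```java
import java.util.HashMap;
import java.util.Map;

public class ExceptionBreakpointDemo {
    static final Map<String, String> AGENTS = new HashMap<>();
    static { AGENTS.put("a1", "Ada"); }

    // Throws NullPointerException for unknown ids -- with an exception
    // breakpoint on NullPointerException (filtered to this package),
    // the debugger would suspend exactly on the failing line below.
    static String agentName(String id) {
        String name = AGENTS.get(id); // null for unknown ids
        return name.toUpperCase();    // <-- suspends here when name is null
    }

    public static void main(String[] args) {
        System.out.println(agentName("a1")); // ADA
        try {
            agentName("missing");
        } catch (NullPointerException e) {
            System.out.println("unknown id triggered an NPE");
        }
    }
}
```

With the package filter in place, NPEs thrown deep inside third-party code would not suspend the program, only ones originating in your own classes like this.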

Lightrun offers snapshots – breakpoints that do not stop the program from running. Learn more here.

2. Use Conditions in Your Breakpoints

This is one of the most under-utilised tools in debuggers, and possibly one of the most effective. Use conditions to narrow down issues far more easily, saving you the time and effort of hunting for them. For example, in a loop you can define a breakpoint that will only stop on the iteration that triggers the actual bug, relieving you from manually stepping through the loop until you run into the issue!

In the loop below, you can see the breakpoint will stop the program when the agent id value is null. So instead of letting it throw a NullPointerException, we’ll be able to inspect the current state of the JVM before it does.

Notice that a condition can be very elaborate and even invoke methods as part of the condition.

Breakpoint condition in IntelliJ IDEA
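As a sketch (the Agent class and the data are invented for illustration), here is the kind of loop where a condition like `agent.getId() == null` isolates the one interesting iteration:

```java
import java.util.ArrayList;
import java.util.List;

public class BreakpointConditionDemo {
    static class Agent {
        final String id;
        Agent(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Returns the index of the first agent whose id is null -- the same
    // iteration a conditional breakpoint with the condition
    // `agent.getId() == null` would stop on.
    static int firstBadAgent(List<Agent> agents) {
        for (int i = 0; i < agents.size(); i++) {
            Agent agent = agents.get(i);
            // In IntelliJ you would set a breakpoint on the next line
            // with the condition: agent.getId() == null
            if (agent.getId() == null) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Agent> agents = new ArrayList<>();
        agents.add(new Agent("a1"));
        agents.add(new Agent(null));
        agents.add(new Agent("a3"));
        System.out.println(firstBadAgent(agents)); // prints 1
    }
}
```

Note that the condition calls a method (`getId()`), which is exactly the kind of elaborate condition the tip describes.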

Lightrun offers conditions for all its actions: snapshots, logs etc. Learn more here.

3. Enable the “Internal Actions” Menu for Custom Plugin Development  

If you’re writing a custom IntelliJ/IDEA plugin, enable Internal Actions (Tools -> Internal Actions) for easy debugging. This feature includes a lot of convenient options, like a component inspector and a UI debugger. It’s always handy to have a wide set of tools at your disposal, providing you with options you may have never thought of yourself.

To enable Internal Actions select Help -> Edit Custom Properties. Then type in

idea.is.internal=true

and save. Upon restart you should see the new option under the Tools menu.

Internal Actions menu for custom plugin development in IntelliJ IDEA

4. Use the “Analyze Thread Dump” Feature

A thread dump is a snapshot that shows what each thread is doing at a specific time. Thread dumps are used to diagnose system and performance issues. Analyzing thread dumps will enable you to identify deadlocks or contention issues.

We recommend IntelliJ’s “Analyze Thread Dump” feature because its convenient browsing capabilities make the dump easy to analyze. “Analyze Thread Dump” automatically detects a stack trace in the clipboard and instantly displays it with links to your source code. This is very useful when traversing stack dumps from server logs, because you can jump straight to the relevant files, just like with a local stack trace.

To access the feature, go to the Analyze menu. The IDE can also offer it automatically when it detects a stack trace in the clipboard.
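If you want a dump to practice on, a sketch like this produces one programmatically (real dumps usually come from `jstack` or server logs, and the format below is deliberately minimal):

```java
import java.util.Map;

public class ThreadDumpDemo {
    // Prints a minimal thread dump: each live thread with its stack frames.
    // The resulting text is the kind of input you can paste into
    // IntelliJ's "Analyze Thread Dump" feature.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```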

5. Use the Stream Debugger

Java 8 streams are very cool to use but notoriously hard to debug. Streams condense multiple functions into a single statement, so simply stepping over the statements with a debugger is impractical. Instead, you need a tool that can help you analyze what’s going on inside the stream.

IntelliJ has a cool, relatively new tool: the stream debugger. You can use it to inspect the results of stream operations visually. When you hit a breakpoint on a stream, press the stream debugger icon in the debugger panel. You will see a UI mapping the values of the stream elements at each stage of the stream, so every step is visualized and you can follow the operations and detect the problem.

Stream debugger in IntelliJ IDEA (1)

Stream debugger in IntelliJ IDEA (2)

Stream debugger in IntelliJ IDEA (3)
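The stream debugger is roughly the visual equivalent of sprinkling `peek()` calls through a pipeline. This sketch (method and data invented) prints the same per-stage values in plain text:

```java
import java.util.List;
import java.util.stream.Collectors;

public class StreamDebugDemo {
    // A code-level stand-in for what the stream debugger shows visually:
    // peek() prints each element as it flows through every stage.
    static List<Integer> doubledEvens(List<Integer> input) {
        return input.stream()
                .peek(n -> System.out.println("source:   " + n))
                .filter(n -> n % 2 == 0)
                .peek(n -> System.out.println("filtered: " + n))
                .map(n -> n * 2)
                .peek(n -> System.out.println("mapped:   " + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubledEvens(List.of(1, 2, 3, 4))); // [4, 8]
    }
}
```

The stream debugger gives you this view without touching the code, which matters when the stream lives in code you can’t easily edit.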

6. Use Field Watchpoints

The field watchpoint is a type of breakpoint that suspends the program when the defined field is accessed or modified. This can be very helpful when you discover that a field has a wrong value and you don’t know why. Watching the field can help you find the origin of the fault.

To set this breakpoint, simply add it on the line where the field is declared. The program will then suspend when, for example, the field is modified:

Field watchpoints in IntelliJ IDEA
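For comparison, this sketch (class and field names invented) emulates what a watchpoint observes by routing every write through a logging setter. The watchpoint gives you this for free, with a full stack trace at each write:

```java
public class WatchpointDemo {
    // A field watchpoint suspends the program whenever `balance` is
    // modified. This sketch emulates that by funneling every write
    // through a setter that records the change.
    private int balance;
    private final StringBuilder log = new StringBuilder();

    void setBalance(int value) {
        log.append(balance).append("->").append(value).append(';');
        balance = value;
    }

    int getBalance() { return balance; }
    String changeLog() { return log.toString(); }

    public static void main(String[] args) {
        WatchpointDemo account = new WatchpointDemo();
        account.setBalance(100);
        account.setBalance(-5); // the suspicious write a watchpoint would catch
        System.out.println(account.changeLog()); // 0->100;100->-5;
    }
}
```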

7. Debug Microservices with the Lightrun Plugin

Lightrun’s IntelliJ plugin enables adding logs, snapshots, and performance metrics, even while the service is running. This means you can add instrumentation to production and staging environments. You can debug monoliths, microservices, Kubernetes (K8s), Docker Swarm, ECS, Big Data workers, serverless, and more. Multi-instance support is available through a tagging mechanism.

The Lightrun plugin saves time: instead of going through multiple iterations of local environment reproduction, restarts, and redeployments, you can debug straight in production.

Lightrun plugin for IntelliJ IDEA

Want to learn more? Request a demo.

8. Use a Friend – Real or Imaginary

When it comes to brainstorming, 1+1=3. And when it comes to dealing with complex debugging issues, you are going to need all the brainpower you can get. Working with someone provides a fresh set of eyes that views the problem in a different manner and might identify details you missed. Or you both complement each other until you reach the solution. Just by asking each other questions and undermining some of each other’s assumptions, you will reach new conclusions that will help you find the problem. You can also use each other for “Rubber Duck Debugging”, or as we like to call it, “Cheetah debugging”.

Cheetah debugging

We hope these tips by our own developers will help you with your debugging needs. Feel free to share your debugging tips and best practices with us and to share this blog post to help others.

As we mentioned in tip no. 7, Lightrun’s IntelliJ plugin enables developers to debug live microservices without interrupting them. You can securely add logs and performance metrics to production and staging in real-time, on-demand. Start using Lightrun today, or request a demo to learn more.

Debugging a Wordle Bug
https://lightrun.com/debugging-a-wordle-bug/ – Sun, 31 Jul 2022

I have a confession: I’m addicted to Wordle. Especially now that it’s out of style and people don’t post about it. I love that it’s short, I can solve one word and then it’s gone. I don’t feel too bad about the addiction and wasting my time with a game. This cloud debugger tutorial is an enormous challenge for me since the target is a Wordle game. But I’m getting ahead of myself.

As part of the Lightrun Playground we recently released, we needed a demo application that would let developers who are new to Lightrun practice in a “safe environment”. We decided to pick Wordle as our demo application because it’s instantly familiar, visual, and not too interactive. A Flappy Bird demo might have been painful to debug. At this point, our key challenge was creating a bug whose debugging process would be interesting enough, yet subtle enough that it wouldn’t be instantly obvious.

Creating a bug like that is surprisingly challenging. We don’t want an overly complex application spanning multiple files. That might make the debugging process too difficult. On the other hand, the bug needs to be subtle enough that we won’t notice it even if we stare directly at it. Here is the bug:

const guess = []
for (let i = 0; i < game.word.length; ++i) {
  if (game.word.includes(guessWord[i])) {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_INCLUDED })
  } else if (guessWord[i] === game.word[i]) {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_MATCHED })
  } else {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_NOT_INCLUDED })
  }
}

Can you spot the problem?

To understand it, we first need to understand the symptom of the bug we chose. When I talk about bugs, people’s minds often go to crashes. That can be the case sometimes, but in my experience the most frequent bugs are logic mistakes that occur because the production environment differs in some subtle way from our testing environment. Because of that, we picked a logic bug; unfortunately, given the simplicity constraint, I doubt a bug like this would have made it to production. The core lesson still applies.

The bug in this case is that letters in Wordle that should be colored in green, because they’re in the right position in the word, are colored in yellow. This logic is implemented by the code we see above. As you can see, we have three modes:

  • CHECK_TYPES.LETTER_INCLUDED – indicates that a letter should be colored in yellow
  • CHECK_TYPES.LETTER_MATCHED – indicates that the letter should be colored in green
  • CHECK_TYPES.LETTER_NOT_INCLUDED – indicates that the letter is missing and should be gray

Can you spot the problem now?

Don’t scroll down to avoid spoilers….

Here’s working code:

const guess = []
for (let i = 0; i < game.word.length; ++i) {
  if (guessWord[i] === game.word[i]) {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_MATCHED })
  } else if (game.word.includes(guessWord[i])) {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_INCLUDED })
  } else {
    guess.push({ letter: guessWord[i], check: CHECK_TYPES.LETTER_NOT_INCLUDED })
  }
}

The difference is that I swapped the order of the two tests. We need the CHECK_TYPES.LETTER_MATCHED test to run before the CHECK_TYPES.LETTER_INCLUDED test. The tests must be in order of significance: green (a perfect positional match) should precede yellow (a partial match), otherwise a correctly placed letter that also appears elsewhere in the word gets classified as merely “included”.

The process of debugging this is relatively simple. We placed a snapshot/breakpoint on the line where we saw the incorrect values at the server code level. I think a more “natural” way to debug this would have been to place a breakpoint on the CHECK_TYPES.LETTER_MATCHED line; once we realized it was never hit, we would have understood what went wrong. For our particular use case of a playground, that wasn’t the right approach: we wanted people to see the snapshot (non-breaking breakpoint) getting hit. But other than that, it’s a good bug.

If this still isn’t clear, click the image below to see an animation that explains the bug visually:

Flowchart animation

Teaching Debugging

Debugging is one of those subjects that we don’t learn at university. Yes, there are courses that touch on it, but not in much depth. You’re mostly expected to pick it up on your own (for example, by using a dedicated live debugging tool). This explains, to a large extent, why that’s the case: it’s very hard to create exercises for debugging, and even harder to test the knowledge.

We could create a more elaborate demo to debug, but then we’d transition into the world of “understanding and explaining that code base”, which isn’t the goal. I’ve gone over a lot of debugging-related material over the past couple of years, and this seems to be a universal problem we’re all struggling with. That’s a shame, since there are so many tools, techniques, and approaches that even experienced developers are missing out on.

In that sense I’m a proponent of teaching debugging without a bug. Debuggers are tools we can explore and use before we have any bug, even as a learning tool. I think we need to be “comfortable” within the debugging environment and should leverage it when there are no bugs. It shouldn’t be a tool we only reach for in the case of an emergency. If we work with a debugger on a regular basis, it will be much easier to track bugs with it when we actually need it.

This is a philosophy I hold for tools such as observability tools, logs, etc. Muscles that we don’t flex on a regular basis lose their mass and weaken. Synthetic problems are OK for a short tutorial but we can’t use them daily and it’s hard to scale them to a full-blown workshop or course.

Finally

How do you feel about the way you learned debugging? Was it in college, university, bootcamp or on-the-job?

Do you feel you know the subject well?

Do you teach debugging to others? If so, how, and what techniques do you use? What do you find works best? I’d love to hear from you on Twitter @debugagent (my DMs are open), on LinkedIn, in the comments, or through any other channel, private or public.

As a reminder, our Playground is open for business – feel free to roam around, test out your debugging skills, and report back!

Testing in Production: Recommended Tools
https://lightrun.com/testing-in-production-recommended-tools/ – Thu, 11 Jun 2020

Testing in production has a bad reputation – the same kind “git push --force origin master” has. Burning houses and Chuck Norris represent testing in production in memes, and that says it all. When done poorly, testing in production very much deserves the sarcasm and negativity. But that’s true for any methodology or technique.

This blog post aims to shed some light on the testing-in-production paradigm. I will explain why giants like Google, Facebook, and Netflix see it as a legitimate and very beneficial instrument in their CI/CD pipelines – so much so, in fact, that you might consider adopting it as well. I will also provide recommendations for testing-in-production tools, based on my team’s experience.

Testing In Production – Why?

Before we proceed, let’s make it clear: testing in production is not applicable to every kind of software. Embedded software, on-prem high-touch installation solutions, or any type of critical system should not be tested this way. The risks (and as we’ll see, it’s all about risk management) are too high. But do you have a SaaS solution with a backend that leverages a microservices architecture, or even just a monolith that can easily be scaled out? Or any other solution whose deployment and configuration the company’s engineers fully control? Ding ding ding – those are the ideal candidates.

So let’s say you are building your SaaS product and have already invested a lot of time and resources to implement both unit and integration tests. You have also built your staging environment and run a bunch of pre-release tests on it. Why on earth would you bother your R&D team with tests in production? There are multiple reasons: let’s take a deep dive into each of them.

Staging environments are bad copies of production environments

Yes, they are. Your staging environment is never as big as your production environment – in terms of server instances, load balancers, DB shards, message queues, and so on. It never handles the load and network traffic that production does. So it will never have the same number of open TCP/IP connections, HTTP sessions, open file descriptors, or parallel DB write queries. There are stress-testing tools that can emulate that load, but when you scale, they stop being sufficient very quickly.

Besides the size, the staging environment never matches production in terms of configuration and state. It is often configured to start a fresh copy of the app upon every release, security configurations are relaxed, ACLs and service discovery never handle real-life production scenarios, and the databases are emulated by recreating them from scratch with automation scripts (copying production data is often impossible, even legally, due to privacy regulations such as GDPR). Well, after all, we all try our best.

At best, we can create a bad copy of our production environment. This means our testing will be unreliable and our service will remain susceptible to errors in the real-life production environment.

Chasing after maximum reliability before the release costs. A lot.

Let’s just cite Google engineers:

“It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a cost: maximizing stability limits how fast new features can be developed and how quickly products can be delivered to users, and dramatically increases their cost, which in turn reduces the number of features a team can afford to offer.

Our goal is to explicitly align the risk taken by a given service with the risk the business is willing to bear. We strive to make a service reliable enough, but no more reliable than it needs to be.”

Let’s emphasize the point: “Our goal is to explicitly align the risk taken by a given service with the risk the business is willing to bear”. No unit/integration/staging-env tests will ever make your release 100% error-free. In fact, they shouldn’t (well, unless you are a Boeing engineer). After a certain point, investing more and more in tests and attempting to build a better staging environment will just cost you more compute/storage/traffic resources and will significantly slow you down.

Doing more of the same is not the solution. You shouldn’t spend your engineers’ valuable work hours chasing the dragon trying to diminish the risks. So what should you be doing instead?

Embracing the Risk

Again, citing the great Google SRE Book:

“…we manage service reliability largely by managing risk. We conceptualize risk as a continuum. We give equal importance to figuring out how to engineer greater reliability into Google systems and identifying the appropriate level of tolerance for the services we run. Doing so allows us to perform a cost/benefit analysis to determine, for example, where on the (nonlinear) risk continuum we should place Search, Ads, Gmail, or Photos…. That is, when we set an availability target of 99.99%, we want to exceed it, but not by much: that would waste opportunities to add features to the system, clean up technical debt, or reduce its operational costs.”

So it is not just about when and how you run your tests. It’s about how you manage risks and costs of your application failures. No company can afford its product downtime because of some failed test (which is totally OK in staging). Therefore, it is crucial to ensure that your application handles failures right. “Right”, quoting the great post by Cindy Sridharan, means:

“Opting in to the model of embracing failure entails designing our services to behave gracefully in the face of failure.”

The design of fault tolerant and resilient apps is out of the scope of this post (Netflix Hystrix is still worth a look though). So let’s assume that’s how your architecture is built. In such a case, you can fearlessly roll-out a new version that has been tested just enough internally.

Then, the way to bridge the gap and get as close as possible to 100% error-free is testing in production. This means testing how your product really behaves and fixing the problems that arise. To do that, you can use a long list of dedicated tools, and also expose the product to real-life production use cases.

So the next question is – how to do it right?

Testing In Production – How?

Cindy Sridharan wrote a great series of blog posts that discusses the subject in great depth. Her recent “Testing in Production, the safe way” blog post lays out a table of test types you can run in pre-production and in production.

One should definitely read carefully through this post. We’ll just take a brief look and review some of the techniques she offers. We will also recommend various tools from each category. I hope you find our recommendations useful.

Load Testing in Production

As simple as it sounds. Depending on the application, it makes sense to stress its ability to handle huge amounts of network traffic, I/O operations (often distributed), database queries, various forms of message-queue storms, and so on. Some severe bugs only show up clearly under load (hi, memory overwrite). Even if not – your system is always capable of handling only a limited amount of load, so failure tolerance and graceful handling of dropped connections become really crucial here.

Obviously, performing a load test in the production environment will stress your app as configured for real-life use cases, so it will provide far more useful insights than load testing in staging.
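To make the idea concrete, here is a toy load generator in Java (thread and request counts are arbitrary); the real tools listed below add protocol support, pacing, and reporting on top of this basic pattern:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MiniLoadTest {
    // A minimal sketch of a load generator: N worker threads hammer a
    // target operation concurrently and we count completed requests.
    static int run(int threads, int requestsPerThread, Runnable target) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int r = 0; r < requestsPerThread; r++) {
                    target.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // The lambda is a stand-in for a real request (HTTP call, DB query...)
        int done = run(8, 100, () -> Math.sqrt(42));
        System.out.println(done); // prints 800
    }
}
```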

There are a bunch of software tools for load testing that we recommend; many of them are open source. To name a few:

mzbench

mzbench supports MySQL, PostgreSQL, MongoDB, and Cassandra out of the box, and more protocols can be added easily. It was a very popular tool in the past, but its developer abandoned it about two years ago.

HammerDB

HammerDB supports Oracle Database, SQL Server, IBM Db2, MySQL, MariaDB, PostgreSQL, and Redis. Unlike mzbench, it is under active development as of May 2020.

Apache JMeter

Apache JMeter focuses more on web services (DB protocols are supported via JDBC). This is the old-fashioned (though somewhat cumbersome) Java tool I was using ten years ago for fun and profit.

BlazeMeter

BlazeMeter is a proprietary tool. It runs JMeter, Gatling, Locust, Selenium (and more) open source scripts in the cloud to enable simulation of more users from more locations. 

Spirent Avalanche Hardware

If you are into heavy guns – meaning you are developing solutions like WAFs, SDNs, routers, and so on – then this testing tool is for you. Spirent Avalanche is capable of generating up to 100 Gbps of traffic, performing vulnerability assessments, QoS and QoE tests, and much more. I have to admit: it was my first load testing tool as a fresh graduate working at Check Point, and I still remember how amazed I was to see its power.

Shadowing/Mirroring in Production

Send a portion of your production traffic to your newly deployed service and see how it’s handled in terms of performance and possible regressions. Did something go wrong? Just stop the shadowing and take your new service down – with zero impact on production. This technique is also known as “dark launching” and is described in detail in the “CRE life lessons: What is a dark launch, and what does it do for me?” blog post by Google.

A proper configuration of load balancers/proxies/message queues will do the trick. If you are developing a cloud-native application (Kubernetes/microservices), you can use solutions like:

HAProxy

HAProxy is an open source, easy-to-configure proxy server.

Envoy proxy 

Envoy proxy is open source and a bit more advanced than HAProxy. Built for the microservices world, it offers service discovery, shadowing, circuit breaking, and dynamic configuration via API.

Istio

Istio is a full open-source service mesh solution. Under the hood it uses the Envoy proxy as a sidecar container in every pod; this sidecar is responsible for incoming and outgoing communication. Istio controls service access, security, routing, and more.

Canarying in Production

Google SRE Book defines “canarying” as the following:

To conduct a canary test, a subset of servers is upgraded to a new version or configuration and then left in an incubation period. Should no unexpected variances occur, the release continues and the rest of the servers are upgraded in a progressive fashion. Should anything go awry, the modified servers can be quickly reverted to a known good state.

This technique, as well as the similar (but not identical!) Blue-Green deployment and A/B testing techniques, is discussed in this Christian Posta blog post, while the caveats and cons of canarying are reviewed here. As for recommended tools:

Spinnaker

Spinnaker, the CD platform open-sourced by Netflix, leverages the aforementioned and many other deployment best practices (and, as with everything Netflix builds, it was designed with microservices in mind).

ElasticBeanstalk

AWS supports Blue/Green deployment with its Elastic Beanstalk PaaS solution.

Azure App Services

Azure App Services has its own staging-slots capability that lets you apply the prior techniques with zero downtime.

LaunchDarkly

LaunchDarkly is a feature-flagging solution for canary releases, enabling gradual capacity testing of new features and safe rollback if issues are found.

Chaos Engineering in Production

First introduced with Netflix’s Chaos Monkey, chaos engineering has grown into a separate and very popular discipline. It is not about “simple” load testing; it is about bringing down service nodes, reducing DB shards, misconfiguring load balancers, causing timeouts – in other words, messing up your production environment as badly as possible.

Winning tools in that area are tools I like to call “Chaos as a service”:

ChaosMonkey

ChaosMonkey is an open source tool by Netflix. It randomly terminates services in your production system, making sure your application is resilient to this kind of failure.

Gremlin

Gremlin is another great tool for chaos engineering. It allows DevOps (or a chaos engineer) to define simulations and see how the application reacts to different scenarios: unavailable resources (CPU/memory), state changes (changing the system time, killing some of the processes), and network failures (packet drops, DNS failures).

Here are some others 
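In miniature (failure rates, retry counts, and all names here are picked arbitrarily for illustration), the chaos idea looks like this: inject random failures and verify the caller degrades gracefully instead of crashing:

```java
import java.util.Random;

public class MiniChaos {
    // Chaos engineering in miniature: randomly fail a fraction of calls
    // and check that the caller retries and falls back instead of crashing.
    static final Random RND = new Random(7); // fixed seed for repeatability

    static String flakyService() {
        if (RND.nextInt(100) < 30) throw new RuntimeException("chaos: instance killed");
        return "ok";
    }

    static String resilientCall(int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try { return flakyService(); }
            catch (RuntimeException e) { /* swallow and retry */ }
        }
        return "fallback";
    }

    public static void main(String[] args) {
        int ok = 0;
        for (int i = 0; i < 100; i++) {
            if (!resilientCall(3).equals("fallback")) ok++;
        }
        System.out.println("successful calls out of 100: " + ok);
    }
}
```

With a 30% failure rate and three retries, the chance that a single call exhausts all attempts is about 0.8%, so nearly every call succeeds; the chaos tooling’s job is to surface the callers that don’t have this kind of resilience.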

Debugging and Monitoring in Production

The last, but not least, toolset to review briefly is monitoring and debugging tools. Debugging and monitoring are the natural next steps after testing: testing in production provides real product data that we can then use for debugging. Therefore, we need the right tools to monitor and debug the test results in production.

There are some acknowledged leaders, each addressing the need for the three pillars of observability – logs, metrics, and traces – in its own way:

DataDog

DataDog is a comprehensive monitoring tool with amazing tracing capabilities. This helps a lot in debugging with a very low overhead.

Logz.io

Logz.io is all about centralized log management – combining it with DataDog can create a powerful toolset.

New Relic

New Relic is a very strong APM tool that offers log management, AIOps, monitoring, and more.

Prometheus

Prometheus is an open source monitoring solution that includes metrics scraping, querying, visualization, and alerting.

Lightrun

Lightrun is a powerful production debugger. It enables adding logs, performance metrics, and traces to production and staging in real time, on demand. Lightrun lets developers securely add instrumentation without having to redeploy or restart. Request a demo to see how it works.

To sum up, testing in production is a technique you should pursue and experiment with if you are ready for a paradigm shift from diminishing risks in pre-production to managing risks in production.

Testing in production complements the testing you are used to doing, and adds important benefits such as speeding up the release cycles and saving resources. I covered some different types of production testing techniques and recommended some tools to use. If you want to read more, check out the resources I cited throughout the blog post. Let us know how it goes!

Learn more about Lightrun and let’s chat.

Remote Debugging: The Definitive Guide
https://lightrun.com/remote-debugging/ – Sat, 23 Jan 2021
With WFH becoming the new normal, companies must find alternative solutions to parts of their current stack that don't work anymore.

Debugging is a huge part of everyday software development. If you ask a developer what they spend the most time on every day, the answer will probably be debugging. However, the process of finding bugs and errors hidden in code can sometimes be quite tedious and difficult.

There are many different forms of debugging, and countless tools whose primary purpose is to assist developers in debugging faster. (Full disclaimer: one of these tools is our own Lightrun that helps debug live applications in production.)

In this guide, we will talk about remote debugging: what it is, how it is conducted, and why you should consider using it. We will also cover some methods of remote debugging as well as helpful tips to get started.

What Is Remote Debugging?

Debugging is the process of gathering data from various areas of a project until you figure out the root cause of an error. The next step is to optimize this process so it can be used on a distributed system. This is where remote debugging comes into play.

Remote debugging is when you debug an application running in an environment different from your local machine in a way that resembles local debugging. The point of this is for developers to debug components of distributed systems without difficulty. It is essentially the same as opening up a direct connection to the server and debugging directly there. 

Remote debugging: how it works

The size and complexity of modern systems are astounding, especially distributed systems. Most big tech companies build on distributed systems: the components are split among many machines across multiple geographical locations. This gives the system a boost in speed and modularity, but it also makes it harder to debug and reason about. The regular debugging process of inserting countless print statements or breakpoints to diagnose a problem doesn’t work, as it would interfere with the running server.

The oft-used alternative is to clone the server’s code, set up environment variables, run it locally, and attempt to replicate the error. However, this alternative is quite time-consuming and replicating the error is usually difficult, which is why remote debugging is the better option.

Types of Remote Debugging

There are two different types of remote debugging: single-client remote debugging and multi-client remote debugging.

Single-client remote debugging, as the name suggests, is when only one client is connected to an application or a server. It is quite straightforward and doesn’t require too much expert maneuvering.

Multi-client remote debugging is when there are multiple clients connected to an application or server at the same time. This can be much more tricky than single-client due to the added complexity. For instance, some errors may only arise when there is more than one established connection to a server, especially if there are many more connections. 

At the same time, multi-client remote debugging allows for a more realistic debugging scenario as typical servers establish multiple asynchronous connections with clients. The main difference between the two is that the multi-client type offers the potential for finding more errors due to concurrency and multi-threading.

How Does Remote Debugging Work?

The core principle of remote debugging is to establish a connection with the server hosting the back-end or front-end of the web application. This connection gives developers complete access to the machine on which the server is running. From that point, the developer can install or configure any debugging tools they wish to use.

Establishing this connection can be done in several ways. One of the most widely used connection protocols is SSH (Secure Shell). The SSH protocol offers a safe cryptographic way to gain access to other machines and is the most popular protocol for such a task. 

Setting up SSH typically requires generating a public and a private key, as well as credentials that you can use to log in on the machine that you are trying to access. Servers typically layer on a variety of security measures to ensure that not just anyone can access them, even with the correct credentials.

Remote debugging: SSH client-server connection

For instance, a server may only allow connections coming from within a network (typically in the case of a company) or from the IP address of your local machine. This limits the number of possible connections to the server.
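As a concrete sketch, the usual workflow is to generate a key pair, install the public key on the server, and then tunnel the debug port over SSH. The user, host, key path, and port below are all placeholders:

```shell
# One-time setup: generate a key pair; the public half is installed on the server
ssh-keygen -t ed25519 -f ~/.ssh/debug_key -C "remote-debugging"

# Forward local port 5005 to port 5005 on the server, so a local debugger
# can attach to the remote process as if it were running on this machine
ssh -i ~/.ssh/debug_key -L 5005:localhost:5005 deploy@app-server.example.com
```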

After getting access to the machine over SSH (i.e., having remote access), the next step is to set up the remote debugger. There are various tools and methods to do this. The core ideas that these methods rely on are listeners and web sockets.

Listeners allow a real-time connection to the server, so the debugging process can run continuously without interruption. This is different from an HTTP request, which opens a connection and closes it immediately after getting or setting information. WebSockets use a different protocol than HTTP and are the foundation of typical streaming and real-time services.
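To make the listener idea concrete, here is a minimal sketch in Python (standard library only) of a long-lived TCP listener that keeps the connection open and streams events back, instead of closing after a single exchange the way an HTTP request would:

```python
import socket
import socketserver
import threading

class DebugEventHandler(socketserver.StreamRequestHandler):
    """Echoes each incoming line back, tagged, to mimic a persistent debug channel."""
    def handle(self):
        for line in self.rfile:          # keeps reading until the client disconnects
            self.wfile.write(b"event: " + line)

def start_listener(port=0):
    # port=0 lets the OS pick a free port, which is handy for local experiments
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), DebugEventHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    srv = start_listener()
    host, port = srv.server_address
    with socket.create_connection((host, port)) as conn:
        conn.sendall(b"breakpoint hit at line 42\n")
        print(conn.makefile().readline().strip())  # event: breakpoint hit at line 42
    srv.shutdown()
    srv.server_close()
```

The handler names and the event format are invented for illustration; a real remote debugger speaks a wire protocol (such as JDWP) over a channel like this one.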

Remote debugging and WebSockets

The final fundamental parts of remote debugging are more related to classic debugging: breakpoints, logs, and stack traces.

Breakpoints allow the code to stop executing at a certain point and check the values of variables at that point. Of course, these can only be used when running the application in debug mode since they will stop the execution once the point is reached. 

Logs and stack traces allow developers to check a wide range of values and variables so that they can pinpoint the location of the variables causing the error.
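These building blocks can be emulated in plain Python: the `snapshot` helper below (a made-up name for this sketch) logs selected variables and can capture the current stack at a point of interest, without suspending execution the way a real breakpoint would:

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

def snapshot(label, **variables):
    """Emit a 'virtual breakpoint': log chosen variables and, at debug level,
    the call stack - without stopping the program."""
    logging.info("%s | %s", label, variables)
    logging.debug("captured stack:\n%s", "".join(traceback.format_stack(limit=5)))

def price_with_vat(net, rate):
    gross = net * (1 + rate)
    snapshot("price_with_vat", net=net, rate=rate, gross=gross)
    return gross

if __name__ == "__main__":
    print(price_with_vat(100, 0.2))  # 120.0
```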

Methods of Remote Debugging

There are many remote debugging tools. The core principles of these tools are:

  1. Defining “non-breaking” points: You have probably heard of breakpoints: you insert one on a certain line of code, and once execution hits that line, the application stops and you can inspect a stack trace and the values of the variables up to that point. However, this doesn’t work well when you’re debugging live in production. Non-breaking points can be inserted while your live application is running, without any re-deployment or restart, and they let developers capture the same stack traces that normal breakpoints would.
  2. Security of source code: Remote debugging connections are designed so that the source code is never exposed in the traffic between the local machine and the server you are debugging on.
  3. Concurrent debugging: This is useful for debugging race conditions where concurrent threads have entered a “deadlock” situation or for debugging a distributed system.
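A toy version of the first idea – recording call data without ever pausing the program, with a hit budget so the instrumentation stays cheap – might look like this. The `logpoint` decorator and its parameters are invented for illustration, not any particular tool's API:

```python
import functools

def logpoint(max_hits=10, condition=lambda *a, **k: True):
    """A non-breaking point: records call data instead of pausing the program.
    The hit budget mimics the rate limiting a production tool needs."""
    def decorator(fn):
        state = {"hits": 0}
        records = []
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if state["hits"] < max_hits and condition(*args, **kwargs):
                state["hits"] += 1
                records.append((fn.__name__, args, kwargs))
            return fn(*args, **kwargs)  # execution is never interrupted
        wrapper.records = records
        return wrapper
    return decorator

@logpoint(max_hits=2)
def transfer(amount):
    return amount * 0.99

if __name__ == "__main__":
    for amt in (10, 20, 30):
        transfer(amt)
    print(transfer.records)  # only the first two calls were captured
```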

A good example is IntelliJ’s remote debugger. It is quite straightforward to set up. After creating a project, you will have to add a new remote debugger configuration. First, you need to configure the address of the remote machine where the host app will run and then configure the virtual machine options that the host application needs to be started with. 

Remote debugging is supported in IntelliJ IDEA

Source: How to Debug Remotely in IntelliJ

After that, you can simply set breakpoints, run the application, look at the debug logs, and terminate the remote debugging session when you are done. This will be quite helpful to get a feel of what remote debugging is like before you go ahead and start using a more specialized production-quality remote debugging tool.
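For a JVM application, for example, the remote side is typically launched with the JDWP agent enabled so the IDE can attach. Port 5005 is only a convention, and the jar name here is a placeholder:

```shell
# suspend=n lets the app start without waiting for a debugger to attach
# (on JDK 8 and older, use address=5005 instead of address=*:5005)
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 \
     -jar my-app.jar
```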

We also have our own tool for debugging live applications: Lightrun. You can start using Lightrun with Java, Python and Node.js applications. Lightrun is a debugging tool that allows you to add logs and metrics in real-time, without stopping your live production application. It helps resolve production issues in microservices, serverless, Kubernetes, and more types of applications. It also gives you performance metrics for your code, helping you monitor and resolve performance bottlenecks.

Why Use Remote Debugging?

According to Queue.acm.org, “Software developers spend 35-50 percent of their time validating and debugging software. The cost of debugging, testing, and verification is estimated to account for 50-75 percent of the total budget of software development projects, amounting to more than $100 billion annually.” 

Making debugging more efficient is therefore highly valuable. Most developers debug applications locally; remote debugging offers a much quicker and more efficient alternative.

The next major reason to use remote debugging is to cut down on the unproductive time and resources spent on replicating environments, cloning server code, and configuring things locally. These are often some of the most frustrating tasks that require the input of a lot of team members, especially the senior ones. With remote debugging, this process can be greatly simplified.

Advantages and Disadvantages of Remote Debugging

Advantages of Remote Debugging:

  1. Fast and Budget-Friendly: Remote debugging cuts down on the time developers spend replicating environments and configurations locally to reproduce and fix errors. Errors are a fundamental part of software: no software is truly 100% error-free, and debugging will always be a continuous process. An efficient organization must optimize this process, and the first step is to use remote debugging instead of purely local debugging.
  2. Full of Modern Features: While classic debugging tools offer logging and stack traces, they don’t provide monitoring, AI-powered log filtering, or non-breaking points. These extra features, offered by remote debugging tools, can help accelerate the debugging process and improve the overall reliability of the system.

Disadvantages of Remote Debugging:

  1. Heavily Dependent on Permissions: Remote debugging requires access to the server and admin privileges on the server. While this might not be the most difficult process to configure, opening up the server to more remote connections will always increase vulnerabilities, especially if the server contains a database with sensitive information, such as user passwords. 
  2. Sync-sensitive: The source code deployed on the server must be in sync with the code in the remote debugger. While web sockets make this possible, there is always room for error here, such as scenarios where the editor isn’t entirely in sync and the developer ends up debugging an old version of the source code.

Helpful Tips for Remote Debugging

There are a few different remote debugging tools depending on the programming language being used. 

  1. Java Platform Debugger Architecture: If you are developing for the JVM, have a look at the JPDA. It provides a lot of support for remote debugging. This is also usually used alongside Apache Tomcat and Eclipse.
  2. Visual Studio Debugging Tools: If you are in a different environment and IDE, Visual Studio also provides its own remote debugger. Feel free to check out the documentation here. Microsoft offers a variety of remote debuggers for different languages and different versions of Visual Studio. Remote debugging in VS Code is also possible.
  3. Don’t Forget Logging and Exception Handling: Getting into the habit of only remote debugging might not be the best idea. While remote debugging is definitely useful, it works best alongside effective application logging and accurate exception handling. Neglecting those two practices will lead to a mountain of technical debt.
  4. Selenium Drivers: One final tip: current versions of the Selenium driver provide a lot of documentation and methods for remote debugging. For instance, you can find step-by-step instructions online for remotely debugging Android devices. This is quite helpful, since replicating various environments on mobile devices is actually quite difficult.
  5. Take a look at Lightrun: our live debugger for your production environment. With Lightrun, you can inject logs without changing code or redeploying, and add snapshots: breakpoints that don’t stop your production applications. Lightrun supports Java, Python and Node.js applications, and you can start using it today.
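As a small illustration of the logging and exception-handling tip above, `logging.exception` records the full stack trace alongside the message, so failures leave a trail even when no debugger is attached. The logger name and function here are made up for the example:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
log = logging.getLogger("payments")

def parse_amount(raw):
    """Parse a user-supplied amount, logging the full stack trace on failure."""
    try:
        return float(raw)
    except ValueError:
        # Logger.exception attaches the traceback to the log record automatically
        log.exception("could not parse amount %r", raw)
        return None

if __name__ == "__main__":
    print(parse_amount("12.5"))  # 12.5
    print(parse_amount("oops"))  # logs an ERROR with a stack trace, then prints None
```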

Conclusion

To wrap up, it might be time to abandon the old habit of printing variables to the console for hours on end to find an error. A lot of advances have been made in debugging, and remote debugging is definitely one of them. Getting started with it might not be super simple, but hopefully you now have a much better idea of how to do it after reading this article.

Developers Can Now Debug Running Nomad-Orchestrated Applications Using Lightrun https://lightrun.com/hashicorp-nomad-lightrun-driver/ Fri, 29 Oct 2021 11:39:31 +0000 https://lightrun.com/?p=6502

The post Developers Can Now Debug Running Nomad-Orchestrated Applications Using Lightrun appeared first on Lightrun.

In basically every modern software organization, building software is not just a matter of writing code – it’s also a matter of testing it to ensure it works properly, creating artifacts out of it that end customers can use, and deploying those artifacts somewhere customers can actually reach them.

As the number of possible languages, runtimes, and deployment options has exploded in recent years, it has become difficult to manage all the different components that exist in our systems. You are required to use different tools to orchestrate, deploy, and later maintain each type of application – a serverless Node function on AWS Lambda, a non-containerized Java application, and a bare-metal server running a Django app are all very different from one another in their topology and the tooling available around them.

Controlling all your system components at scale might prove to be a difficult task to handle, especially due to the unique nature of each of the moving parts – it often requires specialized knowledge of each of the underlying platforms that not all DevOps or service-owning Developers possess. When the inevitable incident comes up, it can take a long while to untangle what exactly happened in the system.

Nomad by Hashicorp is a new kind of workflow orchestrator, one that cares little about the underlying platform of the applications it orchestrates. Unlike other popular orchestrators – Kubernetes chief among them – a Nomad Task does not have to be containerized.

In fact, Nomad Tasks are completely agnostic to the underlying platform, thanks to the concept of Task Drivers – core pieces of Nomad that can execute any type of application, delivered in any type of packaging: Docker containers, JVM-based JAR applications, serverless functions, simple bash scripts, and much more.

Lightrun is proud to announce that developers can now debug live, running Nomad-based applications using a native HashiCorp Nomad Lightrun Task Driver. The driver will automatically add Lightrun to a running HashiCorp Nomad task, allowing your developers to add new real-time logs, metrics, and traces to production applications, where none were before – all without redeploying, restarting, or even stopping the running tasks.

Lightrun Actions (Logs, Snapshots, and Metrics) can be added from the IDE or the CLI to any Nomad Task, and then consumed right there – without switching contexts or going through the entire CI/CD pipeline just to get extra visibility into your running applications.

In addition, Lightrun is integrated into the developer workflow and offers integrations with leading APMs, log analysis platforms, and other observability vendors, including Elastic, New Relic, and Datadog – all in order to allow the seamless integration of your Nomad Task logs and the new Lightrun logs that were added in real time.

Lightrun Actions are read-only and protected by the proprietary Lightrun Sandbox, allowing for a safe and performant way to instrument logs, snapshots, and metrics into production. All invocations are checked to ensure no side effects happen, and a rate-limiting mechanism is in place to ensure the logs remain performant over time – even when the machine is under a lot of stress.

While Nomad removes the need to “containerize all the things” with a sane, easy-to-use, and powerful platform for orchestrating any type of application, Lightrun replaces the iterative, non-agile process required to understand how these applications work with a developer-native, fast, and intuitive debugging workflow.

If you want to learn more, you should check out the Lightrun Driver docs or tune in to one of our R&D team leads, Itai, who just gave a talk about the new Nomad Lightrun Driver in a recent HashiTalks session.

Interview with Tom Granot – Developer Observability, KoolKits and Reliability https://lightrun.com/interview-with-tom-granot-developer-observability-koolkits-and-reliability/ Wed, 16 Mar 2022 13:24:44 +0000 https://lightrun.com/?p=7115

The post Interview with Tom Granot – Developer Observability, KoolKits and Reliability appeared first on Lightrun.

In preparation for the upcoming Developer Observability Masterclass we’re hosting at Lightrun with Thoughtworks, RedMonk and JFrog, I sat down for a brief interview with Tom Granot – the Director of Developer Relations at Lightrun.

Tom will MC the event as he did for the Developer Productivity Masterclass we ran back in December.

Shai: Tell us a bit about yourself and your background.

Tom: Before joining Lightrun, I worked as an SRE at Actyx. This drove home the importance of observability tools – I got a sense of the potential problems and, to some extent, the blindness that developers face in production.

At Lightrun, I run the Developer Relations group and the open source initiatives, such as KoolKits.

Shai: Observability isn’t new. Why even have this Masterclass?

Tom: This isn’t a Masterclass about observability – it’s about Developer Observability. To explain why the differentiation is required, note that over the past decade or so DevOps and SRE roles became commonplace in many software organizations.

This is great, as it means developers can focus on building their applications and there are DevOps engineers that deal with building and maintaining systems that get this code into the hands of customers. In addition, SREs are there to build systems to operate, maintain and observe the software as it is running in production – and they have their own set of tools to do their job well. The problem starts, of course, when something goes wrong in production that might relate to the actual application logic or performance and not the infrastructure that runs it.

Back in the day when production failed, we could just connect to the server and see what’s going on to understand why a specific path of our code is behaving in a weird way. Now, as application developers, we rarely have access to the production machine ourselves as there are dedicated teams that are supposed to do that work instead of us.

This removes us from “the metal”, and this gap between the developer and the actual code running in production makes it difficult to understand exactly how our application is behaving in production, and how we should build the next set of features so that they will behave well in the real world.

Observability tooling is there to fill that gap, but most of the tools are designed for DevOps and SREs. This makes sense as they usually handle the production workloads – but it leaves application developers out of the picture. Developer Observability revolves around tools and practices that were designed with application developers in mind. They provide the value proposition developers need, and “meet them” in their own environments.

This fits cleanly into the “shift left” paradigm, that is all about re-empowering developers. Bringing tools to R&D so we can be as productive as we were a decade ago, when the machine running our code was closer to us (both physically and metaphorically), without sacrificing all the amazing gains we made in this field.

Shai: Can you give examples?

Tom: Sure.

A tool designed for the DevOps crowd – like an APM – will present you with performance information about a web service. It won’t tell you exactly where the failure happened – it will just tell you that there was a failure.

This is fine for operators, but not so great for a developer.

The obvious comment when I mention this to people is that when you have a failure you might have logs, but in my experience might is key; you usually either have no logs or way too many logs that raise your ingestion bill. And even then, combing through logs to find the right container in the right pod in the right node in a large k8s cluster… that isn’t fun!

Developer observability tools work at the source code level: too fine-grained for OPS, but just right for R&D.

Shai: It sounds like you’re describing debugging. How is this different?

Tom: There’s a great deal of correlation between developer observability and debugging. We use both with a similar intent of tracking a bug or failure. The differentiating factor is the production environment. In it, we need stability, security and scale. We can’t address any of those with regular debuggers.

Shai: Why not?

Tom: Critical mass. Kubernetes and cloud-native revolutionized our industry – it’s now possible to build applications at a scale that only FAANG companies could dream of a few years back. Serverless and managed hosting made things even more complex and pushed the production environment even further away from local and staging.

With this much scale and diversity, you’re bringing a lot of complexity to the table. As complexity in production rose quickly over the past decade, we’re now facing a dire need for better comprehension of our production environments; in the past, observability tools were optional. As production complexity rises, they become essential.

Shai: Can you tell me how KoolKits fits into that?

Tom: At Lightrun we keep our finger on the pulse of developer-first innovation. When kubectl debug went into beta, we instantly started evaluating it and tried to understand how we can build on that. For the unfamiliar, Kubectl debug lets you run a new container in the same namespaces as a running pod, for debugging purposes.

This is amazingly useful when tracking down some production issues. But one of the pain points of that approach has been the bare nature of the container images you get out of the box with kubectl debug. There are no tools – no debuggers, no vim, nothing – you have to bring your own.

That’s when Leonid, our CTO, came up with the idea of KoolKits – which is an opinionated set of pre-installed tools for kubectl debug. There’s a variant for each language/platform, e.g. Java (JVM), NodeJS, Python and even Go. We’re pretty excited about it and recently open-sourced it as well – see the project here.
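For example, attaching a KoolKits container to a running pod looks roughly like this. The pod and container names are placeholders, and the exact image tag should be checked against the project's docs:

```shell
# Spin up an ephemeral debug container that shares the pod's namespaces,
# using the JVM flavor of KoolKits instead of a bare busybox image
kubectl debug -it my-app-pod \
  --image=lightruncom/koolkits:jvm \
  --target=my-app-container
```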

Shai: Thank you so much for this interview, Tom. Do you have any closing thoughts you wish to share?

Tom: I hope everyone joins us for the Developer Observability Masterclass. There will be some amazing industry leaders to talk to and I can’t wait to pick their brains on this awesome new methodology. The Thoughtworks team are the people who brought us knowledge on refactoring, microservices, progressive delivery, etc. well before those were “trends” and common best practices, and James from Redmonk is an industry legend!

This also goes for Baruch Sadogursky, Dev & DevOps Advocate at JFrog, whom we were lucky to get.

I’m sure we all have a lot to learn from them. See you there!
