Dev Tools archives - Lightrun
https://lightrun.com/category/dev-tools/

Reduce 60% of your Logging Volume, and Save 40% of your Logging Costs with Lightrun Log Optimizer
https://lightrun.com/reduce-60-of-your-logging-volume-and-save-40-of-your-logging-costs-with-lightrun-log-optimizer/
Wed, 01 Mar 2023

As organizations adopt more of the FinOps Foundation practices and try to optimize their cloud-computing costs, engineering plays an imperative role in that maturity. Traditional application troubleshooting still relies heavily on static logs and legacy telemetry that developers added either when first writing their applications or during ad-hoc troubleshooting sessions where they lacked telemetry and needed more logs. Other observability solutions only mirror the logs that already exist in the running code; they do not scan for, or reflect on, the effectiveness of new and legacy log statements. To modify logs or add new, more effective ones, existing observability tools rely on developers going through an iterative, time-consuming, and costly redeployment process. Furthermore, a great number of these logs will most likely never be consumed or used by developers; they pile up, creating more noise in the code than value, and they are costly.

To get engineers to act on cloud cost optimization, they need shared knowledge and visibility into what drives the rising costs, and they need that knowledge accessible from their native environment: their IDEs.

Without actionable data and visibility, engineers cannot be aware of cost-related factors and therefore cannot address them.

To solve these problems and help developers better understand how static logs correlate with overall cloud costs, Lightrun is happy to introduce the Log Optimizer: an automated log-optimization and logging cost-reduction solution that is part of the Lightrun IDE plugins. It allows developers to scan their source code (a single file or a complete project) for log waste and get, in seconds, the log lines that can be replaced by Lightrun's dynamic logs.

With the new offering, developers can maintain cleaner code with fewer static, inefficient log statements, while engineering managers and business management benefit from a lower cloud bill associated with logging costs. Such collaborative solutions contribute to a better culture of cost awareness within the engineering group and support the adoption of FinOps practices throughout the organization.

In this post, you will learn how adopting the Log Optimizer helps you:

  • Gain visibility into redundant logs and eliminate noise in your code
  • Embed a continuous log optimization practice within the engineering team
  • Establish a culture of cost awareness within engineering and shift FinOps practices left

Gaining Visibility Into Redundant Logs and Eliminating Noise in your Code

The new Log Optimizer comes pre-packaged with the Lightrun IDE plugins and is an automated log optimization solution that enables developers to scan their source code for log "waste".

Developers can run a scan of a single source code file or their entire project, written in Java, JavaScript, TypeScript, or Node.js, and get an immediate output of recommended log lines to be omitted from the code.

To get started with the Log Optimizer, please ensure you get the latest version of the Lightrun platform, and follow the instructions on the setup page.

Lightrun IDE Plugin Installation

Upon completion of the IDE plugin installation and the authentication steps, you should be ready to go. 

From your project folder in the IDE (e.g., IntelliJ IDEA), run a Log Optimizer scan on either a specific Java source file or the entire project directory (see the illustration of the supported scan types below).

Keep in mind that to run the solution, the machine that scans the source code needs the Docker client running and the ability to download Docker images from Docker Hub.

Accessing Log Optimizer in the Lightrun plugin

This add-on to the Lightrun IDE plugin allows you to continuously scan your projects as you add more code and logs to them. The scanning action produces a detailed Log Optimizer output that you can review and analyze in the dedicated Log Optimizer IDE console (see the examples below).

For example, a Log Optimizer scan of a single Java source file produces output like the following in the dedicated IDE console.

Log Optimizer output

As you can see, the single Java source file "OwnerController.java" has five log statements that can be omitted from the source code, saving the team money on redundant logs and contributing to cleaner code. In this class, the Log Optimizer detected a marker log on line 85 that is only used to mark that execution reached a certain point in the code. This specific log provides no new information or insight during troubleshooting and can therefore be removed. Other candidates spotted in the same Java class are logs that spread multiple instructions across separate statements instead of being merged into a single log statement; these can be improved or replaced with dynamic logs.
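
To illustrate the kind of patterns the scan flags, here is a hypothetical Java sketch (not the actual PetClinic source; the class and method names are made up) showing a marker log and a split log statement of the sort described above:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class OwnerLookupExample {
    private static final Logger log = LoggerFactory.getLogger(OwnerLookupExample.class);

    String processOwnerForm(String lastName) {
        // Marker log: only records that execution reached this point.
        // It adds no diagnostic value and is a candidate for removal
        // (or for an on-demand Lightrun dynamic log when actually needed).
        log.info("entered processOwnerForm");

        // Split logging: two statements that could be merged into one,
        // or replaced with a conditional dynamic log added at runtime.
        log.info("looking up owner");
        log.info("last name = {}", lastName);

        return "owners/" + lastName;
    }
}
```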

With the above in mind, when the Log Optimizer™ is executed on the full Java project, the output showcases complete scan results with potentially more opportunities for log optimization. As can be observed in the screenshot below, a full scan of the same PetClinic project from within the IntelliJ IDE uncovered nine unique logs that are candidates for exclusion from the project.

The wider scan identified more cases for log optimization, such as one in "PetController.java", where the tool detected a marker-style log within the method "initCreationForm", placed just before the return statement.

More log optimization opportunities discovered

Now that the value of log optimization for cost savings and cleaner source code is clear, the next step is to decide which log lines should be removed and, where needed, replaced with dynamic logs, and which ones are justified to remain in the project. This scanning activity should be an ongoing, continuous process, since the code changes constantly.

Embed Continuous Log Optimization Practice within the Engineering Team 

As mentioned above, making engineering aware of and accountable for logging costs, and providing them with the right tools to optimize these recurring expenses, starts with solutions that are accessible to developers from their native IDEs and backed by concrete, actionable data.

Showing developers the aggregated redundant log lines in their project triggers action that results in optimization. The recommended practice for getting engineering to adopt such log optimization tools and techniques is to make this activity part of the CI/CD pipeline for every software iteration or release.

Through continuous execution of a log optimization tool, developers get used to the practice and come to realize the value of logging only when and where needed within their source code.

Establish a Culture of Cost Awareness within Engineering and Shift Left FinOps Practices

Engineers feel more empowered and accountable for cost efficiency when they get clear visibility and information about their software deliverables within their familiar environment.

That is why we've built the Log Optimizer solution on top of our IDE plugins, so it fully aligns with how developers already troubleshoot and monitor their code. Because developers can run an on-demand logging scan on a single source file or on the entire code repository, they adopt a culture of cost optimization and accountability in a frictionless, natural manner. That step of awareness and accountability falls under the "Inform" phase of the FinOps framework. With this information, engineering and FinOps teams can move toward the higher maturity phases, Optimize and Operate, and find the right paths toward cost optimization.

Log optimization workflow

This is an ongoing process; however, as engineers practice and develop a culture of cost awareness, the entire organization can become better aligned, aware, and accountable around FinOps practices and opportunities for cost savings.

Bottom Line

Lightrun's Log Optimizer aims to enable organizations to cut logging costs through a unique visualization of redundant logging data directly within the developer IDE. It is a great step toward enabling engineering and FinOps teams to break down the silos between them, act on logging cost optimization based on real data, and make this practice a continuous, automated part of developing and delivering new software.

Learn more from our documentation on how to get started with the Log Optimizer.

A Peek into the Next Generation Observability Solutions
https://lightrun.com/next-generation-observability-solutions/
Fri, 16 Dec 2022

As organizations strive to meet the challenges of digital transformation, they are adopting newer technologies to build more robust software systems. Next generation observability solutions are paving the way to help them navigate this maze, deliver better customer experiences, and drive business results.

What is Next Generation Observability?

Observability is the practice of monitoring an IT (Information Technology) system to probe its internal state without the need to modify the source code. This approach ensures that any anomalies are detected and diagnosed in runtime, and there is complete visibility into all aspects of the system without requiring any code changes. 

Historically, logging has been the most commonly used approach to achieve observability. However, it is not a truly non-invasive approach, since additional logs cannot be added during runtime without altering the source code. The realm of observability begins where the scope of logging ends, and it plays a significant role in the holistic performance analysis of a software system beyond just debugging. 
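
As a minimal, hypothetical illustration (the class below is made up, not taken from any specific product), a static log statement is fixed at build time, so gaining any additional runtime insight at that point requires a code change, a rebuild, and a redeployment:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PaymentHandler {
    private static final Logger log = LoggerFactory.getLogger(PaymentHandler.class);

    void handle(String orderId) {
        // This log line was decided on at development time.
        log.info("handling payment for order {}", orderId);

        // If we later need to see, say, a retry count here, there is no
        // static log for it -- the only option is a code change followed
        // by a full rebuild and redeployment.
        process(orderId);
    }

    private void process(String orderId) { /* ... */ }
}
```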

Logging

In the context of traditional monolithic software architectures, observability is a far superior approach to runtime debugging than static logging alone. However, given the architectural shift toward microservices, observability's role expands to cover interlinked software modules in cloud-native deployment environments. Such systems have various moving parts distributed across multiple cloud deployments that require:

  1. Proactive Monitoring: For a more comprehensive view of the system, which monitors the health of individual components and understands how they interact with each other.
  2. Advanced Analytics: For collating the data to analyze the overall performance of the system, thereby generating insights across multiple deployments.
  3. Seamless Integration: To provide better integrations with the tech stack and external monitoring tools. 

Next generation observability is a step toward achieving such breakthroughs in a cloud-native application deployment which is the de facto model for building planet-scale web applications.  

The Three Directions for Next Generation Observability

The core notion of observability is the capture of MELT data. MELT stands for Metrics, Events, Logs, and Traces. 

Metrics capture the numerical measurements of system performance, such as database query time. Events and logs capture the occurrence of a specific behavior of the system. Traces are detailed records of system activity that can be used to unearth the root cause of errors. Together these four forms of data constitute the recorded output of applying an observability solution to a system.
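
For example, a metric such as database query time is usually recorded through an instrumentation library. The sketch below uses Micrometer purely as an illustrative choice; the class, method, and metric names are assumptions, not something prescribed by this article:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.List;

class OwnerQueries {
    private final MeterRegistry registry = new SimpleMeterRegistry();
    private final Timer queryTimer = registry.timer("db.owner.query.time");

    List<String> findOwners() {
        // Records how long the query takes and feeds the duration
        // into the "db.owner.query.time" metric.
        return queryTimer.record(() -> runQuery());
    }

    private List<String> runQuery() {
        return List.of("owner-1", "owner-2"); // stand-in for a real database call
    }
}
```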

As systems become more distributed and ephemeral, the traditional approach to observability is untenable for providing the required visibility. MELT data generated from microservices or distributed sub-components are scattered. There is a need to unify this data to draw some inferences, thereby increasing the complexity of integrating observability solutions with the system.

Next generation observability solutions cover the gap between traditional monolithic system architecture and the progression toward a cloud-native, globally distributed deployment architecture. They also address the specific observability requirements of specialized applications, such as AI/ML infrastructure or payment gateways.

Accordingly, there are a few evolutionary paths for next generation observability solutions.

Newer Observability Features

MELT Data

In the world of observability, gathering MELT data is key to understanding what is happening in your system. Basic observability solutions can all collect and analyze this data. Newer observability features revolve around:

  • Advanced data collation: With the shift towards hybrid cloud and decentralization, collating MELT data from multiple disparate sub-systems is not an easy task. Therefore, one of the areas for next generation observability solutions is advanced data analysis using statistical models or data visualizations.
  • Observability with BI: Another area for advanced data analytics with observability data is the integration with business analytics to drive better business outcomes. In this way, observability data helps analyze business application performance, which in turn predicts business performance.
  • DevOps integration: With the increased focus on software releases via continuous integration and continuous delivery, there is scope for embedding observability features into the CI/CD pipeline so that system performance can be checked within the pipeline via automated test cases.

Domain Specific Observability

Most observability suites are applicable to general software systems. However, they do not extend well to certain types of software. Artificial Intelligence (AI) based software is one such type. With the rise of AI and ML-based applications, traditional observability solutions fall short, since AI and ML observability requires a proactive means of observing the full lifecycle of ML models. This includes, but is not limited to, detecting drift in model accuracy, skew in input data, and general model performance.

Like AI and ML, certain business processes require specific observability capabilities. Payment processing is one such critical process. It relies on external integrations to expedite bank payments in an efficient, automated way. Any time the system fails to process a payment, a huge backlog accumulates. A real-time payment observability solution addresses these issues by capturing business process-specific key performance indicators (KPIs). In the case of payment observability, the typical KPIs are transaction volumes, average response times, average transaction values, and total transaction value per merchant.

Deployment Centric Observability

We need DevOps

Software deployment architecture has undergone a total transformation in the last few years. A preference for cloud-native applications mainly drives this. From an observability point of view, this shift poses a few challenges, as well as some opportunities to build advanced features on next-generation observability platforms.

  • Debugging complexity in microservices: An application built with microservices architecture is much more difficult to debug. That is because it has a highly distributed deployment with hundreds of runtime instances across multiple servers. Next generation observability platforms solve this problem by building intelligent information gathering such that data across all instances are collated and presented to portray a unified system-wide performance snapshot.
  • Hybrid cloud: The hybrid cloud deployment model adds a dimension of complexity in the form of edge computing. This approach necessitates additional observability at the edge and collation of data across multiple edge deployments. Next generation observability solutions are gearing up for this challenge by adopting a distributed agent approach, wherein observability data is captured from multiple edge sites via agents at the edge and orchestrated by a hosted central agent at a main site.

 Observability for Enhanced Developer Experience

Most observability solutions are built with a heavy focus on the DevOps and ITOps side of operations. They combine tools for monitoring, telemetry, analysis, and a host of other features to manage observability data. Meanwhile, the defects identified through observability are mostly routed back to developers. Therefore, along with the experience of operations teams, the developer experience with observability solutions is an important consideration.

Continuous Observability

One trend that has emerged from observability practice is the feedback loop from production back to the development environment. After all, any anomalies reported through observability data in production have to be looped back to development as potential bugs. This is easier said than done, since replicating issues and capturing logs and metrics in production environments is constrained. But given a way of doing it, there is scope to build a streamlined process whereby developers and SRE teams can probe the production environment and get immediate answers, which are fed back into the development cycle.

Lightrun aims to solve this problem by introducing continuous observability in the developer’s workflow. It seamlessly integrates within the software tech stack as well as the developer IDEs and toolchain to provide an interface to generate logs, traces, and metrics, irrespective of the environment. This enables developers to create a better understanding of live application behavior on the fly.

If you are building an application on Java, Python, or Node.js, do give Lightrun a try. You can book a demo to get a sneak peek into the Lightrun platform.

Why Real-Time Debugging Becomes Essential in Platform Engineering
https://lightrun.com/why-real-time-debugging-becomes-essential-in-platform-engineering/
Thu, 19 Oct 2023

Introduction

Platform engineering has been one of the hottest keywords in the software community in recent years. As a natural extension of DevOps and the shift-left mentality it fosters, platform engineering is a subfield within software engineering that focuses on building and maintaining tools, workflows, and frameworks that allow developers to build and test their applications efficiently. While platform engineering can take many forms, most commonly its byproduct is an Internal Developer Platform (IDP) that enables self-service capabilities for developers.

One of the notable challenges with building a successful platform engineering organization is that there still exists a big gap between dev and ops teams in terms of the tools and the domains they operate in. While the promise of DevOps is to bridge that gap, oftentimes traditional tools designed by and for operations teams are blindly applied to internal developer platforms, drastically reducing their effectiveness. In order for IDPs to be truly self-service and beneficial for all parties involved, observability must play a key role. Without observability, developers will not be able to gather insights into their applications and debug as true owners of their code. 

It is important to note that platform engineering serves the wider organization at scale, across multiple cloud providers (AWS, GCP, Azure), environments (QA, CI, pre-production, production), and runtime languages (Java, C#, .NET, Python, etc.). Being able to debug and troubleshoot all of these configurations and code bases in a standardized way is both a huge challenge and a critical pillar for success.

In fact, in a recent article covering the core skills required from a platform engineer, two of the top eight skills were related to developer observability and debugging.


Core Skills Required from a Platform Engineer (Source: SpiceWorks)

In this article, we will explore some of the key components of platform engineering and how they manifest in internal developer platforms. We will then shift our focus to the growing importance and adoption of developer-focused, real-time observability in IDPs and how traditional observability tooling often falls short. Finally, we'll look at how Lightrun's dynamic observability tooling can unlock the true value of IDPs.

Key Components of Platform Engineering

Platform engineering came largely as a response to the difference between the idealistic promises of DevOps and its stark realities in practice. While the "you write it, then you run it" ethos of DevOps sounds good, the reality is not so simple. With the rise of cloud native architectures and microservices, there are now more complex moving components needed to run an application. It is unrealistic to ask developers to not only write their code but also be well-versed in everything that traditionally falls under the Ops bucket (e.g., IaC, CI/CD).

So platform engineering is a more practical response that carries on the spirit of DevOps while acknowledging real-world constraints. Some of the key components of platform engineering include:

  • Promoting DevOps Practices: This includes IaC, CI/CD, fast iterations, modular deployments, etc. 
  • Enabling Self-Service: Platform engineering teams should enable developers to build and test their applications easily. This touches not only on the build pipeline, but also the infrastructure and other related third-party APIs and services that developers can spin up and connect to on demand. 
  • Providing Tools and Automation: As a follow up to the first two points, platform engineering teams should provide a collection of tools, scripts, and frameworks to automate various tasks to speed up developer lifecycles and reduce human error. 
  • Balancing Abstraction and Flexibility: There should be a good balance between abstracting away the underlying infrastructure to support a scalable, performant platform and exposing important metrics, logs, and other observability data points for engineers to troubleshoot issues. This also allows developers to own their services (a DevOps practice) without the overhead of understanding every infrastructure component; essentially, it shifts responsibility left to developers without the cost of infrastructure complexity.

In short, the platform engineering team acts as a liaison between developers and other infrastructure-related teams to provide tools and platforms for developers to write, build, and deploy code without diving too deep into the complexities of modern infrastructure stacks. 

Internal Developer Platforms

These principles are best seen in internal developer platforms. IDPs cover the entire application lifecycle, beyond traditional CI/CD pipeline responsibilities. They provide developers with a flexible platform in which they can quickly iterate on testing their applications as if they were running locally. More specifically, this includes:

  • Provisioning a new and isolated environment to deploy and test their applications.
  • Ability to add, modify, and remove configuration, secrets, services, and dependencies on demand.
  • Fast iteration between building and deploying new versions as well as the ability to rollback.
  • Scaling up or down based on load.
  • Production-like environment with guardrails built in to not accidentally cause outages or degradation in service for other teams.
  • Enabling developers to understand their application costs at all times and to participate in and own the overall cost optimization effort.

In other words, IDPs provide developers a self-service platform that glues together all the tools behind the scenes in a cohesive manner. 

Importance of Real-Time Debugging within an IDP

One of the critical components of a self-service platform is observability through real-time debugging. Without exposing adequate levels of observability to the developers, IDPs will remain a black box that will trigger more support tasks once things go wrong, which defeats the purpose of setting up a self-service platform in the first place. Ideally, developers have access to logs, metrics, traces, and other important pieces of information to troubleshoot the issue and iterate based on the feedback. 

As such, real-time observability plays a critical role in creating a successful platform engineering organization and a robust IDP. Platform engineers and VPs of platform engineering building IDPs today are investing in and prioritizing the ability to efficiently collect logs, metrics, and traces and to surface the most relevant signals so developers can detect, troubleshoot, and respond to issues.

Real-Time Debugging within IDP using Lightrun 

Lightrun offers a unique solution that aligns with the principles of platform engineering and adds observability in a way that fits existing developer workflows. Lightrun provides a standard developer observability platform for real-time debugging that gives developers, across multiple clouds, environments, runtime languages, and IDEs, the ability to debug complex issues fast without iterative SDLC cycles and redeployments.

Specifically, it provides developers in real time with:

  • Dynamic logging: Developers can add new logs without stopping or restarting their applications. Logs can be added conditionally so that they only appear in certain scenarios, reducing noise.
  • Snapshots: Snapshots emulate what breakpoints provide in a local context. A snapshot captures the current execution state, including variables, configuration, and the stack trace, at runtime.
  • Metrics: Developers often don't think to add metrics preemptively. With Lightrun, metrics can be collected on demand.

As mentioned, these dynamic observability tools are integrated into the IDEs developers already use to write their code. Compared to traditional observability tools like APMs or logging aggregators, Lightrun allows developers to add or remove logs, snapshots, or metrics on demand, without going through the expensive cycle of adding logs, raising a PR for review, and waiting for the changes to take effect. Especially in the context of IDPs, this dynamic approach gives developers a truly self-service way to troubleshoot and debug their applications.

Summary

The rise of platform engineering in recent years has significantly improved developer productivity and experience. Internal developer platforms address the growing complexity of developing and deploying modern applications. As more organizations embrace platform engineering and build out internal developer platforms, observability is becoming an imperative part of standardizing real-time debugging within the IDP tool stack for a truly self-service platform. With Lightrun's suite of dynamic observability tooling, platform engineering teams can unlock the true potential of IDPs for increased developer productivity.

Putting Developers First: The Core Pillars of Dynamic Observability
https://lightrun.com/putting-developers-first-the-core-pillars-of-dynamic-observability/
Sun, 24 Sep 2023

Introduction

Organizations today must embrace a modern observability approach to develop user-centric and reliable software. This isn’t just about tools; it’s about processes, mentality, and having developers actively involved throughout the software development lifecycle up to production release.

In recent years, the concept of observability has gained prominence in the world of software development and operations. Rooted in three foundational pillars—logging, metrics, and tracing—observability provides a comprehensive understanding of application behavior. These pillars allow teams to diagnose and address issues with greater precision and efficiency.

However, a notable challenge in observability is that many tools available today are designed by and for operations teams. Their primary focus often lies in monitoring, alerting, and system health from an infrastructural standpoint. This design bias can leave developers, who require a different granularity and data context, somewhat in the lurch. Instead of offering insights into code behavior, performance bottlenecks, or specific code-level issues, traditional observability tools may present data in a way that’s more aligned with operational needs. This mismatch underscores the importance of creating or adopting observability tools that cater explicitly to developers, ensuring that they can gain actionable insights from the system and application data in a manner that resonates with their specific workflow and challenges.

With the surge in adopting a platform engineering approach, there’s a profound shift in how organizations perceive and manage the Software Development Life Cycle. At the heart of this approach is providing developers with a robust platform that abstracts away infrastructural complexities and offers tools and services that accelerate development. As platform engineering becomes a catalyst for advanced SDLC management, there is a pressing need to elevate observability proficiency across organizations. Platform engineering, by design, involves a profound intersection of development and operations, which necessitates that the engineers possess a unique blend of skills. Among the emerging skill sets, debugging and observability stand out as paramount. 

Why Developer Ownership is Non-negotiable

Over recent years, the software engineering industry has recognized the importance of granting developers ownership of their products to ensure software reliability, agility, and ease of maintenance. Developers should have control over their code, from creation to deployment. They must be able to deploy, roll back, observe, and debug code in production in order to speed up the core feedback loop, enabling faster improvements.

With the right tools and responsibilities, both the software and the overall user experience improve. Real-time debugging in a production environment is invaluable because developers, who understand the recent changes best, have the context and knowledge to fix issues quickly.

The Lightrun Three Pillars of Dynamic Observability

Lightrun offers a suite of features designed to enhance developers’ capabilities. One standout aspect is Lightrun’s ability to debug applications right in the live environment, providing real-time, on-demand insights irrespective of where the application is running.

Pillar 1. Dynamic Logging

Text logging remains a fundamental debugging tool. However, using it in remote environments presents challenges. Centralized logging platforms have grown in popularity, offering central log ingestion with efficient search capabilities. Yet they often fall short for real-time remote debugging, mainly because of inherent delays, a focus on post-event analysis, and disconnection from the local development environment.

In remote environments, traditional logging slows down the developer's feedback loop: adding a log line usually requires at least an entire CI/CD pipeline run, and most often, deploying a new version to production is impossible or hard to do frequently.

Many developers opt for overlogging to compensate, leading to increased storage, computation, and possible licensing costs, not counting the difficulty of navigating a massive amount of logs to find the required piece of information.

Finally, log tools are often poorly integrated into developers’ IDEs, resulting in an unnecessary learning curve and shifting developers’ attention away from their primary environment. In some extreme cases, developers lack direct access to production logs because the organization cannot offer a method for secure access.

On the other hand, Lightrun Dynamic Logging enables developers to add new logs without halting the application. This ensures uninterrupted access to crucial data directly from the developer's IDE. Logs can also be emitted only when a specific code-level condition is true, significantly reducing the amount of information that needs to be evaluated to pinpoint an issue.
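
As a hypothetical sketch (the class, the condition, and the log format below are illustrative assumptions), this is the kind of place where a developer would attach a Lightrun dynamic log from the IDE rather than editing the source:

```java
import java.math.BigDecimal;

public class OrderService {

    static class Order {
        String id;
        BigDecimal total;
    }

    // A developer suspects that negative totals occasionally reach this method.
    // Rather than editing this file and redeploying, they would add a Lightrun
    // dynamic log on the return line below from the IDE, with a condition such as
    //   order.total.signum() < 0
    // so the log is emitted only for the problematic requests.
    public String charge(Order order) {
        return "charged " + order.id + " for " + order.total;
    }
}
```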

Pillar 2. Snapshots

Traditional debugging methods often involve a fragmented approach: logs for raw data, metrics for system health overviews, traces for request flows across services, and the occasional breakpoint to dive deep into a specific problem. While each tool offers its distinct advantage, developers often find themselves bouncing between them, trying to piece together a comprehensive understanding of what is happening within their code. This approach can slow debugging and leave significant gaps in understanding, especially when attempting to correlate high-level data with specific code behaviors. Also, the powerful local debugging model, where the developer can set breakpoints in the application, cannot be directly translated to running live applications, as you cannot easily block them.

On the other hand, Lightrun Snapshots introduce a paradigm shift in the debugging process by acting as virtual breakpoints that don’t disrupt the flow of application execution. Unlike traditional breakpoints, which halt execution for inspection, Lightrun Snapshots seamlessly blend into the running application, allowing developers to add conditions, evaluate expressions, and delve deep into any code-level object without ever having to stop, restart, or redeploy the application. Integrated completely within the developer’s IDE, these snapshots not only offer a debugger-like experience but also enable a deeper connection to live applications by alerting developers when specific code segments are executed. This dynamic and continuous approach to debugging, compatible with a range of platforms like AWS, Azure, and Kubernetes, ensures that developers can gain deep insights into their applications right beside the source code, making debugging more intuitive and efficient.
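
For illustration only (the class, condition, and watch expression below are hypothetical), a snapshot placed on the return line, with a condition and a watch expression, would capture local variables and the stack trace on matching hits without pausing the service:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RecommendationService {

    // A virtual breakpoint (Lightrun Snapshot) could be placed on the
    // "return" line from the IDE with, for example:
    //   condition:  userSegment.equals("beta")
    //   watch:      candidates.size()
    // Each hit captures local variables and the stack trace without
    // halting the running application.
    public List<String> recommend(String userSegment, Map<String, Double> scores) {
        List<String> candidates = scores.entrySet().stream()
                .filter(e -> e.getValue() > 0.5)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        return candidates;
    }
}
```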

Pillar 3. Metrics

Traditionally, just like with logs, developers have often felt the need to preemptively add many metrics, trying to cover all bases. This scattershot approach not only clutters the telemetry data but also risks overlooking the one critical metric needed during a production issue. Lightrun, however, challenges this paradigm by offering dynamic, code-level metrics. Instead of, or in addition to, instrumenting the application with metrics upfront, Lightrun allows for the real-time insertion of precise metrics directly into live applications, ensuring relevance and accuracy without compromising the execution or state of the application.

With its comprehensive suite of tools, developers can gain insights ranging from the frequency of a specific line being executed with the Counter, to the time efficiency of methods with Method Duration and even block-wise timing with TicToc. Custom Metrics further broaden the scope, granting the freedom to export any numeric expression into a trackable metric. 
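
A hypothetical sketch of where these metrics might attach in application code (the class, names, and thresholds are illustrative; the metrics themselves are added from the IDE, not written into the source):

```java
import java.util.List;

public class CheckoutService {

    // Method Duration: measures how long applyDiscounts() takes end to end.
    public double applyDiscounts(List<Double> items, int loyaltyLevel) {
        double total = 0;

        for (double price : items) {
            // Counter: counts how many times this line is reached,
            // optionally with a condition such as  price > 100
            total += price;
        }

        // TicToc: a "tic" before this block and a "toc" after it measure
        // just the discount-calculation section, not the whole method.
        double discount = (loyaltyLevel >= 3) ? total * 0.1 : 0;
        total -= discount;

        // Custom Metric: a numeric expression such as  (long) discount
        // could be exported as a trackable value distribution.
        return total;
    }
}
```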

In Summary

With its suite of features, including dynamic logging, snapshots, and real-time metrics, Lightrun integrates seamlessly with developers’ existing IDEs, positioning itself as an essential ally in the modern development toolkit. If you’re looking to stay ahead in the competitive development space, Lightrun might just be your answer. Dive into its functionalities on the playground, or schedule a demo to experience its capabilities firsthand!

Troubleshooting Cloud Native Applications at Runtime
https://lightrun.com/troubleshooting-cloud-native-applications-at-runtime/
Wed, 18 Oct 2023

Co-Authored with Gilles Ramone (Chronosphere)

Chronosphere and Lightrun demonstrate how their combined solutions empower developers with optimized end-to-end observability  

Introduction

Organizations are moving to microservices and container-based architectures because these modern environments enable speed, efficiency, availability, and the power to innovate and scale more quickly. However, when it comes to troubleshooting distributed cloud native applications, teams face a unique set of challenges due to the dynamic and decentralized nature of these systems. To name a few:

  1. Lack of visibility: With components spread across various cloud services and environments, gaining comprehensive visibility into the entire system can be difficult. Access to production environments is generally strictly limited to ensure the safety of customer-facing systems. This makes it challenging to understand run-time anomalies and identify the root cause of issues. 
  2. Complexity: Distributed systems are inherently complex, with numerous microservices, APIs, and dependencies. Understanding how these components interact and affect one another can be daunting when troubleshooting.
  3. Challenges with container orchestration: When using serverless systems and container orchestration platforms like Kubernetes, processes can be ephemeral, making it very challenging to identify the resources related to specific users or user segments, and to capture and analyze the state of the system relevant to specific traffic.
  4. Cost of monitoring and logging: Setting up effective monitoring and logging across all components is crucial, but it is costly to aggregate and complex to correlate logs and metrics from various sources. 

Addressing these challenges requires a combination of a robust observability platform and tooling that simplifies complexity and helps developers understand the behavior of their deployed applications. 

These tools must address organizational concerns for security and data privacy. The best observability strategy will enable the ongoing “Shift left” – giving developers access to and responsibility for the quality, durability, and resilience of their code, in every environment in which it runs. Doing so will enable a more proactive approach to software maintenance and excellence throughout the Software Development Life Cycle.

Efficient troubleshooting requires not just gathering data, but making sense of that data: identifying the highest priority signals from the vast quantity and variety produced by large deployments. Chronosphere turbo-charges issue triage by collecting and then prioritizing observability data, providing a centralized observability analysis and optimization solution.

Rather than aggregating and storing all data for months, at ever increasing cost to store and access, Chronosphere pre-processes the data and optimizes it, substantially reducing cost and improving performance. 

When leveraging Chronosphere together with Lightrun, engineers are rapidly guided to the most incident-relevant observability data that helps them identify the impacted service. From there, they can connect directly from their local IDE via the Lightrun plugin to debug the live application deployment. With Chronosphere’s focus and Lightrun’s live view of the running application, developers can quickly understand system behavior, complete their investigation at minimal cost, and close the cycle of the troubleshooting process.

Chronosphere + Lightrun: A technical walk through 

Ready to see Chronosphere’s ability to separate signal from noise in a metrics-heavy application, and Lightrun’s on-demand, developer-initiated observability capabilities in action? 

To demonstrate, we're going to use a small web application that's deployed to the cloud and under load. The demo application provides simple functionality: users are presented with a list of pictures, and they can bookmark those they like most.

In this example, we’ve been alerted by Chronosphere about something amiss in our application’s behavior: it seems that some users are experiencing particularly high latency on some operations. Chronosphere pinpoints this to the “un-like” operation.

But why only some users?

The app designers are doing some A/B testing to see how users react to various configurations that may improve the site’s usability and performance. They use feature flags to randomly select subsets of users to get slightly different experiences. The percent of the audience exposed to each feature flag is determined by a config file.

Unfortunately, in our rush to roll out the feature flag controlled experiments, we neglected to include logging, so we have no information about which users are included in each of the experiment groups. 

The feature flags – possibly individually, possibly in combination – may be causing the latency that Chronosphere has identified. To know for sure, we'll need to add some logging, which means creating a new build and rolling out an update. Kubernetes would let us do this without bringing down the entire application, but that might further confuse which users are getting which feature flags, so it seems that some downtime may be our best option.

Well, that would be the situation without Lightrun.

Since we’ve deployed Lightrun’s agent, we can introduce observability on-demand, with no change to our code, no new build and no restarts required. That means that we can add new logging to the running system without access to the containers, without changing any code. 

We can safely gather the application state, just as we’d see it if we had connected a debugger, without opening any ports and without pausing the running application!

Lightrun provides remote, distributed observability directly in the interface where developers feel most at home: their existing IDE (integrated development environment). With Lightrun’s IDE plugins, adding observability on the fly is simply a matter of right-clicking in your code, choosing the pods of interest, and hitting submit.

Back to the issue at hand, we’ll use a dynamic log to get a quick feel for who is using the system. Lightrun quickly shows us that we’ve got a bunch of users actively bookmarking pictures. By using Lightrun tags, we’re able to gather information from across a distributed deployment without needing details of the running instances. 

That’s nice, but it’s still hard to tell what’s going on with a specific user who’s now complaining about the latency. We use conditional logging to reduce the noise and zoom in on that specific user’s activity. From there we can see that their requests are being received, but we still need to answer the question: what’s going on?
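
As an illustrative sketch (the handler, class, and user ID below are hypothetical, not the demo application's actual source), the conditional log targets the "un-like" handler with a user-specific condition:

```java
public class BookmarkController {

    private final BookmarkStore bookmarkStore = new BookmarkStore();

    // The "un-like" handler under investigation. A Lightrun dynamic log
    // added here from the IDE with a condition such as
    //   userId.equals("user-4711")
    // and a message like "un-like for {pictureId} by {userId}"
    // surfaces only that user's requests, without touching this code.
    public void removeBookmark(String userId, String pictureId) {
        bookmarkStore.remove(userId, pictureId);
    }

    static class BookmarkStore {
        void remove(String userId, String pictureId) { /* persistence call */ }
    }
}
```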

What we really want is a full picture, including: 

  • This user’s feature flags
  • The list of items they’ve bookmarked
  • And anything else from the environment that could be relevant. 

Enter Lightrun snapshots – virtual breakpoints that show us the state without causing any interruption in service. 

Creating a snapshot is just as easy as adding a log – we choose the tags that represent our deployment, add any conditions so that we’ll just get the user we’re interested in – regardless of which pod is serving that user at the moment. And there we have it, all of the session state affecting that user’s interaction with the application.

With this information we can see that one of our feature flags is to blame – it looks like it's only partially implemented. It's a good thing that only a small percentage of our audience is getting this one! Oops.

Before we roll out a fix, let’s get an idea of how many users are being affected by each of our feature flags. We can use Lightrun’s on-demand metrics to add counters to measure how often each block within our code is being reached. And we can add tic-tocs to measure the latency impact of this code, just in case our experimentation is also slowing down the site’s responsiveness. 
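
Hypothetically, the feature-flag branches might look like the code below (the flag and class names are made up); a counter on each branch and a tic-toc around the experimental block would quantify exposure and latency impact:

```java
public class FeedRenderer {

    public String render(String userId, FeatureFlags flags) {
        // Counter #1: how many requests take the experimental path.
        if (flags.isEnabled("new-unlike-flow", userId)) {
            // TicToc around this block measures the latency the
            // experiment adds, independent of the rest of the method.
            return renderExperimentalFeed(userId);
        }
        // Counter #2: how many requests take the control path.
        return renderDefaultFeed(userId);
    }

    private String renderExperimentalFeed(String userId) { return "experimental:" + userId; }
    private String renderDefaultFeed(String userId) { return "default:" + userId; }

    interface FeatureFlags {
        boolean isEnabled(String flag, String userId);
    }
}
```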

Watch below the full troubleshooting workflow, carried out through both the Chronosphere and Lightrun observability platforms.

The Chronosphere and Lightrun Combined Solution

It's imperative to have all observability data flowing into a cloud native observability platform like Chronosphere, which helps alert us to the needle in the haystack of all the telemetry our distributed applications are producing. And with Lightrun, developers are able to query the state of the live system right in their IDE, where they can dynamically generate additional telemetry to send to Chronosphere for end-to-end analysis.

By using these solutions together, we leverage the unique capabilities provided by each. The result is full cloud native observability: understanding what’s going on in our code, right now, wherever it is deployed, at cloud native scale. Zooming in on the details that matter despite the complexity of the code and the deployment. Combining new, on-demand logs and metrics with those which are always produced by our code – for control, cost management, and automatic outlier alerting.

With developer-native, full-cycle observability, these powerful tools are supporting rapid issue triage, analysis, and resolution. This is essential to organizations realizing maximum observability benefits while maintaining control over their cloud native costs.

Feel free to contact us with any inquiries or to arrange an assessment.

Lightrun Empowers Developers with Next Generation Metric Tools for Java Performance Troubleshooting
https://lightrun.com/lightrun-empowers-developers-with-next-generation-metric-tools-for-java-performance-troubleshooting/
Sun, 30 Jul 2023

Introduction

When it comes to debugging performance-related issues, the range of these issues, together with their root causes, can be overwhelming for developers.

Resolving performance issues is a challenging task due to the multitude of potential factors that can contribute to their occurrence. These factors range from inefficient code or architecture that lacks scalability, to specific infrastructure problems related to hardware and storage. Additionally, reproducing performance issues in a local environment that mimics the production setup can be difficult, as well as identifying and addressing these issues for specific sets of users. Developers often encounter these common challenges when attempting to resolve performance problems. Furthermore, developers may also face a lack of expertise in utilizing Application Performance Monitoring (APM) tools, which, in any case, may not offer code-level insights and actionable information.

Developers troubleshooting performance issues often want to answer code-specific questions such as:

  • How many times a particular line of code is executed
  • Whether a specific line of code is reached
  • The execution time of a method
  • The execution time of a code block

Pinpointing the exact lines of code responsible for downtime or performance problems can be a challenging and intimidating task. Moreover, identifying performance anomalies within a specific area of the product becomes exceedingly difficult without access to comprehensive insights and meaningful correlations between metrics obtained from various sources, such as the database, CPU usage, network latency, and so on.

Marketplace Gap

While the APM and profiling marketplace today is extremely advanced and offers a range of tools and solutions for monitoring and alerting developers and Ops when a service degrades, these tools do not operate within the source code or the IDE itself, and they require developers to be well versed in those tools and domains instead of focusing on the code areas that are causing the issues. The ability to combine APM tools and dashboards with a developer-native observability solution is the perfect bridge over this gap and the challenges mentioned above. That was the rationale behind launching the advanced Lightrun Metrics for performance observability and debugging.

Solution Overview

To address the gaps and challenges mentioned above, Lightrun Metrics focuses on four key use cases and shifts performance observability left.

It does so by providing developers with four key metrics that are collected and consumed within the IntelliJ IDE for Java, and that can also be piped to the dashboards of leading APM tools.

4-type Metrics Collection

The Lightrun Metrics dedicated tab within the IDE plugin consists of the following:

Counters (coming soon), TicToc, Method Duration, and Custom Metrics. Below are some details on what each of these provides.

  • Counters

A Lightrun Counter is added to a single line of code. It counts the number of times that line is reached. You can add a counter to as many lines of code as you need. From the Lightrun IDE plugin, you can specify conditions (as Boolean expressions) that determine when the line execution is recorded.

  • TicToc (Block Duration)

The Lightrun Tic & Toc (TicToc) metric measures the elapsed time for executing a specified block of code within the same function.

  •  Method Duration

The Method Duration metric measures the elapsed time for executing a given method.

  •   Custom Metric

Lightrun Metrics also enables developers to design their own custom metrics, using conditional expressions that evaluate to a numeric result of type long. Custom metrics are all about value distribution and correctness within a given Java application; they allow developers to turn a specific variable's value into statistics and detect anomalies or other distribution trends. Custom metrics can be created using the configuration form in the Lightrun IDE plugin (see the sketch below for where each metric type could attach in code).
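
To make the four metric types concrete, here is a hypothetical Java method (illustrative only; the class, values, and expressions are assumptions) annotated with where each metric could be attached from the IntelliJ plugin:

```java
import java.util.List;

public class InvoiceProcessor {

    // Method Duration: measures the elapsed time of process() as a whole.
    public long process(List<Long> amounts, String region) {
        long total = 0;

        for (long amount : amounts) {
            // Counter: counts how often this line is reached; a Boolean
            // condition such as  amount > 10_000  records only large invoices.
            total += amount;
        }

        // TicToc: a tic before this block and a toc after it time just the
        // rounding step inside the same method.
        long rounded = Math.round(total / 100.0) * 100;

        // Custom Metric: a conditional expression evaluating to a long,
        // e.g.  region.equals("EU") ? rounded : 0 , exported as a
        // trackable value distribution.
        return rounded;
    }
}
```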

Example/How To Use

To get started with Lightrun Metrics, developers need to create a Lightrun account and install the dedicated IntelliJ plugin. Once setup is complete, developers can start creating metrics across the four types highlighted above. Here are examples of how to create and interpret them.

Note that you can apply Lightrun actions to a single application instance or to several, depending on the number of agents attached to those instances, and collect metrics and averages across them.

TicToc

To add a TicToc and measure the execution duration of a block of code, go to the Lightrun plugin within the IntelliJ IDE and right-click the line of code to add the action.

As you can see in the screenshot above, developers gain runtime, code-level execution performance data in the TicToc graph and can examine averages across deployments and over time.

Custom Metric

To build a custom expression around the code under investigation, go to the Lightrun plugin within the IntelliJ IDE and right-click the line of code to add the action.

Visualizing the action output within third-party tools can be done through our various integrations, by setting the target output within the IDE plugin user interface.

Below is a complete demo video that shows the entire solution and the different metric options used in a Java application that’s running with 20 agents attached.

Bottom Line

Shifting developer observability left and adopting an observability-driven development workflow becomes much easier with such tools and capabilities. When developers are equipped with tools that fit their skill set and operate from within their native environments, they are much more productive and can resolve the issues at hand much faster. They can analyze performance issues at the code level and reduce the overall MTTR for such issues, without changing the application state, since these actions are all added and consumed at runtime.

Get Started with Lightrun Metrics!

Maximizing CI/CD Pipeline Efficiency: How to Optimize your Production Pipeline Debugging?
https://lightrun.com/maximizing-ci-cd-pipeline-efficiency-how-to-optimize-your-production-pipeline-debugging/
Mon, 15 May 2023

Introduction

There was a time when a developer would spend a few months building a new feature and then go through the tedious, soul-crushing effort of "integration": merging their changes into an upstream code repository that had inevitably changed since they started their work. Integration would often introduce bugs and, in some cases, might even prove impossible or irrelevant, leading to months of lost work.

Hence Continuous Integration and Continuous Deployment (CI/CD) were born, enabling teams to build and deploy software at a much faster pace and allowing for extremely rapid release cycles.

Continuous Integration aims to keep team members in sync through automated testing, validation, and immediate feedback. Done correctly, it will instill confidence in knowing that the code shipped adheres to the standards required to be production ready. 

However, despite the many benefits derived from a CI/CD pipeline, it has evolved into a complex puzzle of moving parts and steps for organizations, and problems occur frequently.

Usually, errors in the pipeline are only discovered after the fact. There are N pieces of the puzzle that could fail, and even if you can resolve some of these issues by piping your logs to a centralized logging service and tracing from there, you are not able to replay the issue.

You may argue for the case of static debugging. In this process, one usually traces an error via a stack trace or exception and then makes calculated guesses about where the issue may have occurred.

This is usually followed by code changes and local testing to simulate the issue, then by deploying the code and going through a vicious cat-and-mouse cycle to identify the problem.

Issues with CI/CD Pipelines and Debugging 

Let's break down some fundamental issues plaguing most CI/CD pipelines. CI/CD builds and production deployments rely on testing and performance criteria. Functional and validation testing can be automated, but doing so is challenging due to the scope of the different scenarios in place.

Identifying the root cause of the issue

It can be challenging to determine the exact cause of a failure within a CI/CD pipeline. Debugging complex pipelines consisting of many stages and interdependent processes makes it difficult to comprehend what went wrong and how to fix it.

At its core, a lack of observability and limited access to logs or relevant information can make it challenging to diagnose issues; at other times, the inverse is true, and excessive logging and saturation cause tunnel vision.

Another contributing factor is low code coverage: edge-case scenarios that could potentially break your pipeline will be hard to discover. For those who work in a monorepo environment, issues are exacerbated where shared dependencies and configurations originate from multiple teams, and developers who push code without verification can cause a dependency elsewhere to break the build-and-deploy pipeline.

How to Optimize your CI/CD Pipeline?

There will be times when you believe you've done everything correctly, but something still goes wrong. Here are a few practices that can help:

  • Your pipeline should have a structured review process. 
  • You need to ensure the pipeline supports automated tests. 
  • Parallelization should be part of your design, with caching of artifacts where applicable. 
  • The pipeline should be built so it fails fast, with a feedback loop. 
  • Monitoring should be by design. 
  • Keep builds and tests lean. 

All these tips won’t help much if you don’t have a way to observe your pipeline.

Why Should Your CI/CD Pipeline be Observable?

A consistent, reliable pipeline helps teams to integrate and deploy changes more frequently. If any part of this pipeline fails, the ability to release new features and bug fixes grinds to a halt.

An observed pipeline helps you stay on top of any problems or risks to your CI/CD pipeline. 

An observed pipeline provides developers with visibility into their tests, and they will finally know whether the build process they triggered was successful or not. If it fails, the "where did it fail?" question is answered immediately.

Not knowing what's going on in the overall CI/CD process, or lacking the end-to-end visibility to see how it's progressing and performing, is no longer a concern.

Tracing issues via different interconnected services and understanding the processing they undergo end to end can be difficult, especially when reproducing the same problem in the same environment is complex or restrictive.

DevOps engineers and developers usually try to reproduce issues in their local environment to understand the root cause, which brings the additional complexities of local replication.

Example CI/CD Pipeline Architecture

To put things into context — let’s work through an example of a typical production CI/CD pipeline.



CI/CD pipeline with GitHub Actions, AWS CDK, and AWS Elastic Beanstalk



The CODE

The CI/CD pipeline starts with the source code on GitHub, using GitHub Actions to trigger the pipeline. GitHub provides ways to version code but does not track the impact of the commits developers push to the repository. For example:

  • What if a certain change introduces a bug? 
  • What if a branch merge resulted in a successful build but failed deployment? 
  • What if the deployment was successful, then a user received an exception, and it’s already live in production?

BUILD Process

The build process, with test cases for code coverage, is a critical point of failure for most deployments. If a build fails, the team needs to be notified immediately in order to identify and resolve the problem quickly. You may say there are options, such as alerts to your Slack channel or email notifications that can be configured.

Those additional triggers can alert you, but they do not provide the ability to trace and debug the issue in a timely manner, as one still needs to dig into the code. The failure may also be due to more elusive problems, such as missing dependencies.

Unit & Integration TESTS

It's not enough to know that your build was successful. It also has to pass tests to ensure changes don't introduce bugs or regressions. Frameworks such as JUnit, NUnit, and pytest generate test result reports, but these reports list which cases failed, not how they failed.
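For example, a minimal pytest test (a hypothetical test, shown only for illustration) produces a pass/fail entry in a JUnit-style report when run with pytest --junitxml=report.xml, but the report says nothing about the runtime state that led to a failure:

# test_pricing.py - a hypothetical test used only to illustrate report output
def apply_discount(price: float, pct: float) -> float:
    return price * (1 - pct / 100)

def test_apply_discount():
    # If this assertion fails, the JUnit-style report records the failure message,
    # but not the surrounding runtime state that produced the wrong value.
    assert apply_discount(100, 15) == 85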

Deploy Application

Most pipelines have pivoted to infrastructure as code, where code dictates how infrastructure provisioning is done. In our example, AWS CDK lets you manage the infrastructure as code using Python. While this empowers developers, it adds yet another layer of code that becomes hard to debug.
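As a rough sketch of what that extra layer of code looks like (assuming AWS CDK v2; the stack and application names below are illustrative, not taken from the example pipeline):

# app.py - a minimal CDK v2 sketch; names are illustrative
import aws_cdk as cdk
from aws_cdk import aws_elasticbeanstalk as eb
from constructs import Construct

class PipelineAppStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Declare the Elastic Beanstalk application that the pipeline deploys to.
        eb.CfnApplication(self, "WebApp", application_name="pipeline-demo-app")

app = cdk.App()
PipelineAppStack(app, "PipelineAppStack")
app.synth()

Every line of this provisioning code is another place where the deployment can fail.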

Post-Deploy Health Checks

Most deployments have an extra step to verify health, as illustrated in our pipeline. Such checks may include Redis health and database health. Since these checks are driven by code, we have yet another opportunity for failure that may hinder our success metric.
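A post-deploy health-check step might look something like the sketch below; the endpoint URLs and the use of the requests library are assumptions made for illustration, and the real checks depend on your stack:

import sys
import requests  # assumed to be available in the pipeline image

# Hypothetical post-deploy checks; the endpoints are illustrative.
CHECKS = {
    "app":   "https://example.com/health",
    "redis": "https://example.com/health/redis",
    "db":    "https://example.com/health/db",
}

def run_checks() -> bool:
    ok = True
    for name, url in CHECKS.items():
        try:
            resp = requests.get(url, timeout=5)
            healthy = resp.status_code == 200
            print(f"{name}: {'ok' if healthy else 'unhealthy (' + str(resp.status_code) + ')'}")
        except requests.RequestException as exc:
            print(f"{name}: request failed ({exc})")
            healthy = False
        ok = ok and healthy
    return ok

if __name__ == "__main__":
    # A non-zero exit code fails the pipeline step.
    sys.exit(0 if run_checks() else 1)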

Visually Illustrating Points of Failure in the CI/CD Pipeline

The diagram below illustrates the places that can potentially go wrong, i.e., the points of failure. This gets exponentially more complex depending on how your CI/CD pipeline has been built.




Points of failure in our example CI/CD pipeline

Dynamic Debugging and Logging to Remedy your Pipeline Ailments 

Let's take a look at how we can quickly figure out what is going on in our complex pipeline. A new approach is to shift observability left, that is, to incorporate observability into the early stages of the software development lifecycle, by applying a Lightrun CI/CD pipeline observability pattern.

Lightrun takes a developer-native, observability-first approach. With the platform, we can begin strategically adding agent libraries to each component in our CI/CD pipeline, as illustrated below.


Lightrun CI/CD pipeline pattern

Each agent will be able to observe and introspect your code as part of the runtime, allowing you to hook into your pipeline directly from your IDE via a Lightrun plugin or the CLI.

This will then allow you to add virtual breakpoints with logging expressions of your choosing to your code in real time, directly from your IDE; in essence, remote debugging and remote logging just as you would do in your local environment, but linked directly into production.

Virtual breakpoints are non-intrusive and capture the application's context, such as variables and the stack trace, when they are hit. This means no interruptions to the code executing in the pipeline, and no further redeployments are required to optimize your pipeline.

Lightrun agents can be baked into Docker images as part of the build cycles. This pattern can be further extended by creating a base Docker image with your unified Lightrun configuration that is inherited by all microservices as part of the build, forming a chain of agents for tracking.

Log placement in parts of the test and deploy pipeline, paired with real-time alerting when log points are reached, can minimize troubleshooting challenges without redeployments.

For parts of the code that do not have enough coverage, all we need to do is add a Lightrun counter metric to bridge the gap and form a coverage tree of dependencies, which assists in tracing and scoping what has been executed and how often.

Additional metrics are available via the Tic & Toc metric, which measures the elapsed time between two selected lines of code within a function or method, for measuring performance.

Custom metrics can further be added using custom parameters with simple or complex expressions that return a long integer result.

Log output will immediately be available for analysis via either your IDE or Lightrun Management Portal. By eliminating arduous and time-consuming CI/CD cycles, developers can quickly drill down into their application’s state anywhere in the code to determine the root cause of errors.

How to Inject Agents into your CI/CD pipeline?

Below we will illustrate the process using Python. You're free to replicate the same with other supported languages.

  1. Install the Lightrun plugin.
  2. Authenticate your IDE (PyCharm) with your Lightrun account.
  3. Install the Python agent by running:
pip install lightrun
  4. Add the following code to the beginning of your entrypoint function:
import os

LIGHTRUN_KEY = os.environ.get('YOUR_LIGHTRUN_KEY')
LIGHTRUN_SERVER = os.environ.get('YOUR_LIGHTRUN_SERVER_URL')

def import_lightrun():
    try:
        import lightrun
        lightrun.enable(com_lightrun_server=LIGHTRUN_SERVER, company_key=LIGHTRUN_KEY, lightrun_wait_for_init=True, lightrun_init_wait_time_ms=10000, metadata_registration_tags='[{"name": "<app-name>"}]')
    except ImportError as e:
        print("Error importing Lightrun: ", e)

As part of the enable function call, you can specify lightrun_wait_for_init=True and lightrun_init_wait_time_ms=10000 in the Python agent configuration.

These two configuration parameters ensure that the Lightrun agent starts up fast enough to work within short-running service functions, applying a wait time of about 10000 milliseconds before fetching Lightrun actions from the management portal. Note that these are optional parameters that can be omitted when they don't make sense, for example for long-lived code execution cycles such as a Django project or FastAPI microservice applications. If you are using another language, such as Java, the same principles apply.

Once your agent is configured, a call to the import_lightrun() function in the __init__.py of your pipeline code ensures the agent is invoked when the pipeline starts.
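For example, assuming the snippet above lives in a module named lightrun_setup.py (the module name is illustrative):

# __init__.py of the pipeline package (placement is illustrative)
from .lightrun_setup import import_lightrun

# Invoke the agent as soon as the package is imported, i.e. when the pipeline starts.
import_lightrun()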

Deploy your code, and open your IDE with access to all your code, including your deployment code. 

Select the lines of code you wish to trace and open up the Lightrun terminal and console output window shipped with the agent plugin. 

Adding logging to live executing code with Lightrun directly from your IDE

 

Achieving Unified output to your favorite centralized logging service

If you wish to pipe out logs instead of using the IDE, you can tap into third-party integrations to consolidate the CI/CD pipeline output, as illustrated below.

If you notice an unusual event, you can drill down to the relevant log messages to determine the root cause of the problem and begin planning for a permanent fix in the next triggered deploy cycle.

Validation of CI/CD Pipeline Code State

One of the benefits of an observed pipeline is that we can fix pipeline versioning issues. Without correct tagging, how do you know your builds contain the expected commits? It gets hard to tell the difference without QA effort.

By adding dynamic log entries at strategic points in the code, we can validate new features and committed code introduced into the platform by examining the dynamic log output before it reaches production.

This becomes very practical if you work in an environment with a lot of guard rails and security lockdowns on production servers, and you no longer have to contend with incomplete local replications.

Final thoughts 

A shift-left observability approach to CI/CD pipeline optimization can improve your MTTR, the average time it takes to recover from production pipeline failures, which otherwise have a high impact when critical bugs are deployed to production.

You can start using Lightrun today, or request a demo to learn more.

A Comprehensive Guide to Troubleshooting Celery Tasks with Lightrun https://lightrun.com/a-comprehensive-guide-to-troubleshooting-celery-tasks-with-lightrun/ Tue, 09 May 2023 13:42:41 +0000 https://lightrun.com/?p=11571 This article explores the challenges associated with debugging Celery applications and demonstrates how Lightrun’s non-breaking debugging mechanisms simplify the process by enabling real-time debugging in production without changing a single line of code. Celery: Powerful but Challenging Celery is not only a powerful, but also a widely adopted distributed task queue that allows developers to […]

This article explores the challenges associated with debugging Celery applications and demonstrates how Lightrun’s non-breaking debugging mechanisms simplify the process by enabling real-time debugging in production without changing a single line of code.

Celery: Powerful but Challenging

Celery is not only powerful but also a widely adopted distributed task queue that allows developers to effectively manage and schedule tasks asynchronously. As evident from its GitHub repository, which shows 96k open-source projects utilizing it, Celery has become a go-to tool for Python developers, including those who work with popular frameworks such as Django, FastAPI, and Flask.
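As a quick refresher, here is a minimal, self-contained sketch of what Celery usage looks like; the Redis broker URL and the task are illustrative, not taken from the example application below:

# tasks.py - a minimal Celery sketch
from celery import Celery

app = Celery("shop", broker="redis://localhost:6379/0", backend="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y

# add.delay(2, 3) enqueues the task and returns an AsyncResult immediately;
# a worker started with "celery -A tasks worker" executes it asynchronously.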

Moreover, while Celery's popularity is a testament to its usefulness, it also means that developers need to be prepared to deal with a high volume of issues. As of now, there are over 7.5k issues on Celery's GitHub repository (both open and closed). While a larger number of issues can be indicative of a tool's popularity, the complexity of Celery's functionality also plays a role.

That being said, the complexity of Celery's functionality means that debugging can still be a challenge, even for experienced developers. Without the right debugging tools or approach, identifying the source of an issue can be a time-consuming and frustrating process. It's not uncommon for developers to spend hours reading through the documentation or manually debugging their code.

The complexity can be particularly daunting when developers need to make changes to their code, deploy it to multiple environments, test it thoroughly, and then push it to production.

Fortunately, Lightrun, the cloud-native observability platform, makes debugging Celery applications more accessible and efficient. Its non-breaking debugging mechanisms allow developers to debug Celery applications in real-time, even in production, without the need to modify the codebase.

This is what we are going to examine in this post. Read on to discover how.

The Code Used in this Example

We are going to start with an application that allows users to book online products and services based on their availability. When handling high-traffic web applications that involve booking services or products, Celery can play a crucial role in ensuring that the application is scalable and efficient. This is why we are going to use Celery mainly for the transactional part.

You can find the source here (celery branch).

These are the database schemas we will be using: a product table, a table for transactions, and a table for orders.


import uuid

from django.conf import settings
from django.core.mail import send_mail
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.urls import reverse


class Product(models.Model):
    product_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=100)
    description = models.TextField(default="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.")
    price = models.DecimalField(max_digits=10, decimal_places=2, default=10.00)
    stock_quantity = models.PositiveIntegerField(default=10)

    def get_absolute_url(self):
        return reverse("shop:product_detail", kwargs={"product_id": self.product_id})

class Transaction(models.Model): 
    transaction_id = models.UUIDField(primary_key=True, editable=False, blank=False, default=uuid.uuid4)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)    
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, blank=True, null=True)

class Order(models.Model):
    order_id = models.UUIDField(primary_key=True, editable=False, blank=False, default=uuid.uuid4)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, blank=True, null=True)
    transaction = models.OneToOneField(Transaction, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)

    def get_absolute_url(self):
        return reverse("shop:order_detail", kwargs={"transaction_id": self.transaction.transaction_id})
    
    def get_full_url(self):
        return "{}{}".format(settings.FULL_SITE_URL, self.get_absolute_url())

@receiver(post_save, sender=Transaction)
def create_order(sender, instance, created, **kwargs):
    if created:
        order = Order.objects.create(transaction=instance, product=instance.product, user=instance.user)
        sender = "admin@example.com"
        receiver = instance.user.email
        subject = "Order confirmation"
        message = f"Order for product {instance.product.name} has been confirmed. Your can view your order at {order.get_full_url()}"
        send_mail(subject, message, sender, [receiver], fail_silently=False)

We also have 4 views:

def product_detail(request, product_id):
    product = Product.objects.get(product_id=product_id)    
    return render(request, "shop/product_detail.html", {"product": product,})

def process_transaction(request, product_id):
    transaction_id = process_transaction_task.delay(product_id, request.user.id)
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

def transaction_detail(request, transaction_id, product_id):              
    message = """
    Order successful.<br>
    You will receive an email with your order details.<br>
    Transaction ID: {}<br>
    Product ID: {}<br>
    """.format(transaction_id, product_id)            
    return render(request, "shop/transaction_detail.html", {"message": message})
        
def order_detail(request, transaction_id):
    order = Order.objects.get(transaction__transaction_id=transaction_id)
    return render(request, "shop/order_detail.html", {"order": order,})

The process_transaction view should handle the payment using a Celery task process_transaction_task.

This is the task:

@task(name="myapp.tasks.process_transaction_task")
def process_transaction_task(product_id, user_id):
    # payment processing simulation
    time.sleep(2)
    product = Product.objects.get(product_id=product_id)
    user = User.objects.get(id=user_id)
    transaction = Transaction.objects.create(product=product, user=user)
    transaction_id = transaction.transaction_id
    product.stock_quantity -= 1
    product.save()    
    return transaction_id

Bug Hunting, the Tedious Way

After dockerizing, building, and deploying the application to a Kubernetes cluster, everything appears to be running smoothly until a customer reports an issue with their transaction details. Specifically, the customer claims that the transaction ID displayed on the post-checkout page is different from the one received in their email.

This is the id that shows on the post-checkout page (4cb73430-17e5-40fc-a153-d0c820230115)

While the one that is sent to the user by email is b9454d45-6b7c-462a-8aba-14829927eef4

Such problems are deadly for the coherence and integrity of data in this use case. So how did the transaction ID discrepancy occur in the first place? One possibility is that there was a bug in the code that caused the transaction ID to be generated incorrectly. Alternatively, there could have been an issue with the email-sending process that caused the incorrect ID to be included in the message.

In a standard development process, developers initiate by examining the code locally, identifying and resolving any existing issues. After rectifying these concerns, they redeploy the code to a testing environment. An automated test or a CI/CD pipeline then assesses the code to confirm that the implemented fixes do not negatively impact other functionalities. Once the code undergoes comprehensive testing and evaluation, it is deployed to production. This procedure might be iterative if the initial attempt at fixing the code is unsuccessful, or when developers require additional logs and traces to better comprehend issues occurring in the production environment.

Bug Hunting, The Lightrun Way

Using Lightrun, the debugging phase will only take a few seconds.

Adding the following code to your celery.py file (where Celery is initialized) is the first step required before starting. You also need to create a Lightrun account to get your key.

from celery.signals import task_prerun
import os

@task_prerun.connect()
def task_prerun(**kwargs):
    """
    This function is called before each task is executed. It enables Lightrun to track the task.
    """
    try:
        import lightrun
        lightrun.enable(
            company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
            metadata_registration_tags='[{"name": "dev"}]'
        )
    except ImportError as e:
        print("Error importing Lightrun: ", e)

If you are using VSCode, start by installing the Lightrun extension. Lightrun currently supports IntelliJ IDEA, PyCharm, WebStorm, Visual Studio Code (VSCode), VSCode for the web (vscode.dev), and code-server.

Now, let’s go back to the task code:

@task(name="myapp.tasks.process_transaction_task")
def process_transaction_task(product_id, user_id):
    # payment processing simulation
    time.sleep(2)
    product = Product.objects.get(product_id=product_id)
    user = User.objects.get(id=user_id)
    transaction = Transaction.objects.create(product=product, user=user)
    transaction_id = transaction.transaction_id
    product.stock_quantity -= 1
    product.save()
    return transaction_id

Right-click on the last line, click on “Lightrun”, then choose “Insert a Snapshot” from the VSCode menu.

A snapshot is a one-time “breakpoint” that doesn't block the Celery task from running; as opposed to a traditional breakpoint, snapshots collect the stack trace and variables without interrupting the task at all. By replicating the steps taken by your users (in this case, a straightforward checkout), the Lightrun VSCode extension begins capturing the task's stack trace from the environment configured earlier through lightrun.enable.

For example, in our development environment, we are using:

lightrun.enable(
    company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
    metadata_registration_tags='[{"name": "dev"}]'
)

In our production environment, we can use:

lightrun.enable(
    company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
    metadata_registration_tags='[{"name": "prod"}]'
)

You will find the different registration tags you manage on the same Lightrun panel. This is how it shows on VSCode:

Back to the snapshot we captured which is available on the “Snapshot” tab in the same panel:

By clicking on the snapshot, you will be able to access the stack trace of the Celery job, including the function itself, the Celery call, the worker, and so on.

You can start by inspecting the first call; if it is not helpful, move to the second, and so on. For example, after accessing trace_task, which is the second trace, we can see the request dictionary processed by the task.

Surprisingly, the correlation_id has the same value as the transaction ID.

By definition, in the Celery protocol, correlation_id is the task UUID. This means that transaction_id is capturing the value of the Celery task id instead of the real transaction id.

def process_transaction(request, product_id):
>>  transaction_id = process_transaction_task.delay(product_id, request.user.id)
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

This is how, by navigating through the stack trace, we were able to understand that there was a bug: the transaction ID was getting the wrong value.
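The underlying reason is that .delay() returns an AsyncResult rather than the task's return value, so rendering it into the redirect URL produces the task UUID. A minimal illustration, assuming a configured Celery app and the task shown earlier:

# .delay() enqueues the task and returns an AsyncResult immediately.
async_result = process_transaction_task.delay(product_id, user_id)

print(async_result.id)     # the Celery task UUID - what leaked onto the post-checkout page
print(async_result.get())  # blocks until the task finishes, then returns the real transaction_id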

A fix here should be similar to the following:

def process_transaction(request, product_id):
    transaction_id = process_transaction_task.delay(product_id, request.user.id)
    transaction_id.wait()
    transaction_id = transaction_id.result
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

Conditional Filtering – More Accurate Debugging

In specific scenarios, it becomes necessary to filter snapshots based on criteria such as user, product, or other objects or values. This can be achieved by incorporating a conditional statement while applying the same logic:

In the given example above, we capture traces exclusively for the user with an ID of 1. However, the condition could be different, such as the product ID. The range of applicable conditions depends on your specific use case. Ultimately, this feature enables more precise debugging experiences.

Instant Access to Observability

What sets Lightrun apart is its ability to allow developers to debug their code in production, without the need to add a single line of code to their codebase. This means that developers can diagnose and fix issues in real-time, as they arise, without having to go through the traditional debugging process of deploying, testing, and re-deploying their code.

Lightrun achieves this by using “non-breaking breakpoints” that allow developers to inspect and modify the state of their running application, without interrupting its execution. This means that developers can gain full visibility into the execution of their Celery tasks, as well as the values of variables and functions, in real-time.

Using Lightrun to debug Celery applications is a game-changer for software engineering teams, as it saves time, effort, and resources. As a result, software teams, including development, operations, and observability teams, can achieve a quicker time to market, enhanced user experiences, and increased overall productivity.

Using Lightrun with Celery is straightforward, and it can be integrated seamlessly into any Celery-based application. With just a few clicks, we got immediate feedback on an ambiguous issue that arose in production, without the need to redeploy new code.

What’s next?

You can start using Lightrun today or request a demo to learn more. Alternatively, take a look at our Playground where you can play around with Lightrun in a real, live app without any configuration required.

7 Things You Need to Know About Github’s Sponsors-Only Repositories https://lightrun.com/7-things-you-need-to-know-about-githubs-sponsors-only-repositories/ Tue, 14 Feb 2023 11:38:11 +0000 https://lightrun.com/?p=10275 Open-source software is driving some of the most exciting innovations today. According to The Linux Foundation, open-source constitutes about 70 to 90% of all modern software solutions. But it isn’t all fun and games: open-source software is free, which brings about operational inefficiencies due to a lack of financial support for their developers.  Platforms like […]

Open-source software is driving some of the most exciting innovations today. According to The Linux Foundation, open-source constitutes about 70 to 90% of all modern software solutions. But it isn't all fun and games: open-source software is free, which brings about operational inefficiencies due to a lack of financial support for its developers.

Platforms like Twitch and Substack started incentivizing paid subscription models to solve this problem. GitHub followed suit with the launch of sponsors-only repositories to support its developers and incentivize open-source investment. So what are these sponsors-only repositories all about?

This article discusses how sponsorship works in GitHub, its latest updates, and everything you need to know about GitHub’s sponsors-only repositories.

GitHub Sponsors

What are sponsor-only repositories? 

Launched in February 2022, the GitHub sponsor-only repositories are private repositories that only sponsors of the open-source project can access. The idea of the sponsor-only repositories augments the already functioning GitHub sponsorship feature that allows anybody to contribute financially to open-source developers and projects.

How sponsorships work in GitHub

GitHub launched GitHub sponsors in 2019, enabling developers and organizations to receive financial compensation for their open-source contributions.

With sponsorships, anyone (individuals and organizations) can fund open-source projects to help improve the software’s performance and reliability. In addition to supporting the projects, sponsorships incentivize more people to contribute to open source. More people can afford to contribute and make a lucrative career from open source through this funding model.

GitHub Sponsors Community

Sponsorship payments work like subscriptions: they are divided into multiple tiers, which grant sponsors different access and benefits. These tiers can be either a one-time or a monthly recurring payment. Below are some examples of projects that have adopted this sponsorship model:

  • The curl project has about 10 billion installations, 800 community contributors, and eight maintainers.
  • The OpenSSL project enables over 1.3 billion websites worldwide, with 600 community contributors and 18 maintainers.

To learn more about GitHub sponsorships, check out the GitHub documentation.

GitHub’s latest sponsorship updates

In the culture of improving existing technologies or tools, GitHub has released updates to its sponsorships feature. Below are some of the new updates and the problems they solve.

Setting minimum custom sponsorship amounts.

GitHub added support for custom sponsorship amounts. Now you can add a minimum amount and recommend an amount sponsors can contribute to aid the development of an open-source project.

Custom Sponsorship Amounts

Enabling developers to write custom welcome messages for every sponsorship tier.

This new update allows you to create custom messages for sponsors after subscribing to a sponsorship tier. To write these messages, go to the “Sponsors tiers” page and edit your chosen one. 

New call-to-action to sponsor-enabled repositories to give more visibility to the program.

GitHub has added a new call-to-action to inform people that you have enabled GitHub sponsors.

Transaction metadata 

You can now add metadata to the URLs of your sponsor pages so you can observe what attracts new sponsors to you. To understand more about the transaction metadata, check out its documentation.

7 things you need to know about GitHub’s sponsors-only repositories

Sponsor-only repositories offer a great deal of customizability for both developers and sponsors. Let’s walk through some of the most critical features of this new model. 

1. Linking to a sponsorship tier

GitHub sponsors allow developers or organizations to add up to 10 one-time and 10 monthly tiers. Sponsors can then choose the tier they prefer. Developers can add perks and rewards to different levels once a sponsor subscribes. According to GitHub, examples of bonuses you can add are:

  • Early access to new versions
  • Logo or name in README
  • Weekly newsletter updates

The maximum amount a developer or an organization can charge for a tier is US$12,000 per month. Check out the GitHub documentation to learn more about sponsorship tiers and how to add them.

Adding repositories to a sponsorship tier

Regarding perks and rewards, you can add private repositories to different sponsorship tiers. Adding repositories to sponsorship tiers restricts access to sponsors that have subscribed to these tiers. It also rescinds access once a sponsor cancels their subscription.

Adding repositories to a sponsorship tier

2. Granting different levels of access

Alongside adding different repositories to different sponsorship tiers, GitHub Sponsors lets you grant different levels of access depending on the tier a sponsor has subscribed to. For example, you might offer a basic level that includes access to your project's documentation and a higher level that provides access to a private support channel or a monthly video call with you.

3. Managing invites

Once you add a private repository and its access to a tier, GitHub automatically sends repository invitations to new sponsors. However, organizations cannot be invited to private repositories associated with a sponsorship tier; only personal accounts can.

4. Providing exclusive access to a private repository

A sponsors-only repository is a feature that enables developers to share their work with their sponsors only. These repositories are private by default, and a person can request access to these repositories by becoming a sponsor. This exclusivity is the biggest perk of sponsors-only repositories.

Inviting members

 

5. Engaging in private discussions

GitHub Sponsors allows developers to host private discussions about their open-source projects with their sponsors. Through this platform, sponsors can easily stay in touch with the developers they support and get instant feedback on projects and future updates.

6. Offering early access

Because sponsors-only repositories give sponsors access to repositories as developers build their projects, sponsors can access features early and be actively involved in the development process.

7. Transferring repositories added to sponsorship tiers

After adding a private repository to a sponsorship tier, you can transfer it to a different tier or account/organization. However, there are a few aspects to consider. Here are the various situations that may arise during and after the repository transfer:

Transfer Repository

  • If you transfer the repository from an organization to a different account or organization, the current sponsors will also move, so you don’t have to worry about losing access. However, new sponsors won’t be added. The new owner of the repository has the power to remove existing sponsors if they wish to do so. 
  • If you transfer the repository from a personal account to an organization, the personal account will continue to have admin access. Current sponsors will continue to have access too, and GitHub will add new sponsors to the repository automatically. 
  • GitHub removes all current sponsors if you transfer the repository to a personal account. New sponsors won’t be automatically added.

Improving open-source software in production

GitHub sponsors-only repository is a great feature to support open-source developers in maintaining and improving their projects. However, companies still face considerable security risks when using sponsored projects.

To better manage these risks and improve visibility over your code, you can use continuous debugging and observability platforms like Lightrun. Lightrun lets developers easily add logs, performance metrics, and snapshots and directly resolve bugs in real-time in their production environments. Ready for the smoothest debugging ride? Start a free trial today! 

Top 10 Logging Frameworks Across Various Programming Platforms https://lightrun.com/top-10-logging-frameworks-across-various-programming-platforms/ Wed, 21 Dec 2022 11:41:01 +0000 https://lightrun.com/?p=9622 A logging framework is a software tool that helps developers output diagnostic information during the execution of a program. This information is used to debug the program or monitor its performance. There are many different logging frameworks available, starting with simple logging libraries to full-fledged logging and observability platforms. Logging is an essential part of […]

A logging framework is a software tool that helps developers output diagnostic information during the execution of a program. This information is used to debug the program or monitor its performance. There are many different logging frameworks available, ranging from simple logging libraries to full-fledged logging and observability platforms.

Logging is an essential part of coding guidelines for any software development project. From humble beginnings as console print messages and file logging libraries for basic debuggability, logging frameworks have now evolved into full-scale platforms offering a variety of advanced features. These include the ability to log to multiple destinations, custom log output formats, log analytics, and streaming.
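To make those features concrete, here is a short sketch using Python's standard logging module (shown in Python purely for illustration; the same ideas apply to the frameworks below): a custom output format applied to two destinations, the console and a file.

import logging

# Custom output format applied to every destination.
formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s")

console_handler = logging.StreamHandler()       # destination 1: the console
file_handler = logging.FileHandler("app.log")   # destination 2: a log file
for handler in (console_handler, file_handler):
    handler.setFormatter(formatter)

logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)
logger.addHandler(console_handler)
logger.addHandler(file_handler)

logger.info("order %s processed", "A-1001")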

Choosing the right logging framework for your application can be a difficult decision. On one hand, you will need a logging library in any case. On the other hand, depending on the complexity and scale of your application, you may need to invest in more advanced frameworks and platforms. This decision is further narrowed down by the choice of programming language or runtime environment.

In this post, we have analyzed ten logging frameworks across the major programming languages. The choice of frameworks is based on their logging capabilities, advanced features, and popularity across various developer community metrics.

Console.log everywhere

Criteria For Analyzing Logging Frameworks

We have analyzed each of the logging frameworks based on the following parameters:

  1. First Release: Month and year of the first release of the framework.
  2. Supported Programming Language / Runtime: Programming languages or runtime environments supported by the framework.
  3. Pricing Model: Pricing information, including free, freemium, or paid options.
  4. Type of Framework (Library / Component / Platform): Classification of the framework into:
    •   Library: Just a server-side or client-side logging library.
    •   Component: A library along with additional tools and external integrations.
    •   Platform: A cloud-hosted platform for log management and advanced observability tools.
  5. Advanced Features: Unique and advanced features supported by the framework beyond the essential logging mechanism.
  6. Popularity: Popularity of the framework based on GitHub, Stack Overflow, and Twitter metrics.

So, let’s dig in and help you identify the best logging framework for your development needs.

 The Contenders for Popular Logging Frameworks

 1. Log4j 2

Log4j

Among the different logging frameworks in Java, Log4j2 stands out as one of the best. It was released as a successor to the popular Log4j utility. Log4j2 is an Apache project and is widely used within the Java community.

First Release: July 2014
Supported Programming Language / Runtime: Java, and JSR 223 scripting languages
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Library
Advanced Features: It addresses the vulnerabilities in Log4j with a separate set of APIs, and also supports custom log levels and lazy logging.
Popularity: It is hugely popular with over 2.8k GitHub stars and an active stream of questions on Stack Overflow.

2. JSNLog

JSNLog

JSNLog is a JavaScript logging utility for logging client-side events on the server side. As a result, JSNLog makes it easy to debug JavaScript events by correlating them with server events. JSNLog is open source and available on GitHub. It can run in browsers as well as in a standalone mode of operation, with .NET and Node.js support on the server side.

First Release: December 2013
Supported Programming Language / Runtime: .NET Core, Node.js
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Component
Advanced Features: It supports logging of JavaScript client-side events, runtime errors and AJAX timeouts at the server, with batch logging and logging of objects.
Popularity: It is one of the earliest JavaScript logging libraries, but it is declining in popularity with not too many active threads of discussions on GitHub and StackOverflow

  3. Serilog

Serilog

Serilog is one of the best .NET logging frameworks for structured and flexible logging. It is easy to set up with any .NET application and supports a lot of sinks for forwarding log output. It also integrates easily with a variety of different logging frameworks.

First Release: July 2016
Supported Programming Language / Runtime: .NET
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Library
Advanced Features: It has support for its own internal debug logging and has a huge set of pre-built integrations with external databases, logging frameworks, and other collaboration tools.
Popularity: It is very popular with over 5k GitHub stars and an active stream of question threads on Stack Overflow.

  4. Monolog

Monolog

Monolog is a logging library for PHP. It provides a simple and flexible way to log your PHP code to various destinations, such as log files, sockets, inboxes, databases, and web services. Monolog is available for both PHP 5 and 7. Monolog is packed with logging handlers for popular collaboration tools such as Slack and databases like MongoDB. It is compatible with popular frameworks like Laravel and Symfony.

First Release: July 2013
Supported Programming Language / Runtime: PHP
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Component
Advanced Features: It supports the PSR-3 logging standard and integrations for popular software and web services.
Popularity: Hugely popular with over 19k stars on GitHub.

 5. Loggly

Loggly

Loggly is a cloud-based log management service that provides real-time visibility of a system’s activity. It supports the ability to search, filter, and analyze log data to identify trends and troubleshoot issues. Loggly also offers alerting, dashboards, and integration with popular monitoring tools.

First Release: September 2012
Supported Programming Language / Runtime: Supports almost all the mainstream languages such as PHP, Java, C++, Python, and more.
Pricing Model: Freemium with $79/month as starting price
Type of Framework (Library / Component / Platform): Platform
Advanced Features: It supports DevOps integration for infrastructure monitoring, and an extensive set of tools for log analysis
Popularity: Quite a popular platform with an active stream of discussions on Stack Overflow and has over 7k followers on Twitter.

  6. Winston

Winston

Winston is a widely used Node.js logging framework. It provides support for multiple transport mechanisms, including the console, files, HTTP, and various third-party services. Winston has been designed to be a simple and universal logging library with standardized logging levels, and support for custom levels. Winston is open source software released under the MIT license.

First Release: June, 2020
Supported Programming Language / Runtime: Node.js
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Library
Advanced Features: It supports multiple transports for streaming log messages to external applications, and log querying.
Popularity: It is a new library but has amassed huge popularity with over 18k stars on GitHub

  7. Pino

pino

Pino is yet another logging library for Node.js applications. Pino supports integration with popular Node.js web frameworks, such as Express.js, Fastify, Koa, and more. Pino has low overhead and claims to be 5x faster than other alternatives.

First Release: March, 2016
Supported Programming Language / Runtime: Node.js
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Component
Advanced Features: It has direct integrations with popular Node.js web frameworks and supports pretty-printing and asynchronous logging.
Popularity: It is a popular choice for Node.js logging with over 9k stars on GitHub.

8. Zap

Zap

Zap is a powerful logging tool for Golang. It is one of the most performant logging options for Golang that can track and monitor your applications. It is easy to set up and is extendable to support external log sinks such as Apache Kafka.

First Release: February, 2017
Supported Programming Language / Runtime: Golang
Pricing Model: Free (Open Source)
Type of Framework (Library / Component / Platform): Library
Advanced Features: It supports both structured logging and normal leveled logging, with an extension library to support other log sinks.
Popularity: A very popular Golang logging library with nearly 16k stars on GitHub.

Real-time, Dynamic Logging 

If you are looking for a non-intrusive way to add logs to your application, Lightrun is your choice. It is an observability platform that can inject logs, traces and metrics in a running application, in real-time, without altering a single line of source code. 

Lightrun is designed to be a developer-native platform from the ground up. Be it a production, staging or a development environment, developers can leverage the Lightrun IDE plugins to gain valuable insights about the application performance right within their IDE workflow. Lightrun currently supports Java, Python and Node.js. You can start using Lightrun today, or request a demo to learn more.
