
Fixed file structure

master
Martins Eglitis 5 months ago
parent
commit
60981c208b
5 changed files with 29 additions and 586 deletions
  1. +29
    -27
      Martins_Eglitis_Master_Thesis_Proposal.lyx
  2. BIN
      Martins_Eglitis_Master_Thesis_Proposal.pdf
  3. +0
    -0
      chalmers-tracing-in-distributed-systems.bib
  4. +0
    -559
      docs/Martins_Eglitis_Master_Thesis_Proposal.lyx
  5. BIN
      docs/Martins_Eglitis_Master_Thesis_Proposal.pdf

+ 29
- 27
Martins_Eglitis_Master_Thesis_Proposal.lyx

@@ -282,8 +282,8 @@ literal "false"
\end_inset

.
-Besides the user-defined goal, other tangential goals are serviced, like
-collecting statistics, checking grammar, or showing tailored ads.
+Besides the user-defined goal, other tangential goals are serviced, such
+as collecting statistics, checking grammar, or showing tailored ads.
Horizontally scaled systems, consisting of a myriad of less powerful servers,
are well-suited for networking related workloads
\begin_inset CommandInset citation
@@ -332,7 +332,7 @@ literal "false"
\begin_layout Standard
This project will focus on researching the field and implementing the findings
by building a modern tracing system for a distributed wireless solution
-used at Cisco.
+used by Cisco.
The system should be capable of collecting data from the controller, the
access point, and other deployed devices such as authentication servers
and analytics services.
@@ -361,13 +361,14 @@ literal "false"
The initial requirements for Dapper were: 1) Low overhead - some applications
are very sensitive to network data increase or latency.
2) Application-level transparency - teams and developers are not keen on changing
-their codebase on demand therefore implementation has to be done in lower
-levels, for example, in common libraries (threading, control-flow, RPC).
+their codebase on demand therefore the tracing has to be implemented in
+lower levels, for example, in common libraries (threading, control-flow,
+RPC).
3) Scalability - Dapper has to be able to support existing and new services
for at least 5 years.
-The requirements of the distributed wireless tracing solution at Cisco
-might be similar to those listed by Google, except for scale, which is
-limited for security reasons.
+The requirements for the distributed wireless tracing solution at Cisco
+are closely similar to those listed by Google, except for scale, which
+is limited for security reasons.
It, therefore, makes Dapper a very appealing and useful research platform
for this project.
Zipkin
@@ -378,8 +379,8 @@ literal "false"

\end_inset

-, an open-source Dapper alternative, should be used as a drop-in replacement
-if decided so.
+, an open-source project very similar to Dapper, will be used instead as
+a drop-in replacement.
\end_layout

\begin_layout Standard
@@ -395,8 +396,8 @@ literal "false"
), is the need for modification of the underlying instrumentation, for example,
the common low-level libraries the services are using.
If the scope of the project is limited, which is true in this case, the
-chances of it being embedded in every service are low.
-An alternative is to use so-called black-box schemas (Project5
+chances of applying it across every service are low.
+The other approach is to use so-called black-box schemas (Project5
\begin_inset CommandInset citation
LatexCommand cite
key "inproceedings"
@@ -413,10 +414,12 @@ literal "false"
\end_inset

).
+Unfortunately, active work on these projects has ended several years ago
+(Sherlock repository was archived 2 years ago).
The downsides of black-box schemas are decreased accuracy and large overhead
due to the statistical regression techniques used.
However, there is one major advantage - no code modifications are required
-at any level, which might be useful since direct access to a service or
+at any level, which might be useful when direct access to a service or
instrumentation is blocked.
\end_layout

@@ -462,7 +465,7 @@ Due to the nature of the internship and lack of security clearance, some
\end_layout

\begin_layout Itemize
-Enterprise networking services can have a large codebase and are usually
+Enterprise networking products can have a large codebase and are usually
written in
\begin_inset Quotes eld
\end_inset
@@ -471,19 +474,18 @@ low-level
\begin_inset Quotes erd
\end_inset

-languages, for example, C/C++.
+languages such as C/C++.
It makes the learning curve steep.
\end_layout

\begin_layout Itemize
-Existing services might not be homogeneous and implementing code will thus
-require more individual adjustments across the instrumentation libraries
-or applications.
+Existing products might not be homogeneous and implementing code will thus
+require more individual adjustments across the instrumentation libraries.
\end_layout

\begin_layout Itemize
All non-trivial software is known to potentially contain bugs introducing
-security vulnerabilities, unwanted program behavior, etc.
+security vulnerabilities, unwanted program behavior.
All these factors can impact the speed and quality of development.
\end_layout

@@ -502,9 +504,9 @@ literal "false"

\end_inset

-), techniques (different data models, collection methods), problems that
-can be solved (tracing, security auditing, pattern checking), advantages
-and disadvantages, etc.
+), understanding techniques (different data models, collection methods),
+problems that can be solved (tracing, security audits, pattern checking),
+advantages and disadvantages, etc.
\end_layout

\begin_layout Standard
@@ -519,8 +521,8 @@ The second part is to implement the acquired knowledge in building the actual
\end_layout

\begin_layout Standard
-The last part is evaluating the tracing system both quantitatively and qualitative
-ly.
+The last part consists of evaluation of the tracing system.
+The tracer has to be evaluated both quantitatively and qualitatively.
Depending on the quality of the deliverable, two test environments are
possible - either testing or production.
The deliverable will be deployed in the production environment only if
@@ -530,7 +532,7 @@ ly.
code reviews, heavy testing, benchmarking, etc.
The results for some quantitative metrics such as latency, network data
overhead, resource usage will be collected, analyzed and compared against
-other results like Dapper
+other results such as Dapper
\begin_inset CommandInset citation
LatexCommand cite
key "sigelmanDapperLargeScaleDistributed"
@@ -541,11 +543,11 @@ literal "false"
.
Qualitative metrics such as application-level transparency and ease of
implementation will be investigated by surveying different teams and developers
-at Cisco.
+within Cisco.
\begin_inset CommandInset bibtex
LatexCommand bibtex
btprint "btPrintCited"
-bibfiles "docs/chalmers-tracing-in-distributed-systems"
+bibfiles "chalmers-tracing-in-distributed-systems"
options "plain"

\end_inset


BIN
Martins_Eglitis_Master_Thesis_Proposal.pdf


docs/chalmers-tracing-in-distributed-systems.bib → chalmers-tracing-in-distributed-systems.bib


+ 0
- 559
docs/Martins_Eglitis_Master_Thesis_Proposal.lyx

@@ -1,559 +0,0 @@
#LyX 2.3 created this file. For more info see http://www.lyx.org/
\lyxformat 544
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\usepackage{inputenc}

\usepackage{natbib}
\usepackage{hyperref}

\usepackage{graphicx}

\usepackage[colorinlistoftodos]{todonotes}

\usepackage{parskip}
\setlength{\parskip}{10pt}

\usepackage{tikz}
\usetikzlibrary{arrows, decorations.markings}

\usepackage{chngcntr}
\counterwithout{figure}{section}
\end_preamble
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman "default" "default"
\font_sans "default" "default"
\font_typewriter "default" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\use_microtype false
\use_dash_ligatures true
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry false
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\use_minted 0
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 2
\tocdepth 2
\paragraph_separation indent
\paragraph_indentation default
\is_math_indent 0
\math_numbering_side default
\quotes_style english
\dynamic_quotes 0
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
begin{titlepage}
\end_layout

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
centering
\end_layout

\end_inset


\end_layout

\begin_layout Standard

\shape smallcaps
\size largest
Master thesis project proposal
\begin_inset Newline newline
\end_inset


\end_layout

\begin_layout Standard
\begin_inset VSpace 0.5cm
\end_inset


\end_layout

\begin_layout Standard

\series bold
\size huge
Tracing in Distributed Systems
\end_layout

\begin_layout Standard
\begin_inset VSpace 2cm
\end_inset


\end_layout

\begin_layout Standard

\size larger
Martins Eglitis eglitis@student.chalmers.se
\end_layout

\begin_layout Standard
\begin_inset VSpace 1.5cm
\end_inset


\end_layout

\begin_layout Standard

\size large
Relevant completed courses:
\end_layout

\begin_layout Itemize
EDA093, Operating Systems
\end_layout

\begin_layout Itemize
EDA387, Computer Networks
\end_layout

\begin_layout Itemize
EDA263, Computer Security
\end_layout

\begin_layout Itemize
EDA491, Network Security
\end_layout

\begin_layout Standard
\begin_inset VSpace vfill
\end_inset


\end_layout

\begin_layout Standard
\begin_inset VSpace vfill
\end_inset


\end_layout

\begin_layout Standard

\size large
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
today
\end_layout

\end_inset


\begin_inset Newline newline
\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
end{titlepage}
\end_layout

\end_inset


\end_layout

\begin_layout Section
Introduction
\end_layout

\begin_layout Standard
Distributed systems are ubiquitous, providing daily services such as network
applications (web search, online shopping, gaming), communications (networks,
sensors), transportation, and many more.
Although there is no single definition of a distributed system, it can be
perceived as a system that logically or functionally distributes the
workload of a goal over multiple processing units
\begin_inset CommandInset citation
LatexCommand cite
key "ghoshDistributedSystemsAlgorithmic2015"
literal "false"

\end_inset

.
\end_layout

\begin_layout Standard
In many situations, the number of actual computers that serve a single goal
is tremendous.
For example, there are thousands of servers involved in serving a web request
when using the Google search engine
\begin_inset CommandInset citation
LatexCommand cite
key "sigelmanDapperLargeScaleDistributed"
literal "false"

\end_inset

.
Besides the user-defined goal, other tangential goals are serviced, such
as collecting statistics, checking grammar, or showing tailored ads.
Horizontally scaled systems, consisting of a myriad of less powerful servers,
are well-suited for networking related workloads
\begin_inset CommandInset citation
LatexCommand cite
key "barrosoWebSearchPlanet2003"
literal "false"

\end_inset

.
\end_layout

\begin_layout Standard
The execution trace often leads outside the boundaries of a single entity
- services can be managed by different teams in different countries, using
various programming languages and frameworks
\begin_inset CommandInset citation
LatexCommand cite
key "sigelmanDapperLargeScaleDistributed"
literal "false"

\end_inset

.
Yet when an error occurs somewhere in the execution path, it is crucial
to point to the problematic area.
The culprit is the massive scale which currently is a burden to state-of-the-ar
t monitoring systems as they can only observe such unwanted events happening
\begin_inset CommandInset citation
LatexCommand cite
key "alvaroOKWHYTRACING"
literal "false"

\end_inset

.
For example, such systems can observe spikes in latency but cannot pinpoint
the actual root cause of the problem.
Even if the problem has been localized, more problems arise - how to relate
the record in the log file to other log files? How to make sure the record
refers to the same service? How to match records efficiently across thousands
of servers?
\end_layout

\begin_layout Standard
This project will focus on researching the field and implementing the findings
by building a modern tracing system for a distributed wireless solution
used by Cisco.
The system should be capable of collecting data from the controller, the
access point, and other deployed devices such as authentication servers
and analytics services.
Use cases of such systems include, among others, troubleshooting errors,
detecting anomalies, discovering latency problems, exploring dependencies, and
validating functionality.
\end_layout

\begin_layout Section
Context
\end_layout

\begin_layout Standard
One of the earlier works in the field of tracing in distributed systems
presents Dapper
\begin_inset CommandInset citation
LatexCommand cite
key "sigelmanDapperLargeScaleDistributed"
literal "false"

\end_inset

.
Google has been using the closed-source tracer in their production environments
for at least 2 years, thus proving the maturity of the solution.
The initial requirements for Dapper were: 1) Low overhead - some applications
are very sensitive to network data increase or latency.
2) Application-level transparency - teams and developers are not keen on changing
their codebase on demand; therefore, the tracing has to be implemented at
lower levels, for example, in common libraries (threading, control-flow,
RPC).
3) Scalability - Dapper has to be able to support existing and new services
for at least 5 years.
The requirements for the distributed wireless tracing solution at Cisco
are closely similar to those listed by Google, except for scale, which
is limited for security reasons.
This makes Dapper a very appealing and useful research platform
for this project.
Zipkin
\begin_inset CommandInset citation
LatexCommand cite
key "OpenZipkinDistributedTracing"
literal "false"

\end_inset

, an open-source project very similar to Dapper, will be used instead as
a drop-in replacement.
\end_layout

\begin_layout Standard
One of the disadvantages of annotation-based schemas (Dapper or X-Trace
\begin_inset CommandInset citation
LatexCommand cite
key "AWSXRayDistributed"
literal "false"

\end_inset

), is the need for modification of the underlying instrumentation, for example,
the common low-level libraries the services are using.
If the scope of the project is limited, which is true in this case, the
chances of applying it across every service are low.
The other approach is to use so-called black-box schemas (Project5
\begin_inset CommandInset citation
LatexCommand cite
key "inproceedings"
literal "false"

\end_inset

or Sherlock
\begin_inset CommandInset citation
LatexCommand cite
key "arbezzanoGianarbSherlock2019"
literal "false"

\end_inset

).
Unfortunately, active work on these projects ended several years ago
(the Sherlock repository was archived 2 years ago).
The downsides of black-box schemas are decreased accuracy and large overhead
due to the statistical regression techniques used.
However, there is one major advantage - no code modifications are required
at any level, which might be useful when direct access to a service or
instrumentation is blocked.
\end_layout

\begin_layout Section
Goals and challenges
\end_layout

\begin_layout Standard
The main goals for the project are:
\end_layout

\begin_layout Itemize
Research the state of the art in distributed tracing.
\end_layout

\begin_layout Itemize
Define the tracing data model.
For example, sampling rates, entry points, system components, integration
with other observability methods.
\end_layout

\begin_layout Itemize
Write low-overhead collector code and encapsulate it in libraries or applications.
\end_layout

\begin_layout Itemize
Integrate with a tracing lookup tool such as Zipkin, X-Ray, or AppDynamics.
\end_layout

\begin_layout Itemize
Evaluate the code based on relevant metrics.
For example, network data overhead and latency, system resource usage (CPU,
memory, storage), scalability, ease of implementation and transparency.
\end_layout

\begin_layout Standard
Some of the challenges are:
\end_layout

\begin_layout Itemize
Due to the nature of the internship and lack of security clearance, some
parts of the system might not be accessible.
\end_layout

\begin_layout Itemize
Enterprise networking products can have a large codebase and are usually
written in
\begin_inset Quotes eld
\end_inset

low-level
\begin_inset Quotes erd
\end_inset

languages such as C/C++.
This makes the learning curve steep.
\end_layout

\begin_layout Itemize
Existing products might not be homogeneous and implementing code will thus
require more individual adjustments across the instrumentation libraries.
\end_layout

\begin_layout Itemize
All non-trivial software is known to potentially contain bugs that introduce
security vulnerabilities or unwanted program behavior.
All these factors can impact the speed and quality of development.
\end_layout

\begin_layout Section
Approach
\end_layout

\begin_layout Standard
The first part is to research the field of distributed tracing in-depth.
It includes reading academic papers as well as understanding the capabilities
of tooling available (tracing systems, frameworks like OpenTelemetry
\begin_inset CommandInset citation
LatexCommand cite
key "OpenTelemetry"
literal "false"

\end_inset

), understanding techniques (different data models, collection methods),
problems that can be solved (tracing, security audits, pattern checking),
advantages and disadvantages, etc.
\end_layout

\begin_layout Standard
The second part is to implement the acquired knowledge in building the actual
tracing system at Cisco.
Depending on the findings from the first part, it could mean adjusting
an existing tracing system, building one from scratch, or combining both.
Working closely with the software engineering team will be necessary to
flatten the learning curve of enterprise subsystems, study the C/C++ libraries
used, find the optimal data structure and algorithms, implement the collector,
and finally integrate with a lookup tool.
\end_layout

\begin_layout Standard
The last part consists of evaluation of the tracing system.
The tracer has to be evaluated both quantitatively and qualitatively.
Depending on the quality of the deliverable, two test environments are
possible - either testing or production.
The deliverable will be deployed in the production environment only if
it is accepted by a higher instance as it may appear in instrumentation
throughout the company.
It usually means having all the required functionality, documentation,
code reviews, heavy testing, benchmarking, etc.
The results for some quantitative metrics such as latency, network data
overhead, and resource usage will be collected, analyzed and compared against
other results such as those for Dapper
\begin_inset CommandInset citation
LatexCommand cite
key "sigelmanDapperLargeScaleDistributed"
literal "false"

\end_inset

.
Qualitative metrics such as application-level transparency and ease of
implementation will be investigated by surveying different teams and developers
within Cisco.
\begin_inset CommandInset bibtex
LatexCommand bibtex
btprint "btPrintCited"
bibfiles "chalmers-tracing-in-distributed-systems"
options "plain"

\end_inset


\end_layout

\end_body
\end_document

BIN
docs/Martins_Eglitis_Master_Thesis_Proposal.pdf

