Andreas Tsagkaropoulos / morphemic-preprocessor · Commits

Commit 0c62933e
authored Mar 15, 2021 by Jean-Didier Totow

performance model, persistent storage

parent 766ef22d
Changes 124
morphemic-datasetmaker/CHANGES.txt
0 → 100644
morphemic-datasetmaker/Dataset_Maker.egg-info/PKG-INFO
0 → 100644
Metadata-Version: 1.0
Name: Dataset-Maker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
Dataset Maker is a Morphemic Python library for building a dataset from
data points registered in InfluxDB. Dataset Maker receives the name of an
application, the start time and the tolerance interval. More details are
provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format to be used
correctly by the Dataset Maker:
measurement : "application_name"  # mandatory
timestamp : timestamp  # optional
fields : dictionary containing the metrics exposed by the given application,
e.g. cpu_usage, memory_consumption, response_time, http_latency
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output a
CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the variables described below exist before using the
Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave it empty or set it to None if you wish to use all the data points
stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
  "hostname": hostname or IP of InfluxDB,
  "port": port of InfluxDB,
  "username": InfluxDB username,
  "password": password of the above user,
  "dbname": database name,
  "path_dataset": path where the dataset will be saved
}
On success, the response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
Platform: UNKNOWN
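The `start` filter maps directly onto an InfluxQL time clause. As a minimal sketch of how the filter values described above ('10m', '3h', '4d', or None) translate into queries, the following hypothetical helper (`build_query` is illustrative, not part of the library) mirrors the documented behaviour:

```python
def build_query(application, start=None):
    """Build the InfluxQL query described in the usage section.

    `start` is an InfluxQL duration string such as '10m', '3h' or '4d';
    None or '' selects every data point stored for the application.
    """
    base = "SELECT * FROM " + application
    if start:  # non-empty duration: restrict to the recent window
        return base + " WHERE time > now() - " + start
    return base

print(build_query("application_name", "10m"))
# SELECT * FROM application_name WHERE time > now() - 10m
print(build_query("application_name"))
# SELECT * FROM application_name
```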
morphemic-datasetmaker/Dataset_Maker.egg-info/SOURCES.txt
0 → 100644
README.txt
setup.py
Dataset_Maker.egg-info/PKG-INFO
Dataset_Maker.egg-info/SOURCES.txt
Dataset_Maker.egg-info/dependency_links.txt
Dataset_Maker.egg-info/requires.txt
Dataset_Maker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
morphemic-datasetmaker/Dataset_Maker.egg-info/dependency_links.txt
0 → 100644
morphemic-datasetmaker/Dataset_Maker.egg-info/requires.txt
0 → 100644
pandas
influxdb
morphemic-datasetmaker/Dataset_Maker.egg-info/top_level.txt
0 → 100644
morphemic
morphemic-datasetmaker/LICENCE.txt
0 → 100644
Copyright (c) 2021 unipi.gr
MIT License
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\ No newline at end of file
morphemic-datasetmaker/README.txt
0 → 100644
1. Generality
Dataset Maker is a Morphemic Python library for building a dataset from
data points registered in InfluxDB. Dataset Maker receives the name of an
application, the start time and the tolerance interval. More details are
provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format to be used
correctly by the Dataset Maker:
measurement : "application_name"  # mandatory
timestamp : timestamp  # optional
fields : dictionary containing the metrics exposed by the given application,
e.g. cpu_usage, memory_consumption, response_time, http_latency
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output a
CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the variables described below exist before using the
Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave it empty or set it to None if you wish to use all the data points
stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
  "hostname": hostname or IP of InfluxDB,
  "port": port of InfluxDB,
  "username": InfluxDB username,
  "password": password of the above user,
  "dbname": database name,
  "path_dataset": path where the dataset will be saved
}
On success, the response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
\ No newline at end of file
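To illustrate the mapping from the JSON point format to the CSV schema described in the README, here is a minimal, hypothetical sketch (`flatten_point` is illustrative, not library code) that flattens one data point into a header and a row:

```python
def flatten_point(point):
    # Merge timestamp, fields and tags into one flat record,
    # matching the documented schema: time, <fields...>, <tags...>
    record = {"time": point["timestamp"]}
    record.update(point["fields"])
    record.update(point["tags"])
    header = ",".join(record.keys())
    row = ",".join(str(v) for v in record.values())
    return header, row

# Sample point taken from the format section above.
point = {
    "measurement": "application_name",
    "timestamp": 155655476.453,
    "fields": {"cpu_usage": 40, "memory_consumption": 67.9,
               "response_time": 28, "http_latency": 12},
    "tags": {"core": 2},
}
header, row = flatten_point(point)
print(header)  # time,cpu_usage,memory_consumption,response_time,http_latency,core
print(row)     # 155655476.453,40,67.9,28,12,2
```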
morphemic-datasetmaker/datasetmaker.egg-info/PKG-INFO
0 → 100644
Metadata-Version: 1.0
Name: datasetmaker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
Dataset Maker is a Morphemic Python library for building a dataset from
data points registered in InfluxDB. Dataset Maker receives the name of an
application, the start time and the tolerance interval. More details are
provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format to be used
correctly by the Dataset Maker:
measurement : "application_name"  # mandatory
timestamp : timestamp  # optional
fields : dictionary containing the metrics exposed by the given application,
e.g. cpu_usage, memory_consumption, response_time, http_latency
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output a
CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the variables described below exist before using the
Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave it empty or set it to None if you wish to use all the data points
stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
  "hostname": hostname or IP of InfluxDB,
  "port": port of InfluxDB,
  "username": InfluxDB username,
  "password": password of the above user,
  "dbname": database name,
  "path_dataset": path where the dataset will be saved
}
On success, the response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
Platform: UNKNOWN
morphemic-datasetmaker/datasetmaker.egg-info/SOURCES.txt
0 → 100644
README.txt
setup.py
datasetmaker.egg-info/PKG-INFO
datasetmaker.egg-info/SOURCES.txt
datasetmaker.egg-info/dependency_links.txt
datasetmaker.egg-info/requires.txt
datasetmaker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
morphemic-datasetmaker/datasetmaker.egg-info/dependency_links.txt
0 → 100644
morphemic-datasetmaker/datasetmaker.egg-info/requires.txt
0 → 100644
pandas
influxdb
morphemic-datasetmaker/datasetmaker.egg-info/top_level.txt
0 → 100644
morphemic
morphemic-datasetmaker/morphemic/__init__.py
0 → 100644
morphemic-datasetmaker/morphemic/__pycache__/__init__.cpython-36.pyc
0 → 100644
File added
morphemic-datasetmaker/morphemic/__pycache__/__init__.cpython-37.pyc
0 → 100644
File added
morphemic-datasetmaker/morphemic/dataset/__init__.py
0 → 100644
import os, json, time
from influxdb import InfluxDBClient
import pandas as pd
from datetime import datetime

url_path_dataset = None


class Row():
    def __init__(self, features, metricsname):
        self.features = features
        if "time" in self.features:
            # Convert the RFC3339 time string to an integer epoch timestamp
            time_str = self.features["time"]
            _obj = datetime.strptime(time_str, '%Y-%m-%dT%H:%M:%S.%fZ')
            self.features["time"] = int(_obj.timestamp())
        if 'application' in metricsname:
            metricsname.remove('application')
        # Pad missing metrics with None so every row has the same columns
        for field_name in metricsname:
            if field_name not in self.features:
                self.features[field_name] = None

    def getTime(self):
        if "time" in self.features:
            return self.features["time"]
        if "timestamp" in self.features:
            return self.features["timestamp"]
        return None

    def makeCsvRow(self):
        if "application" in self.features:
            del self.features["application"]
        result = ''
        for key, _value in self.features.items():
            result += "{0},".format(_value)
        return result[:-1] + "\n"


class Dataset():
    def __init__(self):
        self.rows = {}
        self.size = 0

    def addRow(self, row):
        self.rows[row.getTime()] = row
        self.size += 1

    def reset(self):
        self.rows = {}
        self.size = 0
        print("Dataset reset")

    def getSize(self):
        return self.size

    def sortRows(self):
        return sorted(list(self.rows.values()), key=lambda x: x.getTime(), reverse=True)

    def getRows(self):
        return list(self.rows.values())

    def getRow(self, _time, tolerance):
        # Look for a row within `tolerance` seconds after the requested time
        for i in range(tolerance):
            if int(_time + i) in self.rows:
                return self.rows[int(_time + i)]
        return None

    def save(self, metricnames, application_name):
        if "application" in metricnames:
            metricnames.remove("application")
        dataset_content = ''
        for metric in metricnames:
            dataset_content += "{0},".format(metric)
        dataset_content = dataset_content[:-1] + "\n"
        for row in list(self.rows.values()):
            dataset_content += row.makeCsvRow()
        _file = open(url_path_dataset + "{0}.csv".format(application_name), 'w')
        _file.write(dataset_content)
        _file.close()
        return url_path_dataset + "{0}.csv".format(application_name)


class DatasetMaker():
    def __init__(self, application, start, configs):
        self.application = application
        self.start_filter = start
        self.influxdb = InfluxDBClient(host=configs['hostname'],
                                       port=configs['port'],
                                       username=configs['username'],
                                       password=configs['password'],
                                       database=configs['dbname'])
        self.dataset = Dataset()
        self.tolerance = 5
        global url_path_dataset
        url_path_dataset = configs['path_dataset']
        if url_path_dataset[-1] != "/":
            url_path_dataset += "/"

    def getIndex(self, columns, name):
        return columns.index(name)

    def makeRow(self, columns, values):
        row = {}
        index = 0
        for column in columns:
            row[column] = values[index]
            index += 1
        return row

    def prepareResultSet(self, result_set):
        result = []
        columns = result_set["series"][0]["columns"]
        series_values = result_set["series"][0]["values"]
        for _values in series_values:
            row = self.makeRow(columns, _values)
            result.append(row)
        return result

    def make(self):
        try:
            self.influxdb.ping()
        except Exception as e:
            print("Could not establish connection with InfluxDB, please verify connection parameters")
            print(e)
            return {"message": "Could not establish connection with InfluxDB, please verify connection parameters"}
        data = self.getData()  # fetch once instead of issuing the query twice
        if data is None:
            return {"message": "No data found"}
        metricnames, _data = data
        for _row in _data:
            row = Row(_row, metricnames)
            self.dataset.addRow(row)
        print("Rows construction completed")
        print("{0} rows found".format(self.dataset.getSize()))
        #self.dataset.sortRows()
        url = self.dataset.save(metricnames, self.application)
        features = self.getFeatures(url)
        if features is None:
            return {'status': False, 'message': 'An error occurred while building dataset'}
        return {'status': True, 'url': url, 'application': self.application, 'features': features}

    def getFeatures(self, url):
        try:
            df = pd.read_csv(url)
            return df.columns.to_list()
        except Exception:
            print("Cannot extract data feature list")
            return None

    def extractMeasurement(self, _json):
        return _json["series"][0]["columns"]

    def getData(self):
        query = None
        try:
            if self.start_filter is not None and self.start_filter != "":
                query = "SELECT * FROM " + self.application + " WHERE time > now() - " + self.start_filter
            else:
                query = "SELECT * FROM " + self.application
            result_set = self.influxdb.query(query=query)
            series = self.extractMeasurement(result_set.raw)
            #self.influxdb.close()  # closing connection
            return [series, self.prepareResultSet(result_set.raw)]
        except Exception as e:
            print("Could not collect query data points")
            print(e)
            return None
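The raw result returned by the influxdb client's query() carries parallel "columns" and "values" arrays; prepareResultSet zips them into one dict per data point. A self-contained sketch of that step, using a hand-written sample in place of a live query result (the sample data is illustrative):

```python
def prepare_result_set(raw):
    # Zip the parallel "columns"/"values" arrays of an InfluxDB raw
    # result into one dict per data point, as prepareResultSet does.
    columns = raw["series"][0]["columns"]
    return [dict(zip(columns, values)) for values in raw["series"][0]["values"]]

# Hand-written sample shaped like result_set.raw from influxdb-python.
raw = {"series": [{
    "name": "application_name",
    "columns": ["time", "cpu_usage", "core"],
    "values": [["2021-03-15T10:00:00.000Z", 40, 2],
               ["2021-03-15T10:00:01.000Z", 42, 2]],
}]}
rows = prepare_result_set(raw)
print(rows[0])  # {'time': '2021-03-15T10:00:00.000Z', 'cpu_usage': 40, 'core': 2}
```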
morphemic-datasetmaker/morphemic/dataset/__pycache__/__init__.cpython-36.pyc
0 → 100644
File added
morphemic-datasetmaker/morphemic/dataset/__pycache__/__init__.cpython-37.pyc
0 → 100644
File added
morphemic-datasetmaker/setup.py
0 → 100644
from setuptools import setup

setup(
    name='datasetmaker',
    version='0.0.1',
    author='Jean-Didier Totow',
    author_email='totow@unipi.gr',
    packages=['morphemic', 'morphemic.dataset'],
    scripts=[],
    url='http://git.dac.ds.unipi.gr/morphemic/datasetmaker',
    license='LICENSE.txt',
    description='Python package for creating a dataset using InfluxDB data points',
    long_description=open('README.txt').read(),
    install_requires=[
        "pandas",
        "influxdb",
    ],
)