Melodic / morphemic-preprocessor · Commit cce47c75
Authored Oct 15, 2021 by Jean-Didier
change datasetlib name
Parent: 3aa09bf6
Pipeline #16678 passed in 1 minute and 54 seconds
Changes: 22
forecaster-cnn/Dockerfile
@@ -5,7 +5,7 @@ RUN mkdir /app
 RUN mkdir -p /app/log
 ADD . /app
-RUN pip install /app/datasetlib
+RUN pip install /app/lib
 RUN pip install -r /app/amq_client/requirements.txt
 WORKDIR /app
...
...
datasetlib @ c9c6d3c9
Subproject commit c9c6d3c954b57f9dd3b5109514bd033da00c95db
forecaster-cnn/lib/CHANGES.txt
0 → 100644
forecaster-cnn/lib/Dataset_Maker.egg-info/PKG-INFO
0 → 100644
Metadata-Version: 1.0
Name: Dataset-Maker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
Dataset Maker is a Morphemic Python library for building a dataset
from the data points registered in InfluxDB. Dataset Maker receives
the name of an application, the start time and the tolerance interval.
More details are provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format in order to be
used correctly by the Dataset Maker:
measurement : "application_name" # mandatory
timestamp : timestamp # optional
fields : dictionary containing the metrics exposed by the given application
(e.g. cpu_usage, memory_consumption, response_time, http_latency)
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output
a CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
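As a sanity check on the schema above, here is a small sketch (field and tag names taken from the JSON example) of how one such data point flattens into a CSV header and row; this duplicates the mapping in plain Python rather than calling the library itself:

```python
# Flatten an InfluxDB-style data point (timestamp + fields + tags) into
# one CSV row, mirroring the schema described above.
point = {
    "measurement": "application_name",
    "timestamp": 155655476.453,
    "fields": {"cpu_usage": 40, "memory_consumption": 67.9,
               "response_time": 28, "http_latency": 12},
    "tags": {"core": 2},
}

row = {"time": int(point["timestamp"]), **point["fields"], **point["tags"]}
header = ",".join(row.keys())
line = ",".join(str(v) for v in row.values())
print(header)  # time,cpu_usage,memory_consumption,response_time,http_latency,core
print(line)    # 155655476,40,67.9,28,12,2
```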
3. Usage
Warning: make sure the above variables exist before importing the Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave empty or set to None if you want all the data points stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
"hostname": hostname or IP of InfluxDB
"port": port of InfluxDB
"username": InfluxDB username
"password": password of the above user
"dbname": database name
"path_dataset": path where the dataset will be saved
}
The response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
Platform: UNKNOWN
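A minimal sketch of the call pattern documented above. The hostname, credentials and paths are placeholders, and the actual DatasetMaker call is only shown in a comment because it requires a reachable InfluxDB instance; the helper simply branches on the two response shapes described above:

```python
# Placeholder configuration matching the keys documented above.
configs = {
    "hostname": "localhost",
    "port": 8086,
    "username": "morphemic",
    "password": "secret",
    "dbname": "morphemic",
    "path_dataset": "/tmp/datasets",
}

def handle(response):
    # Branch on the two documented response shapes:
    # success -> {'status': True, 'url': ..., 'features': ...}
    # failure -> {'status': False, 'message': ...}
    if response.get("status"):
        return "dataset at {0} with features {1}".format(
            response["url"], response["features"])
    return "failed: {0}".format(response.get("message"))

# With a live InfluxDB one would run:
#   from morphemic.dataset import DatasetMaker
#   response = DatasetMaker("application_name", "10m", configs).make()
print(handle({"status": False, "message": "No data found"}))
```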
forecaster-cnn/lib/Dataset_Maker.egg-info/SOURCES.txt
0 → 100644
README.txt
setup.py
Dataset_Maker.egg-info/PKG-INFO
Dataset_Maker.egg-info/SOURCES.txt
Dataset_Maker.egg-info/dependency_links.txt
Dataset_Maker.egg-info/requires.txt
Dataset_Maker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
forecaster-cnn/lib/Dataset_Maker.egg-info/dependency_links.txt
0 → 100644
forecaster-cnn/lib/Dataset_Maker.egg-info/requires.txt
0 → 100644
pandas
influxdb
forecaster-cnn/lib/Dataset_Maker.egg-info/top_level.txt
0 → 100644
morphemic
forecaster-cnn/lib/LICENCE.txt
0 → 100644
Copyright (c) 2021 unipi.gr
MIT License
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\ No newline at end of file
forecaster-cnn/lib/README.txt
0 → 100644
1. Generality
Dataset Maker is a Morphemic Python library for building a dataset
from the data points registered in InfluxDB. Dataset Maker receives
the name of an application, the start time and the tolerance interval.
More details are provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format in order to be
used correctly by the Dataset Maker:
measurement : "application_name" # mandatory
timestamp : timestamp # optional
fields : dictionary containing the metrics exposed by the given application
(e.g. cpu_usage, memory_consumption, response_time, http_latency)
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output
a CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the above variables exist before importing the Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave empty or set to None if you want all the data points stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
"hostname": hostname or IP of InfluxDB
"port": port of InfluxDB
"username": InfluxDB username
"password": password of the above user
"dbname": database name
"path_dataset": path where the dataset will be saved
}
The response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
\ No newline at end of file
forecaster-cnn/lib/datasetmaker.egg-info/PKG-INFO
0 → 100644
Metadata-Version: 1.0
Name: datasetmaker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
Dataset Maker is a Morphemic Python library for building a dataset
from the data points registered in InfluxDB. Dataset Maker receives
the name of an application, the start time and the tolerance interval.
More details are provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format in order to be
used correctly by the Dataset Maker:
measurement : "application_name" # mandatory
timestamp : timestamp # optional
fields : dictionary containing the metrics exposed by the given application
(e.g. cpu_usage, memory_consumption, response_time, http_latency)
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2  # a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the Dataset Maker will output
a CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the above variables exist before importing the Dataset Maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application: string containing the application name
start: when to start building the dataset
Ex.: '10m', build a dataset containing the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave empty or set to None if you want all the data points stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
"hostname": hostname or IP of InfluxDB
"port": port of InfluxDB
"username": InfluxDB username
"password": password of the above user
"dbname": database name
"path_dataset": path where the dataset will be saved
}
The response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
Platform: UNKNOWN
forecaster-cnn/lib/datasetmaker.egg-info/SOURCES.txt
0 → 100644
README.txt
setup.py
datasetmaker.egg-info/PKG-INFO
datasetmaker.egg-info/SOURCES.txt
datasetmaker.egg-info/dependency_links.txt
datasetmaker.egg-info/requires.txt
datasetmaker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
forecaster-cnn/lib/datasetmaker.egg-info/dependency_links.txt
0 → 100644
forecaster-cnn/lib/datasetmaker.egg-info/requires.txt
0 → 100644
pandas
influxdb
forecaster-cnn/lib/datasetmaker.egg-info/top_level.txt
0 → 100644
morphemic
forecaster-cnn/lib/morphemic/__init__.py
0 → 100644
forecaster-cnn/lib/morphemic/__pycache__/__init__.cpython-36.pyc
0 → 100644
File added
forecaster-cnn/lib/morphemic/__pycache__/__init__.cpython-37.pyc
0 → 100644
File added
forecaster-cnn/lib/morphemic/dataset/__init__.py
0 → 100644
import os, json, time
from influxdb import InfluxDBClient
import pandas as pd
from datetime import datetime

url_path_dataset = None

class Row():
    def __init__(self, features, metricsname):
        self.features = features
        if "time" in self.features:
            time_str = self.features["time"]
            _obj = datetime.strptime(time_str, '%Y-%m-%dT%H:%M:%S.%fZ')
            self.features["time"] = int(_obj.timestamp())
        if 'application' in metricsname:
            metricsname.remove('application')
        for field_name in metricsname:
            if not field_name in self.features:
                self.features[field_name] = None

    def getTime(self):
        if "time" in self.features:
            return self.features["time"]
        if "timestamp" in self.features:
            return self.features["timestamp"]
        return None

    def makeCsvRow(self):
        if "application" in self.features:
            del self.features["application"]
        result = ''
        for key, _value in self.features.items():
            result += "{0},".format(_value)
        return result[:-1] + "\n"

class Dataset():
    def __init__(self):
        self.rows = {}
        self.size = 0

    def addRow(self, row):
        self.rows[row.getTime()] = row
        self.size += 1

    def reset(self):
        self.rows = {}
        self.size = 0
        print("Dataset reset")

    def getSize(self):
        return self.size

    def sortRows(self):
        return sorted(list(self.rows.values()), key=lambda x: x.getTime(), reverse=True)

    def getRows(self):
        return list(self.rows.values())

    def getRow(self, _time, tolerance):
        for i in range(tolerance):
            if int(_time + i) in self.rows:
                return self.rows[int(_time + i)]
        return None

    def save(self, metricnames, application_name):
        if "application" in metricnames:
            metricnames.remove("application")
        dataset_content = ''
        for metric in metricnames:
            dataset_content += "{0},".format(metric)
        dataset_content = dataset_content[:-1] + "\n"
        for row in list(self.rows.values()):
            dataset_content += row.makeCsvRow()
        _file = open(url_path_dataset + "{0}.csv".format(application_name), 'w')
        _file.write(dataset_content)
        _file.close()
        return url_path_dataset + "{0}.csv".format(application_name)

class DatasetMaker():
    def __init__(self, application, start, configs):
        self.application = application
        self.start_filter = start
        self.influxdb = InfluxDBClient(host=configs['hostname'],
                                       port=configs['port'],
                                       username=configs['username'],
                                       password=configs['password'],
                                       database=configs['dbname'])
        self.dataset = Dataset()
        self.tolerance = 5
        global url_path_dataset
        url_path_dataset = configs['path_dataset']
        if url_path_dataset[-1] != "/":
            url_path_dataset += "/"

    def getIndex(self, columns, name):
        return columns.index(name)

    def makeRow(self, columns, values):
        row = {}
        index = 0
        for column in columns:
            row[column] = values[index]
            index += 1
        return row

    def prepareResultSet(self, result_set):
        result = []
        columns = result_set["series"][0]["columns"]
        series_values = result_set["series"][0]["values"]
        for _values in series_values:
            row = self.makeRow(columns, _values)
            result.append(row)
        return result

    def make(self):
        try:
            self.influxdb.ping()
        except Exception as e:
            print("Could not establish connection with InfluxDB, please verify connection parameters")
            print(e)
            return {"message": "Could not establish connection with InfluxDB, please verify connection parameters"}
        data = self.getData()  # query once; returns [metricnames, rows] or None
        if data is None:
            return {"message": "No data found"}
        metricnames, _data = data
        for _row in _data:
            row = Row(_row, metricnames)
            self.dataset.addRow(row)
        print("Rows construction completed")
        print("{0} rows found".format(self.dataset.getSize()))
        #self.dataset.sortRows()
        url = self.dataset.save(metricnames, self.application)
        features = self.getFeatures(url)
        if features is None:
            return {'status': False, 'message': 'An error occurred while building dataset'}
        return {'status': True, 'url': url, 'application': self.application, 'features': features}

    def getFeatures(self, url):
        try:
            df = pd.read_csv(url)
            return df.columns.to_list()
        except Exception as e:
            print("Cannot extract data feature list")
            return None

    def extractMeasurement(self, _json):
        return _json["series"][0]["columns"]

    def getData(self):
        query = None
        try:
            if self.start_filter is not None and self.start_filter != "":
                query = "SELECT * FROM " + self.application + " WHERE time > now() - " + self.start_filter
            else:
                query = "SELECT * FROM " + self.application
            result_set = self.influxdb.query(query=query)
            series = self.extractMeasurement(result_set.raw)
            #self.influxdb.close()  # closing connection
            return [series, self.prepareResultSet(result_set.raw)]
        except Exception as e:
            print("Could not collect query data points")
            print(e)
            return None
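The start filter accepted by DatasetMaker ('10m', '3h', '4d') is interpolated directly into the InfluxQL WHERE clause. A minimal sketch duplicating the query-building logic of getData above, so the two cases can be checked without a live InfluxDB:

```python
def build_query(application, start_filter):
    # Mirror of DatasetMaker.getData: filter by relative time when a
    # start filter such as '10m', '3h' or '4d' is given, otherwise
    # select all data points for the measurement.
    if start_filter:
        return ("SELECT * FROM " + application
                + " WHERE time > now() - " + start_filter)
    return "SELECT * FROM " + application

print(build_query("application_name", "10m"))
# SELECT * FROM application_name WHERE time > now() - 10m
print(build_query("application_name", None))
# SELECT * FROM application_name
```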
forecaster-cnn/lib/morphemic/dataset/__pycache__/__init__.cpython-36.pyc
0 → 100644
File added