Commit cce47c75 authored by Jean-Didier

change datasetlib name

parent 3aa09bf6
Pipeline #16678 passed with stage in 1 minute and 54 seconds
@@ -5,7 +5,7 @@ RUN mkdir /app
 RUN mkdir -p /app/log
 ADD . /app
-RUN pip install /app/datasetlib
+RUN pip install /app/lib
 RUN pip install -r /app/amq_client/requirements.txt
 WORKDIR /app
datasetlib @ c9c6d3c9
Subproject commit c9c6d3c954b57f9dd3b5109514bd033da00c95db
Metadata-Version: 1.0
Name: Dataset-Maker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
        Dataset Maker is a MORPHEMIC Python library for building a dataset
        from the data points registered in InfluxDB. The dataset maker
        receives the name of an application, the start time and the
        tolerance interval. More details are provided below.
        2. InfluxDB format
        Data points in InfluxDB should have the following format to be used
        correctly by the dataset maker:
        measurement : "application_name" #mandatory
        timestamp : timestamp #optional
        fields : dictionary containing the metrics exposed by the given application,
        e.g. cpu_usage, memory_consumption, response_time, http_latency
        tags : dictionary of metric-related information
        The JSON describing the above information is the following:
        Ex.:
        {"measurement": "application_name",
         "timestamp": 155655476.453,
         "fields": {
            "cpu_usage": 40,
            "memory_consumption": 67.9,
            "response_time": 28,
            "http_latency": 12
         },
         "tags": {
            "core": 2 #a cpu_usage of 40% is the usage of CPU core number 2
         }
        }
        If data points follow the above format, the dataset maker will output
        a CSV file (application_name.csv) with the following schema:
        time, cpu_usage, memory_consumption, response_time, http_latency, core
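As a sketch of how one data point maps to the CSV schema above, the following snippet builds the example point as a Python dictionary and derives the column header from it. The point itself is the example from the text; the flattening shown (time column, then fields, then tags) is an illustration of the documented output schema, not the library's own code.

```python
# The example data point from the format description above.
point = {
    "measurement": "application_name",
    "timestamp": 155655476.453,
    "fields": {
        "cpu_usage": 40,
        "memory_consumption": 67.9,
        "response_time": 28,
        "http_latency": 12,
    },
    "tags": {"core": 2},
}

# The dataset maker flattens fields and tags into one CSV row keyed by time.
header = ["time"] + list(point["fields"]) + list(point["tags"])
print(",".join(header))
# → time,cpu_usage,memory_consumption,response_time,http_latency,core
```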
        3. Usage
        Warning: make sure the variables described below exist before using the dataset maker library.
        from morphemic.dataset import DatasetMaker
        data_maker = DatasetMaker(application, start, configs)
        response = data_maker.make()
        application, a string containing the application name
        start, when to start building the dataset
        Ex.: '10m', builds the dataset from the data points stored during the last 10 minutes
        Ex.: '3h', the last three hours
        Ex.: '4d', the last four days
        Leave it empty or set it to None if you want all the data points stored in your InfluxDB.
        configs is a dictionary containing the following parameters:
        {
         "hostname": hostname or IP of InfluxDB
         "port": port of InfluxDB
         "username": InfluxDB username
         "password": password of the above user
         "dbname": database name
         "path_dataset": path where the dataset will be saved
        }
        The response contains:
        {'status': True, 'url': url, 'application': application_name, 'features': features}
        or, if an error occurred:
        {'status': False, 'message': "reason of the error"}
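A minimal end-to-end usage sketch of the API described above. The hostname, credentials and path are placeholders; the `DatasetMaker` call itself is left commented out because it needs a reachable InfluxDB instance.

```python
# Placeholder configuration following the documented schema; every value
# here is an assumption for illustration, not a real deployment.
configs = {
    "hostname": "localhost",
    "port": 8086,
    "username": "morphemic",
    "password": "secret",
    "dbname": "metrics",
    "path_dataset": "/tmp/datasets",
}

# With a reachable InfluxDB, the documented call sequence would be:
# from morphemic.dataset import DatasetMaker
# data_maker = DatasetMaker("application_name", "10m", configs)
# response = data_maker.make()
# if response.get("status"):
#     print(response["url"], response["features"])
# else:
#     print(response.get("message"))

print(sorted(configs))
```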
Platform: UNKNOWN
README.txt
setup.py
Dataset_Maker.egg-info/PKG-INFO
Dataset_Maker.egg-info/SOURCES.txt
Dataset_Maker.egg-info/dependency_links.txt
Dataset_Maker.egg-info/requires.txt
Dataset_Maker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
Copyright (c) 2021 unipi.gr
MIT License
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\ No newline at end of file
1. Generality
Dataset Maker is a MORPHEMIC Python library for building a dataset
from the data points registered in InfluxDB. The dataset maker
receives the name of an application, the start time and the
tolerance interval. More details are provided below.
2. InfluxDB format
Data points in InfluxDB should have the following format to be used
correctly by the dataset maker:
measurement : "application_name" #mandatory
timestamp : timestamp #optional
fields : dictionary containing the metrics exposed by the given application,
e.g. cpu_usage, memory_consumption, response_time, http_latency
tags : dictionary of metric-related information
The JSON describing the above information is the following:
Ex.:
{"measurement": "application_name",
 "timestamp": 155655476.453,
 "fields": {
    "cpu_usage": 40,
    "memory_consumption": 67.9,
    "response_time": 28,
    "http_latency": 12
 },
 "tags": {
    "core": 2 #a cpu_usage of 40% is the usage of CPU core number 2
 }
}
If data points follow the above format, the dataset maker will output
a CSV file (application_name.csv) with the following schema:
time, cpu_usage, memory_consumption, response_time, http_latency, core
3. Usage
Warning: make sure the variables described below exist before using the dataset maker library.
from morphemic.dataset import DatasetMaker
data_maker = DatasetMaker(application, start, configs)
response = data_maker.make()
application, a string containing the application name
start, when to start building the dataset
Ex.: '10m', builds the dataset from the data points stored during the last 10 minutes
Ex.: '3h', the last three hours
Ex.: '4d', the last four days
Leave it empty or set it to None if you want all the data points stored in your InfluxDB.
configs is a dictionary containing the following parameters:
{
 "hostname": hostname or IP of InfluxDB
 "port": port of InfluxDB
 "username": InfluxDB username
 "password": password of the above user
 "dbname": database name
 "path_dataset": path where the dataset will be saved
}
The response contains:
{'status': True, 'url': url, 'application': application_name, 'features': features}
or, if an error occurred:
{'status': False, 'message': "reason of the error"}
\ No newline at end of file
Metadata-Version: 1.0
Name: datasetmaker
Version: 0.0.1
Summary: Python package for creating a dataset using InfluxDB data points
Home-page: http://git.dac.ds.unipi.gr/morphemic/datasetmaker
Author: Jean-Didier Totow
Author-email: totow@unipi.gr
License: LICENSE.txt
Description: 1. Generality
        Dataset Maker is a MORPHEMIC Python library for building a dataset
        from the data points registered in InfluxDB. The dataset maker
        receives the name of an application, the start time and the
        tolerance interval. More details are provided below.
        2. InfluxDB format
        Data points in InfluxDB should have the following format to be used
        correctly by the dataset maker:
        measurement : "application_name" #mandatory
        timestamp : timestamp #optional
        fields : dictionary containing the metrics exposed by the given application,
        e.g. cpu_usage, memory_consumption, response_time, http_latency
        tags : dictionary of metric-related information
        The JSON describing the above information is the following:
        Ex.:
        {"measurement": "application_name",
         "timestamp": 155655476.453,
         "fields": {
            "cpu_usage": 40,
            "memory_consumption": 67.9,
            "response_time": 28,
            "http_latency": 12
         },
         "tags": {
            "core": 2 #a cpu_usage of 40% is the usage of CPU core number 2
         }
        }
        If data points follow the above format, the dataset maker will output
        a CSV file (application_name.csv) with the following schema:
        time, cpu_usage, memory_consumption, response_time, http_latency, core
        3. Usage
        Warning: make sure the variables described below exist before using the dataset maker library.
        from morphemic.dataset import DatasetMaker
        data_maker = DatasetMaker(application, start, configs)
        response = data_maker.make()
        application, a string containing the application name
        start, when to start building the dataset
        Ex.: '10m', builds the dataset from the data points stored during the last 10 minutes
        Ex.: '3h', the last three hours
        Ex.: '4d', the last four days
        Leave it empty or set it to None if you want all the data points stored in your InfluxDB.
        configs is a dictionary containing the following parameters:
        {
         "hostname": hostname or IP of InfluxDB
         "port": port of InfluxDB
         "username": InfluxDB username
         "password": password of the above user
         "dbname": database name
         "path_dataset": path where the dataset will be saved
        }
        The response contains:
        {'status': True, 'url': url, 'application': application_name, 'features': features}
        or, if an error occurred:
        {'status': False, 'message': "reason of the error"}
Platform: UNKNOWN
README.txt
setup.py
datasetmaker.egg-info/PKG-INFO
datasetmaker.egg-info/SOURCES.txt
datasetmaker.egg-info/dependency_links.txt
datasetmaker.egg-info/requires.txt
datasetmaker.egg-info/top_level.txt
morphemic/__init__.py
morphemic/dataset/__init__.py
\ No newline at end of file
import os, json, time
from influxdb import InfluxDBClient
import pandas as pd
from datetime import datetime

url_path_dataset = None


class Row():
    def __init__(self, features, metricsname):
        self.features = features
        if "time" in self.features:
            # InfluxDB returns RFC3339 time strings; convert to epoch seconds
            time_str = self.features["time"]
            _obj = datetime.strptime(time_str, '%Y-%m-%dT%H:%M:%S.%fZ')
            self.features["time"] = int(_obj.timestamp())
        if 'application' in metricsname:
            metricsname.remove('application')
        # make sure every expected metric appears in the row, even if empty
        for field_name in metricsname:
            if field_name not in self.features:
                self.features[field_name] = None

    def getTime(self):
        if "time" in self.features:
            return self.features["time"]
        if "timestamp" in self.features:
            return self.features["timestamp"]
        return None

    def makeCsvRow(self):
        if "application" in self.features:
            del self.features["application"]
        result = ''
        for key, _value in self.features.items():
            result += "{0},".format(_value)
        return result[:-1] + "\n"


class Dataset():
    def __init__(self):
        self.rows = {}
        self.size = 0

    def addRow(self, row):
        self.rows[row.getTime()] = row
        self.size += 1

    def reset(self):
        self.rows = {}
        self.size = 0
        print("Dataset reset")

    def getSize(self):
        return self.size

    def sortRows(self):
        return sorted(list(self.rows.values()), key=lambda x: x.getTime(), reverse=True)

    def getRows(self):
        return list(self.rows.values())

    def getRow(self, _time, tolerance):
        # accept a row whose timestamp falls within `tolerance` seconds after _time
        for i in range(tolerance):
            if int(_time + i) in self.rows:
                return self.rows[int(_time + i)]
        return None

    def save(self, metricnames, application_name):
        if "application" in metricnames:
            metricnames.remove("application")
        dataset_content = ''
        for metric in metricnames:
            dataset_content += "{0},".format(metric)
        dataset_content = dataset_content[:-1] + "\n"
        for row in list(self.rows.values()):
            dataset_content += row.makeCsvRow()
        path = url_path_dataset + "{0}.csv".format(application_name)
        with open(path, 'w') as _file:
            _file.write(dataset_content)
        return path


class DatasetMaker():
    def __init__(self, application, start, configs):
        self.application = application
        self.start_filter = start
        self.influxdb = InfluxDBClient(host=configs['hostname'], port=configs['port'], username=configs['username'], password=configs['password'], database=configs['dbname'])
        self.dataset = Dataset()
        self.tolerance = 5
        global url_path_dataset
        url_path_dataset = configs['path_dataset']
        if url_path_dataset[-1] != "/":
            url_path_dataset += "/"

    def getIndex(self, columns, name):
        return columns.index(name)

    def makeRow(self, columns, values):
        # pair the result-set column names with one row of values
        return dict(zip(columns, values))

    def prepareResultSet(self, result_set):
        result = []
        columns = result_set["series"][0]["columns"]
        series_values = result_set["series"][0]["values"]
        for _values in series_values:
            result.append(self.makeRow(columns, _values))
        return result

    def make(self):
        try:
            self.influxdb.ping()
        except Exception as e:
            print("Could not establish a connection with InfluxDB, please verify the connection parameters")
            print(e)
            return {"message": "Could not establish a connection with InfluxDB, please verify the connection parameters"}
        data = self.getData()
        if data is None:
            return {"message": "No data found"}
        metricnames, _data = data
        for _row in _data:
            self.dataset.addRow(Row(_row, metricnames))
        print("Rows construction completed")
        print("{0} rows found".format(self.dataset.getSize()))
        #self.dataset.sortRows()
        url = self.dataset.save(metricnames, self.application)
        features = self.getFeatures(url)
        if features is None:
            return {'status': False, 'message': 'An error occurred while building the dataset'}
        return {'status': True, 'url': url, 'application': self.application, 'features': features}

    def getFeatures(self, url):
        try:
            df = pd.read_csv(url)
            return df.columns.to_list()
        except Exception as e:
            print("Cannot extract the feature list of the dataset")
            return None

    def extractMeasurement(self, _json):
        return _json["series"][0]["columns"]

    def getData(self):
        try:
            if self.start_filter is not None and self.start_filter != "":
                query = "SELECT * FROM " + self.application + " WHERE time > now() - " + self.start_filter
            else:
                query = "SELECT * FROM " + self.application
            result_set = self.influxdb.query(query=query)
            series = self.extractMeasurement(result_set.raw)
            #self.influxdb.close() #closing the connection
            return [series, self.prepareResultSet(result_set.raw)]
        except Exception as e:
            print("Could not collect the query data points")
            print(e)
            return None
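Two pieces of the module above can be illustrated in isolation: the InfluxQL query that getData builds from the application name and the optional start filter, and the time normalisation that Row applies to InfluxDB's RFC3339 time strings. The helper below is a standalone re-statement of that logic for illustration, not part of the library itself.

```python
from datetime import datetime

# Mirrors how getData assembles its InfluxQL query string.
def build_query(application, start_filter):
    if start_filter:
        return "SELECT * FROM " + application + " WHERE time > now() - " + start_filter
    return "SELECT * FROM " + application

print(build_query("application_name", "10m"))
# → SELECT * FROM application_name WHERE time > now() - 10m
print(build_query("application_name", None))
# → SELECT * FROM application_name

# Mirrors how Row converts InfluxDB's time string to an integer epoch.
# Note: the parse is timezone-naive, so the resulting epoch value
# depends on the local timezone of the machine running it.
time_str = "2021-01-01T00:00:00.000Z"
epoch = int(datetime.strptime(time_str, "%Y-%m-%dT%H:%M:%S.%fZ").timestamp())
print(epoch)
```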