Monitoring DSS scenarios that are not triggered
You might run into a situation where DSS scenarios that are supposed to run at a regular interval (daily, weekly, etc.) have not run for a while, and you do not know about it because DSS has no out-of-the-box monitoring for this case. Keeping track of scenario setup by hand becomes harder and harder once you have tens of projects and scenarios. In this article I am going to show how to monitor the schedule setup of scenarios via the DSS Python API and notify the team about unscheduled scenarios with a Slack message.
DSS scenarios have an option to report on scenario runs. Depending on the setup, an email or a Slack message can be sent when a run fails or succeeds. The list of all available reporters can be found in the DSS documentation.
There can be different reasons why a scenario is not scheduled or, better said, why its auto-triggers are not enabled. A typical case is forgetting to enable the auto-triggers after deploying a bundle to the automation node, because they are always disabled there.
Basically, all that is needed is a few methods from the REST API for handling projects and scenarios. What information do you need in order to decide whether a scenario runs according to its defined schedule?
- scenario last run time
- load frequency
When I was exploring the DSSScenario methods available for interacting with scenarios, I did not find a way to get the trigger setup, in particular the frequency. Maybe it is somewhere I have not looked, but I solved it in a different way: I used tags and marked each scenario with the tag ‘daily’ or ‘monthly’ according to its run frequency.
A scenario’s tags are then accessible via a list called tags, which is part of each item returned by the list_scenarios() method on the project handle.
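Since tags is a list, the frequency has to be looked up inside it rather than compared to a single string. Here is a minimal sketch of that lookup. The FREQUENCY_DAYS mapping, the baseline_days helper and the sample items are my own illustrative names and made-up data, not part of the DSS API.

```python
# Hypothetical mapping from frequency tag to the maximum allowed gap, in days.
FREQUENCY_DAYS = {"daily": 1, "monthly": 30}


def baseline_days(tags):
    """Return the allowed gap for the first frequency tag found, or None."""
    for tag in tags or []:
        if tag in FREQUENCY_DAYS:
            return FREQUENCY_DAYS[tag]
    return None


# Made-up items shaped like the ones described above (id, name, tags).
scenarios = [
    {"id": "LOAD_SALES", "name": "Load sales", "tags": ["daily", "finance"]},
    {"id": "ADHOC", "name": "Ad hoc export", "tags": []},
]
tagged = {s["id"]: baseline_days(s.get("tags")) for s in scenarios}
```

Scenarios without a frequency tag come back as None, so they can simply be skipped by the check.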
The last run of a scenario is retrieved with the method get_last_runs(limit=1), which is also available in the DSSScenario class. It returns a list of run handles (at most one here) with many details about the last run. The start time of that run can be retrieved with get_start_time(), which you have to call on the returned DSSScenarioRun handle.
Now we have all the information needed to verify whether a scenario runs on schedule. Let’s go through the algorithm in detail.
Let’s list all projects:
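Because get_last_runs() gives back a list, the start time has to be read from its first element, and the list can be empty for a scenario that has never run. A small sketch of that, where last_start_time is a hypothetical helper name and scenario_handle is assumed to be a DSSScenario handle (or anything with the same two methods):

```python
def last_start_time(scenario_handle):
    """Return the start time of the most recent run, or None if never run."""
    runs = scenario_handle.get_last_runs(limit=1)
    if not runs:
        return None  # the scenario has no recorded run yet
    return runs[0].get_start_time()
```

Returning None keeps the "never run" case explicit, so the caller can report it separately from an overdue scenario.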
import dataiku

client = dataiku.api_client()
projects = client.list_projects()
We have many projects in DSS, and I do not want to check all of them because some are user sandboxes. So I limit the scan to projects with the status In production and get the list of their scenarios.
for project in projects:
    if 'projectStatus' in project and project['projectStatus'] == 'In production':
        prj = client.get_project(project['projectKey'])
        scenario_list = prj.list_scenarios()
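The same filter can be written as a small function, which makes it easy to try out on sample data. This is only a sketch: production_keys is a hypothetical helper, the sample projects are made up, and the status label must match whatever is configured on your instance.

```python
def production_keys(projects, status="In production"):
    """Return the keys of projects whose 'projectStatus' equals `status`."""
    # .get() covers projects that have no status set at all.
    return [p["projectKey"] for p in projects if p.get("projectStatus") == status]


# Made-up items shaped like the dictionaries used in the loop above.
sample = [
    {"projectKey": "SALES", "projectStatus": "In production"},
    {"projectKey": "SANDBOX_JOHN", "projectStatus": "Sandbox"},
    {"projectKey": "NO_STATUS"},
]
keys = production_keys(sample)
```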
Now let’s go through each scenario and get its last run:
        for scenario in scenario_list:
            scenario_handler = prj.get_scenario(scenario['id'])
            # get_last_runs returns a list; with limit=1 it holds at most one run
            last_run = scenario_handler.get_last_runs(limit=1)
Now we need to calculate the difference between the last run time and the current date to verify whether the scenario is running on the desired schedule. In my case there are just two possibilities: it runs either daily or monthly.
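            # requires: from datetime import datetime
            # last_run is a list, so take its first (and only) element
            date_diff = datetime.today() - last_run[0].get_start_time()
            # tags is a list, so test membership instead of equality
            baseline = 1 if 'daily' in scenario['tags'] else 30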
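One thing to watch in the date arithmetic: Python refuses to subtract a timezone-aware datetime from a naive one. I am not certain whether every version of the client returns a naive or an aware value from get_start_time(), so the sketch below treats that as an open assumption and builds "now" to match whatever it receives. The helper names days_since and is_overdue are hypothetical.

```python
from datetime import datetime


def days_since(start, now=None):
    """Whole days elapsed since `start`, for naive or aware datetimes."""
    if now is None:
        # datetime.now(None) is naive local time; with a tzinfo it is aware.
        now = datetime.now(start.tzinfo)
    return (now - start).days


def is_overdue(start, baseline, now=None):
    """True when more than `baseline` whole days have passed since `start`."""
    return days_since(start, now) > baseline
```

For whole days, days_since(start) > baseline is the same test as date_diff.days - baseline >= 1, so a daily scenario is flagged once at least two full days have passed since its last start.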
And now, if the difference is greater than the desired frequency, let’s create a Slack message that will be sent to the team.
            if date_diff.days - baseline >= 1:
                tags_text = ", ".join(scenario['tags'])  # tags is a list, not a string
                message += "The following scenario has not run within the required time interval \n"
                message += ":red_circle: *" + scenario['name'] + "* in " + project['projectKey'] + " Project \n"
                message += "*Run frequency:* " + tags_text + "\n"
                message += "*Last run time:* " + last_run[0].get_start_time().strftime("%d-%m-%Y %H:%M:%S") + "\n"
                message += "Scenario has not run in the last *" + str(date_diff.days) + "* days! Please check. \n"
                message += "------------------------------------------------\n"
Finally, we pass the message to a scenario variable, which we will use in the reporter setup.
from dataiku.scenario import Scenario

scenario_obj = Scenario()
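The snippet above only creates the Scenario object. One way to hand the text over is its set_scenario_variables method, which takes keyword arguments. The sketch below wraps that call. The helper name publish_message and the variable name message are my assumptions, so use whatever name your reporter refers to.

```python
def publish_message(scenario_obj, message, variable_name="message"):
    """Expose the text as a scenario variable for the reporter to use."""
    # set_scenario_variables accepts keyword arguments (name=value).
    scenario_obj.set_scenario_variables(**{variable_name: message})
    return variable_name


# Inside a DSS scenario Python step this would be called roughly as:
#   from dataiku.scenario import Scenario
#   publish_message(Scenario(), message)
```

The reporter's message template can then refer to the variable by that name, assuming the usual ${message} style of variable expansion.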
I have one more check in place: a message is also sent when a scenario has not run yet even though it has a frequency tag. In that case get_last_runs() returns no runs, so we can catch it before reading the start time:
            if not last_run:
                not_run_msg = 'The following scenario has not run yet in Production. Please check if that is ok. \n'
                not_run_msg += ":question: *" + scenario['name'] + "* in " + project['projectKey'] + " Project \n"
                # tags is a list, so join it before concatenating
                not_run_msg += "*Run frequency:* " + ", ".join(scenario['tags']) + "\n"
                not_run_msg += "---------------------------------------------\n"
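Putting the fragments together, the whole check can be sketched as one function that takes the API client as a parameter, which also makes it testable with stand-in objects. The function name scan, the FREQUENCY_DAYS mapping and the shape of the returned tuples are my own choices for illustration, and scenarios without a frequency tag are skipped here, which is an assumption about the desired behaviour.

```python
from datetime import datetime

FREQUENCY_DAYS = {"daily": 1, "monthly": 30}  # assumed tag -> allowed gap in days


def scan(client, status="In production"):
    """Return (overdue, never_run) for tagged scenarios in matching projects."""
    overdue, never_run = [], []
    for project in client.list_projects():
        if project.get("projectStatus") != status:
            continue
        key = project["projectKey"]
        prj = client.get_project(key)
        for scenario in prj.list_scenarios():
            tags = scenario.get("tags") or []
            baseline = next(
                (FREQUENCY_DAYS[t] for t in tags if t in FREQUENCY_DAYS), None
            )
            if baseline is None:
                continue  # no frequency tag, nothing to check
            runs = prj.get_scenario(scenario["id"]).get_last_runs(limit=1)
            if not runs:
                never_run.append((key, scenario["name"]))
                continue
            start = runs[0].get_start_time()
            # Build "now" to match the returned value (naive or aware).
            days = (datetime.now(start.tzinfo) - start).days
            if days > baseline:
                overdue.append((key, scenario["name"], days))
    return overdue, never_run
```

In a scenario step you would call it as scan(dataiku.api_client()) and build the two Slack messages from the returned lists.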
The last thing we need is a scenario that runs all of this code and has a Slack reporter defined to send the constructed message to the team.
This scenario runs daily after all my daily flows are done. If there is any issue, I receive a Slack message like the ones built above.
And that’s it. It took me quite a while to find the right methods in the DSS REST API, but in the end I think it is not that complicated, and it really takes just a few lines of code to get this done. Hope you like it! Enjoy ✌🏻