SPECIAL SESSION #23
Measuring AI: Methods, Metrics and Standards to Benchmark (Physical) AI Systems
ORGANIZED BY
Matteo Matteucci
Politecnico di Milano, Italy
Agnes Delaborde
Laboratoire National de Métrologie et d'Essais, France
ABSTRACT
Measuring and evaluating the output of AI and Autonomous Robotics systems is not a straightforward process. This is due to the intrinsic property of agency possessed by the system under test: task execution is subject to the system’s own decisions and to its own (often imprecise and/or partial) perception of its environment. Measuring autonomous systems requires novel metrological approaches.
MAIN TOPICS
Topics of interest for this Special Session include but are not limited to:
- Designing benchmarks for autonomous systems: testbeds, protocols, evaluation metrics
- Task benchmarks and functionality benchmarks: combining module-level and system-level tests provides additional insight
- Real-world experience: the METRICS project
- What does “good performance” mean for a robot? When is Robot A better than Robot B? Why are these issues crucial to industrial adoption of autonomous systems?
- Bringing metrology to new grounds
ABOUT THE ORGANIZERS
Matteo Matteucci, PhD, is Full Professor at the Department of Electronics, Information and Bioengineering of Politecnico di Milano, Italy. His main research topics are pattern recognition, machine learning, machine perception, robotics, computer vision and signal processing. He is deeply involved in the field of robot benchmarking; he participated as a benchmarking expert in the FP7 EU-funded RoSta project. He was the Coordinator of the European project RAWSEEDS (2006–2009, http://www.rawseeds.org), a Specific Support Action in FP6 for the development of a benchmarking toolkit for multi-sensor SLAM algorithms. He was the Principal Investigator for Politecnico di Milano (partner) of the FP7 project RoCKIn (2013–2015, http://www.rockinrobotchallenge.eu/), which designed and ran two international competitions for the benchmarking of autonomous robots in the home environment (RoCKIn@Home) and at work (RoCKIn@Work).
Agnes Delaborde, PhD, is Head of the “AI evaluation and cybersecurity” department at the Laboratoire National de Métrologie et d'Essais. She has been a research engineer there since 2017, performing research on the evaluation of robotics with a focus on metrics and experimental plans for the evaluation of Human-Robot Interaction. She previously worked for 7 years at LIMSI-CNRS on Human-Robot Interaction modeling, specializing in experimental plans for data collection in robotic environments. She is a member of the UNM-81 standardization committee in charge of industrial robotics.