LucianPopaLP/ECUScan

Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data

This repository contains the Python code required to reproduce the results of our work, a link to the ECUPrint Aligned dataset with details on its contents, and the scripts used to align and trim the voltage data from the original ECUPrint dataset in order to create the ECUPrint Aligned dataset.

Dataset

The dataset linked in this repository represents the aligned and filtered input used by the Promoter-Censor algorithm proposed in our paper: Constraint-Guided Clustering for Identifying In-Vehicle Electronic Control Units from Voltage Data.

image

Motivation: The original ECUPrint dataset was created by our group three years ago for statistical analysis of ECU voltage samples, where slight misalignments or incomplete entries had only a limited impact. However, for machine-learning classifier benchmarking, we later observed that these inconsistencies led to distorted performance estimates. To address this issue, we sanitized the dataset by aligning and filtering the samples to ensure that the reported results reflect classifier performance rather than data irregularities.

Briefly, the modifications compared to the ECUPrint dataset are the following:

  • All bits from the 10 vehicles are aligned (the Python scripts used for alignment are available in the misc folder of this repo: align_data_JD.py for the John Deere data and align_data_car.py for the passenger-car data)
  • Samples are cut to exactly 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle
  • Acknowledgement bits are removed because they do not come from the ECU that is the sender of the ID
  • Incomplete bits lacking the falling edge were discarded to ensure dataset consistency

As a consequence, the following IDs were removed from the ECUPrint Aligned Dataset: 0x370 (Corsa), 0x511 (Duster), 0x4DE (Logan), 0x3A9, 0x43C, 0x171 (Ecosport) and 0x428 (ix35). In addition, 1 bit was removed for IDs 0x294 and 0x19B (Civic). The sanitized dataset retains 175,378 of the original 181,874 samples of the ECUPrint dataset.

Result: The Ground Truth resulting from the new methodology is slightly different from the original ECUPrint paper and is available in this pdf.

Independent corroboration: We also verified the number of ECUs in the Ford vehicles with a diagnostic tool (FORScan v2.3.65) together with the electrical wiring diagrams and it matches the number of ECUs that we identified using Constraint-Guided Clustering. Documents used for determination of electrical wiring diagrams are:

Data links

| File | Download | Notes |
|---|---|---|
| ECUPrint_Aligned.zip | link1 (University website), link2 (OneDrive) | Aligned ECUPrint CAN voltage samples, allocated per Vehicle, ECU and ID |

The bit-alignment concept, the applied filters, and the dataset structure and file contents are described below.

Data pre-processing

The ECUPrint raw voltage data was collected with a PicoScope 5000 Series oscilloscope from 10 vehicles, ranging from small cars to SUVs and a heavy-duty vehicle.

Sample Alignment and Trimming - For each frame carrying a specific ID the ECUPrint dataset contains isolated dominant bits, i.e., a transition from recessive to dominant state and back. In the original files from the ECUPrint dataset, the rising edges and falling edges from each dominant bit are not aligned at the same index, as shown in the images below for the samples corresponding to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).

image image

We aligned the bits for each ID at the same index. This led to a different number of time-steps per file, out of which we preserve only 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle. Examples of the newly aligned bits are shown in the images below. They correspond to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).

image image
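The alignment and trimming described above can be sketched as follows. This is a minimal illustration and not the code from align_data_car.py or align_data_JD.py: the function names, the threshold-crossing edge detector, and the target index `edge_index` are assumptions made for the example, and the toy traces use nominal CANH levels rather than recorded data.

```python
from typing import List, Optional

def find_rising_edge(samples: List[float], threshold: float) -> Optional[int]:
    """Return the index of the first sample that crosses `threshold` upwards."""
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            return i
    return None

def align_and_trim(samples: List[float], threshold: float,
                   edge_index: int, length: int) -> Optional[List[float]]:
    """Shift the trace so its rising edge sits at `edge_index`, then keep
    exactly `length` samples. Returns None if the trace cannot be aligned."""
    edge = find_rising_edge(samples, threshold)
    if edge is None:
        return None
    start = edge - edge_index
    if start < 0 or start + length > len(samples):
        return None
    return samples[start:start + length]

# Two toy traces whose rising edges sit at different indices (3 and 5).
trace_a = [2.5] * 3 + [3.5] * 6 + [2.5] * 3
trace_b = [2.5] * 5 + [3.5] * 6 + [2.5] * 1

aligned_a = align_and_trim(trace_a, threshold=3.0, edge_index=2, length=8)
aligned_b = align_and_trim(trace_b, threshold=3.0, edge_index=2, length=8)
print(aligned_a == aligned_b)  # True: both edges now sit at index 2
```

For the real data, `length` would be 1600 for passenger cars and 2500 for the heavy-duty vehicle.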

Removal of ACK Bits - While analyzing the bits from the ECUPrint dataset, we found that some files contain acknowledgement bits instead of genuine dominant bits. These were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for one of the Honda Civic files, whose samples for ID 19B differ from those in all other files, which contain the correct samples.

image

Removal of Non-Isolated Bits - For some IDs, the ECUPrint dataset does not contain single isolated dominant bits. For isolated bits, the file ends with the samples of the falling edge, whereas the voltage samples for these IDs remain at a continuous plateau level. These IDs were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for ID 511 from the Dacia Duster, whose samples differ from those of all other IDs from the same ECU.

image
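A check of this kind can be sketched as below. It is only an illustration under stated assumptions: the function name `has_falling_edge`, the threshold, and the `tail` window length are hypothetical, and the toy traces use nominal CANH levels (about 2.5 V recessive, 3.5 V dominant) rather than samples from the dataset.

```python
from typing import Sequence

def has_falling_edge(samples: Sequence[float], threshold: float,
                     tail: int = 50) -> bool:
    """True when the trace reaches the dominant level and its last `tail`
    samples are all back below `threshold` (i.e. the bit is isolated)."""
    if len(samples) < tail:
        return False
    went_dominant = any(v >= threshold for v in samples)
    ended_recessive = all(v < threshold for v in samples[-tail:])
    return went_dominant and ended_recessive

# Isolated dominant bit: recessive -> dominant -> recessive.
isolated = [2.5] * 100 + [3.5] * 200 + [2.5] * 100
# Non-isolated bit: the file ends while still on the dominant plateau.
plateau = [2.5] * 100 + [3.5] * 300

print(has_falling_edge(isolated, threshold=3.0))  # True
print(has_falling_edge(plateau, threshold=3.0))   # False
```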

Establishment of a New ECU Allocation - Based on a newer analysis of the voltage samples for all of the IDs, we found that the number of ECUs for some vehicles differs from the one determined in ECUPrint. Two additional ECUs were determined for the Ford Kuga and one additional ECU for the Ford Fiesta and Ford Ecosport, while there is one ECU fewer for the Hyundai i20. This is also due to the use of some voltage bits that were left as "Unclassified" in the original ECUPrint dataset and were not grouped with a particular ECU, since the clock skew could not be determined for those IDs based on the collected frames. The updated ECU allocation from the ECUPrint Aligned dataset provides the newly determined ground truth allocation of IDs to ECUs to the best of our knowledge.

| Number | Vehicle | Model year | No. of IDs | No. of identified ECUs | Voltage bits |
|---|---|---|---|---|---|
| (i) | Honda Civic | 2012-2017 | 43 | 6 | 14,567 |
| (ii) | Opel Corsa | 2006-2014 | 28 | 4 | 9,131 |
| (iii) | Hyundai i20 | 2014-2020 | 40 | 6 | 17,767 |
| (iv) | John Deere Tractor | 2010-2018 | 39 | 3 | 4,021 |
| (v) | Dacia Duster | 2010-2017 | 11 | 3 | 8,942 |
| (vi) | Dacia Logan | 2012-2019 | 45 | 6 | 31,297 |
| (vii) | Hyundai ix35 | 2009-2015 | 26 | 6 | 19,856 |
| (viii) | Ford Fiesta | 2017-2020 | 47 | 7 | 21,729 |
| (ix) | Ford Kuga | 2013-2019 | 70 | 11 | 28,024 |
| (x) | Ford Ecosport | 2018-2021 | 85 | 5 | 20,044 |
| Total | - | - | 432 | 57 | 175,378 |
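As a quick sanity check, the per-vehicle ECU and voltage-bit counts from the table can be summed and compared with the totals and with the 181,874 samples of the original ECUPrint dataset. The snippet below only re-uses numbers stated in this README; the variable names are illustrative.

```python
# (vehicle, identified ECUs, voltage bits) as listed in the table above
rows = [
    ("Honda Civic", 6, 14567),
    ("Opel Corsa", 4, 9131),
    ("Hyundai i20", 6, 17767),
    ("John Deere Tractor", 3, 4021),
    ("Dacia Duster", 3, 8942),
    ("Dacia Logan", 6, 31297),
    ("Hyundai ix35", 6, 19856),
    ("Ford Fiesta", 7, 21729),
    ("Ford Kuga", 11, 28024),
    ("Ford Ecosport", 5, 20044),
]

ORIGINAL_SAMPLES = 181874  # size of the original ECUPrint dataset

total_ecus = sum(ecus for _, ecus, _ in rows)
total_bits = sum(bits for _, _, bits in rows)
removed = ORIGINAL_SAMPLES - total_bits
retained_pct = round(100 * total_bits / ORIGINAL_SAMPLES, 2)

print(total_ecus, total_bits, removed, retained_pct)
# 57 175378 6496 96.43
```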

Dataset Content

The dataset is structured as described below. We provide the raw CAN voltage samples measured with the PicoScope at a sample interval of 2 nanoseconds (the sample rate was set to 500 MS/s). CAN voltages are collected for 10 vehicles (175,378 sampled bits) with ECU allocation. Data is allocated to specific ECUs based on the analysis in our work. Note that this allocation is the best we could ascertain from our analysis; we do not claim this separation to be absolute.
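The relation between sample rate, sample interval and the time span covered by one file can be worked out as below. The constant and function names are illustrative; the numbers (500 MS/s, 1600 and 2500 samples per file) are taken from this README.

```python
SAMPLE_RATE_MSPS = 500                        # mega-samples per second
SAMPLE_INTERVAL_NS = 1000 / SAMPLE_RATE_MSPS  # 2.0 ns between samples

def trace_duration_us(n_samples: int) -> float:
    """Time span covered by `n_samples` consecutive samples, in microseconds."""
    return n_samples * SAMPLE_INTERVAL_NS / 1000

def time_axis_ns(n_samples: int) -> list:
    """Timestamp of every sample in nanoseconds, starting at 0."""
    return [i * SAMPLE_INTERVAL_NS for i in range(n_samples)]

print(SAMPLE_INTERVAL_NS)       # 2.0
print(trace_duration_us(1600))  # 3.2 (passenger cars)
print(trace_duration_us(2500))  # 5.0 (heavy-duty vehicle)
```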


Folder structure

CAN voltage samples with ECU allocation
|
|------ DUSTER
|            |----ECU1
|            |----ECU2
|            |----ECU3
|------ LOGAN
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|------ ECOSPORT
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|------ FIESTA
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|            |----ECU7
|------ KUGA
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|            |----ECU7
|            |----ECU8
|            |----ECU9
|            |----ECU10
|            |----ECU11
|------ CIVIC
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|------ I20
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|------ IX35
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
|            |----ECU5
|            |----ECU6
|------ JOHNDEERE
|            |----ECU1
|            |----ECU2
|            |----ECU3
|------ CORSA
|            |----ECU1
|            |----ECU2
|            |----ECU3
|            |----ECU4
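A directory layout like the one above can be traversed with a few lines of Python, for example to count the files per ECU. This is a sketch under assumptions: the helper name `count_files_per_ecu` is hypothetical, the files are assumed to carry a `.csv` extension, and the search is recursive because the exact nesting below each ECU folder is not specified here. The demo builds a small throwaway tree instead of reading the real dataset.

```python
import tempfile
from pathlib import Path
from typing import Dict

def count_files_per_ecu(root: Path, pattern: str = "*.csv") -> Dict[str, Dict[str, int]]:
    """Map vehicle folder -> ECU folder -> number of files matching `pattern`."""
    summary: Dict[str, Dict[str, int]] = {}
    for vehicle_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[vehicle_dir.name] = {
            ecu_dir.name: sum(1 for _ in ecu_dir.rglob(pattern))
            for ecu_dir in sorted(p for p in vehicle_dir.iterdir() if p.is_dir())
        }
    return summary

# Demo on a temporary tree that mimics the vehicle/ECU nesting.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for vehicle, ecu, n_files in [("DUSTER", "ECU1", 2),
                                  ("DUSTER", "ECU2", 1),
                                  ("CORSA", "ECU1", 3)]:
        ecu_dir = root / vehicle / ecu
        ecu_dir.mkdir(parents=True)
        for i in range(n_files):
            (ecu_dir / f"sample_{i}.csv").touch()
    summary = count_files_per_ecu(root)

print(summary)
# {'CORSA': {'ECU1': 3}, 'DUSTER': {'ECU1': 2, 'ECU2': 1}}
```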

Voltage data is stored in csv format and has some metadata included before the raw voltage samples. The metadata contains the following information in the first rows from each file: 



[ID (hexadecimal)],
[ID (decimal)],
[DLC (decimal)],
[Timestamp, Channel 1 (CANH), Channel 2 (CANL)],
[Measurement unit],

The metadata is followed by the actual raw voltage samples:

[Voltage data (1600 samples/file for cars and 2500 samples/file for the John Deere tractor)].
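A file with this layout could be parsed roughly as follows. The sketch assumes that each of the five metadata items occupies exactly one row, in the order listed, and that every data row holds timestamp, CANH and CANL separated by commas; the function name `read_voltage_file`, the trailing commas, and the demo content (including the DLC and units) are made up for illustration, so check them against an actual file before relying on them.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

N_METADATA_ROWS = 5  # assumption: one row per metadata item listed above

def read_voltage_file(handle: TextIO) -> Tuple[Dict[str, object], List[Tuple[float, ...]]]:
    """Split a file into a metadata dict and (timestamp, CANH, CANL) tuples."""
    # Drop rows that are completely empty.
    rows = [row for row in csv.reader(handle) if any(cell.strip() for cell in row)]
    header, data = rows[:N_METADATA_ROWS], rows[N_METADATA_ROWS:]
    metadata = {
        "id_hex": header[0][0].strip(),
        "id_dec": header[1][0].strip(),
        "dlc": header[2][0].strip(),
        "columns": [c.strip() for c in header[3] if c.strip()],
        "units": [c.strip() for c in header[4] if c.strip()],
    }
    samples = [tuple(float(c) for c in row[:3]) for row in data]
    return metadata, samples

# Synthetic file content (not taken from the dataset).
demo = io.StringIO(
    "4F1,\n"
    "1265,\n"
    "8,\n"
    "Timestamp,Channel 1 (CANH),Channel 2 (CANL),\n"
    "(ns),(V),(V),\n"
    "0,2.51,2.49\n"
    "2,2.52,2.48\n"
)
metadata, samples = read_voltage_file(demo)
print(metadata["id_hex"], len(samples), samples[0])
# 4F1 2 (0.0, 2.51, 2.49)
```

Comparing `int(metadata["id_hex"], 16)` with `int(metadata["id_dec"])` is a cheap consistency check on the header of each file.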

Prerequisites

The steps to run the provided code are detailed below. All sections of the notebook then need to be run in order to obtain the results presented in our work.

To run the code, ensure you have the following installed:

Code

Once all prerequisites are installed, the following variables from the python code need to be customized:

  • data_path <- set to your location following the example with the provided path
  • scaling <- set to 0 (without) or 1 (with) to disable or enable the rescaling of voltage data, which supports clustering improvements
  • nrows <- set to 1600 (passenger cars) or 2400 (heavy-duty vehicle) to choose the number of data samples per file
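How these three variables might interact is sketched below. This is not the notebook's code: the path is a placeholder, the helper names are hypothetical, and min-max rescaling is only an assumed stand-in, since the rescaling method used in the repository is not described here.

```python
from typing import List, Sequence

data_path = "/path/to/ECUPrint_Aligned/"  # placeholder, replace with your location
scaling = 1                               # 0 = without rescaling, 1 = with rescaling
nrows = 1600                              # samples per file for passenger cars

def min_max_rescale(samples: Sequence[float]) -> List[float]:
    """Map samples linearly to [0, 1] (assumed method, for illustration only)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(v - lo) / (hi - lo) for v in samples]

def prepare(samples: Sequence[float], scaling: int, nrows: int) -> List[float]:
    """Keep the first `nrows` samples and rescale them when `scaling` is 1."""
    trimmed = list(samples[:nrows])
    return min_max_rescale(trimmed) if scaling == 1 else trimmed

toy = [2.5, 3.0, 3.5, 9.9]
print(prepare(toy, scaling=1, nrows=3))  # [0.0, 0.5, 1.0]
print(prepare(toy, scaling=0, nrows=3))  # [2.5, 3.0, 3.5]
```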

Publication

Feel free to use our dataset for research purposes by giving credit to our paper below.

B. Groza, P. Iosif and L. Popa, "Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data". pdf

@article{groza26constraint,
title={Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data},
author={Groza, Bogdan and Iosif, Patricia and Popa, Lucian},
conference={},
year={2026},
publisher={}
}

Contacts

  • lucian.popa [at] aut.upt.ro
  • bogdan.groza [at] upt.ro
