Datasets


Download

The dataset consists of hourly average sensor measurements of eleven variables (nine input and two target variables). There are 36 733 instances in total, collected over five years. The nine input measurements (independent variables) fall into two groups: ambient variables (e.g. temperature, humidity, pressure) and process parameters (e.g. turbine energy yield, air filter difference pressure).
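As a rough illustration of the input/target grouping described above, the sketch below splits one row into ambient inputs, process inputs, and targets. The column names are hypothetical and cover only the variables named in the description (the real file has nine inputs and may use different headers); the sample row is made up and stands in for the downloaded file.

```python
import csv
import io

# Hypothetical column names; the headers in the real file may differ.
AMBIENT = ["ambient_temperature", "ambient_humidity", "ambient_pressure"]
PROCESS = ["turbine_energy_yield", "air_filter_difference_pressure"]
TARGETS = ["CO", "NOx"]


def split_row(row):
    """Split one CSV row (a dict) into input features and target values."""
    inputs = {name: float(row[name]) for name in AMBIENT + PROCESS}
    targets = {name: float(row[name]) for name in TARGETS}
    return inputs, targets


# Made-up sample data standing in for the downloaded file.
SAMPLE = io.StringIO(
    "ambient_temperature,ambient_humidity,ambient_pressure,"
    "turbine_energy_yield,air_filter_difference_pressure,CO,NOx\n"
    "10.0,80.0,1015.0,134.0,3.5,1.5,70.0\n"
)

rows = [split_row(r) for r in csv.DictReader(SAMPLE)]
inputs, targets = rows[0]
print(sorted(targets))
```

Keeping the groups as separate lists makes it easy to train on ambient variables only, process parameters only, or both.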

Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS. Kaya, H.; Tufekci, P.; and Uzun, E. Turkish Journal of Electrical Engineering & Computer Sciences, 27(6): 4783-4796. November 2019.

Download

The dataset has 51 web pages for 50 different websites, covering languages including Turkish, Bosnian, English, Spanish, German, and Albanian, and fields such as article, dictionary, health, movie, and newspaper. Moreover, 4-5 CSS selectors were prepared for each website. Click for collector source code (in Python)
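To give a feel for how a CSS selector can be turned into a regular expression, here is a toy sketch. It is not the generator from the paper cited below: the function name `selector_to_regex` is hypothetical, and it only handles `tag`, `tag.class`, and `tag#id` selectors with double-quoted attributes, matching the opening tag only.

```python
import re


def selector_to_regex(selector):
    """Compile a regex matching the opening tag of a simple CSS selector."""
    parsed = re.fullmatch(r"([A-Za-z][\w-]*)(?:([.#])([\w-]+))?", selector)
    if parsed is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, kind, value = parsed.groups()
    if kind is None:
        # Tag only: any opening tag with this name.
        pattern = rf"<{tag}\b[^>]*>"
    elif kind == "#":
        # The id attribute must equal the value exactly.
        pattern = rf'<{tag}(?:\s[^>]*)?\sid="{re.escape(value)}"[^>]*>'
    else:
        # The class name may sit among other class names in the attribute.
        pattern = (
            rf'<{tag}(?:\s[^>]*)?\sclass="[^"]*'
            rf'(?<![\w-]){re.escape(value)}(?![\w-])[^"]*"[^>]*>'
        )
    return re.compile(pattern, re.IGNORECASE)


html = '<div id="main"><p class="lead intro">Hi</p><p class="misleading">No</p></div>'
print(selector_to_regex("p.lead").findall(html))
print(selector_to_regex("div#main").search(html).group(0))
```

The lookarounds around the class name keep `p.lead` from matching `class="misleading"`. Extracting the element's content would additionally require finding the matching closing tag, which this sketch does not attempt.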

A regular expression generator based on CSS selectors for efficient extraction from HTML pages. Uzun, E. Turkish Journal of Electrical Engineering & Computer Sciences, 28: 3389-3401. 2020. - DOI: 10.3906/elk-2004-67

Download 1: Features

Download 2: Features + Textual data

This dataset was obtained from 200 websites. These websites come from 58 different countries, including Albania, Australia, Austria, Azerbaijan, Bahrain, Bangladesh, Belarus, Bolivia, Bosnia and Herzegovina, Botswana, Brazil, Bulgaria, Cameroon, Canada, China, Cuba, Czechia, Egypt, Finland, France, Germany, Greece, Guatemala, India, Indonesia, Iran, Italy, Japan, Jordan, Kazakhstan, Kyrgyzstan, Laos, Latvia, Liberia, Macedonia, Madagascar, Malaysia, Mexico, Montenegro, Nepal, New Zealand, Pakistan, Philippines, Romania, Russia, Slovakia, Spain, Tanzania, Turkey, Uganda, Ukraine, USA, Uzbekistan, Venezuela, Vietnam, Zambia, and Zimbabwe. 100 web pages were downloaded from each website, giving 20,000 web pages in total. A dataset of 635,015 images was created from these web pages, of which 22,682 are relevant images. In this dataset, 30 features are created for each image.
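The counts above imply a strongly imbalanced dataset, which is worth keeping in mind when training a classifier on it. A short calculation using only the figures stated in the description:

```python
websites = 200
pages_per_website = 100
total_images = 635_015
relevant_images = 22_682

total_pages = websites * pages_per_website
irrelevant_images = total_images - relevant_images
relevant_share = relevant_images / total_images
images_per_page = total_images / total_pages

print(total_pages)                 # 20000
print(irrelevant_images)           # 612333
print(f"{relevant_share:.2%}")     # 3.57%
print(f"{images_per_page:.2f}")    # 31.75
```

Roughly 3.6% of the images are relevant, so plain accuracy would be a misleading metric here; a classifier that labels every image as irrelevant would already score above 96%.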

Click for the dataset creator code

A Novel Web Scraping Approach Using the Additional Information Obtained from Web Pages. Uzun, E. IEEE Access, 8: 61726-61740. 2020. - DOI: 10.1109/ACCESS.2020.2984503

Dataset 1: Click to download the datasets and statistical results (2014-2020: EPL, Serie A, Bundesliga).

This dataset contains features based on tactical formations and the market values of the players in each match. Data were collected from the matches of three major leagues (Serie A, the Premier League, and the Bundesliga) between 2014 and 2020. The seasons from 2014 to 2019 form the training dataset, and the 2019-2020 season is the testing dataset. Moreover, we prepared a sub-testing dataset by eliminating some weeks.
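The season-based split described above can be sketched as follows. The field names `season` and `week`, the helper names, and the sample records are all hypothetical; the description does not say which weeks were eliminated for the sub-testing dataset, so the weeks to drop are left as a parameter.

```python
def split_by_season(matches, test_season="2019-2020"):
    """Hold out one season for testing and train on all the others."""
    train = [m for m in matches if m["season"] != test_season]
    test = [m for m in matches if m["season"] == test_season]
    return train, test


def drop_weeks(matches, weeks_to_drop):
    """Build a sub-testing set by removing the given match weeks."""
    excluded = set(weeks_to_drop)
    return [m for m in matches if m["week"] not in excluded]


# Made-up records standing in for the real feature rows.
matches = [
    {"season": "2017-2018", "week": 1},
    {"season": "2018-2019", "week": 5},
    {"season": "2019-2020", "week": 1},
    {"season": "2019-2020", "week": 2},
]

train, test = split_by_season(matches)
sub_test = drop_weeks(test, weeks_to_drop=[1])
print(len(train), len(test), len(sub_test))
```

Splitting by season rather than at random keeps every test match later in time than every training match, which avoids leaking future information into the model.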

Dataset 2: Click to download the pre-processed datasets (2011-2020: EPL, Serie A).

This dataset contains features based on tactical formations and the market values of the players in each match. Data were collected from the matches of two major leagues (Serie A and the Premier League) between 2011 and 2020. The seasons from 2011 to 2019 form the training dataset, and the 2019-2020 season is the testing dataset.



Click for the BibTeX of the papers