Datasets


Download

The dataset comprises hourly averaged sensor measurements of eleven variables (nine input and two target variables), with a total of 36,733 instances collected over five years. The nine input measurements (independent variables) fall into two groups: ambient variables (e.g. temperature, humidity, pressure) and process parameters (e.g. turbine energy yield, air filter difference pressure).
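As a rough illustration of the nine-input, two-target layout described above, the following sketch splits CSV rows into input and target parts using only the standard library. The column names (`AT`, `AP`, ..., `CO`, `NOX`) and the two sample rows are assumptions made for the example and are not given in the description; check them against the header of the downloaded files.

```python
import csv
import io

# Assumed target column names; not stated in the dataset description.
TARGETS = ("CO", "NOX")


def split_inputs_targets(csv_text, targets=TARGETS):
    """Split each CSV row into (inputs, targets) dicts of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    inputs, outputs = [], []
    for row in reader:
        outputs.append({k: float(row[k]) for k in targets})
        inputs.append({k: float(v) for k, v in row.items() if k not in targets})
    return inputs, outputs


# Made-up sample rows, for illustration only (not real measurements).
sample = """AT,AP,AH,AFDP,GTEP,TIT,TAT,TEY,CDP,CO,NOX
4.5,1018.7,83.6,3.5,23.9,1086.2,549.8,134.6,11.9,0.32,81.9
4.2,1018.3,84.2,3.5,23.9,1086.1,550.0,134.6,11.9,0.44,82.3
"""

X, y = split_inputs_targets(sample)
print(len(X[0]), len(y[0]))  # 9 2
```

To use it on a real file, read the file contents into a string and pass that string to `split_inputs_targets`.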

Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS. Kaya, H.; Tufekci, P.; and Uzun, E. Turkish Journal of Electrical Engineering & Computer Sciences, 27(6): 4783-4796. November 2019.

Download

The dataset contains 51 web pages from 50 different websites, in languages including Turkish, Bosnian, English, Spanish, German, and Albanian, and covering fields such as article, dictionary, health, movie, and newspaper. In addition, 4-5 CSS selectors were prepared for each website. Click for the collector source code (in Python).
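To give a feel for how a CSS selector such as those prepared for each website can be matched with a regular expression, here is a minimal sketch. It is not the generator from the cited paper: the function name `selector_to_regex` is hypothetical, only simple `tag`, `tag.class`, and `tag#id` selectors are handled, attributes are assumed to be double-quoted, and elements that nest the same tag are not handled correctly.

```python
import re


def selector_to_regex(selector):
    """Build a regex for a simple `tag`, `tag.class` or `tag#id` selector.

    Hypothetical helper for illustration; nested same-name tags are
    not supported because the content is matched lazily.
    """
    m = re.fullmatch(r"([a-zA-Z][\w-]*)(?:([.#])([\w-]+))?", selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, kind, name = m.groups()
    if kind == ".":
        # The class attribute may hold several space-separated names.
        attr = rf'[^>]*\bclass="(?:[^"]*\s)?{re.escape(name)}(?:\s[^"]*)?"'
    elif kind == "#":
        attr = rf'[^>]*\bid="{re.escape(name)}"'
    else:
        attr = ""
    # Capture everything up to the first matching closing tag.
    return re.compile(
        rf"<{tag}\b{attr}[^>]*>(.*?)</{tag}>", re.DOTALL | re.IGNORECASE
    )


html = '<div class="nav">menu</div><div class="main content"><p>Hello</p></div>'
matches = selector_to_regex("div.content").findall(html)
print(matches)  # ['<p>Hello</p>']
```

A full HTML parser remains the safer choice for arbitrary pages; a regex of this kind only suits selectors and markup simple enough to fit the assumptions above.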

A regular expression generator based on CSS selectors for efficient extraction from HTML pages. Uzun, E. Turkish Journal of Electrical Engineering & Computer Sciences, 28: 3389-3401. 2020. - DOI: 10.3906/elk-2004-67

Download 1: Features

Download 2: Features + Textual data

This dataset was obtained from 200 websites spanning 58 different countries, including Albania, Australia, Austria, Azerbaijan, Bahrain, Bangladesh, Belarus, Bolivia, Bosnia and Herzegovina, Botswana, Brazil, Bulgaria, Cameroon, Canada, China, Cuba, Czechia, Egypt, Finland, France, Germany, Greece, Guatemala, India, Indonesia, Iran, Italy, Japan, Jordan, Kazakhstan, Kyrgyzstan, Laos, Latvia, Liberia, Macedonia, Madagascar, Malaysia, Mexico, Montenegro, Nepal, New Zealand, Pakistan, Philippines, Romania, Russia, Slovakia, Spain, Tanzania, Turkey, Uganda, Ukraine, USA, Uzbekistan, Venezuela, Vietnam, Zambia, and Zimbabwe. For each website, 100 web pages were downloaded, giving 20,000 web pages in total. A dataset of 635,015 images was created from these web pages, of which 22,682 are relevant images. Each image is described by 30 features.
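The figures above imply a strongly imbalanced dataset, which is worth keeping in mind when training a classifier on it. The short calculation below uses only the numbers quoted in the description; the variable names are chosen for the example.

```python
websites = 200
pages_per_website = 100
total_images = 635_015
relevant_images = 22_682

total_pages = websites * pages_per_website
relevant_share = relevant_images / total_images
irrelevant_per_relevant = (total_images - relevant_images) / relevant_images
images_per_page = total_images / total_pages

print(total_pages)                       # 20000
print(f"{relevant_share:.2%}")           # 3.57%
print(f"{irrelevant_per_relevant:.1f}")  # 27.0
print(f"{images_per_page:.2f}")          # 31.75
```

Roughly one image in 28 is relevant, so plain accuracy is likely to be a misleading metric here.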

Click for the code used to create this dataset

A Novel Web Scraping Approach Using the Additional Information Obtained from Web Pages. Uzun, E. IEEE Access, 8: 61726-61740. 2020. - DOI: 10.1109/ACCESS.2020.2984503



Click for the BibTeX entries of the papers