Developing AI-Compatible Instructional Design Datasets: A Computer Science Research Project

Term: 
2024-2025 Summer
Faculty Department of Project Supervisor: 
Foundations Development Program
Number of Students: 
5

This research project explores the emerging intersection of artificial intelligence and instructional design through focused data development for an AI-powered training design platform. Sophomore, junior and senior Computer Science students will engage in research that addresses the fundamental data challenges in applying machine learning to educational content creation.
The project will focus on two primary research areas: (1) the creation and curation of high-quality instructional design datasets for AI model training, and (2) the conversion and alignment of the collected data into formats, such as JSONL, that are compatible with the AI models to be trained. Together, these areas address two prerequisites for effective AI systems in instructional design: data quality and technical compatibility with modern machine learning frameworks.
Students participating in the dataset development track will design and implement systematic approaches for collecting, classifying, and validating instructional materials across diverse domains. This will involve developing web scraping tools with appropriate ethical safeguards, creating annotation systems for labeling effective instructional patterns, and implementing quality assessment metrics to ensure dataset integrity.
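As a rough illustration of what a quality assessment metric for this track might look like, the sketch below computes a duplicate rate and an unlabeled rate over a list of collected items. The `text` and `label` keys, the `quality_report` function, and the choice of metrics are assumptions made for this example only; the project does not prescribe them.

```python
import hashlib
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def quality_report(items):
    """Summarize duplicates and label coverage.

    `items` is a list of dicts with a 'text' key and an optional 'label' key
    (a hypothetical record layout used only for this sketch).
    """
    seen = set()
    duplicates = 0
    labels = Counter()
    for item in items:
        digest = hashlib.sha256(normalize(item["text"]).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1  # same normalized text was already collected
        else:
            seen.add(digest)
        labels[item.get("label") or "unlabeled"] += 1
    total = len(items)
    return {
        "total": total,
        "duplicate_rate": duplicates / total if total else 0.0,
        "unlabeled_rate": labels["unlabeled"] / total if total else 0.0,
        "label_counts": dict(labels),
    }
```

Exact-match hashing only catches copies that differ in case or spacing; detecting near-duplicates or paraphrases would need a different technique.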
In the data conversion and alignment track, students will develop pipelines that transform collected instructional design materials into structured formats suited to AI training. They will work with JSONL and other machine-readable formats, implementing data cleaning techniques, designing consistent schemas for diverse training materials, and creating validation tools to ensure format compliance and data quality.
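A minimal sketch of such a pipeline is shown here, assuming a hypothetical four-field schema (`id`, `domain`, `objective`, `content`). The field names and the helper functions `clean_record`, `validate_record`, and `to_jsonl` are illustrative assumptions, not part of the project specification. The sketch cleans each record, checks it against the schema, and writes passing records as JSONL, that is, one JSON object per line.

```python
import json

# Hypothetical schema for this sketch: field name -> expected Python type.
REQUIRED_FIELDS = {"id": str, "domain": str, "objective": str, "content": str}


def clean_record(raw):
    """Collapse whitespace in string values and drop empty or null fields."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = " ".join(value.split())
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def validate_record(record):
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


def to_jsonl(records):
    """Serialize valid records as JSONL text and collect the rejected ones."""
    lines, rejected = [], []
    for raw in records:
        record = clean_record(raw)
        problems = validate_record(record)
        if problems:
            rejected.append((raw, problems))
        else:
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines), rejected
```

Returning the rejected records together with their problems, rather than silently dropping them, keeps the cleaning step auditable, which matters when the output is later used to judge dataset quality.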
Throughout this research experience, students will apply their knowledge of programming, data structures, algorithms, and data processing while gaining hands-on experience with data pipeline development and preparation for AI systems. The project provides valuable experience in addressing real-world challenges at the intersection of technology and education, preparing students for careers or advanced study in artificial intelligence, educational technology, and data engineering.
Deliverables will include documented datasets, data processing pipelines, format conversion tools, and a comprehensive analysis of the quality and usability of the prepared data for AI model training in instructional design applications.
Research Areas:

  • Artificial Intelligence
  • Natural Language Processing
  • Educational Technology
  • Human-Computer Interaction
  • Data Science

Required Technical Skills

  • Web Development Basics: HTML/CSS/JavaScript for interface components and testing
  • Version Control: Experience with Git for collaborative development

Desired Abilities

  • Problem-Solving: Ability to decompose complex problems into manageable components
  • Self-Directed Learning: Willingness to learn new technologies and concepts independently
  • Critical Thinking: Evaluating effectiveness of approaches and identifying limitations
  • Communication Skills: Ability to document work and present findings clearly
  • Collaborative Mindset: Comfortable working in teams with different responsibilities
  • Attention to Detail: Particularly important for data annotation and model evaluation
  • Data Analysis: Ability to interpret results and draw meaningful conclusions
  • Research Ethics: Awareness of ethical considerations in research involving AI and data
  • Literature Review: Capacity to survey existing research and identify relevant work

 
Besides the skills above, to apply to this Research Project:
- Applicants must be at least in their second year of study in Computer Science.
- Please submit a brief (250-word maximum) statement addressing:

  1. Why you're interested in this research
  2. Relevant coursework or projects you've completed
  3. Which aspects of data collection or processing most interest you 

Send your statement to ilkem.kayican@sabanciuniv.edu with the subject line "PURE Summer 2025".

Related Areas of Project: 
Computer Science and Engineering