Framework for Easy Deployment of Compressed and Optimized Models (FEDCOM)
A Collaborative Initiative Between Odia Generative AI and the Norwegian BioAI Lab for Deep Learning Model Compression and Deployment
Problem Statement
Compressing deep learning models to make them suitable for resource-constrained environments is a challenging task. The process typically involves applying advanced techniques like quantization, pruning, and distillation, which demand substantial technical expertise and extensive experimentation. These complexities often hinder users from deploying efficient models on devices with limited computational power or memory, thereby creating a gap between the potential of cutting-edge AI and its practical usability in real-world applications.
Solution
Our project offers a structured framework that simplifies the compression of deep learning models and addresses the challenges of deploying them in resource-constrained environments. The workflow lets users load pre-trained models; select and configure compression techniques such as quantization, pruning, and distillation; apply optional fine-tuning or calibration to recover accuracy; and evaluate the result against standard performance metrics. The framework also exports deployment-ready models and provides visual insight into the trade-offs between model size, inference speed, and accuracy. Together, these features streamline the compression process and make it accessible and efficient for users with diverse levels of expertise.
Scope
The initial phase of the framework development will focus on establishing a robust foundation with the following features:
Model Support
- Support multimodal deep learning models.
Compression Techniques
- Quantization: Reduce precision to optimize memory and computational efficiency.
- Pruning: Remove redundant weights and structures to minimize model size.
- Knowledge Distillation: Implement teacher-student training workflows to create lightweight models without significant loss in performance. (Minimal sketches of all three techniques follow this list.)
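
To make these techniques concrete, the sketch below shows minimal PyTorch versions of each. The three-layer model and all hyperparameters (bit width, sparsity ratio, distillation temperature) are illustrative placeholders, not FEDCOM defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# Placeholder "pre-trained" model shared by all three sketches.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# 1. Quantization: store Linear weights in int8; activations are
#    quantized on the fly at inference time (post-training, dynamic).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 2. Pruning: zero out the 50% of weights with the smallest L1 magnitude,
#    then bake the mask permanently into the weight tensor.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# 3. Knowledge distillation: the student matches the teacher's softened
#    logits (KL term) while still fitting the ground-truth labels.
def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```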
Workflow Capabilities
- Load pre-trained models and allow users to configure compression parameters (e.g., sparsity level, quantization bits).
- Include optional fine-tuning or calibration steps to recover accuracy post-compression.
- Evaluate compressed models with metrics like inference speed, memory usage, and task accuracy (a measurement sketch follows this list).
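
As a sketch of the evaluation step, the helpers below measure two of the listed metrics, serialized model size and mean inference latency, for any PyTorch model. The warm-up and iteration counts are arbitrary choices, not framework settings.

```python
import io
import time
import torch

def model_size_mb(model: torch.nn.Module) -> float:
    # Serialize the state dict to an in-memory buffer and report its size.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

@torch.no_grad()
def mean_latency_ms(model: torch.nn.Module, example: torch.Tensor,
                    iters: int = 100, warmup: int = 10) -> float:
    model.eval()
    for _ in range(warmup):   # discard one-time setup costs
        model(example)
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    return (time.perf_counter() - start) / iters * 1e3
```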
Minimal Visualization and Reporting
- Provide basic visualizations to highlight the trade-offs between model size, speed, and accuracy (see the plotting sketch after this list).
- Offer concise reports summarizing the impact of applied compression techniques.
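
A minimal matplotlib version of such a trade-off view might look like the following. The size and accuracy numbers are placeholders standing in for values produced by the evaluation step, not measured results.

```python
import matplotlib.pyplot as plt

# Placeholder (size in MB, top-1 accuracy in %) pairs; real values would
# come from the evaluation step above.
variants = {
    "baseline": (44.6, 92.1),
    "int8": (11.3, 91.4),
    "pruned 50%": (22.5, 90.8),
}

fig, ax = plt.subplots()
for name, (size, acc) in variants.items():
    ax.scatter(size, acc)
    ax.annotate(name, (size, acc))
ax.set_xlabel("Model size (MB)")
ax.set_ylabel("Top-1 accuracy (%)")
ax.set_title("Compression trade-off: size vs. accuracy")
fig.savefig("tradeoff.png", dpi=150)
```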
Deployment Readiness
- Export compressed models in deployment-ready formats compatible with resource-constrained environments.
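
One common deployment-ready target is ONNX. The sketch below exports a model with a dynamic batch dimension; the model and input shape are the same illustrative placeholders used above, not part of the framework itself.

```python
import torch
import torch.nn as nn

# Placeholder compressed model; in practice this is the output of the
# compression steps above.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
dummy = torch.randn(1, 784)  # example input matching the model's expected shape

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch sizes at inference
)
```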
Ease of Use
- Ensure a simple and intuitive interface for configuring and running compression workflows, making it accessible to users with varying expertise levels.
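
The interface itself is still to be designed; as a purely hypothetical sketch, a single configuration object plus one entry point could look like this. All names here (`CompressionConfig`, `compress`) are illustrative, not an existing FEDCOM API.

```python
from dataclasses import dataclass
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

@dataclass
class CompressionConfig:
    technique: str = "quantization"  # "quantization" or "pruning"
    sparsity: float = 0.5            # fraction of weights to prune

def compress(model: nn.Module, cfg: CompressionConfig) -> nn.Module:
    """Hypothetical one-call front end dispatching to a compression technique."""
    if cfg.technique == "quantization":
        return torch.quantization.quantize_dynamic(
            model, {nn.Linear}, dtype=torch.qint8
        )
    if cfg.technique == "pruning":
        for m in model.modules():
            if isinstance(m, nn.Linear):
                prune.l1_unstructured(m, name="weight", amount=cfg.sparsity)
                prune.remove(m, "weight")
        return model
    raise ValueError(f"Unknown technique: {cfg.technique!r}")
```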
High-Level Architecture
Modules to be developed
Team
- Dr. Sonal Khosla, Researcher, OdiaGenAI (Project Coordinator)
- AR Kamaldeen, AI Engineer, OdiaGenAI
- Debasish Dhal, AI Engineer, OdiaGenAI
- Sambit Sekhar, Founder, OdiaGenAI
- Prof. Dilip Prasad, Professor, UiT The Arctic University of Norway
- SK Sahid, AI Engineer, OdiaGenAI
- Sahil Khan, Intern, OdiaGenAI
- Pritiprava Mishra, Researcher, OdiaGenAI
Contact
Feel free to reach out to us with any questions about the project or collaboration opportunities.