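Printing and Digital Media Technology Study

2026, Issue 02, No. 241, pp. 270-280

A Facial Expression Recognition Method with Spatial Feature Fusion Enhanced by a Large Vision Model

SONG Chen

1. College of Information Science and Technology & College of Artificial Intelligence, Nanjing Forestry University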

DOI: 10.19370/j.cnki.cn10-1886/ts.2026.02.029

Released online: 2026-04-10

Published: 2026-04-10

Abstract:

Feature extraction methods for facial expression recognition are easily affected by noise, which leads to weak feature extraction and low accuracy. To address this, this study proposes VCL-EmoNet, an expression recognition model that fuses spatial features and is enhanced by a large vision model. First, a pre-trained ViT (Vision Transformer) is introduced through transfer learning at the model construction stage, so that the model obtains rich general-purpose features and global dependency features early on. Second, convolution kernels of different sizes are used to enlarge the receptive field and extract multi-scale image features, and the Sobel and Laplacian operators are combined to capture edge and strong-texture features. Third, spatial and channel attention mechanisms are introduced so that the model focuses on key image regions, captures key information effectively, and adjusts its weight distribution appropriately. Finally, a joint loss function that uses the prediction loss of the large vision model as a constraint assists in optimizing the image classification task. In experiments on the AffectNet, FER2013, and FERPlus datasets, the model achieves recognition accuracies of 72.5%, 74.8%, and 91.1%, respectively. Compared with traditional algorithms, the model's feature extraction ability is markedly improved and its facial expression recognition results are more accurate.

Keywords: Facial expression recognition; Large vision model; Multi-scale residual convolution; Joint loss function; Transformer
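
As one illustration of the edge-feature step described in the abstract, the sketch below applies the textbook 3x3 Sobel and Laplacian kernels to a toy image in plain Python. The kernel values are the standard definitions. The helper names (`filter3x3`, `edge_features`), the no-padding choice, and the toy image are assumptions made for this example, since the abstract does not say how VCL-EmoNet implements these operators or fuses their outputs.

```python
import math

# Standard 3x3 kernels (textbook definitions, not taken from the paper).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def filter3x3(image, kernel):
    """Cross-correlate a 2D list with a 3x3 kernel (no padding)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def edge_features(image):
    """Return (Sobel gradient magnitude, Laplacian response) for an image.

    Hypothetical helper: how the paper combines these maps is not stated.
    """
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = [
        [math.sqrt(x * x + y * y) for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]
    laplacian = filter3x3(image, LAPLACIAN)
    return magnitude, laplacian


# Toy 5x5 image: dark left columns, bright right columns (a vertical step edge).
toy = [[0, 0, 10, 10, 10] for _ in range(5)]
sobel_mag, lap = edge_features(toy)
```

On this toy input, the Sobel magnitude is nonzero only for windows that straddle the step edge, and the Laplacian response changes sign across the edge and vanishes in the flat region. In a real pipeline such maps would typically be computed with a library routine and combined with the learned features; that fusion step is left out here because the abstract does not specify it.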

For the full text, visit cnki.net.

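Basic information:

CLC number: TP391.41; TP18

Citation:

[1] SONG Chen. A Facial Expression Recognition Method with Spatial Feature Fusion Enhanced by a Large Vision Model[J]. Printing and Digital Media Technology Study, 2026, No.241(02): 270-280. DOI: 10.19370/j.cnki.cn10-1886/ts.2026.02.029.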
