ABSTRACT
The design and implementation of a portable meeting recorder is presented. Composed of an omni-directional video camera with four-channel audio capture, the system saves a view of all the activity in a meeting and the directions from which people spoke. Subsequent analysis computes metadata that includes video activity analysis of the compressed data stream and audio processing that helps locate events that occurred during the meeting. Automatic identification of the room in which each meeting occurred allows for efficient navigation of a collection of recorded meetings. A user interface is populated from the metadata description so that users can browse a recording and locate significant events.
Index Terms
- Portable meeting recorder