Audio Files Present Challenges For Computer Forensics and E-Discovery
Discoverable data can take myriad forms, such as email, instant messaging data, data generated by business computer applications, faxes, and text messages. But key sources also include voice sent over networks or stored on digital devices, such as VoIP (Voice over Internet Protocol), voice mail, audio-video, web conferencing, whiteboarding, and .wav files. Such integrated communications can trim operating budgets.
Savings come from, among other things, eliminating long-distance charges by using VoIP, holding meetings in a virtual environment instead of traveling to them, and letting an instructor or team share a whiteboard from disparate physical locations instead of traveling to far-away classes. Savings like these accrue to the 26% of businesses that have adopted such communications. But when litigation demands discoverable data, .wav and other voice-based files can be difficult and costly for a computer forensics expert or an e-discovery system to search and index.
There are many tools designed for searching text files, and even for recovering text from deleted files. These range from computer forensic suites such as EnCase and AccessData Forensic Toolkit, each of which costs thousands of dollars, to open source tools, including hex editors, that cost the user nothing at all. The more extensive packages may be less expensive in the long run once billable human hours are added to the mix.
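To illustrate why plain text is comparatively easy to search, here is a minimal sketch of a keyword scan over raw bytes, the kind of thing one might otherwise do by eye in a hex editor. It is not how any particular forensic suite works; the function name `find_keyword_offsets` and the sample bytes are made up for illustration, and the sketch assumes an ASCII keyword and a case-sensitive match.

```python
def find_keyword_offsets(data: bytes, keyword: str) -> list:
    """Return sorted (offset, encoding) pairs for each occurrence of keyword.

    Assumption: the keyword is plain ASCII. It is searched for both as
    single-byte ASCII text and as UTF-16LE, two encodings in which text
    commonly appears on disk.
    """
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.encode(encoding)
        start = 0
        while True:
            pos = data.find(needle, start)
            if pos == -1:
                break
            hits.append((pos, encoding))
            start = pos + 1  # keep scanning past this hit
    return sorted(hits)


# Illustrative stand-in for raw bytes read from a disk image.
sample = (
    b"\x00\x00invoice 42\x00\xff"
    + "invoice".encode("utf-16-le")
    + b"\x00\x00"
)

for offset, encoding in find_keyword_offsets(sample, "invoice"):
    print(f"offset {offset}: {encoding}")
```

The scan is exact: every byte either matches the keyword or it does not, which is why text search can approach 100% accuracy. Audio offers no such byte-for-byte target.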
There are many wildly expensive e-discovery systems in place to assist in storing and indexing the large masses of data that are generated on a daily basis in the corporate environment. Services may be outsourced or brought in-house. Again, the cost of putting the systems and procedures into place may pale against the sanctions and fines that could result from not being ready for litigation, should it arise.
There are also many effective tools for scanning paper documents into text files, which are then searchable.
While many of the tools for searching and storing data are effective and accurate, no such level of accuracy or ease yet exists for searching audio for specific information. There are currently three means of searching audio: phonetic search, transcribing by hand, and automatic transcription.
Phonetic search technology matches wave patterns, or phonemes, to a library of known wave patterns. For example, the acronym "B2B" would be represented by the following phonemes: "_B _IY _T _UW _B _IY" (Wikipedia example from Nexidia, a company involved in speech recognition systems). Given the wide variation in modes of speaking, pronunciation, accents and dialects, the accuracy of this method is spotty. It produces many false hits. And while it may identify sections and phrases that are of interest, it doesn't transcribe the audio into text - the audio must then be listened to.
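The trade-off between missed hits and false hits can be shown with a toy sketch. It assumes the audio has already been converted to a sequence of phoneme labels (the hard part, not shown), and it uses a simple edit-distance tolerance that is an illustrative stand-in, not Nexidia's or any vendor's actual method. The function names and the sample phoneme streams are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_search(stream, query, max_errors=0):
    """Return start indexes of query-length windows within max_errors of query."""
    n = len(query)
    return [
        start
        for start in range(len(stream) - n + 1)
        if edit_distance(stream[start:start + n], query) <= max_errors
    ]


# Phonemes for "B2B", as given in the text.
B2B = ["_B", "_IY", "_T", "_UW", "_B", "_IY"]

# Hypothetical phoneme streams: one containing "B2B", one containing
# the similar-sounding "pea to be".
spoken_b2b = ["_AW", "_ER"] + B2B + ["_S", "_EY", "_L", "_Z"]
pea_to_be = ["_P", "_IY", "_T", "_UW", "_B", "_IY"]

print(phonetic_search(spoken_b2b, B2B))               # exact match found
print(phonetic_search(pea_to_be, B2B, max_errors=0))  # strict: no hit
print(phonetic_search(pea_to_be, B2B, max_errors=1))  # tolerant: false hit
```

Tightening `max_errors` misses speakers whose pronunciation drifts from the library; loosening it lets near-homophones through, which is one way the false hits described above arise.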
Manual transcription of audio, so that the transcribed text can then be automatically searched, is time-consuming. Because it depends upon a listener typing the words as they are heard, this labor-intensive task can also be very expensive. There may be security concerns as well, since the audio goes outside the company (or perhaps the country) to be transcribed.
Machine transcription is the one automated means of converting audio to text, but it suffers from accuracy issues. It compares "heard" audio with known libraries, again facing issues of differing pronunciations, terms not in existing libraries, and clarity of recording. While high-quality recordings can lend themselves to recognition rates of 85% or so (a positive-looking number until compared with the nearly 100% accuracy of pure text searches), accuracy on voice mail can dip as low as 40%.
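A recognition rate of this kind is commonly scored by comparing the machine transcript against a trusted human transcript, word by word. The sketch below computes word error rate (word-level edit distance divided by the number of reference words); whether the percentages quoted above were measured this way is an assumption, and the two sample sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word dropped
                            curr[j - 1] + 1,      # word inserted
                            prev[j - 1] + cost))  # word substituted or matched
        prev = curr
    return prev[-1] / len(ref)


# Invented example: a human transcript and a machine transcript of the same call.
human = "please send the signed contract to legal by friday"
machine = "please send the sign contract to eagle by friday"

wer = word_error_rate(human, machine)
print(f"word error rate: {wer:.0%}, word accuracy: {1 - wer:.0%}")
print("keyword 'legal' found:", "legal" in machine.split())
```

Even though most words in the machine transcript are right, a text search for "legal" finds nothing, because the one word that mattered was among those misrecognized.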
The new Federal Rules of Civil Procedure (FRCP) require companies to have a means of identifying key communications and data sources. That data must then be saved. For the sake of efficiency, both in optimizing the amount of storage required and in reducing the volume of data that must be identified and produced for litigation, it is also important to be able to accurately identify data that is unnecessary.
While requirements for retention of data increase and storage costs go down, identifying what audio should be kept and what should be deleted can be costly. Even though such information is already digitized, it must still be stored and indexed (or searched after the fact). The technology is not mature, and is evolving. There may be an opening for an innovative company to prosper here, especially if it can produce some kind of breakthrough in voice-to-text technology. In the meantime, companies face a difficult issue in deciding what stays and what goes.
Steve Burgess is a freelance technology writer, a practicing computer forensics specialist as the principal of Burgess Forensics, and a contributor to the recently released Scientific Evidence in Civil and Criminal Cases, 5th Edition by Moenssens, et al. Mr. Burgess may be reached at http://www.burgessforensics.com or via email at steve@burgessforensics.com