1. Core Insight: The Hidden Goldmine of API Dialogue
The Apiza Corpus is not just another dataset; it is a strategic resource for anyone who wants to build the next generation of developer tools. The core insight is simple: programmers interact with machines differently from the way they interact with people. The Wizard-of-Oz (WoZ) method used here is arguably the most practical way to capture this machine-directed style of speech at scale, free of the bias introduced by human-to-human politeness. The dataset addresses the cold-start problem of training a virtual assistant (VA) for API usage, a task that is both difficult and valuable. The authors have created a kind of Rosetta Stone for how developers actually ask for help, which is arguably more valuable than synthetic data generated by a language model.
2. Logical Flow: From WoZ to Dialogue Structure
The paper's logical flow is clean and defensible. It starts by identifying a significant gap: the lack of task-specific dialogue datasets for software engineering. It then justifies the WoZ method as the gold standard for collecting unbiased human-machine interaction data. The experiment is described in detail: 30 professional programmers, 90-minute sessions, and a virtual assistant operated by a human wizard. The final step is annotating these dialogues with Dialogue Act (DA) types along four dimensions, producing a structured, machine-readable dataset. It is a textbook example of how to build a conversational AI system from the ground up.
2.1 The Wizard-of-Oz Method
The WoZ experiment is the heart of the study. The programmers were told they were interacting with an automated VA, but the 'wizard' was a human expert. This deception matters because it elicits the kind of direct, command-like language that a real VA would need to understand. For example, a programmer might type 'pro:allegrokeyboardinput' rather than 'Could you please help me find the function that saves the keyboard state?'. This rough, unpolished language is exactly the kind of training data a machine learning model needs.
2.2 Data Collection and Annotation
The data collection process was rigorous. Thirty professional programmers were recruited, ensuring a level of expertise that reflects real-world API usage. Each session lasted about 90 minutes, yielding a substantial body of dialogue. Annotation consisted of labeling every utterance with Dialogue Act types, a standard practice in dialogue systems research. These annotations are what make the dataset usable for training sequence-to-sequence models or for building intent classifiers.
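To make the annotation scheme concrete, here is a minimal Python sketch of how one annotated utterance might be held in memory. The class names, field names, and the placeholder dimension names (`dim_1` to `dim_4`) are assumptions for illustration only; they are not the corpus's actual file format or label set.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Placeholder names: the text only states that there are four dimensions.
DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4")


@dataclass
class Utterance:
    speaker: str                                           # "user" or "wizard"
    text: str
    labels: Dict[str, str] = field(default_factory=dict)  # dimension -> DA label

    def is_fully_annotated(self) -> bool:
        """True when a label is present for every dimension."""
        return all(dim in self.labels for dim in DIMENSIONS)


@dataclass
class Dialogue:
    session_id: int
    utterances: List[Utterance] = field(default_factory=list)

    def unannotated(self) -> List[Utterance]:
        """Utterances still missing a label on at least one dimension."""
        return [u for u in self.utterances if not u.is_fully_annotated()]


# Illustrative session: one labeled utterance, one not yet labeled.
session = Dialogue(session_id=1)
session.utterances.append(
    Utterance("user", "pro:allegrokeyboardinput",
              {dim: "placeholder_label" for dim in DIMENSIONS})
)
session.utterances.append(Utterance("user", "Can you give me an example?"))
```

Keeping the labels in a per-dimension mapping makes it easy to check annotation completeness before any training run.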
3. Strengths and Weaknesses: A Critical Assessment
Let us be clear: this is an important paper, but it is not without flaws. The strengths are substantial, but the weaknesses must be acknowledged by anyone planning to build on this work.
3.1 Strengths: A Novel Dataset and a Rigorous Design
The main strength is the novelty and necessity of the dataset. As the authors note, a 2015 survey found no SE dialogue datasets, and only one has been published since. The Apiza Corpus fills a large gap. The WoZ method is the right approach, and using professional programmers adds ecological validity. The annotation scheme is sound and multi-dimensional, allowing fine-grained analysis of the dialogues.
3.2 Weaknesses: Size, Generalizability, and the Wizard Effect
The main weakness is size. Thirty participants is a small sample for training a robust deep learning model. Generalizability is also questionable: the tasks were specific, and the wizard's behavior may introduce its own bias. In addition, the 'wizard effect' (the fact that the wizard is a human expert) means the responses were more accurate and helpful than a current automated assistant is likely to produce. This sets an upper bound that may be unrealistic for a real VA. Finally, the paper lacks a detailed analysis of the dialogue act distribution and of inter-annotator agreement, both of which are essential for judging the quality of the labels.
4. Actionable Takeaways: What This Means for Industry
For product managers and engineering leads, the message is clear: stop waiting for a perfect AI and start collecting your own WoZ data. The Apiza Corpus is a proof of concept that this approach works. The concrete steps are: (1) Identify an important, repetitive task in your development workflow (for example, API usage, debugging, or code review). (2) Run a small WoZ study with your own developers. (3) Annotate the dialogues and use them to train a simple intent classifier. (4) Iterate. A WoZ study costs a fraction of what it takes to build a full VA from scratch, and the data you get is more valuable. The Apiza Corpus is the blueprint; your company's internal data is the fuel.
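Step (3) can start very small. The sketch below is one possible "simple intent classifier": a multinomial Naive Bayes model with add-one smoothing, written with the standard library only. The training utterances and intent names are invented for illustration and do not come from the Apiza Corpus; a real study would substitute its own annotated WoZ data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization; deliberately naive."""
    return text.lower().split()


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data standing in for annotated WoZ utterances.
train_texts = [
    "show me an example",
    "can you give an example of usage",
    "what does this function return",
    "what parameters does it take",
    "thanks",
    "ok thanks that works",
]
train_labels = [
    "request_example", "request_example",
    "request_info", "request_info",
    "acknowledge", "acknowledge",
]

clf = NaiveBayesIntentClassifier().fit(train_texts, train_labels)
```

A baseline like this is mainly useful for the "iterate" step: it shows quickly which intents are confused with each other and therefore where more WoZ sessions or clearer annotation guidelines are needed.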
5. Technical Details and Mathematical Formulation
From a technical standpoint, the dataset is designed to support training a Dialogue Act (DA) classifier. The underlying problem can be framed as a sequence labeling task. Given a sequence of utterances $U = (u_1, u_2, \dots, u_n)$, the goal is to predict a sequence of dialogue act labels $D = (d_1, d_2, \dots, d_n)$, where each $d_i$ belongs to a predefined set of DA types. A common approach is to use a Conditional Random Field (CRF) on top of a BiLSTM or Transformer encoder. The loss function is typically the negative log-likelihood, written here in its per-utterance form (a CRF layer would instead normalize over the whole label sequence, giving $-\log P(D \mid U)$):
$L = -\sum_{i=1}^{n} \log P(d_i \mid u_1, u_2, \dots, u_n)$
The Apiza Corpus provides the labeled data $\{(U_j, D_j)\}_{j=1}^{30}$ for training such a model. The four annotation dimensions (for example, task and communication, among others) allow a multi-task learning setup in which the model predicts several labels for each utterance, which can improve generalization.
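As a worked illustration of the per-utterance loss, the following Python sketch computes the negative log-likelihood from predicted label probabilities. The function names, the label names, and the probabilities are made up for the example, and the multi-task variant simply adds the per-dimension losses with equal weight, which is a simplifying assumption rather than anything the paper specifies.

```python
import math


def sequence_nll(predicted, gold):
    """Per-utterance negative log-likelihood, summed over one dialogue.

    predicted: list of dicts mapping DA label -> predicted probability,
               one dict per utterance (assumed to come from some model).
    gold:      list of gold DA labels, one per utterance.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return -sum(math.log(probs[label]) for probs, label in zip(predicted, gold))


def multitask_nll(predicted_by_dim, gold_by_dim):
    """Unweighted sum of the per-dimension losses (an assumption)."""
    return sum(
        sequence_nll(predicted_by_dim[dim], gold_by_dim[dim])
        for dim in gold_by_dim
    )


# Made-up probabilities for a two-utterance dialogue.
predicted = [
    {"request": 0.5, "inform": 0.5},
    {"request": 0.25, "inform": 0.75},
]
gold = ["request", "inform"]

loss = sequence_nll(predicted, gold)  # equals -(log 0.5 + log 0.75)
```

With a CRF output layer the probabilities would not factor per utterance like this; the sketch corresponds to an independent softmax over DA types at each position.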
6. Experimental Results and Summary
The paper does not present quantitative results from trained models, because it is a dataset paper. It does, however, give a qualitative summary of the data. The dataset contains 30 dialogues, each lasting about 90 minutes on average. The total number of utterances is not stated explicitly, but given the session length it likely runs into the thousands. Dialogue act labels are provided along four dimensions, although the exact distribution is not reported. A hypothetical bar chart would likely show 'Request Information' and 'Provide Information' as the most frequent DA types, reflecting the task-oriented nature of the dialogues. A hypothetical pie chart of the four annotation dimensions would presumably show an even split, reflecting a comprehensive annotation scheme.
7. Analysis Framework Example: A Sample Dialogue
Below is a simplified example of a dialogue from the dataset, showing its structure and annotation. It is a non-code example that focuses on how the conversation flows.
User: pro:allegrokeyboardinput
Wizard: You can save the state of the keyboard, as it is at the time the function is called, into the structure pointed to by ret_state.
User: Can you give me an example?
Wizard: Sure. allegro_keyboard_state_to_display() is a related function.
User: Thanks.
In this example, the user's first utterance is a direct command (DA: 'Request Function'), the wizard's reply is 'Provide Information', the user's second utterance is 'Request Example', and the user's final utterance is 'Acknowledge'. This simple exchange captures the essence of the dataset: direct, task-focused, and free of social pleasantries.
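The sample exchange can also be written down as data, which is a useful first step toward the distribution analysis the paper leaves out. The sketch below encodes the five turns with English renderings of the labels named above; the tuple layout and helper names are assumptions, and the wizard's second reply is left as `None` because the text assigns it no label.

```python
from collections import Counter

# (speaker, text, DA label); None marks a turn the text does not label.
sample_dialogue = [
    ("user", "pro:allegrokeyboardinput", "Request Function"),
    ("wizard",
     "You can save the state of the keyboard into the structure "
     "pointed to by ret_state.",
     "Provide Information"),
    ("user", "Can you give me an example?", "Request Example"),
    ("wizard",
     "Sure. allegro_keyboard_state_to_display() is a related function.",
     None),
    ("user", "Thanks.", "Acknowledge"),
]


def label_distribution(turns):
    """Count DA labels, skipping turns without a label."""
    return Counter(label for _, _, label in turns if label is not None)


def unlabeled_turns(turns):
    """Return the turns that still need a DA label."""
    return [turn for turn in turns if turn[2] is None]


distribution = label_distribution(sample_dialogue)
missing = unlabeled_turns(sample_dialogue)
```

Run over all 30 sessions, a count like this would give the label frequencies that a bar chart of DA types would need.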
8. Applications and Future Directions
The Apiza Corpus is a foundation, not a finished product. The most immediate next step is to use the data to train a prototype VA for API usage. A more ambitious goal is to extend the WoZ method to other SE tasks, such as debugging, code review, or requirements gathering. The long-term vision is a 'universal' development VA that can handle many tasks, trained on a variety of WoZ datasets. The rise of large language models (LLMs) such as GPT-4 also opens new possibilities: the Apiza Corpus could be used to fine-tune an LLM for the specific domain of API assistance, yielding a VA that is both powerful and specialized. The main challenge will be moving from a human-operated wizard to a fully autonomous system, and the Apiza Corpus provides a roadmap.
9. Original Analysis and Commentary
The Apiza Corpus is a timely and necessary contribution to the field of AI for software engineering. Its main value lies not in its size but in its quality. The WoZ method, although not new, is applied here with a rigor that is often missing in SE research. The decision to use professional programmers is a major asset, because it ensures that the data reflects real-world behavior rather than the forced interactions of a lab experiment. However, the paper's greatest strength is also its greatest weakness: the dataset is a snapshot of one particular interaction setup. The 'wizard' is a human expert, and the responses are close to ideal. A real VA will make mistakes, and the dataset does not capture how a user reacts to an incorrect or confusing answer. This is a significant gap. Future work should explore 'error recovery' dialogues, in which the VA is deliberately imperfect. In addition, the paper would benefit from a detailed quantitative analysis of the dialogue acts, including inter-annotator agreement scores (for example, Cohen's Kappa) to validate the annotation scheme. As Serban et al. (2016) note in their survey of dialogue datasets, the quality of the labels often matters more than the quantity of data. The Apiza Corpus is a strong start, but it is only a first step. The real test will be whether it can be used to train a VA that is actually useful to developers. For now, it stands as an important resource and a clear call for the SE community to invest in WoZ studies.
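For readers who want to run the agreement check suggested above on their own annotations, here is a minimal sketch of Cohen's kappa for two annotators. The label sequences are invented for illustration and are not taken from the Apiza Corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented annotations for six utterances.
annotator_1 = ["request", "request", "inform", "inform", "ack", "ack"]
annotator_2 = ["request", "inform", "inform", "inform", "ack", "ack"]

kappa = cohens_kappa(annotator_1, annotator_2)  # 0.75 for this toy data
```

Here the annotators agree on 5 of 6 items (observed agreement 5/6) and chance agreement is 1/3, so kappa is (5/6 - 1/3) / (1 - 1/3) = 0.75. With four annotation dimensions, the score would be computed separately for each dimension.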
10. References
- Eberhart, Z., Bansal, A., & McMillan, C. (2023). The Apiza Corpus: API Usage Dialogues with a Simulated Virtual Assistant. University of Notre Dame.
- Robillard, M. P., et al. (2017). API Usage as a Target for Virtual Assistants. In Proceedings of the 39th International Conference on Software Engineering (ICSE).
- Reiser, S., & Lemon, O. (2020). Efficient Data Collection for Task-Specific Virtual Assistants. Morgan & Claypool Publishers.
- Serban, I. V., et al. (2016). A Survey of Available Corpora for Building Data-Driven Dialogue Systems. arXiv preprint arXiv:1512.05742.
- Dahl, D., et al. (1994). Expanding the Scope of the ATIS Task: The ATIS-3 Corpus. In Proceedings of the Human Language Technology Workshop.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. (For background on sequence labeling and CRFs.)